Commit History

Author SHA1 Message Date
  Tri Dao f907a13187 Tune tile sizes for fwd varlen on Sm80 and Sm86 1 month ago
  Tri Dao 76f14c61c9 Tune fwd tile sizes for Sm86 and Sm89 1 month ago
  Tri Dao 659a631f4c Rename bwd classes to include Sm90 suffix 1 month ago
  Tri Dao a901c7eeda Make Sm80 forward pass work with persistent scheduler 1 month ago
  Tri Dao 5171269dab Implement forward pass for Sm80 1 month ago
  Tri Dao da264e5742 Change file names and class names to include sm90 suffix 1 month ago
  Tri Dao 5f25b9781f Make epilogue_fwd work for Ampere 1 month ago
  Tri Dao 69bd392159 Merge bwd and bwd_varlen in the C++ API 1 month ago
  Tri Dao c3cdc0fd88 Add sm_margin as an option for overlapping with communication 1 month ago
  Tri Dao 147ac33a2e Tune num_splits for local, don't split when num_n_blocks is small 1 month ago
  Tri Dao 3e5d77a102 Group instantiations for different hdims together 1 month ago
  Tri Dao 6807b1ea37 Longest-processing-time-first scheduler for causal 1 month ago
  Tri Dao fb9c9cbbe9 Support qkv_descale of shape (batch_size, nheads_kv) 1 month ago
  Tri Dao 6293008748 Add option for Mma0_is_RS and Mma1_is_RS in attn fwd 2 months ago
  Tri Dao 9c954f7021 Use num_split_heuristics in fwd and fwd_varlen 2 months ago
  Tri Dao f6e165becf Change tile_size and local to avoid wgmma being serialized 2 months ago
  Tri Dao 42fc4962f0 Uncomment tanh softcapping 2 months ago
  Tri Dao 9553b2728f More env vars to disable features 2 months ago
  Tri Dao 3248babb9e QOL: Use env var to selectively disable features 2 months ago
  Tri Dao c9c40eba83 Uncomment local attn 2 months ago
  Tri Dao 94657af3e8 Add option for not doing intra-WG overlapping of gemm and softmax 2 months ago
  Tri Dao fc2fd95a18 Renable FP8 kernels 2 months ago
  Tri Dao 64d92bce53 Split PagedKV into separate .cu files to speed up compilation 2 months ago
  Tri Dao bc8a001d8d Load cos/sin by splitting the work among threads on the same row 2 months ago
  Tri Dao 1dc3364774 Consolidate seqlen info into a struct 2 months ago
  Tri Dao 586ba914bb Move fwd tile size to a separate file 2 months ago
  Tri Dao 018b9af683 Move .cu files to instantiations, use generate_kernels.py 2 months ago
  Tri Dao 0c49ac9a07 Implement rotary non-interleaved 2 months ago
  Tri Dao b2d3fe92ff Move rotary to a separate file 2 months ago
  Tri Dao 9f82a326ad Implement rotary for attn decode 2 months ago