Commit History

Author SHA1 Message Date
  Tri Dao f907a13187 Tune tile sizes for fwd varlen on Sm80 and Sm86 4 weeks ago
  Tri Dao 76f14c61c9 Tune fwd tile sizes for Sm86 and Sm89 4 weeks ago
  Tri Dao 659a631f4c Rename bwd classes to include Sm90 suffix 1 month ago
  Tri Dao a901c7eeda Make Sm80 forward pass work with persistent scheduler 1 month ago
  Tri Dao 5171269dab Implement forward pass for Sm80 1 month ago
  Tri Dao da264e5742 Change file names and class names to include sm90 suffix 1 month ago
  Tri Dao 5f25b9781f Make epilogue_fwd work for Ampere 1 month ago
  Tri Dao 69bd392159 Merge bwd and bwd_varlen in the C++ API 1 month ago
  Tri Dao c3cdc0fd88 Add sm_margin as an option for overlapping with communication 1 month ago
  Tri Dao 147ac33a2e Tune num_splits for local, don't split when num_n_blocks is small 1 month ago
  Tri Dao 3e5d77a102 Group instantiations for different hdims together 1 month ago
  Tri Dao 6807b1ea37 Longest-processing-time-first scheduler for causal 1 month ago
  Tri Dao fb9c9cbbe9 Support qkv_descale of shape (batch_size, nheads_kv) 1 month ago
  Tri Dao 6293008748 Add option for Mma0_is_RS and Mma1_is_RS in attn fwd 1 month ago
  Tri Dao 9c954f7021 Use num_split_heuristics in fwd and fwd_varlen 2 months ago
  Tri Dao f6e165becf Change tile_size and local to avoid wgmma being serialized 2 months ago
  Tri Dao 42fc4962f0 Uncomment tanh softcapping 2 months ago
  Tri Dao 9553b2728f More env vars to disable features 2 months ago
  Tri Dao 3248babb9e QOL: Use env var to selectively disable features 2 months ago
  Tri Dao c9c40eba83 Uncomment local attn 2 months ago
  Tri Dao 94657af3e8 Add option for not doing intra-WG overlapping of gemm and softmax 2 months ago
  Tri Dao fc2fd95a18 Renable FP8 kernels 2 months ago
  Tri Dao 64d92bce53 Split PagedKV into separate .cu files to speed up compilation 2 months ago
  Tri Dao bc8a001d8d Load cos/sin by splitting the work among threads on the same row 2 months ago
  Tri Dao 1dc3364774 Consolidate seqlen info into a struct 2 months ago
  Tri Dao 586ba914bb Move fwd tile size to a separate file 2 months ago
  Tri Dao 018b9af683 Move .cu files to instantiations, use generate_kernels.py 2 months ago
  Tri Dao 0c49ac9a07 Implement rotary non-interleaved 2 months ago
  Tri Dao b2d3fe92ff Move rotary to a separate file 2 months ago
  Tri Dao 9f82a326ad Implement rotary for attn decode 2 months ago