Commit history

Author SHA1 Message Date
  Tri Dao 6807b1ea37 Longest-processing-time-first scheduler for causal 1 day ago
  Tri Dao fb9c9cbbe9 Support qkv_descale of shape (batch_size, nheads_kv) 4 days ago
  Tri Dao 6293008748 Add option for Mma0_is_RS and Mma1_is_RS in attn fwd 1 week ago
  Tri Dao f5e89ff136 Tune tile size for bwd softcap 1 week ago
  Tri Dao 29cdfedd80 Use Bulk reduce instead of TMA for dQaccum, split across WGs 1 week ago
  Tri Dao 9c954f7021 Use num_split_heuristics in fwd and fwd_varlen 1 week ago
  Tri Dao f6e165becf Change tile_size and local to avoid wgmma being serialized 2 weeks ago
  Tri Dao 3ed79742fb Add option to shuffle LSE and dPsum in the bwd 2 weeks ago
  Tri Dao 42fc4962f0 Uncomment tanh softcapping 2 weeks ago
  Tri Dao 9553b2728f More env vars to disable features 3 weeks ago
  Tri Dao 3248babb9e QOL: Use env var to selectively disable features 3 weeks ago
  Tri Dao 94657af3e8 Add option for not doing intra-WG overlapping of gemm and softmax 3 weeks ago
  Tri Dao fc2fd95a18 Renable FP8 kernels 3 weeks ago
  Tri Dao 3d0d147940 Early stop on actual num_splits in mha_combine kernel 3 weeks ago
  Tri Dao 9fd6b977bb Precompute the pointers in mha_combine kernel 3 weeks ago
  Tri Dao 64d92bce53 Split PagedKV into separate .cu files to speed up compilation 3 weeks ago
  Tri Dao 586ba914bb Move fwd tile size to a separate file 3 weeks ago
  Tri Dao 5194d9b2e6 Include torch/version.h for TORCH_VERSION* macros 3 weeks ago
  Tri Dao 25dbfa6452 Add heuristic for setting num_splits from FA2 3 weeks ago
  Tri Dao 018b9af683 Move .cu files to instantiations, use generate_kernels.py 3 weeks ago
  Tri Dao 0290574956 Put #if to avoid redefinition with torch >= 2.4 4 weeks ago
  Tri Dao 9f82a326ad Implement rotary for attn decode 1 month ago
  Tri Dao 4d00645c76 Implement appending new KV to KV cache 1 month ago
  Tri Dao 4860b1068f Fix mha_combine tests 1 month ago
  Tri Dao a65af55f4a Move mask_fn and load_Q into separate functions 1 month ago
  Tri Dao df96486c31 Decode: varlen, paged KV, leftpad 1 month ago
  Tri Dao ea7a98f15d Fix backward with softcap 2 months ago
  Tri Dao 6e8b25e426 Refactor 2 months ago
  Ying Zhang dff976a84a fixes 3 months ago
  Ying Zhang 7b4e68e04f hopper local attention 3 months ago