Revision history

Author SHA1 Message Date
  Tri Dao f907a13187 Tune tile sizes for fwd varlen on Sm80 and Sm86 4 weeks ago
  Tri Dao 76f14c61c9 Tune fwd tile sizes for Sm86 and Sm89 4 weeks ago
  Tri Dao 659a631f4c Rename bwd classes to include Sm90 suffix 1 month ago
  Tri Dao a901c7eeda Make Sm80 forward pass work with persistent scheduler 1 month ago
  Tri Dao 5171269dab Implement forward pass for Sm80 1 month ago
  Tri Dao da264e5742 Change file names and class names to include sm90 suffix 1 month ago
  Tri Dao 5f25b9781f Make epilogue_fwd work for Ampere 1 month ago
  Tri Dao 69bd392159 Merge bwd and bwd_varlen in the C++ API 1 month ago
  Tri Dao c3cdc0fd88 Add sm_margin as an option for overlapping with communication 1 month ago
  Tri Dao 147ac33a2e Tune num_splits for local, don't split when num_n_blocks is small 1 month ago
  Tri Dao 3e5d77a102 Group instantiations for different hdims together 1 month ago
  Tri Dao 6807b1ea37 Longest-processing-time-first scheduler for causal 1 month ago
  Tri Dao fb9c9cbbe9 Support qkv_descale of shape (batch_size, nheads_kv) 1 month ago
  Tri Dao 6293008748 Add option for Mma0_is_RS and Mma1_is_RS in attn fwd 1 month ago
  Tri Dao 9c954f7021 Use num_split_heuristics in fwd and fwd_varlen 2 months ago
  Tri Dao f6e165becf Change tile_size and local to avoid wgmma being serialized 2 months ago
  Tri Dao 42fc4962f0 Uncomment tanh softcapping 2 months ago
  Tri Dao 9553b2728f More env vars to disable features 2 months ago
  Tri Dao 3248babb9e QOL: Use env var to selectively disable features 2 months ago
  Tri Dao c9c40eba83 Uncomment local attn 2 months ago
  Tri Dao 94657af3e8 Add option for not doing intra-WG overlapping of gemm and softmax 2 months ago
  Tri Dao fc2fd95a18 Renable FP8 kernels 2 months ago
  Tri Dao 64d92bce53 Split PagedKV into separate .cu files to speed up compilation 2 months ago
  Tri Dao bc8a001d8d Load cos/sin by splitting the work among threads on the same row 2 months ago
  Tri Dao 1dc3364774 Consolidate seqlen info into a struct 2 months ago
  Tri Dao 586ba914bb Move fwd tile size to a separate file 2 months ago
  Tri Dao 018b9af683 Move .cu files to instantiations, use generate_kernels.py 2 months ago
  Tri Dao 0c49ac9a07 Implement rotary non-interleaved 2 months ago
  Tri Dao b2d3fe92ff Move rotary to a separate file 2 months ago
  Tri Dao 9f82a326ad Implement rotary for attn decode 2 months ago