Commit History

Author SHA1 Message Date
  Tri Dao d5893f3c74 Merge branch 'main' into changes_for_fp8 4 months ago
  Tri Dao 59594f2a67 Bump to v2.6.2 4 months ago
  Tri Dao 299563626f Fix test with alibi and cache_leftpad 4 months ago
  Tri Dao 4488acee8d [CI] Compile with torch 2.4.0.dev20240527 4 months ago
  Tri Dao 65f723bb9a Split bwd into more .cu files to speed up compilation 4 months ago
  Tri Dao 5ca83a9c71 Clean up softcapping bwd a bit 4 months ago
  Tri Dao 751c762c9c Don't specialize for hdim 224 to speed up compilation 4 months ago
  Driss Guessous 1c275eb070 Fix ima for split-kv kernel (#1085) 4 months ago
  janEbert 3c4053b75c Make FA3 externally importable (#1053) 4 months ago
  rocking d8f104e97a Support AMD ROCm on FlashAttention 2 (#1010) 4 months ago
  Ying Zhang dfe1a59e4b Add var-seq-len to FA3 fp16 / bf16 fwd (#1072) 4 months ago
  Cameron Shinn cb516f855b Remove torchlib dependency from cpp files (#1083) 4 months ago
  Phil Wang 5f1ae4a34b backwards for softcapping (#1033) 4 months ago
  youkaichao ef3e358a25 remove lambda (#1056) 4 months ago
  Jorge António 4df62e1440 catch typo (#1058) 4 months ago
  Ganesh Bikshandi 81b379c54d minor reformatting. 4 months ago
  Ganesh Bikshandi e0607bb3aa minor formatting. 4 months ago
  Ganesh Bikshandi df66e974bc fixed odd-seq-len-k. 4 months ago
  Ganesh Bikshandi 63e9277199 change to correct tile size for headdim=128. 4 months ago
  Ganesh Bikshandi fe4c5b59df undid clang formatting. 4 months ago
  Ganesh Bikshandi 4eacd5886d enable all tests except odd-seq-lengths, where it crashes now. 4 months ago
  Ganesh Bikshandi d5c2d1aa18 removed contiguous check. 4 months ago
  Ganesh Bikshandi cdc966e81a adding files for fp8 changes. 4 months ago
  Tri Dao 74b0761ff7 [FA3] BF16 forward 5 months ago
  Tri Dao 898dd4bbf2 Pass seqused_k to _flash_attn_varlen_forward 5 months ago
  Tri Dao 7ef24848cf Add FA3 image 5 months ago
  Tri Dao 7f67966cc7 FA3 initial code release 5 months ago
  Tri Dao b4a9dd6c9c Temporarily switch to cutlass fork for more shapes 5 months ago
  Tri Dao 7551202cb2 Bump to v2.6.1 5 months ago
  Tri Dao 844912dca0 [CI] Switch from CUDA 12.2 to 12.3 5 months ago