Commit History

Author  SHA1  Message  Date
Tri Dao  9e5e8bc91e  Change causal mask to be aligned to bottom-right instead of top-left  1 year ago
Tri Dao  0e8c46ae08  Run isort and black on test files  1 year ago
Tri Dao  c65b5106ac  Fix Bwd NaN for varlen when seqlen_q >> seqlen_k and causal  1 year ago
Tri Dao  3524e13c11  Update to Cutlass 3.1  1 year ago
Tri Dao  1c41d2b0e5  Fix race condition in bwd (overwriting sK)  1 year ago
Tri Dao  a4f148b6ab  Fix masking of bwd when seqlen is not divisible by 128  1 year ago
Tri Dao  4f285b3547  FlashAttention-2 release  1 year ago
Tri Dao  a8fec99a9a  Skip flash_attn_split test  2 years ago
Tri Dao  9d3116addf  Don't enforce bitwise consistency for dq in race condition test  2 years ago
Tri Dao  6998e0ecdb  Fix out-of-bound memory read  2 years ago
Tri Dao  7479757191  Fix pipelining bug in Triton bwd with bias_type=matrix  2 years ago
Tri Dao  557781933d  Parallelize CUDA bwd along seqlen_k instead of seqlen_q  2 years ago
Tri Dao  ff78ea4123  Fix race condition in Triton bwd when there's bias  2 years ago
Tri Dao  86862cfd7b  Implement attention bias for Triton version  2 years ago
Tri Dao  aacc10fbab  Fix race condition in Triton bwd for non-po2 headdims  2 years ago
Tri Dao  1fb12afdfb  Avoid memcpy in the Triton bwd  2 years ago
Tri Dao  9b0bc97872  Fix race condition in Triton fwd  2 years ago
Tri Dao  4f81aff46e  Add debug_barrier for all headdims in Triton bwd  2 years ago
Tri Dao  e78d509c64  [WIP] Support all head dimensions up to 128 in the Triton bwd  2 years ago
Tri Dao  008951f1d9  Support all head dimensions up to 128 in the Triton fwd  2 years ago
Tri Dao  b910bf14c1  Support arbitrary seqlens (both q & k) in Triton bwd  2 years ago
Tri Dao  dc55469355  Support arbitrary seqlen_k in Triton bwd  2 years ago
Tri Dao  d11341fd1a  Fix Triton fwd to support seqlen not multiples of 128  2 years ago
Tri Dao  b0c0db81f6  Implement FlashAttention in Triton  2 years ago
Tri Dao  46fd2a20b2  Support all head dims that are multiples of 8, up to 128  2 years ago
Tri Dao  a5a8806d1a  Split bwd on the seqlen_q dimension  2 years ago
Tri Dao  1aa6d7d9b6  Rework dropout to decouple forward and backward  2 years ago
Tri Dao  52fb4b729b  Fix #54: set device for multi-GPU case  2 years ago
Tri Dao  5badfb7848  Implement attention kernel that splits the batch into two  2 years ago
Tri Dao  0c01568daf  Only run backward test for d=128 on A100  2 years ago