| Author | Commit | Message | Date |
|---|---|---|---|
| Tri Dao | 9a11f440d3 | Bump to v2.5.8 | 8 months ago |
| Tri Dao | 35060e7450 | [CI] Compile for pytorch 2.2.2 and 2.3.0 | 8 months ago |
| Tri Dao | ec6d22143b | [CrossEntropy] Change ignored_index -> ignore_index | 8 months ago |
| Tri Dao | 85881f547f | Bump to v2.5.7 | 9 months ago |
| Tri Dao | 2aea958f89 | [CI] Compile with torch 2.3.0.dev20240207 | 9 months ago |
| Tri Dao | 656daef4ea | Use Cute's local_tile to get gQ, gK, gV | 9 months ago |
| Tri Dao | 9eb3d099c1 | Transpose out when swapping seqlen_q and num_groups | 9 months ago |
| Ivan Komarov | f692b98d80 | Fix spurious re-compilations of `rotary_kernel` (#911) | 9 months ago |
| Driss Guessous | 23e8fa5a26 | Add the option for the macro and note (#893) | 9 months ago |
| ljss | 3e9414f1c3 | Minor fix in compute_attn_1rowblock_splitkv (#900) | 9 months ago |
| Tri Dao | 36587c01cb | [LayerNorm] Update layer_norm_linear | 10 months ago |
| Markus Krimmel | 6bbc532388 | fix: cast the alibi slopes to torch.float32 (#846) | 10 months ago |
| Driss Guessous | 4a73e903da | Add in macros for defining __grid_constant__ (#852) | 10 months ago |
| Grigory Sizov | 2a15840f09 | Enable paged attention in varlen forward (#831) | 10 months ago |
| Arvind Sundararajan | 26c9e82743 | Support ARM builds (#757) | 10 months ago |
| Chirag Jain | 50896ec574 | Make nvcc threads configurable via environment variable (#885) | 10 months ago |
| Tri Dao | 6c9e60de56 | Bump to v2.5.6 | 10 months ago |
| Tri Dao | 6e2fa30797 | [CI] Change torch 2.3.0.dev20240126 to 20240105 for nvcr 24.02 | 10 months ago |
| Tri Dao | 87a1277653 | Bump to v2.5.5 | 11 months ago |
| Tri Dao | 2406f28805 | Enable headdim 256 backward on consumer GPUs (Ampere, Ada) | 11 months ago |
| Tri Dao | 43950dda45 | Bump to v2.5.4 | 11 months ago |
| Tri Dao | 4d6b794b3c | Update Cutlass to v3.4.1 | 11 months ago |
| Tri Dao | b32efb1a4d | Don't need to reduce row_sum during online softmax | 11 months ago |
| Qubitium | f45bbb4c94 | Optimize compilation to avoid OOM, swap thrashing, and thread starvation: set MAX_JOBS to the minimum of two auto-calculated estimates, one from CPU cores and one from free memory, so flash-attn builds efficiently on both consumer and server machines (#832); see the sketch after this table | 11 months ago |
| Tri Dao | 5cdabc2809 | Bump to v2.5.3 | 11 months ago |
| Tri Dao | d9a5cb291c | Fix dv = torch::empty_like(k) for mha_bwd_varlen as well | 11 months ago |
| Tri Dao | a190df011c | Add window_size option to ParallelMHA | 11 months ago |
| Brian Hirsh | 2423cca3ad | fix backward for when query and key have different contiguity (#818) | 11 months ago |
| Grigory Sizov | 4687936413 | Fix Windows build (#816) | 11 months ago |
| Tri Dao | 61a7772479 | Bump to v2.5.2 | 11 months ago |
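
The #832 entry above describes a heuristic for picking a parallel-build job count: take the minimum of a CPU-core-based estimate and a free-memory-based estimate. Below is a minimal sketch of that idea, not the actual logic in flash-attn's setup.py; the `estimate_max_jobs` helper, the `psutil` dependency, and the 4 GiB-per-job budget are illustrative assumptions.

```python
# Sketch of the MAX_JOBS heuristic described in commit f45bbb4c94 (#832):
# bound the job count by both CPU cores and available memory. The
# 4 GiB-per-compile-job budget is an assumed figure, not the constant
# used in the flash-attn build scripts.
import os

import psutil  # assumed available; used only to read free memory


def estimate_max_jobs(mem_per_job_gib: float = 4.0) -> int:
    """Return a parallel-build job count limited by CPU cores and free memory."""
    cpu_jobs = os.cpu_count() or 1
    free_gib = psutil.virtual_memory().available / 2**30
    mem_jobs = max(1, int(free_gib // mem_per_job_gib))
    return max(1, min(cpu_jobs, mem_jobs))


if __name__ == "__main__":
    # A build script could export this as MAX_JOBS before invoking ninja.
    os.environ.setdefault("MAX_JOBS", str(estimate_max_jobs()))
    print("MAX_JOBS =", os.environ["MAX_JOBS"])
```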