david/flash-attention

Author	SHA1 Message	Commit date
Tri Dao	40e534a7f6 Implement cache_leftpad	5 months ago
Tri Dao	dca6d89da4 Don't support softcap and dropout at the same time	5 months ago
Tri Dao	908511b2b6 Split into more .cu files to speed up compilation	5 months ago
Tri Dao	1d536d7de5 Minor cleanup of softcapping	5 months ago
Nicolas Patry	8f873cc6ac Implement softcapping. (#1025)	5 months ago
Nicolas Patry	5bf201966a Fixing argument checking when using `seqlenq_ngroups_swapped`. (#976)	5 months ago
Grigory Sizov	f816dee63c Support unpadded LSE layout (#970)	5 months ago
Tri Dao	9eb3d099c1 Transpose out when swapping seqlen_q and num_groups	8 months ago
Driss Guessous	4a73e903da Add in macros for defining __grid_constant__ (#852)	9 months ago
Grigory Sizov	2a15840f09 Enable paged attention in varlen forward (#831)	9 months ago
Tri Dao	2406f28805 Enable headdim 256 backward on consumer GPUs (Ampere, Ada)	9 months ago
Tri Dao	d9a5cb291c Fix dv = torch::empty_like(k) for mha_bwd_varlen as well	10 months ago
Brian Hirsh	2423cca3ad fix backward for when query and key have different contiguity (#818)	10 months ago
Grigory Sizov	4687936413 Fix Windows build (#816)	10 months ago
Jeremy Reizenstein	0658e320f6 Preprocessor switches to control functionality (#788)	10 months ago
Tri Dao	54e80a3829 Implement paged KV cache	10 months ago
Tri Dao	ea8a25ca38 Remove configure in bwd kernel launch	10 months ago
Grigory Sizov	af01244ddd Add split-kv and M<->H swap to varlen forward decoding attention (#754)	10 months ago
Tri Dao	0842ec0da4 Don't dispatch to local if window size >= seqlen_k	11 months ago
Tri Dao	732654583c Implement deterministic backward (thanks to Meituan)	11 months ago
Tri Dao	5ab9b3667b Clean up alibi, implement non-causal alibi	1 year ago
Sanghun Cho	e4f726fc44 Support alibi, by Sanghun Cho from Kakao Brain	1 year ago
Jeremy Reizenstein	ce3e7280f8 Allow varlen_fwd to take optional seqused_k (#647)	1 year ago
Tri Dao	db2f80692c Write zero to out / grad if seqlen_q or seqlen_k is zero	1 year ago
Tri Dao	e279bf8ed9 [Gen] Accept cache_batch_idx to index into the KV cache	1 year ago
Tri Dao	083e8f525f Implement local attention	1 year ago
Tri Dao	65c234ed90 Don't over-allocate dq_accum in case of varlen	1 year ago
Tri Dao	2d8ea9a530 Swap seqlen_q and ngroups when seqlen_q=1 (h/t Daniel Haziza)	1 year ago
Tri Dao	3250ff3d82 Swap seqlen_q, nheads for MQA when seqlen_q=1 for fwd (h/t Daniel H)	1 year ago
Tri Dao	ccbb14f38e Implement rotary embedding in flash_attn_with_kvcache	1 year ago
