david/flash-attention

Author	SHA1 Message	Date
Ying Zhang	cdbbe844b1 minor changes to unpad_input test util func	3 months ago
Tri Dao	299563626f Fix test with alibi and cache_leftpad	4 months ago
Tri Dao	751c762c9c Don't specialize for hdim 224 to speed up compilation	4 months ago
Phil Wang	5f1ae4a34b backwards for softcapping (#1033)	4 months ago
Tri Dao	40e534a7f6 Implement cache_leftpad	5 months ago
Tri Dao	d0787acc16 Relax dropout_fraction test	5 months ago
Tri Dao	dca6d89da4 Don't support softcap and dropout at the same time	5 months ago
Tri Dao	81e01efd4b More typo fixes	5 months ago
Tri Dao	3d41db3e2c Only test backward if there's no softcapping	5 months ago
Nicolas Patry	8f873cc6ac Implement softcapping. (#1025)	5 months ago
muoshuosha	6df7e0a02e Fix the varlen deterministic test (#1023)	5 months ago
cao lei	6a2a16e994 fix typo (#974)	5 months ago
Grigory Sizov	f816dee63c Support unpadded LSE layout (#970)	5 months ago
Grigory Sizov	2a15840f09 Enable paged attention in varlen forward (#831)	9 months ago
Tri Dao	2406f28805 Enable headdim 256 backward on consumer GPUs (Ampere, Ada)	9 months ago
Tri Dao	54e80a3829 Implement page KV cache	10 months ago
Tri Dao	10dad61277 apply_dropout now takes tensor of rowcol layout	11 months ago
Tri Dao	a7b66ae25a Simplify writing softmax to gmem	11 months ago
Tri Dao	732654583c Implement deterministic backward (thanks to Meituan)	11 months ago
Tri Dao	5ab9b3667b Clean up alibi, implement non-causal alibi	1 year ago
Tri Dao	e279bf8ed9 [Gen] Accept cache_batch_idx to index into the KV cache	1 year ago
Tri Dao	083e8f525f Implement local attention	1 year ago
Tri Dao	65c234ed90 Don't over-allocate dq_accum in case of varlen	1 year ago
Tri Dao	2d8ea9a530 Swap seqlen_q and ngroups when seqlen_q=1 (h/t Daniel Haziza)	1 year ago
Tri Dao	3250ff3d82 Swap seqlen_q, nheads for MQA when seqlen_q=1 for fwd (h/t Daniel H)	1 year ago
Tri Dao	ccbb14f38e Implement rotary embedding in flash_attn_with_kvcache	1 year ago
Tri Dao	56b7fc6ee0 Simplify the implementation of KVcache attn by appending KV first	1 year ago
Tri Dao	37c6e05406 Implement flash_attn_with_kvcache	1 year ago
Tri Dao	0c04943fa2 Require CUDA 11.6+, clean up setup.py	1 year ago
Tri Dao	b1fbbd8337 Implement splitKV attention	1 year ago
