david/flash-attention

Author	SHA1 Message	Date
Ying Zhang	cdbbe844b1 minor changes to unpad_input test util func	4 months ago
Tri Dao	299563626f Fix test with alibi and cache_leftpad	5 months ago
Tri Dao	751c762c9c Don't specialize for hdim 224 to speed up compilation	5 months ago
Phil Wang	5f1ae4a34b backwards for softcapping (#1033)	6 months ago
Tri Dao	40e534a7f6 Implement cache_leftpad	6 months ago
Tri Dao	d0787acc16 Relax dropout_fraction test	6 months ago
Tri Dao	dca6d89da4 Don't support softcap and dropout at the same time	6 months ago
Tri Dao	81e01efd4b More typo fixes	6 months ago
Tri Dao	3d41db3e2c Only test backward if there's no softcapping	6 months ago
Nicolas Patry	8f873cc6ac Implement softcapping. (#1025)	6 months ago
muoshuosha	6df7e0a02e Fix the varlen deterministic test (#1023)	6 months ago
cao lei	6a2a16e994 fix typo (#974)	6 months ago
Grigory Sizov	f816dee63c Support unpadded LSE layout (#970)	6 months ago
Grigory Sizov	2a15840f09 Enable paged attention in varlen forward (#831)	10 months ago
Tri Dao	2406f28805 Enable headdim 256 backward on consumer GPUs (Ampere, Ada)	11 months ago
Tri Dao	54e80a3829 Implement page KV cache	1 year ago
Tri Dao	10dad61277 apply_dropout now takes tensor of rowcol layout	1 year ago
Tri Dao	a7b66ae25a Simplify writing softmax to gmem	1 year ago
Tri Dao	732654583c Implement deterministic backward (thanks to Meituan)	1 year ago
Tri Dao	5ab9b3667b Clean up alibi, implement non-causal alibi	1 year ago
Tri Dao	e279bf8ed9 [Gen] Accept cache_batch_idx to index into the KV cache	1 year ago
Tri Dao	083e8f525f Implement local attention	1 year ago
Tri Dao	65c234ed90 Don't over-allocate dq_accum in case of varlen	1 year ago
Tri Dao	2d8ea9a530 Swap seqlen_q and ngroups when seqlen_q=1 (h/t Daniel Haziza)	1 year ago
Tri Dao	3250ff3d82 Swap seqlen_q, nheads for MQA when seqlen_q=1 for fwd (h/t Daniel H)	1 year ago
Tri Dao	ccbb14f38e Implement rotary embedding in flash_attn_with_kvcache	1 year ago
Tri Dao	56b7fc6ee0 Simplify the implementation of KVcache attn by appending KV first	1 year ago
Tri Dao	37c6e05406 Implement flash_attn_with_kvcache	1 year ago
Tri Dao	0c04943fa2 Require CUDA 11.6+, clean up setup.py	1 year ago
Tri Dao	b1fbbd8337 Implement splitKV attention	1 year ago
