SueJane | 3f1b4d38e7 | Fix: check the type of max_seqlen_k instead of checking max_seqlen twice (#1127) | 4 months ago
Markus Krimmel | 6bbc532388 | fix: cast the alibi slopes to torch.float32 (#846) | 9 months ago
Tri Dao | a190df011c | Add window_size option to ParallelMHA | 10 months ago
Tri Dao | ef0ed10622 | Add window_size option to MHA and GPT | 10 months ago
jiaxingli | 386e391117 | Fix: implement deterministic backward in mha (#748) | 11 months ago
Tri Dao | 3f7d5786ba | Pass alibi slopes to flash_attn_with_kvcache during generation | 11 months ago
Tri Dao | c3b2196652 | Add Alibi to MHA, test with Baichuan-13B | 1 year ago
Tri Dao | 0705d2718d | [Llama] Fix some tests, add tests for Llama 2 and CodeLlama | 1 year ago
Tri Dao | e6a8026489 | [Gen] Rename max_sequence_len->max_seqlen, sequence_len_offset->seqlen_offset | 1 year ago
Tri Dao | dfe29f5e2b | [Gen] Don't use ft_attention, use flash_attn_with_kvcache instead | 1 year ago
Tri Dao | 8a733cbd53 | [Gen] Fix calling update_graph_cache in tests | 1 year ago
Tri Dao | a86442f0f3 | [Gen] Use flash_attn_with_kvcache in generation | 1 year ago
Tri Dao | 798858f9f1 | Fix test_baichuan | 1 year ago
Tri Dao | de2949f37d | [Rotary] Pass max_seqlen from mha.py to rotary during inference | 1 year ago
dan_the_3rd | 011ec323d6 | Support MQA + MP for decoding (#490) | 1 year ago
Tri Dao | a2974e850a | Change causal for CrossAttention in mha.py to align to bottom right | 1 year ago
Tri Dao | f1a73d0740 | Run isort and black on python files | 1 year ago
Xuechen Li | bb4cded17b | support when num_heads is not divisible by world_size; resolves #459 (#461) | 1 year ago
Tri Dao | bec5b3d374 | [MHA] Run black on mha.py | 1 year ago
Tri Dao | 425dbcb6c6 | [MHA] Implement MQA/GQA | 1 year ago
Tri Dao | 4f285b3547 | FlashAttention-2 release | 1 year ago
Tri Dao | 62e9814466 | [Rotary] Make sure frequency calculation is in fp32 | 1 year ago
Tri Dao | 48bc6eacd6 | [Gen] Add rotary base as an argument to FT attention kernel | 1 year ago
Tri Dao | ba2fe7f378 | [Gen] Move allocate_inference_cache to within the model | 1 year ago
Tri Dao | 311d6606bf | [Gen] Fix FT kernel smem size, CG when batch size changed | 1 year ago
Tri Dao | ac3b684cdb | Have a separate nn.Dropout module in SelfAttention module | 1 year ago
Tri Dao | 605655bc66 | [Gen] Fix FT kernel when using CG | 1 year ago
Tri Dao | f5d0fbd468 | [FT] Fix FT's single query attention for bf16 hdim128 rotary | 1 year ago
Tri Dao | 4d87e4d875 | Implement GPT-J | 1 year ago
Tri Dao | 780e8eeabb | [ViT] Support timm checkpoint, add tests | 1 year ago