SueJane | 3f1b4d38e7 | Fix: check the type of max_seqlen_k instead of checking max_seqlen twice (#1127) | 4 months ago
Markus Krimmel | 6bbc532388 | fix: cast the alibi slopes to torch.float32 (#846) | 9 months ago
Tri Dao | a190df011c | Add window_size option to ParallelMHA | 10 months ago
Tri Dao | ef0ed10622 | Add window_size option to MHA and GPT | 10 months ago
jiaxingli | 386e391117 | Fix: implement deterministic backward in mha (#748) | 11 months ago
Tri Dao | 3f7d5786ba | Pass alibi slopes to flash_attn_with_kvcache during generation | 11 months ago
Tri Dao | c3b2196652 | Add Alibi to MHA, test with Baichuan-13B | 1 year ago
Tri Dao | 0705d2718d | [Llama] Fix some tests, add tests for Llama 2 and CodeLlama | 1 year ago
Tri Dao | e6a8026489 | [Gen] Rename max_sequence_len->max_seqlen, sequence_len_offset->seqlen_offset | 1 year ago
Tri Dao | dfe29f5e2b | [Gen] Don't use ft_attention, use flash_attn_with_kvcache instead | 1 year ago
Tri Dao | 8a733cbd53 | [Gen] Fix calling update_graph_cache in tests | 1 year ago
Tri Dao | a86442f0f3 | [Gen] Use flash_attn_with_kvcache in generation | 1 year ago
Tri Dao | 798858f9f1 | Fix test_baichuan | 1 year ago
Tri Dao | de2949f37d | [Rotary] Pass max_seqlen from mha.py to rotary during inference | 1 year ago
dan_the_3rd | 011ec323d6 | Support MQA + MP for decoding (#490) | 1 year ago
Tri Dao | a2974e850a | Change causal for CrossAttention in mha.py to align to bottom right | 1 year ago
Tri Dao | f1a73d0740 | Run isort and black on python files | 1 year ago
Xuechen Li | bb4cded17b | support when num_heads is not divisible by world_size; resolves #459 (#461) | 1 year ago
Tri Dao | bec5b3d374 | [MHA] Run black on mha.py | 1 year ago
Tri Dao | 425dbcb6c6 | [MHA] Implement MQA/GQA | 1 year ago
Tri Dao | 4f285b3547 | FlashAttention-2 release | 1 year ago
Tri Dao | 62e9814466 | [Rotary] Make sure frequency calculation is in fp32 | 1 year ago
Tri Dao | 48bc6eacd6 | [Gen] Add rotary base as an argument to FT attention kernel | 1 year ago
Tri Dao | ba2fe7f378 | [Gen] Move allocate_inference_cache to within the model | 1 year ago
Tri Dao | 311d6606bf | [Gen] Fix FT kernel smem size, CG when batch size changed | 1 year ago
Tri Dao | ac3b684cdb | Have a separate nn.Dropout module in SelfAttention module | 1 year ago
Tri Dao | 605655bc66 | [Gen] Fix FT kernel when using CG | 1 year ago
Tri Dao | f5d0fbd468 | [FT] Fix FT's single query attention for bf16 hdim128 rotary | 1 year ago
Tri Dao | 4d87e4d875 | Implement GPT-J | 1 year ago
Tri Dao | 780e8eeabb | [ViT] Support timm checkpoint, add tests | 1 year ago