Tri Dao
|
e0fbaa7016
[Gen] Simplify decode_speculative
|
1 year ago |
Tri Dao
|
e6a8026489
[Gen] Rename max_sequence_len->max_seqlen, sequence_len_offset->seqlen_offset
|
1 year ago |
Tri Dao
|
dfe29f5e2b
[Gen] Don't use ft_attention, use flash_attn_with_kvcache instead
|
1 year ago |
Tri Dao
|
ccbb14f38e
Implement rotary embedding in flash_attn_with_kvcache
|
1 year ago |
Tri Dao
|
a86442f0f3
[Gen] Use flash_attn_with_kvcache in generation
|
1 year ago |
Tri Dao
|
fd20f16a4e
Support cache_seqlens being integer
|
1 year ago |
Tri Dao
|
913922cac5
[Gen] Refactor decoding function
|
1 year ago |
dan_the_3rd
|
011ec323d6
Support MQA + MP for decoding (#490)
|
1 year ago |
Tri Dao
|
9f42cb6e7a
[Gen] Clone logits before returning when cg=True
|
1 year ago |
Tri Dao
|
f8aea6ead0
[GPT] Generalize last_token_only arg to num_last_tokens
|
1 year ago |
Tri Dao
|
371e20658c
[GPT] Test generation when passing in multiple tokens
|
1 year ago |
Tri Dao
|
c000c3a2c0
[GPT] Move more tests to test_gpt.py
|
1 year ago |
Tri Dao
|
9b713872ea
[GPT] Move GPT and OPT generation tests to test_{gpt,opt}.py
|
1 year ago |
Tri Dao
|
0e8c46ae08
Run isort and black on test files
|
1 year ago |
Tri Dao
|
4d87e4d875
Implement GPT-J
|
1 year ago |
Tri Dao
|
88173a1aaf
[FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP
|
2 years ago |
Tri Dao
|
ff34123bd4
Reorder LN in Block, support OPT
|
2 years ago |
Tri Dao
|
63670fd84a
Implement generation for GPT
|
2 years ago |
Tri Dao
|
9d797d8848
Support loading GPT2 weights from Huggingface
|
2 years ago |