Tri Dao | 0705d2718d | [Llama] Fix some tests, add tests for Llama 2 and CodeLlama | 1 year ago
Tri Dao | dfe29f5e2b | [Gen] Don't use ft_attention, use flash_attn_with_kvcache instead | 1 year ago
Tri Dao | 913922cac5 | [Gen] Refactor decoding function | 1 year ago
Tri Dao | 942fcbf046 | [Rotary] Implement rotary in Triton | 1 year ago
Tri Dao | 0e8c46ae08 | Run isort and black on test files | 1 year ago
Tri Dao | 4d87e4d875 | Implement GPT-J | 1 year ago
Tri Dao | 78b7a1dc18 | [OPT] Load fp16 weights on CPU before moving to GPU | 1 year ago
Tri Dao | 88173a1aaf | [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP | 1 year ago