Tri Dao | abbc131173 | [LayerNorm] Switch from CUDA to Triton implementation | 1 year ago
Tri Dao | f1a73d0740 | Run isort and black on python files | 1 year ago
Tri Dao | 75e334d407 | [MLP] Add ParallelMLP | 1 year ago
Tri Dao | b3177dfaf6 | [GPT] Enable FlashAttention for GPT-J | 1 year ago
Tri Dao | 6fc1e07da2 | [Block] Re-enable DropPath | 1 year ago
Tri Dao | 4f285b3547 | FlashAttention-2 release | 1 year ago
ljss | 8e44c0eefb | Fix a bug | 1 year ago
Federico Berto | 3889ba168b | [BugFix] cannot unpack non-iterable NoneType object | 1 year ago
Tri Dao | ba2fe7f378 | [Gen] Move allocate_inference_cache to within the model | 1 year ago
Tri Dao | 96d10f6545 | Implement LLaMa | 1 year ago
Tri Dao | 393882bc08 | [LayerNorm] Implement LN with parallel residual, support dim 8k | 1 year ago
Tri Dao | 4d87e4d875 | Implement GPT-J | 1 year ago
Tri Dao | 88173a1aaf | [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP | 1 year ago
Tri Dao | 780e8eeabb | [ViT] Support timm checkpoint, add tests | 2 years ago
Tri Dao | ef085cfcda | [ViT] Fix extra norm_0, use new LN order in Block | 2 years ago
Tri Dao | ff34123bd4 | Reorder LN in Block, support OPT | 2 years ago
Tri Dao | 93383bd55b | [TP] Implement TensorParallel without sequence parallel | 2 years ago
Tri Dao | a8cfe51551 | Implement Tensor Parallel for transformer Block | 2 years ago
Tri Dao | 5fb6df0e04 | Implement BERT | 2 years ago
Tri Dao | d4b320b31f | Add MLP, MHA, Block, Embedding modules | 2 years ago