Tri Dao | 7c2191542a | [Gen] Make generation work with Tensor Parallel | 2 years ago
Tri Dao | 0938298e4c | [Gen] Adjust shape of kv_cache when using FT | 2 years ago
Tri Dao | 11be742aa3 | [Gen] Test generation with rotary embedding | 2 years ago
Tri Dao | 8d9674ed08 | Merge pull request #102 from Lamikins/main | 2 years ago
Tri Dao | 93383bd55b | [TP] Implement TensorParallel without sequence parallel | 2 years ago
Darius Lam | aec35fd67c | fixed cross attention typeerror | 2 years ago
Tri Dao | a668890fcd | [Gen] Add option to run generation with FT attention kernel | 2 years ago
Tri Dao | 65b4064b2a | [FusedDense] Kick off input all_gather before weight dtype conversion | 2 years ago
Tri Dao | a6ec1782dc | Bump to v0.2.6 | 2 years ago
Tri Dao | 63670fd84a | Implement generation for GPT | 2 years ago
Tri Dao | 1e712ea8b0 | Implement TensorParallel for MHA | 2 years ago
Tri Dao | e68ebbe89a | Simplify FusedDense | 2 years ago
Tri Dao | 496e4f528c | Implement XPos (Sun et al.) | 2 years ago
Tri Dao | 13cdceb377 | Implement last_layer_subset optimization for BERT | 2 years ago
Tri Dao | 5fb6df0e04 | Implement BERT | 2 years ago
Tri Dao | d4b320b31f | Add MLP, MHA, Block, Embedding modules | 2 years ago