Zhihao Shen
|
30e1ef0f79
minify torch.torch.int32 to torch.int32 (#1237)
|
2 months ago |
Ying Zhang
|
cdbbe844b1
minor changes to unpad_input test util func
|
3 months ago |
Tri Dao
|
abbc131173
[LayerNorm] Switch from CUDA to Triton implementation
|
11 months ago |
Kevin Hu
|
07005806ff
Add BigCode converters (#532)
|
1 year ago |
Kevin Hu
|
4c91621a5e
Inverse state dict for BERT (#527)
|
1 year ago |
Tri Dao
|
f1a73d0740
Run isort and black on python files
|
1 year ago |
Kiarash Jamali
|
684196b8c5
Allow rotary embeddings for Bert (#363)
|
1 year ago |
Tri Dao
|
96d10f6545
Implement LLaMa
|
1 year ago |
Tri Dao
|
88173a1aaf
[FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP
|
1 year ago |
Tri Dao
|
ff34123bd4
Reorder LN in Block, support OPT
|
1 year ago |
Tri Dao
|
714c1b4f0f
[Bert] Fix embedding layer norm before embedding dropout
|
1 year ago |
Tri Dao
|
c6ecd40a59
Tweak CrossEntropyLoss to take process_group in init
|
2 years ago |
Tri Dao
|
dff68c2b22
Add smoothing for CrossEntropyParallel, rename to CrossEntropyLoss
|
2 years ago |
Tri Dao
|
e68ebbe89a
Simplify FusedDense
|
2 years ago |
Tri Dao
|
13cdceb377
Implement last_layer_subset optimization for BERT
|
2 years ago |
Tri Dao
|
5fb6df0e04
Implement BERT
|
2 years ago |