sclarkson | 1feb711f46 | Fix compilation with clang on ARM64 (#1285) | 1 week ago
Tri Dao | 27f8f890df | [FusedDense] Allocate lt_workspace on input device | 1 year ago
Tri Dao | 88173a1aaf | [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP | 1 year ago
Tri Dao | 226a1b721d | Implement TensorParallel for FusedDense and FusedDenseGeluDense | 2 years ago
Tri Dao | e68ebbe89a | Simplify FusedDense | 2 years ago
Tri Dao | fa6d1ce44f | Add fused_dense and dropout_add_layernorm CUDA extensions | 2 years ago