Commit History

Author     SHA1        Message                                                            Date
sclarkson  1feb711f46  Fix compilation with clang on ARM64 (#1285)                        1 week ago
Tri Dao    27f8f890df  [FusedDense] Allocate lt_workspace on input device                 1 year ago
Tri Dao    88173a1aaf  [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP  1 year ago
Tri Dao    226a1b721d  Implement TensorParallel for FusedDense and FusedDenseGeluDense    2 years ago
Tri Dao    e68ebbe89a  Simplify FusedDense                                                2 years ago
Tri Dao    fa6d1ce44f  Add fused_dense and dropout_add_layernorm CUDA extensions          2 years ago