Commit History

Author      SHA1        Message  Date
Tri Dao     7c2191542a  [Gen] Make generation work with Tensor Parallel  2 years ago
Tri Dao     0938298e4c  [Gen] Adjust shape of kv_cache when using FT  2 years ago
Tri Dao     11be742aa3  [Gen] Test generation with rotary embedding  2 years ago
Tri Dao     8d9674ed08  Merge pull request #102 from Lamikins/main  2 years ago
Tri Dao     93383bd55b  [TP] Implement TensorParallel without sequence parallel  2 years ago
Darius Lam  aec35fd67c  fixed cross attention typeerror  2 years ago
Tri Dao     a668890fcd  [Gen] Add option to run generation with FT attention kernel  2 years ago
Tri Dao     65b4064b2a  [FusedDense] Kick off input all_gather before weight dtype conversion  2 years ago
Tri Dao     a6ec1782dc  Bump to v0.2.6  2 years ago
Tri Dao     63670fd84a  Implement generation for GPT  2 years ago
Tri Dao     1e712ea8b0  Implement TensorParallel for MHA  2 years ago
Tri Dao     e68ebbe89a  Simplify FusedDense  2 years ago
Tri Dao     496e4f528c  Implement XPos (Sun et al.)  2 years ago
Tri Dao     13cdceb377  Implement last_layer_subset optimization for BERT  2 years ago
Tri Dao     5fb6df0e04  Implement BERT  2 years ago
Tri Dao     d4b320b31f  Add MLP, MHA, Block, Embedding modules  2 years ago