Commit History

Author   SHA1        Date         Message
Tri Dao  3da42d24b1  1 year ago   [GPT] Add option to only return the logit for the last token
Tri Dao  96d10f6545  1 year ago   Implement LLaMa
Tri Dao  b630aef53f  1 year ago   Implement GatedMlp
Tri Dao  6f6e9a9aaf  1 year ago   [FusedDense] Enable sqrelu activation in FusedMLP
Tri Dao  393882bc08  1 year ago   [LayerNorm] Implement LN with parallel residual, support dim 8k
Tri Dao  993d12448e  1 year ago   Implement GPT-NeoX
Tri Dao  4d87e4d875  1 year ago   Implement GPT-J
Tri Dao  78b7a1dc18  1 year ago   [OPT] Load fp16 weights on CPU before moving to GPU
Tri Dao  eb33e587e9  1 year ago   [LayerNorm] Rename x1 -> residual
Tri Dao  88173a1aaf  1 year ago   [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP
Tri Dao  ff34123bd4  1 year ago   Reorder LN in Block, support OPT
Tri Dao  7c2191542a  1 year ago   [Gen] Make generation work with Tensor Parallel
Tri Dao  11be742aa3  1 year ago   [Gen] Test generation with rotary embedding
Tri Dao  93383bd55b  1 year ago   [TP] Implement TensorParallel without sequence parallel
Tri Dao  714c1b4f0f  1 year ago   [Bert] Fix embedding layer norm before embedding dropout
Tri Dao  ef1ba918c6  1 year ago   [GPT] Refactor function to shard state_dict for TensorParallel
Tri Dao  63670fd84a  1 year ago   Implement generation for GPT
Tri Dao  9d797d8848  2 years ago  Support loading GPT2 weights from Huggingface
Tri Dao  b4018a5028  2 years ago  Implement Tensor Parallel for GPT model
Tri Dao  e68ebbe89a  2 years ago  Simplify FusedDense
Tri Dao  496e4f528c  2 years ago  Implement XPos (Sun et al.)
Tri Dao  5fb6df0e04  2 years ago  Implement BERT
Tri Dao  1feb94265c  2 years ago  [ViT] Use dropout_add_ln for the 1st layer norm
Tri Dao  2e33fc8e36  2 years ago  Add GPT and ViT models