Commit | Message | Author | Date
---------- | ------- | ------- | -----
3da42d24b1 | [GPT] Add option to only return the logit for the last token | Tri Dao | 1 year ago
96d10f6545 | Implement LLaMa | Tri Dao | 1 year ago
b630aef53f | Implement GatedMlp | Tri Dao | 1 year ago
6f6e9a9aaf | [FusedDense] Enable sqrelu activation in FusedMLP | Tri Dao | 1 year ago
393882bc08 | [LayerNorm] Implement LN with parallel residual, support dim 8k | Tri Dao | 1 year ago
993d12448e | Implement GPT-NeoX | Tri Dao | 1 year ago
4d87e4d875 | Implement GPT-J | Tri Dao | 1 year ago
78b7a1dc18 | [OPT] Load fp16 weights on CPU before moving to GPU | Tri Dao | 1 year ago
eb33e587e9 | [LayerNorm] Rename x1 -> residual | Tri Dao | 1 year ago
88173a1aaf | [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP | Tri Dao | 1 year ago
ff34123bd4 | Reorder LN in Block, support OPT | Tri Dao | 1 year ago
7c2191542a | [Gen] Make generation work with Tensor Parallel | Tri Dao | 1 year ago
11be742aa3 | [Gen] Test generation with rotary embedding | Tri Dao | 1 year ago
93383bd55b | [TP] Implement TensorParallel without sequence parallel | Tri Dao | 1 year ago
714c1b4f0f | [Bert] Fix embedding layer norm before embedding dropout | Tri Dao | 1 year ago
ef1ba918c6 | [GPT] Refactor function to shard state_dict for TensorParallel | Tri Dao | 1 year ago
63670fd84a | Implement generation for GPT | Tri Dao | 1 year ago
9d797d8848 | Support loading GPT2 weights from Huggingface | Tri Dao | 2 years ago
b4018a5028 | Implement Tensor Parallel for GPT model | Tri Dao | 2 years ago
e68ebbe89a | Simplify FusedDense | Tri Dao | 2 years ago
496e4f528c | Implement XPos (Sun et al.) | Tri Dao | 2 years ago
5fb6df0e04 | Implement BERT | Tri Dao | 2 years ago
1feb94265c | [ViT] Use dropout_add_ln for the 1st layer norm | Tri Dao | 2 years ago
2e33fc8e36 | Add GPT and ViT models | Tri Dao | 2 years ago