david/flash-attention

Author	SHA1	Message	Date
Tri Dao	c3b2196652	Add Alibi to MHA, test with Baichuan-13B	1 year ago
Tri Dao	3557e0bb8f	[MLP] Implement SwiGLU with torch jiterator	1 year ago
Tri Dao	f1a73d0740	Run isort and black on python files	1 year ago
Tri Dao	364a5b4a71	[MLP] Change the check for out_features being None	1 year ago
Tri Dao	4c98d0b41f	[MLP] Edit ParallelGatedMlp	1 year ago
Haodong Lyu	8ee62efca3	Implement ParallelGatedMlp (#251)	1 year ago
Tri Dao	75e334d407	[MLP] Add ParallelMLP	1 year ago
Tri Dao	96d10f6545	Implement LLaMa	1 year ago
Tri Dao	b630aef53f	Implement GatedMlp	1 year ago
Zhiyuan Chen	8c42415664	make mlp hidden_features defaults to 4*in_features	1 year ago
Tri Dao	88173a1aaf	[FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP	1 year ago
Tri Dao	226a1b721d	Implement TensorParallel for FusedDense and FusedDenseGeluDense	2 years ago
Tri Dao	e68ebbe89a	Simplify FusedDense	2 years ago
Tri Dao	13cdceb377	Implement last_layer_subset optimization for BERT	2 years ago
Tri Dao	1feb94265c	[ViT] Use dropout_add_ln for the 1st layer norm	2 years ago
Tri Dao	d4b320b31f	Add MLP, MHA, Block, Embedding modules	2 years ago