david/flash-attention

Author	SHA1 Message	Date
Tri Dao	abbc131173 [LayerNorm] Switch from CUDA to Triton implementation	1 year ago
Tri Dao	f1a73d0740 Run isort and black on python files	1 year ago
Tri Dao	75e334d407 [MLP] Add ParallelMLP	1 year ago
Tri Dao	b3177dfaf6 [GPT] Enable FlashAttention for GPT-J	1 year ago
Tri Dao	6fc1e07da2 [Block] Re-enable DropPath	1 year ago
Tri Dao	4f285b3547 FlashAttention-2 release	1 year ago
ljss	8e44c0eefb Fix a bug	1 year ago
Federico Berto	3889ba168b [BugFix] cannot unpack non-iterable NoneType object	1 year ago
Tri Dao	ba2fe7f378 [Gen] Move allocate_inference_cache to within the model	1 year ago
Tri Dao	96d10f6545 Implement LLaMa	1 year ago
Tri Dao	393882bc08 [LayerNorm] Implement LN with parallel residual, support dim 8k	1 year ago
Tri Dao	4d87e4d875 Implement GPT-J	1 year ago
Tri Dao	88173a1aaf [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP	2 years ago
Tri Dao	780e8eeabb [ViT] Support timm checkpoint, add tests	2 years ago
Tri Dao	ef085cfcda [ViT] Fix extra norm_0, use new LN order in Block	2 years ago
Tri Dao	ff34123bd4 Reorder LN in Block, support OPT	2 years ago
Tri Dao	93383bd55b [TP] Implement TensorParallel without sequence parallel	2 years ago
Tri Dao	a8cfe51551 Implement Tensor Parallel for transformer Block	2 years ago
Tri Dao	5fb6df0e04 Implement BERT	2 years ago
Tri Dao	d4b320b31f Add MLP, MHA, Block, Embedding modules	2 years ago

Commit History