Commit History

Author        SHA         Message                                                    Date
Tri Dao       74af023316  Bump version to 1.0.0                                      1 year ago
Tri Dao       1b18f1b7a1  Support H100                                               1 year ago
Tri Dao       f28d61cb2a  Update README on requirements (nvcc and Pytorch)           1 year ago
Tri Dao       57ee618170  Merge pull request #94 from calebthomas259/main            1 year ago
Tri Dao       2dc2a19589  Update roadmap                                             1 year ago
Caleb Thomas  c9a649805b  Add a simple tutorial to README.md                         1 year ago
Tri Dao       4a6eaa9f27  Update configs, add results                                2 years ago
Tri Dao       45bcf37b97  [Docs] Capitalize the bibtex citation                      2 years ago
Tri Dao       4040256b5e  Update pip install instructions, bump to 0.2               2 years ago
Tri Dao       2e33fc8e36  Add GPT and ViT models                                     2 years ago
Tri Dao       3dda4f76de  Update README                                              2 years ago
Tri Dao       46fd2a20b2  Support all head dims that are multiples of 8, up to 128   2 years ago
Tri Dao       2ed471ecc4  Add tests for numerical error                              2 years ago
Tri Dao       42f54d8840  Edit mention of Triton implementation                      2 years ago
Tri Dao       4577151ff8  Link to Triton implementation                              2 years ago
Tri Dao       d1fc80a3bb  Link to IEEE Spectrum article on MLPerf                    2 years ago
Tri Dao       1bbebccc0a  Edit README to mention bf16 support                        2 years ago
Tri Dao       de19de7ab1  Implement for bf16                                         2 years ago
Tri Dao       6c3a8c65af  Implement cross attention                                  2 years ago
Tri Dao       450b64fe44  Add README section on issues                               2 years ago
Dan Fu        765741c1ee  More explanation                                           2 years ago
Dan Fu        2d5b2483b8  Speedup graph for A100, d128                               2 years ago
Tri Dao       d3e6440958  Implement bwd for head dim 128                             2 years ago
Dan Fu        0a398dfc37  Broken link                                                2 years ago
Dan Fu        bd60750e0b  T4                                                         2 years ago
Tri Dao       f2d8d4104e  Edit README: support Turing (SM75)                         2 years ago
Dan Fu        ad6c694bb3  3090 speedup                                               2 years ago
Tri Dao       5a61cb7729  Rename src -> flash_attn                                   2 years ago
Tri Dao       c41479d66d  Support SM86 GPUs                                          2 years ago
Dan Fu        4b7cfb5f45  Citation                                                   2 years ago