Commit History

Author        SHA         Message                                                    Date
Tri Dao       74af023316  Bump version to 1.0.0                                      1 year ago
Tri Dao       1b18f1b7a1  Support H100                                               1 year ago
Tri Dao       f28d61cb2a  Update README on requirements (nvcc and Pytorch)           1 year ago
Tri Dao       57ee618170  Merge pull request #94 from calebthomas259/main            1 year ago
Tri Dao       2dc2a19589  Update roadmap                                             1 year ago
Caleb Thomas  c9a649805b  Add a simple tutorial to README.md                         1 year ago
Tri Dao       4a6eaa9f27  Update configs, add results                                2 years ago
Tri Dao       45bcf37b97  [Docs] Capitalize the bibtex citation                      2 years ago
Tri Dao       4040256b5e  Update pip install instructions, bump to 0.2               2 years ago
Tri Dao       2e33fc8e36  Add GPT and ViT models                                     2 years ago
Tri Dao       3dda4f76de  Update README                                              2 years ago
Tri Dao       46fd2a20b2  Support all head dims that are multiples of 8, up to 128   2 years ago
Tri Dao       2ed471ecc4  Add tests for numerical error                              2 years ago
Tri Dao       42f54d8840  Edit mention of Triton implementation                      2 years ago
Tri Dao       4577151ff8  Link to Triton implementation                              2 years ago
Tri Dao       d1fc80a3bb  Link to IEEE Spectrum article on MLPerf                    2 years ago
Tri Dao       1bbebccc0a  Edit README to mention bf16 support                        2 years ago
Tri Dao       de19de7ab1  Implement for bf16                                         2 years ago
Tri Dao       6c3a8c65af  Implement cross attention                                  2 years ago
Tri Dao       450b64fe44  Add README section on issues                               2 years ago
Dan Fu        765741c1ee  More explanation                                           2 years ago
Dan Fu        2d5b2483b8  Speedup graph for A100, d128                               2 years ago
Tri Dao       d3e6440958  Implement bwd for head dim 128                             2 years ago
Dan Fu        0a398dfc37  Broken link                                                2 years ago
Dan Fu        bd60750e0b  T4                                                         2 years ago
Tri Dao       f2d8d4104e  Edit README: support Turing (SM75)                         2 years ago
Dan Fu        ad6c694bb3  3090 speedup                                               2 years ago
Tri Dao       5a61cb7729  Rename src -> flash_attn                                   2 years ago
Tri Dao       c41479d66d  Support SM86 GPUs                                          2 years ago
Dan Fu        4b7cfb5f45  Citation                                                   2 years ago