Author | Commit | Message | Date
Tri Dao | 74af023316 | Bump version to 1.0.0 | 1 year ago
Tri Dao | 1b18f1b7a1 | Support H100 | 1 year ago
Tri Dao | f28d61cb2a | Update README on requirements (nvcc and Pytorch) | 1 year ago
Tri Dao | 57ee618170 | Merge pull request #94 from calebthomas259/main | 1 year ago
Tri Dao | 2dc2a19589 | Update roadmap | 1 year ago
Caleb Thomas | c9a649805b | Add a simple tutorial to README.md | 1 year ago
Tri Dao | 4a6eaa9f27 | Update configs, add results | 2 years ago
Tri Dao | 45bcf37b97 | [Docs] Capitalize the bibtex citation | 2 years ago
Tri Dao | 4040256b5e | Update pip install instructions, bump to 0.2 | 2 years ago
Tri Dao | 2e33fc8e36 | Add GPT and ViT models | 2 years ago
Tri Dao | 3dda4f76de | Update README | 2 years ago
Tri Dao | 46fd2a20b2 | Support all head dims that are multiples of 8, up to 128 | 2 years ago
Tri Dao | 2ed471ecc4 | Add tests for numerical error | 2 years ago
Tri Dao | 42f54d8840 | Edit mention of Triton implementation | 2 years ago
Tri Dao | 4577151ff8 | Link to Triton implementation | 2 years ago
Tri Dao | d1fc80a3bb | Link to IEEE Spectrum article on MLPerf | 2 years ago
Tri Dao | 1bbebccc0a | Edit README to mention bf16 support | 2 years ago
Tri Dao | de19de7ab1 | Implement for bf16 | 2 years ago
Tri Dao | 6c3a8c65af | Implement cross attention | 2 years ago
Tri Dao | 450b64fe44 | Add README section on issues | 2 years ago
Dan Fu | 765741c1ee | More explanation | 2 years ago
Dan Fu | 2d5b2483b8 | Speedup graph for A100, d128 | 2 years ago
Tri Dao | d3e6440958 | Implement bwd for head dim 128 | 2 years ago
Dan Fu | 0a398dfc37 | Broken link | 2 years ago
Dan Fu | bd60750e0b | T4 | 2 years ago
Tri Dao | f2d8d4104e | Edit README: support Turing (SM75) | 2 years ago
Dan Fu | ad6c694bb3 | 3090 speedup | 2 years ago
Tri Dao | 5a61cb7729 | Rename src -> flash_attn | 2 years ago
Tri Dao | c41479d66d | Support SM86 GPUs | 2 years ago
Dan Fu | 4b7cfb5f45 | Citation | 2 years ago
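Several commits above spell out the constraints a caller of this library sees in practice: inputs must be fp16 or bf16 (de19de7ab1), and head dimensions must be multiples of 8, up to 128 (46fd2a20b2). A minimal sketch of a call is below, assuming the `flash_attn_func` interface that the `flash_attn` package exposes; import paths and signatures have shifted across the releases logged here, so treat this as illustrative rather than version-exact.

```python
import torch
from flash_attn import flash_attn_func  # assumes a recent flash-attn release

# Shapes follow the (batch, seqlen, nheads, headdim) convention of flash_attn_func.
# headdim must be a multiple of 8 and at most 128, per commit 46fd2a20b2.
batch, seqlen, nheads, headdim = 2, 1024, 8, 64

# Inputs must be fp16 or bf16 on a CUDA device (bf16 support landed in de19de7ab1).
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.bfloat16)
k = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.bfloat16)
v = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.bfloat16)

# Fused attention; output has the same (batch, seqlen, nheads, headdim) shape.
out = flash_attn_func(q, k, v, causal=True)
```

Cross attention (6c3a8c65af) works the same way: pass k and v with a sequence length different from q's.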