Commit History

Author SHA1 Message Date
  rocking e2182cc21d Support page kvcache in AMD ROCm (#1198) 3 months ago
  juejuezi e371bea04f feat: change minimal supported CUDA version to 11.7 (#1206) 3 months ago
  Tri Dao 65f723bb9a Split bwd into more .cu files to speed up compilation 4 months ago
  Tri Dao 751c762c9c Don't specialize for hdim 224 to speed up compilation 4 months ago
  rocking d8f104e97a Support AMD ROCm on FlashAttention 2 (#1010) 4 months ago
  Tri Dao 844912dca0 [CI] Switch from CUDA 12.2 to 12.3 5 months ago
  Tri Dao 908511b2b6 Split into more .cu files to speed up compilation 5 months ago
  Tri Dao beb2bf2a32 Drop support for pytorch 1.12, 1.13, and python 3.7 5 months ago
  Nicolas Patry 8f873cc6ac Implement softcapping. (#1025) 5 months ago
  Corey James Levinson beb8b8ba9f add exception to Timeout Error (#963) 6 months ago
  Wei Ji 9c0e9ee86d Move packaging and ninja from install_requires to setup_requires (#937) 7 months ago
  Tri Dao 2aea958f89 [CI] Compile with torch 2.3.0.dev20240207 8 months ago
  Arvind Sundararajan 26c9e82743 Support ARM builds (#757) 9 months ago
  Chirag Jain 50896ec574 Make nvcc threads configurable via environment variable (#885) 9 months ago
  Qubitium f45bbb4c94 Optimize compile to 1: avoid oom 2: minimize swap usage 3: avoid threads starvation when letting ninja decide how many workers to spawn or manual MAX_JOBS "guesses". Logic is to take the min value of MAX_JOBS auto-calculated by two metrics: 1: cpu cores 2: free memory. This should allow flash-attn to compile close to the most efficient manner under any consumer/server env. (#832) 10 months ago
  Tri Dao d4a7c8ffbb [CI] Only compile for CUDA 11.8 & 12.2, MAX_JOBS=2,add torch-nightly 1 year ago
  Tri Dao 5e525a8dc8 [CI] Use official Pytorch 2.1, add CUDA 11.8 for Pytorch 2.1 1 year ago
  Tri Dao 1879e089c7 Reduce number of templates for headdim > 128 1 year ago
  Tri Dao bff3147175 Re-enable compilation for Hopper 1 year ago
  Tri Dao dfe29f5e2b [Gen] Don't use ft_attention, use flash_attn_with_kvcache instead 1 year ago
  Federico Berto fa3ddcbaaa [Minor] add nvcc note on bare_metal_version `RuntimeError` (#552) 1 year ago
  Tri Dao 799f56fa90 Don't compile for Pytorch 2.1 on CUDA 12.1 due to nvcc segfaults 1 year ago
  Tri Dao bb9beb3645 Remove some unused headers 1 year ago
  Tri Dao 0c04943fa2 Require CUDA 11.6+, clean up setup.py 1 year ago
  Tri Dao b1fbbd8337 Implement splitKV attention 1 year ago
  Tri Dao cbb4cf5f46 Don't need to set TORCH_CUDA_ARCH_LIST in setup.py 1 year ago
  Aman Gupta Karmani aab603af4f fix binary wheel installation when nvcc is not available (#448) 1 year ago
  Tri Dao 9c531bdc0a Use single thread compilation for cuda12.1, torch2.1 to avoid OOM CI 1 year ago
  Tri Dao 2ddeaa406c Fix wheel building 1 year ago
  Tri Dao 3c458cff77 Merge branch 'feature/demo-wheels' of https://github.com/piercefreeman/flash-attention into piercefreeman-feature/demo-wheels 1 year ago