Tri Dao
|
f86e3dd919
[CI] Use MAX_JOBS=1 with nvcc 12.3, don't need OLD_GENERATOR_PATH
|
1 week ago |
Tri Dao
|
9375ac9322
[CI] Don't include <ATen/cuda/CUDAGraphsUtils.cuh>
|
1 week ago |
Tri Dao
|
073afd5931
[CI] Use torch 2.6.0.dev20241001, reduce torch #include
|
1 week ago |
Michael Melesse
|
b518517cb8
[AMD] Triton Backend for ROCm (#1203)
|
1 week ago |
Tri Dao
|
241c682c9f
[CI] Switch back to CUDA 12.4
|
1 month ago |
Tri Dao
|
6ffeb572b1
[CI] Still use CUDA 12.3 but pull the right pytorch version
|
1 month ago |
Ethan Steinberg
|
42f2b8be34
Use CUDA 12.4 in the build system (#1326)
|
1 month ago |
rocking
|
e2182cc21d
Support page kvcache in AMD ROCm (#1198)
|
3 months ago |
juejuezi
|
e371bea04f
feat: change minimal supported CUDA version to 11.7 (#1206)
|
3 months ago |
Tri Dao
|
65f723bb9a
Split bwd into more .cu files to speed up compilation
|
4 months ago |
Tri Dao
|
751c762c9c
Don't specialize for hdim 224 to speed up compilation
|
4 months ago |
rocking
|
d8f104e97a
Support AMD ROCm on FlashAttention 2 (#1010)
|
4 months ago |
Tri Dao
|
844912dca0
[CI] Switch from CUDA 12.2 to 12.3
|
5 months ago |
Tri Dao
|
908511b2b6
Split into more .cu files to speed up compilation
|
5 months ago |
Tri Dao
|
beb2bf2a32
Drop support for pytorch 1.12, 1.13, and python 3.7
|
5 months ago |
Nicolas Patry
|
8f873cc6ac
Implement softcapping. (#1025)
|
5 months ago |
Corey James Levinson
|
beb8b8ba9f
add exception to Timeout Error (#963)
|
6 months ago |
Wei Ji
|
9c0e9ee86d
Move packaging and ninja from install_requires to setup_requires (#937)
|
7 months ago |
Tri Dao
|
2aea958f89
[CI] Compile with torch 2.3.0.dev20240207
|
8 months ago |
Arvind Sundararajan
|
26c9e82743
Support ARM builds (#757)
|
9 months ago |
Chirag Jain
|
50896ec574
Make nvcc threads configurable via environment variable (#885)
|
9 months ago |
Qubitium
|
f45bbb4c94
Optimize compile to 1: avoid oom 2: minimize swap usage 3: avoid threads starvation when letting ninja decide how many workers to spawn or manual MAX_JOBS "guesses". Logic is to take the min value of MAX_JOBS auto-calculated by two metrics: 1: cpu cores 2: free memory. This should allow flash-attn to compile close to the most efficient manner under any consumer/server env. (#832)
|
10 months ago |
Tri Dao
|
d4a7c8ffbb
[CI] Only compile for CUDA 11.8 & 12.2, MAX_JOBS=2,add torch-nightly
|
1 year ago |
Tri Dao
|
5e525a8dc8
[CI] Use official Pytorch 2.1, add CUDA 11.8 for Pytorch 2.1
|
1 year ago |
Tri Dao
|
1879e089c7
Reduce number of templates for headdim > 128
|
1 year ago |
Tri Dao
|
bff3147175
Re-enable compilation for Hopper
|
1 year ago |
Tri Dao
|
dfe29f5e2b
[Gen] Don't use ft_attention, use flash_attn_with_kvcache instead
|
1 year ago |
Federico Berto
|
fa3ddcbaaa
[Minor] add nvcc note on bare_metal_version `RuntimeError` (#552)
|
1 year ago |
Tri Dao
|
799f56fa90
Don't compile for Pytorch 2.1 on CUDA 12.1 due to nvcc segfaults
|
1 year ago |
Tri Dao
|
bb9beb3645
Remove some unused headers
|
1 year ago |