Revision History

Author SHA1 Message Date
Michał Górny 6b1d059eda Support ROCM builds from source distribution, and improve error handling (#1446) 18 hours ago
Kirthi Shankar Sivamani 07bddf918a Blackwell support (#1436) 1 week ago
Tri Dao f86e3dd919 [CI] Use MAX_JOBS=1 with nvcc 12.3, don't need OLD_GENERATOR_PATH 1 month ago
Tri Dao 9375ac9322 [CI] Don't include <ATen/cuda/CUDAGraphsUtils.cuh> 1 month ago
Tri Dao 073afd5931 [CI] Use torch 2.6.0.dev20241001, reduce torch #include 1 month ago
Michael Melesse b518517cb8 [AMD] Triton Backend for ROCm (#1203) 1 month ago
Tri Dao 241c682c9f [CI] Switch back to CUDA 12.4 2 months ago
Tri Dao 6ffeb572b1 [CI] Still use CUDA 12.3 but pull the right pytorch version 2 months ago
Ethan Steinberg 42f2b8be34 Use CUDA 12.4 in the build system (#1326) 2 months ago
rocking e2182cc21d Support page kvcache in AMD ROCm (#1198) 4 months ago
juejuezi e371bea04f feat: change minimal supported CUDA version to 11.7 (#1206) 4 months ago
Tri Dao 65f723bb9a Split bwd into more .cu files to speed up compilation 5 months ago
Tri Dao 751c762c9c Don't specialize for hdim 224 to speed up compilation 5 months ago
rocking d8f104e97a Support AMD ROCm on FlashAttention 2 (#1010) 5 months ago
Tri Dao 844912dca0 [CI] Switch from CUDA 12.2 to 12.3 6 months ago
Tri Dao 908511b2b6 Split into more .cu files to speed up compilation 6 months ago
Tri Dao beb2bf2a32 Drop support for pytorch 1.12, 1.13, and python 3.7 6 months ago
Nicolas Patry 8f873cc6ac Implement softcapping. (#1025) 6 months ago
Corey James Levinson beb8b8ba9f add exception to Timeout Error (#963) 7 months ago
Wei Ji 9c0e9ee86d Move packaging and ninja from install_requires to setup_requires (#937) 8 months ago
Tri Dao 2aea958f89 [CI] Compile with torch 2.3.0.dev20240207 9 months ago
Arvind Sundararajan 26c9e82743 Support ARM builds (#757) 10 months ago
Chirag Jain 50896ec574 Make nvcc threads configurable via environment variable (#885) 10 months ago
Qubitium f45bbb4c94 Optimize compilation to (1) avoid OOM, (2) minimize swap usage, and (3) avoid thread starvation, whether ninja decides how many workers to spawn or MAX_JOBS is "guessed" manually. The logic takes the minimum of two auto-calculated MAX_JOBS values, one derived from CPU cores and one from free memory. This should let flash-attn compile close to optimally in any consumer or server environment. (#832) 11 months ago
Tri Dao d4a7c8ffbb [CI] Only compile for CUDA 11.8 & 12.2, MAX_JOBS=2, add torch-nightly 1 year ago
Tri Dao 5e525a8dc8 [CI] Use official Pytorch 2.1, add CUDA 11.8 for Pytorch 2.1 1 year ago
Tri Dao 1879e089c7 Reduce number of templates for headdim > 128 1 year ago
Tri Dao bff3147175 Re-enable compilation for Hopper 1 year ago
Tri Dao dfe29f5e2b [Gen] Don't use ft_attention, use flash_attn_with_kvcache instead 1 year ago
Federico Berto fa3ddcbaaa [Minor] add nvcc note on bare_metal_version `RuntimeError` (#552) 1 year ago
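
Commit f45bbb4c94 above describes choosing MAX_JOBS as the minimum of a CPU-based cap and a free-memory-based cap. The following is a minimal illustrative sketch of that heuristic, not the code in flash-attn's setup.py; it assumes psutil for free-memory detection, and the per-job memory budget MEM_PER_JOB_GB is a hypothetical constant chosen here for illustration:

    # Sketch of the MAX_JOBS heuristic from commit f45bbb4c94:
    # take the minimum of a CPU-core cap and a free-memory cap.
    import os
    import psutil

    # Assumed memory footprint of one nvcc compile job, in GB.
    # Illustrative value only, not taken from the flash-attn build scripts.
    MEM_PER_JOB_GB = 9

    def estimate_max_jobs() -> int:
        # Cap 1: number of available CPU cores.
        cpu_cap = os.cpu_count() or 1
        # Cap 2: how many jobs fit in currently free memory.
        free_gb = psutil.virtual_memory().available / (1024 ** 3)
        mem_cap = max(1, int(free_gb / MEM_PER_JOB_GB))
        return min(cpu_cap, mem_cap)

    if __name__ == "__main__":
        # An explicit MAX_JOBS in the environment would normally take precedence.
        print(f"MAX_JOBS={os.environ.get('MAX_JOBS', estimate_max_jobs())}")

Taking the minimum of the two caps means a machine with many cores but little free RAM is throttled by memory, while a high-memory machine with few cores is throttled by CPU, avoiding both OOM and oversubscription.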