.. |
instantiations
|
e94f7e89dc
Always enable PackGQA is Split to reduce compilation and binary size
|
3 weeks ago |
__init__.py
|
7f67966cc7
FA3 initial code release
|
6 months ago |
benchmark_attn.py
|
5f525322ec
Only pass sm_90a compile flag to Sm90 kernels, same w Sm89 kernels
|
3 weeks ago |
benchmark_flash_attention_fp8.py
|
efbf19cd15
Fix incorrect torch dtype (#1399)
|
3 weeks ago |
benchmark_split_kv.py
|
a5a75274bc
FA3 kvcache + split kv + gqa parallelization (#1236)
|
3 months ago |
combine.h
|
478ee666cc
Make namespace comment consistent (#1305)
|
3 months ago |
copy_sm90_bulk_reduce.hpp
|
7a802796e1
Big refactor and update
|
3 weeks ago |
epilogue_bwd.hpp
|
7bc3f031a4
Compile for both Sm80 and Sm90
|
3 weeks ago |
epilogue_fwd.hpp
|
7bc3f031a4
Compile for both Sm80 and Sm90
|
3 weeks ago |
flash.h
|
a84a237d2a
Split bwd softcap compilation units for Sm80
|
3 weeks ago |
flash_api.cpp
|
74aed78373
Replace c10::optional with std::optional in flash_attn
|
2 weeks ago |
flash_attn_interface.py
|
7a802796e1
Big refactor and update
|
3 weeks ago |
flash_bwd_kernel_sm80.h
|
7a802796e1
Big refactor and update
|
3 weeks ago |
flash_bwd_kernel_sm90.h
|
7a802796e1
Big refactor and update
|
3 weeks ago |
flash_bwd_launch_template.h
|
a84a237d2a
Split bwd softcap compilation units for Sm80
|
3 weeks ago |
flash_bwd_postprocess_kernel.h
|
7a802796e1
Big refactor and update
|
3 weeks ago |
flash_bwd_preprocess_kernel.h
|
7a802796e1
Big refactor and update
|
3 weeks ago |
flash_fwd_combine.cu
|
5f525322ec
Only pass sm_90a compile flag to Sm90 kernels, same w Sm89 kernels
|
3 weeks ago |
flash_fwd_combine_kernel.h
|
7a802796e1
Big refactor and update
|
3 weeks ago |
flash_fwd_combine_launch_template.h
|
7a802796e1
Big refactor and update
|
3 weeks ago |
flash_fwd_kernel_sm80.h
|
7a802796e1
Big refactor and update
|
3 weeks ago |
flash_fwd_kernel_sm90.h
|
a93359a2bf
If PackGQA, use producer threads instead of Mma threads to load Q
|
3 weeks ago |
flash_fwd_launch_template.h
|
1e3208566a
Tune tile sizes for compilation
|
3 weeks ago |
generate_kernels.py
|
e94f7e89dc
Always enable PackGQA is Split to reduce compilation and binary size
|
3 weeks ago |
heuristics.h
|
7a802796e1
Big refactor and update
|
3 weeks ago |
mainloop_bwd_sm80.hpp
|
7a802796e1
Big refactor and update
|
3 weeks ago |
mainloop_bwd_sm90_tma_gmma_ws.hpp
|
7bc3f031a4
Compile for both Sm80 and Sm90
|
3 weeks ago |
mainloop_fwd_sm80.hpp
|
84f1287e42
Rename bool_constant<true> to true_type, same w bool_constant<false>
|
3 weeks ago |
mainloop_fwd_sm90_tma_gmma_ws.hpp
|
a93359a2bf
If PackGQA, use producer threads instead of Mma threads to load Q
|
3 weeks ago |
mask.h
|
7a802796e1
Big refactor and update
|
3 weeks ago |
named_barrier.hpp
|
7bc3f031a4
Compile for both Sm80 and Sm90
|
3 weeks ago |
pack_gqa.h
|
7a802796e1
Big refactor and update
|
3 weeks ago |
padding.py
|
7a802796e1
Big refactor and update
|
3 weeks ago |
paged_kv.h
|
7a802796e1
Big refactor and update
|
3 weeks ago |
rotary.h
|
7a802796e1
Big refactor and update
|
3 weeks ago |
seqlen.h
|
7a802796e1
Big refactor and update
|
3 weeks ago |
setup.py
|
22c0358f4b
Fix nvcc_from_env not found
|
2 weeks ago |
sm90_pipeline_no_cluster.hpp
|
68bf390920
Update Cutlass to fix mem fence
|
3 weeks ago |
softmax.h
|
7a802796e1
Big refactor and update
|
3 weeks ago |
static_switch.h
|
180ff782dd
Template for Sm86
|
3 weeks ago |
test_attn_kvcache.py
|
a5a75274bc
FA3 kvcache + split kv + gqa parallelization (#1236)
|
3 months ago |
test_flash_attn.py
|
2ac6c986be
Fix Sm80 tile_count_semaphore, adjust test tolerance
|
2 weeks ago |
test_kvcache.py
|
a5a75274bc
FA3 kvcache + split kv + gqa parallelization (#1236)
|
3 months ago |
test_util.py
|
7a802796e1
Big refactor and update
|
3 weeks ago |
tile_scheduler.hpp
|
7bc3f031a4
Compile for both Sm80 and Sm90
|
3 weeks ago |
tile_size.h
|
1e3208566a
Tune tile sizes for compilation
|
3 weeks ago |
utils.h
|
7bc3f031a4
Compile for both Sm80 and Sm90
|
3 weeks ago |