instantiations
|
7f5d73a162
Add env var to disable specific hdim
|
1 month ago |
__init__.py
|
7f67966cc7
FA3 initial code release
|
6 months ago |
benchmark_attn.py
|
82c1aa3514
Move PackGQA epilogue code to pack_gqa.h
|
2 months ago |
benchmark_flash_attention_fp8.py
|
c92ca63268
FA3 FP8 qkv descales + restore max offset for h128 causal + added sync for producer WG (#1173)
|
5 months ago |
copy_sm90_bulk_reduce.hpp
|
29cdfedd80
Use Bulk reduce instead of TMA for dQaccum, split across WGs
|
1 month ago |
epilogue_bwd.hpp
|
0890032358
Implement backward pass for Sm80
|
2 weeks ago |
epilogue_fwd.hpp
|
da264e5742
Change file names and class names to include sm90 suffix
|
1 month ago |
flash.h
|
76f14c61c9
Tune fwd tile sizes for Sm86 and Sm89
|
2 weeks ago |
flash_api.cpp
|
0519920e23
Deal with the case where q or k/v have length 0
|
2 weeks ago |
flash_attn_interface.py
|
a609d82315
Change extension name to flash_attn_3_cuda
|
2 weeks ago |
flash_bwd_kernel_sm80.h
|
0890032358
Implement backward pass for Sm80
|
2 weeks ago |
flash_bwd_kernel_sm90.h
|
659a631f4c
Rename bwd classes to include Sm90 suffix
|
3 weeks ago |
flash_bwd_launch_template.h
|
0890032358
Implement backward pass for Sm80
|
2 weeks ago |
flash_bwd_postprocess_kernel.h
|
0890032358
Implement backward pass for Sm80
|
2 weeks ago |
flash_bwd_preprocess_kernel.h
|
2c996ca25f
Use SeqlenInfo for bwd and epilogue
|
1 month ago |
flash_fwd_combine_kernel.h
|
2c996ca25f
Use SeqlenInfo for bwd and epilogue
|
1 month ago |
flash_fwd_combine_launch_template.h
|
9fd6b977bb
Precompute the pointers in mha_combine kernel
|
2 months ago |
flash_fwd_combine_sm80.cu
|
9fd6b977bb
Precompute the pointers in mha_combine kernel
|
2 months ago |
flash_fwd_kernel_sm80.h
|
76f14c61c9
Tune fwd tile sizes for Sm86 and Sm89
|
2 weeks ago |
flash_fwd_kernel_sm90.h
|
659a631f4c
Rename bwd classes to include Sm90 suffix
|
3 weeks ago |
flash_fwd_launch_template.h
|
f907a13187
Tune tile sizes for fwd varlen on Sm80 and Sm86
|
2 weeks ago |
generate_kernels.py
|
7f5d73a162
Add env var to disable specific hdim
|
1 month ago |
heuristics.h
|
147ac33a2e
Tune num_splits for local, don't split when num_n_blocks is small
|
1 month ago |
mainloop_bwd_sm80.hpp
|
5acb532214
Switch to cutlass v3.6.0, fix perf regression for hdim 128 causal
|
2 weeks ago |
mainloop_bwd_sm90_tma_gmma_ws.hpp
|
0890032358
Implement backward pass for Sm80
|
2 weeks ago |
mainloop_fwd_sm80.hpp
|
5acb532214
Switch to cutlass v3.6.0, fix perf regression for hdim 128 causal
|
2 weeks ago |
mainloop_fwd_sm90_tma_gmma_ws.hpp
|
5acb532214
Switch to cutlass v3.6.0, fix perf regression for hdim 128 causal
|
2 weeks ago |
mask.h
|
51484a7b56
Make backward epilogue work for Sm80
|
3 weeks ago |
named_barrier.hpp
|
29cdfedd80
Use Bulk reduce instead of TMA for dQaccum, split across WGs
|
1 month ago |
pack_gqa.h
|
5acb532214
Switch to cutlass v3.6.0, fix perf regression for hdim 128 causal
|
2 weeks ago |
padding.py
|
0519920e23
Deal with the case where q or k/v have length 0
|
2 weeks ago |
paged_kv.h
|
5acb532214
Switch to cutlass v3.6.0, fix perf regression for hdim 128 causal
|
2 weeks ago |
rotary.h
|
82dc825759
Don't use the unsafe convert_type function
|
1 month ago |
seqlen.h
|
2c996ca25f
Use SeqlenInfo for bwd and epilogue
|
1 month ago |
setup.py
|
a609d82315
Change extension name to flash_attn_3_cuda
|
2 weeks ago |
sm90_pipeline_no_cluster.hpp
|
a609d82315
Change extension name to flash_attn_3_cuda
|
2 weeks ago |
softmax.h
|
6807b1ea37
Longest-processing-time-first scheduler for causal
|
1 month ago |
static_switch.h
|
42fc4962f0
Uncomment tanh softcapping
|
1 month ago |
test_flash_attn.py
|
0519920e23
Deal with the case where q or k/v have length 0
|
2 weeks ago |
test_util.py
|
0519920e23
Deal with the case where q or k/v have length 0
|
2 weeks ago |
tile_scheduler.hpp
|
a901c7eeda
Make Sm80 forward pass work with persistent scheduler
|
3 weeks ago |
tile_size.h
|
f907a13187
Tune tile sizes for fwd varlen on Sm80 and Sm86
|
2 weeks ago |
utils.h
|
51484a7b56
Make backward epilogue work for Sm80
|
3 weeks ago |