Tri Dao 0519920e23 Deal with the case where q or k/v have length 0 2 weeks ago
instantiations 7f5d73a162 Add env var to disable specific hdim 1 month ago
__init__.py 7f67966cc7 FA3 initial code release 6 months ago
benchmark_attn.py 82c1aa3514 Move PackGQA epilogue code to pack_gqa.h 2 months ago
benchmark_flash_attention_fp8.py c92ca63268 FA3 FP8 qkv descales + restore max offset for h128 causal + added sync for producer WG (#1173) 5 months ago
copy_sm90_bulk_reduce.hpp 29cdfedd80 Use Bulk reduce instead of TMA for dQaccum, split across WGs 1 month ago
epilogue_bwd.hpp 0890032358 Implement backward pass for Sm80 2 weeks ago
epilogue_fwd.hpp da264e5742 Change file names and class names to include sm90 suffix 1 month ago
flash.h 76f14c61c9 Tune fwd tile sizes for Sm86 and Sm89 2 weeks ago
flash_api.cpp 0519920e23 Deal with the case where q or k/v have length 0 2 weeks ago
flash_attn_interface.py a609d82315 Change extension name to flash_attn_3_cuda 2 weeks ago
flash_bwd_kernel_sm80.h 0890032358 Implement backward pass for Sm80 2 weeks ago
flash_bwd_kernel_sm90.h 659a631f4c Rename bwd classes to include Sm90 suffix 3 weeks ago
flash_bwd_launch_template.h 0890032358 Implement backward pass for Sm80 2 weeks ago
flash_bwd_postprocess_kernel.h 0890032358 Implement backward pass for Sm80 2 weeks ago
flash_bwd_preprocess_kernel.h 2c996ca25f Use SeqlenInfo for bwd and epilogue 1 month ago
flash_fwd_combine_kernel.h 2c996ca25f Use SeqlenInfo for bwd and epilogue 1 month ago
flash_fwd_combine_launch_template.h 9fd6b977bb Precompute the pointers in mha_combine kernel 2 months ago
flash_fwd_combine_sm80.cu 9fd6b977bb Precompute the pointers in mha_combine kernel 2 months ago
flash_fwd_kernel_sm80.h 76f14c61c9 Tune fwd tile sizes for Sm86 and Sm89 2 weeks ago
flash_fwd_kernel_sm90.h 659a631f4c Rename bwd classes to include Sm90 suffix 3 weeks ago
flash_fwd_launch_template.h f907a13187 Tune tile sizes for fwd varlen on Sm80 and Sm86 2 weeks ago
generate_kernels.py 7f5d73a162 Add env var to disable specific hdim 1 month ago
heuristics.h 147ac33a2e Tune num_splits for local, don't split when num_n_blocks is small 1 month ago
mainloop_bwd_sm80.hpp 5acb532214 Switch to cutlass v3.6.0, fix perf regression for hdim 128 causal 2 weeks ago
mainloop_bwd_sm90_tma_gmma_ws.hpp 0890032358 Implement backward pass for Sm80 2 weeks ago
mainloop_fwd_sm80.hpp 5acb532214 Switch to cutlass v3.6.0, fix perf regression for hdim 128 causal 2 weeks ago
mainloop_fwd_sm90_tma_gmma_ws.hpp 5acb532214 Switch to cutlass v3.6.0, fix perf regression for hdim 128 causal 2 weeks ago
mask.h 51484a7b56 Make backward epilogue work for Sm80 3 weeks ago
named_barrier.hpp 29cdfedd80 Use Bulk reduce instead of TMA for dQaccum, split across WGs 1 month ago
pack_gqa.h 5acb532214 Switch to cutlass v3.6.0, fix perf regression for hdim 128 causal 2 weeks ago
padding.py 0519920e23 Deal with the case where q or k/v have length 0 2 weeks ago
paged_kv.h 5acb532214 Switch to cutlass v3.6.0, fix perf regression for hdim 128 causal 2 weeks ago
rotary.h 82dc825759 Don't use the unsafe convert_type function 1 month ago
seqlen.h 2c996ca25f Use SeqlenInfo for bwd and epilogue 1 month ago
setup.py a609d82315 Change extension name to flash_attn_3_cuda 2 weeks ago
sm90_pipeline_no_cluster.hpp a609d82315 Change extension name to flash_attn_3_cuda 2 weeks ago
softmax.h 6807b1ea37 Longest-processing-time-first scheduler for causal 1 month ago
static_switch.h 42fc4962f0 Uncomment tanh softcapping 1 month ago
test_flash_attn.py 0519920e23 Deal with the case where q or k/v have length 0 2 weeks ago
test_util.py 0519920e23 Deal with the case where q or k/v have length 0 2 weeks ago
tile_scheduler.hpp a901c7eeda Make Sm80 forward pass work with persistent scheduler 3 weeks ago
tile_size.h f907a13187 Tune tile sizes for fwd varlen on Sm80 and Sm86 2 weeks ago
utils.h 51484a7b56 Make backward epilogue work for Sm80 3 weeks ago