Tri Dao 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
alibi.h 9486635c92 Fix typos of comments about shape. (#837) 6 months ago
block_info.h 40e534a7f6 Implement cache_leftpad 6 months ago
dropout.h 66a127aef8 Refactor masking in fwd pass into 1 object 1 year ago
flash.h 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim128_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim128_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim128_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim128_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim160_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim160_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim160_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim160_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim192_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim192_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim192_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim192_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim224_bf16_sm80.cu ea8a25ca38 Remove configure in bwd kernel launch 1 year ago
flash_bwd_hdim224_fp16_sm80.cu ea8a25ca38 Remove configure in bwd kernel launch 1 year ago
flash_bwd_hdim256_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim256_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim256_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim256_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim32_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim32_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim32_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim32_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim64_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim64_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim64_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim64_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim96_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim96_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim96_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_hdim96_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_kernel.h 5ca83a9c71 Clean up softcapping bwd a bit 6 months ago
flash_bwd_launch_template.h 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_bwd_preprocess_kernel.h f816dee63c Support unpadded LSE layout (#970) 6 months ago
flash_fwd_hdim128_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim128_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim128_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim128_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim160_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim160_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim160_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim160_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim192_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim192_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim192_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim192_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim224_bf16_causal_sm80.cu 908511b2b6 Split into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim224_bf16_sm80.cu 908511b2b6 Split into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim224_fp16_causal_sm80.cu 908511b2b6 Split into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim224_fp16_sm80.cu 908511b2b6 Split into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim256_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim256_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim256_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim256_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim32_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim32_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim32_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim32_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim64_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim64_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim64_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim64_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim96_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim96_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim96_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_hdim96_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_kernel.h 5f1ae4a34b backwards for softcapping (#1033) 6 months ago
flash_fwd_launch_template.h 751c762c9c Don't specialize for hdim 224 to speed up compilation 6 months ago
flash_fwd_split_hdim128_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim128_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim128_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim128_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim160_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim160_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim160_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim160_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim192_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim192_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim192_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim192_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim224_bf16_causal_sm80.cu 908511b2b6 Split into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim224_bf16_sm80.cu 908511b2b6 Split into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim224_fp16_causal_sm80.cu 908511b2b6 Split into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim224_fp16_sm80.cu 908511b2b6 Split into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim256_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim256_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim256_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim256_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim32_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim32_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim32_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim32_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim64_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim64_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim64_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim64_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim96_bf16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim96_bf16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim96_fp16_causal_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
flash_fwd_split_hdim96_fp16_sm80.cu 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
generate_kernels.py 65f723bb9a Split bwd into more .cu files to speed up compilation 6 months ago
kernel_traits.h d732be1e67 Update to Cutlass 3.5 7 months ago
mask.h 9486635c92 Fix typos of comments about shape. (#837) 6 months ago
philox.cuh ed4959b2eb Change inline to __forceinline__, use __grid_constant__ param 1 year ago
rotary.h d732be1e67 Update to Cutlass 3.5 7 months ago
softmax.h 23e8fa5a26 Add the option for the macro and note (#893) 9 months ago
static_switch.h 751c762c9c Don't specialize for hdim 224 to speed up compilation 6 months ago
utils.h 5f1ae4a34b backwards for softcapping (#1033) 6 months ago