Tri Dao ccbb14f38e Implement rotary embedding in flash_attn_with_kvcache 1 year ago
..
cutlass @ 34fd98056b 6a89b2f121 Remove constexpr in launch template to fix CI compilation 1 year ago
flash_attn ccbb14f38e Implement rotary embedding in flash_attn_with_kvcache 1 year ago
ft_attention ccbb14f38e Implement rotary embedding in flash_attn_with_kvcache 1 year ago
fused_dense_lib 27f8f890df [FusedDense] Allocate lt_workspace on input device 1 year ago
fused_softmax ed553e9238 Add Megatron attention implementation for benchmarking 2 years ago
layer_norm 767b71ccf0 Fix random state for dropout_layer_norm (#315) 1 year ago
rotary dc08ea1c33 Support H100 for other CUDA extensions 2 years ago
xentropy 5400fdc4ac [CE] Implement CrossEntropyLoss in Triton 1 year ago