cutlass @ 34fd98056b
|
6a89b2f121
Remove constexpr in launch template to fix CI compilation
|
1 year ago |
flash_attn
|
ccbb14f38e
Implement rotary embedding in flash_attn_with_kvcache
|
1 year ago |
ft_attention
|
ccbb14f38e
Implement rotary embedding in flash_attn_with_kvcache
|
1 year ago |
fused_dense_lib
|
27f8f890df
[FusedDense] Allocate lt_workspace on input device
|
1 year ago |
fused_softmax
|
ed553e9238
Add Megatron attention implementation for benchmarking
|
2 years ago |
layer_norm
|
767b71ccf0
Fix random state for dropout_layer_norm (#315)
|
1 year ago |
rotary
|
dc08ea1c33
Support H100 for other CUDA extensions
|
2 years ago |
xentropy
|
5400fdc4ac
[CE] Implement CrossEntropyLoss in Triton
|
1 year ago |