Ying Zhang | 1c9717d699 | address comments | 2 months ago
Ying Zhang | be6c1b98c4 | small fixes | 2 months ago
Ying Zhang | dff976a84a | fixes | 3 months ago
Ying Zhang | 7b4e68e04f | hopper local attention | 3 months ago
Ying Zhang | af314d4006 | Merge pull request #1182 from ipiszy/used_q | 2 months ago
Ying Zhang | 8cbc8a042f | small fixes | 2 months ago
Ying Zhang | cdbbe844b1 | minor changes to unpad_input test util func | 3 months ago
Ying Zhang | db80387343 | Add seqused_q in fwd / bwd and seqused_k in bwd. | 3 months ago
rocking | e2182cc21d | Support page kvcache in AMD ROCm (#1198) | 3 months ago
Tri Dao | cc1690d9d6 | [Rotary] Add test for rotary when qkv are packed an there's GQA | 3 months ago
Tri Dao | 8c20cfef49 | [Rotary] Support qkv block layout from GQA | 3 months ago
Charlene Yang | bdf733be55 | Add q, k, v descales to FA3 interface (#1210) | 3 months ago
Tri Dao | c7f32a8409 | [CrossEntropy] Support precomputed LSE | 3 months ago
juejuezi | e371bea04f | feat: change minimal supported CUDA version to 11.7 (#1206) | 3 months ago
Cameron Shinn | 3cea2fb6ee | Add ArchTag to pre/postprocess bwd kernels (#1180) | 3 months ago
jayhshah | c92ca63268 | FA3 FP8 qkv descales + restore max offset for h128 causal + added sync for producer WG (#1173) | 3 months ago
Tri Dao | d79f9b41a8 | [CrossEntropy] Use online softmax to simplify implementation | 3 months ago
Jay Shah | 32792d37ec | add missing if condition for key_padding_mask in test_util.py | 3 months ago
Ying Zhang | 28e7f4ddbd | Merge pull request #1155 from ipiszy/fix | 3 months ago
Ying Zhang | 53537da422 | add a unittest | 3 months ago
Ying Zhang | a3a257c71d | Fix out-of-bound writes for var-seq-len zero-length KVs | 4 months ago
Tri Dao | bcd918f275 | [LayerNorm] Add option to write result to out and residual_out | 4 months ago
Tri Dao | bd82d6c6eb | Revert "[LayerNorm] Don't store x + residual if we don't need gradients" | 4 months ago
Tri Dao | 800401847e | [LayerNorm] Don't store x + residual if we don't need gradients | 4 months ago
Garrett Byrd | 16025d8cc9 | Clearer install instructions for CUDA and ROCm backends (#1147) | 4 months ago
Ying Zhang | 3669b25206 | bwd benchmark + small fixes (#1129) | 4 months ago
Tri Dao | 5d5bfbb619 | Remove contiguous checks | 4 months ago
SueJane | 3f1b4d38e7 | Fix: check the type of max_seqlen_k instead of checking max_seqlen twice (#1127) | 4 months ago
Tri Dao | 3f6ff1c1c5 | Remove struct : cute::aligned_struct to avoid error with gcc 12 | 4 months ago
Tri Dao | c33de664a1 | Fix import in test | 4 months ago