Commit History

Author SHA1 Message Date
  Jay Shah 9b6cba16c1 remove some debug code 2 months ago
  Jay Shah dec7dee1b1 fix integer sign compare warning 2 months ago
  Jay Shah 50cb90aea6 comment out unimplemented kwargs from flash_attn_with_kvcache 2 months ago
  Jay Shah b3d60fa3a5 prune more dead code 2 months ago
  Jay Shah 8efb953eeb remove commented out code 2 months ago
  Jay Shah a7cce59d25 adjust tolerances in test script for kv cache 2 months ago
  Jay Shah c06cc0ba9f change cu_seqlens_k to seqused_k for kv cache api 2 months ago
  Jay Shah 7c1473e0e5 remove Is_batch_dynamic from seqlen traits and handle fp8 perf regression using smem boolean 2 months ago
  Jay Shah 1ecf821207 remove constexpr checks for actual seqlen in mainloop 2 months ago
  Jay Shah 8374e1fa78 remove test code 2 months ago
  Jay Shah 35f3542442 refactor names 2 months ago
  Jay Shah b0f067efdc revert epi change for fp8 due to measured perf regression 2 months ago
  Jay Shah eb9c0ee22a add rmem -> gmem for fp8 2 months ago
  Jay Shah 551b91f4c9 uniform notation 2 months ago
  Jay Shah 7169b23399 unify rmem -> gmem methods 2 months ago
  Jay Shah ab5d336e61 better writeout logic with vectorization 2 months ago
  Jay Shah d437d3dd5c remove smem usage for when rmem -> gmem epilogue is used 2 months ago
  Ganesh Bikshandi e49cb5f77c passes except for hdim=256. 2 months ago
  Ganesh Bikshandi dc2c952f37 compiles and builds. Not validating. 2 months ago
  Ganesh Bikshandi a075e769fb handle gqa_parallel with rmem-to-gmem. Not validating yet. 2 months ago
  Jay Shah 4a4dbd29c5 move IsRegToGmem 2 months ago
  Jay Shah 8f45a8cfa2 tests passing now for non-gqa impl 2 months ago
  Ganesh Bikshandi f0b49460ec changes to use tiledcopy (still not passing). 2 months ago
  Ganesh Bikshandi 8fbefa8ac4 adding rmem to gmem. (Not validating yet). 2 months ago
  Jay Shah 785d978165 fix bug with fp8 q layout 2 months ago
  Jay Shah aa0e699412 move descale tensor declarations outside of conditional 2 months ago
  Jay Shah fff4b5c09b add split kv benchmark script 2 months ago
  Jay Shah bc4b8722f6 add crude hdim 64 heuristic 2 months ago
  Jay Shah 930c8cad98 reorg mma code for less redundancy 2 months ago
  Jay Shah 03200a753f removed old gqa cu files and unified methods 2 months ago