Commit History

Author SHA1 Message Date
  Jay Shah 9b6cba16c1 remove some debug code 2 months ago
  Jay Shah dec7dee1b1 fix integer sign compare warning 2 months ago
  Jay Shah 50cb90aea6 comment out unimplemented kwargs from flash_attn_with_kvcache 2 months ago
  Jay Shah b3d60fa3a5 prune more dead code 2 months ago
  Jay Shah 8efb953eeb remove commented out code 2 months ago
  Jay Shah a7cce59d25 adjust tolerances in test script for kv cache 2 months ago
  Jay Shah c06cc0ba9f change cu_seqlens_k to seqused_k for kv cache api 2 months ago
  Jay Shah 7c1473e0e5 remove Is_batch_dynamic from seqlen traits and handle fp8 perf regression using smem boolean 2 months ago
  Jay Shah 1ecf821207 remove constexpr checks for actual seqlen in mainloop 2 months ago
  Jay Shah 8374e1fa78 remove test code 2 months ago
  Jay Shah 35f3542442 refactor names 2 months ago
  Jay Shah b0f067efdc revert epi change for fp8 due to measured perf regression 2 months ago
  Jay Shah eb9c0ee22a add rmem -> gmem for fp8 2 months ago
  Jay Shah 551b91f4c9 uniform notation 2 months ago
  Jay Shah 7169b23399 unify rmem -> gmem methods 2 months ago
  Jay Shah ab5d336e61 better writeout logic with vectorization 2 months ago
  Jay Shah d437d3dd5c remove smem usage for when rmem -> gmem epilogue is used 2 months ago
  Ganesh Bikshandi e49cb5f77c passes except for hdim=256. 2 months ago
  Ganesh Bikshandi dc2c952f37 compiles and builes. Not validates. 2 months ago
  Ganesh Bikshandi a075e769fb handle gqa_parallel with rmem-to-gmem. Not validating yet. 2 months ago
  Jay Shah 4a4dbd29c5 move IsRegToGmem 2 months ago
  Jay Shah 8f45a8cfa2 tests passing now for non-gqa impl 2 months ago
  Ganesh Bikshandi f0b49460ec changes to use tiledcopy (still not passing). 2 months ago
  Ganesh Bikshandi 8fbefa8ac4 adding rmem to gmem. (Not validating yet). 2 months ago
  Jay Shah 785d978165 fix bug with fp8 q layout 2 months ago
  Jay Shah aa0e699412 move descale tensor declarations outside of conditional 2 months ago
  Jay Shah fff4b5c09b add split kv benchmark script 2 months ago
  Jay Shah bc4b8722f6 add crude hdim 64 heuristic 2 months ago
  Jay Shah 930c8cad98 reorg mma code for less redundancy 2 months ago
  Jay Shah 03200a753f removed old gqa cu files and unified methods 2 months ago