Commit History

Author SHA1 Message Date
  Jay Shah 9b6cba16c1 remove some debug code 2 months ago
  Jay Shah dec7dee1b1 fix integer sign compare warning 2 months ago
  Jay Shah 50cb90aea6 comment out unimplemented kwargs from flash_attn_with_kvcache 2 months ago
  Jay Shah b3d60fa3a5 prune more dead code 2 months ago
  Jay Shah 8efb953eeb remove commented out code 2 months ago
  Jay Shah a7cce59d25 adjust tolerances in test script for kv cache 2 months ago
  Jay Shah c06cc0ba9f change cu_seqlens_k to seqused_k for kv cache api 2 months ago
  Jay Shah 7c1473e0e5 remove Is_batch_dynamic from seqlen traits and handle fp8 perf regression using smem boolean 2 months ago
  Jay Shah 1ecf821207 remove constexpr checks for actual seqlen in mainloop 2 months ago
  Jay Shah 8374e1fa78 remove test code 2 months ago
  Jay Shah 35f3542442 refactor names 2 months ago
  Jay Shah b0f067efdc revert epi change for fp8 due to measured perf regression 2 months ago
  Jay Shah eb9c0ee22a add rmem -> gmem for fp8 2 months ago
  Jay Shah 551b91f4c9 uniform notation 2 months ago
  Jay Shah 7169b23399 unify rmem -> gmem methods 2 months ago
  Jay Shah ab5d336e61 better writeout logic with vectorization 2 months ago
  Jay Shah d437d3dd5c remove smem usage for when rmem -> gmem epilogue is used 2 months ago
  Ganesh Bikshandi e49cb5f77c passes except for hdim=256. 2 months ago
  Ganesh Bikshandi dc2c952f37 compiles and builds. Not validates. 2 months ago
  Ganesh Bikshandi a075e769fb handle gqa_parallel with rmem-to-gmem. Not validating yet. 2 months ago
  Jay Shah 4a4dbd29c5 move IsRegToGmem 2 months ago
  Jay Shah 8f45a8cfa2 tests passing now for non-gqa impl 2 months ago
  Ganesh Bikshandi f0b49460ec changes to use tiledcopy (still not passing). 2 months ago
  Ganesh Bikshandi 8fbefa8ac4 adding rmem to gmem. (Not validating yet). 2 months ago
  Jay Shah 785d978165 fix bug with fp8 q layout 2 months ago
  Jay Shah aa0e699412 move descale tensor declarations outside of conditional 2 months ago
  Jay Shah fff4b5c09b add split kv benchmark script 2 months ago
  Jay Shah bc4b8722f6 add crude hdim 64 heuristic 2 months ago
  Jay Shah 930c8cad98 reorg mma code for less redundancy 2 months ago
  Jay Shah 03200a753f removed old gqa cu files and unified methods 2 months ago