Commit History

Author SHA1 Message Date
  Tri Dao 4f0640d534 Move writing P to smem as separate function 1 day ago
  Tri Dao 3edf7e0daa Add kwargs to _write_ninja_file for compatibility with new torch 1 day ago
  Tri Dao 45c48afb2b Add option for WG1 to use RS MMA but WG2 using SS MMA 2 days ago
  xin-w8023 6865e60145 fix: prompt index to type longlong to avoid numerical overflow (#1500) 4 days ago
  Tri Dao 5458c78e6d Remove sink token 4 days ago
  Tri Dao 20b84d6363 Don't use IntraWGOverlap for hdim 64,512 4 days ago
  Lucas Wilkinson 39e7197564 Fix cuda 12.1 build (#1511) 5 days ago
  Tri Dao 085ce5864a Change margin in prepare_scheduler.cu from 20% to 10% 5 days ago
  Tri Dao 08f4c802c4 Add FLOPS to MLA decode benchmark 5 days ago
  Jiang, Zhiwei dec83a10c4 fix: add "typename" prior to dependent type name (#1517) 5 days ago
  Tri Dao 3b5047d2ce Fix loop in prepare_scheduler.cu (h/t Jay Shah) 1 week ago
  Tri Dao 9505c7436e Adjust seqlen_q in MLA decode benchmark script 1 week ago
  Tri Dao cdda5bfdd7 Update to Cutlass 3.8.0 tag 1 week ago
  Tri Dao 6752d62aa4 Add dynamic splits 1 week ago
  Tri Dao 6aed835dd9 Add simple script to benchmark MLA decode 1 week ago
  Ted Zadouri 06e34f62d1 Enable MLA flag in FA3 (rope=64, latent=512) (#1504) 1 week ago
  Tri Dao ecdb528dea Make rotary test optional in FA3 1 week ago
  Tri Dao b36ad4ef76 Use split for super long sequences that don't fit into L2 2 weeks ago
  Tri Dao 74dfa43c8d Fix divide by 0 in causal tile_scheduler for large seqlen 2 weeks ago
  Tri Dao ea3ecea97a Add tp_degree to benchmark_split_kv 2 weeks ago
  Tri Dao 91917b406b Update benchmark_split_kv.py to work w new API 2 weeks ago
  Tri Dao 40cbd529e4 Temporarily change package name of FA3 to allow FA2 & FA3 install 2 weeks ago
  Anton Vlasjuk a09abcd32d make seqused optional on top level interface (#1497) 2 weeks ago
  Tri Dao fa445ff6c2 Fix FP8 test 3 weeks ago
  Tri Dao eafd53c2f1 Update cutlass 3.8 to fix error w cudaGetDriverEntryPointByVersion 3 weeks ago
  Tri Dao 9f313c7073 Move functions getting number of m/n blocks to a separate file 3 weeks ago
  Tri Dao 15cf7ee435 Rename collective_mainloop -> mainloop, move tile_scheduler variable 3 weeks ago
  Tri Dao 1a7f4dfa9e Adjust ninja build file 3 weeks ago
  Tri Dao 5e39b100b4 Adjust tile size for hdim 64 3 weeks ago
  Tri Dao c091545720 Update Cutlass to 3.8 3 weeks ago