Commit History

Author SHA1 Message Date
  Tri Dao 4f0640d534 Move writing P to smem as separate function 1 day ago
  Tri Dao 3edf7e0daa Add kwargs to _write_ninja_file for compatibility with new torch 1 day ago
  Tri Dao 45c48afb2b Add option for WG1 to use RS MMA but WG2 using SS MMA 2 days ago
  xin-w8023 6865e60145 fix: prompt index to type longlong to avoid numerical overflow (#1500) 4 days ago
  Tri Dao 5458c78e6d Remove sink token 4 days ago
  Tri Dao 20b84d6363 Don't use IntraWGOverlap for hdim 64,512 4 days ago
  Lucas Wilkinson 39e7197564 Fix cuda 12.1 build (#1511) 5 days ago
  Tri Dao 085ce5864a Change margin in prepare_scheduler.cu from 20% to 10% 5 days ago
  Tri Dao 08f4c802c4 Add FLOPS to MLA decode benchmark 5 days ago
  Jiang, Zhiwei dec83a10c4 fix: add "typename" prior to dependent type name (#1517) 5 days ago
  Tri Dao 3b5047d2ce Fix loop in prepare_scheduler.cu (h/t Jay Shah) 1 week ago
  Tri Dao 9505c7436e Adjust seqlen_q in MLA decode benchmark script 1 week ago
  Tri Dao cdda5bfdd7 Update to Cutlass 3.8.0 tag 1 week ago
  Tri Dao 6752d62aa4 Add dynamic splits 1 week ago
  Tri Dao 6aed835dd9 Add simple script to benchmark MLA decode 1 week ago
  Ted Zadouri 06e34f62d1 Enable MLA flag in FA3 (rope=64, latent=512) (#1504) 1 week ago
  Tri Dao ecdb528dea Make rotary test optional in FA3 1 week ago
  Tri Dao b36ad4ef76 Use split for super long sequences that don't fit into L2 2 weeks ago
  Tri Dao 74dfa43c8d Fix divide by 0 in causal tile_scheduler for large seqlen 2 weeks ago
  Tri Dao ea3ecea97a Add tp_degree to benchmark_split_kv 2 weeks ago
  Tri Dao 91917b406b Update benchmark_split_kv.py to work w new API 2 weeks ago
  Tri Dao 40cbd529e4 Temporarily change package name of FA3 to allow FA2 & FA3 install 2 weeks ago
  Anton Vlasjuk a09abcd32d make seqused optional on top level interface (#1497) 2 weeks ago
  Tri Dao fa445ff6c2 Fix FP8 test 3 weeks ago
  Tri Dao eafd53c2f1 Update cutlass 3.8 to fix error w cudaGetDriverEntryPointByVersion 3 weeks ago
  Tri Dao 9f313c7073 Move functions getting number of m/n blocks to a separate file 3 weeks ago
  Tri Dao 15cf7ee435 Rename collective_mainloop -> mainloop, move tile_scheduler variable 3 weeks ago
  Tri Dao 1a7f4dfa9e Adjust ninja build file 3 weeks ago
  Tri Dao 5e39b100b4 Adjust tile size for hdim 64 3 weeks ago
  Tri Dao c091545720 Update Cutlass to 3.8 3 weeks ago