Commit History

Author   SHA1        Message                                                      Date
Tri Dao  a157cc8c9b  [FT] Implement MQA/GQA                                       1 year ago
Tri Dao  2800efc71f  [FT] rotary_cos/sin should have batch_size dimension         1 year ago
Tri Dao  62e9814466  [Rotary] Make sure frequency calculation is in fp32          1 year ago
Tri Dao  48bc6eacd6  [Gen] Add rotary base as an argument to FT attention kernel  1 year ago
Tri Dao  be1afaa276  [Gen, FT] Use fp32 accum for FMA                             1 year ago
Tri Dao  f266fc7262  [Gen, FT] Use tlength instead of params.timestep for rotary  1 year ago
Tri Dao  a01d1213d7  [Gen] Add kernel from FasterTransformer for benchmarking     1 year ago