Commit History

Author SHA1 Message Date
  Tri Dao 8c20cfef49 [Rotary] Support qkv block layout from GQA 3 months ago
  Antony Frolov 3566596ad8 Fix typo in RotaryEmbedding forward output type (#666) 1 year ago
  Katherine Crowson 4c8ff9154e Fix NameError and typo in ApplyRotaryEmbQKV_ (#569) 1 year ago
  Tri Dao 1879e089c7 Reduce number of templates for headdim > 128 1 year ago
  Tri Dao a86442f0f3 [Gen] Use flash_attn_with_kvcache in generation 1 year ago
  Tri Dao b28ec236df [Rotary] Implement varlen rotary 1 year ago
  Tri Dao de2949f37d [Rotary] Pass max_seqlen from mha.py to rotary during inference 1 year ago
  Tri Dao 942fcbf046 [Rotary] Implement rotary in Triton 1 year ago
  Tri Dao f1a73d0740 Run isort and black on python files 1 year ago
  Tri Dao 425dbcb6c6 [MHA] Implement MQA/GQA 1 year ago
  Tri Dao ec9f74ab9a [Rotary] Don't store inv_freq in state_dict 1 year ago
  Volodymyr Kyrylov 70ab266a56 rotary: update cos/sin cache when switching from inference mode 1 year ago
  Tri Dao 62e9814466 [Rotary] Make sure frequency calculation is in fp32 1 year ago
  Tri Dao 48bc6eacd6 [Gen] Add rotary base as an argument to FT attention kernel 1 year ago
  Tri Dao e45a46a5b7 [Rotary] Implement GPT-J style (interleaved) rotary 1 year ago
  Tri Dao 85b8e3d334 [Docs] Mention that XPos's scale_base is recommended to be 512 1 year ago
  Tri Dao 1e712ea8b0 Implement TensorParallel for MHA 2 years ago
  Tri Dao 496e4f528c Implement XPos (Sun et al.) 2 years ago
  Alexander Ploshkin ee8984d2be add asserts for sin shape 2 years ago
  Alexander Ploshkin c7c66976cc fix slicing dimensions 2 years ago
  Alexander Ploshkin 96656b9323 Remove redundant shape asserts in rotary embeddings 2 years ago
  Tri Dao 71f674ae23 [Rotary] Customize base, support seqlen_offset 2 years ago
  Tri Dao d4b320b31f Add MLP, MHA, Block, Embedding modules 2 years ago