Commit History

Author SHA1 Message Date
  AlpinDale 93cffaf446 add flash_attn back 7 months ago
  AlpinDale f970f3f3fb add base class for VLMs 7 months ago
  AlpinDale 9e73559eba make use of batched rotary embedding kernels to support long context lora 7 months ago
  AlpinDale 1b86cf6164 navi21 fallback to naive attention 7 months ago
  AlpinDale 0dc8492188 relax tiktoken version 7 months ago
  AlpinDale 676322dd62 qwen2_moe: mlp_only_layers 7 months ago
  AlpinDale 14a2d6f624 fix rope error when loading models with different dtypes 7 months ago
  AlpinDale 0c15965621 fix fp8 kv 7 months ago
  AlpinDale 2313c97e3d add cutlass w8a8 kernels (#556) 7 months ago
  AlpinDale d4edba99f9 add lora dims for Qwen1.5-32B 7 months ago
  AlpinDale eaa06fdd14 fix some f-strings 7 months ago
  AlpinDale c58589318f remove the graph mode func 7 months ago
  AlpinDale 8e11259e90 missing triton autoconfig for rocm flash attn 7 months ago
  AlpinDale c66b1b57b1 Marlin 2:4 sparsity (#555) 7 months ago
  AlpinDale ad1c6b86a1 gptq_marlin: enable bfloat16 7 months ago
  AlpinDale 2ecfa98da9 re-fix mistral nemo 7 months ago
  AlpinDale 9f3d6205ce fix ray gpu executor 7 months ago
  AlpinDale 236be273e5 feat: tensor parallel speculative decoding (#554) 7 months ago
  AlpinDale 072b30fb42 measure end time within the cuda memory profiler 7 months ago
  AlpinDale 7bcff4ac03 implement sharded state dict 7 months ago
  AlpinDale 13e5ffd456 fix distributed_executor_backend in args 7 months ago
  AlpinDale a94de94c44 refactor: combine the prefill and decode into a single API (#553) 7 months ago
  AlpinDale fe431bb840 check for next port if current is unavailable 7 months ago
  AlpinDale 033797fd55 refactor throughput benchmark script 7 months ago
  AlpinDale c6a501f682 add multiprocessing executor; make ray optional 7 months ago
  AlpinDale 342346afda improve hashing function 7 months ago
  AlpinDale d7c0dd5b50 fix: do not set the weight to fp8 for fp16 checkpoints 7 months ago
  AlpinDale 01190e5049 use flash attention for the decoding phase 7 months ago
  AlpinDale e42d0b3455 possibly improve ngram efficiency 7 months ago
  AlpinDale 0cea453d36 automatically detect tensorized models 7 months ago