Commit History

Author     SHA1        Message                                                                    Date
AlpinDale  93cffaf446  add flash_attn back                                                        7 months ago
AlpinDale  f970f3f3fb  add base class for VLMs                                                    7 months ago
AlpinDale  9e73559eba  make use of batched rotary embedding kernels to support long context lora  7 months ago
AlpinDale  1b86cf6164  navi21 fallback to naive attention                                         7 months ago
AlpinDale  0dc8492188  relax tiktoken version                                                     7 months ago
AlpinDale  676322dd62  qwen2_moe: mlp_only_layers                                                 7 months ago
AlpinDale  14a2d6f624  fix rope error when loading models with different dtypes                   7 months ago
AlpinDale  0c15965621  fix fp8 kv                                                                 7 months ago
AlpinDale  2313c97e3d  add cutlass w8a8 kernels (#556)                                            7 months ago
AlpinDale  d4edba99f9  add lora dims for Qwen1.5-32B                                              7 months ago
AlpinDale  eaa06fdd14  fix some f-strings                                                         7 months ago
AlpinDale  c58589318f  remove the graph mode func                                                 7 months ago
AlpinDale  8e11259e90  missing triton autoconfig for rocm flash attn                              7 months ago
AlpinDale  c66b1b57b1  Marlin 2:4 sparsity (#555)                                                 7 months ago
AlpinDale  ad1c6b86a1  gptq_marlin: enable bfloat16                                               7 months ago
AlpinDale  2ecfa98da9  re-fix mistral nemo                                                        7 months ago
AlpinDale  9f3d6205ce  fix ray gpu executor                                                       7 months ago
AlpinDale  236be273e5  feat: tensor parallel speculative decoding (#554)                          7 months ago
AlpinDale  072b30fb42  measure end time within the cuda memory profiler                           7 months ago
AlpinDale  7bcff4ac03  implement sharded state dict                                               7 months ago
AlpinDale  13e5ffd456  fix distributed_executor_backend in args                                   7 months ago
AlpinDale  a94de94c44  refactor: combine the prefill and decode into a single API (#553)          7 months ago
AlpinDale  fe431bb840  check for next port if current is unavailable                              7 months ago
AlpinDale  033797fd55  refactor throughput benchmark script                                       7 months ago
AlpinDale  c6a501f682  add multiprocessing executor; make ray optional                            7 months ago
AlpinDale  342346afda  improve hashing function                                                   7 months ago
AlpinDale  d7c0dd5b50  fix: do not set the weight to fp8 for fp16 checkpoints                     7 months ago
AlpinDale  01190e5049  use flash attention for the decoding phase                                 7 months ago
AlpinDale  e42d0b3455  possibly improve ngram efficiency                                          7 months ago
AlpinDale  0cea453d36  automatically detect tensorized models                                     7 months ago