Commit History

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| AlpinDale | 61260c3f10 | chore: log the message queue comms handle | 4 months ago |
| AlpinDale | 0c72961a12 | chore: shutdown method for multiproc executor | 4 months ago |
| AlpinDale | 6d1cf604b7 | fix: mamba-ssm installation stuff | 4 months ago |
| AlpinDale | ea54ffafa4 | let's try this again | 4 months ago |
| AlpinDale | 21396977b2 | fix: admin key arg | 4 months ago |
| AlpinDale | ba02bd3e18 | fix: install wheel and packaging in docker | 4 months ago |
| AlpinDale | b71a865b3a | Revert "update dockerfile" | 4 months ago |
| AlpinDale | 321a0892e0 | let's not build these for now | 4 months ago |
| AlpinDale | 6dd64089a2 | update dockerfile | 4 months ago |
| AlpinDale | 5289c14b24 | feat: Asymmetric Tensor Parallel (#594) | 4 months ago |
| AlpinDale | 9d7beaa5b9 | chore: separate kv_scale into k_scale and v_scale | 4 months ago |
| AlpinDale | fc38c74495 | chore: log spec decoding metrics | 4 months ago |
| AlpinDale | e1475fbec7 | feat: MoE support with Pallas GMM kernel for TPUs | 4 months ago |
| AlpinDale | ebf8a53618 | feat: optimize throughput to 1.4x by using numpy for token padding | 4 months ago |
| AlpinDale | 8432caecd2 | feat: chat completions tokenization endpoint (#592) | 4 months ago |
| AlpinDale | 90b2f79b0d | fix: minor fix for prompt adapter config | 4 months ago |
| AlpinDale | e76bbe72eb | chore: handle aborted requests for jamba | 4 months ago |
| AlpinDale | 4bbf66451a | chore: add CustomAP interface to UnquantizedFusedMoEMethod | 4 months ago |
| AlpinDale | c8d398a4ae | feat: add custom triton cache manager | 4 months ago |
| AlpinDale | f83bbc669d | chore: upgrade flashinfer to 0.0.9 | 4 months ago |
| AlpinDale | b5d23ab6d4 | chore: enable bias w/ FP8 layers in CUTLASS kernels | 4 months ago |
| AlpinDale | e26a4ac698 | chore: avoid loading the unused layers and init the VLM up to the required feature space | 4 months ago |
| AlpinDale | 96d5b8cf2c | fix: allow getting the chat template from a url | 4 months ago |
| AlpinDale | d6bf4bcba4 | fix: convert image to RGB by default | 4 months ago |
| AlpinDale | 497bf64942 | chore: simplify pipeline parallel code in llama | 4 months ago |
| AlpinDale | cf381a0c54 | OpenAI API Refactor (#591) | 4 months ago |
| AlpinDale | 156a24978e | fix: build with pylimited api in the docker file | 4 months ago |
| AlpinDale | b82c39772c | chore: allow quantizing all layers of deepseek-v2 | 4 months ago |
| AlpinDale | e7e847c3df | fix: turn off cutlass scaled_mm for ada lovelace cards | 4 months ago |
| AlpinDale | e13a66925c | feat: add fuyu vision model and persimmon language model support | 4 months ago |