Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| AlpinDale | 0429cb2229 | fix: only create embeddings and lm_head when necessary for PP | 5 months ago |
| AlpinDale | 2dfa4e47e6 | chore: set seed for dummy weights init | 5 months ago |
| AlpinDale | f5d52320da | Port mamba kernels to Aphrodite (#595) | 5 months ago |
| AlpinDale | 61260c3f10 | chore: log the message queue comms handle | 5 months ago |
| AlpinDale | 0c72961a12 | chore: shutdown method for multiproc executor | 5 months ago |
| AlpinDale | 6d1cf604b7 | fix: mamba-ssm installation stuff | 5 months ago |
| AlpinDale | ea54ffafa4 | let's try this again | 5 months ago |
| AlpinDale | 21396977b2 | fix: admin key arg | 5 months ago |
| AlpinDale | ba02bd3e18 | fix: install wheel and packaging in docker | 5 months ago |
| AlpinDale | b71a865b3a | Revert "update dockerfile" | 5 months ago |
| AlpinDale | 321a0892e0 | let's not build these for now | 5 months ago |
| AlpinDale | 6dd64089a2 | update dockerfile | 5 months ago |
| AlpinDale | 5289c14b24 | feat: Asymmetric Tensor Parallel (#594) | 5 months ago |
| AlpinDale | 9d7beaa5b9 | chore: separate kv_scale into k_scale and v_scale | 5 months ago |
| AlpinDale | fc38c74495 | chore: log spec decoding metrics | 5 months ago |
| AlpinDale | e1475fbec7 | feat: MoE support with Pallas GMM kernel for TPUs | 5 months ago |
| AlpinDale | ebf8a53618 | feat: optimize throughput to 1.4x by using numpy for token padding | 5 months ago |
| AlpinDale | 8432caecd2 | feat: chat completions tokenization endpoint (#592) | 5 months ago |
| AlpinDale | 90b2f79b0d | fix: minor fix for prompt adapter config | 5 months ago |
| AlpinDale | e76bbe72eb | chore: handle aborted requests for jamba | 5 months ago |
| AlpinDale | 4bbf66451a | chore: add CustomAP interface to UnquantizedFusedMoEMethod | 5 months ago |
| AlpinDale | c8d398a4ae | feat: add custom triton cache manager | 5 months ago |
| AlpinDale | f83bbc669d | chore: upgrade flashinfer to 0.0.9 | 5 months ago |
| AlpinDale | b5d23ab6d4 | chore: enable bias w/ FP8 layers in CUTLASS kernels | 5 months ago |
| AlpinDale | e26a4ac698 | chore: avoid loading the unused layers and init the VLM up to the required feature space | 5 months ago |
| AlpinDale | 96d5b8cf2c | fix: allow getting the chat template from a url | 5 months ago |
| AlpinDale | d6bf4bcba4 | fix: convert image to RGB by default | 5 months ago |
| AlpinDale | 497bf64942 | chore: simplify pipeline parallel code in llama | 5 months ago |
| AlpinDale | cf381a0c54 | OpenAI API Refactor (#591) | 5 months ago |
| AlpinDale | 156a24978e | fix: build with pylimited api in the docker file | 5 months ago |