AlpinDale | 61260c3f10 | chore: log the message queue comms handle | 4 months ago
AlpinDale | 0c72961a12 | chore: shutdown method for multiproc executor | 4 months ago
AlpinDale | 6d1cf604b7 | fix: mamba-ssm installation stuff | 4 months ago
AlpinDale | ea54ffafa4 | let's try this again | 4 months ago
AlpinDale | 21396977b2 | fix: admin key arg | 4 months ago
AlpinDale | ba02bd3e18 | fix: install wheel and packaging in docker | 4 months ago
AlpinDale | b71a865b3a | Revert "update dockerfile" | 4 months ago
AlpinDale | 321a0892e0 | let's not build these for now | 4 months ago
AlpinDale | 6dd64089a2 | update dockerfile | 4 months ago
AlpinDale | 5289c14b24 | feat: Asymmetric Tensor Parallel (#594) | 4 months ago
AlpinDale | 9d7beaa5b9 | chore: separate kv_scale into k_scale and v_scale | 4 months ago
AlpinDale | fc38c74495 | chore: log spec decoding metrics | 4 months ago
AlpinDale | e1475fbec7 | feat: MoE support with Pallas GMM kernel for TPUs | 4 months ago
AlpinDale | ebf8a53618 | feat: optimize throughput to 1.4x by using numpy for token padding | 4 months ago
AlpinDale | 8432caecd2 | feat: chat completions tokenization endpoint (#592) | 4 months ago
AlpinDale | 90b2f79b0d | fix: minor fix for prompt adapter config | 4 months ago
AlpinDale | e76bbe72eb | chore: handle aborted requests for jamba | 4 months ago
AlpinDale | 4bbf66451a | chore: add CustomAP interface to UnquantizedFusedMoEMethod | 4 months ago
AlpinDale | c8d398a4ae | feat: add custom triton cache manager | 4 months ago
AlpinDale | f83bbc669d | chore: upgrade flashinfer to 0.0.9 | 4 months ago
AlpinDale | b5d23ab6d4 | chore: enable bias w/ FP8 layers in CUTLASS kernels | 4 months ago
AlpinDale | e26a4ac698 | chore: avoid loading the unused layers and init the VLM up to the required feature space | 4 months ago
AlpinDale | 96d5b8cf2c | fix: allow getting the chat template from a url | 4 months ago
AlpinDale | d6bf4bcba4 | fix: convert image to RGB by default | 4 months ago
AlpinDale | 497bf64942 | chore: simplify pipeline parallel code in llama | 4 months ago
AlpinDale | cf381a0c54 | OpenAI API Refactor (#591) | 4 months ago
AlpinDale | 156a24978e | fix: build with pylimited api in the docker file | 4 months ago
AlpinDale | b82c39772c | chore: allow quantizing all layers of deepseek-v2 | 4 months ago
AlpinDale | e7e847c3df | fix: turn off cutlass scaled_mm for ada lovelace cards | 4 months ago
AlpinDale | e13a66925c | feat: add fuyu vision model and persimmon language model support | 4 months ago