AlpinDale
|
ec5b99d075
fix: use named args
|
7 months ago |
AlpinDale
|
e0886ee929
feat: add `ProposerWorkerBase` abstract class
|
7 months ago |
AlpinDale
|
d00a7517e6
fix: tokenizer delay with using LLM class
|
7 months ago |
AlpinDale
|
39b36efabf
fix: mixtral fp8 ckpt loading
|
7 months ago |
AlpinDale
|
e32f506e17
chore: gpu arch guard for cutlass w8a8 kernels
|
7 months ago |
AlpinDale
|
814c1ddeba
feat: add CustomOp interface for device portability
|
7 months ago |
AlpinDale
|
f91f217bf8
fix: do not skip `prompt_logprobs` when `SamplingParams.detokenize=True`
|
7 months ago |
AlpinDale
|
5b5e6dc359
chore: add batch size 1536 and 3072 to moe benchmark
|
7 months ago |
AlpinDale
|
a7fb48acdf
fix: setuptools version in dockerfile for cpu
|
7 months ago |
AlpinDale
|
05d6e43244
fix: `torch.compile()` with mp executor backend
|
7 months ago |
AlpinDale
|
4bdd2f9892
chore: enhance MoE benchmarking
|
7 months ago |
AlpinDale
|
e321d80e4e
fix: `prompt_logprobs==0` case
|
7 months ago |
AlpinDale
|
141c602c39
feat: OpenAI `tools` support named functions
|
7 months ago |
AlpinDale
|
237fa59aea
feat: support CPU/GPU swapping in BlockManagerV2
|
7 months ago |
AlpinDale
|
ba02fb65c9
fix: pos encodings for CPU
|
7 months ago |
AlpinDale
|
90bafca8e3
fix: cuda graphs with sparseml quants
|
7 months ago |
AlpinDale
|
89ee54dcff
update dockerfile and enhance serving benchmark
|
7 months ago |
AlpinDale
|
75f97bc25d
bump flash-attn to remove unnecessary copies in the backend
|
7 months ago |
AlpinDale
|
7e1d2c9feb
fix: add images/ to gitignore
|
7 months ago |
AlpinDale
|
8d77c69cbd
feat: support image processor and add llava example
|
7 months ago |
AlpinDale
|
00acf371f9
rocm: fused topk softmax
|
7 months ago |
AlpinDale
|
78de98463b
feat: return max_model_len in /v1/models
|
7 months ago |
AlpinDale
|
8c61fb9c19
fix: prevent LLM.encode() to be used with causal models
|
7 months ago |
AlpinDale
|
5fecc6b025
when was this deprecated?
|
7 months ago |
AlpinDale
|
690110a051
feat: bitsandbytes quantization
|
7 months ago |
AlpinDale
|
0307da9e15
refactor: bitsandbytes -> autoquant
|
7 months ago |
AlpinDale
|
f2c6791527
feat: update cutlass fp8 configs
|
7 months ago |
AlpinDale
|
54f4f1e7f3
allow the cutlass kernels to take scales that reside on the GPU
|
7 months ago |
AlpinDale
|
52474b8fa9
build: parallelize all build extensions
|
7 months ago |
AlpinDale
|
67084aca5b
do not build cutlass kernels if cuda version is too low
|
7 months ago |