attention
|
75f97bc25d
bump flash-attn to remove unnecessary copies in the backend
|
7 months ago |
common
|
237fa59aea
feat: support CPU/GPU swapping in BlockManagerV2
|
7 months ago |
distributed
|
b2fd915c35
improve p2p access check
|
7 months ago |
endpoints
|
d00a7517e6
fix: tokenizer delay with using LLM class
|
7 months ago |
engine
|
ec5b99d075
fix: use named args
|
7 months ago |
executor
|
05d6e43244
fix: `torch.compile()` with mp executor backend
|
7 months ago |
kv_quant
|
e42a78381a
feat: switch from pylint to ruff (#322)
|
1 year ago |
lora
|
5fecc6b025
when was this deprecated?
|
7 months ago |
modeling
|
39b36efabf
fix: mixtral fp8 ckpt loading
|
7 months ago |
multimodal
|
75f97bc25d
bump flash-attn to remove unnecessary copies in the backend
|
7 months ago |
processing
|
237fa59aea
feat: support CPU/GPU swapping in BlockManagerV2
|
7 months ago |
quantization
|
39b36efabf
fix: mixtral fp8 ckpt loading
|
7 months ago |
spec_decode
|
ec5b99d075
fix: use named args
|
7 months ago |
task_handler
|
e321d80e4e
fix: `prompt_logprobs==0` case
|
7 months ago |
transformers_utils
|
8d77c69cbd
feat: support image processor and add llava example
|
7 months ago |
__init__.py
|
be8154a8a0
feat: proper embeddings API with e5-mistral-7b support
|
7 months ago |
py.typed
|
1c988a48b2
fix logging and add py.typed
|
1 year ago |