| Name | Commit | Message | Age |
|---|---|---|---|
| fused_moe | 1e35cef979 | feat: add arctic snowflake model (#551) | 6 months ago |
| ops | fca911ee0a | vLLM Upstream Sync (#526) | 6 months ago |
| __init__.py | 07aa2a492f | upstream: add option to specify tokenizer | 1 year ago |
| activation.py | 6fc1ec6e9a | fix redirects and improve low level debugging | 6 months ago |
| layernorm.py | 6fc1ec6e9a | fix redirects and improve low level debugging | 6 months ago |
| linear.py | 5884e0b904 | add bitnetforcausallm support | 5 months ago |
| logits_processor.py | e8b7f53321 | allow prompt token IDs in the logits processor api | 5 months ago |
| pooler.py | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 6 months ago |
| rejection.py | 197a6d2c16 | auto disable speculative decoding by the running queue size | 6 months ago |
| rotary_embedding.py | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 6 months ago |
| sampler.py | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 6 months ago |
| vocab_parallel_embedding.py | 6fc1ec6e9a | fix redirects and improve low level debugging | 6 months ago |