fused_moe                     | 9be43994fe | feat: fbgemm quantization support (#601)                                       | 5 months ago
mamba                         | 2dfa4e47e6 | chore: set seed for dummy weights init                                         | 5 months ago
ops                           | fca911ee0a | vLLM Upstream Sync (#526)                                                      | 7 months ago
__init__.py                   | 07aa2a492f | upstream: add option to specify tokenizer                                      | 1 year ago
activation.py                 | c0c336aaa3 | refactor: registry for processing model inputs; quick_gelu; clip model support | 5 months ago
layernorm.py                  | 5761ef8c35 | feat: gemma-2 support                                                          | 5 months ago
linear.py                     | ba371fbbbd | feat: AWQ marlin kernels (#603)                                                | 5 months ago
logits_processor.py           | 5761ef8c35 | feat: gemma-2 support                                                          | 5 months ago
pooler.py                     | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support                         | 6 months ago
rejection_sampler.py          | 2c653a2268 | fix: make speculative decoding work with per-request seed                      | 5 months ago
rotary_embedding.py           | 5761ef8c35 | feat: gemma-2 support                                                          | 5 months ago
sampler.py                    | 709628a74d | fix                                                                            | 5 months ago
spec_decode_base_sampler.py   | 2c653a2268 | fix: make speculative decoding work with per-request seed                      | 5 months ago
typical_acceptance_sampler.py | 2c653a2268 | fix: make speculative decoding work with per-request seed                      | 5 months ago
vocab_parallel_embedding.py   | 9be43994fe | feat: fbgemm quantization support (#601)                                       | 5 months ago