| Name | Commit | Message | Last updated |
| --- | --- | --- | --- |
| fused_moe | 9be43994fe | feat: fbgemm quantization support (#601) | 5 months ago |
| mamba | 2dfa4e47e6 | chore: set seed for dummy weights init | 5 months ago |
| ops | fca911ee0a | vLLM Upstream Sync (#526) | 7 months ago |
| __init__.py | 07aa2a492f | upstream: add option to specify tokenizer | 1 year ago |
| activation.py | c0c336aaa3 | refactor: registry for processing model inputs; quick_gelu; clip model support | 5 months ago |
| layernorm.py | 5761ef8c35 | feat: gemma-2 support | 5 months ago |
| linear.py | ba371fbbbd | feat: AWQ marlin kernels (#603) | 5 months ago |
| logits_processor.py | 5761ef8c35 | feat: gemma-2 support | 5 months ago |
| pooler.py | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 6 months ago |
| rejection_sampler.py | 2c653a2268 | fix: make speculative decoding work with per-request seed | 5 months ago |
| rotary_embedding.py | 5761ef8c35 | feat: gemma-2 support | 5 months ago |
| sampler.py | 709628a74d | fix | 5 months ago |
| spec_decode_base_sampler.py | 2c653a2268 | fix: make speculative decoding work with per-request seed | 5 months ago |
| typical_acceptance_sampler.py | 2c653a2268 | fix: make speculative decoding work with per-request seed | 5 months ago |
| vocab_parallel_embedding.py | 9be43994fe | feat: fbgemm quantization support (#601) | 5 months ago |