.. |
__init__.py
|
89a2c6dee1
chore: refactor `MultiModalConfig` initialization and profiling (#745)
|
4 months ago |
loader.py
|
349a612338
chore: bump bitsandbytes version to latest; enable cuda graphs for 4bit bnb (#1123)
|
5 days ago |
neuron.py
|
145e554a4d
neuron: add 8bit quantization for Neuron (#994)
|
1 month ago |
openvino.py
|
0dfa6b60ec
core: support logprobs with multi-step scheduling (#963)
|
1 month ago |
tensorizer.py
|
5b03d67abb
Core: add output streaming support to multi-step + async (#1112)
|
2 weeks ago |
utils.py
|
9f3e7c86e2
feat: add fused Marlin MoE kernel (#934)
|
1 month ago |
weight_utils.py
|
3e6addcc2c
LLM: enable batched inference for llm.chat() API (#1120)
|
5 days ago |