arctic_inference.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago
cached_prefix_inference.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago
embedding_inference.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago
encoder_decoder_inference.py | 62111fab17 | feat: allow serving encoder-decoder models in the API server (#664) | 4 months ago
gguf_inference.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago
lora_aphrodite_engine.py | 673621a3d2 | xpu: refactor the model runner for tensor parallelism (#910) | 3 weeks ago
lora_async_aphrodite.py | 673621a3d2 | xpu: refactor the model runner for tensor parallelism (#910) | 3 weeks ago
mlpspeculator_inference.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago
neuron_inference.py | ba6d798784 | neuron: support for context length and token bucketing (#960) | 2 weeks ago
neuron_int8_quantization.py | 145e554a4d | neuron: add 8bit quantization for Neuron (#994) | 2 weeks ago
offline_inference.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago
ray_distributed_inference.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago
soft_prompt_inference.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago
tpu_inference.py | 436d8fa0f1 | core: do not compile for profiling (#931) | 2 weeks ago