adapter_commons | 2f61644f6e | SPMD optimizations (#824) | 3 months ago
assets | 653d1a08d4 | feat: add support for audio models (#891) | 2 months ago
attention | de341ffb00 | fix: ensure multistep lookahead allocation is compatible with cugraph max capture (#1008) | 2 months ago
common | dcb36de9c4 | quants: add support for NVIDIA's ModelOpt checkpoints (#1013) | 2 months ago
compilation | 0e5cf7f840 | tpu: avoid dynamo guard eval overhead (#949) | 2 months ago
distributed | 61103b92d4 | tpu: support single and multi-host TPUs on GKE and RayServe (#970) | 2 months ago
endpoints | 0191c5efd1 | tools: fix tool calls to more strictly follow OpenAI format (#1003) | 2 months ago
engine | 9a42869055 | chore: keep chunked prefill enabled with prefix caching (#1007) | 2 months ago
executor | 4737c22ab3 | fix: pass `APHRODITE_ATTENTION_BACKEND` to ray workers (#1009) | 2 months ago
inputs | 908ff753a1 | fix: phi_3.5_v loading (#896) | 2 months ago
kv_quant | 8a71788372 | Add OLMoE (#772) | 5 months ago
lora | bf4a4d8516 | fix: do not register punica with torch if using older torch (#948) | 2 months ago
modeling | dcb36de9c4 | quants: add support for NVIDIA's ModelOpt checkpoints (#1013) | 2 months ago
multimodal | f644e10449 | vlm: enable multimodal inputs for the LLM class (#992) | 2 months ago
platforms | 9f3e7c86e2 | feat: add fused Marlin MoE kernel (#934) | 2 months ago
plugins | 22a4cd4595 | core: fix spec decode metrics and envs circular import (#889) | 2 months ago
processing | f561a54a43 | core: fix async postprocessor in case of preemption (#1000) | 2 months ago
prompt_adapter | 30d02d0747 | chore: remove peft as a requirement (#1006) | 2 months ago
quantization | dcb36de9c4 | quants: add support for NVIDIA's ModelOpt checkpoints (#1013) | 2 months ago
server | 22a4cd4595 | core: fix spec decode metrics and envs circular import (#889) | 2 months ago
spec_decode | 5c3b94de45 | spec decode: move ops.advane_step to flash attention backend (#1005) | 2 months ago
transformers_utils | f644e10449 | vlm: enable multimodal inputs for the LLM class (#992) | 2 months ago
triton_utils | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 6 months ago
worker | 5c3b94de45 | spec decode: move ops.advane_step to flash attention backend (#1005) | 2 months ago
__init__.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 6 months ago
_core_ext.py | f1ea7711bd | core: do not compile ScalarType for torch < 2.4.0 (#938) | 2 months ago
_custom_ops.py | fcfcfc65e1 | quants: add triton kernels for AWQ (#946) | 2 months ago
_ipex_ops.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 6 months ago
connections.py | c6c91edab7 | ci: update & overhaul test units (#769) | 3 months ago
constants.py | 2f61644f6e | SPMD optimizations (#824) | 3 months ago
py.typed | 1c988a48b2 | fix logging and add py.typed | 1 year ago
scalar_type.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 6 months ago
version.py | cbd51a208a | ci: bump to 0.6.5 (#964) | 2 months ago