..
attention | 0d3562a7f9 | MQA in triton FA | 5 months ago
common | 3ab36e6b2d | feat: extended RoPE for Llama 3.1 (#543) | 5 months ago
distributed | d1a3c7bc2c | chore: simplify try-finally logic in pynccl | 6 months ago
endpoints | 46159b107a | formatting: pt1 | 6 months ago
engine | aed64884c6 | feat: prompt logprobs with chunked prefill (#539) | 5 months ago
executor | c9d6f9f164 | fix formatting | 5 months ago
kv_quant | e42a78381a | feat: switch from pylint to ruff (#322) | 10 months ago
lora | b178ae4b4a | chore: generalize linear_method to be quant_method (#540) | 5 months ago
modeling | c2be1b9f29 | formatting | 5 months ago
processing | aed64884c6 | feat: prompt logprobs with chunked prefill (#539) | 5 months ago
quantization | 36660b55c2 | chore: mixtral fp8 w/ static scales (#542) | 5 months ago
spec_decode | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 8 months ago
task_handler | aed64884c6 | feat: prompt logprobs with chunked prefill (#539) | 5 months ago
transformers_utils | ed759f065d | chore: tokenizer_revision -> revision | 6 months ago
__init__.py | 199e776722 | chore: move ray utils to executor dir | 6 months ago
py.typed | 1c988a48b2 | fix logging and add py.typed | 1 year ago