| Name | Commit | Last commit message | Last updated |
|---|---|---|---|
| attention | 0d3562a7f9 | MQA in triton FA | 5 months ago |
| common | 3ab36e6b2d | feat: extended RoPE for Llama 3.1 (#543) | 5 months ago |
| distributed | d1a3c7bc2c | chore: simplify try-finally logic in pynccl | 6 months ago |
| endpoints | 46159b107a | formatting: pt1 | 6 months ago |
| engine | aed64884c6 | feat: prompt logprobs with chunked prefill (#539) | 5 months ago |
| executor | c9d6f9f164 | fix formatting | 5 months ago |
| kv_quant | e42a78381a | feat: switch from pylint to ruff (#322) | 9 months ago |
| lora | b178ae4b4a | chore: generalize linear_method to be quant_method (#540) | 5 months ago |
| modeling | c2be1b9f29 | formatting | 5 months ago |
| processing | aed64884c6 | feat: prompt logprobs with chunked prefill (#539) | 5 months ago |
| quantization | 36660b55c2 | chore: mixtral fp8 w/ static scales (#542) | 5 months ago |
| spec_decode | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 8 months ago |
| task_handler | aed64884c6 | feat: prompt logprobs with chunked prefill (#539) | 5 months ago |
| transformers_utils | ed759f065d | chore: tokenizer_revision -> revision | 6 months ago |
| \_\_init\_\_.py | 199e776722 | chore: move ray utils to executor dir | 6 months ago |
| py.typed | 1c988a48b2 | fix logging and add py.typed | 1 year ago |