gguf
|
fcfb72af24
Support arbitrary model in GGUF. (#381)
|
8 months ago |
__init__.py
|
07aa2a492f
upstream: add option to specify tokenizer
|
1 year ago |
block.py
|
ac82b67f75
feat: naive context shift and various QoL changes (#289)
|
10 months ago |
config.py
|
a3b1602391
fix: rope scaling for cohere and qwen (#436)
|
8 months ago |
grammar.py
|
0527131e93
fix: grammar logits processor (#268)
|
10 months ago |
logger.py
|
58b0616dd3
feat: support sharded ggufs (#420)
|
8 months ago |
logits_processor.py
|
f67b5be198
chore: port sampler+metadata changes from main to dev (#427)
|
8 months ago |
outputs.py
|
c18bf116da
fix stop strings not being excluded from outputs
|
9 months ago |
sampling_params.py
|
c18bf116da
fix stop strings not being excluded from outputs
|
9 months ago |
sequence.py
|
c577c31aaa
feat: tree attention
|
8 months ago |
test_utils.py
|
50c2434267
move megatron to a top-level directory
|
9 months ago |
utils.py
|
1528ce50e5
fix: abort requests when the connection to /v1/completions is interrupted (#431)
|
8 months ago |