guided_decoding | d8c4193704 | feat: Speculative Decoding using a draft model (#432) | 8 months ago
layers | b28011e86e | fix: shard exl2 weights more evenly between ranks (#437) | 8 months ago
models | a3b1602391 | fix: rope scaling for cohere and qwen (#436) | 8 months ago
__init__.py | 0f1399c135 | feat: attention refactor part 2 | 9 months ago
hf_downloader.py | 58b0616dd3 | feat: support sharded ggufs (#420) | 8 months ago
loader.py | 589fe0c73e | fix: split the exl2 weight loading and SQ+ init (#423) | 8 months ago
neuron_loader.py | d1786645a3 | fix formatting | 9 months ago
sampling_metadata.py | f67b5be198 | chore: port sampler+metadata changes from main to dev (#427) | 8 months ago
utils.py | d1786645a3 | fix formatting | 9 months ago