..
guided_decoding | d8c4193704 | feat: Speculative Decoding using a draft model (#432) | 8 months ago
layers | b28011e86e | fix: shard exl2 weights more evenly between ranks (#437) | 8 months ago
models | a3b1602391 | fix: rope scaling for cohere and qwen (#436) | 8 months ago
__init__.py | 0f1399c135 | feat: attention refactor part 2 | 9 months ago
hf_downloader.py | 58b0616dd3 | feat: support sharded ggufs (#420) | 8 months ago
loader.py | 589fe0c73e | fix: split the exl2 weight loading and SQ+ init (#423) | 8 months ago
neuron_loader.py | d1786645a3 | fix formatting | 9 months ago
sampling_metadata.py | f67b5be198 | chore: port sampler+metadata changes from main to dev (#427) | 8 months ago
utils.py | d1786645a3 | fix formatting | 9 months ago