quantization/                 62b2c4119d  feat: re-write GPTQ and refactor exllama kernels (#152)   1 year ago
__init__.py                   07aa2a492f  upstream: add option to specify tokenizer                 1 year ago
activation.py                 5dbd5f8c30  fix: quant TP (#129)                                      1 year ago
attention.py                  653da510d1  chore: rewrite InputMetadata (#143)                       1 year ago
layernorm.py                  1aab8a7d6f  feat: speedup compilation times by 3x (#130)              1 year ago
linear.py                     62b2c4119d  feat: re-write GPTQ and refactor exllama kernels (#152)   1 year ago
rotary_embedding.py           e386032ae8  fix: rope duplication (#142)                              1 year ago
sampler.py                    653da510d1  chore: rewrite InputMetadata (#143)                       1 year ago
sampler_mirostat.py           653da510d1  chore: rewrite InputMetadata (#143)                       1 year ago
vocab_parallel_embedding.py   e7b6a2d5a0  chore: tensor parallel refactors part 2 (#116)            1 year ago