AlpinDale 968bde81bf fix: tensor parallel with GPTQ and AWQ quants (#307) 10 months ago
..
quantization c41462cfcd feat: exllamav2 quantization (#305) 10 months ago
triton_kernel 16615784b3 fix: prefix cache for turing gpus 10 months ago
__init__.py 07aa2a492f upstream: add option to specify tokenizer 1 year ago
activation.py e31c6f0b45 feat: refactor modeling logic and support more models (#274) 10 months ago
attention.py 9810daa699 feat: INT8 KV Cache (#298) 10 months ago
layernorm.py e31c6f0b45 feat: refactor modeling logic and support more models (#274) 10 months ago
linear.py c2d77b1822 chore: logging refactor (#302) 10 months ago
rejection.py 95bdd35ec9 feat: rejection sampler (#197) 1 year ago
rotary_embedding.py ea0f57b233 feat: allow further support for non-cuda devices (#247) 11 months ago
sampler.py 9fa99215f8 feat: add cubic sampling (#280) 10 months ago
vocab_parallel_embedding.py 968bde81bf fix: tensor parallel with GPTQ and AWQ quants (#307) 10 months ago