quantization | f8652c8e99 | fix: optimize aqlm dequantization (#325) | 10 months ago
triton_kernel | e42a78381a | feat: switch from pylint to ruff (#322) | 10 months ago
__init__.py | 07aa2a492f | upstream: add option to specify tokenizer | 1 year ago
activation.py | e31c6f0b45 | feat: refactor modeling logic and support more models (#274) | 10 months ago
attention.py | 58e89e29d9 | add custom bias to attention.py | 9 months ago
enc_dec_attention.py | 3ed4cc431c | enc_dec attention code | 9 months ago
layernorm.py | e31c6f0b45 | feat: refactor modeling logic and support more models (#274) | 10 months ago
linear.py | e42a78381a | feat: switch from pylint to ruff (#322) | 10 months ago
rejection.py | 95bdd35ec9 | feat: rejection sampler (#197) | 1 year ago
rotary_embedding.py | e42a78381a | feat: switch from pylint to ruff (#322) | 10 months ago
sampler.py | da223153c6 | feat&fix: cohere support and missing GPU blocks (#333) | 9 months ago
vocab_parallel_embedding.py | 968bde81bf | fix: tensor parallel with GPTQ and AWQ quants (#307) | 10 months ago