| Name | Commit | Message | Last change |
| --- | --- | --- | --- |
| `fused_moe` | 41beab5dc1 | add exllamav2 tensor parallel, fused MoE for GPTQ/AWQ | 9 months ago |
| `ops` | 9181fa0396 | feat: Triton kernels for sampling (#383) | 9 months ago |
| `quantization` | 8d26cf3876 | simplify model_executor logic | 8 months ago |
| `__init__.py` | 07aa2a492f | upstream: add option to specify tokenizer | 1 year ago |
| `activation.py` | 50c2434267 | move megatron to a top-level directory | 9 months ago |
| `layernorm.py` | e31c6f0b45 | feat: refactor modeling logic and support more models (#274) | 10 months ago |
| `linear.py` | b28011e86e | fix: shard exl2 weights more evenly between ranks (#437) | 8 months ago |
| `logits_processor.py` | 50c2434267 | move megatron to a top-level directory | 9 months ago |
| `rejection.py` | d8c4193704 | feat: Speculative Decoding using a draft model (#432) | 8 months ago |
| `rotary_embedding.py` | c8a91b0b96 | rope: get_device() -> device | 9 months ago |
| `sampler.py` | d8c4193704 | feat: Speculative Decoding using a draft model (#432) | 8 months ago |
| `vocab_parallel_embedding.py` | f3b546e33a | feat: support twe lm_head for quantized weights (#409) | 8 months ago |
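For context on the speculative-decoding commits referenced by `rejection.py` and `sampler.py` (#432): speculative decoding proposes tokens with a small draft model and accepts or rejects them against the target model's distribution. Below is a minimal, self-contained sketch of that standard acceptance and residual-resampling rule; the function names, signatures, and tensor shapes are illustrative assumptions, not the actual API of these modules.

```python
import torch

def speculative_accept(draft_probs: torch.Tensor,
                       target_probs: torch.Tensor,
                       draft_tokens: torch.Tensor) -> torch.Tensor:
    """Per-position accept/reject decisions for draft tokens (illustrative only).

    draft_probs, target_probs: [num_draft, vocab] distributions from the
    draft and target models. draft_tokens: [num_draft] proposed token ids.
    Each token is accepted with probability min(1, p_target / p_draft).
    """
    idx = torch.arange(draft_tokens.shape[0])
    p = target_probs[idx, draft_tokens]
    q = draft_probs[idx, draft_tokens]
    accept_prob = torch.clamp(p / q, max=1.0)
    return torch.rand_like(accept_prob) < accept_prob

def residual_sample(draft_probs: torch.Tensor,
                    target_probs: torch.Tensor,
                    pos: int) -> torch.Tensor:
    """On the first rejection at `pos`, resample from norm(max(p - q, 0))."""
    residual = torch.clamp(target_probs[pos] - draft_probs[pos], min=0.0)
    residual = residual / residual.sum()
    return torch.multinomial(residual, num_samples=1)
```

In practice generation stops at the first rejected position, resamples there from the residual distribution, and discards the remaining draft tokens, which preserves the target model's output distribution.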
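Similarly, `rotary_embedding.py` concerns RoPE (rotary position embeddings). The sketch below shows the generic interleaved-RoPE rotation for reference; the shapes, default base frequency, and function name are assumptions and this is not the module's implementation.

```python
import torch

def apply_rope(x: torch.Tensor, positions: torch.Tensor,
               base: float = 10000.0) -> torch.Tensor:
    """Apply interleaved rotary position embeddings (illustrative sketch).

    x: [num_tokens, num_heads, head_dim] query or key tensor.
    positions: [num_tokens] integer token positions.
    """
    head_dim = x.shape[-1]
    # One rotation frequency per (even, odd) pair of channels.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    angles = positions.to(torch.float32)[:, None] * inv_freq[None, :]  # [num_tokens, head_dim/2]
    cos = angles.cos()[:, None, :]  # broadcast over heads
    sin = angles.sin()[:, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Rotate each channel pair by its position-dependent angle.
    out = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return out.flatten(-2).to(x.dtype)
```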