AlpinDale 801eda0b7a feat: support GPTQ 2, 3, and 8bit quants (#181) 1 year ago
layers 801eda0b7a feat: support GPTQ 2, 3, and 8bit quants (#181) 1 year ago
megatron e7b6a2d5a0 chore: tensor parallel refactors part 2 (#116) 1 year ago
models b9b295d74e chore: backlogs 1 (#191) 1 year ago
__init__.py 653da510d1 chore: rewrite InputMetadata (#143) 1 year ago
hf_downloader.py 725be3e0de feat: mixtral HF with expert parallelism (#167) 1 year ago
loader.py 730357c7d5 chore: implement lazy module loader for models (#165) 1 year ago
metadata.py 7d91e9e0f2 feat: CUDA graphs (#172) 1 year ago
sampling_metadata.py 2aab3da9bd chore: fix Python 3.8+ compatibility (#170) 1 year ago
utils.py e7b6a2d5a0 chore: tensor parallel refactors part 2 (#116) 1 year ago