Name                 | Commit     | Last commit message                                   | Updated
layers               | 801eda0b7a | feat: support GPTQ 2, 3, and 8bit quants (#181)       | 1 year ago
megatron             | e7b6a2d5a0 | chore: tensor parallel refactors part 2 (#116)        | 1 year ago
models               | b9b295d74e | chore: backlogs 1 (#191)                              | 1 year ago
__init__.py          | 653da510d1 | chore: rewrite InputMetadata (#143)                   | 1 year ago
hf_downloader.py     | 725be3e0de | feat: mixtral HF with expert parallelism (#167)       | 1 year ago
loader.py            | 730357c7d5 | chore: implement lazy module loader for models (#165) | 1 year ago
metadata.py          | 7d91e9e0f2 | feat: CUDA graphs (#172)                              | 1 year ago
sampling_metadata.py | 2aab3da9bd | chore: fix Python 3.8+ compatibility (#170)           | 1 year ago
utils.py             | e7b6a2d5a0 | chore: tensor parallel refactors part 2 (#116)        | 1 year ago