AlpinDale | bdf1cc1aec | fix: allow using custom all reduce when pp_size > 1 | 6 months ago
AlpinDale | ad24e74a99 | feat: FP8 weight-only quantization support for Ampere GPUs | 6 months ago
AlpinDale | 5257ebce8c | fix: device >= 0 && device < num_gpus INTERNAL_ASSERT FAILED | 6 months ago
AlpinDale | 5240c0da23 | fix: avoid unnecessary ray import warnings | 6 months ago
AlpinDale | 4599c98f99 | feat: dynamic image size support for VLMs | 6 months ago
AlpinDale | cda0e93a10 | abstract away the platform for device capability | 6 months ago
AlpinDale | 5be90c3859 | Mamba infrastructure support (#586) | 6 months ago
AlpinDale | 2ed49cdc4c | fix: kobold api generation | 6 months ago
AlpinDale | 0f4a9ee77b | quantized lm_head (#582) | 6 months ago
AlpinDale | cf472315cc | refactor: isolate FP8 from mixtral | 7 months ago
AlpinDale | ae04f57ec1 | feat: Pipeline Parallel support (#581) | 7 months ago
AlpinDale | dd378ea063 | feat: MLPSpeculator with tensor parallel | 7 months ago
AlpinDale | 3a0fdf7b9b | chore: remove `image_input_type` from VLM config | 7 months ago
AlpinDale | 63b735bc2a | chore: optimize v2 block manager to match the performance of v1 | 7 months ago
AlpinDale | de7e6919c0 | feat: support tied weights and input scale for MLPSpeculator | 7 months ago
AlpinDale | 9da2448964 | fix: ensure worker model loop is always stopped at the right time | 7 months ago
AlpinDale | ca6b69966d | fix: explicitly end_forward() calls to flashinfer | 7 months ago
AlpinDale | 3b2666314d | fix: add chunking mechanism to fused_moe | 7 months ago
AlpinDale | 70ecdefc4e | fix: Ray ActorDiedError not available on older ray versions | 7 months ago
AlpinDale | 7253e9052d | feat: integrate typical acceptance sampling for spec decoding | 7 months ago
AlpinDale | 7d79c0e726 | chore: use nvml query to avoid accidental cuda initialization | 7 months ago
AlpinDale | ddb3323f94 | refactor: have w8a8 compressed tensors use `process_weights_after_load` for fp8 | 7 months ago
AlpinDale | 17f7089e26 | fix: `get_min_capability` for all quants | 7 months ago
AlpinDale | 0a6db357d8 | fix: use safetensor keys instead of adapter_config.json to find unexpected modules | 7 months ago
AlpinDale | 4f87a14998 | chore: allow base64 embeddings | 7 months ago
AlpinDale | aea0b52e52 | fix: torchvision version for rocm | 7 months ago
AlpinDale | 4cdc810b1c | fix: minor TP issues with vision models | 7 months ago
AlpinDale | 336eb4dbf8 | fix: raise error in moe kernel if it receives more than 65k tokens | 7 months ago
AlpinDale | bcc60a6555 | chore: optimize SequenceStatus.is_finished by switching to IntEnum | 7 months ago
AlpinDale | 7b04361934 | fix: support getting `eos_token_id` from the config file | 7 months ago