017b42c517  chore: use fork as the default method for mp backend  (AlpinDale, 7 months ago)
cd9ed8623b  fix: cuda version check for fp8 support in the cutlass kernels  (AlpinDale, 7 months ago)
fad77538de  feat: update cutlass int8 kernel configs for sm90  (AlpinDale, 7 months ago)
b753ff7870  feat: per-channel support for static activation quant  (AlpinDale, 7 months ago)
3c7444c89b  fix: asyncio.run hangs in python < 3.12  (AlpinDale, 7 months ago)
d44ac8e497  fix: `--preemption_mode` -> `--preemption-mode`  (AlpinDale, 7 months ago)
bcf9c83e6a  fix: incorrect args passed to generate() method in phi3v example  (AlpinDale, 7 months ago)
025322ee5f  fix: fp8 kv cache for qwen2 models  (AlpinDale, 7 months ago)
323fe23b21  chore: use 127.0.0.1 for single-node setups  (AlpinDale, 7 months ago)
89be49d058  fix: build for mi300x  (AlpinDale, 7 months ago)
7d3da17e19  fix: phi3 rope scaling  (AlpinDale, 7 months ago)
765adcfba1  chore: add w8a8 benchmark scripts  (AlpinDale, 7 months ago)
1587fab5de  fix: cuda version check for mma warning suppression  (AlpinDale, 7 months ago)
d1f91d0f70  fix: greedy sampling being not greedy in concurrent situations where penalties are used  (AlpinDale, 7 months ago)
da6765c084  feat: lora support for commandr models  (AlpinDale, 7 months ago)
70ec3a7b93  chore: make the dockerfile a bit better  (AlpinDale, 7 months ago)
9b4c72a801  feat: support channel-wise quant for w8a8 dynamic per token activation quant  (AlpinDale, 7 months ago)
79b1c0b861  fix: do not error our if two processes do not agree on p2p capability  (AlpinDale, 7 months ago)
e6d70101b3  feat: add support for phi-3 vision model  (AlpinDale, 7 months ago)
313e6e1ec7  feat: add typical acceptance sampling  (AlpinDale, 7 months ago)
0613d91551  fix: kv head calculation with MPT GQA  (AlpinDale, 7 months ago)
b5694be865  chore: use a pool to reuse LogicalTokenBlock.token_ids  (AlpinDale, 7 months ago)
c05a45f22f  chore: minor updates to throughput benchmark and llm class  (AlpinDale, 7 months ago)
dfa59bc5f9  fix: 16 GPUs in a cluster  (AlpinDale, 7 months ago)
5a925923e3  fix: numba cache  (AlpinDale, 7 months ago)
964aa08a70  fix: serializer log  (AlpinDale, 7 months ago)
5aa910a022  chore: allow building on non-avx512 machines  (AlpinDale, 7 months ago)
6a57861fca  feat: initial XPU support via intel_extension_for_pytorch (#571)  (AlpinDale, 7 months ago)
e2dbe5f05c  feat: add sparse marlin for compressed tensors  (AlpinDale, 7 months ago)
e2e64a6241  fix: limit numpy version  (AlpinDale, 7 months ago)