AlpinDale | da6765c084 | feat: lora support for commandr models | 7 months ago
AlpinDale | 70ec3a7b93 | chore: make the dockerfile a bit better | 7 months ago
AlpinDale | 9b4c72a801 | feat: support channel-wise quant for w8a8 dynamic per token activation quant | 7 months ago
AlpinDale | 79b1c0b861 | fix: do not error out if two processes do not agree on p2p capability | 7 months ago
AlpinDale | e6d70101b3 | feat: add support for phi-3 vision model | 7 months ago
AlpinDale | 313e6e1ec7 | feat: add typical acceptance sampling | 7 months ago
AlpinDale | 0613d91551 | fix: kv head calculation with MPT GQA | 7 months ago
AlpinDale | b5694be865 | chore: use a pool to reuse LogicalTokenBlock.token_ids | 7 months ago
AlpinDale | c05a45f22f | chore: minor updates to throughput benchmark and llm class | 7 months ago
AlpinDale | dfa59bc5f9 | fix: 16 GPUs in a cluster | 7 months ago
AlpinDale | 5a925923e3 | fix: numba cache | 7 months ago
AlpinDale | 964aa08a70 | fix: serializer log | 7 months ago
AlpinDale | 5aa910a022 | chore: allow building on non-avx512 machines | 7 months ago
AlpinDale | 6a57861fca | feat: initial XPU support via intel_extension_for_pytorch (#571) | 7 months ago
AlpinDale | e2dbe5f05c | feat: add sparse marlin for compressed tensors | 7 months ago
AlpinDale | e2e64a6241 | fix: limit numpy version | 7 months ago
AlpinDale | e4407bbcb7 | fix: do not start a ray cluster when not using ray | 7 months ago
AlpinDale | ee174ea4fd | fix: guard for lora + chunked prefill | 7 months ago
AlpinDale | f9a10145d1 | fix: v2 block manager + prefix caching | 7 months ago
AlpinDale | 44331a4d00 | chore: improve p2p cache generation | 7 months ago
AlpinDale | c8c6de64cd | fix: typo in pallas backend | 7 months ago
AlpinDale | cc3486477e | fix: benign multiprocessing error | 7 months ago
AlpinDale | c482c09a3a | fix: remove duplicated input processing in async engine | 7 months ago
AlpinDale | d0afe0cd21 | fix: suppress mma.sp warning on CUDA 12.5 and above | 7 months ago
AlpinDale | a33aaf3b42 | chore: cleanup compressed tensors | 7 months ago
AlpinDale | 94f4e278ff | fix: illegal mem access for cutlass fp8 kernels | 7 months ago
AlpinDale | 8c32e49029 | feat: add avx2 cpu support | 7 months ago
AlpinDale | a89c9a0e92 | fix: device ordinal issues with world_size and stuff | 7 months ago
AlpinDale | ab7f4ed6e5 | chore: revert commit for removing unnecessary copies in flash attn backend | 7 months ago
AlpinDale | 06ed127441 | fix: do not raise optimization warning for fp8 quant | 7 months ago