| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| AlpinDale | be82f93463 | refactor: factor out chat message parsing | 4 months ago |
| AlpinDale | 87694c8aba | feat: add RPC server and client via ZMQ (#615) | 4 months ago |
| AlpinDale | 0227f8cf25 | revert: incorrect nightly build | 4 months ago |
| AlpinDale | 54f3a52609 | chore: add env var to enable torch.compile | 4 months ago |
| AlpinDale | cabca7383d | fix: use loopback address for single node again | 4 months ago |
| AlpinDale | 523ac99aca | chore: pipeline parallel with Ray accelerated dag | 4 months ago |
| AlpinDale | 141672a0d4 | kernels: disambiguate quantized types via a new ScalarType | 4 months ago |
| AlpinDale | 7844103e43 | build: update torch to 2.4.0 for cpu | 4 months ago |
| AlpinDale | 2050b42f3f | fix: remove unused code in sampler | 4 months ago |
| AlpinDale | 6124140a45 | fix: remove error_on_invalid_device_count_status | 4 months ago |
| AlpinDale | 8e50f26b71 | fix: input shape for flashinfer prefill wrapper | 4 months ago |
| AlpinDale | ebebccea6f | chore: optimize get_seqs | 4 months ago |
| AlpinDale | 614ca6b0bf | feat: support logits soft capping with flash attention backend | 4 months ago |
| AlpinDale | f6959b40ac | fix: lower gemma's unloaded_params exception to warning | 4 months ago |
| AlpinDale | 92987963a4 | fix: RMSNorm forward in InternViT attention qk_layernorm | 4 months ago |
| AlpinDale | b15e6376f8 | bump to torch 2.4.0, add aphrodite_flash_attn (#614) | 4 months ago |
| AlpinDale | 44d81c3a3e | chore: bump torch to 2.4.0 | 4 months ago |
| AlpinDale | 9866af1626 | chore: optimize scheduler and remove policy | 4 months ago |
| AlpinDale | 3134fcaebc | fix: set a default max_tokens for OAI requests | 4 months ago |
| AlpinDale | cd31f8efbb | chore: optimize PP comm by replacing send with partial send + allgather | 4 months ago |
| AlpinDale | c9310eeb02 | fix: skip loading lm_head for tie_word_embeddings models | 4 months ago |
| AlpinDale | 1cd7056c46 | fix: don't use torch.generator() for TPU | 4 months ago |
| AlpinDale | d357341203 | chore: add pipeline parallel support for Qwen | 4 months ago |
| AlpinDale | 98f9dbd734 | feat: Triton Kernels for Punica (#613) | 4 months ago |
| AlpinDale | 5f23908977 | chore: tune cutlass int8 kernels for sm_75 | 4 months ago |
| AlpinDale | 3ce81e215b | chore: enable fp8 cutlass for ada lovelace | 4 months ago |
| AlpinDale | 165a3aa7b3 | fix: fp8 marlin and cpu offloading with fp8 marlin | 4 months ago |
| AlpinDale | 5cb760162c | feat: allow loading specific layer numbers per device | 4 months ago |
| AlpinDale | f83eb07fd1 | feat: use FusedMoE for jamba | 4 months ago |
| AlpinDale | 9e9515f39a | fix: feature size calculation for Llava-next | 4 months ago |
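
For context on commit 614ca6b0bf: logits soft capping bounds attention logits by passing them through a scaled tanh, which is roughly the identity near zero but smoothly saturates at ±cap (the scheme popularized by Gemma 2). Below is a minimal PyTorch sketch of the transform itself; the function name, signature, and cap value are illustrative, not this repository's API.

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Illustrative soft capping: squash logits into (-cap, cap).
    # Approximately the identity for |logits| << cap, smoothly
    # saturating as |logits| grows.
    return cap * torch.tanh(logits / cap)

# Example: cap attention scores before softmax (shapes and cap are made up).
scores = torch.randn(2, 8, 16, 16) * 100
capped = soft_cap(scores, cap=50.0)
assert capped.abs().max() < 50.0
```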