| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| AlpinDale | be82f93463 | refactor: factor out chat message parsing | 4 months ago |
| AlpinDale | 87694c8aba | feat: add RPC server and client via ZMQ (#615) | 4 months ago |
| AlpinDale | 0227f8cf25 | revert: incorrect nightly build | 4 months ago |
| AlpinDale | 54f3a52609 | chore: add env var to enable torch.compile | 4 months ago |
| AlpinDale | cabca7383d | fix: use loopback address for single node again | 4 months ago |
| AlpinDale | 523ac99aca | chore: pipeline parallel with Ray accelerated dag | 4 months ago |
| AlpinDale | 141672a0d4 | kernels: disambiguate quantized types via a new ScalarType | 4 months ago |
| AlpinDale | 7844103e43 | build: update torch to 2.4.0 for cpu | 4 months ago |
| AlpinDale | 2050b42f3f | fix: remove unused code in sampler | 4 months ago |
| AlpinDale | 6124140a45 | fix: remove error_on_invalid_device_count_status | 4 months ago |
| AlpinDale | 8e50f26b71 | fix: input shape for flashinfer prefill wrapper | 4 months ago |
| AlpinDale | ebebccea6f | chore: optimize get_seqs | 4 months ago |
| AlpinDale | 614ca6b0bf | feat: support logits soft capping with flash attention backend | 4 months ago |
| AlpinDale | f6959b40ac | fix: lower gemma's unloaded_params exception to warning | 4 months ago |
| AlpinDale | 92987963a4 | fix: RMSNorm forward in InternViT attention qk_layernorm | 4 months ago |
| AlpinDale | b15e6376f8 | bump to torch 2.4.0, add aphrodite_flash_attn (#614) | 4 months ago |
| AlpinDale | 44d81c3a3e | chore: bump torch to 2.4.0 | 4 months ago |
| AlpinDale | 9866af1626 | chore: optimize scheduler and remove policy | 4 months ago |
| AlpinDale | 3134fcaebc | fix: set a default max_tokens for OAI requests | 4 months ago |
| AlpinDale | cd31f8efbb | chore: optimize PP comm by replacing send with partial send + allgather | 4 months ago |
| AlpinDale | c9310eeb02 | fix: skip loading lm_head for tie_word_embeddings models | 4 months ago |
| AlpinDale | 1cd7056c46 | fix: don't use torch.generator() for TPU | 4 months ago |
| AlpinDale | d357341203 | chore: add pipeline parallel support for Qwen | 4 months ago |
| AlpinDale | 98f9dbd734 | feat: Triton Kernels for Punica (#613) | 4 months ago |
| AlpinDale | 5f23908977 | chore: tune cutlass int8 kernels for sm_75 | 4 months ago |
| AlpinDale | 3ce81e215b | chore: enable fp8 cutlass for ada lovelace | 4 months ago |
| AlpinDale | 165a3aa7b3 | fix: fp8 marlin and cpu offloading with fp8 marlin | 4 months ago |
| AlpinDale | 5cb760162c | feat: allow loading specific layer numbers per device | 4 months ago |
| AlpinDale | f83eb07fd1 | feat: use FusedMoE for jamba | 4 months ago |
| AlpinDale | 9e9515f39a | fix: feature size calculation for Llava-next | 4 months ago |
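
For context on commit 614ca6b0bf: logits soft capping bounds attention logits by passing them through a scaled tanh, which is roughly the identity near zero but smoothly saturates at ±cap (the scheme popularized by Gemma 2). Below is a minimal PyTorch sketch of the transform itself; the function name, signature, and cap value are illustrative, not this repository's API.

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Illustrative soft capping: squash logits into (-cap, cap).
    # Approximately the identity for |logits| << cap, smoothly
    # saturating as |logits| grows.
    return cap * torch.tanh(logits / cap)

# Example: cap attention scores before softmax (shapes and cap are made up).
scores = torch.randn(2, 8, 16, 16) * 100
capped = soft_cap(scores, cap=50.0)
assert capped.abs().max() < 50.0
```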