Author | Commit | Message | Date
AlpinDale | 90cfc55065 | wip | 3 months ago
AlpinDale | f6f02275d5 | wip | 3 months ago
AlpinDale | 4434c4db84 | chore: refactor llama3 rope (#748) | 3 months ago
AlpinDale | 9d9722b1c1 | fix: metrics endpoint with RPC server (#747) | 3 months ago
AlpinDale | 81c5f196eb | chore: various TPU fixes and optimizations (#746) | 3 months ago
AlpinDale | 89a2c6dee1 | chore: refactor `MultiModalConfig` initialization and profiling (#745) | 3 months ago
AlpinDale | 1068597e8a | fix: minor bug fixes & clean-ups (#744) | 3 months ago
Geun, Lim | 08711d2ac9 | feat: add Exaone model support (#743) | 3 months ago
AlpinDale | 81c28d2a7f | fix: use nvml to get consistent device names (#739) | 3 months ago
AlpinDale | 5559c5886f | fix: clear engine ref in RPC server (#738) | 3 months ago
AlpinDale | ef3a0f4cb1 | fix: `custom_ar` check (#737) | 3 months ago
AlpinDale | ccbda97416 | fix: types in AQLM and GGUF for dynamo support (#736) | 3 months ago
AlpinDale | 9296d4b25d | feat: dynamo support for ScalarType (#733) | 3 months ago
AlpinDale | d9d85eeb6e | chore: register lora functions as torch ops (#732) | 3 months ago
AlpinDale | 7a313483f1 | chore: move update_flash_attn_metadata to attn backend (#731) | 3 months ago
AlpinDale | d34e083c48 | feat: add experts_int8 support (#730) | 3 months ago
AlpinDale | b0f262eec1 | feat: FP8 quantization support for AMD ROCm (#729) | 3 months ago
AlpinDale | c744443679 | ci: bump to 0.6.1.post1 (#728) | 3 months ago
miku448 | 9c0e7d95c8 | fix: libcudart path for some versions of pytorch (#726) | 3 months ago
AlpinDale | 4648f16c84 | chore: fix return statement in Detokenizer class (#727) | 3 months ago
AlpinDale | a286adaeaa | feat: launch API server with uvloop (#725) | 3 months ago
AlpinDale | 60b702a827 | chore: register custom torch ops for flash-attn and flashinfer (#724) | 3 months ago
AlpinDale | 8e0d376f1c | ci: bump aphrodite to 0.6.1 (#722) | 3 months ago
AlpinDale | 12e40ae6fd | chore: update grafana template (#721) | 3 months ago
AlpinDale | 61c7182491 | feat: enable prompt logprobs in OpenAI API (#720) | 3 months ago
AlpinDale | 28b6397188 | chore: quant config for speculative draft models (#719) | 3 months ago
AlpinDale | 8e22069c9e | fix: weight loading for scalars (#718) | 3 months ago
AlpinDale | d289c3855b | fix: install protobuf for cpu (#716) | 3 months ago
AlpinDale | 008e646c7e | chore: add support for up to 2048 block size (#715) | 3 months ago
AlpinDale | 1c519cc6ac | chore: set per-rank XLA cache for TPU (#714) | 3 months ago