93cffaf446  add flash_attn back (AlpinDale, 7 months ago)
f970f3f3fb  add base class for VLMs (AlpinDale, 7 months ago)
9e73559eba  make use of batched rotary embedding kernels to support long context lora (AlpinDale, 7 months ago)
1b86cf6164  navi21 fallback to naive attention (AlpinDale, 7 months ago)
0dc8492188  relax tiktoken version (AlpinDale, 7 months ago)
676322dd62  qwen2_moe: mlp_only_layers (AlpinDale, 7 months ago)
14a2d6f624  fix rope error when loading models with different dtypes (AlpinDale, 7 months ago)
0c15965621  fix fp8 kv (AlpinDale, 7 months ago)
2313c97e3d  add cutlass w8a8 kernels (#556) (AlpinDale, 7 months ago)
d4edba99f9  add lora dims for Qwen1.5-32B (AlpinDale, 7 months ago)
eaa06fdd14  fix some f-strings (AlpinDale, 7 months ago)
c58589318f  remove the graph mode func (AlpinDale, 7 months ago)
8e11259e90  missing triton autoconfig for rocm flash attn (AlpinDale, 7 months ago)
c66b1b57b1  Marlin 2:4 sparsity (#555) (AlpinDale, 7 months ago)
ad1c6b86a1  gptq_marlin: enable bfloat16 (AlpinDale, 7 months ago)
2ecfa98da9  re-fix mistral nemo (AlpinDale, 7 months ago)
9f3d6205ce  fix ray gpu executor (AlpinDale, 7 months ago)
236be273e5  feat: tensor parallel speculative decoding (#554) (AlpinDale, 7 months ago)
072b30fb42  measure end time within the cuda memory profiler (AlpinDale, 7 months ago)
7bcff4ac03  implement sharded state dict (AlpinDale, 7 months ago)
13e5ffd456  fix distributed_executor_backend in args (AlpinDale, 7 months ago)
a94de94c44  refactor: combine the prefill and decode into a single API (#553) (AlpinDale, 7 months ago)
fe431bb840  check for next port if current is unavailable (AlpinDale, 7 months ago)
033797fd55  refactor throughput benchmark script (AlpinDale, 7 months ago)
c6a501f682  add multiprocessing executor; make ray optional (AlpinDale, 7 months ago)
342346afda  improve hashing function (AlpinDale, 7 months ago)
d7c0dd5b50  fix: do not set the weight to fp8 for fp16 checkpoints (AlpinDale, 7 months ago)
01190e5049  use flash attention for the decoding phase (AlpinDale, 7 months ago)
e42d0b3455  possibly improve ngram efficiency (AlpinDale, 7 months ago)
0cea453d36  automatically detect tensorized models (AlpinDale, 7 months ago)