AlpinDale
|
65cd99ba89
fix KVCache type
|
8 months ago |
AlpinDale
|
9e52445ba0
formatting
|
8 months ago |
AlpinDale
|
6c43e00e60
add jamba modeling code
|
8 months ago |
AlpinDale
|
4fbb052b34
add jamba config file
|
8 months ago |
AlpinDale
|
a1f18f17e6
modify the cache engine and model runner/worker to support mamba states
|
8 months ago |
AlpinDale
|
f60803384d
move out of ops dir
|
8 months ago |
AlpinDale
|
2ced01bc3e
clean up interfaces and add selective state update triton kernels
|
8 months ago |
AlpinDale
|
7fd3bd4bf2
add selective scan kernels
|
8 months ago |
AlpinDale
|
44b02e94cd
add forward kernels for causal depthwise conv1d
|
8 months ago |
sgsdxzy
|
f3b546e33a
feat: support twe lm_head for quantized weights (#409)
|
8 months ago |
sgsdxzy
|
214151b04c
fix: max_num_batched_tokens for chunked_prefill (#412)
|
8 months ago |
AlpinDale
|
1dccb03b17
incorrect comparison for hadamard and punica checks
|
8 months ago |
sgsdxzy
|
6a0a6360f1
fix: Allow setting config-path when converting ggufs. (#410)
|
8 months ago |
sgsdxzy
|
fcfb72af24
Support arbitrary model in GGUF. (#381)
|
8 months ago |
AlpinDale
|
bd0ddf1cfe
feat: EETQ quantization (#408)
|
8 months ago |
AlpinDale
|
b1caee23a6
cache the p2p access check for memory saving
|
8 months ago |
AlpinDale
|
373e0d3c01
fix neuron
|
8 months ago |
AlpinDale
|
28bcca2396
incorrect use of monotonic time in metrics logger
|
8 months ago |
AlpinDale
|
4ba273886a
debug logging for distributed_init_method
|
8 months ago |
AlpinDale
|
1270b5567e
triton compile error for flash_attn
|
8 months ago |
AlpinDale
|
f375353026
enable custom_all_reduce by default in llm.py
|
8 months ago |
AlpinDale
|
2d2b43fe00
fix type hint
|
8 months ago |
AlpinDale
|
531969a0b2
move merge_async_iterators to common utils
|
8 months ago |
AlpinDale
|
c18bf116da
fix stop strings not being excluded from outputs
|
8 months ago |
AlpinDale
|
5ab7a159d7
fix formatting for previous commit
|
8 months ago |
AlpinDale
|
b6bbf584ac
fix echo
|
8 months ago |
AlpinDale
|
6e0761ba5d
make init_distributed_environment compatible with init_process_group
|
8 months ago |
AlpinDale
|
083ba7b452
roll back chunked prefill changes to SDPA, isolate cpu worker
|
8 months ago |
AlpinDale
|
8c67b37131
fix docstrings
|
8 months ago |
AlpinDale
|
fe17712f29
fully working chunked prefill
|
8 months ago |