AlpinDale | c577c31aaa | feat: tree attention | 9 months ago
50h100a | f67b5be198 | chore: port sampler+metadata changes from main to dev (#427) | 9 months ago
AlpinDale | 8c67b37131 | fix docstrings | 9 months ago
AlpinDale | fe17712f29 | fully working chunked prefill | 9 months ago
AlpinDale | 50c2434267 | move megatron to a top-level directory | 9 months ago
AlpinDale | 071269e406 | feat: FP8 E4M3 KV Cache (#405) | 9 months ago
AlpinDale | f845a661dd | Chunked Prefill Part 2: data update | 9 months ago
AlpinDale | 5f851e45e5 | ruff | 9 months ago
AlpinDale | 41f5af0426 | add python nccl wrapper, remove cupy | 9 months ago
AlpinDale | 7b9c08afae | vision model support | 9 months ago
AlpinDale | 0f1399c135 | feat: attention refactor part 2 | 9 months ago
AlpinDale | 2319b411ce | refactor: neuron support | 9 months ago
AlpinDale | 15308ffb5b | compute logits in model_runner | 9 months ago
AlpinDale | 78d66f16d1 | Chunked Prefill Part 1 (#384) | 9 months ago
AlpinDale | 9181fa0396 | feat: Triton kernels for sampling (#383) | 9 months ago
AlpinDale | 4b99ac15b7 | fix: do not deepcopy metadata | 9 months ago
AlpinDale | 17b034613d | chore: make metadata a dataclass (#377) | 9 months ago
AlpinDale | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 9 months ago
50h100a | b9e0ae87c5 | fix fine-grained seeding. | 10 months ago
AlpinDale | e42a78381a | feat: switch from pylint to ruff (#322) | 10 months ago
sgsdxzy | 50c0875c32 | chore: log total memory usage (#316) | 10 months ago
AlpinDale | c2d77b1822 | chore: logging refactor (#302) | 10 months ago
AlpinDale | 9810daa699 | feat: INT8 KV Cache (#298) | 10 months ago
AlpinDale | ac82b67f75 | feat: naive context shift and various QoL changes (#289) | 10 months ago
AlpinDale | 4d04ade9ef | feat: fine-grained seeds (#279) | 11 months ago
AlpinDale | 697c06c4f5 | fix: LoRA support for mixtral (#276) | 11 months ago
AlpinDale | 4b80b42362 | fix: memory leaks due to nccl cuda graphs (#275) | 11 months ago
AlpinDale | ea0f57b233 | feat: allow further support for non-cuda devices (#247) | 11 months ago
AlpinDale | 1a94ccf3cf | fix: prefix cache fail with lora (#239) | 11 months ago
AlpinDale | c3a221eb02 | feat: GGUF, QuIP#, and Marlin support (#228) | 1 year ago