Commit History

Author SHA1 Message Date
  AlpinDale 9bdf8d5bfa mamba: enable continuous batching for mamba kernels (#1055) 1 week ago
  AlpinDale a985143768 core: add cuda graph support for encoder-decoder models (#1051) 1 week ago
  AlpinDale 4593a3b306 chore: remove dead code from triton sampling kernels (#1049) 1 week ago
  AlpinDale 638c08d9dc fix: clean shutdown issues (#1047) 1 week ago
  AlpinDale 65a59bbb6b cpu: raise error if using encoder-decoder models (#1027) 1 week ago
  AlpinDale f1ea7711bd core: do not compile ScalarType for torch < 2.4.0 (#938) 2 weeks ago
  AlpinDale 22a4cd4595 core: fix spec decode metrics and envs circular import (#889) 3 weeks ago
  AlpinDale 901900854e chore: consolidate environment variables within one file (#882) 4 weeks ago
  AlpinDale 9fc6473b18 server: log the process occupying our port (#866) 1 month ago
  AlpinDale 0f1af04cf5 frontend: minor logging improvements (#787) 2 months ago
  AlpinDale 0256ed236b feat: windows support (#790) 2 months ago
  50h100a 371d57af82 filesize-driven progress bar for loading tensors 2 months ago
  AlpinDale 0b8b407b6d feat: support profiling with multiple multi-modal inputs per prompt (#712) 3 months ago
  AlpinDale 5d37ec1016 suppress tpu import warning (#696) 4 months ago
  AlpinDale 4fe371b7fa fix: allow passing float for GiB arguments (#690) 4 months ago
  AlpinDale 3f712cd287 feat: add progress bar for loading individual weight modules (#640) 4 months ago
  AlpinDale 7df7b8ca53 optimization: reduce end-to-end overhead from python obj allocation (#666) 4 months ago
  AlpinDale 62111fab17 feat: allow serving encoder-decoder models in the API server (#664) 4 months ago
  AlpinDale 0e5bb11503 fix: make `merge_async_iterators.is_cancelled()` optional (#656) 4 months ago
  AlpinDale a2344d3617 fix: move zeromq rpc frontend to IPC instead of TCP (#652) 4 months ago
  AlpinDale 31f82da8bd chore: deduplicate nvlink check to cuda platform (#643) 4 months ago
  AlpinDale 77c4fbd5c9 fix: better async request cancellation (#641) 4 months ago
  AlpinDale 308501daa5 fix: default api port and attention selector (#634) 4 months ago
  AlpinDale a0e446a17d feat: initial encoder-decoder support with BART model (#633) 4 months ago
  AlpinDale f1d0b77c92 [0.6.0] Release Candidate (#481) 4 months ago
  AlpinDale 9d81716bfd [v0.5.3] Release Candidate (#388) 8 months ago
  AlpinDale e3252edd07 fix: remove event and stream, add typing (#382) 9 months ago
  AlpinDale 33b3786175 fix: cache neuron checks (#379) 9 months ago
  AlpinDale f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) 9 months ago
  AlpinDale e53842bd5d fix: cuda home detection for fp8 kv cache 9 months ago