olla 0.0.25 #6111

Closed
chenrui333 wants to merge 1 commit into main from bump-olla-0.0.25
Conversation

@chenrui333
Owner

Created by brew bump


Created with brew bump-formula-pr.

Details

release notes
## What's in this release

Olla is a high-performance proxy and load balancer for LLM infrastructure.

Quick Start

# Docker
docker pull ghcr.io/thushan/olla:v0.0.25

# Binary (see assets below)
./olla --config config.yaml

Release Highlights

Model Aliasing

Thanks to @dnnspaul for contributing the Model Aliasing feature, which lets you alias models easily via the Olla configuration.

Sticky Sessions

Olla now supports sticky sessions, which help keep requests aligned to KV caches across multiple endpoints. The feature is taken from the working implementation in TensorFoundry's FoundryOS.
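To illustrate the general idea of session affinity, here is a minimal Go sketch. It is not Olla's implementation: the stickyRouter type, its pick and purge methods, and the endpoint names are all hypothetical, and hashing the session key to choose the first endpoint is an assumption made only for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// stickyRouter is a hypothetical type for illustration; it is not Olla's API.
type stickyRouter struct {
	mu        sync.Mutex
	endpoints []string
	pinned    map[string]string // session key -> endpoint
}

func newStickyRouter(endpoints []string) *stickyRouter {
	return &stickyRouter{
		endpoints: append([]string(nil), endpoints...), // private copy
		pinned:    make(map[string]string),
	}
}

// pick returns the endpoint pinned to sessionKey, pinning one by hash on
// first use. It returns "" when no endpoints are available.
func (r *stickyRouter) pick(sessionKey string) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	if ep, ok := r.pinned[sessionKey]; ok {
		return ep
	}
	if len(r.endpoints) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write([]byte(sessionKey))
	ep := r.endpoints[int(h.Sum32()%uint32(len(r.endpoints)))]
	r.pinned[sessionKey] = ep
	return ep
}

// purge removes a dead endpoint and drops every pin that pointed at it,
// so affected sessions are re-pinned on their next request.
func (r *stickyRouter) purge(dead string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	kept := r.endpoints[:0]
	for _, ep := range r.endpoints {
		if ep != dead {
			kept = append(kept, ep)
		}
	}
	r.endpoints = kept
	for key, ep := range r.pinned {
		if ep == dead {
			delete(r.pinned, key)
		}
	}
}

func main() {
	r := newStickyRouter([]string{"ep-a", "ep-b", "ep-c"})
	first := r.pick("session-1")
	fmt.Println(first == r.pick("session-1")) // same endpoint on repeat requests
	r.purge(first)
	fmt.Println(first != r.pick("session-1")) // re-pinned after the endpoint is purged
}
```

Repeat requests for a session land on the same endpoint, which is what lets a backend reuse its KV cache for that conversation. A real proxy would also need pin expiry and health awareness, which this sketch leaves out.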

Bugfixes and Chores

Lots of bugfixes and chores from March & April.

Changelog

Features

  • 8378bff0507cf9eaf9370e17e328e4d6face3313: feat: add model alias validation, test coverage, and byte-preserving JSON rewrite (@dnnspaul)
  • pr: sticky-sessions sticky sessions implementation (@thushan)

Bug Fixes

  • 33307eb8d03e6bd47b5cc960f49cf4a88cc1e5de: fix(inspector): copy buffer bytes before pool return to avoid aliasing (@dnnspaul)
  • c54eea97e7d914602f13217c2eef896a59b8cef6: fix(inspector): incremental scan with token-level skipping for field-order independence (@dnnspaul)
  • f7222c9b3413e7eda4aaa0ec6a834c7105247edf: fix(inspector): replace io.NopCloser with readCloser to preserve body Close delegation (@dnnspaul)
  • 2bbb9f7caed122a09ecd9193945576de9eb36dcb: fix(inspector): restore body on error and fix decoder state in extractTopLevelModelField (@dnnspaul)
  • ed1609a2c2149d44996fea92655f5219707e3fba: fix: address CodeRabbit review issues (@thushan)
  • 8a28dbffe913480c189397de2ae5f3813f190dff: fix: extract model name from large requests via streaming JSON prefix scan (@dnnspaul)
  • d92353679a7442eb813a0e0336478f89b4b0008d: fix: propagate model alias rewrite map to translation handler (@thushan)
  • 96b67ebdde7a8666f853e7f6c5e72575d3915fc3: fix: replace regex model rewrite with json.Decoder token scanner (@dnnspaul)

Other

  • f4104ef8d06cca72ac5a6f20497be667c1599f39: + documentation (@dnnspaul)
  • df441b325966850943921a41eec6f8c9f24559b4: - OLLA-284: move SetPurgeDeadEndpoints registration from app.go wiring time into applyStickySessions() (@thushan)
  • 8b9e93977af430c0d8ffdb28c870122030426065: CI fix for actions etc. (@thushan)
  • 7e426e94e9e23b27aebb8cc0652edb2ab846d80d: Fix latent race on purgedead & replace TTL sleep with poll loop (@thushan)
  • 5268939dc3b78977066b9f6bd2c8b66948e2965e: betteralign adjustments (@dnnspaul)
  • 3cbef317bc76edd7eae1c7d2f1f35131a163ed24: bump x/sync to 0.20.0 (@thushan)
  • 414e5a9d611b3364974edaf1b126cdf78cf9a60a: bump x/time to 0.15.0 (@thushan)
  • 83712ca37b1739348273c68bff9906790ce4795e: concise (@thushan)
  • d15a63d6bf7c1eda8d5985e983aa0a758a718d68: doc updates (@thushan)
  • afb0f2534a5b452e632a1542b1fa4981fceb8cf4: fix leaky test (@thushan)
  • 047cc923c237fe8d3b9eb7af834050f4dfcde75e: fix test failures from typed model key and retry string trim (@thushan)
  • 44999cff834f08744d312f5da5d84e1aef8331bc: fix ttlcache == 0 (@thushan)
  • ca899c56b10f9ccf13903c5c431f440cf7ce9373: guard Cleanup against double-invoke (@thushan)
  • 307b9c4abe3a42b50027f5f14cc94b2fb1423468: guard StopChecking against concurrent double-invoke (@thushan)
  • a99a8bb293ff1d3eed07d895c801fd98a8806f7e: implementation(PR-98): model aliases (@dnnspaul)
  • efb02fa4c65f7d5de0b0a96c19efc343e8ccf32b: initial sticky sessions work (@thushan)
  • 17d167f8cb7157da1618249f895607229069809e: lint test (@thushan)
  • f7174685600b25cadeace04e314e4695c7b86a2b: refactor(inspector): clarify pool-safety asymmetry and add capability regression guard (@dnnspaul)
  • 1c209c0b1831783c768ce01297734ff89213fd3c: refactor: address PR review feedback (round 2) for model aliases (@dnnspaul)
  • 41b544e40fe547e458d4518b4845b4e2f7130523: refactor: address PR review feedback for model aliases (@dnnspaul)
  • 4bbe95b036157c9deec4a77274d8e247b52d98a9: reference updates (@thushan)
  • 6b9e2d09fccee1754e6f06ffe603945e362bf15d: remove dead proxyToSingleEndpointLegacy (@thushan)
  • 29b6bcaffea376cd0876c6b26657ee7e36a9e95a: remove dead responsePool (@thushan)
  • 9c0debcf68e079263872cd76f372695b6a370d0b: reorder Service fields for betteralign (@thushan)
  • 6de613d61a6507bc4c662dfd068de493bbd96a3a: revert x/sync bump, needs go 1.25 (@thushan)
  • ec5d03f6ab419f8d4e30e4a265a6ae0016b8cf59: revert x/time bump, needs go 1.25 (@thushan)
  • da5d8bb66496d1e8a8a6dba0b6acaacd53479ae6: tighten connection error string fallback in retry (@thushan)
  • 01500b78bedf688f907ef413b94e67b937ec8f57: update coderabbit comments (@thushan)
  • 94c1784fd0ffc8f790fe6e75e1dd6e66f86efadf: update readme (@thushan)
  • ce4aaaa6090033a00fa392740497e0252de32ab8: update readme (@thushan)
  • 8442085d123ca5d3a90cf2076c2020e69f024b8b: update version signature to be a bit more robust (@thushan)
  • c2d9af4db3ddbc51bedc4d0992833931b1585f88: use typed context key for model (@thushan)
  • c9f2223ba6d60dad950320525bc6f08bfa73a7f0: wrong template . (@thushan)

Documentation: thushan.github.io/olla | Issues: github.com/thushan/olla/issues

View the full release notes at https://github.com/thushan/olla/releases/tag/v0.0.25.


@github-actions (bot) added the "go" label (Go use is a significant feature of the PR or issue) on Apr 17, 2026
@github-actions (bot) deleted the bump-olla-0.0.25 branch on April 17, 2026 03:44
@github-actions (bot) closed this in c46685c on Apr 17, 2026
Labels

go (Go use is a significant feature of the PR or issue), pr-pull

2 participants