An async HTTP proxy that sits between OpenAI-compatible clients (agents, Slack
gateways, chat UIs) and local LLM serving backends (MLX, vLLM, llama.cpp-style
servers). Its primary job is to translate the text-format tool calls found in
raw model output into the structured `tool_calls` shape that OpenAI clients
expect — no more brittle regex hacks on the client side.
Many locally-served models emit tool calls as plain text (`<tool_call>` tags,
`TOOL_CALL:` JSON lines, or XML-style blocks) because their chat templates
weren't trained against the OpenAI `tool_calls` JSON schema. Agents and
gateways built against the OpenAI API don't know how to parse that. This proxy
does the translation transparently, so you can point any OpenAI-compatible
client at a local model and have tool-calling Just Work.
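For a concrete before/after: a model might emit the tag-wrapped text below, and the proxy re-emits it to the client as a standard OpenAI streaming delta (the chunk shown here is simplified, and the `id` is illustrative):

```
<tool_call>
{"name": "search", "arguments": {"query": "weather"}}
</tool_call>
```

becomes

```
data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_0","type":"function","function":{"name":"search","arguments":"{\"query\": \"weather\"}"}}]}}]}
```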
```
OpenAI-compatible client
          │
          ▼
  blockops-proxy :8080
          │
          ▼
 local LLM backend :8082
 (MLX / vLLM / llama.cpp / ...)
```
The proxy terminates the client's HTTP/SSE connection, forwards the request
upstream, parses the streamed output, rewrites tool-call text into structured
`tool_calls` chunks, and relays everything back to the client.
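A minimal sketch of that relay loop, using `httpx` for illustration (the helper names here are hypothetical, not the proxy's actual internals):

```python
import httpx

BACKEND_URL = "http://127.0.0.1:8082"  # upstream LLM server

def rewrite_line(line: str) -> str:
    # Placeholder: the real proxy parses each SSE data line, strips tool-call
    # text from the content delta, and emits structured tool_calls deltas.
    return line

async def relay_chat(request_body: dict):
    """Forward a streaming chat request upstream, yield rewritten SSE lines."""
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST", f"{BACKEND_URL}/v1/chat/completions", json=request_body
        ) as upstream:
            async for line in upstream.aiter_lines():
                # The real proxy also injects SSE comment heartbeats
                # (lines starting with ":") while the backend is quiet.
                yield rewrite_line(line)
```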
- Tool-call text-format translation — parses 3 formats and emits OpenAI `tool_calls`
- SSE streaming with heartbeats — keeps client sockets alive during tool-call latency and cold starts
- Concurrency gating — a `MAX_CONCURRENT` cap prevents backend saturation on single-GPU hosts (see the sketch after this list)
- Context truncation — soft/hard token thresholds drop middle-of-history turns before the backend OOMs
- Memory-pressure warnings — injects visible notices when system RAM gets tight (uses `psutil`)
- KV cache purge on session reset — hits backend `DELETE /v1/cache` when the client signals `/new`
- Traffic logging — every request/response to a configurable log file
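A plausible shape for the concurrency gate and the cache purge, assuming an `asyncio.Semaphore` sized by `MAX_CONCURRENT` (names are illustrative, not the proxy's actual code):

```python
import asyncio
import httpx

BACKEND_URL = "http://127.0.0.1:8082"
MAX_CONCURRENT = 3  # default, per the config table below

gate = asyncio.Semaphore(MAX_CONCURRENT)

async def forward_to_backend(request):
    ...  # proxy the request upstream (see the relay sketch above)

async def gated_handler(request):
    # Requests beyond MAX_CONCURRENT wait here instead of piling onto
    # a single-GPU backend; a variant could reject instead of queueing.
    async with gate:
        return await forward_to_backend(request)

async def purge_kv_cache():
    # On a client "/new" signal, ask the backend to drop its KV cache.
    async with httpx.AsyncClient() as client:
        await client.delete(f"{BACKEND_URL}/v1/cache")
```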
1. `<tool_call>` tags with JSON body (Qwen-Agent style)

   ```
   <tool_call>
   {"name": "search", "arguments": {"query": "weather"}}
   </tool_call>
   ```

2. `TOOL_CALL:` line-prefix JSON

   ```
   TOOL_CALL: {"name": "search", "arguments": {"query": "weather"}}
   ```

3. XML-style function/parameter blocks

   ```
   <tool_call><function=search><parameter=query>weather</parameter></function></tool_call>
   ```
All three are detected, extracted, stripped from the visible content, and
re-emitted as OpenAI `tool_calls` deltas on the SSE stream.
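Illustratively, the detection and extraction can be approximated with patterns like these (a sketch, not the proxy's exact parser; production handling would also need error recovery for malformed JSON):

```python
import json
import re

TAG_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
PREFIX_RE = re.compile(r"^TOOL_CALL:\s*(\{.*\})\s*$", re.MULTILINE)
XML_RE = re.compile(
    r"<tool_call><function=([\w.-]+)>(.*?)</function></tool_call>", re.DOTALL
)
PARAM_RE = re.compile(r"<parameter=([\w.-]+)>(.*?)</parameter>", re.DOTALL)

def extract_tool_calls(text: str):
    """Return (calls, visible): parsed {"name", "arguments"} dicts plus the
    content with all tool-call markup stripped out."""
    calls = []
    for match in TAG_RE.finditer(text):      # format 1: <tool_call> + JSON
        calls.append(json.loads(match.group(1)))
    for match in PREFIX_RE.finditer(text):   # format 2: TOOL_CALL: prefix
        calls.append(json.loads(match.group(1)))
    for match in XML_RE.finditer(text):      # format 3: XML-style blocks
        args = dict(PARAM_RE.findall(match.group(2)))
        calls.append({"name": match.group(1), "arguments": args})
    visible = text
    for pattern in (TAG_RE, PREFIX_RE, XML_RE):
        visible = pattern.sub("", visible)
    return calls, visible.strip()
```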
Configuration is via environment variables (values marked "edit in source" are currently constants tuned by editing the script):
| Variable | Default | Description |
|---|---|---|
| `BACKEND_URL` | `http://127.0.0.1:8082` | Upstream LLM server URL |
| `PORT` | `8080` | Port this proxy listens on |
| `MODEL_NAME` | (empty) | Optional display name shown in session banner |
| `BLOCKOPS_LOG_FILE` | `~/.blockops/proxy.log` | Traffic log destination |
| `MAX_CONCURRENT` | `3` | Max concurrent in-flight requests per backend (edit in source) |
| `TOKEN_WARN_THRESHOLD` | `80000` | Soft context warning threshold (edit in source) |
| `TOKEN_HARD_THRESHOLD` | `100000` | Hard truncation trigger (edit in source) |
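A rough sketch of the soft/hard truncation logic, assuming a caller-supplied token counter (the proxy's actual accounting and turn selection may differ):

```python
TOKEN_WARN_THRESHOLD = 80_000    # soft: warn the client, keep everything
TOKEN_HARD_THRESHOLD = 100_000   # hard: start dropping history

def truncate_history(messages, count_tokens):
    """Drop middle-of-history turns until under the hard threshold,
    keeping the system prompt and the most recent turns intact."""
    while count_tokens(messages) > TOKEN_HARD_THRESHOLD and len(messages) > 3:
        # index 0 is typically the system prompt; drop the oldest
        # non-system turn so recent context survives
        del messages[1]
    return messages
```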
```bash
git clone https://github.com/trevorgordon981/blockops-proxy.git
cd blockops-proxy
pip install -r requirements.txt

# point the proxy at your local backend and run it
export BACKEND_URL=http://127.0.0.1:8082
export PORT=8080
python3 blockops-proxy.py
```

Then point any OpenAI-compatible client at `http://127.0.0.1:8080/v1`:
```bash
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"local","messages":[{"role":"user","content":"hi"}],"stream":true}'
```
Part of a self-hosted LLM operations toolkit:

- llm-otel-proxy — natural companion layer: put it in front of this proxy for token/cost/latency telemetry
- context-bench — characterize the backend this proxy fronts before tuning `MAX_CONCURRENT` / `TOKEN_HARD_THRESHOLD`
- alfred-infra — infrastructure monitoring that visualizes this proxy's metrics and alerts on concurrency rejections
- alfred-rag — RAG stack that serves OpenAI-compatible clients through this proxy
MIT — see LICENSE.