langchain-failover

A tiny, dependency-light primary/secondary failover wrapper for LangChain chat models. Point it at two chat models: it serves from the primary, transparently falls back to the secondary on connection errors, and switches back once the primary recovers. Tool calling keeps working across the failover.

Background: "SOC-in-a-Box: One LLM, Eight Hats", the production AI SOC this package was extracted from, where it fails a local LLM over to a backup mid-incident.

from langchain_openai import ChatOpenAI
from langchain_failover import FailoverChatModel

primary = ChatOpenAI(base_url="http://gpu-box:8001/v1", api_key="x", model="local")
backup  = ChatOpenAI(base_url="http://cpu-box:8002/v1", api_key="x", model="local")

llm = FailoverChatModel(primary=primary, secondary=backup)

llm.invoke("Summarise this incident…")   # served by primary
# …primary host dies…
llm.invoke("And the next one?")           # transparently served by backup
# …primary comes back…
llm.invoke("One more")                     # back on primary, logged as recovered

Install

pip install langchain-failover            # core
pip install "langchain-failover[openai]"  # + langchain-openai for create_failover_llm

Why not `RunnableWithFallbacks` / `.with_fallbacks()`?

LangChain ships per-invocation fallbacks, and they're great for what they do. This package exists for the cases they don't cover well:

Stateful recovery. FailoverChatModel remembers which leg it's on and logs the transition both ways (the active property reports the current leg). .with_fallbacks() is stateless: every call retries the (possibly still-dead) primary first.
Tool-calling survives failover. bind_tools is overridden to bind on both legs and return another FailoverChatModel. With strict langchain-core (>=1.4, where BaseChatModel.bind_tools raises by default) naïve wrappers break at bind time; agents using this one keep working.
Connection-aware, not blanket. It only fails over on connection/network errors (walking the exception's __cause__/__context__ chain, so a socket error wrapped three layers deep still counts). A ValueError from a bad prompt propagates instead of being silently retried on a second endpoint.
Mid-stream safety. During stream(), it only fails over if the primary dies before the first token — so you never get duplicated, half-streamed output.
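
To make the last two points concrete, here is a minimal sketch of the general technique, not the package's actual code. The names is_connection_error, stream_with_failover and CONNECTION_ERRORS are hypothetical, and the set of exception types treated as connection errors is an assumption; the real package may check different types.

```python
from typing import Callable, Iterator, Optional

# Assumption: the real package may recognise a different set of exception types.
CONNECTION_ERRORS = (ConnectionError, TimeoutError)


def is_connection_error(exc: Optional[BaseException]) -> bool:
    """True if exc, or anything in its __cause__/__context__ chain, is a connection error."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        if isinstance(exc, CONNECTION_ERRORS):
            return True
        seen.add(id(exc))  # guard against cycles in the chain
        # Prefer the explicit cause (raise ... from ...), else the implicit context.
        exc = exc.__cause__ or exc.__context__
    return False


def stream_with_failover(
    primary: Callable[[], Iterator[str]],
    secondary: Callable[[], Iterator[str]],
) -> Iterator[str]:
    """Stream from primary; switch to secondary only if primary fails before its first chunk."""
    started = False
    try:
        for chunk in primary():
            started = True
            yield chunk
        return
    except Exception as exc:
        # Once output has started, or for non-connection errors, propagate.
        if started or not is_connection_error(exc):
            raise
    yield from secondary()
```

With this shape, a ConnectionRefusedError wrapped inside two other exceptions still triggers failover, a plain ValueError propagates, and a primary that dies after emitting a chunk raises instead of restarting the answer on the backup.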

Local-model convenience

If you run local OpenAI-compatible servers (vLLM, mlx-lm, Ollama, LM Studio) and don't want to hardcode model names, create_failover_llm auto-discovers the served model id from each endpoint's /models:

from langchain_failover import create_failover_llm

llm = create_failover_llm(
    primary_url="http://localhost:8001/v1",
    secondary_url="http://localhost:8002/v1",
)
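
For readers curious what the discovery step involves, the sketch below shows one way to read a model id from an OpenAI-compatible /models endpoint, which returns a JSON object with a data list of entries that each carry an id. The function names pick_model_id and discover_model_id are hypothetical, and taking the first listed model is an assumption; create_failover_llm may choose differently.

```python
import json
from urllib.request import urlopen


def pick_model_id(models_payload: dict) -> str:
    """Return the first model id from an OpenAI-style /models response body."""
    data = models_payload.get("data") or []
    if not data:
        raise ValueError("endpoint reported no served models")
    # Assumption: the first entry is the model to use.
    return data[0]["id"]


def discover_model_id(base_url: str, timeout: float = 5.0) -> str:
    """Fetch {base_url}/models and return the first served model id."""
    url = f"{base_url.rstrip('/')}/models"
    with urlopen(url, timeout=timeout) as resp:
        return pick_model_id(json.load(resp))
```

Keeping the parsing separate from the HTTP call means the selection logic can be checked without a running server.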

Bonus helper

extract_token_metrics(response.response_metadata) normalises token counts and timings across OpenAI-compatible and Ollama metadata shapes into a single {input_tokens, output_tokens, prompt_time, generation_time} dict.
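
As an illustration of the kind of normalisation involved, here is a hedged sketch, not the package's implementation. It assumes the OpenAI-compatible shape carries a token_usage dict with prompt_tokens and completion_tokens, and that the Ollama shape carries prompt_eval_count, eval_count and nanosecond durations. The function name normalise_token_metrics, the choice of seconds as the time unit, and returning None for missing timings are all assumptions.

```python
from typing import Any, Dict, Optional


def _ns_to_seconds(value: Optional[float]) -> Optional[float]:
    """Convert nanoseconds to seconds, passing None through."""
    return value / 1e9 if value is not None else None


def normalise_token_metrics(meta: Dict[str, Any]) -> Dict[str, Optional[float]]:
    """Map either metadata shape onto one dict (hypothetical stand-in for the helper)."""
    usage = meta.get("token_usage")
    if usage is not None:
        # OpenAI-compatible shape: token counts only, no timings assumed.
        return {
            "input_tokens": usage.get("prompt_tokens"),
            "output_tokens": usage.get("completion_tokens"),
            "prompt_time": None,
            "generation_time": None,
        }
    # Ollama shape: counts plus durations reported in nanoseconds.
    return {
        "input_tokens": meta.get("prompt_eval_count"),
        "output_tokens": meta.get("eval_count"),
        "prompt_time": _ns_to_seconds(meta.get("prompt_eval_duration")),
        "generation_time": _ns_to_seconds(meta.get("eval_duration")),
    }
```

Either way the caller gets the same four keys, so downstream logging does not need to know which backend served the request.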

License

MIT
