fix: route around stuck flux-schnell, bound latency, tame hands #73

Merged
meninoebom merged 1 commit into main from feldspar-jojoba on May 13, 2026
@meninoebom
Owner

Summary

  • Bug: Replicate's canonical black-forest-labs/flux-schnell model has been accepting predictions but never scheduling them. Because replicate.run was called with no timeout, the FastAPI request hung indefinitely: the browser spinner stayed on "Generating..." and no error ever surfaced. Confirmed by inspecting prod predictions: six were stuck in the starting state with no started_at (the oldest more than 3 hours old), while the sibling flux-schnell-lora succeeded on the same account in 0.6s.
  • Fix: Switched to flux-schnell-lora (same Flux Schnell weights, different worker pool). It OOMs at num_outputs>1, so we now fan out one single-output prediction per candidate, in parallel, each with a 60s hard timeout. Any single failure now costs one of four candidates, never the whole batch, and a fully degraded upstream errors out in 60s instead of hanging forever.
  • Hands: Tightened the style suffix to require calm, static hands (at the sides, folded, or holding a single object; never reaching, gesturing, or pointing). Flux ignores negative prompts but responds well to positive anatomical framing.
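
The fan-out described in the Fix bullet could look roughly like the sketch below. It is not the code from this PR: `generate_candidates` and the injected `predict_one` callable are hypothetical names, and `predict_one` stands in for whatever call creates one single-output prediction on flux-schnell-lora. Only the 60s per-call timeout and the four candidates come from the description above.

```python
import asyncio
from typing import Awaitable, Callable

PER_CALL_TIMEOUT_S = 60.0  # from the PR: 60s hard timeout per prediction
NUM_CANDIDATES = 4         # from the PR: four candidates per batch


async def generate_candidates(
    prompt: str,
    predict_one: Callable[[str], Awaitable[str]],  # hypothetical: returns one image URL
    n: int = NUM_CANDIDATES,
    timeout_s: float = PER_CALL_TIMEOUT_S,
) -> list[str]:
    """Run n single-output predictions in parallel, each bounded by timeout_s."""

    async def bounded() -> str:
        # wait_for cancels the inner call and raises TimeoutError once timeout_s elapses.
        return await asyncio.wait_for(predict_one(prompt), timeout=timeout_s)

    # return_exceptions=True keeps one failed call from discarding the others.
    results = await asyncio.gather(*(bounded() for _ in range(n)), return_exceptions=True)
    urls = [r for r in results if isinstance(r, str)]
    if not urls:
        # Fully degraded upstream: fail after at most timeout_s instead of hanging.
        raise RuntimeError(f"all {n} predictions failed or timed out")
    return urls
```

With this shape, a stuck or failed call removes only its own candidate, and the caller still gets an error within the timeout when every call fails.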

Test plan

  • uv run pytest tests/test_images.py — 38/38 pass (added partial-failure test and timeout test)
  • uv run pytest — full suite 177/177 pass
  • After deploy: trigger Generate on a theme, confirm 4 candidates appear in <10s
  • After deploy: regenerate a theme that previously rendered weird hands, eyeball anatomical improvement
  • After deploy: cancel the 6 stuck Replicate predictions (separate cleanup, not blocking)
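
For readers without access to tests/test_images.py, the partial-failure and timeout tests mentioned above might be shaped something like this. The sketch is an assumption, not the repository's tests: `fan_out` is a hypothetical stand-in for the code under test, and the fake `predict_one` coroutines replace real Replicate calls so nothing touches the network.

```python
import asyncio


async def fan_out(predict_one, n=4, timeout_s=60.0):
    """Hypothetical stand-in for the code under test: n bounded single-output calls."""
    calls = (asyncio.wait_for(predict_one(i), timeout=timeout_s) for i in range(n))
    results = await asyncio.gather(*calls, return_exceptions=True)
    # Keep successful results; failures and timeouts come back as exception objects.
    return [r for r in results if not isinstance(r, BaseException)]


def test_partial_failure_keeps_other_candidates():
    async def predict_one(i):
        if i == 1:
            raise RuntimeError("simulated upstream failure")
        return f"https://example.invalid/{i}.png"

    urls = asyncio.run(fan_out(predict_one))
    assert len(urls) == 3  # one failure costs one candidate, not the batch


def test_stuck_prediction_times_out():
    async def predict_one(i):
        if i == 0:
            await asyncio.sleep(3600)  # simulates a prediction stuck in "starting"
        return f"https://example.invalid/{i}.png"

    # A tiny timeout keeps the test fast; production would use the 60s bound.
    urls = asyncio.run(fan_out(predict_one, timeout_s=0.05))
    assert len(urls) == 3
```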

🤖 Generated with Claude Code

@railway-app

railway-app Bot commented May 13, 2026

🚅 Deployed to the breadcrumbs-pr-73 environment in Breadcrumbs

| Service | Status | Web | Updated (UTC) |
| --- | --- | --- | --- |
| Breadcrumbs Web App Server | 🕒 Building (View Logs) | Web | May 13, 2026 at 1:44 am |

@railway-app railway-app Bot temporarily deployed to Breadcrumbs / breadcrumbs-pr-73 May 13, 2026 01:30 Destroyed
…hands

Replicate's canonical `black-forest-labs/flux-schnell` model has been accepting
predictions but never scheduling them, leaving generate-image requests hanging
indefinitely (no timeout on replicate.run, so uvicorn workers stalled and the
browser spun forever). The sibling `flux-schnell-lora` runs the same weights on
a different pool and works — but OOMs at num_outputs>1, so we now fan out one
single-output prediction per candidate, in parallel, with a 60s per-call timeout.
Any single failure now costs one of four candidates instead of the whole batch.

Also tightened the style suffix: hands must rest calmly, be folded, or hold a
single object — never reaching, gesturing, or pointing. Flux ignores negative
prompts but responds well to positive anatomical framing.
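
As an illustration of the positive framing described above, a style suffix could be assembled along these lines. The wording of `HANDS_CLAUSE` and the `build_prompt` helper are hypothetical; the actual suffix text is not shown in this PR.

```python
# Hypothetical wording: states what the hands should be doing,
# rather than relying on a negative prompt, which Flux ignores.
HANDS_CLAUSE = "hands resting calmly at the sides, folded, or holding a single object"


def build_prompt(theme: str, style: str) -> str:
    """Append the style description and the hands clause to the theme text."""
    return f"{theme}, {style}, {HANDS_CLAUSE}"
```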

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to Breadcrumbs / breadcrumbs-pr-73 May 13, 2026 01:44 Destroyed
@meninoebom meninoebom merged commit eda40fd into main May 13, 2026
2 of 3 checks passed
@meninoebom meninoebom deleted the feldspar-jojoba branch May 13, 2026 01:44