daytonaio · Dowser · May 20, 2026
diff --git a/authors/markus_reimer.md b/authors/markus_reimer.md
@@ -0,0 +1,7 @@
+Author: Markus Reimer
+Title: Software Engineer
+Description: Markus Reimer is a software engineer and open-source contributor focused on pragmatic AI-assisted development, developer workflows, and maintainable automation for engineering teams.
+Author Image: <https://avatars.githubusercontent.com/u/22987960?v=4>
+Author Twitter: <https://twitter.com/markusreimer>
+Company Name: Agilenge AB
+Company Description: Agilenge AB builds pragmatic software and automation for engineering and business teams.
diff --git a/definitions/20260520_definition_model_api_transcription.md b/definitions/20260520_definition_model_api_transcription.md
@@ -0,0 +1,20 @@
+---
+title: "Model API Transcription"
+description: "A speech-to-text workflow that sends prepared audio to a hosted model API and stores the returned transcript."
+date: 2026-05-20
+author: "Markus Reimer"
+---
+
+# Model API Transcription
+
+## Definition
+
+Model API transcription is the process of converting speech to text by sending an audio file to a hosted AI model endpoint and receiving a transcript response.
+
+## Context and Usage
+
+Engineering teams use model API transcription when they want speech-to-text capabilities without operating their own inference infrastructure. A local tool or backend prepares audio, uploads it to a provider, waits for inference, and saves the returned text.
+
+This pattern is useful for demo recordings, product interviews, meeting notes, support calls, accessibility drafts, and transcript archives. Teams still need to manage credentials carefully, check provider limits, and verify transcript quality.
+
+They also need to decide which recordings are appropriate for third-party processing.
diff --git a/guides/20260520_fal_ai_transcription_with_sapat_daytona.md b/guides/20260520_fal_ai_transcription_with_sapat_daytona.md
@@ -0,0 +1,249 @@
+---
+title: "fal.ai Transcription with Sapat"
+description: "Run Sapat with fal.ai Whisper in a Daytona workspace to create a reproducible speech-to-text workflow."
+date: 2026-05-20
+author: "Markus Reimer"
+tags: ["daytona", "sapat", "fal-ai", "transcription", "whisper"]
+---
+
+# fal.ai Transcription with Sapat
+
+## Introduction
+
+AI transcription often starts as a single command on one developer's machine. That works for a quick demo, but it breaks down when the workflow needs to be repeated, reviewed, or handed to another engineer.
+
+Local Python versions differ, `ffmpeg` may be missing, and API credentials can end up in shell history or temporary scripts.
+
+This guide shows how to run Sapat, a small Python video transcription tool, inside a Daytona workspace with a fal.ai Whisper provider. Daytona gives the workflow a clean development environment, while Sapat handles audio conversion, provider selection, and transcript file creation.
+
+The fal.ai provider used in this guide is implemented in the companion Sapat pull request: [nibzard/sapat#30](https://github.com/nibzard/sapat/pull/30). While that PR is under review, use the contributor branch shown below. After it is merged, use the upstream Sapat repository directly.
+
+![fal.ai-backed Sapat workflow in Daytona](assets/20260520_fal_sapat_daytona_workflow.svg)
+
+## TL;DR
+
+- Create a Daytona workspace so the transcription workflow is reproducible.
+- Install Sapat and `ffmpeg` inside the workspace.
+- Configure `FAL_KEY` as an environment variable instead of committing secrets.
+- Run `sapat <recording> --api fal` to transcribe audio through fal.ai Whisper.
+- Transcribe and review a short smoke clip before processing a batch of recordings.
+
+## Materials Checklist
+
+| Item | Why you need it |
+| --- | --- |
+| Daytona installed and configured | Creates a clean workspace for Sapat |
+| Python 3.9 or newer in the workspace | Runs Sapat and the fal.ai client |
+| `ffmpeg` | Converts video inputs to MP3 before transcription |
+| fal.ai API key | Authenticates the hosted Whisper model call |
+| One short `.mp4`, `.mp3`, `.wav`, or `.m4a` sample | Verifies the workflow before a larger run |
+
+## Step 1: Create a Daytona Workspace
+
+Start with a clean workspace instead of depending on local machine state. If the companion Sapat PR is still under review, create the workspace from the contributor fork and switch to the fal.ai provider branch:
+
+```bash
+daytona create https://github.com/Dowser/sapat --code
+```
+
+Inside the workspace terminal, check out the provider branch:
+
+```bash
+git fetch origin codex/fal-transcription-provider
+git checkout codex/fal-transcription-provider
+```
+
+After [nibzard/sapat#30](https://github.com/nibzard/sapat/pull/30) is merged, use the upstream project instead:
+
+```bash
+daytona create https://github.com/nibzard/sapat --code
+```
+
+This keeps the checkout, dependency metadata, and CLI surface consistent for everyone who needs to reproduce the transcription workflow.
+
+## Step 2: Install Runtime Dependencies
+
+Sapat converts video files to MP3 before passing audio to a transcription provider. Install `ffmpeg` in the workspace:
+
+```bash
+sudo apt-get update
+sudo apt-get install -y ffmpeg
+```
+
+Install Sapat from the checked-out project:
+
+```bash
+python -m pip install -e .
+```
+
+The fal.ai provider adds the `fal-client` Python package to Sapat's project dependencies. The client handles authentication through `FAL_KEY`, uploads local audio to fal.ai storage, and calls the hosted model with `fal_client.subscribe`.
+
+Confirm that the CLI exposes the fal.ai provider:
+
+```bash
+sapat --help
+```
+
+The `--api` option should include `fal`:
+
+```text
+--api [openai|groq|azure|fal]
+```
+
+## Step 3: Configure fal.ai Credentials
+
+Create an API key in the fal.ai dashboard and expose it to the Daytona workspace as an [environment variable](../definitions/20241126_definition_environment_variables.md):
+
+```bash
+export FAL_KEY="your-fal-api-key"
+```
+
+You can also configure the provider with optional variables:
+
+```bash
+export FAL_MODEL_ID="fal-ai/whisper"
+export FAL_TASK="transcribe"
+export FAL_CHUNK_LEVEL="segment"
+export FAL_BATCH_SIZE="64"
+```
+
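+Keep real keys out of source code, `.env` files committed to Git, PR descriptions, issue comments, and terminal logs. An `export` command typed with the literal key is typically saved to shell history as well, so prefer your team's secret-management workflow or a silent prompt such as `read -rs FAL_KEY` followed by `export FAL_KEY`. The fal.ai client reads `FAL_KEY` automatically, so there is no reason to hard-code it in the Sapat project.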
+
+## Step 4: Prepare a Smoke Clip
+
+Before processing a meeting archive or a directory of product demos, use a short recording with content you can verify by ear. A thirty-second clip is enough to test the full path:
+
+- The workspace can run Sapat.
+- `ffmpeg` can read and convert the file.
+- Sapat routes the request to the fal.ai provider.
+- fal.ai returns transcript text.
+- Sapat writes the `.txt` file next to the input.
+
+Create a small recordings directory:
+
+```bash
+mkdir -p recordings
+cp ~/Downloads/demo-call.mp4 recordings/demo-call.mp4
+```
+
+Sapat writes the transcript beside the input file:
+
+```text
+recordings/
+  demo-call.mp4
+  demo-call.txt
+```
+
+If your source file is already audio, Sapat can still process it after conversion. The fal.ai Whisper API supports common formats such as MP3, MP4, MPEG, MPGA, M4A, WAV, and WebM.
+
+## Step 5: Run Sapat with fal.ai Whisper
+
+Run the smoke clip through the fal.ai provider:
+
+```bash
+sapat recordings/demo-call.mp4 --api fal --quality M --language en --prompt "Daytona, Sapat, fal.ai"
+```
+
+Under the hood, the provider does three things:
+
+1. Validates that `FAL_KEY` is present and that the local audio file is a supported format.
+2. Uploads the prepared audio file through `fal_client.upload_file`.
+3. Calls `fal_client.subscribe("fal-ai/whisper", arguments={...})` with `audio_url`, `task`, `chunk_level`, `batch_size`, and optional prompt/language settings.
+
+The fal.ai Whisper model returns a JSON response with a `text` field and optional chunk metadata. Sapat's existing base writer stores the `text` value in `recordings/demo-call.txt`.
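+
+The three steps above can be sketched in Python roughly as follows. This is an illustrative sketch, not the code from the companion PR: the helper names `build_whisper_arguments` and `transcribe`, the fallback values for the optional variables, and the suffix set are assumptions made for this example. Only `fal_client.upload_file`, `fal_client.subscribe`, the argument names, and the `text` field come from the description above.
+
+```python
+import os
+from pathlib import Path
+
+# Assumption: suffixes derived from the formats this guide lists for fal.ai Whisper.
+SUPPORTED_SUFFIXES = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
+
+
+def build_whisper_arguments(audio_url, env, language=None, prompt=None):
+    """Build the request arguments (hypothetical helper, no network access)."""
+    # Assumption: the optional variables fall back to the values shown in Step 3.
+    batch_size = int(env.get("FAL_BATCH_SIZE", "64"))
+    if not 1 <= batch_size <= 64:
+        raise ValueError("FAL_BATCH_SIZE must be between 1 and 64")
+    arguments = {
+        "audio_url": audio_url,
+        "task": env.get("FAL_TASK", "transcribe"),
+        "chunk_level": env.get("FAL_CHUNK_LEVEL", "segment"),
+        "batch_size": batch_size,
+    }
+    # Language and prompt are optional, so they are added only when provided.
+    if language:
+        arguments["language"] = language
+    if prompt:
+        arguments["prompt"] = prompt
+    return arguments
+
+
+def transcribe(path, language=None, prompt=None, env=None):
+    """Validate, upload, and transcribe one file (hypothetical helper)."""
+    env = os.environ if env is None else env
+    # Step 1: validate the credential and the file type before any upload.
+    if not env.get("FAL_KEY"):
+        raise RuntimeError("FAL_KEY is required for fal.ai transcription.")
+    if Path(path).suffix.lower() not in SUPPORTED_SUFFIXES:
+        raise ValueError(f"Unsupported audio format: {path}")
+    # Imported lazily so the validation above runs without fal-client installed.
+    # The client itself reads FAL_KEY from the process environment.
+    import fal_client
+
+    # Step 2: upload the prepared audio and get a URL the model can read.
+    audio_url = fal_client.upload_file(path)
+    # Step 3: call the hosted model and wait for the result.
+    result = fal_client.subscribe(
+        env.get("FAL_MODEL_ID", "fal-ai/whisper"),
+        arguments=build_whisper_arguments(audio_url, env, language, prompt),
+    )
+    return result["text"]
+```
+
+Keeping the argument construction in its own function means the validation and defaults can be checked without a network call or a real API key.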
+
+Review the first transcript:
+
+```bash
+sed -n '1,80p' recordings/demo-call.txt
+```
+
+Check the details that usually decide whether a transcript is usable:
+
+- Product names and acronyms are recognizable.
+- The first and last spoken sections are present.
+- The transcript is not empty and does not consist of an error message.
+- The output language matches the recording.
+- Domain-specific terms supplied in the prompt are spelled consistently.
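+
+Part of this checklist can be automated as a first pass. The sketch below assumes a hypothetical helper named `check_transcript`, which is not part of Sapat. It only flags an empty transcript and missing prompt terms, so a reviewer still needs to listen to the recording for everything else.
+
+```python
+def check_transcript(text, expected_terms=()):
+    """Return a list of problems found in a transcript (hypothetical helper)."""
+    stripped = text.strip()
+    if not stripped:
+        # Nothing else can be checked when the transcript has no content.
+        return ["transcript is empty"]
+    problems = []
+    lowered = stripped.lower()
+    for term in expected_terms:
+        # Case-insensitive match, so "daytona" and "Daytona" both count.
+        if term.lower() not in lowered:
+            problems.append(f"expected term not found: {term}")
+    return problems
+```
+
+For example, pass the contents of `recordings/demo-call.txt` together with the terms from the `--prompt` option. An empty list means these automated checks found nothing, not that the transcript is accurate.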
+
+## Step 6: Process a Small Batch
+
+After the smoke test passes, process a directory of `.mp4` recordings:
+
+```bash
+sapat recordings --api fal --quality M --language en
+```
+
+Start with a small batch before sending a large set of recordings. Provider quotas, source audio quality, and model behavior are easier to debug with three files than with fifty.
+
+A simple review loop helps you catch obvious failures quickly:
+
+```bash
+for transcript in recordings/*.txt; do
+  echo "===== $transcript ====="
+  sed -n '1,40p' "$transcript"
+done
+```
+
+For production use, keep a manifest next to the outputs:
+
+```markdown
+# demo-call
+
+- Source file: recordings/demo-call.mp4
+- Provider: fal.ai
+- Model: fal-ai/whisper
+- Task: transcribe
+- Chunk level: segment
+- Reviewed by: <name>
+- Known issues: "AcmeDB" appears once as "Acme DB"
+```
+
+That small note turns a raw transcript into an engineering artifact. A teammate can see which [model API transcription](../definitions/20260520_definition_model_api_transcription.md) workflow created it and what still needs review.
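+
+If manifests are written by hand, they tend to drift in format. The following sketch shows one way to generate the same fields from code. The function name `render_manifest` and its default values are assumptions for illustration and are not provided by Sapat.
+
+```python
+from pathlib import Path
+
+
+def render_manifest(source_file, reviewer, known_issues="none",
+                    model="fal-ai/whisper", task="transcribe",
+                    chunk_level="segment"):
+    """Return the manifest text for one recording (hypothetical helper)."""
+    # The heading reuses the recording's file name without its extension.
+    title = Path(source_file).stem
+    lines = [
+        f"# {title}",
+        "",
+        f"- Source file: {source_file}",
+        "- Provider: fal.ai",
+        f"- Model: {model}",
+        f"- Task: {task}",
+        f"- Chunk level: {chunk_level}",
+        f"- Reviewed by: {reviewer}",
+        f"- Known issues: {known_issues}",
+    ]
+    return "\n".join(lines) + "\n"
+```
+
+The returned text can be written beside the transcript with `Path.write_text`. The manifest file name is left to the team's own convention.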
+
+## Step 7: Decide Where Correction Belongs
+
+The fal.ai provider in the companion Sapat PR focuses on transcription only. It does not add a second chat-based correction path.
+
+That separation is intentional. Transcription and correction have different failure modes. The first pass should answer, "Did we capture the spoken words?" The second pass should answer, "Did we normalize names, acronyms, punctuation, and formatting correctly?"
+
+If your team needs correction, run it as a separate review step with an approved model and a clear prompt. Keep the original transcript so reviewers can compare the corrected version against the provider output.
+
+## Common Issues and Troubleshooting
+
+**Problem:** `FAL_KEY is required for fal.ai transcription.`
+
+**Solution:** Export `FAL_KEY` in the workspace shell or set it through your team's secret-management workflow. Do not paste the key into PR comments or issue threads.
+
+**Problem:** `ffmpeg` is missing.
+
+**Solution:** Install it with `sudo apt-get install -y ffmpeg`. For repeated use, add `ffmpeg` to the workspace image or dev container configuration.
+
+**Problem:** The transcript file is empty.
+
+**Solution:** Confirm that the source recording has an audio stream. Run `ffprobe recordings/demo-call.mp4` and try a shorter sample before processing a full batch.
+
+**Problem:** fal.ai rejects the request.
+
+**Solution:** Check that `FAL_KEY` is valid, that `FAL_MODEL_ID` is either unset or set to `fal-ai/whisper`, and that `FAL_BATCH_SIZE` is an integer between 1 and 64.
+
+**Problem:** Domain terms are spelled inconsistently.
+
+**Solution:** Add a short prompt with product names, team names, and acronyms. Then run a separate human or model-assisted correction pass before downstream use.
+
+## Conclusion
+
+Running Sapat with fal.ai Whisper inside Daytona gives engineering teams a repeatable speech-to-text workflow instead of a one-off local command. Daytona keeps the environment clean, Sapat handles file conversion and provider routing, and fal.ai hosts the Whisper model behind a simple model API.
+
+The practical habit is to keep the workflow auditable: isolate credentials, start with a short smoke clip, inspect outputs before a batch run, and package transcripts with provider metadata.
+
+That makes the transcript useful beyond the first command that generated it.
+
+## References
+
+- [Sapat repository](https://github.com/nibzard/sapat)
+- [Companion Sapat fal.ai provider PR](https://github.com/nibzard/sapat/pull/30)
+- [fal.ai Whisper API reference](https://fal.ai/docs/model-api-reference/audio-api/whisper)
+- [fal.ai client setup](https://fal.ai/docs/model-apis/client)
+- [fal.ai authentication guide](https://fal.ai/docs/model-apis/authentication)
+- [Daytona documentation](https://www.daytona.io/docs/)
diff --git a/guides/assets/20260520_fal_sapat_daytona_workflow.svg b/guides/assets/20260520_fal_sapat_daytona_workflow.svg