Daily Recorder POC

Headless bot that joins a Daily.co room, records all participants, and produces a composed MP4 with dynamic layout switching — without using Daily's cloud recording API.

Files

| File | Description |
| --- | --- |
| `bot_recorder.py` | Approach 1: PNG-based recorder. Saves each video frame as a PNG file during recording, then encodes to MP4 in post-processing via the ffmpeg concat demuxer. High disk usage (~2 GB per participant for 8 min), slower composition. |
| `bot_recorder_v2.py` | Approach 2: Direct MP4 encoder. Encodes video frames directly to MP4 in real time via PyAV. No intermediate PNG files. Low disk usage (~10–15 MB per participant), faster composition. Recommended. |
| `compose_video.py` | Standalone composition script. Use it to retry or recompose from saved session files without re-running the bot. Works with output from both approaches. |
| `create_room.py` | Creates a Daily room and prints the URL. |

Approach 1 — PNG-based (`bot_recorder.py`)

How it works

During recording, each RGBA video frame is saved as a PNG file named by its capture timestamp in microseconds (000000033411.png, etc.). After the session ends:

1. Per-participant PNGs → MP4 via ffmpeg concat demuxer (exact variable framerate from timestamps)
2. Per-participant WAV files written from raw PCM
3. Audio mixed with per-track silence padding for timeline alignment
4. Timeline windows computed from join/leave timestamps
5. Segments encoded (one per layout window) via ffmpeg -ss/-t
6. Segments concatenated into final MP4 with mixed audio
7. Uploaded to S3
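
The first post-processing step (PNGs to MP4 via the concat demuxer) can be sketched as follows. This is a minimal illustration, not the code in `bot_recorder.py`: the helper name `build_concat_list` is hypothetical, and it assumes every frame file is named by its capture timestamp in microseconds, as described above. Each frame's display duration is the gap to the next frame's timestamp.

```python
from pathlib import Path


def build_concat_list(frame_paths):
    """Return ffmpeg concat-demuxer text for PNGs named by capture time (microseconds)."""
    # Sort numerically by the timestamp encoded in the file name
    frames = sorted(frame_paths, key=lambda p: int(Path(p).stem))
    lines = []
    for current, following in zip(frames, frames[1:]):
        # A frame stays on screen until the next frame was captured
        gap_us = int(Path(following).stem) - int(Path(current).stem)
        lines.append(f"file '{current}'")
        lines.append(f"duration {gap_us / 1_000_000:.6f}")
    if frames:
        # The last frame has no successor, so it is listed without a duration
        lines.append(f"file '{frames[-1]}'")
    return "\n".join(lines) + "\n"


print(build_concat_list(["000000066822.png", "000000033411.png", "000000100000.png"]))
```

The resulting text would be written to a list file and passed to ffmpeg with `-f concat -i <list file>`; the remaining encoder options depend on the bot's settings and are not shown here.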

Disk usage

~2.2 GB per participant for an 8-minute session
~4.5 GB total for a 2-participant session

PNG files are kept in recordings/<session_id>/<name>_frames/ and are never deleted.

Run

# Create a room
python create_room.py

# Start the bot
python bot_recorder.py --room-url "https://yourapp.daily.co/<room-name>"

Approach 2 — Direct MP4 encoder (`bot_recorder_v2.py`) ✅ Recommended

How it works

Frames are encoded directly into an H.264 MP4 container in real time using PyAV as they arrive from Daily's on_video_frame callback. No intermediate files are written during recording.

Timebase is 1/90000 (standard H.264/MP4). Each frame's pts is:

pts = int(elapsed_us * 90000 / 1_000_000)
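
As a worked example of the formula above, the sketch below converts a few elapsed times into 90 kHz ticks. The function name `pts_from_elapsed_us` is hypothetical, and it uses integer floor division instead of `int(... / ...)`; for non-negative inputs the two agree, and the integer form avoids floating-point rounding on long recordings.

```python
TICKS_PER_SECOND = 90_000  # the 1/90000 timebase stated above


def pts_from_elapsed_us(elapsed_us: int) -> int:
    """Convert microseconds since the first frame into 90 kHz ticks (truncating)."""
    return elapsed_us * TICKS_PER_SECOND // 1_000_000


# Frames arriving roughly every 33 ms, plus the one-second mark
for elapsed_us in (0, 33_411, 66_822, 1_000_000):
    print(elapsed_us, "->", pts_from_elapsed_us(elapsed_us))
```

A frame captured 33,411 µs after the first one gets pts 3006, and one captured exactly one second in gets pts 90000.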

After close_video(), the raw MP4 is re-encoded to constant frame rate (CFR) at 30 fps via normalize_video(). This step is required because the real-time PyAV output is a variable-frame-rate (VFR) stream (r_frame_rate=90000/1), which confuses ffmpeg's seeking and causes frozen frames in composition. After normalization, all downstream ffmpeg operations work correctly.

The post-processing pipeline is the same as in Approach 1 from Step 2 onwards.

Disk usage

~10–15 MB per participant for an 8-minute session
~25–30 MB total for a 2-participant session

Run

# Create a room
python create_room.py

# Start the bot
python bot_recorder_v2.py --room-url "https://yourapp.daily.co/<room-name>"

Composition timing (logged automatically)

Step 1 done in Xs        ← close + normalize per-participant videos
Composition done in Xs   ← segments + concat
DONE — total post-processing: Xs

Layout switching

Both approaches produce a composed MP4 where the layout reflects exactly who was present at each moment:

| Participants present | Layout |
| --- | --- |
| 1 | Full-width (OUTPUT_WIDTH × OUTPUT_HEIGHT) |
| 2 | Side-by-side hstack (OUTPUT_WIDTH/2 each) |
| 3+ | Equal-width hstack |
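
One way to express these layouts is as an ffmpeg filter graph built from the participant count. The sketch below is an assumption about how this could be done, not the filter the scripts actually use: `hstack_filter` is a hypothetical helper, the defaults mirror the OUTPUT_WIDTH and OUTPUT_HEIGHT values shown in the .env example, and a plain `scale` is used even though a real implementation would likely crop or pad to preserve aspect ratio.

```python
def hstack_filter(n_participants: int, out_w: int = 1280, out_h: int = 720) -> str:
    """Build an ffmpeg filter graph that tiles n video inputs side by side."""
    if n_participants < 1:
        raise ValueError("need at least one participant")
    if n_participants == 1:
        # Single participant: scale to the full output frame
        return f"[0:v]scale={out_w}:{out_h}[out]"
    # Equal-width tiles; with 3+ inputs the widths may not sum exactly to out_w
    tile_w = out_w // n_participants
    scaled = ";".join(
        f"[{i}:v]scale={tile_w}:{out_h}[v{i}]" for i in range(n_participants)
    )
    labels = "".join(f"[v{i}]" for i in range(n_participants))
    return f"{scaled};{labels}hstack=inputs={n_participants}[out]"


print(hstack_filter(1))
print(hstack_filter(2))
```

The returned string would be passed to ffmpeg via `-filter_complex`, with `[out]` mapped as the video output.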

Example (Aditya joins at T=0, iPhone joins at T=40s, iPhone leaves at T=55s, session ends T=65s):

T=0  → T=40  : Aditya full-width
T=40 → T=55  : Aditya | iPhone  (side-by-side)
T=55 → T=65  : Aditya full-width
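
The windows in this example can be derived from join/leave times alone. The sketch below shows one possible approach under stated assumptions: `layout_windows` is a hypothetical name, participants are given as `(name, join_s, leave_s)` tuples, and a `leave_s` of `None` means the participant stayed until the session ended.

```python
def layout_windows(participants, session_end_s):
    """Split the session into windows during which the set of participants is constant."""
    clipped = []
    for name, join_s, leave_s in participants:
        # Treat a missing leave time as "present until the session ends"
        end = session_end_s if leave_s is None else min(leave_s, session_end_s)
        clipped.append((name, join_s, end))

    # Every join or leave is a potential layout change
    cuts = sorted({t for _, join_s, end in clipped for t in (join_s, end)})

    windows = []
    for start, end in zip(cuts, cuts[1:]):
        present = [n for n, join_s, leave_s in clipped if join_s <= start and end <= leave_s]
        if present:  # skip gaps where nobody is in the room
            windows.append((start, end, present))
    return windows


for window in layout_windows([("Aditya", 0, None), ("iPhone", 40, 55)], session_end_s=65):
    print(window)
```

Each window then maps to one encoded segment, with the layout chosen by the number of names it contains.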

Output directory

All files are kept in recordings/<session_id>/ and never deleted:

recordings/<session_id>/
  Aditya_video.mp4              per-participant video
  Aditya_audio.wav              per-participant audio
  Iphone_video.mp4
  Iphone_audio.wav
  Iphone_audio_padded.wav       silence-padded (if iPhone joined later than Aditya)
  mixed.wav                     merged stereo audio (aligned to video timeline)
  session.json                  timeline metadata — join/leave times, file paths
  <session_id>_recording.mp4   final composed output
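
A silence-padded file such as Iphone_audio_padded.wav can be produced by prepending zeroed samples to the original WAV. The following is a minimal sketch using only the standard library; `pad_wav_with_silence` is a hypothetical helper, and it assumes signed PCM (for example 16-bit), where zero bytes are silence. How the bot chooses the offset is not shown in this README; a natural assumption is the participant's join time relative to the start of the mixed timeline.

```python
import wave


def pad_wav_with_silence(src_path: str, dst_path: str, offset_s: float) -> None:
    """Write a copy of src_path with offset_s seconds of leading silence."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        audio = src.readframes(src.getnframes())

    # One frame holds one sample per channel
    silent_frames = int(round(offset_s * params.framerate))
    silence = b"\x00" * (silent_frames * params.nchannels * params.sampwidth)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # The frame count in the header is corrected when the file is closed
        dst.writeframes(silence + audio)
```

For example, `pad_wav_with_silence("Iphone_audio.wav", "Iphone_audio_padded.wav", 34.55)` would shift the track by 34.55 s, which is the gap between the two join times in the session.json example if padding is measured from the first join.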

Committed sample recording

For quick review in this repository, one representative composed output is committed at:

recordings/20260318_132416/20260318_132416_recording.mp4

All other recordings/* content remains ignored to avoid committing large generated artifacts.

session.json example:

{
  "session_id": "20260318_132416",
  "session_dir": "/path/to/recordings/20260318_132416",
  "participants": [
    {
      "name": "Aditya",
      "join_time_s": 4.46,
      "leave_time_s": 54.27,
      "video_path": "...",
      "audio_path": "..."
    },
    {
      "name": "iPhone",
      "join_time_s": 39.01,
      "leave_time_s": 51.49,
      "video_path": "...",
      "audio_path": "..."
    }
  ]
}

Recomposing from saved files (`compose_video.py`)

If the bot ran but composition failed, or you want to recompose with different settings:

From session.json

python compose_video.py --session recordings/<session_id>/session.json

Output: recordings/<session_id>/<session_id>_composed.mp4

Manual

python compose_video.py \
  --p1-video recordings/<session_id>/Aditya_video.mp4 \
  --p1-name  Aditya \
  --p1-join  4.46 \
  --p1-leave 54.27 \
  --p1-audio recordings/<session_id>/Aditya_audio.wav \
  --p2-video recordings/<session_id>/Iphone_video.mp4 \
  --p2-name  iPhone \
  --p2-join  39.01 \
  --p2-leave 51.49 \
  --p2-audio recordings/<session_id>/Iphone_audio.wav \
  --output   recordings/<session_id>/output.mp4

Join/leave times come from session.json.
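
To avoid copying those values by hand, the manual command can be assembled from session.json. This is a sketch under assumptions: `manual_compose_args` is a hypothetical helper, it relies on the session.json fields shown in the example above, and it only handles the two-participant flags (`--p1-*`, `--p2-*`) documented here.

```python
import json
import shlex


def manual_compose_args(session: dict, output: str) -> list[str]:
    """Build the manual compose_video.py command line from a parsed session.json."""
    args = ["python", "compose_video.py"]
    # Only --p1-* and --p2-* flags are documented, so use the first two participants
    for idx, p in enumerate(session["participants"][:2], start=1):
        args += [
            f"--p{idx}-video", p["video_path"],
            f"--p{idx}-name", p["name"],
            f"--p{idx}-join", str(p["join_time_s"]),
            f"--p{idx}-leave", str(p["leave_time_s"]),
            f"--p{idx}-audio", p["audio_path"],
        ]
    return args + ["--output", output]


def print_manual_command(session_json_path: str, output: str) -> None:
    """Load session.json and print a shell-quoted command ready to paste."""
    with open(session_json_path, encoding="utf-8") as fh:
        session = json.load(fh)
    print(shlex.join(manual_compose_args(session, output)))
```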

If videos need normalization (VFR → CFR)

Videos recorded with an older bot version (before normalize_video()) will have r_frame_rate=90000/1. Re-encode them first:

ffmpeg -y -i recordings/<session_id>/Aditya_video.mp4 \
  -c:v libx264 -preset fast -crf 18 -pix_fmt yuv420p -vf "fps=30" \
  recordings/<session_id>/Aditya_video_fixed.mp4

ffmpeg -y -i recordings/<session_id>/Iphone_video.mp4 \
  -c:v libx264 -preset fast -crf 18 -pix_fmt yuv420p -vf "fps=30" \
  recordings/<session_id>/Iphone_video_fixed.mp4

Then pass the _fixed.mp4 files to compose_video.py.
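
The check and the re-encode can also be scripted. The sketch below is illustrative and is not the bot's normalize_video(): the helper names are hypothetical, the ffprobe invocation reads r_frame_rate for the first video stream, and the ffmpeg arguments are the same ones used in the commands above.

```python
import subprocess
from fractions import Fraction
from pathlib import Path


def ffprobe_cmd(path: str) -> list[str]:
    """ffprobe command that prints r_frame_rate of the first video stream."""
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-show_entries", "stream=r_frame_rate",
            "-of", "default=noprint_wrappers=1:nokey=1", path]


def needs_normalization(r_frame_rate: str, target_fps: int = 30) -> bool:
    """True when the reported rate (e.g. '90000/1') differs from the target CFR."""
    return Fraction(r_frame_rate.strip()) != target_fps


def normalize_cmd(src: str, dst: str, fps: int = 30) -> list[str]:
    """ffmpeg command that re-encodes src to constant frame rate."""
    return ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-preset", "fast",
            "-crf", "18", "-pix_fmt", "yuv420p", "-vf", f"fps={fps}", dst]


def fix_if_needed(path: str) -> str:
    """Re-encode to <name>_fixed.mp4 if required; return the path to use."""
    rate = subprocess.run(ffprobe_cmd(path), capture_output=True,
                          text=True, check=True).stdout
    if not needs_normalization(rate):
        return path
    fixed = str(Path(path).with_name(Path(path).stem + "_fixed.mp4"))
    subprocess.run(normalize_cmd(path, fixed), check=True)
    return fixed
```

Note that this check treats any rate other than exactly 30 as needing normalization, which is stricter than detecting only the 90000/1 case.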

Setup

pip install -r requirements.txt
brew install ffmpeg   # or: apt install ffmpeg

.env:

AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=ap-south-1
S3_BUCKET=your-bucket
BASE_CDN_URL=https://your-cdn.com   # optional
RECORDINGS_DIR=./recordings          # optional, default: ./recordings
OUTPUT_WIDTH=1280                    # optional
OUTPUT_HEIGHT=720                    # optional
LOG_LEVEL=INFO                       # set to DEBUG for verbose timeline/ffmpeg output
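
For reference, the optional settings could be read as shown below. This is a sketch, not the scripts' actual configuration code: `load_settings` is a hypothetical helper, and only the RECORDINGS_DIR default is stated above. The 1280, 720, and INFO fallbacks are assumptions taken from the example values.

```python
import os


def load_settings(env=None) -> dict:
    """Read the optional settings, falling back to assumed defaults."""
    env = os.environ if env is None else env
    return {
        "recordings_dir": env.get("RECORDINGS_DIR", "./recordings"),  # documented default
        "output_width": int(env.get("OUTPUT_WIDTH", "1280")),    # assumed default
        "output_height": int(env.get("OUTPUT_HEIGHT", "720")),   # assumed default
        "log_level": env.get("LOG_LEVEL", "INFO").upper(),       # assumed default
    }


print(load_settings({"OUTPUT_WIDTH": "1920", "OUTPUT_HEIGHT": "1080"}))
```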

Cost comparison vs Daily cloud recording

For an 8-minute session with 2 participants (Daily pricing, 10k–100k minute tier at $0.004/participant-min):

| Component | Daily cloud recording | Bot solution |
| --- | --- | --- |
| Participant minutes (2 users) | 2 × 8 × $0.004 = $0.064 | 2 × 8 × $0.004 = $0.064 |
| Bot participant minutes | n/a | 1 × 8 × $0.004 = $0.032 |
| Cloud recording | 8 × $0.01349 = $0.108 | $0.00 |
| Modal compute | n/a | ~$0.013 |
| Total | $0.172 | ~$0.109 |

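The bot solution is ~37% cheaper per session (~$0.063 saved). At 10,000 sessions/month that is ~$630/month saved. The saving comes from eliminating Daily's cloud recording charge ($0.108 per session), partly offset by the bot's own participant minutes and compute (~$0.045).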
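
The arithmetic behind these figures can be reproduced with a short script. The rates are the ones quoted in this section; `session_costs` is a hypothetical helper, and the compute cost is passed in as a fixed per-session figure because only the 8-minute estimate is given.

```python
PARTICIPANT_MIN_USD = 0.004       # per participant-minute (10k-100k minute tier)
CLOUD_RECORDING_MIN_USD = 0.01349  # per recorded minute


def session_costs(minutes: int = 8, users: int = 2, compute_usd: float = 0.013):
    """Return (daily_cloud_total, bot_total) in USD for one session."""
    users_cost = users * minutes * PARTICIPANT_MIN_USD
    cloud_total = users_cost + minutes * CLOUD_RECORDING_MIN_USD
    # The bot joins as one extra participant and adds compute cost
    bot_total = users_cost + minutes * PARTICIPANT_MIN_USD + compute_usd
    return cloud_total, bot_total


cloud, bot = session_costs()
saving = cloud - bot
print(f"cloud=${cloud:.3f} bot=${bot:.3f} saving=${saving:.3f} ({saving / cloud:.0%})")
print(f"monthly saving at 10,000 sessions: ${saving * 10_000:.0f}")
```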
