
adimyth/daily-recorder-poc


Daily Recorder POC

Headless bot that joins a Daily.co room, records all participants, and produces a composed MP4 with dynamic layout switching — without using Daily's cloud recording API.


Files

File | Description
--- | ---
bot_recorder.py | Approach 1: PNG-based recorder. Saves each video frame as a PNG file during recording, then encodes to MP4 in post-processing via the ffmpeg concat demuxer. High disk usage (~2 GB per participant for an 8-minute session) and slower composition.
bot_recorder_v2.py | Approach 2: direct MP4 encoder. Encodes video frames directly to MP4 in real time via PyAV, with no intermediate PNG files. Low disk usage (~10–15 MB per participant) and faster composition. Recommended.
compose_video.py | Standalone composition script. Use it to retry or recompose from saved session files without re-running the bot. Works with output from both approaches.
create_room.py | Creates a Daily room and prints its URL.

Approach 1 — PNG-based (bot_recorder.py)

How it works

During recording, each RGBA video frame is saved as a PNG file named by its capture timestamp in microseconds (000000033411.png, etc.). After the session ends:

  1. Per-participant PNGs → MP4 via the ffmpeg concat demuxer (the exact variable frame rate is preserved from the timestamps)
  2. Per-participant WAV files written from raw PCM
  3. Audio mixed with per-track silence padding for timeline alignment
  4. Timeline windows computed from join/leave timestamps
  5. Segments encoded (one per layout window) via ffmpeg -ss/-t
  6. Segments concatenated into final MP4 with mixed audio
  7. Uploaded to S3
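Step 1 depends on the concat demuxer's per-file `duration` directive, which turns the PNG timestamps into exact frame timing. Below is a minimal sketch. It assumes the 12-digit zero-padded microsecond filenames described above and a script file written into the same `<name>_frames/` directory, so each entry can be a bare filename. `frame_filename` and `build_concat_script` are hypothetical helpers, not functions from `bot_recorder.py`, and the 1/30 s hold for the last frame is an assumption:

```python
from typing import List


def frame_filename(elapsed_us: int) -> str:
    """Frame name from capture time in microseconds: 33411 -> '000000033411.png'."""
    return f"{elapsed_us:012d}.png"


def build_concat_script(timestamps_us: List[int], last_frame_s: float = 1 / 30) -> str:
    """Build an ffconcat script that shows each PNG until the next one was captured."""
    if not timestamps_us:
        raise ValueError("no frames captured")
    ts = sorted(timestamps_us)
    lines = ["ffconcat version 1.0"]
    for current, following in zip(ts, ts[1:]):
        lines.append(f"file '{frame_filename(current)}'")
        # The gap to the next capture becomes this frame's display duration (VFR).
        lines.append(f"duration {(following - current) / 1_000_000:.6f}")
    last = frame_filename(ts[-1])
    # The final frame has no successor: hold it for last_frame_s, then list it once
    # more without a duration so the demuxer does not drop it.
    lines += [f"file '{last}'", f"duration {last_frame_s:.6f}", f"file '{last}'"]
    return "\n".join(lines) + "\n"
```

Once saved as `frames.ffconcat` in that directory, the frames could be encoded with `ffmpeg -f concat -i frames.ffconcat -fps_mode vfr -c:v libx264 -pix_fmt yuv420p out.mp4`. On ffmpeg releases older than 5.1, use `-vsync vfr` instead of `-fps_mode vfr`. Listing the final file a second time without a duration follows the workaround described on the FFmpeg wiki's slideshow page.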

Disk usage

~2.2 GB per participant for an 8-minute session
~4.5 GB total for a 2-participant session

PNG files are kept in recordings/<session_id>/<name>_frames/ and are never deleted.

Run

# Create a room
python create_room.py

# Start the bot
python bot_recorder.py --room-url "https://yourapp.daily.co/<room-name>"
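`create_room.py` itself is not shown here. Daily's REST API creates rooms with `POST https://api.daily.co/v1/rooms` and returns the new room's `url`, so a standard-library sketch of that call might look like the following. The `DAILY_API_KEY` variable name is an assumption (it does not appear in the `.env` list under Setup), and both function names are hypothetical:

```python
import json
import urllib.request
from typing import Any, Dict, Optional

DAILY_ROOMS_URL = "https://api.daily.co/v1/rooms"


def build_room_request(api_key: str,
                       properties: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """POST request asking Daily to create a room with the given properties."""
    body = json.dumps({"properties": properties or {}}).encode("utf-8")
    return urllib.request.Request(
        DAILY_ROOMS_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def create_room(api_key: str, properties: Optional[Dict[str, Any]] = None) -> str:
    """Create the room and return its URL (performs a network call)."""
    with urllib.request.urlopen(build_room_request(api_key, properties), timeout=10) as resp:
        return json.load(resp)["url"]
```

For example, `print(create_room(os.environ["DAILY_API_KEY"]))` prints a URL you can pass to `--room-url`.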

Approach 2 — Direct MP4 encoder (bot_recorder_v2.py) ✅ Recommended

How it works

Frames are encoded directly into an H.264 MP4 container in real time using PyAV as they arrive from Daily's on_video_frame callback. No intermediate files are written during recording.

The time base is 1/90000 (the 90 kHz clock conventionally used for video). Each frame's pts is derived from its capture time:

pts = int(elapsed_us * 90000 / 1_000_000)
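The formula above can be computed with exact integer arithmetic instead of a float division. The sketch below does that and adds a hypothetical `PtsClock` guard that keeps pts strictly increasing, because libavformat's MP4 muxer generally rejects timestamps that do not increase. Neither helper is taken from `bot_recorder_v2.py`:

```python
from fractions import Fraction
from typing import Optional

TIME_BASE = Fraction(1, 90_000)  # one tick = 1/90000 s


def pts_from_elapsed_us(elapsed_us: int) -> int:
    """pts = elapsed_us * 90000 / 1_000_000, computed with integers only."""
    return elapsed_us * 90_000 // 1_000_000


class PtsClock:
    """Hypothetical helper: maps capture times to strictly increasing pts values."""

    def __init__(self) -> None:
        self._last: Optional[int] = None

    def next_pts(self, elapsed_us: int) -> int:
        pts = pts_from_elapsed_us(elapsed_us)
        if self._last is not None and pts <= self._last:
            # Two frames within the same 1/90000 s tick, or a late frame,
            # would otherwise repeat or rewind the timestamp.
            pts = self._last + 1
        self._last = pts
        return pts
```

With PyAV, the result would typically be assigned to `frame.pts`, with `frame.time_base` set to `TIME_BASE`, before the frame is passed to the stream's `encode()`. Because the tick values follow wall-clock arrival times, the gaps between frames vary, which is where the VFR stream described next comes from.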

After close_video(), the raw MP4 is re-encoded to a constant 30 fps (CFR) via normalize_video(). This step is required because PyAV produces a variable-frame-rate (VFR) stream with r_frame_rate=90000/1, which confuses ffmpeg's seeking and causes frozen frames during composition. After normalization, all downstream ffmpeg operations work correctly.
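Here is a minimal sketch of what such a normalization step can look like. It reuses the libx264 flags from the manual re-encode command later in this README. `normalize_cmd` and `normalize_video_sketch` are hypothetical stand-ins, and the repo's `normalize_video()` may use different flags or file names:

```python
import subprocess
from pathlib import Path
from typing import List


def normalize_cmd(src: Path, dst: Path, fps: int = 30) -> List[str]:
    """ffmpeg arguments that re-encode a VFR file to constant fps (CFR)."""
    return [
        "ffmpeg", "-y", "-i", str(src),
        "-c:v", "libx264", "-preset", "fast", "-crf", "18",
        "-pix_fmt", "yuv420p",
        "-vf", f"fps={fps}",  # duplicates/drops frames onto a fixed 1/fps grid
        str(dst),
    ]


def normalize_video_sketch(src: Path, fps: int = 30) -> Path:
    """Write <stem>_cfr.mp4 next to src and return its path (requires ffmpeg on PATH)."""
    dst = src.with_name(f"{src.stem}_cfr{src.suffix}")
    subprocess.run(normalize_cmd(src, dst, fps), check=True, capture_output=True)
    return dst
```

The `fps` filter trades a full re-encode for a stream that every later `-ss`/`-t` seek can treat as evenly spaced.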

The post-processing pipeline is the same as in Approach 1 from step 2 onwards.

Disk usage

~10–15 MB per participant for an 8-minute session
~25–30 MB total for a 2-participant session

Run

# Create a room
python create_room.py

# Start the bot
python bot_recorder_v2.py --room-url "https://yourapp.daily.co/<room-name>"

Composition timing (logged automatically)

Step 1 done in Xs        ← close + normalize per-participant videos
Composition done in Xs   ← segments + concat
DONE — total post-processing: Xs

Layout switching

Both approaches produce a composed MP4 where the layout reflects exactly who was present at each moment:

Participants present | Layout
--- | ---
1 | Full-width (OUTPUT_WIDTH × OUTPUT_HEIGHT)
2 | Side-by-side hstack (OUTPUT_WIDTH/2 each)
3+ | Equal-width hstack
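The layouts in this table map naturally onto ffmpeg's `scale` and `hstack` filters, and step 5 of the pipeline renders each window with `-ss`/`-t`. Below is a minimal sketch of both. `layout_filter` and `segment_cmd` are hypothetical names. The sketch assumes each per-participant video starts at that participant's join time, and it stretches rather than letterboxes each tile:

```python
from typing import List, Sequence, Tuple


def layout_filter(n: int, width: int = 1280, height: int = 720) -> str:
    """Return a -filter_complex graph that places n inputs side by side."""
    if n < 1:
        raise ValueError("need at least one participant")
    if n == 1:
        return f"[0:v]scale={width}:{height},setsar=1[out]"
    # Round tile width down to an even number; yuv420p output needs even dimensions.
    tile_w = (width // n) // 2 * 2
    scaled = [f"[{i}:v]scale={tile_w}:{height},setsar=1[v{i}]" for i in range(n)]
    labels = "".join(f"[v{i}]" for i in range(n))
    # pad keeps every segment exactly width x height (e.g. 3 x 426 = 1278 < 1280),
    # so segments can be concatenated later.
    return ";".join(scaled) + f";{labels}hstack=inputs={n},pad={width}:{height}[out]"


def segment_cmd(start: float, end: float, inputs: Sequence[Tuple[str, float]],
                output: str, width: int = 1280, height: int = 720) -> List[str]:
    """ffmpeg arguments rendering one layout window [start, end) to its own file.

    inputs holds (video_path, join_time_s) for each participant present.
    """
    duration = end - start
    args = ["ffmpeg", "-y"]
    for path, join_s in inputs:
        offset = max(0.0, start - join_s)  # position inside this participant's file
        args += ["-ss", f"{offset:.3f}", "-t", f"{duration:.3f}", "-i", path]
    args += [
        "-filter_complex", layout_filter(len(inputs), width, height),
        "-map", "[out]", "-an",  # audio is mixed separately (mixed.wav)
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        output,
    ]
    return args
```

Placing `-ss`/`-t` before each `-i` applies them per input, which is why each participant gets its own offset.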

Example (Aditya joins at T=0, iPhone joins at T=40s, iPhone leaves at T=55s, session ends at T=65s):

T=0  → T=40  : Aditya full-width
T=40 → T=55  : Aditya | iPhone  (side-by-side)
T=55 → T=65  : Aditya full-width
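Step 4 of the pipeline (computing timeline windows from join/leave timestamps) amounts to cutting the session at every join and leave, then recording who is present in each slice. Below is a minimal sketch. `Window` and `compute_windows` are hypothetical names, and merging adjacent windows with the same participants is a design choice, not confirmed repo behavior:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Window:
    start: float
    end: float
    names: Tuple[str, ...]


def compute_windows(presence: Dict[str, Tuple[float, float]],
                    session_end: float) -> List[Window]:
    """Split [0, session_end] wherever someone joins or leaves.

    presence maps participant name -> (join_s, leave_s). Slices with nobody
    present are dropped; adjacent slices with the same people are merged.
    """
    cuts = {0.0, session_end}
    for join_s, leave_s in presence.values():
        cuts.update(t for t in (join_s, leave_s) if 0.0 < t < session_end)
    edges = sorted(cuts)
    windows: List[Window] = []
    for start, end in zip(edges, edges[1:]):
        present = tuple(name for name, (join_s, leave_s) in presence.items()
                        if join_s <= start and leave_s >= end)
        if not present:
            continue
        if windows and windows[-1].names == present and windows[-1].end == start:
            windows[-1] = Window(windows[-1].start, end, present)
        else:
            windows.append(Window(start, end, present))
    return windows
```

With `{"Aditya": (0.0, 65.0), "iPhone": (40.0, 55.0)}` and `session_end=65.0`, this yields the three windows listed above.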

Output directory

All files are kept in recordings/<session_id>/ and never deleted:

recordings/<session_id>/
  Aditya_video.mp4              per-participant video
  Aditya_audio.wav              per-participant audio
  Iphone_video.mp4
  Iphone_audio.wav
  Iphone_audio_padded.wav       silence-padded (if iPhone joined later than Aditya)
  mixed.wav                     merged stereo audio (aligned to video timeline)
  session.json                  timeline metadata — join/leave times, file paths
  <session_id>_recording.mp4   final composed output
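Steps 2 and 3 of the pipeline (writing WAV files from raw PCM, then adding silence padding) can be done with Python's standard `wave` module. Below is a minimal sketch. It assumes 16-bit signed little-endian PCM. The 48 kHz mono defaults and both function names are assumptions, not values taken from the bot's code:

```python
import wave
from pathlib import Path


def write_wav(path: Path, pcm: bytes, sample_rate: int = 48_000, channels: int = 1) -> None:
    """Wrap raw 16-bit little-endian PCM in a WAV header."""
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)


def pad_wav(src: Path, dst: Path, delay_s: float) -> None:
    """Prepend delay_s seconds of silence so the track lines up with the timeline."""
    with wave.open(str(src), "rb") as rf:
        params = rf.getparams()
        audio = rf.readframes(rf.getnframes())
    silent_frames = round(delay_s * params.framerate)
    # Zero bytes are silence for signed 16-bit PCM (not for unsigned 8-bit audio).
    silence = b"\x00" * (silent_frames * params.sampwidth * params.nchannels)
    with wave.open(str(dst), "wb") as wf:
        wf.setparams(params)  # frame count in the header is corrected on close
        wf.writeframes(silence + audio)
```

For the sample session.json, iPhone's track would be padded by 39.01 − 4.46 = 34.55 s. This assumes the reference point is the earliest participant's join, as the `Iphone_audio_padded.wav` naming suggests.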

Committed sample recording

For quick review in this repository, one representative composed output is committed at:

recordings/20260318_132416/20260318_132416_recording.mp4

All other recordings/* content remains ignored to avoid committing large generated artifacts.

session.json example:

{
  "session_id": "20260318_132416",
  "session_dir": "/path/to/recordings/20260318_132416",
  "participants": [
    {
      "name": "Aditya",
      "join_time_s": 4.46,
      "leave_time_s": 54.27,
      "video_path": "...",
      "audio_path": "..."
    },
    {
      "name": "iPhone",
      "join_time_s": 39.01,
      "leave_time_s": 51.49,
      "video_path": "...",
      "audio_path": "..."
    }
  ]
}

Recomposing from saved files (compose_video.py)

If the bot ran but composition failed, or you want to recompose with different settings:

From session.json

python compose_video.py --session recordings/<session_id>/session.json

Output: recordings/<session_id>/<session_id>_composed.mp4

Manual

python compose_video.py \
  --p1-video recordings/<session_id>/Aditya_video.mp4 \
  --p1-name  Aditya \
  --p1-join  4.46 \
  --p1-leave 54.27 \
  --p1-audio recordings/<session_id>/Aditya_audio.wav \
  --p2-video recordings/<session_id>/Iphone_video.mp4 \
  --p2-name  iPhone \
  --p2-join  39.01 \
  --p2-leave 51.49 \
  --p2-audio recordings/<session_id>/Iphone_audio.wav \
  --output   recordings/<session_id>/output.mp4

Join/leave times come from session.json.
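`--session` already reads these values for you. If you need the manual form (for example, to edit one participant's leave time before recomposing), a sketch like the following maps session.json fields onto the flags shown above. It assumes exactly the two-participant `--p1-*`/`--p2-*` interface documented here, and `compose_args` is a hypothetical helper:

```python
import json
import sys
from pathlib import Path
from typing import List


def compose_args(session_json: Path) -> List[str]:
    """Turn a session.json into a manual compose_video.py command line."""
    session = json.loads(session_json.read_text())
    args = [sys.executable, "compose_video.py"]
    # The manual interface documented above only has --p1-* and --p2-* flags.
    for i, p in enumerate(session["participants"][:2], start=1):
        args += [
            f"--p{i}-video", p["video_path"],
            f"--p{i}-name", p["name"],
            f"--p{i}-join", str(p["join_time_s"]),
            f"--p{i}-leave", str(p["leave_time_s"]),
            f"--p{i}-audio", p["audio_path"],
        ]
    args += ["--output", str(Path(session["session_dir"]) / "output.mp4")]
    return args
```

The result can be passed to `subprocess.run(..., check=True)` from the repository root.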

If videos need normalization (VFR → CFR)

Videos recorded with an older bot version (before normalize_video()) will have r_frame_rate=90000/1. Re-encode them first:

ffmpeg -y -i recordings/<session_id>/Aditya_video.mp4 \
  -c:v libx264 -preset fast -crf 18 -pix_fmt yuv420p -vf "fps=30" \
  recordings/<session_id>/Aditya_video_fixed.mp4

ffmpeg -y -i recordings/<session_id>/Iphone_video.mp4 \
  -c:v libx264 -preset fast -crf 18 -pix_fmt yuv420p -vf "fps=30" \
  recordings/<session_id>/Iphone_video_fixed.mp4

Then pass the _fixed.mp4 files to compose_video.py.
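To find out which files need this, ffprobe can report each file's `r_frame_rate`. Below is a sketch. It assumes that any rate above a hypothetical 120 fps threshold (such as the 90000/1 left by older bot versions) means the file still needs re-encoding:

```python
import subprocess
from fractions import Fraction
from pathlib import Path


def probe_frame_rate(video: Path) -> str:
    """Return ffprobe's r_frame_rate for the first video stream, e.g. '90000/1'."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=r_frame_rate",
         "-of", "default=noprint_wrappers=1:nokey=1", str(video)],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def needs_normalization(r_frame_rate: str, max_fps: int = 120) -> bool:
    """Treat implausibly high rates (like the raw 90000/1 timebase) as unnormalized."""
    num, _, den = r_frame_rate.partition("/")
    denominator = int(den) if den else 1
    if denominator == 0:  # ffprobe reports 0/0 when it cannot determine a rate
        return True
    return Fraction(int(num), denominator) > max_fps
```

Looping over `recordings/<session_id>/*_video.mp4` and re-encoding only the files where `needs_normalization(probe_frame_rate(path))` is true avoids re-encoding videos that are already CFR.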


Setup

pip install -r requirements.txt
brew install ffmpeg   # or: apt install ffmpeg

.env:

AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=ap-south-1
S3_BUCKET=your-bucket
BASE_CDN_URL=https://your-cdn.com   # optional
RECORDINGS_DIR=./recordings          # optional, default: ./recordings
OUTPUT_WIDTH=1280                    # optional
OUTPUT_HEIGHT=720                    # optional
LOG_LEVEL=INFO                       # set to DEBUG for verbose timeline/ffmpeg output
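Once these values are in the process environment, the optional settings can be read with explicit defaults. Below is a minimal sketch. The ./recordings default comes from the list above. The 1280/720 and INFO defaults are assumptions based on the sample values. `RecorderConfig` and `load_config` are hypothetical names, and the AWS credentials are omitted:

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class RecorderConfig:
    aws_region: str
    s3_bucket: str
    base_cdn_url: Optional[str]
    recordings_dir: Path
    output_width: int
    output_height: int
    log_level: str


def load_config(env: Mapping[str, str] = os.environ) -> RecorderConfig:
    """Read the variables above; required ones raise KeyError when missing."""
    width = int(env.get("OUTPUT_WIDTH", "1280"))   # assumed default
    height = int(env.get("OUTPUT_HEIGHT", "720"))  # assumed default
    if width <= 0 or height <= 0:
        raise ValueError("OUTPUT_WIDTH and OUTPUT_HEIGHT must be positive")
    return RecorderConfig(
        aws_region=env["AWS_REGION"],
        s3_bucket=env["S3_BUCKET"],
        base_cdn_url=env.get("BASE_CDN_URL") or None,  # blank means "not set"
        recordings_dir=Path(env.get("RECORDINGS_DIR", "./recordings")),
        output_width=width,
        output_height=height,
        log_level=env.get("LOG_LEVEL", "INFO").upper(),  # assumed default
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to test.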

Cost comparison vs Daily cloud recording

For an 8-minute session with 2 participants (Daily pricing, 10k–100k minute tier at $0.004/participant-min):

Component | Daily cloud recording | Bot solution
--- | --- | ---
Participant minutes (2 users) | 2 × 8 × $0.004 = $0.064 | 2 × 8 × $0.004 = $0.064
Bot participant minutes | n/a | 1 × 8 × $0.004 = $0.032
Cloud recording | 8 × $0.01349 = $0.108 | $0.00
Modal compute | n/a | ~$0.013
Total | $0.172 | ~$0.109

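~37% cheaper per session (about $0.063 saved). At 10,000 sessions/month, that's ~$630/month saved. The saving comes entirely from eliminating Daily's cloud recording charge ($0.108), partly offset by the bot's own participant minutes and compute (~$0.045).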
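The table's arithmetic can be parameterized to check other session lengths or participant counts. Below is a sketch using the rates quoted above. Treating Modal compute as a flat ~$0.013 per session is an assumption, since the table only gives that figure for the 8-minute case:

```python
from typing import Tuple

PARTICIPANT_RATE = 0.004   # $ per participant-minute (10k–100k tier, as above)
RECORDING_RATE = 0.01349   # $ per recorded minute (Daily cloud recording)


def per_session_cost(minutes: float, participants: int,
                     compute_cost: float = 0.013) -> Tuple[float, float]:
    """Return (daily_cloud_recording_total, bot_total) in dollars for one session."""
    participant_cost = participants * minutes * PARTICIPANT_RATE
    daily_cloud = participant_cost + minutes * RECORDING_RATE
    # The bot is billed as one more participant and adds its own compute.
    bot = participant_cost + minutes * PARTICIPANT_RATE + compute_cost
    return daily_cloud, bot
```

`per_session_cost(8, 2)` gives roughly $0.172 and $0.109, a saving of about 36.6% per session, which matches the table.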

