
feat: add DataFrame support as RLM context payload #1

Closed
kmad wants to merge 5 commits into main from feature/rlm-dataframe-support

Conversation


kmad (Owner) commented Mar 10, 2026

Summary

  • Pass pandas DataFrames directly to rlm.completion(prompt=df) — serialized via Parquet so the LLM's REPL gets a real DataFrame, not a stringified table
  • QueryMetadata generates shape, dtypes, nulls, and sample rows for the system prompt so the LLM knows the schema before writing code
  • All five environments (Local, Docker, Modal, Prime, Daytona) handle DataFrames: Local/Docker write Parquet to disk, Modal/Prime/Daytona use base64-inline codegen
  • Docker installs pandas/pyarrow on demand (only when a DataFrame is actually passed)
  • New rlm/utils/dataframe_utils.py with detection, serialization, metadata, and codegen helpers
  • 158 lines of new tests across test_dataframe_utils.py, test_local_repl.py, test_types.py
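The schema metadata described in the second bullet can be sketched roughly as follows. This is an illustrative sketch only: the helper name `build_dataframe_metadata` and the exact output format are assumptions, not the PR's actual `QueryMetadata` implementation.

```python
import pandas as pd

def build_dataframe_metadata(df: pd.DataFrame, sample_rows: int = 3) -> str:
    """Summarize shape, dtypes, null counts, and sample rows for a system prompt.

    Illustrative sketch; the real QueryMetadata fields/format may differ.
    """
    lines = [
        f"shape: {df.shape[0]} rows x {df.shape[1]} cols",
        "dtypes: " + ", ".join(f"{c}: {t}" for c, t in df.dtypes.astype(str).items()),
        "nulls: " + ", ".join(f"{c}: {int(n)}" for c, n in df.isna().sum().items()),
        f"first {sample_rows} rows:",
        df.head(sample_rows).to_string(),
    ]
    return "\n".join(lines)

df = pd.DataFrame({"price": [9.5, None, 3.0], "sku": ["a", "b", "c"]})
print(build_dataframe_metadata(df))
```

Emitting this summary before the LLM writes any code is what lets it reference real column names and dtypes on the first attempt instead of probing with `df.info()`.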

Known limitation

Persistent mode's _setup_prompt rebuilds the system prompt from scratch on each call, using only the latest prompt's metadata. With persistent=True and multiple completion() calls, the LLM therefore loses schema info for earlier contexts (e.g. context_0). build_user_prompt does note the context count, but the detailed metadata (dtypes, shape, sample rows) is only shown for the most recent context. This is a pre-existing limitation of persistent mode, not one introduced by the DataFrame changes.

Test plan

  • pytest tests/test_dataframe_utils.py — detection, Parquet roundtrip, metadata, codegen
  • pytest tests/test_local_repl.py — DataFrame load/add in LocalREPL
  • pytest tests/test_types.py — QueryMetadata for DataFrames
  • pytest tests/ — full suite (71 tests pass)
  • Run examples/rlm_dataframe_demo.ipynb end-to-end

🤖 Generated with Claude Code

kmad and others added 5 commits March 8, 2026 14:55
Adds pandas DataFrame as a first-class context type for RLM. DataFrames
are serialized via Parquet for type-preserving transfer to all
environments (local, Docker, Modal, Prime, Daytona). Includes metadata
generation for LLM prompts with shape, dtypes, nulls, and sample rows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Frames

Consolidates the two identical `if cols > 20` blocks into one and
applies the same column truncation to head/tail sample rows, preventing
huge metadata strings for wide DataFrames.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
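The consolidated truncation might look something like this. The 20-column threshold comes from the `if cols > 20` checks the commit mentions; the helper name is illustrative:

```python
import pandas as pd

MAX_METADATA_COLS = 20  # matches the `cols > 20` threshold in the commit

def truncate_for_metadata(df: pd.DataFrame,
                          max_cols: int = MAX_METADATA_COLS) -> pd.DataFrame:
    # One shared truncation applied to both the dtype listing and the
    # head/tail sample rows, so wide frames can't blow up the prompt.
    if df.shape[1] > max_cols:
        return df.iloc[:, :max_cols]
    return df
```

Centralizing the check means the dtype listing and the sample rows can never disagree about which columns were dropped.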
Only install pandas and pyarrow in the Docker container when a DataFrame
context is actually passed, avoiding the install overhead for string/dict
contexts. Uses a lazy _ensure_pandas() method called from load_context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
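The lazy-install pattern can be sketched with a stand-in for `docker exec`. The class name and the `exec_fn` callback here are illustrative, not the PR's actual Docker environment API:

```python
import pandas as pd

class DockerREPLSketch:
    """Sketch of on-demand dependency install; exec_fn stands in for `docker exec`."""

    def __init__(self, exec_fn):
        self._exec = exec_fn
        self._pandas_installed = False

    def _ensure_pandas(self) -> None:
        # Pay the pip-install cost once, and only if a DataFrame shows up.
        if not self._pandas_installed:
            self._exec("pip install pandas pyarrow")
            self._pandas_installed = True

    def load_context(self, context) -> None:
        if isinstance(context, pd.DataFrame):
            self._ensure_pandas()
        # ...write the context (Parquet for DataFrames) into the container...

calls = []
repl = DockerREPLSketch(calls.append)
repl.load_context("a plain string context")   # no install triggered
repl.load_context(pd.DataFrame({"a": [1]}))   # installs once
repl.load_context(pd.DataFrame({"b": [2]}))   # cached, no second install
print(len(calls))  # → 1
```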
Move the top-level import of dataframe_utils in core/types.py into
QueryMetadata.__init__ so the module is only loaded when actually
constructing query metadata, not on every import of rlm.core.types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
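The deferred-import pattern, sketched here with pandas itself standing in for the dataframe_utils dependency (class name illustrative):

```python
class QueryMetadataSketch:
    def __init__(self, context):
        # Import inside __init__, not at module top level, so merely
        # importing this module stays cheap when no DataFrame is in play.
        import pandas as pd
        self.is_dataframe = isinstance(context, pd.DataFrame)
```

Python caches modules in `sys.modules`, so repeated constructions pay the import cost only once; the win is keeping `import rlm.core.types` fast for callers who never touch DataFrames.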
…egen

Remove the unified build_context_code() which handled strings, dicts,
and DataFrames with fragile escaping. Keep only build_dataframe_context_code()
for DataFrame-specific Parquet-over-base64 codegen. Restore original
per-environment string/dict handling in Modal, Prime, and Daytona REPLs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

kmad (Owner, Author) commented Mar 10, 2026

Closing — will open this PR against the upstream repo instead.

kmad closed this Mar 10, 2026