
feat: add DataFrame support as RLM context payload #1

Closed
kmad wants to merge 5 commits into main from feature/rlm-dataframe-support

Conversation


kmad (Owner) commented Mar 10, 2026

Summary

  • Pass pandas DataFrames directly to rlm.completion(prompt=df) — serialized via Parquet so the LLM's REPL gets a real DataFrame, not a stringified table
  • QueryMetadata generates shape, dtypes, nulls, and sample rows for the system prompt so the LLM knows the schema before writing code
  • All five environments (Local, Docker, Modal, Prime, Daytona) handle DataFrames: Local/Docker write Parquet to disk, Modal/Prime/Daytona use base64-inline codegen
  • Docker installs pandas/pyarrow on demand (only when a DataFrame is actually passed)
  • New rlm/utils/dataframe_utils.py with detection, serialization, metadata, and codegen helpers
  • 158 lines of new tests across test_dataframe_utils.py, test_local_repl.py, test_types.py
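The schema metadata described in the second bullet can be sketched roughly as follows. This is an illustrative sketch only: the helper name `build_dataframe_metadata` and the exact output format are assumptions, not the PR's actual `QueryMetadata` implementation.

```python
import pandas as pd

def build_dataframe_metadata(df: pd.DataFrame, sample_rows: int = 3) -> str:
    """Summarize shape, dtypes, null counts, and sample rows for a system prompt.

    Illustrative sketch; the real QueryMetadata fields/format may differ.
    """
    lines = [
        f"shape: {df.shape[0]} rows x {df.shape[1]} cols",
        "dtypes: " + ", ".join(f"{c}: {t}" for c, t in df.dtypes.astype(str).items()),
        "nulls: " + ", ".join(f"{c}: {int(n)}" for c, n in df.isna().sum().items()),
        f"first {sample_rows} rows:",
        df.head(sample_rows).to_string(),
    ]
    return "\n".join(lines)

df = pd.DataFrame({"price": [9.5, None, 3.0], "sku": ["a", "b", "c"]})
print(build_dataframe_metadata(df))
```

Emitting this summary before the LLM writes any code is what lets it reference real column names and dtypes on the first attempt instead of probing with `df.info()`.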

Known limitation

Persistent mode's _setup_prompt rebuilds the system prompt from scratch on each call, using only the latest prompt's metadata. With persistent=True and multiple completion() calls, the LLM therefore loses schema info for earlier contexts (e.g. context_0). build_user_prompt does note the context count, but the detailed metadata (dtypes, shape, sample rows) is only shown for the most recent context. This is a pre-existing limitation of persistent mode, not one introduced by the DataFrame changes.

Test plan

  • pytest tests/test_dataframe_utils.py — detection, Parquet roundtrip, metadata, codegen
  • pytest tests/test_local_repl.py — DataFrame load/add in LocalREPL
  • pytest tests/test_types.py — QueryMetadata for DataFrames
  • pytest tests/ — full suite (71 tests pass)
  • Run examples/rlm_dataframe_demo.ipynb end-to-end

🤖 Generated with Claude Code

kmad and others added 5 commits March 8, 2026 14:55
Adds pandas DataFrame as a first-class context type for RLM. DataFrames
are serialized via Parquet for type-preserving transfer to all
environments (local, Docker, Modal, Prime, Daytona). Includes metadata
generation for LLM prompts with shape, dtypes, nulls, and sample rows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Frames

Consolidates the two identical `if cols > 20` blocks into one and
applies the same column truncation to head/tail sample rows, preventing
huge metadata strings for wide DataFrames.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
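The consolidated truncation might look something like this. The 20-column threshold comes from the `if cols > 20` checks the commit mentions; the helper name is illustrative:

```python
import pandas as pd

MAX_METADATA_COLS = 20  # matches the `cols > 20` threshold in the commit

def truncate_for_metadata(df: pd.DataFrame,
                          max_cols: int = MAX_METADATA_COLS) -> pd.DataFrame:
    # One shared truncation applied to both the dtype listing and the
    # head/tail sample rows, so wide frames can't blow up the prompt.
    if df.shape[1] > max_cols:
        return df.iloc[:, :max_cols]
    return df
```

Centralizing the check means the dtype listing and the sample rows can never disagree about which columns were dropped.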
Only install pandas and pyarrow in the Docker container when a DataFrame
context is actually passed, avoiding the install overhead for string/dict
contexts. Uses a lazy _ensure_pandas() method called from load_context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
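The lazy-install pattern can be sketched with a stand-in for `docker exec`. The class name and the `exec_fn` callback here are illustrative, not the PR's actual Docker environment API:

```python
import pandas as pd

class DockerREPLSketch:
    """Sketch of on-demand dependency install; exec_fn stands in for `docker exec`."""

    def __init__(self, exec_fn):
        self._exec = exec_fn
        self._pandas_installed = False

    def _ensure_pandas(self) -> None:
        # Pay the pip-install cost once, and only if a DataFrame shows up.
        if not self._pandas_installed:
            self._exec("pip install pandas pyarrow")
            self._pandas_installed = True

    def load_context(self, context) -> None:
        if isinstance(context, pd.DataFrame):
            self._ensure_pandas()
        # ...write the context (Parquet for DataFrames) into the container...

calls = []
repl = DockerREPLSketch(calls.append)
repl.load_context("a plain string context")   # no install triggered
repl.load_context(pd.DataFrame({"a": [1]}))   # installs once
repl.load_context(pd.DataFrame({"b": [2]}))   # cached, no second install
print(len(calls))  # → 1
```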
Move the top-level import of dataframe_utils in core/types.py into
QueryMetadata.__init__ so the module is only loaded when actually
constructing query metadata, not on every import of rlm.core.types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
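The deferred-import pattern, sketched here with pandas itself standing in for the dataframe_utils dependency (class name illustrative):

```python
class QueryMetadataSketch:
    def __init__(self, context):
        # Import inside __init__, not at module top level, so merely
        # importing this module stays cheap when no DataFrame is in play.
        import pandas as pd
        self.is_dataframe = isinstance(context, pd.DataFrame)
```

Python caches modules in `sys.modules`, so repeated constructions pay the import cost only once; the win is keeping `import rlm.core.types` fast for callers who never touch DataFrames.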
…egen

Remove the unified build_context_code() which handled strings, dicts,
and DataFrames with fragile escaping. Keep only build_dataframe_context_code()
for DataFrame-specific Parquet-over-base64 codegen. Restore original
per-environment string/dict handling in Modal, Prime, and Daytona REPLs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

kmad (Owner, Author) commented Mar 10, 2026

Closing — will open this PR against the upstream repo instead.

kmad closed this Mar 10, 2026