feat: add DataFrame support as RLM context payload (#1)
Closed
Adds pandas DataFrame as a first-class context type for RLM. DataFrames are serialized via Parquet for type-preserving transfer to all environments (local, Docker, Modal, Prime, Daytona). Includes metadata generation for LLM prompts with shape, dtypes, nulls, and sample rows. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
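The metadata generation described above can be sketched as follows. This is an illustrative helper, not the PR's actual API: the function name `dataframe_metadata` and the output format are assumptions, and it requires pandas to be installed.

```python
import pandas as pd

def dataframe_metadata(df: pd.DataFrame, n_sample: int = 3) -> str:
    """Summarize shape, dtypes, null counts, and sample rows for an LLM prompt.

    Hypothetical sketch; the PR's real metadata format is not shown here.
    """
    lines = [
        f"shape: {df.shape[0]} rows x {df.shape[1]} cols",
        "dtypes: " + ", ".join(f"{c}={t}" for c, t in df.dtypes.items()),
        "nulls: " + ", ".join(f"{c}={n}" for c, n in df.isna().sum().items()),
        "head:",
        df.head(n_sample).to_string(),
    ]
    return "\n".join(lines)

df = pd.DataFrame({"a": [1, None, 3], "b": ["x", "y", None]})
print(dataframe_metadata(df))
```

Including dtypes and null counts up front lets the LLM write correct code (e.g. handling NaNs) without first probing the DataFrame in the REPL.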
…Frames Consolidates the two identical `if cols > 20` blocks into one and applies the same column truncation to head/tail sample rows, preventing huge metadata strings for wide DataFrames. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
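The column-truncation idea can be sketched like this. The function name and the keep-first-half/last-half policy are assumptions; only the `> 20` threshold comes from the commit message.

```python
import pandas as pd

MAX_COLS = 20  # assumed threshold, mirroring the `if cols > 20` check above

def truncate_columns(df: pd.DataFrame, max_cols: int = MAX_COLS) -> pd.DataFrame:
    """Keep only the first and last max_cols//2 columns of a wide DataFrame.

    Hypothetical sketch of the truncation applied to head/tail sample rows,
    so metadata strings stay bounded even for very wide DataFrames.
    """
    if df.shape[1] <= max_cols:
        return df
    half = max_cols // 2
    keep = list(df.columns[:half]) + list(df.columns[-half:])
    return df[keep]

wide = pd.DataFrame({f"c{i}": [i] for i in range(50)})
sample = truncate_columns(wide.head(5))
assert sample.shape[1] == MAX_COLS
```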
Only install pandas and pyarrow in the Docker container when a DataFrame context is actually passed, avoiding the install overhead for string/dict contexts. Uses a lazy _ensure_pandas() method called from load_context. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
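The lazy-install pattern generalizes like this. `ensure_package` is a hypothetical stand-in for the PR's `_ensure_pandas`; the demo uses `json` (always present) so no actual pip install runs.

```python
import importlib
import importlib.util
import subprocess
import sys

def ensure_package(name: str):
    """Import `name`, pip-installing it first only if it is missing.

    Sketch of the lazy-install idea: dependency cost is paid only when a
    context type that needs the package is actually loaded.
    """
    if importlib.util.find_spec(name) is None:
        subprocess.check_call([sys.executable, "-m", "pip", "install", name])
    return importlib.import_module(name)

json_mod = ensure_package("json")  # already installed, so no pip call happens
assert json_mod.loads("[1, 2]") == [1, 2]
```

Calling this from `load_context` rather than at container startup is what keeps string/dict contexts free of the pandas/pyarrow install overhead.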
Move the top-level import of dataframe_utils in core/types.py into QueryMetadata.__init__ so the module is only loaded when actually constructing query metadata, not on every import of rlm.core.types. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
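The deferred-import pattern looks like this in miniature. This toy class only demonstrates the pattern; the real `QueryMetadata` lives in `rlm/core/types.py`, and `statistics` here is a stand-in for `dataframe_utils`.

```python
class QueryMetadata:
    """Toy sketch: import a helper module inside __init__, not at top level.

    The enclosing module stays cheap to import; the dependency is only
    loaded when metadata is actually constructed.
    """

    def __init__(self, context):
        # Deferred import: only paid on construction, not on module import.
        import statistics  # stand-in for rlm.utils.dataframe_utils

        self.size = len(context)
        self.mean = statistics.mean(context) if context else None

meta = QueryMetadata([1, 2, 3])
assert meta.mean == 2
```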
…egen Remove the unified build_context_code() which handled strings, dicts, and DataFrames with fragile escaping. Keep only build_dataframe_context_code() for DataFrame-specific Parquet-over-base64 codegen. Restore original per-environment string/dict handling in Modal, Prime, and Daytona REPLs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Closing - will PR to upstream repo instead
Summary
- `rlm.completion(prompt=df)`: serialized via Parquet so the LLM's REPL gets a real DataFrame, not a stringified table
- `QueryMetadata` generates shape, dtypes, nulls, and sample rows for the system prompt so the LLM knows the schema before writing code
- `rlm/utils/dataframe_utils.py` with detection, serialization, metadata, and codegen helpers
- Tests: `test_dataframe_utils.py`, `test_local_repl.py`, `test_types.py`

Known limitation
Persistent mode's `_setup_prompt` rebuilds the system prompt from scratch on each call, using only the latest prompt's metadata. When using `persistent=True` with multiple `completion()` calls, the LLM loses schema info for earlier contexts (e.g. `context_0`). `build_user_prompt` does note the context count, but the detailed metadata (dtypes, shape, sample rows) is only shown for the most recent context. This is a pre-existing limitation of persistent mode, not introduced by the DataFrame changes.

Test plan
- `pytest tests/test_dataframe_utils.py`: detection, Parquet roundtrip, metadata, codegen
- `pytest tests/test_local_repl.py`: DataFrame load/add in LocalREPL
- `pytest tests/test_types.py`: QueryMetadata for DataFrames
- `pytest tests/`: full suite (71 tests pass)
- `examples/rlm_dataframe_demo.ipynb` end-to-end

🤖 Generated with Claude Code