
Missing DataFrame index in Result.data #88

@menezesandre

Description


When a DataFrame is displayed, the corresponding Result's data attribute has the format {column -> [values]} (equivalent to df.to_dict(orient="list")). This means we lose the table index, which can be meaningful (e.g. it holds the group keys after a groupby). Is it possible to use a format that preserves this information?
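
To make the loss concrete, here is a small pandas-only sketch (no sandbox involved) using the same frame as the example further down. It assumes, as described above, that Result.data matches to_dict(orient="list").

```python
import pandas as pd

# Same frame as in the example: after groupby, "key" becomes the index.
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
grouped = df.groupby("key").sum()

# orient="list" produces {column -> [values]}: only the column survives.
as_list = grouped.to_dict(orient="list")
print(as_list)  # {'value': [4, 6]}

# The group keys "a" and "b" exist only in the index, which as_list omits.
print(grouped.index.tolist(), grouped.index.name)  # ['a', 'b'] key
```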

To keep consistency with pandas' to_dict, any of these options would work:

  • 'dict' (default) : dict like {column -> {index -> value}}
  • 'split' : dict like {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}
  • 'tight' : dict like {'index' -> [index], 'columns' -> [columns], 'data' -> [values], 'index_names' -> [index.names], 'column_names' -> [column.names]}
  • 'index' : dict like {index -> {column -> value}}

(Note: 'tight' is the only option that preserves the full information, including the index and column names. It requires pandas 1.4 or later.)
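
A quick way to compare these options is to run each orient on the grouped frame from the example. This is plain pandas, not the sandbox; 'tight' (and DataFrame.from_dict(..., orient="tight")) assumes pandas 1.4 or later. Every orient listed keeps the index labels, but only the 'tight' payload also carries the index name, so rebuilding from it restores "key" while rebuilding from 'split' does not.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
grouped = df.groupby("key").sum()

# Serialize the grouped frame with each orient proposed above.
outputs = {
    orient: grouped.to_dict(orient=orient)
    for orient in ("dict", "split", "tight", "index")
}
for orient, payload in outputs.items():
    print(f"{orient!r}: {payload}")

# 'tight' stores index_names, so the rebuilt frame keeps the "key" index name.
from_tight = pd.DataFrame.from_dict(outputs["tight"], orient="tight")

# 'split' has index labels but no names; rebuilding gives an unnamed index.
split = outputs["split"]
from_split = pd.DataFrame(split["data"], index=split["index"], columns=split["columns"])

print(from_tight.index.name)  # key
print(from_split.index.name)  # None
```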

Example

```python
from e2b_code_interpreter import Sandbox

code = """
import pandas as pd
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
display(df.groupby("key").sum())
"""
with Sandbox() as sandbox:
    execution = sandbox.run_code(code)

result = execution.results[0]
print("Text:")
print(result.text)
print("Data:")
print(result.data)
```

Output:

```
Text:
     value
key       
a        4
b        6
Data:
{'value': [4, 6]}
```

Expected (one of the options):

```
Data:
{'index': ['a', 'b'], 'columns': ['value'], 'data': [[4], [6]], 'index_names': ['key'], 'column_names': [None]}
```
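
For illustration only, here is a minimal sketch of how a 'tight' payload could travel as JSON and be rebuilt on the client. The helpers dataframe_to_result_data and result_data_to_dataframe are hypothetical and are not part of e2b_code_interpreter; the sketch only assumes pandas 1.4+ and the standard json module. Applied to the grouped frame, the decoded payload matches the expected Data above.

```python
import json

import pandas as pd


def dataframe_to_result_data(df: pd.DataFrame) -> str:
    """Hypothetical sandbox-side helper: serialize a DataFrame as 'tight' JSON."""
    # 'tight' keeps index labels plus index and column names.
    return json.dumps(df.to_dict(orient="tight"))


def result_data_to_dataframe(payload: str) -> pd.DataFrame:
    """Hypothetical client-side inverse: rebuild the DataFrame from the JSON payload."""
    return pd.DataFrame.from_dict(json.loads(payload), orient="tight")


df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
grouped = df.groupby("key").sum()

payload = dataframe_to_result_data(grouped)
print(json.loads(payload))

rebuilt = result_data_to_dataframe(payload)
print(rebuilt)
```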
