Partial fix for #1078 — [Add Dataframe display config] #1086
- Introduced DisplayConfig struct to manage display settings such as max_table_bytes, min_table_rows, and max_cell_length.
- Updated PyDataFrame to use DisplayConfig for rendering and displaying DataFrames.
- Added methods to configure and reset display settings, allowing users to customize their DataFrame presentation in Python.

- Added DisplayConfig struct for configuring DataFrame display in Python.
- Introduced fields: max_table_bytes, min_table_rows, and max_cell_length with default values.
- Implemented a constructor for DisplayConfig to allow optional customization.
- Updated the display_config method in PyDataFrame to return a DisplayConfig Python object.

- Introduced `configure_display` method to set customizable display options for DataFrame representation, including maximum bytes, minimum rows, and maximum cell length.
- Added `reset_display_config` method to restore default display settings.
- Implemented `display_config` property to retrieve the current display configuration.

- Implemented tests for accessing and modifying display configuration properties in the DataFrame class.
- Added `test_display_config` to verify default values of display settings.
- Created `test_configure_display` to test setting and partially updating display configuration.
- Introduced `test_reset_display_config` to ensure resetting configuration restores default values.

- Added validation to ensure max_table_bytes, min_table_rows, and max_cell_length are greater than 0 in the configure_display method of the DataFrame class.
- Updated test cases to cover zero and negative values, ensuring proper error handling.
- Enhanced existing tests to validate extreme values and confirm expected behavior for display configurations.

- Updated the DataFrame class to include a max_table_rows_in_repr parameter for display configuration.
- Enhanced configure_display method to accept max_table_rows_in_repr.
- Modified DisplayConfig struct to include max_table_rows_in_repr with a default value of 10.
- Added tests to verify the functionality of max_table_rows_in_repr in both configuration and display output.
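The commits above give `max_table_rows_in_repr` a default of 10 and test it against display output. As a rough illustration of what such a cap does to a plain-text repr, here is a minimal pure-Python sketch; the function name `render_repr`, the `|`-separated layout, and the `... N more rows` footer are hypothetical and not taken from the PR.

```python
def render_repr(rows, columns, max_table_rows_in_repr=10):
    """Render a plain-text table showing at most max_table_rows_in_repr rows.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    # Mirror the PR's rule that display limits must be greater than 0.
    if max_table_rows_in_repr <= 0:
        raise ValueError("max_table_rows_in_repr must be greater than 0")
    shown = rows[:max_table_rows_in_repr]
    lines = [" | ".join(columns)]
    lines.extend(" | ".join(str(v) for v in row) for row in shown)
    # Tell the reader how many rows were left out, if any.
    hidden = len(rows) - len(shown)
    if hidden > 0:
        lines.append(f"... {hidden} more rows")
    return "\n".join(lines)


print(render_repr([(i, i * i) for i in range(25)], ["a", "b"], max_table_rows_in_repr=3))
# a | b
# 0 | 0
# 1 | 1
# 2 | 4
# ... 22 more rows
```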
Force-pushed from 3457121 to cae89b0.
I love this, but I don't think we want users to have to set it on a per-dataframe basis.
- Enhanced the DataFrame class to set display configuration at the session context level, ensuring that changes to one DataFrame's display settings affect all DataFrames created from the same context.
- Modified the PyDataFrame struct to accept a display configuration during initialization and updated methods to reference the new display_config field instead of the previous config field.
- Added tests to verify that display configurations are shared across DataFrames in the same context and remain independent across different contexts.
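The review comment and this commit move the settings from individual DataFrames to the session context, so every DataFrame from one context sees the same settings while separate contexts stay independent. One way to get that behavior is for each DataFrame to hold a reference to its context's single config object rather than a copy. The sketch below models this in plain Python; `SketchContext`, `SketchFrame`, and `DisplaySettings` are hypothetical names, not DataFusion classes.

```python
from dataclasses import dataclass


@dataclass
class DisplaySettings:
    # Only the default of 10 is taken from the PR's commit messages.
    max_table_rows_in_repr: int = 10


class SketchContext:
    """Stand-in for a session context that owns one shared DisplaySettings."""

    def __init__(self):
        self.display = DisplaySettings()

    def frame(self, rows):
        # Pass the context itself, so frames read its settings by reference.
        return SketchFrame(rows, self)


class SketchFrame:
    """Stand-in for a DataFrame that reads settings from its parent context."""

    def __init__(self, rows, ctx):
        self._rows = rows
        self._ctx = ctx

    @property
    def display_config(self):
        return self._ctx.display

    def __repr__(self):
        limit = self.display_config.max_table_rows_in_repr
        shown = min(limit, len(self._rows))
        return f"SketchFrame(showing {shown} of {len(self._rows)} rows)"


ctx_a, ctx_b = SketchContext(), SketchContext()
first, second = ctx_a.frame(list(range(50))), ctx_a.frame(list(range(50)))
other = ctx_b.frame(list(range(50)))
first.display_config.max_table_rows_in_repr = 3
print(second)  # SketchFrame(showing 3 of 50 rows): same context, shared settings
print(other)   # SketchFrame(showing 10 of 50 rows): different context, untouched
```

Sharing by reference means a change made through any one DataFrame is visible to its siblings, which is exactly the behavior the commit's tests check.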
Thanks for reviewing this.
This reverts commit 0d5e900.
- Removed unnecessary cloning of DataFrame in various methods to enhance performance.
- Consolidated display configuration handling by removing the DisplayConfig struct and related methods.
- Updated methods to use direct references to DataFrame where applicable.
- Improved the implementation of select, filter, with_column, and other methods to work with mutable references.
- Added a new to_string method for better string representation of DataFrame.
- Cleaned up unused imports and commented-out code for better readability.
This reverts commit 0e30af3.
…ptions
- Introduced `DataframeDisplayConfig` struct to manage display settings for DataFrames.
- Added fields for maximum bytes, minimum rows, maximum cell length, and maximum rows in repr.
- Implemented a constructor with default values for easy initialization.
- Updated `PySessionConfig` to include `display_config` with default settings.

…fig (python)
- Introduced `with_dataframe_display_config` method in `SessionConfig` to allow customization of DataFrame display settings.
- Parameters include `max_table_bytes`, `min_table_rows`, `max_cell_length`, and `max_table_rows_in_repr` for flexible display configurations.
- Utilizes `DataframeDisplayConfig` for internal management of display settings.

…play options
- Introduced DataframeDisplayConfig to manage display settings for DataFrames.
- Added properties for max_table_bytes, min_table_rows, max_cell_length, and max_table_rows_in_repr.
- Each property includes getter and setter methods for easy configuration.
- Default values provided for each parameter to enhance usability.

- Updated `PyDataFrame` constructor to accept a `PyDataframeDisplayConfig` parameter for improved DataFrame display customization.
- Modified multiple methods in `PySessionContext` to pass the display configuration when creating `PyDataFrame` instances, ensuring consistent display settings across different DataFrame operations.
…Context integration
…eDisplayConfig
- Added a private method `_validate_positive` to encapsulate the logic for validating positive integer values.
- Updated setters for `max_table_bytes`, `min_table_rows`, `max_cell_length`, and `max_table_rows_in_repr` to use the new validation method, improving code readability and maintainability.

…lidation
- Added validation for max_table_bytes, min_table_rows, max_cell_length, and max_table_rows_in_repr to ensure positive values during initialization.
- Removed the deprecated with_dataframe_display_config method to streamline the configuration process.
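These commits describe a config object whose four settings each have a getter and setter, with a shared private `_validate_positive` check applied both in the setters and at construction. A minimal pure-Python sketch of that pattern follows. It is not the PR's code (which lives partly in Rust), the class name `DisplayConfigSketch` is hypothetical, and apart from `max_table_rows_in_repr = 10` the defaults are illustrative assumptions.

```python
class DisplayConfigSketch:
    """Illustrative stand-in for a DataFrame display config with validated setters."""

    def __init__(self, max_table_bytes=2 * 1024 * 1024, min_table_rows=20,
                 max_cell_length=25, max_table_rows_in_repr=10):
        # Assigning through the properties runs the same validation as the setters.
        self.max_table_bytes = max_table_bytes
        self.min_table_rows = min_table_rows
        self.max_cell_length = max_cell_length
        self.max_table_rows_in_repr = max_table_rows_in_repr

    @staticmethod
    def _validate_positive(name, value):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
        return value

    @property
    def max_table_bytes(self):
        return self._max_table_bytes

    @max_table_bytes.setter
    def max_table_bytes(self, value):
        self._max_table_bytes = self._validate_positive("max_table_bytes", value)

    @property
    def min_table_rows(self):
        return self._min_table_rows

    @min_table_rows.setter
    def min_table_rows(self, value):
        self._min_table_rows = self._validate_positive("min_table_rows", value)

    @property
    def max_cell_length(self):
        return self._max_cell_length

    @max_cell_length.setter
    def max_cell_length(self, value):
        self._max_cell_length = self._validate_positive("max_cell_length", value)

    @property
    def max_table_rows_in_repr(self):
        return self._max_table_rows_in_repr

    @max_table_rows_in_repr.setter
    def max_table_rows_in_repr(self, value):
        self._max_table_rows_in_repr = self._validate_positive(
            "max_table_rows_in_repr", value
        )
```

Because each setter validates before assigning, a rejected value raises `ValueError` and leaves the previous setting in place.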
…orrect row handling
@timsaucer
- Reduced the size of test data in the `data` fixture from 100 to 10 entries for efficiency.
- Added `normalize_uuid` function to standardize UUIDs in HTML representations for consistent testing.
- Modified the `test_display_config_in_init` to use a custom display configuration and updated assertions to compare normalized HTML outputs.
- Enhanced readability of assertions in `test_display_config_affects_repr` by formatting conditions.
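This commit adds a `normalize_uuid` helper so that HTML output containing per-render UUIDs can be compared across test runs. The PR's own implementation is not shown on this page; a plausible regex-based sketch, assuming canonical 8-4-4-4-12 hexadecimal UUIDs and a fixed all-zero placeholder (both assumptions), looks like this:

```python
import re

# Canonical textual UUID form: 8-4-4-4-12 hexadecimal digits.
_UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def normalize_uuid(html: str, placeholder: str = "00000000-0000-0000-0000-000000000000") -> str:
    """Replace every UUID in html with a fixed placeholder so outputs compare equal."""
    return _UUID_RE.sub(placeholder, html)
```

A test could then compare `normalize_uuid(df._repr_html_())` against a normalized expected string instead of the raw HTML.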
This looks great. I browsed it this morning, but it's a bit long, so I will try to make some time tomorrow for a more thorough review.
Should I move this to the Python DataFrameHtmlFormatter class as well? Then we would not need a context-level display config.
Moving those over to your other work sounds like a great way to have one point of processing for all of these display options. I really love how all this work is coming together!
Closing this. The configuration is being moved from Rust to Python in #1119.
Which issue does this PR close?
Closes #1078
Rationale for this change
This PR introduces a customizable display configuration for DataFrames in the Python DataFusion API. Users often need more control over how large datasets are rendered in terminals and notebooks. This feature improves usability by letting users set a byte limit, row limits, and a maximum cell length for display.
This makes it easier to work with large or verbose data interactively, improving developer experience and making DataFusion more notebook-friendly.
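To make the three rendering limits concrete, here is a minimal sketch of how they might interact when building a display: rows are taken until `max_table_bytes` is spent, but never fewer than `min_table_rows` when that many exist, and any cell longer than `max_cell_length` characters is shortened. The helper names, the `...` marker, and the byte estimate (UTF-8 length of each value's `str()`) are assumptions; a Rust implementation would more likely measure Arrow record batches.

```python
def select_rows(rows, max_table_bytes, min_table_rows):
    """Take rows until the byte budget is spent, but never fewer than min_table_rows.

    Returns (selected_rows, was_truncated). Hypothetical helper for illustration.
    """
    selected = []
    used = 0
    for row in rows:
        # Rough size estimate: UTF-8 bytes of each value's string form.
        size = sum(len(str(v).encode("utf-8")) for v in row)
        # Stop only once the budget would be exceeded AND the minimum is met.
        if used + size > max_table_bytes and len(selected) >= min_table_rows:
            break
        selected.append(row)
        used += size
    return selected, len(selected) < len(rows)


def truncate_cell(value, max_cell_length):
    """Shorten a cell's text to max_cell_length characters, marking the cut with '...'."""
    text = str(value)
    if len(text) <= max_cell_length:
        return text
    return text[:max_cell_length] + "..."


rows = [("abcd",) for _ in range(100)]  # each row is 4 bytes
print(len(select_rows(rows, max_table_bytes=40, min_table_rows=2)[0]))  # 10
print(len(select_rows(rows, max_table_bytes=1, min_table_rows=5)[0]))   # 5
print(truncate_cell("hello world", 5))                                  # hello...
```

Checking the minimum row count inside the stop condition means a tiny byte budget still yields a useful preview, while a generous budget is capped by the data itself.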
What changes are included in this PR?
- Added a `DataframeDisplayConfig` class in Python for customizing DataFrame display settings:
  - `max_table_bytes`
  - `min_table_rows`
  - `max_cell_length`
  - `max_table_rows_in_repr`
- Added a `with_display_config()` method to `SessionContext` for setting global display options.
- Updated `PyDataFrame` display methods (`_repr_html_`, `__repr__`) to use the provided configuration.

Are there any user-facing changes?
✅ Yes
Users can now customize how DataFrames are rendered: