ENH: Adding DataFrame plotting benchmarks for large datasets #61546
base: main
@rhshadrach do you have an opinion about this?
asv_bench/benchmarks/plotting.py
def time_plot_large_dataframe_single_column(self, size, datetime_index):
    """Baseline: plotting single column for comparison"""
    self.df.iloc[:, 0].plot()
Can you do the `iloc` operation in `setup` so it's not part of the timing?
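A minimal sketch of what this request amounts to, assuming the frame is built in `setup`. The class name `SingleColumnBaseline` and the fixed 1000 x 100 random frame are hypothetical and chosen only for illustration; asv runs `setup` and `teardown` outside the timed region, so only the `plot` call is measured.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no display required

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


class SingleColumnBaseline:
    # Hypothetical, parameter-free version of the single-column benchmark.
    def setup(self):
        data = np.random.default_rng(42).standard_normal((1000, 100))
        self.df = pd.DataFrame(data)
        # Select the column here: asv does not include setup in the timing.
        self.series = self.df.iloc[:, 0]

    def time_plot_large_dataframe_single_column(self):
        """Baseline: plotting single column for comparison"""
        self.series.plot()  # only the plot call is timed

    def teardown(self):
        plt.close("all")  # release the figure created by the timed call
```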
Updated!
54ee8b8 to 2324563
@shadnikn, what is the runtime of these benchmarks on your machine?
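One rough way to get ballpark numbers outside asv is sketched below; within asv, `asv run --bench <regex>` restricts a run to matching benchmarks. The helper `time_plot` is hypothetical, uses random data, and, like the benchmark methods, times only the `DataFrame.plot` call rather than a full canvas draw. It is not how asv itself measures or reports timings.

```python
import time

import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is required

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def time_plot(shape, datetime_index, repeats=3):
    """Best-of-`repeats` wall-clock time, in seconds, of one DataFrame.plot call."""
    rows, cols = shape
    if datetime_index:
        index = pd.date_range("2020-01-01", periods=rows, freq="min")
    else:
        index = pd.RangeIndex(rows)
    data = np.random.default_rng(0).standard_normal((rows, cols))
    df = pd.DataFrame(data, index=index)

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        df.plot()
        timings.append(time.perf_counter() - start)
        plt.close("all")  # close the figure so runs do not accumulate memory
    return min(timings)
```

For example, `time_plot((1000, 100), True)` returns the fastest of three runs for a 1000 x 100 frame with a `DatetimeIndex`.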
Looks good, just some nitpicks.
[(1000, 10), (1000, 50), (1000, 100), (5000, 20), (10000, 10)],
[True, False]  # DatetimeIndex or not
I think this is too many cases. Can you just use `[(1000, 100)]`?
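To make "too many cases" concrete: asv runs each parameterized benchmark method once per combination of its parameter lists (their Cartesian product). The sketch below just counts those combinations for the PR's original grid and for the suggested one; `count_cases` is a hypothetical helper, not part of asv.

```python
from itertools import product

# asv runs every combination of the parameter lists (their Cartesian product).
original_params = [
    [(1000, 10), (1000, 50), (1000, 100), (5000, 20), (10000, 10)],
    [True, False],  # DatetimeIndex or not
]
suggested_params = [
    [(1000, 100)],
    [True, False],
]


def count_cases(params):
    """Number of parameter combinations asv would run per benchmark method."""
    return len(list(product(*params)))


print(count_cases(original_params))   # 10
print(count_cases(suggested_params))  # 2
```

Each of those combinations is repeated for every `time_*` method in the class, so the reduction compounds with the multi-column and single-column variants.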
@@ -161,4 +161,45 @@ def time_get_plot_backend_fallback(self):
        _get_plot_backend("pandas_dummy_backend")


from .pandas_vb_common import setup  # noqa: F401 isort:skip
class DataFramePlottingLarge:
I think what differentiates this from `FramePlotting` is that these are wide. Can you rename this class to `FramePlottingWide`?
    [(1000, 10), (1000, 50), (1000, 100), (5000, 20), (10000, 10)],
    [True, False]  # DatetimeIndex or not
]
param_names = ["size", "datetime_index"]
Can you rename `size` to `shape`?
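A sketch of how the benchmark class might look with the review suggestions applied: the class renamed to `FramePlottingWide`, `size` renamed to `shape`, a single `(1000, 100)` shape, and the `iloc` selection moved into `setup`. The method names `time_plot_wide_dataframe` and `time_plot_single_column`, the random data, and the minute-frequency `DatetimeIndex` are assumptions for illustration, not the PR's final code.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for benchmarking

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


class FramePlottingWide:
    # Review suggestions applied: one wide shape, and "size" renamed to "shape".
    params = [
        [(1000, 100)],
        [True, False],  # DatetimeIndex or not
    ]
    param_names = ["shape", "datetime_index"]

    def setup(self, shape, datetime_index):
        rows, cols = shape
        if datetime_index:
            index = pd.date_range("2020-01-01", periods=rows, freq="min")
        else:
            index = pd.RangeIndex(rows)
        data = np.random.default_rng(42).standard_normal((rows, cols))
        self.df = pd.DataFrame(data, index=index)
        self.series = self.df.iloc[:, 0]  # selected outside the timed region

    def time_plot_wide_dataframe(self, shape, datetime_index):
        # Hypothetical name: plots every column of the wide frame.
        self.df.plot()

    def time_plot_single_column(self, shape, datetime_index):
        """Baseline: plotting single column for comparison"""
        self.series.plot()

    def teardown(self, shape, datetime_index):
        plt.close("all")  # avoid accumulating open figures between runs
```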
Adding in performance benchmarks for `DataFrame.plot` with large datasets, related to #61398 (`DataFrame.plot` with `DatetimeIndex`) and #61532 (ENH: speed up `DataFrame.plot` using `LineCollection`). The benchmarks test several DataFrame shapes with and without a `DatetimeIndex` and provide a single-column baseline for comparison.
- DataFrame shapes: (1000, 10), (1000, 50), (1000, 100), (5000, 20), (10000, 10)
- `DatetimeIndex` vs. regular index
- Multi-column vs. single-column plotting