-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/pandas Performance Warning Issue due to multiple frame.insert
#4246
Changes from all commits
413d41e
fbfd4a8
3a4b466
f028571
d4955b1
940f25b
97b214e
f1ed3d7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
[flake8] | ||
max-line-length = 88 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,11 @@ | ||
import plotly.express as px | ||
import plotly.graph_objects as go | ||
import pandas as pd | ||
import numpy as np | ||
from plotly.express._core import build_dataframe, _is_col_list | ||
from pandas.testing import assert_frame_equal | ||
import pytest | ||
import warnings | ||
|
||
|
||
def test_is_col_list(): | ||
|
@@ -847,3 +849,29 @@ def test_line_group(): | |
assert len(fig.data) == 4 | ||
fig = px.scatter(df, x="x", y=["miss", "score"], color="who") | ||
assert len(fig.data) == 2 | ||
|
||
|
||
def test_no_pd_perf_warning(): | ||
n_cols = 1000 | ||
n_rows = 1000 | ||
|
||
columns = list(f"col_{c}" for c in range(n_cols)) | ||
index = list(f"i_{r}" for r in range(n_rows)) | ||
|
||
df = pd.DataFrame( | ||
np.random.uniform(size=(n_rows, n_cols)), index=index, columns=columns | ||
) | ||
|
||
with warnings.catch_warnings(record=True) as warn_list: | ||
_ = px.bar( | ||
df, | ||
x=df.index, | ||
y=df.columns[:-2], | ||
labels=df.columns[:-2], | ||
) | ||
performance_warnings = [ | ||
warn | ||
for warn in warn_list | ||
if issubclass(warn.category, pd.errors.PerformanceWarning) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. does that mean there are other warnings emitted during this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. there might be. but this test is for checking this warning only. we can change it look out for any pandas warning |
||
] | ||
assert len(performance_warnings) == 0, "PerformanceWarning(s) raised!" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why this change is required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally, it was creating series without a name. I set it as None in case this function was used externally to avoid breaking compatibility.
When we create a dataframe from a dict, it's safer to have named series. Also, it would be easier to debug in line to see which series was created in order to know the column that caused the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change might not be necessary, but i like the explicitness. Can be useful during debugging.