BUG: why arrow only work on mac arm? #60714

Open
3 tasks done
wonb168 opened this issue Jan 13, 2025 · 3 comments
Labels
Bug, Needs Triage (issue that has not been reviewed by a pandas team member)

Comments

wonb168 commented Jan 13, 2025

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

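import pandas as pd
import pyarrow as pa
from typing import Literal

pd.options.mode.copy_on_write = True


def is_arrow_backed(df) -> bool:
    # Stand-in for the backend check (pandas has no pd.api.types.is_dtype_backend):
    # True when every column uses a pyarrow-backed dtype.
    return all(isinstance(dtype, pd.ArrowDtype) for dtype in df.dtypes)


def df_merge(
    left,
    right,
    how: Literal["left", "right", "inner", "outer", "cross"] = "inner",
    on=None,
    left_on=None,
    right_on=None,
    left_index: bool = False,
    right_index: bool = False,
    sort: bool = False,
    suffixes=("_x", "_y"),
    copy: bool = True,
    indicator: bool = False,
    validate=None,
):
    # Convert numpy-backed frames to pyarrow-backed ones before merging.
    # types_mapper is required; without it to_pandas() returns numpy dtypes again.
    if not is_arrow_backed(left):
        left = pa.Table.from_pandas(left).to_pandas(types_mapper=pd.ArrowDtype)
    if not is_arrow_backed(right):
        right = pa.Table.from_pandas(right).to_pandas(types_mapper=pd.ArrowDtype)
    # copy is not forwarded: it has no effect once copy-on-write is enabled.
    return pd.merge(left, right, how=how, on=on, left_on=left_on, right_on=right_on,
                    left_index=left_index, right_index=right_index, sort=sort,
                    suffixes=suffixes, indicator=indicator, validate=validate)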

Issue Description

I have a Python project that reads pickle files and uses pandas 2.1.
It takes 107 s to run on x86 CentOS 7, but only 71 s on a Mac M2.
I then upgraded pandas to 2.2.3, set pd.options.mode.copy_on_write = True,
and edited the function df_merge (the most time-consuming function)
to convert the DataFrames to Arrow first.
After that, the run takes only 35 s on the Mac.
On x86 CentOS 7, however, the same code still takes about 100 s. Why does Arrow have no effect there?

Expected Behavior

Using Arrow should make the run about twice as fast on x86 CentOS 7 as well, but it has no effect.

Installed Versions

2.2.3

wonb168 added the Bug and Needs Triage labels Jan 13, 2025

speco29 commented Jan 17, 2025

There can be a few reasons for that, such as:
Pandas Options: Double-check the pandas options you're using. Sometimes tweaking options like pd.options.mode.chained_assignment or pd.options.mode.use_inf_as_na can have an impact on performance.

Arrow Installation: Ensure that Arrow is properly installed and configured on your CentOS 7 system. Sometimes missing dependencies or incorrect configurations can affect performance.

wonb168 commented Jan 17, 2025

But with the same data and code, it is faster on the Mac and has no effect on the x86 server.

wonb168 commented Jan 22, 2025

How do I set the default backend to Arrow for DataFrames in pandas 2.2.3?
And if a DataFrame is not Arrow-backed, how do I convert it?
