Some feedback #1

@Dr-Irv

I saw your note to the pandas-dev list and have some feedback:

  1. I don't agree with your statement about querying data. The .query method is easier to read and understand. Plain boolean-mask expressions can be difficult to parse when you have complex conditions and long DataFrame names, for example:
my_big_dataframe[(my_big_dataframe['order_date'] >= "20201001")
                 & (my_big_dataframe['order_date'] <= "20201031")
                 & (my_big_dataframe['customer'] == "Apple")]

versus

my_big_dataframe.query('order_date >= "20201001" and order_date <= "20201031" and customer == "Apple"')

In addition, "query" statements can be dynamically formatted.

If you believe otherwise, could you add some text explaining why you don't prefer .query?
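
To make the "dynamically formatted" point concrete, here is a minimal sketch. The data, the helper names (orders_between, build_filter) and the variable names are hypothetical; only the column names come from my example above. It shows two ways to parameterize a query: @-references to local variables, and building the expression string at run time.

```python
import pandas as pd

# Hypothetical data; only the column names come from the example above.
orders = pd.DataFrame(
    {
        "order_date": ["20200915", "20201005", "20201020", "20201102"],
        "customer": ["Apple", "Apple", "Google", "Apple"],
    }
)


def orders_between(df, start, end, customer):
    """Filter with .query, pulling the bounds in through @-references."""
    return df.query(
        "order_date >= @start and order_date <= @end and customer == @customer"
    )


def build_filter(**equals):
    """Build a query string at run time from column=value pairs."""
    return " and ".join(f"{col} == {val!r}" for col, val in equals.items())


october_apple = orders_between(orders, "20201001", "20201031", "Apple")
google_only = orders.query(build_filter(customer="Google"))

# The same October filter written as a boolean mask, for comparison.
# Each comparison needs its own parentheses because & binds tighter than >=.
mask = (
    (orders["order_date"] >= "20201001")
    & (orders["order_date"] <= "20201031")
    & (orders["customer"] == "Apple")
)
october_apple_mask = orders[mask]
```

Both forms select the same rows here; the difference is only in how easy the condition is to read and to assemble from parameters.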

  2. You might want to start using the new nullable types (String, Int64, etc.) and pd.NA in your examples.
  3. In the "column selection" section, one advantage of using something like df.column is that if you are in a notebook, you can get autocompletion, which can help with long column names. But your point that not all names will work is also correct.
  4. You might want to take a look at Tom Augspurger's "Modern Pandas" for more ideas: https://tomaugspurger.github.io/modern-1-intro.html
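
As a rough illustration of the nullable-types and attribute-access points, here is a small sketch. All values and column names are made up for the example; it only assumes a pandas version that ships the nullable extension dtypes.

```python
import pandas as pd

# Hypothetical values; they only illustrate the dtypes and lookups.
qty_default = pd.Series([1, 2, None])                  # missing value forces float64
qty_nullable = pd.Series([1, 2, None], dtype="Int64")  # stays integer, missing is pd.NA
label = pd.Series(["a", None, "c"], dtype="string")    # dedicated string dtype

print(qty_default.dtype, qty_nullable.dtype)
print(qty_nullable[2] is pd.NA)

# Attribute access is handy for autocompletion, but not every name qualifies.
df = pd.DataFrame({"qty": [1, 2], "unit price": [9.5, 3.0], "count": [4, 5]})
via_attribute = df.qty     # same column as df["qty"]
shadowed = df.count        # the DataFrame.count method, not the "count" column
spaced = df["unit price"]  # a name containing a space needs bracket access
```

The last three lines show both sides of the column-selection trade-off: df.qty completes nicely in a notebook, while names that collide with methods or contain spaces only work with brackets.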

Hope this helps.
