I saw your note to the pandas-dev list and have some feedback:
- I don't agree with your statement about querying data. The .query method is easier to read and understand. Plain pandas boolean expressions can be difficult to parse when you have complex conditions and long DataFrame names, for example:
my_big_dataframe[(my_big_dataframe['order_date'] >= "20201001")
                 & (my_big_dataframe['order_date'] <= "20201031")
                 & (my_big_dataframe['customer'] == "Apple")]
versus
my_big_dataframe.query('order_date >= "20201001" and order_date <= "20201031" and customer == "Apple"')
In addition, .query expressions are plain strings, so they can be built dynamically.
If you believe otherwise, could you add some text explaining why you don't prefer .query?
- You might want to start using the new nullable types (string, Int64, etc.) and pd.NA in your examples.
- In the "column selection" section, one advantage of using something like df.column is that if you are in a notebook, you can get autocompletion, which can help with long column names. But your point that not all column names work as attributes is also correct.
- You might want to take a look at Tom Augspurger's "Modern Pandas" for more ideas: https://tomaugspurger.github.io/modern-1-intro.html
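To make the first two points concrete, here is a minimal sketch. The sample data, the helper name orders_in_range, and the quantity values are made up for illustration; only the order_date and customer column names come from the example above. It shows the boolean-mask and .query forms selecting the same rows, a .query expression parameterised with local variables via the @ prefix, and a nullable Int64 column holding pd.NA.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the example above.
orders = pd.DataFrame(
    {
        "order_date": ["20200915", "20201005", "20201020", "20201101"],
        "customer": ["Apple", "Apple", "Google", "Apple"],
    }
)

# Boolean-mask form: each comparison needs its own parentheses,
# because & binds more tightly than >=, <= and ==.
mask = (
    (orders["order_date"] >= "20201001")
    & (orders["order_date"] <= "20201031")
    & (orders["customer"] == "Apple")
)
by_mask = orders[mask]


def orders_in_range(df, start, end, customer):
    """Hypothetical helper: filter with a parameterised .query expression."""
    # "@name" inside the expression refers to a local Python variable.
    return df.query(
        "order_date >= @start and order_date <= @end and customer == @customer"
    )


by_query = orders_in_range(orders, "20201001", "20201031", "Apple")

# Nullable integer dtype: the missing value is stored as pd.NA and the
# column stays integer-typed instead of being upcast to float.
quantity = pd.Series([10, None, 7], dtype="Int64")
missing = quantity.isna()
total = quantity.sum()  # missing values are skipped by default
```

Because the dates here are fixed-width YYYYMMDD strings, plain string comparison orders them correctly; with real datetime columns the same two forms apply.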
Hope this helps.