H0TB0X420

  • Strip quotes from column names in drop() method
  • Maintains consistency with other DataFrame operations
  • Both drop('col') and drop('"col"') now work

Rationale: CSV files with capitalized headers require quotes in select() operations but drop() failed when quotes were provided, creating inconsistent behavior across DataFrame methods.

User facing changes: users can now use either drop('col') or drop('"col"') consistently, matching the behavior of other DataFrame operations like select().
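The behavior described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from this PR: `strip_outer_quotes` and `drop_columns` are hypothetical helpers, and a list of strings stands in for the DataFrame schema.

```python
from __future__ import annotations


def strip_outer_quotes(name: str) -> str:
    """Remove one pair of surrounding double quotes, if present (hypothetical helper)."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]
    return name


def drop_columns(existing: list[str], *to_drop: str) -> list[str]:
    """Return the columns that remain after dropping; matching is case-sensitive."""
    targets = {strip_outer_quotes(col) for col in to_drop}
    return [col for col in existing if col not in targets]


columns = ["Id", "Name", "score"]
print(drop_columns(columns, "Name"))    # ['Id', 'score']
print(drop_columns(columns, '"Name"'))  # ['Id', 'score']
```

With this normalization the quoted and unquoted spellings name the same column, while `drop_columns(columns, "name")` leaves the list unchanged because case is respected.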

Closes #1212

This is one of my first PRs, please let me know what I can improve!

Fixes apache#1212
@HeWhoHeWho

HeWhoHeWho commented Sep 19, 2025

Hi, thanks for the PR.

> Rationale: CSV files with capitalized headers require quotes in select() operations but drop() failed when quotes were provided, creating inconsistent behavior across DataFrame methods.
>
> User facing changes: users can now use either drop('col') or drop('"col"') consistently, matching the behavior of other DataFrame operations like select().

On CSV files with capitalised column headers: I just tested select('col') without double quotes in the current version, and it raised an exception, FieldNotFound. I believe select() only resolves a capitalised column header when the name is wrapped in double quotes, i.e. select('"col"').
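One plausible explanation for the FieldNotFound error is SQL-style identifier normalization. The sketch below assumes a Postgres-like rule in which unquoted names are folded to lowercase and double-quoted names keep their case. It is not the library's actual code: `normalize_identifier` and `resolve_column` are hypothetical names, and `KeyError` stands in for the real exception.

```python
from __future__ import annotations


def normalize_identifier(name: str) -> str:
    """Assumed rule: quoted names keep their case, unquoted names are lowercased."""
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        return name[1:-1]
    return name.lower()


def resolve_column(schema: list[str], name: str) -> str:
    """Look up a column after normalization; KeyError stands in for FieldNotFound."""
    resolved = normalize_identifier(name)
    if resolved not in schema:
        raise KeyError(f"FieldNotFound: {resolved}")
    return resolved


schema = ["Name", "score"]  # "Name" mimics a capitalised CSV header
print(resolve_column(schema, '"Name"'))  # Name
try:
    resolve_column(schema, "Name")  # folded to "name", which is not in the schema
except KeyError as err:
    print("lookup failed:", err)
```

Under this assumption, only the quoted form reaches the capitalised header, which matches the behavior reported above.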

Of course, it would be great if one could choose whether or not to use double quotes when referring to a capitalised column header in DataFrame operations such as select(), drop(), sort(), etc.

Let me know if I misunderstood the context.

Comment on lines 415 to 417
Returns:
DataFrame with those columns removed in the projection.
"""
Contributor

Thank you for the PR. This looks good. One request: would you mind updating the docstring to specify that column case is respected and that names do not need double quotes, unlike other operations such as select? You can also note that leading and trailing double quotes (") are allowed.

Development

Successfully merging this pull request may close these issues.

Drop column syntax inconsistency