fix: ak.to_dataframe() converting None to string "nan" in string columns #3716
Draft
… string Co-authored-by: TaiSakuma <[email protected]>
Copilot (AI) changed the title from "[WIP] Fix ak.to_dataframe() handling of None values" to "Fix ak.to_dataframe() converting None to string "nan" in string columns" on Nov 5, 2025.
Member: Can you follow …

Member: @copilot Please update this PR to follow …
Fix `ak.to_dataframe()` converting `None` to string `"nan"` in string columns

Problem
When converting an Awkward Array with `None` values in string columns to a DataFrame, `None` was converted to the literal string `"nan"`, making it indistinguishable from actual `"nan"` string values.

Solution
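A minimal NumPy sketch of the difference between the old and new fill strategies follows. It uses a hand-built `numpy.ma.MaskedArray` as a stand-in for the column that `ak_to_dataframe.py` actually handles (an assumption; the real column is produced internally by Awkward), and applies the two expressions named in this PR to it.

```python
import numpy as np

# Hand-built stand-in for a masked string column (assumption: the real column
# inside ak_to_dataframe.py is a numpy.ma.MaskedArray of str or bytes).
# Entry 1 is missing (None); entry 2 is a genuine "nan" string.
column = np.ma.MaskedArray(np.array(["a", "", "nan"]), mask=[False, True, False])

# Old path: every masked slot is filled with the literal string "nan".
old = np.ma.filled(column, "nan")

# New path: masked slots become None, other values are kept, result is object dtype.
new = np.where(column.mask, None, column.data).astype(object)

print(old.tolist())  # ['a', 'nan', 'nan']: the missing entry collides with the real "nan"
print(new.tolist())  # ['a', None, 'nan']: missing and "nan" stay distinct
```

The object dtype is what allows `None` to coexist with `str` (or `bytes`) values in the same array; a fixed-width NumPy string dtype has no slot for a missing value, which is why the old code needed a sentinel string.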
Modified `ak_to_dataframe.py` to convert masked string/bytestring arrays to object arrays with `None` values instead of filling them with the string `"nan"`. This allows pandas to handle missing values properly while preserving the distinction between `None` and the string `"nan"`.

Changes Made
- `/src/awkward/operations/ak_to_dataframe.py`: replaced `numpy.ma.filled(column, "nan")` with `numpy.where(column.mask, None, column.data).astype(object)` for string/bytestring columns
- Added `test_3713_to_dataframe_none_vs_nan_string.py` (following the CONTRIBUTING.md naming convention)
- Updated `test_3692_to_dataframe_masked_string_dtype_resize.py` and `test_0331_pandas_indexedarray.py` to expect `None` instead of `"nan"`

Verification
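A pandas-level check of the same distinction is sketched below. It works on hand-built masked columns rather than calling `ak.to_dataframe()` itself, and the helper name `to_object_with_none` is hypothetical; it simply mirrors the expression this PR uses for string/bytestring columns. Only `isna()` results and the stored `"nan"` values are checked, since newer pandas versions may store the string column with a dedicated string dtype that represents `None` as `NaN`.

```python
import numpy as np
import pandas as pd


def to_object_with_none(column):
    # Hypothetical helper mirroring the fixed path: masked -> None, object dtype.
    return np.where(column.mask, None, column.data).astype(object)


# Hand-built masked columns: entry 1 is missing, entry 2 is a literal "nan".
strings = np.ma.MaskedArray(np.array(["a", "", "nan"]), mask=[False, True, False])
bytestrings = np.ma.MaskedArray(np.array([b"a", b"", b"nan"]), mask=[False, True, False])

df = pd.DataFrame(
    {
        "s": to_object_with_none(strings),
        "b": to_object_with_none(bytestrings),
    }
)

# Only the genuinely missing row is reported as missing.
print(df["s"].isna().tolist())  # [False, True, False]
print(df["b"].isna().tolist())  # [False, True, False]

# With the old "nan" fill, pandas cannot see any missing value at all.
old_series = pd.Series(np.ma.filled(strings, "nan"))
print(old_series.isna().tolist())  # [False, False, False]
```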
- `None` and `"nan"` are now distinguishable
- `None` and `b"nan"` are distinguishable
- `None` still shows as `NaN` for numeric types

Security Summary
No security vulnerabilities introduced. CodeQL analysis found 0 alerts.
Fixes #3713
Original prompt

`ak.to_dataframe()` turns `None` into the string `"nan"` #3713