Skip to content

Fix #1639: stats1 null_count with --fr regex gives wrong results#1994

Open
cobyfrombrooklyn-bot wants to merge 1 commit intojohnkerl:mainfrom
cobyfrombrooklyn-bot:fix-issue-1639
Open

Fix #1639: stats1 null_count with --fr regex gives wrong results#1994
cobyfrombrooklyn-bot wants to merge 1 commit intojohnkerl:mainfrom
cobyfrombrooklyn-bot:fix-issue-1639

Conversation

@cobyfrombrooklyn-bot
Copy link

Problem

When using --fr (regex field selector) with stats1 -a null_count, void (empty) field values are always reported as 0, even when there are actual empty fields. Using -f with explicit field names correctly counts them.

Example from #1639:

# With -f: correctly reports Type_null_count=3
mlr --icsv --ifs "|" stats1 -a null_count -f "Type" input.csv

# With --fr: incorrectly reports Type_null_count=0
mlr --icsv --ifs "|" stats1 -a null_count --fr ".*" input.csv

Root Cause

In stats1.go, ingestWithoutValueFieldRegexes (the non-regex path) has a special case at line 532 that allows void values through for null_count accumulators:

if valueFieldValue.IsVoid() {
    if accumulatorName != "null_count" {
        continue
    }
}

But ingestWithValueFieldRegexes (the regex path) unconditionally skips void values:

if valueFieldValue.IsVoid() {
    continue
}

Fix

Added the same null_count exception to the regex path, making it consistent with the non-regex path.

Test

Added regression test case test/cases/verb-stats1-regexed-field-names/0008 with a pipe-delimited CSV containing empty fields. Verifies that --fr ".*" produces the same null_count results as explicit -f. All 49 stats1-related regression tests pass.

Tested locally on macOS ARM (Apple Silicon).

When using --fr (regex field selector) with stats1 -a null_count, void
(empty) field values were unconditionally skipped in
ingestWithValueFieldRegexes, causing null_count to always report 0.

The non-regex path (ingestWithoutValueFieldRegexes) already had a special
case that allows void values through for null_count accumulators. This
commit adds the same exception to the regex path.

Fixes johnkerl#1639
Copy link
Owner

@johnkerl johnkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!!!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants