Skip to content

fix(cmor): make make_simple_varlist return unique variables across all datetimes#804

Open
Copilot wants to merge 4 commits intomainfrom
copilot/fix-unit-test-deduplication
Open

fix(cmor): make make_simple_varlist return unique variables across all datetimes#804
Copilot wants to merge 4 commits intomainfrom
copilot/fix-unit-test-deduplication

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 27, 2026

make_simple_varlist used glob.iglob to grab a single probe file and extracted its datetime component to build the search pattern *{datetime}*.nc. Because iglob returns files in filesystem inode-hash order, the probe was non-deterministic. If it landed on a datetime that covered only a subset of variables (e.g. model.19900201.temp.nc in a dir also containing model.19900101.temp.nc + model.19900101.salt.nc), the search missed variables present at other datetimes. PR #786's 42 commits recycled inodes in CI, flipping the glob order so the minority datetime was consistently selected first — turning a latent flake into a reproducible failure.

Describe your changes

cmor_finder.py — scan all files, collect unique variables across all datetimes

  • Replace next(glob.iglob("*.*.nc")) probe with glob.glob("*.*.nc") to collect all matching files up front
  • Remove the datetime-counting/picking logic and the second glob.glob re-search entirely — the initial glob already has every file
  • Iterate once over all files, extracting variable names and assigning into a dict; dict key assignment naturally deduplicates across datetimes
  • Variables present at any datetime — majority or minority — are now always included
# before — non-deterministic: first file glob returns determines the search window,
#          variables at other datetimes can be silently missed
one_file = next(glob.iglob(os.path.join(dir_targ, "*.*.nc")), None)
one_datetime = os.path.basename(one_file).split('.')[-3]
files = glob.glob(os.path.join(dir_targ, f"*{one_datetime}*.nc"))

# after — deterministic: scan all files, deduplicate via dict assignment
var_list: Dict[str, str] = {}
for f in glob.glob(os.path.join(dir_targ, "*.*.nc")):
    var_name = os.path.basename(f).split('.')[-2]
    var_list[var_name] = var_name

test_cmor_finder_make_simple_varlist.py — test hygiene

  • Removed test_make_simple_varlist_index_error_on_datetime: the try/except IndexError datetime-extraction branch no longer exists
  • Simplified test_make_simple_varlist_no_files_matching_pattern: now patches the single glob.glob call directly
  • Updated stale docstrings on test_make_simple_varlist_deduplicates and test_make_simple_varlist_dedup_across_datetimes to reflect the new all-datetimes behaviour
  • Added test_make_simple_varlist_minority_datetime_var_included to explicitly pin that variables appearing at only a minority datetime are still returned

Issue ticket number and link (if applicable)

Checklist before requesting a review

  • I ran my code
  • I tried to make my code readable
  • I tried to comment my code
  • I wrote a new test, if applicable
  • I wrote new instructions/documentation, if applicable
  • I ran pytest and inspected it's output
  • I ran pylint and attempted to implement some of it's feedback
  • No print statements; all user-facing info uses logging module

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Replace the non-deterministic glob.iglob single-file probe with
glob.glob to collect all *.*.nc files, count them by their datetime
component, and pick the datetime with the most files. This avoids
flaky behaviour caused by filesystem inode ordering (which changed
after PR #786 merged).

Update test_make_simple_varlist_index_error_on_datetime to patch
glob.glob (with an os.path.basename check) instead of the now-unused
glob.iglob. Also tighten the stale comment in
test_make_simple_varlist_dedup_across_datetimes.

Agent-Logs-Url: https://github.com/NOAA-GFDL/fre-cli/sessions/ef0227dc-6efa-4218-88d6-30d1d64cd2be

Co-authored-by: ilaflott <[email protected]>
Copilot AI changed the title [WIP] Fix unit test for test_make_simple_varlist_dedup fix(cmor): make make_simple_varlist datetime selection deterministic Mar 27, 2026
Copilot AI requested a review from ilaflott March 27, 2026 19:15
@ilaflott ilaflott marked this pull request as ready for review March 27, 2026 19:28
@ilaflott ilaflott requested review from ceblanton and singhd789 March 27, 2026 19:28
@ilaflott ilaflott added the bug Something isn't working label Mar 27, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.18%. Comparing base (9c36131) to head (e4bb057).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #804      +/-   ##
==========================================
- Coverage   84.20%   84.18%   -0.03%     
==========================================
  Files          71       71              
  Lines        4971     4975       +4     
==========================================
+ Hits         4186     4188       +2     
- Misses        785      787       +2     
Flag Coverage Δ
unittests 84.18% <100.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
fre/cmor/cmor_finder.py 98.16% <100.00%> (-1.84%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 71e7da2...e4bb057. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

since i'm seeings others with builds on `3.14`, lets toss this in
search_pattern = None
try:
one_datetime = os.path.basename(one_file).split('.')[-3]
# Count files per datetime stamp and pick the most common one.
Copy link
Copy Markdown
Contributor

@singhd789 singhd789 Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I'm not fully understanding is why the datetime with the most files is picked. I understand it's for the representative datetime, but is that just to have like a full list of variables across datetimes? Would other (maybe not representative) datetimes have other variables present at all?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the new form, it is 100% possible that other datetimes that do not have the most number of files do not contain a variable under another datetime. the function isn't supposed to remove all ambiguity for the user nor be an "ironclad" solid generator of variable lists, hence the function name, make_simple_var_list. it's supposed to get a user started

@ilaflott
Copy link
Copy Markdown
Member

something is going on... i never felt like fre-cli was tortured by flakey tests until recently. we may have another flakey test in fre/make/tests/test_create_checkout.py::test_checkout_execute

second attempt that succeeded on rerun (no changes) here

first attempt that failed here and error:

...
fre/make/tests/test_create_checkout.py ...F...                           [ 42%]
...
...
=================================== FAILURES ===================================
____________________________ test_checkout_execute _____________________________

monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f811c8f4c90>

    def test_checkout_execute(monkeypatch):
        """
        check if --execute option works
        """
        # Set output directory as home for fre make output
        monkeypatch.setenv("TEST_BUILD_DIR", OUT)
    
        shutil.rmtree(f"{OUT}/fremake_canopy/test", ignore_errors=True)
>       create_checkout_script.checkout_create(YAMLFILE,
                                               PLATFORM,
                                               TARGET,
                                               no_parallel_checkout = False,
                                               njobs = 2,
                                               execute = True,
                                               force_checkout = False)

fre/make/tests/test_create_checkout.py:69: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
fre/make/create_checkout_script.py:168: in checkout_create
    baremetal_checkout_write(model_yaml, src_dir, jobs_str, parallel_cmd, execute)
fre/make/create_checkout_script.py:57: in baremetal_checkout_write
    fre_checkout.run()
fre/make/gfdlfremake/checkout.py:128: in run
    subprocess.run(args=[self.src+"/"+self.fname], check=True)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = None, capture_output = False, timeout = None, check = True
popenargs = ()
kwargs = {'args': ['fre/make/tests/checkout_out/fremake_canopy/test/null_model_full/src/checkout.sh']}
process = <Popen: returncode: 128 args: ['fre/make/tests/checkout_out/fremake_canopy/t...>
stdout = None, stderr = None, retcode = 128

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.
    
        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them,
        or pass capture_output=True to capture both.
    
        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.
    
        If timeout (seconds) is given and the process takes too long,
         a TimeoutExpired exception will be raised.
    
        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.
    
        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.
    
        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE
    
        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE
    
        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['fre/make/tests/checkout_out/fremake_canopy/test/null_model_full/src/checkout.sh']' returned non-zero exit status 128.

/opt/software/linux-rocky9-zen2/gcc-11.4.1/miniconda3-24.7.1-akg5caxhoqv3oasili3pgukbyryceq74/envs/fre-cli/lib/python3.13/subprocess.py:577: CalledProcessError
...
...
=========================== short test summary info ============================
FAILED fre/make/tests/test_create_checkout.py::test_checkout_execute - subprocess.CalledProcessError: Command '['fre/make/tests/checkout_out/fremake_canopy/test/null_model_full/src/checkout.sh']' returned non-zero exit status 128.
= 1 failed, 489 passed, 12 skipped, 23 xfailed, 82 warnings in 468.49s (0:07:48) =
...

@ilaflott
Copy link
Copy Markdown
Member

@copilot the feedback from @singhd789 has a good point, we can do better pretty easily. make the filelist of unique variables found across all datetimes. update tests as-needed

Per @ilaflott's feedback: instead of picking the most-common datetime
and re-globbing only that slice, iterate over all *.*.nc files found
by the initial glob and collect unique variable names directly via
dict assignment. This is simpler, correct by construction, and catches
variables that only appear at a minority datetime.

Remove the now-dead datetime-counting logic, the second glob.glob
call, and the try/except IndexError around split('.')[-3].

In tests: remove test_make_simple_varlist_index_error_on_datetime
(branch no longer exists), simplify the no_files_matching_pattern
test, update stale docstrings, and add
test_make_simple_varlist_minority_datetime_var_included to pin the
new guarantee.

Agent-Logs-Url: https://github.com/NOAA-GFDL/fre-cli/sessions/762aa18f-e013-40d7-b215-163459373b68

Co-authored-by: ilaflott <[email protected]>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Mar 27, 2026

@copilot the feedback from @singhd789 has a good point, we can do better pretty easily. make the filelist of unique variables found across all datetimes. update tests as-needed

Done in 088b05b. The datetime-picking logic is gone entirely — the initial glob.glob("*.*.nc") already has every file, so we just iterate once over all of them and assign into a dict (which deduplicates naturally). Variables at any datetime, majority or minority, are always included.

Also added test_make_simple_varlist_minority_datetime_var_included to pin that guarantee explicitly.

Copilot AI changed the title fix(cmor): make make_simple_varlist datetime selection deterministic fix(cmor): make make_simple_varlist return unique variables across all datetimes Mar 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working priority: HIGH

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fre.cmor: failing unit test for test_cmor_finder_make_simple_varlist::test_make_simple_varlist_dedup

3 participants