gh-74598: add fnmatch.filterfalse for excluding names #121185

Open: 14 commits to be merged into main

picnixz (Member) commented Jun 30, 2024

In this implementation, I used neither a lambda function, as was proposed in the original patch, nor auxiliary functions. The reason is performance:

# without lambda function
python -m pyperf timeit -s "import fnmatch; data=['Python', 'Ruby', 'Perl', 'Tcl']" 'fnmatch.filterfalse(data, "P*")'
.....................
Mean +- std dev: 1.72 us +- 0.04 us

# with lambda function
./python -m pyperf timeit -s "import fnmatch; data=['Python', 'Ruby', 'Perl', 'Tcl']" 'fnmatch.filterfalse(data, "P*")'
.....................
Mean +- std dev: 2.24 us +- 0.06 us

Therefore, I think it's better not to use a lambda function. One small question: would it make sense to re-implement fnmatch in C? (This may be a question for @barneygale.)
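
To illustrate the kind of difference being measured, here is a minimal sketch of the two styles. It is not the code from the original patch or from this PR: the function names `filterfalse_inline` and `filterfalse_lambda` and the helper `compile_match` are hypothetical, and the pattern is compiled through the public `fnmatch.translate` rather than the private helper the module uses internally. The lambda variant pays for one extra Python-level call per name.

```python
import fnmatch
import functools
import os
import re
import timeit


@functools.lru_cache(maxsize=None)
def compile_match(pat):
    # Hypothetical helper built on public APIs only; cached so that
    # pattern compilation does not dominate the timings below.
    return re.compile(fnmatch.translate(os.path.normcase(pat))).match


def filterfalse_inline(names, pat):
    """Test each name directly inside the loop."""
    match = compile_match(pat)
    result = []
    for name in names:
        if match(os.path.normcase(name)) is None:
            result.append(name)
    return result


def filterfalse_lambda(names, pat):
    """Route each test through a lambda (one extra call per name)."""
    match = compile_match(pat)
    rejected = lambda name: match(os.path.normcase(name)) is None
    return [name for name in names if rejected(name)]


data = ['Python', 'Ruby', 'Perl', 'Tcl']
# Both variants must return the same names in the same order.
assert filterfalse_inline(data, 'P*') == filterfalse_lambda(data, 'P*')

for func in (filterfalse_inline, filterfalse_lambda):
    runs = 10_000
    seconds = timeit.timeit(lambda: func(data, 'P*'), number=runs)
    print(f'{func.__name__}: {seconds / runs * 1e6:.2f} us per call')
```

The absolute numbers from `timeit` will differ from the pyperf figures above, so only the relative gap between the two functions is of interest here.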

Additional benchmarks
small_match_none_loop: Mean +- std dev: 1.54 us +- 0.05 us
small_match_none_iter: Mean +- std dev: 1.61 us +- 0.07 us

small_match_some_loop: Mean +- std dev: 1.71 us +- 0.04 us
small_match_some_iter: Mean +- std dev: 1.89 us +- 0.06 us

small_match_all_loop: Mean +- std dev: 1.76 us +- 0.03 us
small_match_all_iter: Mean +- std dev: 2.00 us +- 0.05 us

medium_match_none_loop: Mean +- std dev: 10.3 us +- 0.2 us
medium_match_none_iter: Mean +- std dev: 8.75 us +- 0.26 us

medium_match_some_loop: Mean +- std dev: 12.1 us +- 0.1 us
medium_match_some_iter: Mean +- std dev: 11.1 us +- 0.1 us

medium_match_all_loop: Mean +- std dev: 12.9 us +- 0.2 us
medium_match_all_iter: Mean +- std dev: 12.4 us +- 0.3 us

large_match_none_loop: Mean +- std dev: 98.2 us +- 1.3 us
large_match_none_iter: Mean +- std dev: 80.6 us +- 2.7 us

large_match_some_loop: Mean +- std dev: 119 us +- 4 us
large_match_some_iter: Mean +- std dev: 104 us +- 2 us

large_match_all_loop: Mean +- std dev: 127 us +- 2 us
large_match_all_iter: Mean +- std dev: 115 us +- 2 us
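
For easier reading, the relative difference between the `_loop` and `_iter` timings can be computed from the means listed above. The snippet below is only a reading aid: the numbers are copied from the results, the standard deviations are ignored, and the `results` dictionary is a name introduced here for illustration.

```python
# Mean timings in microseconds, copied from the results above: (loop, iter).
results = {
    'small_match_none': (1.54, 1.61),
    'small_match_some': (1.71, 1.89),
    'small_match_all': (1.76, 2.00),
    'medium_match_none': (10.3, 8.75),
    'medium_match_some': (12.1, 11.1),
    'medium_match_all': (12.9, 12.4),
    'large_match_none': (98.2, 80.6),
    'large_match_some': (119, 104),
    'large_match_all': (127, 115),
}

for label, (loop_us, iter_us) in results.items():
    # Positive change means the itertools variant is slower than the loop.
    change = (iter_us - loop_us) / loop_us * 100
    faster = 'iter' if iter_us < loop_us else 'loop'
    print(f'{label:18} iter vs loop: {change:+6.1f}%  (faster: {faster})')
```

On these means, the explicit loop is ahead for the 4-item input, while the `itertools.filterfalse` variant is ahead for the 40-item and 400-item inputs. Some gaps (for example `medium_match_all`) are close to the reported standard deviations.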

The benchmark script is:

import os
import posixpath
import pyperf
import itertools
from fnmatch import _compile_pattern

def filterfalse_loop(names, pat):
    result = []
    pat = os.path.normcase(pat)
    match = _compile_pattern(pat)
    if os.path is posixpath:
        # normcase on posix is NOP. Optimize it away from the loop.
        for name in names:
            if match(name) is None:
                result.append(name)
    else:
        for name in names:
            if match(os.path.normcase(name)) is None:
                result.append(name)
    return result

def filterfalse_iter(names, pat):
    pat = os.path.normcase(pat)
    match = _compile_pattern(pat)
    if os.path is posixpath:
        # normcase on posix is NOP. Optimize it away from the loop.
        return list(itertools.filterfalse(match, names))

    result = []
    for name in names:
        if match(os.path.normcase(name)) is None:
            result.append(name)
    return result

runner = pyperf.Runner()
base_data = ['Python', 'Ruby', 'Perl', 'Tcl']
for data_label, data_size in [('small', 1), ('medium', 10), ('large', 100)]:
    # Build the input once per size; it does not depend on the pattern.
    data = base_data * data_size
    for pat_label, pattern in [('none', 'A*'), ('some', 'P*'), ('all', '*')]:
        runner.bench_func(f'{data_label}_match_{pat_label}_loop', filterfalse_loop, data, pattern)
        runner.bench_func(f'{data_label}_match_{pat_label}_iter', filterfalse_iter, data, pattern)
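
As a sanity check on the intended semantics, the names rejected by a pattern should be exactly those that `fnmatch.filter` does not return. The sketch below checks this on the benchmark's base data. `filterfalse_sketch` is a hypothetical stand-in written with public APIs only (`fnmatch.translate`, `re.compile`, `os.path.normcase`); it is not the implementation proposed in this PR.

```python
import fnmatch
import os
import re


def filterfalse_sketch(names, pat):
    """Return the names that do NOT match pat (illustrative only)."""
    match = re.compile(fnmatch.translate(os.path.normcase(pat))).match
    return [name for name in names if match(os.path.normcase(name)) is None]


base_data = ['Python', 'Ruby', 'Perl', 'Tcl']
for pattern in ('A*', 'P*', '*'):
    kept = fnmatch.filter(base_data, pattern)
    dropped = filterfalse_sketch(base_data, pattern)
    # Together, the two results must partition the input.
    assert sorted(kept + dropped) == sorted(base_data)
    print(f'{pattern!r}: kept={kept} dropped={dropped}')
```

With the three benchmark patterns this covers the "none", "some" and "all" cases: `'A*'` rejects every name, `'P*'` rejects `'Ruby'` and `'Tcl'`, and `'*'` rejects nothing.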

📚 Documentation preview 📚: https://cpython-previews--121185.org.readthedocs.build/

picnixz changed the title from "gh-74598: add fnmatch.filterfalse for excluding patterns" to "gh-74598: add fnmatch.filterfalse for excluding names" on Jun 30, 2024
picnixz added the "stdlib" label (Python modules in the Lib dir) on Aug 1, 2024
picnixz requested a review from barneygale on December 15, 2024 14:17
picnixz requested a review from serhiy-storchaka on March 1, 2025 11:58

picnixz (Member, Author) commented Mar 1, 2025

The free-threaded (FT) build failure is a known issue, so I'll wait until the fix (#130724) is merged.

picnixz added the "stale" label (Stale PR or inactive for long period of time) on Mar 18, 2025
Labels: awaiting review, stale, stdlib