main: using regex for choosing a parser for given file name #4270
masatake merged 11 commits into universal-ctags:master
Codecov Report
❌ Patch coverage is
@@            Coverage Diff             @@
##           master    #4270      +/-   ##
==========================================
+ Coverage   85.89%   85.90%   +0.01%
==========================================
  Files         250      251       +1
  Lines       62341    62510     +169
==========================================
+ Hits        53545    53699     +154
- Misses       8796     8811      +15
Pull Request Overview
This PR extends the --map-<LANG> option to support regular expression matching for file names, addressing limitations where glob patterns and extension matching are insufficient for generic file names. The implementation adds a new regex-based mapping type alongside existing extension and pattern mappings.
Key Changes:
- Introduced regex pattern support using %regex%[i] syntax for language file mappings
- Added new rexprcode module to handle regex compilation and matching
- Extended optlib2c to generate C code from regex mapping definitions
Reviewed Changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| main/rexprcode.c | New module implementing regex pattern compilation, matching, and encoding |
| main/rexprcode_p.h | Public interface for regex code operations |
| main/parse.c | Core integration of regex matching into language detection logic |
| main/parse.h | Added rExprSrc structure definition and REXPR_LAST_ENTRY macro |
| main/parse_p.h | Extended langmapType enum with LMAP_REXPR flag |
| main/options.c | Command-line option parsing for regex patterns with icase flag support |
| optlib/rpmMacros.ctags | Example usage replacing commented-out patterns with regex |
| optlib/rpmMacros.c | Generated C code with regex mapping definitions |
| misc/optlib2c | Extended Perl script to parse and generate regex mapping code |
| source.mak | Build system updates for new source files |
| win32/ctags_vs2013.vcxproj | Visual Studio project file updates |
| win32/ctags_vs2013.vcxproj.filters | Visual Studio filter file updates |
| Tmain/list-map-rexprs.d/* | Test cases for new --list-map-rexprs option |
| Tmain/versioning.d/stdout-expected.txt | Updated test output expectations |
misc/optlib2c
    unless ($_[0]->{'langdef'} eq $1);
    my $spec = $2;
    if ($spec =~ /\((.*)\)/) {
    if ($spec =~ /%(.+)%(i)?/) {
The regex pattern /%(.+)%(i)?/ is greedy and will match incorrectly if the expression contains '%' characters, even with escaping. For example, %a%b%c% would capture a%b%c instead of a. The pattern should be non-greedy, /%(.+?)%(i)?/, or better yet, should properly handle escaped '%' characters in the capture group.
Suggested change:
- if ($spec =~ /%(.+)%(i)?/) {
+ if ($spec =~ /%(.+?)%(i)?/) {
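The review comment above asks for proper handling of escaped '%' characters. One way to do that, sketched here in C rather than in the Perl of optlib2c, is to scan for the first '%' that is not preceded by a backslash. The helper names `find_closing_percent` and `expr_length` are hypothetical and are not taken from the PR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the PR: given a spec such as
 * "%a\%b%i", return a pointer to the '%' that closes the expression,
 * skipping any backslash-escaped character. Returns NULL when the spec
 * does not open with '%' or has no unescaped closing '%'. */
static const char *find_closing_percent(const char *spec)
{
    assert(spec != NULL);
    if (spec[0] != '%')
        return NULL;              /* the spec must open with '%' */

    for (const char *p = spec + 1; *p != '\0'; p++) {
        if (*p == '\\' && p[1] != '\0')
            p++;                  /* skip the escaped character */
        else if (*p == '%')
            return p;             /* first unescaped '%' closes it */
    }
    return NULL;                  /* no closing '%' found */
}

/* Length of the expression between the delimiters, or -1 if malformed. */
static long expr_length(const char *spec)
{
    const char *end = find_closing_percent(spec);
    return end ? (long)(end - (spec + 1)) : -1;
}
```

With this scan, `%a%b%c%` yields the one-character expression `a`, while `%a\%b%i` yields the four-character expression `a\%b` and leaves `i` as the trailing flag. Unlike the Perl `.+`, this sketch also accepts an empty expression; a real implementation would have to decide whether to reject it.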
Pull Request Overview
Copilot reviewed 16 out of 17 changed files in this pull request and generated 1 comment.
misc/optlib2c
    unless ($_[0]->{'langdef'} eq $1);
    my $spec = $2;
    if ($spec =~ /\((.*)\)/) {
    if ($spec =~ /%(.+)%(i)?/) {
The regex should use a non-greedy quantifier .+? instead of .+ to prevent matching across multiple patterns when there are multiple % characters in the input. This could cause incorrect parsing of escaped % characters.
Suggested change:
- if ($spec =~ /%(.+)%(i)?/) {
+ if ($spec =~ /%(.+?)%(i)?/) {
Pull Request Overview
Copilot reviewed 16 out of 17 changed files in this pull request and generated 5 comments.
Pull Request Overview
Copilot reviewed 16 out of 17 changed files in this pull request and generated 3 comments.
misc/optlib2c
    if ($spec =~ /%(.+?)%(i|\{icase\})?/) {
        my $rexpr = { expr => $1,
                      iCase => (defined $2 && ($2 eq 'i' || $2 eq 'icase'))? 1: 0 };
The regex allows {icase} as an alternative to i, but this syntax is not documented in the PR description or help text. Either document this alternative syntax or remove it to avoid confusion.
Suggested change:
- if ($spec =~ /%(.+?)%(i|\{icase\})?/) {
-     my $rexpr = { expr => $1,
-                   iCase => (defined $2 && ($2 eq 'i' || $2 eq 'icase'))? 1: 0 };
+ if ($spec =~ /%(.+?)%(i)?/) {
+     my $rexpr = { expr => $1,
+                   iCase => (defined $2 && $2 eq 'i')? 1: 0 };
    static flagDefinition langmapRexprFlagDef[] = {
        { 'i', "icase", langmap_rexpr_icase_short, langmap_rexpr_icase_long,
          NULL, "applied in a case-insensitive manner"},
The long flag name is 'icase', but the optlib2c script also accepts '{icase}' syntax (line 296 in misc/optlib2c). These should be consistent, or the alternative syntax should be documented.
Suggested change:
- NULL, "applied in a case-insensitive manner"},
+ NULL, "applied in a case-insensitive manner (accepts both 'icase' and '{icase}' syntax)"},
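The two comments above turn on how the trailing flag is spelled. In the quoted Perl, the capture group `(i|\{icase\})` holds the text `{icase}` including the braces, so a comparison against the bare word `icase` would not match. A minimal C sketch of a check that accepts both the short and the long spelling follows; `wants_icase` is a hypothetical name, and the assumption that the flag text is whatever follows the closing '%' is mine, not something the PR states.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not taken from the PR: `flags` is assumed to be
 * the text that follows the closing '%' of a regex map spec. Returns
 * true for the short form "i" and for the long form "{icase}"; note
 * that the long form is compared with its braces included. */
static bool wants_icase(const char *flags)
{
    assert(flags != NULL);
    return strcmp(flags, "i") == 0 || strcmp(flags, "{icase}") == 0;
}
```

Comparing against `{icase}` with the braces keeps the check consistent with what the capture group actually contains.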
Pull Request Overview
Copilot reviewed 16 out of 17 changed files in this pull request and generated 2 comments.
Pull Request Overview
Copilot reviewed 21 out of 23 changed files in this pull request and generated no new comments.
Pull Request Overview
Copilot reviewed 22 out of 24 changed files in this pull request and generated no new comments.
Pull Request Overview
Copilot reviewed 30 out of 32 changed files in this pull request and generated 2 comments.
man/ctags.1.rst.in
    If you run @CTAGS_NAME_EXECUTABLE@ with ``@CTAGS_NAME_EXECUTABLE@ -R src``,
    the match is performed with ``src/lib/data.c`` and ``src/lib/logic.c`` If you
    give ``--langmap='YourParser:%src/lib/.*\.c%'``, @CTAGS_NAME_EXECUTABLE@
Corrected option name from '--langmap' to '--map-YourParser'.
Suggested change:
- give ``--langmap='YourParser:%src/lib/.*\.c%'``, @CTAGS_NAME_EXECUTABLE@
+ give ``--map-YourParser='%src/lib/.*\.c%'``, @CTAGS_NAME_EXECUTABLE@
docs/man/ctags.1.rst
    If you run ctags with ``ctags -R src``,
    the match is performed with ``src/lib/data.c`` and ``src/lib/logic.c`` If you
    give ``--langmap='YourParser:%src/lib/.*\.c%'``, ctags
Corrected option name from '--langmap' to '--map-YourParser'.
Suggested change:
- give ``--langmap='YourParser:%src/lib/.*\.c%'``, ctags
+ give ``--map-YourParser='%src/lib/.*\.c%'``, ctags
Pull Request Overview
Copilot reviewed 30 out of 32 changed files in this pull request and generated 3 comments.
Pull Request Overview
Copilot reviewed 35 out of 37 changed files in this pull request and generated no new comments.
Pull Request Overview
Copilot reviewed 35 out of 37 changed files in this pull request and generated no new comments.
Pull Request Overview
Copilot reviewed 35 out of 37 changed files in this pull request and generated no new comments.
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
…nsExtensionNew Delete baseFilenameSansExtensionNew() from the source tree. Signed-off-by: Masatake YAMATO <yamato@redhat.com>
The original code used a boolean value to select how a file name is mapped to a parser: by glob-like pattern or by extension. To support a third way of mapping a file name to a parser, by regular expression, we use an enum value instead of the boolean. Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
This change extends the --map-<LANG> option to support regular expression matching against the full file name.
The original --map-<LANG> option supports glob-based matching and extension comparison against the file basename. However, these two methods are not enough if the file names are too generic. See universal-ctags#3287.
The regular expression passed to --map-<LANG> must be surrounded by % characters, as in
--map-RpmMacros='%(.*/)?macros\.d/macros\.([^/]+)$%'
To match in a case-insensitive way, append `i' after the second %, as in
--map-RpmMacros='%(.*/)?macros\.d/macros\.([^/]+)$%i'
To use % as part of an expression, put \ before the % to escape it.
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
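To make the matching rule in this commit message concrete, here is a minimal C sketch that tests a full file name against the RpmMacros expression. It assumes POSIX `regcomp`/`regexec` with extended syntax; the PR's rexprcode module may compile and match differently. `path_matches` is a hypothetical helper, not a function from the PR.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the PR: returns true when `path`
 * matches the POSIX extended regular expression `expr`. When `icase`
 * is true the match ignores case, mirroring the trailing `i' flag. */
static bool path_matches(const char *expr, bool icase, const char *path)
{
    assert(expr != NULL && path != NULL);

    regex_t re;
    int cflags = REG_EXTENDED | REG_NOSUB | (icase ? REG_ICASE : 0);
    if (regcomp(&re, expr, cflags) != 0)
        return false;             /* treat an invalid pattern as "no match" */

    bool matched = (regexec(&re, path, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

Called with the expression `(.*/)?macros\.d/macros\.([^/]+)$`, this accepts `/usr/lib/rpm/macros.d/macros.python`, rejects a path whose directory is spelled `MACROS.D` unless `icase` is set, and rejects paths that continue past the file name because of the `$` anchor and the `[^/]+` class.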
…options Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Pull Request Overview
Copilot reviewed 30 out of 32 changed files in this pull request and generated 2 comments.
man/ctags.1.rst.in
    ``--list-maps[=(<language>|all)]``
        Lists the file name patterns, the file extensions, and the relative-path
        regular extensions which associate a file name with a language for either the
The term 'regular extensions' on line 1302 should be 'regular expressions' to match terminology used elsewhere in the documentation.
Suggested change:
- regular extensions which associate a file name with a language for either the
+ regular expressions which associate a file name with a language for either the
docs/man/ctags.1.rst
    ``--list-maps[=(<language>|all)]``
        Lists the file name patterns, the file extensions, and the relative-path
        regular extensions which associate a file name with a language for either the
The term 'regular extensions' on line 1302 should be 'regular expressions' to match terminology used elsewhere in the documentation.
Suggested change:
- regular extensions which associate a file name with a language for either the
+ regular expressions which associate a file name with a language for either the
… parser Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Pull Request Overview
Copilot reviewed 30 out of 32 changed files in this pull request and generated no new comments.
main: using regex for choosing a parser for the given file name
This change extends the --map-<LANG> option to support regular
expression matching against the full file name.
The original --map-<LANG> option supports glob-based matching
and extension comparison against the file basename.
However, these two methods are not enough if the file names are too
generic. See #3287.
The regular expression passed to --map-<LANG> must be surrounded
by % characters, as in
--map-RpmMacros='%(.*/)?macros\.d/macros\.([^/]+)$%'
To match in a case-insensitive way, append `i' after the second %, as in
--map-RpmMacros='%(.*/)?macros\.d/macros\.([^/]+)$%i'
To use % as part of an expression, put \ before the % to escape it.
TODO: