Result of Regex.Match("wtfb",@"(.)()+?b") is not same as new Regex(@"(.)()+?b",RegexOptions.Compiled).Match("wtfb") #111051

Open
longxya opened this issue Jan 3, 2025 · 8 comments
Labels
area-System.Text.RegularExpressions help wanted [up-for-grabs] Good issue for external contributors

@longxya

longxya commented Jan 3, 2025

Description

If a pattern includes ()+?, then Regex.Match(input, pattern) returns a different match than new Regex(pattern, RegexOptions.Compiled).Match(input).

There is some related discussion in GitHub Discussion #110976.

Reproduction Steps

using System.Text.RegularExpressions;

string input = "wtfb";
string pattern = "^(.)+()+?b";
Match matchInterpreted = new Regex(pattern, RegexOptions.None).Match(input);
Match matchCompiled = new Regex(pattern, RegexOptions.Compiled).Match(input);

Console.WriteLine($"Interpreted: {matchInterpreted.Value}");
Console.WriteLine($"Compiled: {matchCompiled.Value}");

Output:

Interpreted: b
Compiled: wtfb

Expected behavior

Output:

Interpreted: wtfb
Compiled: wtfb

Actual behavior

Output:

Interpreted: b
Compiled: wtfb

It looks like ()+? makes the nearest enclosing group start at the position where the empty loop began, and this effect may or may not propagate outward to group 0.
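One way to examine this hypothesis is to print where each capture group starts and ends under both engines. The following is a minimal sketch that reuses the pattern and input from the reproduction steps; DescribeGroups is a hypothetical helper written for this illustration, not part of System.Text.RegularExpressions.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern and input are taken from the reproduction steps above.
string input = "wtfb";
string pattern = "^(.)+()+?b";

foreach (RegexOptions options in new[] { RegexOptions.None, RegexOptions.Compiled })
{
    Match match = new Regex(pattern, options).Match(input);
    // Print the span of group 0 and of every numbered group for this engine.
    Console.WriteLine($"{options}: {DescribeGroups(match)}");
}

// Hypothetical helper (an assumption for illustration only):
// formats each group as number:[start,end), or number:<unset> when the group did not capture.
static string DescribeGroups(Match match)
{
    if (!match.Success)
    {
        return "<no match>";
    }

    return string.Join(" ", match.Groups.Cast<Group>().Select((g, i) =>
        g.Success ? $"{i}:[{g.Index},{g.Index + g.Length})" : $"{i}:<unset>"));
}
```

If the two printed lines differ in the start offset of group 0 or group 1, that would be consistent with the description above; the sketch only reports what each engine returns and does not assert which result is correct.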

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jan 3, 2025
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

@DL444

DL444 commented Jan 3, 2025

Interpreted: b

There is a typo in the expected behavior: the interpreted result is expected to match the compiled result.

@steveharter
Member

Verified: the interpreted mode produces a different result than both the compiled mode and the source-generated ([GeneratedRegex]) mode. .NET v8, v9, and the current v10 all have the same issue.

I'll mark this as a bug for v10, but with "help wanted".

cc @stephentoub

@steveharter steveharter added this to the 10.0.0 milestone Jan 14, 2025
@steveharter steveharter added the help wanted [up-for-grabs] Good issue for external contributors label Jan 14, 2025
@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label Jan 14, 2025
@longxya
Author

longxya commented Jan 14, 2025

Do I need to submit new issues for other regular expression problems that appear to be bugs? For example:

  • new Regex("(.)(?'2-1'(?'-1'.))", RegexOptions.Compiled).Matches("wtf") throws an exception: "Index was outside the bounds of the array".

  • For the regular expression (?'2-1'(?'1'.)), the interpreted mode produces a different result than both the compiled mode and the source-generated ([GeneratedRegex]) mode.
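For anyone who wants to check these two patterns side by side, here is a minimal sketch that runs each one in interpreted and compiled mode and reports whether the results agree. The input "wtf" for the second pattern, the one-second match timeout, the cap of 20 matches, and the Summarize helper are all assumptions added for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Patterns come from the comment above; the input for the second pattern is an assumption.
(string Pattern, string Input)[] cases =
{
    ("(.)(?'2-1'(?'-1'.))", "wtf"),
    ("(?'2-1'(?'1'.))", "wtf"),
};

foreach (var (pattern, text) in cases)
{
    string interpreted = Summarize(pattern, text, RegexOptions.None);
    string compiled = Summarize(pattern, text, RegexOptions.Compiled);
    string verdict = interpreted == compiled ? "agree" : "DIFFER";
    Console.WriteLine($"{pattern}: interpreted = {interpreted}; compiled = {compiled}; {verdict}");
}

// Hypothetical helper: lists the span of each match, or names the exception type that was thrown.
static string Summarize(string pattern, string text, RegexOptions options)
{
    try
    {
        // The timeout and the match cap are safety limits chosen for this sketch.
        var regex = new Regex(pattern, options, TimeSpan.FromSeconds(1));
        var spans = new List<string>();
        foreach (Match m in regex.Matches(text))
        {
            spans.Add($"[{m.Index},{m.Index + m.Length})");
            if (spans.Count >= 20)
            {
                break;
            }
        }

        return spans.Count == 0 ? "<no match>" : string.Join(" ", spans);
    }
    catch (Exception e)
    {
        return $"threw {e.GetType().Name}";
    }
}
```

Catching the exception inside the helper keeps one failing engine from hiding the other engine's result, so both can be compared in a single run.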

@ovidiucosteanet
Contributor

ovidiucosteanet commented Jan 16, 2025

Hello! I'd like to create a PR for this. Can you please assign the issue to me?

@steveharter steveharter removed their assignment Jan 27, 2025
@stephentoub
Member

@ovidiucosteanet, if you'd like to give it a go, we'd welcome the help. Thanks.

@longxya
Author

longxya commented Mar 25, 2025

@stephentoub @ovidiucosteanet
Similar regular expressions can cause infinite matching. This may be related to this issue.

using System;
using System.Text.RegularExpressions;

string pattern = @"(()+?){2}";
//pattern = @"(()+?)+";
//pattern = @"(()+?)*";
//pattern = @"(()+?){1,2}";
string input = "WTF123a1";
int timeOut = 10;
Regex regex = new Regex(pattern, RegexOptions.None, TimeSpan.FromMilliseconds(timeOut));
var mhes = regex.Matches(input);
try
{
	// var matchCount = mhes.Count; // would throw a timeout exception if there is enough memory
	for (var i = 0; i < 1000; i++)
	{
		Console.WriteLine(mhes[i].Index + " , " + mhes[i].Length);
	}
}
catch (Exception e)
{
	Console.WriteLine("Interpreted : " + e.Message);
}

Output:

0 , 3
0 , 3
0 , 3
0 , 3
0 , 3
0 , 3
  ·
  ·
  ·

I also do not know why changing the quantifier to something like {2,100} immediately throws a System.OutOfMemoryException:

using System;
using System.Text.RegularExpressions;

string pattern = @"(()+?){2,100}";
string input = "WTF123a1";
int timeOut = 10;
Regex regex = new Regex(pattern, RegexOptions.None, TimeSpan.FromMilliseconds(timeOut));
var mhes = regex.Matches(input);
try
{
	for (var i = 0; i < 1000; i++)
	{
		Console.WriteLine(mhes[i].Index + " , " + mhes[i].Length);
	}
}
catch (Exception e)
{
	Console.WriteLine("Interpreted : " + e.Message);
}

Output:

Interpreted : Exception of type 'System.OutOfMemoryException' was thrown.

@longxya
Author

longxya commented Mar 25, 2025

using System;
using System.Text.RegularExpressions;

string pattern = @"(()+?){2,100}";
string input = "1";
int timeOut = 10;
Regex regex = new Regex(pattern, RegexOptions.None, TimeSpan.FromMilliseconds(timeOut));
//var match = regex.Match(input);
try
{
	var b = regex.IsMatch(input);
	//Console.WriteLine(match.Index + " , " + match.Length);
}
catch (Exception e)
{
	Console.WriteLine("Interpreted : " + e.Message);
}

Output:

Interpreted : Exception of type 'System.OutOfMemoryException' was thrown.
