Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DRAFT]Handling error response when a comma is present in the operand name of tags #107

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@
import java.util.regex.Pattern;

public final class TagExpressionParser {
//regex for token to ensure no token has ',' in them later can be customized further
private final static String VALID_TOKEN = "^(?:@[^@]*|and|or|not|\\(|\\))$";
Copy link
Contributor

@mpkorstanje mpkorstanje Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By trying to validate all tokens rather than the litterals you are making this more complex than needed. Considering that this implementation will have to be repeated for a few languages, perhaps a review in the middle of just the Java implementation can save some duplication of effort.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @mpkorstanje, for your reply
I assume that a tag expression consists of operands and operators. When splitting the expression, there are two possibilities. The current token or string could be an operator, which includes "(", ")", "and", "or", and "not". Alternatively, it could be an operand, which should adhere to the Gherkin standard of starting with "@" and not containing "@" anywhere else.

To handle this, I have used a regex pattern to ensure that only valid tokens, which form the expression, are evaluated. By considering both operands and operators, it simplifies the code and eliminates the need for additional checks to determine the token type.

I believed this approach reduces complexity and improves the readability of the code, but maybe I might be missing someting, I would appreciate your opinion on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, actually the situation is slightly different (as you can see from the tests / testdata):

  • Tags in a tag-expression may either be @foo (with-leading-atsign) or foo (without-leading-atsign)
  • @ (atsign) should not occur in a tag (after the first char) in a tag-expression
  • Gherkin parser removes the @ (atsign) from tags for Features/Rules/Scenarios/… . Therefore, @foo becomes“foo“ internally

SEE ALSO:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gherkin parser removes the @ (atsign)

@jenisys are you sure about that? Or is that true only for some Gherkin parsers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but maybe I might be missing someting, I would appreciate your opinion on this.

@shivam-sehgal without repeating myself that will be a bit difficult.

But do consider validating only the litterals. In essence, do not put the validation in the middle of tokenizer but rather after tokenization, just prior to litteral creation.

Copy link
Author

@shivam-sehgal shivam-sehgal Jul 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mpkorstanje cucumber-jvm basically calls this library to parse the tag expression as well as to validate if the tag expression is valid or not so the flow is like this in cucmberOptions annotation user provides the tags expression shown in the below code

# — FILE: SomeClass.java
@CucumberOptions(features = "src/it/resources/features",
        glue = {"com.adobe.aep.devex.exim.integrationtests.stepDefinitions",
                "com.adobe.aep.devex.exim.integrationtests.config",
                "com.adobe.aep.devex.exim.integrationtests.hooks"
        },
        plugin = {"pretty", "html:target/cucumber-html-report.html",
                "json:target/cucumber.json",
                "rerun:target/cucumber-api-rerun.txt"
        },
    tags = "not @ignore and @hello" // Exclude the feature with the "@ignore" tag
)
…

this annotation is parsed by CucumberOptionsAnnotationParser.class.
addTags method in this class is called which then calls this repo to validate the tag expression, now the intent of this PR was users by mistake put tag expressions like @tag1,@tag2 in the tags section in the CucumberOptions, and they don't get any error message, to avoid that we need to validate the tag expression not having any tags or literals like @tag,@againInTag to prevent this confusion and give users a valid error response so they can understand that they need to use logical operator, instead of commas, I can put this tag validation for the provided tag expression in addTags method of the CucumberOptionsAnnotationParser clas i.e in cucumber jvm, but my mind says it's better to keep validation logic inside this tag expression validation repo only to follow the single responsibility principle of oops,
to address this concern with the least impacting change, for now, I see if there is an expression having a tag starting with @ that tag shouldn't have @ after first, it will prevent this confusion with a proper error message letting user understand that they are doing mistake and cucumber is reading their expression as a single tag, and also won't affect any case where we allow the literals that are not starting with @

In short, cucumber-jvm calls this repo to validate the tag expression passed by the user and uses a message from TagExpressionExceptionto let the user know with a message what is wrong with the expression, the message should be coming from here i.e from this repo's java directory, else the responsibility of this code we have to handle separately in cucumber-jvm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One solution approach could be:

  • if CucumberOptions had an optional parameter, like tag_expression_validator, tag_expression_parser or an tag-expression preprocess hook you could perform a pre-process check
  • Pre-process check would just split the tag-expression string value into words and check if comma(s) are contained in any word
  • If a comma is contained, you issue a log-warning or raise an exception (whatever the „right“ consequent action is), …

This approach would have the following advantages:

  • Strict checking is optional
  • If you want/need strict checking of tag-expressions, you can plug-in the functionality

HINT:
By looking at the current implementation of the CucumberOptions class, this needed parameter would need to be added to support such a kind of solution.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jenisys, I find the approach of using an optional flag for literal validation quite appealing. However, I have some doubts about the term "strict checking." If libraries using this repository are enforcing Gherkins rules, then it seems reasonable for this library to do the same. However, if we believe and would like to continue supporting users who define literals not starting with @ in new versions, then the optional parameter approach makes sense.

One potential concern raised by @mpkorstanje is that checking for commas alone may cause issues. To differentiate between v1 and v2, perhaps we should consider checking for ^@[^@]+ which he mentioned if the optional strict validation flag is passed in CucumberOptions.

I would appreciate hearing both of your opinions on this matter. Let's collaborate and I can begin working on it.

cc: @mpkorstanje

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the cost of the complexity is starting to exceed the value of the solution.

Perhaps a simpler solution such as improving the documentation in Cucumber JVM might help?

Copy link
Author

@shivam-sehgal shivam-sehgal Aug 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @mpkorstanje for your advice, have opened a pull request to update the doc a little bit, to add info for users to avoid the confusion of separating tags using ,, attaching the screenshot of the doc changes below
image
would like to request your review of this PR .
Thanks in advance

private static final Map<String, Assoc> ASSOC = new HashMap<String, Assoc>() {{
put("or", Assoc.LEFT);
put("and", Assoc.LEFT);
Expand Down Expand Up @@ -106,6 +108,7 @@ private static List<String> tokenize(String expr) {
isEscaped = true;
} else if (c == '(' || c == ')' || Character.isWhitespace(c)) {
if (token.length() > 0) {
isTokenValid(token,expr);
tokens.add(token.toString());
token = new StringBuilder();
}
Expand All @@ -116,12 +119,28 @@ private static List<String> tokenize(String expr) {
token.append(c);
}
}
if (token.length() > 0) {
if (token.length() > 0) {
isTokenValid(token,expr);
tokens.add(token.toString());
}
return tokens;
}

/**
* this method checks if the token comply with the req
* regex if not throws exception
* @param token supposed tag or operator of the expresiion
* @param expr entire expression
*/
private static void isTokenValid(StringBuilder token,String expr){

if(token.length()>0&&!String.valueOf(token).matches(VALID_TOKEN)){
throw new TagExpressionException("Tag expression \"%s\" could not be parsed because of syntax error: Please adhere to the Gherkin tag naming convention, using tags like \"@tag1\" and avoiding more than one \"@\" in the tag name.",
expr);
}

}

private void check(TokenType expectedTokenType, TokenType tokenType) {
if (expectedTokenType != tokenType) {
throw new TagExpressionException("Tag expression \"%s\" could not be parsed because of syntax error: Expected %s.", infix, expectedTokenType.toString().toLowerCase());
Expand Down
11 changes: 11 additions & 0 deletions javascript/src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ const ASSOC: { [key: string]: string } = {
and: 'left',
not: 'right',
}
const VALID_TOKEN = /(?:@[^@]*|and|or|not|/(|/))$/;

/**
* Parses infix boolean expression (using Dijkstra's Shunting Yard algorithm)
Expand Down Expand Up @@ -109,6 +110,7 @@ function tokenize(expr: string): string[] {
isEscaped = true
} else if (c === '(' || c === ')' || /\s/.test(c)) {
if (token.length > 0) {
isTokenValid(token,expr);
tokens.push(token.join(''))
token = []
}
Expand All @@ -120,11 +122,20 @@ function tokenize(expr: string): string[] {
}
}
if (token.length > 0) {
isTokenValid(token,expr);
tokens.push(token.join(''))
}
return tokens
}

function isTokenValid(token: string[], expr: string): void {
if (!token.toString().match(VALID_TOKEN)) {
throw new Error(
`Tag expression "${expr}" could not be parsed because of syntax error: An invalid tag combination operator was detected. The use of a comma (",") to combine tags is not supported. Please replace it with either the "or" or "and" operators for tag combinations. For example, use "@tag1 or @tag2" or "@tag1 and @tag2"`
);
}
}

function isUnary(token: string) {
return 'not' === token
}
Expand Down
9 changes: 9 additions & 0 deletions ruby/lib/cucumber/tag_expressions/parser.rb
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ class Parser
def initialize
@expressions = []
@operators = []
@valid_token = /^(?:@[^@]*|and|or|not|\(|\))$/

@operator_types = {
'or' => { type: :binary_operator, precedence: 0, assoc: :left },
Expand Down Expand Up @@ -80,6 +81,7 @@ def tokenize(infix_expression)
escaped = true
elsif ch == '(' || ch == ')' || ch.match(/\s/)
if token.length > 0
is_token_valid(token,infix_expression)
tokens.push(token)
token = ""
end
Expand All @@ -91,11 +93,18 @@ def tokenize(infix_expression)
end
end
if token.length > 0
is_token_valid(token,infix_expression)
tokens.push(token)
end
tokens
end

def is_token_valid(token, expr)
unless token.to_s.match?(@valid_token)
raise %Q{Tag expression "#{expr}" could not be parsed because of syntax error: Please adhere to the Gherkin tag naming convention, using tags like \"@tag1\" and avoiding more than one \"@\" in the tag name.}
end
end

def push_expression(token)
case token
when 'and'
Expand Down
28 changes: 16 additions & 12 deletions testdata/errors.yml
Original file line number Diff line number Diff line change
Expand Up @@ -6,17 +6,21 @@
error: 'Tag expression "@a and (@b @c) or" could not be parsed because of syntax error: Expected operator.'
- expression: '@a and or'
error: 'Tag expression "@a and or" could not be parsed because of syntax error: Expected operand.'
- expression: '@a,@b'
error: 'Tag expression "@a,@b" could not be parsed because of syntax error: Please adhere to the Gherkin tag naming convention, using tags like "@tag1" and avoiding more than one "@" in the tag name.'
- expression: '(@a ,@b)'
error: 'Tag expression "(@a ,@b)" could not be parsed because of syntax error: Please adhere to the Gherkin tag naming convention, using tags like "@tag1" and avoiding more than one "@" in the tag name.'
- expression: 'or or'
error: 'Tag expression "or or" could not be parsed because of syntax error: Expected operand.'
- expression: 'a and or'
error: 'Tag expression "a and or" could not be parsed because of syntax error: Expected operand.'
- expression: 'a b'
error: 'Tag expression "a b" could not be parsed because of syntax error: Expected operator.'
- expression: '( a and b ) )'
error: 'Tag expression "( a and b ) )" could not be parsed because of syntax error: Unmatched ).'
- expression: '( ( a and b )'
error: 'Tag expression "( ( a and b )" could not be parsed because of syntax error: Unmatched (.'
- expression: 'x or \y or z'
error: 'Tag expression "x or \y or z" could not be parsed because of syntax error: Illegal escape before "y".'
- expression: 'x\ or y'
error: 'Tag expression "x\ or y" could not be parsed because of syntax error: Expected operator.'
- expression: '@a and or'
error: 'Tag expression "@a and or" could not be parsed because of syntax error: Expected operand.'
- expression: '@a @b'
error: 'Tag expression "@a @b" could not be parsed because of syntax error: Expected operator.'
- expression: '( @a and @b ) )'
error: 'Tag expression "( @a and @b ) )" could not be parsed because of syntax error: Unmatched ).'
- expression: '( ( @a and @b )'
error: 'Tag expression "( ( @a and @b )" could not be parsed because of syntax error: Unmatched (.'
- expression: '@x or @\y or @z'
error: 'Tag expression "@x or @\y or @z" could not be parsed because of syntax error: Illegal escape before "y".'
- expression: '@x\ or @y'
error: 'Tag expression "@x\ or @y" could not be parsed because of syntax error: Expected operator.'
56 changes: 28 additions & 28 deletions testdata/evaluations.yml
Original file line number Diff line number Diff line change
@@ -1,59 +1,59 @@
- expression: 'not x'
- expression: 'not @x'
tests:
- variables: ['x']
- variables: ['@x']
result: false
- variables: ['y']
- variables: ['@y']
result: true
- expression: 'x and y'
- expression: '@x and @y'
tests:
- variables: ['x', 'y']
- variables: ['@x', '@y']
result: true
- variables: ['x']
- variables: ['@x']
result: false
- variables: ['y']
- variables: ['@y']
result: false

- expression: 'x or y'
- expression: '@x or @y'
tests:
- variables: []
result: false
- variables: ['x', 'y']
- variables: ['@x', '@y']
result: true
- variables: ['x']
- variables: ['@x']
result: true
- variables: ['y']
- variables: ['@y']
result: true
- expression: 'x\(1\) or y\(2\)'
- expression: '@x\(1\) or @y\(2\)'
tests:
- variables: ['x(1)']
- variables: ['@x(1)']
result: true
- variables: ['y(2)']
- variables: ['@y(2)']
result: true
- expression: 'x\\ or y\\\) or z\\'
- expression: '@x\\ or @y\\\) or @z\\'
tests:
- variables: ['x\']
- variables: ['@x\']
result: true
- variables: ['y\)']
- variables: ['@y\)']
result: true
- variables: ['z\']
- variables: ['@z\']
result: true
- variables: ['x']
- variables: ['@x']
result: false
- variables: ['y)']
- variables: ['@y)']
result: false
- variables: ['z']
- variables: ['@z']
result: false
- expression: '\\x or y\\ or z\\'
- expression: '@\\x or @y\\ or @z\\'
tests:
- variables: ['\x']
- variables: ['@\x']
result: true
- variables: ['y\']
- variables: ['@y\']
result: true
- variables: ['z\']
- variables: ['@z\']
result: true
- variables: ['x']
- variables: ['@x']
result: false
- variables: ['y']
- variables: ['@y']
result: false
- variables: ['z']
- variables: ['@z']
result: false
90 changes: 44 additions & 46 deletions testdata/parsing.yml
Original file line number Diff line number Diff line change
@@ -1,48 +1,46 @@
- expression: ''
formatted: 'true'
- expression: 'a and b'
formatted: '( a and b )'
- expression: 'a or b'
formatted: '( a or b )'
- expression: 'not a'
formatted: 'not ( a )'
- expression: 'a and b and c'
formatted: '( ( a and b ) and c )'
- expression: '( a and b ) or ( c and d )'
formatted: '( ( a and b ) or ( c and d ) )'
- expression: 'not a or b and not c or not d or e and f'
formatted: '( ( ( not ( a ) or ( b and not ( c ) ) ) or not ( d ) ) or ( e and f ) )'
- expression: 'not a\(\) or b and not c or not d or e and f'
formatted: '( ( ( not ( a\(\) ) or ( b and not ( c ) ) ) or not ( d ) ) or ( e and f ) )'
- expression: '@a and @b'
formatted: '( @a and @b )'
- expression: '@a or @b'
formatted: '( @a or @b )'
- expression: 'not @a'
formatted: 'not ( @a )'
- expression: '@a and @b and @c'
formatted: '( ( @a and @b ) and @c )'
- expression: '( @a and @b ) or ( @c and @d )'
formatted: '( ( @a and @b ) or ( @c and @d ) )'
- expression: 'not @a or @b and not @c or not @d or @e and @f'
formatted: '( ( ( not ( @a ) or ( @b and not ( @c ) ) ) or not ( @d ) ) or ( @e and @f ) )'
- expression: 'not @a\(\) or @b and not @c or not @d or @e and @f'
formatted: '( ( ( not ( @a\(\) ) or ( @b and not ( @c ) ) ) or not ( @d ) ) or ( @e and @f ) )'

- expression: 'not (a and b)'
formatted: 'not ( a and b )'
- expression: 'not (a or b)'
formatted: 'not ( a or b )'
- expression: 'not (a and b) and c or not (d or f)'
formatted: '( ( not ( a and b ) and c ) or not ( d or f ) )'
- expression: 'not (@a and @b)'
formatted: 'not ( @a and @b )'
- expression: 'not (@a or @b)'
formatted: 'not ( @a or @b )'
- expression: 'not (@a and @b) and @c or not (@d or @f)'
formatted: '( ( not ( @a and @b ) and @c ) or not ( @d or @f ) )'

- expression: 'a\\ and b'
formatted: '( a\\ and b )'
- expression: '\\a and b'
formatted: '( \\a and b )'
- expression: 'a\\ and b'
formatted: '( a\\ and b )'
- expression: 'a and b\\'
formatted: '( a and b\\ )'
- expression: '( a and b\\\\)'
formatted: '( a and b\\\\ )'
- expression: 'a\\\( and b\\\)'
formatted: '( a\\\( and b\\\) )'
- expression: '(a and \\b)'
formatted: '( a and \\b )'
- expression: 'x or(y) '
formatted: '( x or y )'
- expression: 'x\(1\) or(y\(2\))'
formatted: '( x\(1\) or y\(2\) )'
- expression: '\\x or y\\ or z\\'
formatted: '( ( \\x or y\\ ) or z\\ )'
- expression: 'x\\ or(y\\\)) or(z\\)'
formatted: '( ( x\\ or y\\\) ) or z\\ )'
- expression: 'x\ or y'
formatted: '( x\ or y )'
- expression: '@a\\ and @b'
formatted: '( @a\\ and @b )'
- expression: '@\\a and @b'
formatted: '( @\\a and @b )'
- expression: '@a\\ and @b'
formatted: '( @a\\ and @b )'
- expression: '@a and @b\\'
formatted: '( @a and @b\\ )'
- expression: '( @a and @b\\\\)'
formatted: '( @a and @b\\\\ )'
- expression: '@a\\\( and @b\\\)'
formatted: '( @a\\\( and @b\\\) )'
- expression: '(@a and @\\b)'
formatted: '( @a and @\\b )'
- expression: '@x or(@y) '
formatted: '( @x or @y )'
- expression: '@x\(1\) or(@y\(2\))'
formatted: '( @x\(1\) or @y\(2\) )'
- expression: '@\\x or @y\\ or @z\\'
formatted: '( ( @\\x or @y\\ ) or @z\\ )'
- expression: '@x\\ or(@y\\\)) or(@z\\)'
formatted: '( ( @x\\ or @y\\\) ) or @z\\ )'
- expression: '@x\ or @y'
formatted: '( @x\ or @y )'