Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

substr index calculation does not match JavaScript implementation #27

Open
panzi opened this issue Aug 23, 2021 · 1 comment
Open

substr index calculation does not match JavaScript implementation #27

panzi opened this issue Aug 23, 2021 · 1 comment

Comments

@panzi
Copy link

panzi commented Aug 23, 2021

Assuming the JavaScript implementation if JsonLogic is normative, then the substr implementation in this library is wrong. This is because in JavaScript strings are stored as UTF-16 in memory and indexing happens on UTF-16 code unis. Meaning unicode code points that need two UTF-16 code units can be individually addressed in JavaScript. A string with a single such character has the length of 2 in JavaScript.

To achieve compatibility you would need to convert the string to UTF-16 (Vec<u16>), perform the operations on that, and then convert it back. However, this now means the JsonLogic could create invalid unicode data when it only cuts out a single surrogate of a surrogate pair. Rust doesn't allow strings broken like that (String::from_utf16() returns a Result and String::from_utf16_lossy() may return a string different to what would happen in JavaScript).

I don't know what best to do, but in any case this needs to be documented so that users are aware of any difference to the JavaScript implementation.

Another even bigger problem (that should be fixed more easily): The substr implementation in this library used String::len(), which returns the number of bytes in the string! Strings in Rust are UTF-8 and as such this implementation would already fail on my last name ("Panzenböck"). I.e. it fails on any non-7-bit-ASCII characters. Instead of that string.chars().len() (or something more intelligent that doesn't iterate all chars twice) should be used. This has nothing to do with the UTF-16 problem from above, but without that this implementation isn't if consistent in itself.

Yet another problem: Type coercionrules don't match those of the JavaScript implementation.
There you can use any garbage as the indices and it will convert the garbage into 0. And
strings of numbers just work like the numbers they represent and arrays with a single element
just act like their single element etc. This will probably also be a problem in other operations, but I have only checked substr.

Example 1:

Diverging UTF-16 handling.

Rule:

{"substr": [{"var":"s"},1]}

Data:

{"s":"\uD80C\uDC00"}

JsonLogic playground output:

"\udc00"

json-logic-rs output:

""

Example 2:

Broken non-ASCII character handling.

Rule:

{"substr": [{"var":"s"},0,-2]}

Data:

{"s":"äöü"}

JsonLogic playground output:

"ä"

json-logic-rs output:

"äöü"

Example 3:

Diverging type coercion.

Rule:

{"substr": [{"var":"s"},[1]]}

Data:

{"s":"abc"}

JsonLogic playground output:

"bc"

json-logic-rs output:

Error: Could not execute logic

Caused by:
    Invalid argument for 'substr' - 'Array([Number(1)])', reason: Second argument to substr must be a number
@mkatychev
Copy link

Hi @panzi I would suggest you create an issue in https://github.com/mplanchard/json-logic-plus as it is the hard forked version of this repo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants