substr index calculation does not match JavaScript implementation #27

panzi · 2021-08-23T04:16:58Z

Assuming the JavaScript implementation if JsonLogic is normative, then the substr implementation in this library is wrong. This is because in JavaScript strings are stored as UTF-16 in memory and indexing happens on UTF-16 code unis. Meaning unicode code points that need two UTF-16 code units can be individually addressed in JavaScript. A string with a single such character has the length of 2 in JavaScript.

To achieve compatibility you would need to convert the string to UTF-16 (Vec<u16>), perform the operations on that, and then convert it back. However, this now means the JsonLogic could create invalid unicode data when it only cuts out a single surrogate of a surrogate pair. Rust doesn't allow strings broken like that (String::from_utf16() returns a Result and String::from_utf16_lossy() may return a string different to what would happen in JavaScript).

I don't know what best to do, but in any case this needs to be documented so that users are aware of any difference to the JavaScript implementation.

Another even bigger problem (that should be fixed more easily): The substr implementation in this library used String::len(), which returns the number of bytes in the string! Strings in Rust are UTF-8 and as such this implementation would already fail on my last name ("Panzenböck"). I.e. it fails on any non-7-bit-ASCII characters. Instead of that string.chars().len() (or something more intelligent that doesn't iterate all chars twice) should be used. This has nothing to do with the UTF-16 problem from above, but without that this implementation isn't if consistent in itself.

Yet another problem: Type coercionrules don't match those of the JavaScript implementation.
There you can use any garbage as the indices and it will convert the garbage into 0. And
strings of numbers just work like the numbers they represent and arrays with a single element
just act like their single element etc. This will probably also be a problem in other operations, but I have only checked substr.

Example 1:

Diverging UTF-16 handling.

Rule:

{"substr": [{"var":"s"},1]}

Data:

{"s":"\uD80C\uDC00"}

JsonLogic playground output:

"\udc00"

json-logic-rs output:

""

Example 2:

Broken non-ASCII character handling.

Rule:

{"substr": [{"var":"s"},0,-2]}

Data:

{"s":"äöü"}

JsonLogic playground output:

"ä"

json-logic-rs output:

"äöü"

Example 3:

Diverging type coercion.

Rule:

{"substr": [{"var":"s"},[1]]}

Data:

{"s":"abc"}

JsonLogic playground output:

"bc"

json-logic-rs output:

Error: Could not execute logic

Caused by:
    Invalid argument for 'substr' - 'Array([Number(1)])', reason: Second argument to substr must be a number

The text was updated successfully, but these errors were encountered:

mkatychev · 2021-08-25T18:16:48Z

Hi @panzi I would suggest you create an issue in https://github.com/mplanchard/json-logic-plus as it is the hard forked version of this repo

panzi mentioned this issue Aug 23, 2021

substr index calculation does not match JavaScript implementation marvindv/jsonlogic_rs#12

Open

panzi mentioned this issue Aug 25, 2021

substr index calculation does not match JavaScript implementation mplanchard/json-logic-plus#1

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

substr index calculation does not match JavaScript implementation #27

substr index calculation does not match JavaScript implementation #27

panzi commented Aug 23, 2021 •

edited

Loading

mkatychev commented Aug 25, 2021

substr index calculation does not match JavaScript implementation #27

substr index calculation does not match JavaScript implementation #27

Comments

panzi commented Aug 23, 2021 • edited Loading

Example 1:

Example 2:

Example 3:

mkatychev commented Aug 25, 2021

panzi commented Aug 23, 2021 •

edited

Loading