Assuming the JavaScript implementation of JsonLogic is normative, the substr implementation in this library is wrong. In JavaScript, strings are stored as UTF-16 in memory and indexing happens on UTF-16 code units. That means Unicode code points that need two UTF-16 code units (surrogate pairs) can be addressed individually in JavaScript, and a string containing a single such character has a length of 2 in JavaScript.
To achieve compatibility you would need to convert the string to UTF-16 (Vec<u16>), perform the operations on that, and then convert it back. However, this means JsonLogic could produce invalid Unicode data when it cuts out only one surrogate of a surrogate pair. Rust doesn't allow strings broken like that (String::from_utf16() returns a Result, and String::from_utf16_lossy() may return a string different from what JavaScript would produce).
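For reference, a minimal sketch of what such a UTF-16 round trip could look like. The substr_utf16 helper and its clamping behaviour are my own illustration, not code from this library, and it shows exactly where the lossy conversion diverges from JavaScript:

```rust
/// Hypothetical sketch: slice on UTF-16 code units to mimic JavaScript's
/// String.prototype.substr, then convert back to a Rust String.
fn substr_utf16(s: &str, start: usize, len: usize) -> String {
    let units: Vec<u16> = s.encode_utf16().collect();
    let end = (start + len).min(units.len());
    let slice = &units[start.min(units.len())..end];
    // Lossy conversion: a lone surrogate becomes U+FFFD here, whereas
    // JavaScript would keep the unpaired surrogate in the result.
    String::from_utf16_lossy(slice)
}

fn main() {
    // U+13000 ("\uD80C\uDC00" from Example 1) needs two UTF-16 code units.
    let s = "\u{13000}";
    assert_eq!(s.encode_utf16().count(), 2);
    // Taking only the second code unit splits the surrogate pair.
    println!("{:?}", substr_utf16(s, 1, 1)); // "\u{FFFD}", not "\uDC00"
}
```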
I don't know what the best approach is, but in any case this needs to be documented so that users are aware of any differences from the JavaScript implementation.
Another, even bigger problem (which should be easier to fix): the substr implementation in this library uses String::len(), which returns the number of bytes in the string! Strings in Rust are UTF-8, so this implementation already fails on my last name ("Panzenböck"), i.e. on any non-7-bit-ASCII characters. Instead, string.chars().count() (or something smarter that doesn't iterate over all chars twice) should be used. This has nothing to do with the UTF-16 problem above, but without this fix the implementation isn't even consistent with itself.
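To illustrate the difference (my own example, not code from the library):

```rust
fn main() {
    let name = "Panzenböck";
    // Byte length of the UTF-8 representation: "ö" takes two bytes.
    assert_eq!(name.len(), 11);
    // Number of Unicode scalar values, which is what a char-based substr
    // should count (still not UTF-16 code units, but at least self-consistent).
    assert_eq!(name.chars().count(), 10);
    // Number of UTF-16 code units, which is what JavaScript's .length reports.
    assert_eq!(name.encode_utf16().count(), 10);
}
```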
Yet another problem: type coercion rules don't match those of the JavaScript implementation. There you can pass any garbage as the indices and it will be converted to 0, strings of numbers work like the numbers they represent, arrays with a single element act like that single element, and so on. This is probably also a problem in other operations, but I have only checked substr; see the sketch below.
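A hedged sketch of what JavaScript-style index coercion could look like on serde_json values. coerce_index is a hypothetical helper, not part of this library's API, and it only approximates JS ToNumber (e.g. it ignores fractional strings):

```rust
use serde_json::{json, Value};

/// Hypothetical helper mimicking JavaScript's ToNumber coercion for substr
/// indices: numbers pass through, numeric strings are parsed, single-element
/// arrays unwrap to their element, and everything else (JS: NaN) becomes 0.
fn coerce_index(value: &Value) -> i64 {
    match value {
        Value::Number(n) => n.as_i64().unwrap_or(0),
        Value::String(s) => s.trim().parse().unwrap_or(0),
        Value::Bool(b) => *b as i64,
        Value::Array(items) if items.len() == 1 => coerce_index(&items[0]),
        // null, objects, longer arrays, etc. all end up as 0.
        _ => 0,
    }
}

fn main() {
    assert_eq!(coerce_index(&json!(1)), 1);
    assert_eq!(coerce_index(&json!("2")), 2);
    assert_eq!(coerce_index(&json!([1])), 1); // Example 3 below
    assert_eq!(coerce_index(&json!("garbage")), 0);
    assert_eq!(coerce_index(&json!(null)), 0);
}
```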
Example 1:
Diverging UTF-16 handling.
Rule:
{"substr": [{"var":"s"},1]}
Data:
{"s":"\uD80C\uDC00"}
JsonLogic playground output:
"\udc00"
json-logic-rs output:
""
Example 2:
Broken non-ASCII character handling.
Rule:
{"substr": [{"var":"s"},0,-2]}
Data:
{"s":"äöü"}
JsonLogic playground output:
"ä"
json-logic-rs output:
"äöü"
Example 3:
Diverging type coercion.
Rule:
{"substr": [{"var":"s"},[1]]}
Data:
{"s":"abc"}
JsonLogic playground output:
"bc"
json-logic-rs output:
Error: Could not execute logic
Caused by:
Invalid argument for 'substr' - 'Array([Number(1)])', reason: Second argument to substr must be a number