JavaScript implementation crashes on Unicode code points

I found this project while tracing a bug in Codiad, a downstream project that uses this library.

The following function throws an exception:

```javascript
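function testPatchUnicode() {
  // Assumes the library is loaded so that diff_match_patch is in scope.
  var dmp = new diff_match_patch();
  // U+101E4 written as a surrogate pair; cannot put directly in source file.
  var cp = '\uD800\uDDE4';
  var patches = dmp.patch_make(cp + cp + cp + cp + cp + 'a', cp + cp + cp + cp + cp + 'ab');
  dmp.patch_toText(patches);  // Serializing the patches is where the exception surfaces.
}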
```

In general, any string that contains supplementary code points (those above U+FFFF, which have become much more common with the rise of emoji) ends up with diff indices that are offset by some number of code points. This leads to strange or undefined behavior when the resulting patches are applied.
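To make the mismatch concrete, here is a small standalone sketch (it does not call the library) showing how JavaScript's code-unit indexing differs from code-point indexing for the same test character. The `encodeURI` call is only an illustration of how a split surrogate pair can turn into an exception; that this is the exact failure path inside the library is an assumption on my part.

```javascript
// U+101E4 is stored as a surrogate pair: two UTF-16 code units, one code point.
var cp = '\uD800\uDDE4';
var text = cp + 'a';

var codeUnitLength = text.length;               // counts UTF-16 code units
var codePointLength = Array.from(text).length;  // string iteration yields code points

// Cutting at a code-unit index inside the pair leaves a lone high surrogate.
var loneSurrogate = text.substring(0, 1);

// encodeURI rejects unpaired surrogates; record the error name if it throws.
var encodeError = null;
try {
  encodeURI(loneSurrogate);
} catch (e) {
  encodeError = e.name;
}

console.log(codeUnitLength, codePointLength, encodeError);
```

Any index computed in code units can land in the middle of a pair like this, which is why the offsets only go wrong once supplementary characters are present.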

This is a rather serious bug that is quietly affecting any downstream project that uses this library.

I think the best fix would be to rewrite the patch-to-string function to operate entirely in code point space instead of JavaScript's default code unit space.
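As a rough sketch of what "code point space" could look like, the helpers below slice a string by code point rather than by code unit. `toCodePoints` and `sliceByCodePoint` are hypothetical names for illustration and are not part of the library; a real fix would need to apply the same idea wherever the patch offsets are computed.

```javascript
// Hypothetical helper: split a string into whole code points.
// The string iterator used by Array.from never splits a surrogate pair.
function toCodePoints(text) {
  return Array.from(text);
}

// Hypothetical helper: like substring, but start and end count code points.
function sliceByCodePoint(text, start, end) {
  return toCodePoints(text).slice(start, end).join('');
}

var cp = '\uD800\uDDE4';  // U+101E4 as a surrogate pair
var text = cp + cp + 'ab';

// Index 1 in code point space is the boundary after the first whole character.
var head = sliceByCodePoint(text, 0, 1);
var tail = sliceByCodePoint(text, 1, 4);

// Because no pair was split, URI-encoding the piece succeeds and round-trips.
var encoded = encodeURI(head);
var roundTripped = decodeURI(encoded);

console.log(head === cp, tail === cp + 'ab', roundTripped === head);
```

Converting the whole string to an array on every slice is wasteful for large inputs; it is used here only to keep the idea visible.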

This might also affect non-JavaScript implementations; I haven't looked.

P.S. I am on Google's i18n team and have seen issues like this before.
