EntityParser can't handle encoded emoji #67

@aduth

While the tokenizer will gracefully decode most encoded characters:

⇒ node
> var Tokenizer = require( 'simple-html-tokenizer' );
undefined
> Tokenizer.tokenize( '&amp;' )[ 0 ].chars === '&'
true

It doesn't handle characters whose code points don't fit in 16 bits, i.e. anything above U+FFFF (e.g. emoji):

⇒ node
> var Tokenizer = require( 'simple-html-tokenizer' );
undefined
> Tokenizer.tokenize( '&#128517;' )[ 0 ].chars === '😅'
false

It may be that EntityParser should use String.fromCodePoint in place of String.fromCharCode, or an equivalent polyfill?
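To illustrate the difference, here is a minimal sketch of what such a decoding step might look like. EntityParser's internals aren't shown in this issue, so `decodeNumericEntity` is a hypothetical helper name and not part of simple-html-tokenizer. It assumes the numeric value of the entity has already been parsed into an integer, and falls back to building a UTF-16 surrogate pair by hand where `String.fromCodePoint` is unavailable.

```javascript
// Hypothetical helper, not part of simple-html-tokenizer: turns an
// already-parsed numeric entity value into the string it represents.
function decodeNumericEntity( codePoint ) {
	// Prefer the native implementation where it exists (ES2015+).
	if ( typeof String.fromCodePoint === 'function' ) {
		return String.fromCodePoint( codePoint );
	}

	// Code points up to U+FFFF fit in a single UTF-16 code unit.
	if ( codePoint <= 0xFFFF ) {
		return String.fromCharCode( codePoint );
	}

	// Anything above U+FFFF must be split into a surrogate pair.
	var offset = codePoint - 0x10000;
	return String.fromCharCode(
		0xD800 + ( offset >> 10 ),
		0xDC00 + ( offset & 0x3FF )
	);
}

var codePoint = 0x1F605; // 😅, decimal 128517

// String.fromCharCode keeps only the low 16 bits (0xF605), so the
// result is a private-use character rather than the emoji.
console.log( String.fromCharCode( codePoint ) === '\uF605' ); // true

console.log( decodeNumericEntity( codePoint ) === '😅' ); // true
console.log( decodeNumericEntity( 38 ) === '&' ); // true
```

The manual fallback is only there for environments without `String.fromCodePoint`; if the library can assume ES2015 or later, the native call alone should be enough.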

Related:
