diff --git a/encoding.bs b/encoding.bs index f34ab30..9309ec5 100644 --- a/encoding.bs +++ b/encoding.bs @@ -29,7 +29,7 @@ Translate IDs: dictdef-textdecoderoptions textdecoderoptions,dictdef-textdecodeo
The UTF-8 encoding is the most appropriate encoding for interchange of Unicode, the -universal coded character set. Therefore for new protocols and formats, as well as +universal coded character set. Therefore, for new protocols and formats, as well as existing formats deployed in new contexts, this specification requires (and defines) the UTF-8 encoding. @@ -56,17 +56,18 @@ specification does not provide a mechanism for extending any aspect of encodings
There is a set of encoding security issues when the producer and consumer do not agree on the encoding in use, or on the way a given encoding is to be implemented. For instance, an attack was -reported in 2011 where a Shift_JIS lead byte 0x82 was used to “mask” a 0x22 trail byte in a -JSON resource of which an attacker could control some field. The producer did not see the problem -even though this is an illegal byte combination. The consumer decoded it as a single U+FFFD (�) and -therefore changed the overall interpretation as U+0022 (") is an important delimiter. Decoders of -encodings that use multiple bytes for scalar values now require that in case of an illegal byte -combination, a scalar value in the range U+0000 to U+007F, inclusive, cannot be “masked”. For the -aforementioned sequence the output would be U+FFFD U+0022. (As an unfortunate exception to this, the -gb18030 decoder will “mask” up to one such byte at end-of-queue.) +reported in 2011 where a Shift_JIS leading byte 0x82 was used to “mask” a 0x22 trailing byte +in a JSON resource in which an attacker could control some field. The producer did not notice the +problem even though this is an illegal byte combination. The consumer decoded it as a single +U+FFFD (�) and therefore changed the overall interpretation, as U+0022 (") is an important delimiter. +Decoders of encodings that use multiple bytes for scalar values now require that, in case of an +illegal byte combination, a scalar value in the range U+0000 to U+007F, inclusive, cannot be +“masked”. For the aforementioned sequence, the output would be U+FFFD U+0022. (As an unfortunate +exception to this, the gb18030 decoder will “mask” up to one such byte at +end-of-queue.)
This is a larger issue for encodings that map anything that is an ASCII byte to something -that is not an ASCII code point, when there is no lead byte present. These are +that is not an ASCII code point, when there is no leading byte present. These are “ASCII-incompatible” encodings and other than ISO-2022-JP and UTF-16BE/LE, which are unfortunately required due to deployed content, they are not supported. (Investigation is ongoing @@ -901,9 +902,9 @@ specification, excluding index single-byte, which have their own table:
Let offset be the last pointer in index gb18030 ranges that is less than - or equal to pointer and let code point offset be its corresponding code - point. + or equal to pointer and let codePointOffset be its corresponding code point.
Return a code point whose value is - code point offset + pointer − offset. + codePointOffset + pointer − offset. @@ -979,11 +979,11 @@ the return value of these steps:
If codePoint is U+E7C7, then return pointer 7457.
Let offset be the last code point in index gb18030 ranges that is less - than or equal to codePoint and let pointer offset be its corresponding + than or equal to codePoint and let pointerOffset be its corresponding pointer.
Return a pointer whose value is - pointer offset + codePoint − offset. + pointerOffset + codePoint − offset. @@ -1017,8 +1017,8 @@ these steps:
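Both range lookups in these hunks rely on index gb18030 ranges being sorted in both columns, so a binary search can find the last entry that does not exceed the input. The Python sketch below assumes a three-entry excerpt of that index. The full index is much longer, so inputs past the excerpt are not mapped correctly. The sketch also leaves out the special cases the full steps check first, such as U+E7C7 and pointer 7457. The function and table names are hypothetical; the local names mirror offset, codePointOffset, and pointerOffset above.

```python
from bisect import bisect_right

# Hypothetical names. RANGES_EXCERPT holds only the first three
# (pointer, code point) entries of index gb18030 ranges.
RANGES_EXCERPT = [(0, 0x0080), (36, 0x00A5), (38, 0x00A9)]
POINTERS = [pointer for pointer, _ in RANGES_EXCERPT]
CODE_POINTS = [code_point for _, code_point in RANGES_EXCERPT]


def ranges_code_point(pointer: int) -> int:
    """Code point for pointer; special cases and out-of-excerpt inputs not handled."""
    # Last entry whose pointer is less than or equal to `pointer`.
    offset, code_point_offset = RANGES_EXCERPT[bisect_right(POINTERS, pointer) - 1]
    return code_point_offset + pointer - offset


def ranges_pointer(code_point: int) -> int:
    """Pointer for code_point; special cases and out-of-excerpt inputs not handled."""
    # Last entry whose code point is less than or equal to `code_point`.
    pointer_offset, offset = RANGES_EXCERPT[bisect_right(CODE_POINTS, code_point) - 1]
    return pointer_offset + code_point - offset
```

For instance, `ranges_code_point(37)` is 0xA5 + 37 − 36 = U+00A6, and `ranges_pointer(0xA6)` inverts it.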
Avoid returning Hong Kong Supplementary Character Set extensions literally.
If codePoint is U+2550, U+255E, U+2561, U+256A, U+5341, or U+5345, - return the last pointer corresponding to codePoint in +
If codePoint is U+2550 (═), U+255E (╞), U+2561 (╡), U+256A (╪), U+5341 (十), or + U+5345 (卅), then return the last pointer corresponding to codePoint in index. @@ -2159,17 +2159,26 @@ that are split between strings. [[!INFRA]] in deployed content. Therefore it is not part of the UTF-8 decoder algorithm, but rather the decode and UTF-8 decode algorithms. -
UTF-8's decoder has an associated -UTF-8 code point, UTF-8 bytes seen, and -UTF-8 bytes needed (all initially 0), a UTF-8 lower boundary -(initially 0x80), and a UTF-8 upper boundary (initially 0xBF). +
UTF-8's decoder has an associated: + +
UTF-8's decoder's handler, given ioQueue and byte, runs these steps:
If byte is end-of-queue and - UTF-8 bytes needed is not 0, set +
If byte is end-of-queue and UTF-8 bytes needed is not 0, then set UTF-8 bytes needed to 0 and return error.
If byte is end-of-queue, then return finished. @@ -2195,11 +2204,9 @@ in deployed content. Therefore it is not part of the UTF-8 decoder algori
If byte is 0xE0, set - UTF-8 lower boundary to 0xA0. +
If byte is 0xE0, then set UTF-8 lower boundary to 0xA0. -
If byte is 0xED, set - UTF-8 upper boundary to 0x9F. +
If byte is 0xED, then set UTF-8 upper boundary to 0x9F.
Set UTF-8 bytes needed to 2. @@ -2212,11 +2219,9 @@ in deployed content. Therefore it is not part of the UTF-8 decoder algori
If byte is 0xF0, set - UTF-8 lower boundary to 0x90. +
If byte is 0xF0, then set UTF-8 lower boundary to 0x90. -
If byte is 0xF4, set - UTF-8 upper boundary to 0x8F. +
If byte is 0xF4, then set UTF-8 upper boundary to 0x8F.
Set UTF-8 bytes needed to 3. @@ -2438,8 +2443,14 @@ consumers of content generated with GBK's encoder.
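Taken as a whole, the UTF-8 decoder that this diff touches can be sketched in Python as below. This is an illustrative rendering, not the normative algorithm. The class name is hypothetical, "restore byte to ioQueue" is modeled by not advancing the read position, and each error is emitted as U+FFFD, as under the replacement error mode. The boundary adjustments after 0xE0, 0xED, 0xF0, and 0xF4 are what reject overlong forms, surrogates, and code points above U+10FFFF.

```python
class Utf8Decoder:
    """Hypothetical sketch of the UTF-8 decoder; attributes mirror the spec's state."""

    def __init__(self) -> None:
        self.code_point = 0
        self.bytes_seen = 0
        self.bytes_needed = 0
        self.lower_boundary = 0x80
        self.upper_boundary = 0xBF

    def decode(self, data: bytes) -> str:
        out = []
        i = 0
        while True:
            byte = data[i] if i < len(data) else None  # None stands for end-of-queue
            if byte is None:
                if self.bytes_needed != 0:
                    self.bytes_needed = 0
                    out.append("\uFFFD")  # truncated sequence
                return "".join(out)
            if self.bytes_needed == 0:
                if byte <= 0x7F:
                    out.append(chr(byte))
                elif 0xC2 <= byte <= 0xDF:
                    self.bytes_needed = 1
                    self.code_point = byte & 0x1F
                elif 0xE0 <= byte <= 0xEF:
                    if byte == 0xE0:
                        self.lower_boundary = 0xA0  # reject overlong forms
                    if byte == 0xED:
                        self.upper_boundary = 0x9F  # reject surrogates
                    self.bytes_needed = 2
                    self.code_point = byte & 0xF
                elif 0xF0 <= byte <= 0xF4:
                    if byte == 0xF0:
                        self.lower_boundary = 0x90  # reject overlong forms
                    if byte == 0xF4:
                        self.upper_boundary = 0x8F  # reject > U+10FFFF
                    self.bytes_needed = 3
                    self.code_point = byte & 0x7
                else:
                    out.append("\uFFFD")
                i += 1
                continue
            if not self.lower_boundary <= byte <= self.upper_boundary:
                # Reset, report an error, and "restore" byte by not advancing i.
                self.code_point = self.bytes_needed = self.bytes_seen = 0
                self.lower_boundary, self.upper_boundary = 0x80, 0xBF
                out.append("\uFFFD")
                continue
            self.lower_boundary, self.upper_boundary = 0x80, 0xBF
            self.code_point = (self.code_point << 6) | (byte & 0x3F)
            self.bytes_seen += 1
            i += 1
            if self.bytes_seen == self.bytes_needed:
                out.append(chr(self.code_point))
                self.code_point = self.bytes_needed = self.bytes_seen = 0
```

`Utf8Decoder().decode(b"\xed\xa0\x80")` yields three U+FFFD. The byte 0xA0 is above the 0x9F upper boundary set by 0xED, so it is restored and then rejected as a leading byte, as is 0x80.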
gb18030's decoder has an associated gb18030 first, -gb18030 second, and gb18030 third (all initially 0x00). +
gb18030's decoder has an associated: + +
gb18030's decoder's handler, given ioQueue and byte, runs these steps: @@ -2484,8 +2495,8 @@ consumers of content generated with GBK's encoder.
If gb18030 second is not 0x00:
If byte is in the range 0x81 to 0xFE, inclusive, set - gb18030 third to byte and return continue. +
If byte is in the range 0x81 to 0xFE, inclusive, then set gb18030 third + to byte and return continue.
Restore « gb18030 second, byte » to ioQueue, set gb18030 first and gb18030 second to 0x00, and return error. @@ -2495,17 +2506,20 @@ consumers of content generated with GBK's encoder.
If gb18030 first is not 0x00:
If byte is in the range 0x30 to 0x39, inclusive, set - gb18030 second to byte and return continue. +
If byte is in the range 0x30 to 0x39, inclusive, then set gb18030 second + to byte and return continue. + +
Let leading be gb18030 first. -
Let lead be gb18030 first, let - pointer be null, and set gb18030 first to 0x00. +
Set gb18030 first to 0x00. + +
Let pointer be null.
Let offset be 0x40 if byte is less than 0x7F; otherwise 0x41. -
If byte is in the range 0x40 to 0x7E, inclusive, or - 0x80 to 0xFE, inclusive, set pointer to - (lead − 0x81) × 190 + (byte − offset). +
If byte is in the range 0x40 to 0x7E, inclusive, or 0x80 to 0xFE, inclusive, + then set pointer to + (leading − 0x81) × 190 + (byte − offset).
Let codePoint be null if pointer is null; otherwise the index code point for pointer in index gb18030. @@ -2533,8 +2547,8 @@ consumers of content generated with GBK's encoder.
gb18030's encoder has an associated is GBK -(initially false). +
gb18030's encoder has an associated is GBK, which is a +boolean, initially false.
gb18030's encoder's handler, given unused and codePoint, runs these steps: @@ -2548,8 +2562,8 @@ consumers of content generated with GBK's encoder.
If codePoint is U+E5E5, then return error with codePoint. -
Index gb18030 maps 0xA3 0xA0 to U+3000 rather than U+E5E5 for - compatibility with deployed content. Therefore it cannot roundtrip. +
Index gb18030 maps 0xA3 0xA0 to U+3000 IDEOGRAPHIC SPACE rather than U+E5E5 + for compatibility with deployed content. Therefore it cannot roundtrip.
If is GBK is true and codePoint is U+20AC (€), then return byte 0x80. @@ -2627,15 +2641,15 @@ consumers of content generated with GBK's encoder.
If pointer is non-null:
Let lead be pointer / 190 + 0x81. +
Let leading be pointer / 190 + 0x81. -
Let trail be pointer % 190. +
Let trailing be pointer % 190. -
Let offset be 0x40 if trail is less than 0x3F, +
Let offset be 0x40 if trailing is less than 0x3F; otherwise 0x41. -
Return two bytes whose values are lead and - trail + offset. +
Return two bytes whose values are leading and + trailing + offset.
If is GBK is true, then return error with codePoint. @@ -2674,35 +2688,35 @@ consumers of content generated with GBK's encoder.
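The two-byte gb18030 steps in these hunks (the decoder's pointer computation and the encoder's leading/trailing split) are inverses over pointers 0 to 23939. Leading bytes run from 0x81 to 0xFE, and the 190 trailing values are 0x40 to 0x7E followed by 0x80 to 0xFE, with the offset switching from 0x40 to 0x41 to skip 0x7F. A Python sketch with hypothetical function names follows; index gb18030 lookups are omitted, and `//` stands in for the spec's division, which has to be integer division here because the result is a byte value.

```python
from typing import Optional

# Hypothetical helper names; index gb18030 lookups are not included.


def gb18030_pointer(leading: int, byte: int) -> Optional[int]:
    """Two-byte pointer for leading/byte, or None if byte is not a trailing byte."""
    offset = 0x40 if byte < 0x7F else 0x41
    if 0x40 <= byte <= 0x7E or 0x80 <= byte <= 0xFE:
        return (leading - 0x81) * 190 + (byte - offset)
    return None


def gb18030_bytes(pointer: int) -> bytes:
    """Two bytes for a pointer in the range 0 to 23939."""
    leading = pointer // 190 + 0x81
    trailing = pointer % 190
    offset = 0x40 if trailing < 0x3F else 0x41  # skips byte 0x7F
    return bytes([leading, trailing + offset])
```

For example, `gb18030_bytes(63)` gives 0x81 0x80: 63 is the first trailing value that needs the 0x41 offset.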
Big5's decoder has an associated -Big5 lead (initially 0x00). +
Big5's decoder has an associated Big5 leading, which +is a byte, initially 0x00. -Big5's decoder's handler, given ioQueue -and byte, runs these steps: +
Big5's decoder's handler, given ioQueue and +byte, runs these steps:
If byte is end-of-queue and Big5 lead is not 0x00, then set - Big5 lead to 0x00 and return error. +
If byte is end-of-queue and Big5 leading is not 0x00, then set + Big5 leading to 0x00 and return error. -
If byte is end-of-queue and Big5 lead is 0x00, then return +
If byte is end-of-queue and Big5 leading is 0x00, then return finished.
If Big5 lead is not 0x00: +
If Big5 leading is not 0x00:
Let lead be Big5 lead. +
Let leading be Big5 leading. -
Set Big5 lead to 0x00. +
Set Big5 leading to 0x00.
Let pointer be null.
Let offset be 0x40 if byte is less than 0x7F; otherwise 0x62. -
If byte is in the range 0x40 to 0x7E, inclusive, or - 0xA1 to 0xFE, inclusive, set pointer to - (lead − 0x81) × 157 + (byte − offset). +
If byte is in the range 0x40 to 0x7E, inclusive, or 0xA1 to 0xFE, inclusive, + then set pointer to + (leading − 0x81) × 157 + (byte − offset).
If there is a row in the table below whose first column is pointer, then return @@ -2736,7 +2750,7 @@ and byte, runs these steps: byte.
If byte is in the range 0x81 to 0xFE, inclusive, then set - Big5 lead to byte and return continue. + Big5 leading to byte and return continue.
Return error.
If pointer is null, then return error with codePoint. -
Let lead be pointer / 157 + 0x81. +
Let leading be pointer / 157 + 0x81. -
Let trail be pointer % 157. +
Let trailing be pointer % 157. -
Let offset be 0x40 if trail is less than 0x3F, +
Let offset be 0x40 if trailing is less than 0x3F; otherwise 0x62. -
Return two bytes whose values are lead and - trail + offset. +
Return two bytes whose values are leading and + trailing + offset.
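Big5 uses the same pointer pattern as the gb18030 two-byte form, but with 157 trailing values per leading byte. Bytes 0x40 to 0x7E give 0 to 62, and 0xA1 to 0xFE give 63 to 156, which is why the second offset is 0x62 (0xA1 − 63). The Python sketch below covers the arithmetic only and uses hypothetical function names. It leaves out the special pointer table, index Big5 lookups, and the Hong Kong Supplementary Character Set handling.

```python
from typing import Optional

# Hypothetical helper names; only the pointer arithmetic is shown.


def big5_pointer(leading: int, byte: int) -> Optional[int]:
    """Pointer for leading/byte, or None if byte is not a Big5 trailing byte."""
    offset = 0x40 if byte < 0x7F else 0x62
    if 0x40 <= byte <= 0x7E or 0xA1 <= byte <= 0xFE:
        return (leading - 0x81) * 157 + (byte - offset)
    return None


def big5_bytes(pointer: int) -> bytes:
    """Two bytes for a pointer; trailing bytes 0x7F to 0xA0 are never produced."""
    leading = pointer // 157 + 0x81
    trailing = pointer % 157
    offset = 0x40 if trailing < 0x3F else 0x62
    return bytes([leading, trailing + offset])
```

For example, bytes 0xA4 0x40 (一 in Big5) give pointer 35 × 157 = 5495, and `big5_bytes(5495)` returns them.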
EUC-JP's decoder has an associated -EUC-JP jis0212 (initially false) and -EUC-JP lead (initially 0x00). +
EUC-JP's decoder has an associated: + +
EUC-JP's decoder's handler, given ioQueue and byte, runs these steps:
If byte is end-of-queue and EUC-JP lead is not 0x00, then set - EUC-JP lead to 0x00 and return error. +
If byte is end-of-queue and EUC-JP leading is not 0x00, then set + EUC-JP leading to 0x00 and return error. -
If byte is end-of-queue and EUC-JP lead is 0x00, then return +
If byte is end-of-queue and EUC-JP leading is 0x00, then return finished. -
If EUC-JP lead is 0x8E and byte is - in the range 0xA1 to 0xDF, inclusive, set EUC-JP lead to 0x00 and return - a code point whose value is 0xFF61 − 0xA1 + byte. +
If EUC-JP leading is 0x8E and byte is in the range 0xA1 to 0xDF, + inclusive, then set EUC-JP leading to 0x00 and return a code point whose value is + 0xFF61 − 0xA1 + byte. -
If EUC-JP lead is 0x8F and byte is in the range - 0xA1 to 0xFE, inclusive, set EUC-JP jis0212 to true, set - EUC-JP lead to byte, and return continue. +
If EUC-JP leading is 0x8F and byte is in the range 0xA1 to 0xFE, + inclusive, then set EUC-JP jis0212 to true, set EUC-JP leading to byte, + and return continue.
If EUC-JP lead is not 0x00: +
If EUC-JP leading is not 0x00:
Let lead be EUC-JP lead. +
Let leading be EUC-JP leading. -
Set EUC-JP lead to 0x00. +
Set EUC-JP leading to 0x00.
Let codePoint be null. -
If lead and byte are both in the range 0xA1 to 0xFE, inclusive, then +
If leading and byte are both in the range 0xA1 to 0xFE, inclusive, then set codePoint to the index code point for - (lead − 0xA1) × 94 + byte − 0xA1 + (leading − 0xA1) × 94 + byte − 0xA1 in index jis0208 if EUC-JP jis0212 is false and in index jis0212 otherwise. @@ -2831,7 +2851,7 @@ and byte, runs these steps: byte.
If byte is 0x8E, 0x8F, or in the range 0xA1 to 0xFE, inclusive, then set - EUC-JP lead to byte and return continue. + EUC-JP leading to byte and return continue.
Return error.
If codePoint is U+203E (‾), then return byte 0x7E. -
If codePoint is in the range U+FF61 to U+FF9F, inclusive, then return two bytes - whose values are 0x8E and codePoint − 0xFF61 + 0xA1. +
If codePoint is in the range U+FF61 (｡) to U+FF9F (ﾟ), inclusive, then return two + bytes whose values are 0x8E and codePoint − 0xFF61 + 0xA1.
If codePoint is U+2212 (−), then set it to U+FF0D (－). @@ -2866,11 +2886,11 @@ and byte, runs these steps:
If pointer is null, then return error with codePoint. -
Let lead be pointer / 94 + 0xA1. +
Let leading be pointer / 94 + 0xA1. -
Let trail be pointer % 94 + 0xA1. +
Let trailing be pointer % 94 + 0xA1. -
Return two bytes whose values are lead and trail. +
Return two bytes whose values are leading and trailing.
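EUC-JP's two-byte form indexes index jis0208 (or index jis0212 after 0x8F, per the decoder steps) in rows of 94, with both bytes in 0xA1 to 0xFE. Half-width katakana are 0x8E followed by a byte in 0xA1 to 0xDF. A Python sketch of this arithmetic follows, with hypothetical function names. Index lookups and the U+00A5, U+203E, and U+2212 special cases are omitted.

```python
# Hypothetical helper names; index jis0208/jis0212 lookups are not included.


def eucjp_pointer(leading: int, byte: int) -> int:
    """Pointer for a pair whose bytes are both already in 0xA1 to 0xFE."""
    return (leading - 0xA1) * 94 + byte - 0xA1


def eucjp_pointer_bytes(pointer: int) -> bytes:
    """Two bytes for a jis0208 pointer in the range 0 to 8835."""
    return bytes([pointer // 94 + 0xA1, pointer % 94 + 0xA1])


def eucjp_katakana_bytes(code_point: int) -> bytes:
    """U+FF61 to U+FF9F (half-width katakana) -> 0x8E, then 0xA1 to 0xDF."""
    return bytes([0x8E, code_point - 0xFF61 + 0xA1])


def eucjp_katakana_code_point(byte: int) -> int:
    """Decoder direction for the byte following 0x8E."""
    return 0xFF61 - 0xA1 + byte
```

Pointer 283, which index jis0208 maps to U+3042 (あ), becomes 0xA4 0xA2.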
ISO-2022-JP's decoder has an associated -ISO-2022-JP decoder state (initially -ASCII), -ISO-2022-JP decoder output state (initially -ASCII), -ISO-2022-JP lead (initially 0x00), and -ISO-2022-JP output (initially false). +
ISO-2022-JP's decoder has an associated: + +
ISO-2022-JP's decoder's handler, given ioQueue and byte, runs these steps, switching on @@ -2965,7 +2993,7 @@ and byte, runs these steps:
Set ISO-2022-JP output to false and return error. -
Based on byte:
Set ISO-2022-JP output to false, - ISO-2022-JP lead to byte, - ISO-2022-JP decoder state to - trail byte, and return - continue. +
Set ISO-2022-JP output to false, ISO-2022-JP leading to byte, + ISO-2022-JP decoder state to trailing byte, + and return continue.
Return finished. @@ -2988,24 +3014,23 @@ and byte, runs these steps:
Set ISO-2022-JP output to false and return error.
Based on byte:
Set ISO-2022-JP decoder state to - escape start and return - error. - + escape start and return error. +
Set the ISO-2022-JP decoder state to - lead byte. + leading byte.
Let pointer be - (ISO-2022-JP lead − 0x21) × 94 + byte − 0x21. + (ISO-2022-JP leading − 0x21) × 94 + byte − 0x21.
Let codePoint be the index code point for pointer in index jis0208. @@ -3017,52 +3042,48 @@ and byte, runs these steps:
Set the ISO-2022-JP decoder state to - lead byte and return error. + leading byte and return error.
Set ISO-2022-JP decoder state to - lead byte and return + leading byte and return error. - +
If byte is either 0x24 or 0x28, set - ISO-2022-JP lead to byte, - ISO-2022-JP decoder state to - escape, and return - continue. +
If byte is either 0x24 or 0x28, then set + ISO-2022-JP leading to byte, ISO-2022-JP decoder state to + escape, and return continue.
If byte is not end-of-queue, then restore byte to ioQueue. -
Set ISO-2022-JP output to false, - ISO-2022-JP decoder state to +
Set ISO-2022-JP output to false, ISO-2022-JP decoder state to ISO-2022-JP decoder output state, and return error.
Let lead be ISO-2022-JP lead and set - ISO-2022-JP lead to 0x00. +
Let leading be ISO-2022-JP leading and set + ISO-2022-JP leading to 0x00.
Let state be null. -
If lead is 0x28 and byte is 0x42, set +
If leading is 0x28 and byte is 0x42, then set state to ASCII. -
If lead is 0x28 and byte is 0x4A, set +
If leading is 0x28 and byte is 0x4A, then set state to Roman. -
If lead is 0x28 and byte is 0x49, set +
If leading is 0x28 and byte is 0x49, then set state to katakana. -
If lead is 0x24 and byte is either - 0x40 or 0x42, set state to - lead byte. +
If leading is 0x24 and byte is either 0x40 or 0x42, + then set state to leading byte.
If state is non-null: @@ -3079,8 +3100,8 @@ and byte, runs these steps: error otherwise.
If byte is end-of-queue, then restore lead to - ioQueue; otherwise, restore « lead, byte » to +
If byte is end-of-queue, then restore leading to + ioQueue; otherwise, restore « leading, byte » to ioQueue.
Set ISO-2022-JP output to false, @@ -3097,16 +3118,16 @@ and byte, runs these steps: multiple outputs can result in an error when run through the corresponding decoder. -
Encoding U+00A5 gives 0x1B 0x28 0x4A 0x5C - 0x1B 0x28 0x42. Doing that twice, concatenating the results, and then decoding yields U+00A5 U+FFFD - U+00A5. +
Encoding U+00A5 (¥) gives 0x1B 0x28 0x4A + 0x5C 0x1B 0x28 0x42. Doing that twice, concatenating the results, and then decoding yields U+00A5 + U+FFFD U+00A5.
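The U+FFFD in that example comes from ISO-2022-JP output. A recognized escape sequence sets the flag, and a second escape sequence that arrives while the flag is still set is reported as an error. The Python sketch below reproduces the example under a deliberate simplification. It models only the ASCII and Roman states and the ESC ( B and ESC ( J sequences, so katakana and jis0208 are missing, and every other escape sequence is treated as unrecognized. The class name, the state strings, and the ESCAPES table are hypothetical.

```python
from collections import deque

ASCII, ROMAN, ESCAPE_START, ESCAPE = "ASCII", "Roman", "escape start", "escape"

# Simplification: only ESC ( B and ESC ( J are modeled.
ESCAPES = {(0x28, 0x42): ASCII, (0x28, 0x4A): ROMAN}


class Iso2022JpSubsetDecoder:
    """Hypothetical ASCII/Roman-only sketch of the ISO-2022-JP decoder; errors become U+FFFD."""

    def __init__(self) -> None:
        self.state = ASCII  # ISO-2022-JP decoder state
        self.output_state = ASCII  # ISO-2022-JP decoder output state
        self.leading = 0x00  # ISO-2022-JP leading
        self.output = False  # ISO-2022-JP output

    def decode(self, data: bytes) -> str:
        out = []
        queue = deque(data)
        while True:
            byte = queue.popleft() if queue else None  # None stands for end-of-queue
            if self.state in (ASCII, ROMAN):
                if byte is None:
                    return "".join(out)
                if byte == 0x1B:
                    self.state = ESCAPE_START
                    continue
                self.output = False
                if byte > 0x7F or byte in (0x0E, 0x0F):
                    out.append("\uFFFD")
                elif self.state == ROMAN and byte == 0x5C:
                    out.append("\u00A5")  # ¥
                elif self.state == ROMAN and byte == 0x7E:
                    out.append("\u203E")  # ‾
                else:
                    out.append(chr(byte))
            elif self.state == ESCAPE_START:
                if byte in (0x24, 0x28):
                    self.leading = byte
                    self.state = ESCAPE
                    continue
                if byte is not None:
                    queue.appendleft(byte)  # restore byte
                self.output = False
                self.state = self.output_state
                out.append("\uFFFD")
            else:  # ESCAPE
                leading, self.leading = self.leading, 0x00
                new_state = ESCAPES.get((leading, byte))
                if new_state is not None:
                    self.state = self.output_state = new_state
                    previous_output, self.output = self.output, True
                    if previous_output:
                        out.append("\uFFFD")  # no output since the previous escape
                    continue
                if byte is not None:
                    queue.appendleft(byte)
                queue.appendleft(leading)  # restore « leading, byte »
                self.output = False
                self.state = self.output_state
                out.append("\uFFFD")
```

Decoding `b"\x1b(J\\\x1b(B" * 2` with a fresh instance gives U+00A5 U+FFFD U+00A5: the second ESC ( J follows ESC ( B with no character in between.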
ISO-2022-JP's encoder has an associated ISO-2022-JP encoder state which is ASCII, Roman, or -jis0208 (initially -ASCII). +jis0208, initially +ASCII.
ISO-2022-JP's encoder's handler, given ioQueue and codePoint, runs these steps: @@ -3157,8 +3178,8 @@ and byte, runs these steps:
If codePoint is U+2212 (−), then set it to U+FF0D (－). -
If codePoint is in the range U+FF61 to U+FF9F, inclusive, then set it to the - index code point for codePoint − 0xFF61 in +
If codePoint is in the range U+FF61 (｡) to U+FF9F (ﾟ), inclusive, then set it to + the index code point for codePoint − 0xFF61 in +
Return error with codePoint.
If ISO-2022-JP encoder state is not - jis0208, - restore codePoint to - ioQueue, set ISO-2022-JP encoder state to - jis0208, and return three bytes - 0x1B 0x24 0x42. +
If ISO-2022-JP encoder state is not jis0208, + then restore codePoint to ioQueue, set + ISO-2022-JP encoder state to jis0208, and return + three bytes 0x1B 0x24 0x42. -
Let lead be pointer / 94 + 0x21. +
Let leading be pointer / 94 + 0x21. -
Let trail be pointer % 94 + 0x21. +
Let trailing be pointer % 94 + 0x21. -
Return two bytes whose values are lead and trail. +
Return two bytes whose values are leading and trailing.
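The last encoder steps switch to jis0208 with ESC $ B (0x1B 0x24 0x42) when needed, then write the pointer as two bytes in 0x21 to 0x7E. The Python sketch below covers just those steps and assumes the pointer has already been found in index jis0208. The function name and its (bytes, new state) return value are hypothetical. In the spec, the escape sequence and the two bytes come from separate handler calls, because codePoint is restored to ioQueue in between.

```python
from typing import Tuple

# Hypothetical helper covering only the jis0208 branch of the encoder.


def iso2022jp_jis0208_bytes(pointer: int, state: str) -> Tuple[bytes, str]:
    """Return the bytes to emit for a jis0208 pointer and the resulting encoder state."""
    out = b""
    if state != "jis0208":
        out += b"\x1b\x24\x42"  # ESC $ B switches to jis0208
        state = "jis0208"
    leading = pointer // 94 + 0x21
    trailing = pointer % 94 + 0x21
    return out + bytes([leading, trailing]), state
```

Encoding あ (pointer 283 in index jis0208) from the ASCII state gives 0x1B 0x24 0x42 0x24 0x22. The full encoder also switches back to ASCII with 0x1B 0x28 0x42 at end-of-queue.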
Shift_JIS's decoder has an associated -Shift_JIS lead (initially 0x00). +Shift_JIS leading, which is a byte, initially 0x00. -
Shift_JIS's decoder's handler, given -ioQueue and byte, runs these steps: +
Shift_JIS's decoder's handler, given ioQueue and +byte, runs these steps:
If byte is end-of-queue and Shift_JIS lead is not 0x00, then set - Shift_JIS lead to 0x00 and return error. +
If byte is end-of-queue and Shift_JIS leading is not 0x00, then set + Shift_JIS leading to 0x00 and return error. -
If byte is end-of-queue and Shift_JIS lead is 0x00, then return +
If byte is end-of-queue and Shift_JIS leading is 0x00, then return finished.
If Shift_JIS lead is not 0x00: +
If Shift_JIS leading is not 0x00:
Let lead be Shift_JIS lead. +
Let leading be Shift_JIS leading. -
Set Shift_JIS lead to 0x00. +
Set Shift_JIS leading to 0x00.
Let pointer be null.
Let offset be 0x40 if byte is less than 0x7F; otherwise 0x41. -
Let lead offset be 0x81 if lead is less than 0xA0; otherwise 0xC1. +
Let leadingOffset be 0x81 if leading is less than 0xA0; otherwise + 0xC1.
If byte is in the range 0x40 to 0x7E, inclusive, or 0x80 to 0xFC, inclusive, then set pointer to - (lead − lead offset) × 188 + byte − offset. + (leading − leadingOffset) × 188 + byte − offset.
If pointer is in the range 8836 to 10715, inclusive, then return a code point @@ -3259,7 +3279,7 @@ and byte, runs these steps:
If byte is in the range 0x81 to 0x9F, inclusive, or 0xE0 to 0xFC, inclusive, then - set Shift_JIS lead to byte and return continue. + set Shift_JIS leading to byte and return continue.
Return error.
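Here is a Python sketch of the Shift_JIS decoder steps above, run over a whole input. It includes the security rule from the introduction: an ASCII byte that cannot complete a pair is restored to ioQueue, so 0x82 0x22 decodes to U+FFFD U+0022 rather than masking the quotation mark. The one-entry `JIS0208_EXCERPT` dictionary is an assumption of the sketch and stands in for index jis0208. Pairs missing from it decode to U+FFFD here, so a real decoder needs the full index. Names are hypothetical, and errors become U+FFFD.

```python
from collections import deque
from typing import Dict, Optional

# Stand-in for index jis0208 with a single entry: pointer 283 is U+3042 (あ).
JIS0208_EXCERPT: Dict[int, int] = {283: 0x3042}


def shift_jis_decode(data: bytes, index: Dict[int, int] = JIS0208_EXCERPT) -> str:
    """Hypothetical sketch of the Shift_JIS decoder handler; errors become U+FFFD."""
    out = []
    queue = deque(data)
    sjis_leading = 0x00  # Shift_JIS leading
    while True:
        byte = queue.popleft() if queue else None  # None stands for end-of-queue
        if byte is None:
            if sjis_leading != 0x00:
                out.append("\uFFFD")  # truncated pair
            return "".join(out)
        if sjis_leading != 0x00:
            leading, sjis_leading = sjis_leading, 0x00
            pointer: Optional[int] = None
            offset = 0x40 if byte < 0x7F else 0x41
            leading_offset = 0x81 if leading < 0xA0 else 0xC1
            if 0x40 <= byte <= 0x7E or 0x80 <= byte <= 0xFC:
                pointer = (leading - leading_offset) * 188 + byte - offset
            if pointer is not None and 8836 <= pointer <= 10715:
                out.append(chr(0xE000 - 8836 + pointer))  # Private Use Area
                continue
            code_point = index.get(pointer) if pointer is not None else None
            if code_point is not None:
                out.append(chr(code_point))
                continue
            if byte <= 0x7F:
                queue.appendleft(byte)  # restore an ASCII byte instead of masking it
            out.append("\uFFFD")
        elif byte <= 0x80:
            out.append(chr(byte))
        elif 0xA1 <= byte <= 0xDF:
            out.append(chr(0xFF61 - 0xA1 + byte))  # half-width katakana
        elif 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC:
            sjis_leading = byte
        else:
            out.append("\uFFFD")
```

`shift_jis_decode(b'\x82"')` returns `'\ufffd"'`, matching the output described for the 2011 attack sequence.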
If codePoint is U+203E (‾), then return byte 0x7E. -
If codePoint is in the range U+FF61 to U+FF9F, inclusive, then return a byte - whose value is codePoint − 0xFF61 + 0xA1. +
If codePoint is in the range U+FF61 (｡) to U+FF9F (ﾟ), inclusive, then return a + byte whose value is codePoint − 0xFF61 + 0xA1.
If codePoint is U+2212 (−), then set it to U+FF0D (－). @@ -3289,17 +3309,17 @@ and byte, runs these steps:
If pointer is null, then return error with codePoint. -
Let lead be pointer / 188. +
Let leading be pointer / 188. -
Let lead offset be 0x81 if lead is less than 0x1F; otherwise 0xC1. +
Let leadingOffset be 0x81 if leading is less than 0x1F; otherwise 0xC1. -
Let trail be pointer % 188. +
Let trailing be pointer % 188. -
Let offset be 0x40 if trail is less than 0x3F; otherwise 0x41. +
Let offset be 0x40 if trailing is less than 0x3F; otherwise 0x41. -
Return two bytes whose values are lead + lead offset and - trail + offset. +
Return two bytes whose values are leading + leadingOffset and + trailing + offset.
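In the encoder direction, leadingOffset jumps from 0x81 to 0xC1 once pointer / 188 reaches 0x1F. The leading byte therefore goes straight from 0x9F to 0xE0, skipping 0xA0 to 0xDF, which the decoder treats as single-byte half-width katakana. Here is a Python sketch of the final steps. The function name is hypothetical, and the index Shift_JIS pointer lookup is assumed to be already done.

```python
# Hypothetical helper name; the pointer is assumed to come from index Shift_JIS.


def shift_jis_bytes(pointer: int) -> bytes:
    """Two bytes for a Shift_JIS pointer (0 to 11279)."""
    leading = pointer // 188
    leading_offset = 0x81 if leading < 0x1F else 0xC1  # skip 0xA0 to 0xDF
    trailing = pointer % 188
    offset = 0x40 if trailing < 0x3F else 0x41  # skip 0x7F
    return bytes([leading + leading_offset, trailing + offset])
```

Pointer 283 gives 0x82 0xA0 (あ), and pointer 8836, the first one the decoder maps to U+E000, gives 0xF0 0x40.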
EUC-KR's decoder has an associated -EUC-KR lead (initially 0x00). +
EUC-KR's decoder has an associated EUC-KR leading, +which is a byte, initially 0x00. -
EUC-KR's decoder's handler, given -ioQueue and byte, runs these steps: +
EUC-KR's decoder's handler, given ioQueue and +byte, runs these steps:
If byte is end-of-queue and EUC-KR lead is not 0x00, then set - EUC-KR lead to 0x00 and return error. +
If byte is end-of-queue and EUC-KR leading is not 0x00, then set + EUC-KR leading to 0x00 and return error. -
If byte is end-of-queue and EUC-KR lead is 0x00, then return +
If byte is end-of-queue and EUC-KR leading is 0x00, then return finished.
If EUC-KR lead is not 0x00: +
If EUC-KR leading is not 0x00:
Let lead be EUC-KR lead. +
Let leading be EUC-KR leading. -
Set EUC-KR lead to 0x00. +
Set EUC-KR leading to 0x00.
Let pointer be null. -
If byte is in the range 0x41 to 0xFE, inclusive, set - pointer to - (lead − 0x81) × 190 + (byte − 0x41). +
If byte is in the range 0x41 to 0xFE, inclusive, then set pointer + to (leading − 0x81) × 190 + (byte − 0x41).
Let codePoint be null if pointer is null; otherwise the index code point for pointer in index EUC-KR. @@ -3352,7 +3371,7 @@ and byte, runs these steps:
If byte is an ASCII byte, then return a code point whose value is byte. -
If byte is in the range 0x81 to 0xFE, inclusive, then set EUC-KR lead to +
If byte is in the range 0x81 to 0xFE, inclusive, then set EUC-KR leading to byte and return continue.
Return error. @@ -3375,11 +3394,11 @@ and byte, runs these steps:
If pointer is null, then return error with codePoint. -
Let lead be pointer / 190 + 0x81. +
Let leading be pointer / 190 + 0x81. -
Let trail be pointer % 190 + 0x41. +
Let trailing be pointer % 190 + 0x41. -
Return two bytes whose values are lead and trail. +
Return two bytes whose values are leading and trailing.
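EUC-KR's pointers are the simplest of this family: 190 trailing values per leading byte, taken contiguously from 0x41 to 0xFE, with no offset switch. Whether a given pointer is actually mapped is left to index EUC-KR. Here is a Python sketch of both directions, with hypothetical function names and no index lookups.

```python
from typing import Optional

# Hypothetical helper names; index EUC-KR lookups are not included.


def euckr_pointer(leading: int, byte: int) -> Optional[int]:
    """Pointer for leading/byte, or None if byte is outside 0x41 to 0xFE."""
    if 0x41 <= byte <= 0xFE:
        return (leading - 0x81) * 190 + (byte - 0x41)
    return None


def euckr_bytes(pointer: int) -> bytes:
    """Two bytes for a pointer in the range 0 to 23939."""
    return bytes([pointer // 190 + 0x81, pointer % 190 + 0x41])
```

For example, bytes 0xB0 0xA1 (가 in EUC-KR) give pointer 47 × 190 + 96 = 9026.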
replacement's decoder has an associated -replacement error returned (initially false). +replacement error returned, which is a boolean, +initially false.
replacement's decoder's handler, given unused and byte, runs these steps: @@ -3422,36 +3442,44 @@ the server and the client. in deployed content. Therefore it is not part of the shared UTF-16 decoder algorithm, but rather the decode algorithm. -
shared UTF-16 decoder has an associated UTF-16 lead byte and -UTF-16 leading surrogate (both initially null), and -is UTF-16BE decoder (initially false). +
shared UTF-16 decoder has an associated: + +
shared UTF-16 decoder's handler, given ioQueue and byte, runs these steps:
If byte is end-of-queue and either - UTF-16 lead byte or UTF-16 leading surrogate is non-null, set - UTF-16 lead byte and UTF-16 leading surrogate to null, and return - error. +
If byte is end-of-queue and either UTF-16 leading byte or + UTF-16 leading surrogate is non-null, then set UTF-16 leading byte and + UTF-16 leading surrogate to null, and return error. -
If byte is end-of-queue and UTF-16 lead byte and +
If byte is end-of-queue and UTF-16 leading byte and UTF-16 leading surrogate are null, then return finished. -
If UTF-16 lead byte is null, then set UTF-16 lead byte to byte and - return continue. +
If UTF-16 leading byte is null, then set UTF-16 leading byte to + byte and return continue.
Let codeUnit be the result of:
(UTF-16 lead byte << 8) + byte. +
(UTF-16 leading byte << 8) + byte.
(byte << 8) + UTF-16 lead byte. +
(byte << 8) + UTF-16 leading byte.
Then set UTF-16 lead byte to null. +
Set UTF-16 leading byte to null.
If UTF-16 leading surrogate is non-null: