diff --git a/encoding.bs b/encoding.bs index f34ab30..9309ec5 100644 --- a/encoding.bs +++ b/encoding.bs @@ -29,7 +29,7 @@ Translate IDs: dictdef-textdecoderoptions textdecoderoptions,dictdef-textdecodeo

Preface

The UTF-8 encoding is the most appropriate encoding for interchange of Unicode, the -universal coded character set. Therefore for new protocols and formats, as well as +universal coded character set. Therefore, for new protocols and formats, as well as existing formats deployed in new contexts, this specification requires (and defines) the UTF-8 encoding. @@ -56,17 +56,18 @@ specification does not provide a mechanism for extending any aspect of encodings

There is a set of encoding security issues when the producer and consumer do not agree on the encoding in use, or on the way a given encoding is to be implemented. For instance, an attack was -reported in 2011 where a Shift_JIS lead byte 0x82 was used to “mask” a 0x22 trail byte in a -JSON resource of which an attacker could control some field. The producer did not see the problem -even though this is an illegal byte combination. The consumer decoded it as a single U+FFFD (�) and -therefore changed the overall interpretation as U+0022 (") is an important delimiter. Decoders of -encodings that use multiple bytes for scalar values now require that in case of an illegal byte -combination, a scalar value in the range U+0000 to U+007F, inclusive, cannot be “masked”. For the -aforementioned sequence the output would be U+FFFD U+0022. (As an unfortunate exception to this, the -gb18030 decoder will “mask” up to one such byte at end-of-queue.) +reported in 2011 where a Shift_JIS leading byte 0x82 was used to “mask” a 0x22 trailing byte +in a JSON resource of which an attacker could control some field. The producer did not see the +problem even though this is an illegal byte combination. The consumer decoded it as a single +U+FFFD (�) and therefore changed the overall interpretation as U+0022 (") is an important delimiter. +Decoders of encodings that use multiple bytes for scalar values now require that in case of an +illegal byte combination, a scalar value in the range U+0000 to U+007F, inclusive, cannot be +“masked”. For the aforementioned sequence the output would be U+FFFD U+0022. (As an unfortunate +exception to this, the gb18030 decoder will “mask” up to one such byte at +end-of-queue.)
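The masking rule in the paragraph above can be illustrated with a minimal sketch. It is not the Shift_JIS decoder itself: the function names, the `mask_ascii` switch, and the empty `no_index` lookup (a stand-in for index jis0208) are assumptions made for illustration only.

```python
REPLACEMENT = "\ufffd"

def is_shift_jis_leading(byte):
    # Leading-byte ranges used by the Shift_JIS decoder.
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC

def no_index(leading, trailing):
    # Hypothetical stand-in for an index lookup: nothing ever maps.
    return None

def sketch_decode(data, lookup, mask_ascii):
    """Decode bytes; mask_ascii=True models the old, unsafe behavior."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        i += 1
        if byte <= 0x80:
            out.append(chr(byte))
            continue
        if 0xA1 <= byte <= 0xDF:
            # Single-byte half-width katakana.
            out.append(chr(0xFF61 - 0xA1 + byte))
            continue
        if not is_shift_jis_leading(byte):
            out.append(REPLACEMENT)
            continue
        if i == len(data):
            # Leading byte at end-of-queue.
            out.append(REPLACEMENT)
            break
        trailing = data[i]
        i += 1
        decoded = lookup(byte, trailing)
        if decoded is not None:
            out.append(decoded)
            continue
        if trailing <= 0x7F and not mask_ascii:
            # Restore the ASCII byte so it is decoded on its own.
            i -= 1
        out.append(REPLACEMENT)
    return "".join(out)

payload = b'{"k":"\x82"}'
safe = sketch_decode(payload, no_index, mask_ascii=False)
unsafe = sketch_decode(payload, no_index, mask_ascii=True)
```

With `mask_ascii=False` the closing quote survives as its own code point; with `mask_ascii=True` it is swallowed into the single U+FFFD, which is the behavior the attack relied on.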

This is a larger issue for encodings that map anything that is an ASCII byte to something -that is not an ASCII code point, when there is no lead byte present. These are +that is not an ASCII code point, when there is no leading byte present. These are “ASCII-incompatible” encodings and other than ISO-2022-JP and UTF-16BE/LE, which are unfortunately required due to deployed content, they are not supported. (Investigation is ongoing @@ -901,9 +902,9 @@ specification, excluding index single-byte, which have their own table: index gb18030 visualization index gb18030 BMP coverage This matches the GB18030-2022 standard for code points encoded as two bytes, except for - 0xA3 0xA0 which maps to U+3000 to be compatible with deployed content. This index covers the - CJK Unified Ideographs block of Unicode in its entirety. Entries from that block that are above or - to the left of (the first) U+3000 in the visualization are in the Unicode order. + 0xA3 0xA0 which maps to U+3000 IDEOGRAPHIC SPACE to be compatible with deployed content. This + index covers the CJK Unified Ideographs block of Unicode in its entirety. Entries from that block + that are above or to the left of (the first) U+3000 in the visualization are in the Unicode order.

  • Let offset be the last pointer in index gb18030 ranges that is less than - or equal to pointer and let code point offset be its corresponding code - point. + or equal to pointer and let codePointOffset be its corresponding code point.

  • Return a code point whose value is - code point offset + pointer − offset. + codePointOffset + pointer − offset. @@ -979,11 +979,11 @@ the return value of these steps:

  • If codePoint is U+E7C7, then return pointer 7457.

  • Let offset be the last code point in index gb18030 ranges that is less - than or equal to codePoint and let pointer offset be its corresponding + than or equal to codePoint and let pointerOffset be its corresponding pointer.

  • Return a pointer whose value is - pointer offset + codePoint − offset. + pointerOffset + codePoint − offset. @@ -1017,8 +1017,8 @@ these steps:

    Avoid returning Hong Kong Supplementary Character Set extensions literally.

  • -

    If codePoint is U+2550, U+255E, U+2561, U+256A, U+5341, or U+5345, - return the last pointer corresponding to codePoint in +

    If codePoint is U+2550 (═), U+255E (╞), U+2561 (╡), U+256A (╪), U+5341 (十), or + U+5345 (卅), then return the last pointer corresponding to codePoint in index. @@ -2159,17 +2159,26 @@ that are split between strings. [[!INFRA]] in deployed content. Therefore it is not part of the UTF-8 decoder algorithm, but rather the decode and UTF-8 decode algorithms. -

    UTF-8's decoder has an associated -UTF-8 code point, UTF-8 bytes seen, and -UTF-8 bytes needed (all initially 0), a UTF-8 lower boundary -(initially 0x80), and a UTF-8 upper boundary (initially 0xBF). +

    UTF-8's decoder has an associated: + +

    +
    UTF-8 code point +
    UTF-8 bytes seen +
    UTF-8 bytes needed +
    Each a number, initially 0. + +
    UTF-8 lower boundary +
    A byte, initially 0x80. + +
    UTF-8 upper boundary +
    A byte, initially 0xBF. +

    UTF-8's decoder's handler, given ioQueue and byte, runs these steps:

      -
    1. If byte is end-of-queue and - UTF-8 bytes needed is not 0, set +

    2. If byte is end-of-queue and UTF-8 bytes needed is not 0, then set UTF-8 bytes needed to 0 and return error.

    3. If byte is end-of-queue, then return finished. @@ -2195,11 +2204,9 @@ in deployed content. Therefore it is not part of the UTF-8 decoder algori

      0xE0 to 0xEF
        -
      1. If byte is 0xE0, set - UTF-8 lower boundary to 0xA0. +

      2. If byte is 0xE0, then set UTF-8 lower boundary to 0xA0. -

      3. If byte is 0xED, set - UTF-8 upper boundary to 0x9F. +

      4. If byte is 0xED, then set UTF-8 upper boundary to 0x9F.

      5. Set UTF-8 bytes needed to 2. @@ -2212,11 +2219,9 @@ in deployed content. Therefore it is not part of the UTF-8 decoder algori

        0xF0 to 0xF4
          -
        1. If byte is 0xF0, set - UTF-8 lower boundary to 0x90. +

        2. If byte is 0xF0, then set UTF-8 lower boundary to 0x90. -

        3. If byte is 0xF4, set - UTF-8 upper boundary to 0x8F. +

        4. If byte is 0xF4, then set UTF-8 upper boundary to 0x8F.

        5. Set UTF-8 bytes needed to 3. @@ -2438,8 +2443,14 @@ consumers of content generated with GBK's encoder.
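The UTF-8 decoder state and the boundary adjustments in the steps above can be sketched as follows. This is a non-normative illustration written from the handler steps; the class name, the `"continue"`/`"error"`/`"finished"` string signals, and the list-based queue are assumptions of this sketch, not part of the specification.

```python
END_OF_QUEUE = None  # stand-in for the end-of-queue item

class Utf8DecoderSketch:
    def __init__(self):
        self.code_point = 0
        self.bytes_seen = 0
        self.bytes_needed = 0
        self.lower_boundary = 0x80
        self.upper_boundary = 0xBF

    def handler(self, queue, byte):
        if byte is END_OF_QUEUE:
            if self.bytes_needed != 0:
                self.bytes_needed = 0
                return "error"
            return "finished"
        if self.bytes_needed == 0:
            if byte <= 0x7F:
                return byte
            if 0xC2 <= byte <= 0xDF:
                self.bytes_needed, self.code_point = 1, byte & 0x1F
            elif 0xE0 <= byte <= 0xEF:
                if byte == 0xE0:
                    self.lower_boundary = 0xA0  # rejects overlong forms
                if byte == 0xED:
                    self.upper_boundary = 0x9F  # rejects surrogates
                self.bytes_needed, self.code_point = 2, byte & 0x0F
            elif 0xF0 <= byte <= 0xF4:
                if byte == 0xF0:
                    self.lower_boundary = 0x90  # rejects overlong forms
                if byte == 0xF4:
                    self.upper_boundary = 0x8F  # rejects values above U+10FFFF
                self.bytes_needed, self.code_point = 3, byte & 0x07
            else:
                return "error"
            return "continue"
        if not (self.lower_boundary <= byte <= self.upper_boundary):
            self.code_point = self.bytes_needed = self.bytes_seen = 0
            self.lower_boundary, self.upper_boundary = 0x80, 0xBF
            queue.insert(0, byte)  # restore byte to the queue
            return "error"
        self.lower_boundary, self.upper_boundary = 0x80, 0xBF
        self.code_point = (self.code_point << 6) | (byte & 0x3F)
        self.bytes_seen += 1
        if self.bytes_seen != self.bytes_needed:
            return "continue"
        code_point = self.code_point
        self.code_point = self.bytes_needed = self.bytes_seen = 0
        return code_point

def utf8_decode_sketch(data):
    queue, decoder, out = list(data), Utf8DecoderSketch(), []
    while True:
        byte = queue.pop(0) if queue else END_OF_QUEUE
        result = decoder.handler(queue, byte)
        if result == "finished":
            return "".join(out)
        if result == "continue":
            continue
        out.append("\ufffd" if result == "error" else chr(result))
```

For example, `utf8_decode_sketch(b"\xed\xa0\x80")` produces three U+FFFD code points, because 0xED narrows the upper boundary to 0x9F and the restored 0xA0 is then an error of its own.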

          gb18030 decoder

          -

          gb18030's decoder has an associated gb18030 first, -gb18030 second, and gb18030 third (all initially 0x00). +

          gb18030's decoder has an associated: + +

          +
          gb18030 first +
          gb18030 second +
          gb18030 third +
          Each a byte, initially 0x00. +

          gb18030's decoder's handler, given ioQueue and byte, runs these steps: @@ -2484,8 +2495,8 @@ consumers of content generated with GBK's encoder.

          If gb18030 second is not 0x00:

            -
          1. If byte is in the range 0x81 to 0xFE, inclusive, set - gb18030 third to byte and return continue. +

          2. If byte is in the range 0x81 to 0xFE, inclusive, then set gb18030 third + to byte and return continue.

          3. Restore « gb18030 second, byte » to ioQueue, set gb18030 first and gb18030 second to 0x00, and return error. @@ -2495,17 +2506,20 @@ consumers of content generated with GBK's encoder.

            If gb18030 first is not 0x00:

              -
            1. If byte is in the range 0x30 to 0x39, inclusive, set - gb18030 second to byte and return continue. +

            2. If byte is in the range 0x30 to 0x39, inclusive, then set gb18030 second + to byte and return continue. + +

            3. Let leading be gb18030 first. -

            4. Let lead be gb18030 first, let - pointer be null, and set gb18030 first to 0x00. +

            5. Set gb18030 first to 0x00. + +

            6. Let pointer be null.

            7. Let offset be 0x40 if byte is less than 0x7F; otherwise 0x41. -

            8. If byte is in the range 0x40 to 0x7E, inclusive, or - 0x80 to 0xFE, inclusive, set pointer to - (lead − 0x81) × 190 + (byte − offset). +

            9. If byte is in the range 0x40 to 0x7E, inclusive, or 0x80 to 0xFE, inclusive, + then set pointer to + (leading − 0x81) × 190 + (byte − offset).

            10. Let codePoint be null if pointer is null; otherwise the index code point for pointer in index gb18030. @@ -2533,8 +2547,8 @@ consumers of content generated with GBK's encoder.

              gb18030 encoder

              -

              gb18030's encoder has an associated is GBK -(initially false). +

              gb18030's encoder has an associated is GBK, which is a +boolean, initially false.

              gb18030's encoder's handler, given unused and codePoint, runs these steps: @@ -2548,8 +2562,8 @@ consumers of content generated with GBK's encoder.

            11. If codePoint is U+E5E5, then return error with codePoint. -

              Index gb18030 maps 0xA3 0xA0 to U+3000 rather than U+E5E5 for - compatibility with deployed content. Therefore it cannot roundtrip. +

              Index gb18030 maps 0xA3 0xA0 to U+3000 IDEOGRAPHIC SPACE rather than U+E5E5 + for compatibility with deployed content. Therefore it cannot roundtrip.

            12. If is GBK is true and codePoint is U+20AC (€), then return byte 0x80. @@ -2627,15 +2641,15 @@ consumers of content generated with GBK's encoder.

              If pointer is non-null:

                -
              1. Let lead be pointer / 190 + 0x81. +

              2. Let leading be pointer / 190 + 0x81. -

              3. Let trail be pointer % 190. +

              4. Let trailing be pointer % 190. -

              5. Let offset be 0x40 if trail is less than 0x3F, +

              6. Let offset be 0x40 if trailing is less than 0x3F, otherwise 0x41. -

              7. Return two bytes whose values are lead and - trail + offset. +

              8. Return two bytes whose values are leading and + trailing + offset.

            13. If is GBK is true, then return error with codePoint. @@ -2674,35 +2688,35 @@ consumers of content generated with GBK's encoder.
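The two-byte pointer arithmetic used by the gb18030 decoder and encoder steps above can be sketched as a pair of inverse functions. The function names are hypothetical, and the index lookup that sits between the two in the real algorithms is omitted.

```python
def gb18030_two_byte_pointer(leading, byte):
    """Decoder side: pointer for a leading byte and a second byte, or None."""
    if not (0x40 <= byte <= 0x7E or 0x80 <= byte <= 0xFE):
        return None
    # 0x7F is skipped in the trailing range, hence the two offsets.
    offset = 0x40 if byte < 0x7F else 0x41
    return (leading - 0x81) * 190 + (byte - offset)

def gb18030_two_bytes(pointer):
    """Encoder side: the two bytes for a non-null pointer."""
    leading = pointer // 190 + 0x81
    trailing = pointer % 190
    offset = 0x40 if trailing < 0x3F else 0x41
    return bytes([leading, trailing + offset])
```

As a check against the note on U+3000, the bytes 0xA3 0xA0 give pointer (0xA3 − 0x81) × 190 + (0xA0 − 0x41) = 6555 in this sketch.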

              Big5 decoder

              -

              Big5's decoder has an associated -Big5 lead (initially 0x00). +

              Big5's decoder has an associated Big5 leading, which +is a byte, initially 0x00. -Big5's decoder's handler, given ioQueue -and byte, runs these steps: +

              Big5's decoder's handler, given ioQueue and +byte, runs these steps:

                -
              1. If byte is end-of-queue and Big5 lead is not 0x00, then set - Big5 lead to 0x00 and return error. +

              2. If byte is end-of-queue and Big5 leading is not 0x00, then set + Big5 leading to 0x00 and return error. -

              3. If byte is end-of-queue and Big5 lead is 0x00, then return +

              4. If byte is end-of-queue and Big5 leading is 0x00, then return finished.

              5. -

                If Big5 lead is not 0x00: +

                If Big5 leading is not 0x00:

                  -
                1. Let lead be Big5 lead. +

                2. Let leading be Big5 leading. -

                3. Set Big5 lead to 0x00. +

                4. Set Big5 leading to 0x00.

                5. Let pointer be null.

                6. Let offset be 0x40 if byte is less than 0x7F; otherwise 0x62. -

                7. If byte is in the range 0x40 to 0x7E, inclusive, or - 0xA1 to 0xFE, inclusive, set pointer to - (lead − 0x81) × 157 + (byte − offset). +

                8. If byte is in the range 0x40 to 0x7E, inclusive, or 0xA1 to 0xFE, inclusive, + then set pointer to + (leading − 0x81) × 157 + (byte − offset).

                9. If there is a row in the table below whose first column is pointer, then return @@ -2736,7 +2750,7 @@ and byte, runs these steps: byte.

                10. If byte is in the range 0x81 to 0xFE, inclusive, then set - Big5 lead to byte and return continue. + Big5 leading to byte and return continue.

                11. Return error.

                @@ -2757,15 +2771,15 @@ and byte, runs these steps:
              6. If pointer is null, then return error with codePoint. -

              7. Let lead be pointer / 157 + 0x81. +

              8. Let leading be pointer / 157 + 0x81. -

              9. Let trail be pointer % 157. +

              10. Let trailing be pointer % 157. -

              11. Let offset be 0x40 if trail is less than 0x3F, +

              12. Let offset be 0x40 if trailing is less than 0x3F, otherwise 0x62. -

              13. Return two bytes whose values are lead and - trail + offset. +

              14. Return two bytes whose values are leading and + trailing + offset.

              @@ -2777,42 +2791,48 @@ and byte, runs these steps:

              EUC-JP decoder

              -

              EUC-JP's decoder has an associated -EUC-JP jis0212 (initially false) and -EUC-JP lead (initially 0x00). +

              EUC-JP's decoder has an associated: + +

              +
              EUC-JP jis0212 +
              A boolean, initially false. + +
              EUC-JP leading +
              A byte, initially 0x00. +

              EUC-JP's decoder's handler, given ioQueue and byte, runs these steps:

                -
              1. If byte is end-of-queue and EUC-JP lead is not 0x00, then set - EUC-JP lead to 0x00 and return error. +

              2. If byte is end-of-queue and EUC-JP leading is not 0x00, then set + EUC-JP leading to 0x00 and return error. -

              3. If byte is end-of-queue and EUC-JP lead is 0x00, then return +

              4. If byte is end-of-queue and EUC-JP leading is 0x00, then return finished. -

              5. If EUC-JP lead is 0x8E and byte is - in the range 0xA1 to 0xDF, inclusive, set EUC-JP lead to 0x00 and return - a code point whose value is 0xFF61 − 0xA1 + byte. +

              6. If EUC-JP leading is 0x8E and byte is in the range 0xA1 to 0xDF, + inclusive, then set EUC-JP leading to 0x00 and return a code point whose value is + 0xFF61 − 0xA1 + byte. -

              7. If EUC-JP lead is 0x8F and byte is in the range - 0xA1 to 0xFE, inclusive, set EUC-JP jis0212 to true, set - EUC-JP lead to byte, and return continue. +

              8. If EUC-JP leading is 0x8F and byte is in the range 0xA1 to 0xFE, + inclusive, then set EUC-JP jis0212 to true, set EUC-JP leading to byte, + and return continue.

              9. -

                If EUC-JP lead is not 0x00: +

                If EUC-JP leading is not 0x00:

                  -
                1. Let lead be EUC-JP lead. +

                2. Let leading be EUC-JP leading. -

                3. Set EUC-JP lead to 0x00. +

                4. Set EUC-JP leading to 0x00.

                5. Let codePoint be null. -

                6. If lead and byte are both in the range 0xA1 to 0xFE, inclusive, then +

                7. If leading and byte are both in the range 0xA1 to 0xFE, inclusive, then set codePoint to the index code point for - (lead − 0xA1) × 94 + byte − 0xA1 + (leading − 0xA1) × 94 + byte − 0xA1 in index jis0208 if EUC-JP jis0212 is false and in index jis0212 otherwise. @@ -2831,7 +2851,7 @@ and byte, runs these steps: byte.

                8. If byte is 0x8E, 0x8F, or in the range 0xA1 to 0xFE, inclusive, then set - EUC-JP lead to byte and return continue. + EUC-JP leading to byte and return continue.

                9. Return error.

                @@ -2852,8 +2872,8 @@ and byte, runs these steps:
              10. If codePoint is U+203E (‾), then return byte 0x7E. -

              11. If codePoint is in the range U+FF61 to U+FF9F, inclusive, then return two bytes - whose values are 0x8E and codePoint − 0xFF61 + 0xA1. +

              12. If codePoint is in the range U+FF61 (。) to U+FF9F (゚), inclusive, then return two + bytes whose values are 0x8E and codePoint − 0xFF61 + 0xA1.

              13. If codePoint is U+2212 (−), then set it to U+FF0D (-). @@ -2866,11 +2886,11 @@ and byte, runs these steps:

              14. If pointer is null, then return error with codePoint. -

              15. Let lead be pointer / 94 + 0xA1. +

              16. Let leading be pointer / 94 + 0xA1. -

              17. Let trail be pointer % 94 + 0xA1. +

              18. Let trailing be pointer % 94 + 0xA1. -

              19. Return two bytes whose values are lead and trail. +

              20. Return two bytes whose values are leading and trailing.
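The EUC-JP arithmetic above (the 0x8E half-width katakana mapping and the 94-column pointer layout) can be sketched as follows. The helper names are hypothetical and the index lookups are left out.

```python
def euc_jp_katakana_code_point(byte):
    """Decoder side: code point for a byte following a 0x8E leading byte."""
    if not 0xA1 <= byte <= 0xDF:
        raise ValueError("byte outside the half-width katakana range")
    return 0xFF61 - 0xA1 + byte

def euc_jp_katakana_bytes(code_point):
    """Encoder side: two bytes for a code point in U+FF61 to U+FF9F."""
    if not 0xFF61 <= code_point <= 0xFF9F:
        raise ValueError("code point outside U+FF61 to U+FF9F")
    return bytes([0x8E, code_point - 0xFF61 + 0xA1])

def euc_jp_pointer(leading, byte):
    """Decoder side: pointer when both bytes are in 0xA1 to 0xFE, else None."""
    if 0xA1 <= leading <= 0xFE and 0xA1 <= byte <= 0xFE:
        return (leading - 0xA1) * 94 + byte - 0xA1
    return None

def euc_jp_two_bytes(pointer):
    """Encoder side: leading and trailing bytes for a non-null pointer."""
    leading = pointer // 94 + 0xA1
    trailing = pointer % 94 + 0xA1
    return bytes([leading, trailing])
```

The two pointer functions are inverses over the full 0xA1 to 0xFE square, which is why the encoder needs no offset table here, unlike gb18030 or Big5.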

              @@ -2883,13 +2903,21 @@ and byte, runs these steps:

              ISO-2022-JP decoder

              -

              ISO-2022-JP's decoder has an associated -ISO-2022-JP decoder state (initially -ASCII), -ISO-2022-JP decoder output state (initially -ASCII), -ISO-2022-JP lead (initially 0x00), and -ISO-2022-JP output (initially false). +

              ISO-2022-JP's decoder has an associated: + +

              +
              ISO-2022-JP decoder state +
              A state, initially ASCII. + +
              ISO-2022-JP decoder output state +
              A state, initially ASCII. + +
              ISO-2022-JP leading +
              A byte, initially 0x00. + +
              ISO-2022-JP output +
              A boolean, initially false. +

              ISO-2022-JP's decoder's handler, given ioQueue and byte, runs these steps, switching on @@ -2965,7 +2993,7 @@ and byte, runs these steps:

              Set ISO-2022-JP output to false and return error. -

              Lead byte +
              Leading byte

              Based on byte:

              @@ -2975,11 +3003,9 @@ and byte, runs these steps: continue.
              0x21 to 0x7E -

              Set ISO-2022-JP output to false, - ISO-2022-JP lead to byte, - ISO-2022-JP decoder state to - trail byte, and return - continue. +

              Set ISO-2022-JP output to false, ISO-2022-JP leading to byte, + ISO-2022-JP decoder state to trailing byte, + and return continue.

              end-of-queue

              Return finished. @@ -2988,24 +3014,23 @@ and byte, runs these steps:

              Set ISO-2022-JP output to false and return error.

              -
              Trail byte +
              Trailing byte

              Based on byte:

              0x1B

              Set ISO-2022-JP decoder state to - escape start and return - error. - + escape start and return error. +

              0x21 to 0x7E
              1. Set the ISO-2022-JP decoder state to - lead byte. + leading byte.

              2. Let pointer be - (ISO-2022-JP lead − 0x21) × 94 + byte − 0x21. + (ISO-2022-JP leading − 0x21) × 94 + byte − 0x21.

              3. Let codePoint be the index code point for pointer in index jis0208. @@ -3017,52 +3042,48 @@ and byte, runs these steps:

                end-of-queue

                Set the ISO-2022-JP decoder state to - lead byte and return error. + leading byte and return error.

                Otherwise

                Set ISO-2022-JP decoder state to - lead byte and return + leading byte and return error. - +

              Escape start
                -
              1. If byte is either 0x24 or 0x28, set - ISO-2022-JP lead to byte, - ISO-2022-JP decoder state to - escape, and return - continue. +

              2. If byte is either 0x24 or 0x28, then set + ISO-2022-JP leading to byte, ISO-2022-JP decoder state to + escape, and return continue.

              3. If byte is not end-of-queue, then restore byte to ioQueue. -

              4. Set ISO-2022-JP output to false, - ISO-2022-JP decoder state to +

              5. Set ISO-2022-JP output to false, ISO-2022-JP decoder state to ISO-2022-JP decoder output state, and return error.

              Escape
                -
              1. Let lead be ISO-2022-JP lead and set - ISO-2022-JP lead to 0x00. +

              2. Let leading be ISO-2022-JP leading and set + ISO-2022-JP leading to 0x00.

              3. Let state be null. -

              4. If lead is 0x28 and byte is 0x42, set +

              5. If leading is 0x28 and byte is 0x42, then set state to ASCII. -

              6. If lead is 0x28 and byte is 0x4A, set +

              7. If leading is 0x28 and byte is 0x4A, then set state to Roman. -

              8. If lead is 0x28 and byte is 0x49, set +

              9. If leading is 0x28 and byte is 0x49, then set state to katakana. -

              10. If lead is 0x24 and byte is either - 0x40 or 0x42, set state to - lead byte. +

              11. If leading is 0x24 and byte is either 0x40 or 0x42, + then set state to leading byte.

              12. If state is non-null: @@ -3079,8 +3100,8 @@ and byte, runs these steps: error otherwise.

              -
            14. If byte is end-of-queue, then restore lead to - ioQueue; otherwise, restore « lead, byte » to +

            15. If byte is end-of-queue, then restore leading to + ioQueue; otherwise, restore « leading, byte » to ioQueue.

            16. Set ISO-2022-JP output to false, @@ -3097,16 +3118,16 @@ and byte, runs these steps: multiple outputs can result in an error when run through the corresponding decoder. -

              Encoding U+00A5 gives 0x1B 0x28 0x4A 0x5C - 0x1B 0x28 0x42. Doing that twice, concatenating the results, and then decoding yields U+00A5 U+FFFD - U+00A5. +

              Encoding U+00A5 (¥) gives 0x1B 0x28 0x4A + 0x5C 0x1B 0x28 0x42. Doing that twice, concatenating the results, and then decoding yields U+00A5 + U+FFFD U+00A5.
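The concatenation example above can be reproduced with a deliberately reduced sketch of the decoder. It only recognizes the ASCII and Roman escape sequences; the katakana and jis0208 states are omitted, so this is not a conforming ISO-2022-JP decoder. The class name, string signals, and list-based queue are assumptions of this sketch.

```python
END_OF_QUEUE = None  # stand-in for the end-of-queue item

class MiniIso2022JpDecoder:
    def __init__(self):
        self.state = "ASCII"
        self.output_state = "ASCII"
        self.leading = 0x00
        self.output = False

    def handler(self, queue, byte):
        if self.state in ("ASCII", "Roman"):
            if byte == 0x1B:
                self.state = "escape start"
                return "continue"
            if byte is END_OF_QUEUE:
                return "finished"
            self.output = False
            if byte > 0x7F or byte in (0x0E, 0x0F):
                return "error"
            if self.state == "Roman" and byte == 0x5C:
                return 0x00A5
            if self.state == "Roman" and byte == 0x7E:
                return 0x203E
            return byte
        if self.state == "escape start":
            if byte in (0x24, 0x28):
                self.leading = byte
                self.state = "escape"
                return "continue"
            if byte is not END_OF_QUEUE:
                queue.insert(0, byte)  # restore byte
            self.output = False
            self.state = self.output_state
            return "error"
        # Escape state (only two of the escape sequences are modeled).
        leading, self.leading = self.leading, 0x00
        known = {(0x28, 0x42): "ASCII", (0x28, 0x4A): "Roman"}
        new_state = known.get((leading, byte))
        if new_state is not None:
            self.state = self.output_state = new_state
            had_output, self.output = self.output, True
            # A second escape sequence with no output in between is an error.
            return "error" if had_output else "continue"
        if byte is END_OF_QUEUE:
            queue.insert(0, leading)
        else:
            queue[:0] = [leading, byte]
        self.output = False
        self.state = self.output_state
        return "error"

def mini_decode(data):
    queue, decoder, out = list(data), MiniIso2022JpDecoder(), []
    while True:
        byte = queue.pop(0) if queue else END_OF_QUEUE
        result = decoder.handler(queue, byte)
        if result == "finished":
            return "".join(out)
        if result == "continue":
            continue
        out.append("\ufffd" if result == "error" else chr(result))

YEN = bytes([0x1B, 0x28, 0x4A, 0x5C, 0x1B, 0x28, 0x42])
```

In this sketch, `mini_decode(YEN)` yields U+00A5 alone, while `mini_decode(YEN + YEN)` yields U+00A5 U+FFFD U+00A5: the switch back to ASCII at the end of the first copy is immediately followed by another escape sequence, and the output flag turns that second escape into an error.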

              ISO-2022-JP's encoder has an associated ISO-2022-JP encoder state which is ASCII, Roman, or -jis0208 (initially -ASCII). +jis0208, initially +ASCII.

              ISO-2022-JP's encoder's handler, given ioQueue and codePoint, runs these steps: @@ -3157,8 +3178,8 @@ and byte, runs these steps:

            17. If codePoint is U+2212 (−), then set it to U+FF0D (-). -

            18. If codePoint is in the range U+FF61 to U+FF9F, inclusive, then set it to the - index code point for codePoint − 0xFF61 in +

            19. If codePoint is in the range U+FF61 (。) to U+FF9F (゚), inclusive, then set it to + the index code point for codePoint − 0xFF61 in index ISO-2022-JP katakana.

            20. @@ -3180,18 +3201,16 @@ and byte, runs these steps:
            21. Return error with codePoint.

            -
          4. If ISO-2022-JP encoder state is not - jis0208, - restore codePoint to - ioQueue, set ISO-2022-JP encoder state to - jis0208, and return three bytes - 0x1B 0x24 0x42. +

          5. If ISO-2022-JP encoder state is not jis0208, + then restore codePoint to ioQueue, set + ISO-2022-JP encoder state to jis0208, and return + three bytes 0x1B 0x24 0x42. -

          6. Let lead be pointer / 94 + 0x21. +

          7. Let leading be pointer / 94 + 0x21. -

          8. Let trail be pointer % 94 + 0x21. +

          9. Let trailing be pointer % 94 + 0x21. -

          10. Return two bytes whose values are lead and trail. +

          11. Return two bytes whose values are leading and trailing.

          @@ -3200,35 +3219,36 @@ and byte, runs these steps:

          Shift_JIS decoder

          Shift_JIS's decoder has an associated -Shift_JIS lead (initially 0x00). +Shift_JIS leading, which is a byte, initially 0x00. -

          Shift_JIS's decoder's handler, given -ioQueue and byte, runs these steps: +

          Shift_JIS's decoder's handler, given ioQueue and +byte, runs these steps:

            -
          1. If byte is end-of-queue and Shift_JIS lead is not 0x00, then set - Shift_JIS lead to 0x00 and return error. +

          2. If byte is end-of-queue and Shift_JIS leading is not 0x00, then set + Shift_JIS leading to 0x00 and return error. -

          3. If byte is end-of-queue and Shift_JIS lead is 0x00, then return +

          4. If byte is end-of-queue and Shift_JIS leading is 0x00, then return finished.

          5. -

            If Shift_JIS lead is not 0x00: +

            If Shift_JIS leading is not 0x00:

              -
            1. Let lead be Shift_JIS lead. +

            2. Let leading be Shift_JIS leading. -

            3. Set Shift_JIS lead to 0x00. +

            4. Set Shift_JIS leading to 0x00.

            5. Let pointer be null.

            6. Let offset be 0x40 if byte is less than 0x7F; otherwise 0x41. -

            7. Let lead offset be 0x81 if lead is less than 0xA0; otherwise 0xC1. +

            8. Let leadingOffset be 0x81 if leading is less than 0xA0; otherwise + 0xC1.

            9. If byte is in the range 0x40 to 0x7E, inclusive, or 0x80 to 0xFC, inclusive, then set pointer to - (lead − lead offset) × 188 + byte − offset. + (leading − leadingOffset) × 188 + byte − offset.

            10. If pointer is in the range 8836 to 10715, inclusive, then return a code point @@ -3259,7 +3279,7 @@ and byte, runs these steps:

            11. If byte is in the range 0x81 to 0x9F, inclusive, or 0xE0 to 0xFC, inclusive, then - set Shift_JIS lead to byte and return continue. + set Shift_JIS leading to byte and return continue.

            12. Return error.

            @@ -3280,8 +3300,8 @@ and byte, runs these steps:
          6. If codePoint is U+203E (‾), then return byte 0x7E. -

          7. If codePoint is in the range U+FF61 to U+FF9F, inclusive, then return a byte - whose value is codePoint − 0xFF61 + 0xA1. +

          8. If codePoint is in the range U+FF61 (。) to U+FF9F (゚), inclusive, then return a + byte whose value is codePoint − 0xFF61 + 0xA1.

          9. If codePoint is U+2212 (−), then set it to U+FF0D (-). @@ -3289,17 +3309,17 @@ and byte, runs these steps:

          10. If pointer is null, then return error with codePoint. -

          11. Let lead be pointer / 188. +

          12. Let leading be pointer / 188. -

          13. Let lead offset be 0x81 if lead is less than 0x1F; otherwise 0xC1. +

          14. Let leadingOffset be 0x81 if leading is less than 0x1F; otherwise 0xC1. -

          15. Let trail be pointer % 188. +

          16. Let trailing be pointer % 188. -

          17. Let offset be 0x40 if trail is less than 0x3F; otherwise 0x41. +

          18. Let offset be 0x40 if trailing is less than 0x3F; otherwise 0x41. -

          19. Return two bytes whose values are lead + lead offset and - trail + offset. +

          20. Return two bytes whose values are leading + leadingOffset and + trailing + offset.
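The Shift_JIS pointer arithmetic differs from the other legacy encodings in that the leading byte also needs an offset, because its valid range is split in two. A minimal sketch of both directions, with hypothetical function names and the index lookup omitted:

```python
def shift_jis_pointer(leading, byte):
    """Decoder side: pointer for a stored leading byte and the next byte."""
    if not (0x40 <= byte <= 0x7E or 0x80 <= byte <= 0xFC):
        return None
    offset = 0x40 if byte < 0x7F else 0x41
    leading_offset = 0x81 if leading < 0xA0 else 0xC1
    return (leading - leading_offset) * 188 + byte - offset

def shift_jis_two_bytes(pointer):
    """Encoder side: the two bytes for a non-null pointer."""
    leading = pointer // 188
    leading_offset = 0x81 if leading < 0x1F else 0xC1
    trailing = pointer % 188
    offset = 0x40 if trailing < 0x3F else 0x41
    return bytes([leading + leading_offset, trailing + offset])
```

The threshold 0x1F on the encoder side mirrors 0xA0 on the decoder side: leading bytes 0x81 to 0x9F cover rows 0 to 0x1E, and row 0x1F is the first one served by the 0xE0 to 0xFC range.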

          @@ -3310,32 +3330,31 @@ and byte, runs these steps:

          EUC-KR decoder

          -

          EUC-KR's decoder has an associated -EUC-KR lead (initially 0x00). +

          EUC-KR's decoder has an associated EUC-KR leading, +which is a byte, initially 0x00. -

          EUC-KR's decoder's handler, given -ioQueue and byte, runs these steps: +

          EUC-KR's decoder's handler, given ioQueue and +byte, runs these steps:

            -
          1. If byte is end-of-queue and EUC-KR lead is not 0x00, then set - EUC-KR lead to 0x00 and return error. +

          2. If byte is end-of-queue and EUC-KR leading is not 0x00, then set + EUC-KR leading to 0x00 and return error. -

          3. If byte is end-of-queue and EUC-KR lead is 0x00, then return +

          4. If byte is end-of-queue and EUC-KR leading is 0x00, then return finished.

          5. -

            If EUC-KR lead is not 0x00: +

            If EUC-KR leading is not 0x00:

              -
            1. Let lead be EUC-KR lead. +

            2. Let leading be EUC-KR leading. -

            3. Set EUC-KR lead to 0x00. +

            4. Set EUC-KR leading to 0x00.

            5. Let pointer be null. -

            6. If byte is in the range 0x41 to 0xFE, inclusive, set - pointer to - (lead − 0x81) × 190 + (byte − 0x41). +

            7. If byte is in the range 0x41 to 0xFE, inclusive, then set pointer + to (leading − 0x81) × 190 + (byte − 0x41).

            8. Let codePoint be null if pointer is null; otherwise the index code point for pointer in index EUC-KR. @@ -3352,7 +3371,7 @@ and byte, runs these steps:

            9. If byte is an ASCII byte, then return a code point whose value is byte. -

            10. If byte is in the range 0x81 to 0xFE, inclusive, then set EUC-KR lead to +

            11. If byte is in the range 0x81 to 0xFE, inclusive, then set EUC-KR leading to byte and return continue.

            12. Return error. @@ -3375,11 +3394,11 @@ and byte, runs these steps:

            13. If pointer is null, then return error with codePoint. -

            14. Let lead be pointer / 190 + 0x81. +

            15. Let leading be pointer / 190 + 0x81. -

            16. Let trail be pointer % 190 + 0x41. +

            17. Let trailing be pointer % 190 + 0x41. -

            18. Return two bytes whose values are lead and trail. +

            19. Return two bytes whose values are leading and trailing.

            @@ -3396,7 +3415,8 @@ the server and the client.

            replacement decoder

            replacement's decoder has an associated -replacement error returned (initially false). +replacement error returned, which is a boolean, +initially false.

            replacement's decoder's handler, given unused and byte, runs these steps: @@ -3422,36 +3442,44 @@ the server and the client. in deployed content. Therefore it is not part of the shared UTF-16 decoder algorithm, but rather the decode algorithm. -

            shared UTF-16 decoder has an associated UTF-16 lead byte and -UTF-16 leading surrogate (both initially null), and -is UTF-16BE decoder (initially false). +

            shared UTF-16 decoder has an associated: + +

            +
            UTF-16 leading byte +
            Null or a byte, initially null. + +
            UTF-16 leading surrogate +
            Null or a leading surrogate, initially null. + +
            is UTF-16BE decoder +
            A boolean, initially false. +

            shared UTF-16 decoder's handler, given ioQueue and byte, runs these steps:

              -
            1. If byte is end-of-queue and either - UTF-16 lead byte or UTF-16 leading surrogate is non-null, set - UTF-16 lead byte and UTF-16 leading surrogate to null, and return - error. +

            2. If byte is end-of-queue and either UTF-16 leading byte or + UTF-16 leading surrogate is non-null, then set UTF-16 leading byte and + UTF-16 leading surrogate to null, and return error. -

            3. If byte is end-of-queue and UTF-16 lead byte and +

            4. If byte is end-of-queue and UTF-16 leading byte and UTF-16 leading surrogate are null, then return finished. -

            5. If UTF-16 lead byte is null, then set UTF-16 lead byte to byte and - return continue. +

            6. If UTF-16 leading byte is null, then set UTF-16 leading byte to + byte and return continue.

            7. Let codeUnit be the result of:

              is UTF-16BE decoder is true -

              (UTF-16 lead byte << 8) + byte. +

              (UTF-16 leading byte << 8) + byte.

              is UTF-16BE decoder is false -

              (byte << 8) + UTF-16 lead byte. +

              (byte << 8) + UTF-16 leading byte.

              -

              Then set UTF-16 lead byte to null. +

            8. Set UTF-16 leading byte to null.

            9. If UTF-16 leading surrogate is non-null: