diff --git a/encoding.bs b/encoding.bs index b5a539d..b0cce89 100644 --- a/encoding.bs +++ b/encoding.bs @@ -10,6 +10,18 @@ Markup Shorthands: css off Translate IDs: dictdef-textdecoderoptions textdecoderoptions,dictdef-textdecodeoptions textdecodeoptions,index section-index +
+{
+ "ISO8859-1": {
+  "href": "https://www.iso.org/standard/28245.html",
+  "title": "Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1",
+  "publisher": "International Organization for Standardization (ISO)",
+  "status": "Published",
+  "date": "April 1998"
+ }
+}
+
+ @@ -568,7 +580,10 @@ prescribes, as that is necessary to be compatible with deployed content. "windows-1251" "x-cp1251" - windows-1252 + + windows-1252 +

See below for the relationship to historical + "Latin1" and "ASCII" concepts. "ansi_x3.4-1968" "ascii" "cp1252" @@ -732,6 +747,29 @@ part of the ISO 8859 series. In particular, the necessity of the inclusion of ISO-8859-16 is doubtful for the purpose of supporting existing content, but there are no plans to remove these.

+
+

The windows-1252 encoding has various labels, such as + "latin1", "iso-8859-1", and "ascii", which have historically + been confusing for developers. On the web, and in any software that seeks to be web-compatible by + implementing this standard, these are synonyms: "latin1" and "ascii" are + just labels for windows-1252, and any software following this standard will, for example, + decode 0x80 as U+20AC (€) when asked for the "Latin1" or "ASCII" decoding of that byte. + +

Software that does not follow this standard does not always give the same answers. The root of + this is that the original document that specified Latin1 (ISO/IEC 8859-1) did not provide any + mappings for bytes in the inclusive ranges 0x00 to 0x1F or 0x7F to 0x9F. Similarly, the original + documents that specified ASCII (ISO/IEC 646, among others) did not provide any mappings for bytes + in the inclusive range 0x80 to 0xFF. This means different software has chosen different code point + mappings for those bytes when asked to use Latin1 or ASCII encodings. Web browsers and + browser-compatible software have chosen to map those bytes according to windows-1252, which + is a superset of both, and this choice was codified in this standard. Other software throws errors, + or uses isomorphic decoding, or other mappings. [[ISO8859-1]] [[ISO646]] + +

As such, implementers and developers need to be careful whenever they are using libraries which + expose APIs in terms of "Latin1" or "ASCII". It's very possible such libraries will not give + answers in line with this standard, if they have chosen other behaviors for the bytes which were + left undefined in the original specifications. +
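
The divergence described in the note can be observed with Python's built-in codecs. This is only a sketch: Python's standard library does not claim to follow this standard, and the variable names below are illustrative. It decodes the single byte 0x80 three ways and shows that the "latin-1" and "ascii" codecs do not behave like the windows-1252 labels of the same names.

```python
# Illustration only: Python's built-in codecs are not an implementation
# of the Encoding Standard; they are used here to show the divergence.
data = b"\x80"

# Python's "cp1252" codec maps 0x80 to U+20AC, which is also what this
# standard requires for the "latin1" and "ascii" labels.
as_windows_1252 = data.decode("cp1252")

# Python's "latin-1" codec maps every byte to the code point with the
# same numeric value, so 0x80 becomes the C1 control U+0080.
as_latin1 = data.decode("latin-1")

# Python's "ascii" codec rejects any byte above 0x7F.
try:
    data.decode("ascii")
    ascii_result = "decoded"
except UnicodeDecodeError:
    ascii_result = "error"

print(f"cp1252:  U+{ord(as_windows_1252):04X}")
print(f"latin-1: U+{ord(as_latin1):04X}")
print(f"ascii:   {ascii_result}")
```

Code that needs web-compatible results for content labeled "latin1" or "ascii" would therefore have to select a windows-1252 decoder explicitly instead of passing the label through to such a library.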

Output encodings

diff --git a/tools-label-table.py b/tools-label-table.py index b1a6db3..62670c6 100644 --- a/tools-label-table.py +++ b/tools-label-table.py @@ -14,7 +14,14 @@ def create_table(): if label_len > 1: rowspan = " rowspan=" + str(label_len) - table += " \n " + encoding["name"] + "" + if encoding["name"] != "windows-1252": + table += " \n " + encoding["name"] + "" + else: + table += f""" + + {encoding["name"]} +

See below for the relationship to historical + "Latin1" and "ASCII" concepts.""" i = 0 for label in encoding["labels"]: if i > 0: