
Commit 7d631f2

committed
version bump 1.3.6: codepage cmd fixes
1 parent 67d9bed commit 7d631f2

15 files changed, +110 -37 lines changed

NOTES.md

+67
@@ -0,0 +1,67 @@
+# Verifying Codepages
+
+After installing every language pack in Windows 7, many codepages are available
+via the .NET System.Text.Encoding class. The MakeEncoding.cs source included with
+the project generates a full manifest that can be parsed into a mapping table.
+
+The included `nls2tbl` script extracts data from the `C_#####.NLS` files found
+in the system or system32 directories of various versions of Windows.
+
+Many codepages are also available in various iconv libraries, but there are some
+differences. For example, some codepages break ASCII by using the Arabic percent
+sign ٪ (U+066A) in place of %, while other libraries assume they preserve ASCII.
+
+# Missing Codepages
+
+The following codepages are not implemented. Normative references may not be
+available in all cases. Furthermore, other software packages are known to hack
+certain codepages (for example, Mozilla treats ASMO-708 as an alias of Arabic
+ISO-8859-6 when in fact there are many differences), so all implementations
+*should* be clean-room when possible.
+
+- 709 Arabic (ASMO-449+, BCON V4)
+- 710 Arabic - Transparent Arabic
+- 50229 ISO 2022 Traditional Chinese
+- 50930 EBCDIC Japanese (Katakana) Extended
+- 50931 EBCDIC US-Canada and Japanese
+- 50933 EBCDIC Korean Extended and Korean
+- 50935 EBCDIC Simplified Chinese Extended and Simplified Chinese
+- 50936 EBCDIC Simplified Chinese
+- 50937 EBCDIC US-Canada and Traditional Chinese
+- 50939 EBCDIC Japanese (Latin) Extended and Japanese
+- 51950 EUC Traditional Chinese
+
+Each version of Windows adds a few and removes a few codepages, so the missing
+codepages most likely reside in a specific version that we may not be able to
+obtain. These notes document our progress.
+
+## Arabic codepages 709-710
+
+These codepages are not available in the Arabic version of Windows XP. They may
+be available in the Arabic versions of MS-DOS or Windows 3.1/95/98/2000.
+
+The "Code Page and Text Layout Conversion Utility" CONVTEXT.EXE ships with some
+versions of Office. It can convert from the various codepages to ANSI.
+
+To produce a UTF-16LE (1200) manifest, convert from the relevant codepage to ANSI
+and then convert from ANSI to "Unicode using Arabic ANSI Code Page".
+
+Since the tool cannot convert directly to Unicode, CONVTEXT is useful only for
+the characters that exist in both the relevant codepage and codepage 1256.
+There are various non-Microsoft sources that claim to document codepages 709
+and 710, but there is no way to verify the claim.
+
+## EUC Traditional Chinese 51950
+
+The raw NLS file C_51950.NLS supposedly exists, although there is no way for a US
+version of Windows to obtain the file. As with the Arabic codepages, the manifest
+is most likely only available in Chinese versions of Windows 95/98/2000.
+
+## ISO 2022 Traditional Chinese 50229
+
+Some sources claim 50229 is ISO-2022-TW and others claim it is ISO-2022-CN.
+
+## EBCDIC Codepages 50930-50939
+
+WHATWG claims that the supposedly EBCDIC codepages are really ASCII hybrids (even
+though the Microsoft names suggest they should be the same as the originals).
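
The iconv discrepancy mentioned in these notes (the Arabic percent sign in place of ASCII %) is the kind of difference a byte-by-byte table comparison surfaces. A minimal JavaScript sketch, assuming both sources (for example a MakeEncoding.cs manifest and an iconv dump) have already been parsed into 256-entry arrays of code points; `diffTables`, `asciiTable` and the sample tables are hypothetical and do not reflect the actual manifest or `nls2tbl` output formats:

```javascript
// Hypothetical helper: compare two single-byte mapping tables.
// Each table maps a byte (0-255) to a Unicode code point, or undefined if unmapped.
function diffTables(a, b) {
  const diffs = [];
  for (let byte = 0; byte < 256; ++byte) {
    if (a[byte] !== b[byte]) diffs.push({ byte, a: a[byte], b: b[byte] });
  }
  return diffs;
}

// Illustrative table only: bytes 0x00-0x7F map to ASCII, everything else unmapped.
function asciiTable() {
  const t = new Array(256);
  for (let i = 0; i < 0x80; ++i) t[i] = i;
  return t;
}

// Hypothetical sources: one maps 0x25 to ARABIC PERCENT SIGN, the other keeps '%'.
const fromManifest = asciiTable();
fromManifest[0x25] = 0x066A;
const fromIconv = asciiTable();

// Format a code point as U+XXXX for the report.
const hex = (cp) =>
  cp === undefined ? "(unmapped)" : "U+" + cp.toString(16).toUpperCase().padStart(4, "0");

for (const d of diffTables(fromManifest, fromIconv)) {
  const byteHex = "0x" + d.byte.toString(16).toUpperCase().padStart(2, "0");
  console.log(byteHex + ": " + hex(d.a) + " vs " + hex(d.b));
}
```

Running the script prints `0x25: U+066A vs U+0025`, the single byte on which the two illustrative tables disagree.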

README.md

+3-19
@@ -245,25 +245,9 @@ Note that MakeEncoding.cs deviates from unicode.org for some codepages. In the
 case of direct conflicts, unicode.org takes precedence. In cases where the
 unicode.org listing does not prescribe a value, MakeEncoding.cs value is used.

-## Missing Codepages
-
-The following codepages are not implemented. Normative references may not be
-available in all cases. Furthermore, other software packages are known to hack
-certain codepages (for example, Mozilla treats ASMO-708 as an alias of Arabic
-ISO-8869-6 when in fact there are many differences), so all implementations
-*should* be cleanroom when possible.
-
-- 709 Arabic (ASMO-449+, BCON V4)
-- 710 Arabic - Transparent Arabic
-- 50229 ISO 2022 Traditional Chinese
-- 50930 EBCDIC Japanese (Katakana) Extended
-- 50931 EBCDIC US-Canada and Japanese
-- 50933 EBCDIC Korean Extended and Korean
-- 50935 EBCDIC Simplified Chinese Extended and Simplified Chinese
-- 50936 EBCDIC Simplified Chinese
-- 50937 EBCDIC US-Canada and Traditional Chinese
-- 50939 EBCDIC Japanese (Latin) Extended and Japanese
-- 51950 EUC Traditional Chinese
+NLS refers to the National Language Support files supplied in various versions of
+Windows. In older versions of Windows (e.g. Windows 98) these files followed the
+pattern `CP_#.NLS`, but newer versions use the pattern `C_#.NLS`.

 ## Sources

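Because the README notes two naming patterns, `CP_#.NLS` on older Windows and `C_#.NLS` on newer Windows, a directory scan can accept both with one regular expression. A hedged sketch: `codepageFromNls` and `nlsCandidates` are hypothetical helpers, not part of `nls2tbl` or the codepage library, and matching case-insensitively is an assumption based on Windows file names being case-insensitive.

```javascript
// Hypothetical helpers for the two NLS naming patterns:
// older Windows (e.g. Windows 98) uses CP_<cp>.NLS, newer Windows uses C_<cp>.NLS.
const NLS_NAME = /^CP?_(\d+)\.NLS$/i;

// Extract the codepage number from an NLS file name, or null if it does not match.
function codepageFromNls(name) {
  const m = NLS_NAME.exec(name);
  return m ? Number(m[1]) : null;
}

// Candidate file names to probe for a codepage, newer pattern first.
function nlsCandidates(cp) {
  return ["C_" + cp + ".NLS", "CP_" + cp + ".NLS"];
}

console.log(codepageFromNls("C_1252.NLS")); // 1252
console.log(codepageFromNls("CP_437.NLS")); // 437
console.log(codepageFromNls("LOCALE.NLS")); // null
console.log(nlsCandidates(51950));          // [ 'C_51950.NLS', 'CP_51950.NLS' ]
```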
bin/codepage.njs

+28-6
@@ -10,6 +10,7 @@ program
 .option('-t, --to-code <code>', 'codepage of output (default 65001 utf8)')
 .option('-o, --output <file>', 'output file (<file>.<to> if specified)')
 .option('-B, --bom', 'write BOM (for unicode codepages)')
+.option('-F, --force', 'force writing to stdout for non-utf8 codepages')
 .option('-l, --list', 'List supported codepages');

 program.on('--help', function() {
@@ -25,8 +26,8 @@ if(program.list) {
 process.exit();
 }

-var fr = program.fromCode || 65001;
-var to = program.toCode || 65001;
+var fr = +program.fromCode || 65001;
+var to = +program.toCode || 65001;
 var f = program.args[0];
 var o = program.output;

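The unary `+` added in this hunk matters because commander stores option values such as `--to-code 1252` as strings, while the rewritten output logic in the next hunk compares `to === 65001` strictly. A minimal sketch of the coercion, with `parseCodepage` as a hypothetical name for the committed expression:

```javascript
// Hypothetical wrapper around the committed `+program.toCode || 65001` expression.
// Option values arrive as strings, or undefined when the flag is omitted.
function parseCodepage(value) {
  // Unary + converts to a number; NaN (non-numeric or missing) and 0 fall back to 65001.
  return +value || 65001;
}

console.log(parseCodepage("1252"));            // 1252
console.log(parseCodepage(undefined));         // 65001
console.log("65001" === 65001);                // false: a raw string never matches a strict check
console.log(parseCodepage("65001") === 65001); // true
console.log(parseCodepage("utf8"));            // 65001
```

One side effect of the `||` default is that a non-numeric or zero value silently falls back to 65001 instead of being reported as an error.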
@@ -53,10 +54,31 @@ function process_text(text) {
 65001: new Buffer([0xEF, 0xBB, 0xBF])
 }

-if(!program.toCode && !o) console.log(dec.toString('utf8'));
-else if(!program.bom || !bom[fr]) fs.writeFileSync(o || (f + "." + to), codepage.utils.encode(to, dec));
+var mybom = (program.bom && bom[fr] ? bom[fr] : "");
+var out = to === 65001 ? dec.toString('utf8') : codepage.utils.encode(to, dec);
+
+/* if output file is specified */
+if(o) writefile(o, out, mybom);
+/* utf8 -> print to stdout */
+else if(to === 65001) logit(out, mybom);
+/* stdout piped to process -> print */
+else if(!process.stdout.isTTY) logit(out, mybom);
+/* forced */
+else if(program.force) logit(out, mybom);
+/* input file specified -> write to file */
+else if(f !== "-") writefile(f + "." + to, out, mybom);
 else {
-fs.writeFileSync(o || (f + "." + to), bom[fr]);
-fs.appendFileSync(o || (f + "." + to), codepage.utils.encode(to, dec));
+console.error('codepage: use force (-F, --force) to print ' + to + ' codes');
+process.exit(14);
 }
 }
+
+function logit(out, bom) {
+process.stdout.write(bom);
+process.stdout.write(out);
+}
+
+function writefile(o, out, bom) {
+fs.writeFileSync(o, bom);
+fs.appendFileSync(o, out);
+}
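
The new branch chain decides where the converted text goes: an explicit `-o` file, stdout for UTF-8, stdout when piped, stdout when forced, `<file>.<to>` when reading from a named file, and otherwise exit code 14. The following sketch restates that order as a pure function so the routing can be checked without a terminal; `chooseOutput` and its option names are hypothetical, and BOM handling is left out.

```javascript
// Hypothetical pure-function restatement of the committed output routing.
// Returns where the converted text would go; it does not touch the filesystem.
function chooseOutput({ output, to, isTTY, force, input }) {
  if (output) return { action: "file", target: output };  // -o given
  if (to === 65001) return { action: "stdout" };          // UTF-8 is safe to print
  if (!isTTY) return { action: "stdout" };                // piped: let the consumer decide
  if (force) return { action: "stdout" };                 // -F overrides the terminal guard
  if (input !== "-") return { action: "file", target: input + "." + to }; // e.g. foo.txt.1252
  return { action: "error", exitCode: 14 };               // input "-" (presumably stdin) on a terminal
}

console.log(chooseOutput({ to: 1252, isTTY: true, input: "foo.txt" }));
// { action: 'file', target: 'foo.txt.1252' }
console.log(chooseOutput({ to: 1252, isTTY: true, input: "-" }));
// { action: 'error', exitCode: 14 }
```

One detail worth checking in the committed code: `mybom` is looked up with `bom[fr]`, the input codepage, although `-B` is described as writing a BOM for the output; `bom[to]` may be what was intended.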

codepage.md

+1-1
@@ -719,7 +719,7 @@ describe('failures', function() {
 ```json>package.json
 {
 "name": "codepage",
-"version": "1.3.5",
+"version": "1.3.6",
 "author": "SheetJS",
 "description": "pure-JS library to handle codepages",
 "keywords": [ "codepage", "iconv", "convert", "strings" ],

cpexcel.js

+1-1

cptable.js

+1-1

dist/cpexcel.full.js

+1-1

dist/cpexcel.js

+1-1

dist/cptable.full.js

+1-1

dist/cptable.js

+1-1
