+ //////////// If the code reaches here, no BOM/signature was found, so now
+ //////////// we need to 'taste' the file to see if can manually discover
+ //////////// the encoding. A high taster value is desired for UTF-8
+ if (taster == 0 || taster > b.Length) taster = b.Length; // Taster size can't be bigger than the filesize obviously.
+
+
+ // Some text files are encoded in UTF8, but have no BOM/signature. Hence
+ // the below manually checks for a UTF8 pattern. This code is based off
+ // the top answer at: https://stackoverflow.com/questions/6555015/check-for-invalid-utf8
+ // For our purposes, an unnecessarily strict (and terser/slower)
+ // implementation is shown at: https://stackoverflow.com/questions/1031645/how-to-detect-utf-8-in-plain-c
+ // For the below, false positives should be exceedingly rare (and would
+ // be either slightly malformed UTF-8 (which would suit our purposes
+ // anyway) or 8-bit extended ASCII/UTF-16/32 at a vanishingly long shot).
+ int i = 0;
+ bool utf8 = false;
+ while (i < taster - 4) {
+     if (b[i] <= 0x7F) { i += 1; continue; } // If all characters are below 0x80, then it is valid UTF8, but UTF8 is not 'required' (and therefore the text is more desirable to be treated as the default codepage of the computer). Hence, there's no "utf8 = true;" code unlike the next three checks.
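The hunk shown here stops after the ASCII check, so the "next three checks" mentioned in the comment are not visible. As a rough illustration of the idea those comments describe, the following is a minimal sketch and not the commit's actual code: the class `Utf8Taster` and the method `LooksLikeUtf8` are hypothetical names, and the byte ranges are a simplified well-formedness test that does not reject every overlong or surrogate encoding. It assumes a taster value of 0 means "sample the whole buffer", as in the line above.

```csharp
public static class Utf8Taster
{
    // Hypothetical helper: returns true only when the sampled bytes contain at
    // least one well-formed multi-byte UTF-8 sequence and no malformed one.
    // Pure ASCII returns false, so a caller can fall back to the system codepage.
    public static bool LooksLikeUtf8(byte[] b, int taster)
    {
        if (taster <= 0 || taster > b.Length) taster = b.Length;

        bool sawMultiByte = false;
        int i = 0;
        while (i < taster)
        {
            byte lead = b[i];
            if (lead <= 0x7F) { i++; continue; }      // ASCII: valid, but proves nothing

            int extra;                                 // continuation bytes expected
            if (lead >= 0xC2 && lead <= 0xDF) extra = 1;
            else if (lead >= 0xE0 && lead <= 0xEF) extra = 2;
            else if (lead >= 0xF0 && lead <= 0xF4) extra = 3;
            else return false;                         // stray continuation or invalid lead byte

            if (i + extra >= taster) break;            // sequence cut off by the sample boundary

            for (int k = 1; k <= extra; k++)
            {
                // Every continuation byte must have the bit pattern 10xxxxxx.
                if ((b[i + k] & 0xC0) != 0x80) return false;
            }

            sawMultiByte = true;
            i += extra + 1;
        }
        return sawMultiByte;
    }
}
```

Under these assumptions, `Utf8Taster.LooksLikeUtf8(bytes, 0)` returns `false` for a purely ASCII buffer, which matches the comment's point that such text is better treated as the default codepage. A sequence truncated at the sample boundary is ignored rather than counted as an error.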
- public const string Description = @"An extension that allows you to easily convert the encoding of multiple files.";
+ public const string Description = @"An extension that allows you to easily convert the encoding of multiple files. (System Encoding / UTF-8 with / without BOM)";
- <Description xml:space="preserve">An extension that allows you to easily convert the encoding of multiple files.</Description>
+ <Description xml:space="preserve">An extension that allows you to easily convert the encoding of multiple files. (System Encoding / UTF-8 with / without BOM)</Description>