Japanese Windows 2byte code #880

Open
IamKojiFurukawa opened this issue Jan 16, 2025 · 6 comments

Comments

@IamKojiFurukawa

IamKojiFurukawa commented Jan 16, 2025

能率.csv
**Describe the bug**
cloc does not work on Japanese Windows when a file name contains Japanese double-byte (Shift-JIS) characters.

  • cloc version: 2.02
  • If running the cloc source, Perl version:
  • OS (eg Linux, Windows, macOS, etc): Windows
  • OS version: 11h2

To Reproduce
cloc.exe 能率.c
cloc.exe 日本語\source

Both paths contain Japanese double-byte characters.

Expected result
cloc should find and count the source files, but it cannot find them.

Additional context

Hello,
cloc is very useful. On Japanese Windows, however, there seems to be a problem when a path name contains Japanese double-byte characters.
Do not convert between uppercase and lowercase characters, and do not convert backslashes to slashes.
Open files with the <:encoding(shiftjis) layer, e.g.:
#ex: open_file("<:encoding(shiftjis)", $file, 1);

Japanese Windows uses Shift-JIS for file paths.
#import: use Encode qw(encode decode);
#ex. open mode: open(my $fh, "<:encoding(shiftjis)", $file);
#do not do this: $file =~ s{\\}{/}g if $ON_WINDOWS;
#this is not good: $upper_lower_map{$lc} = $file;

Please fix it.
Thank you.
Koji Furukawa
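To make the problem above concrete: in Shift-JIS the second (trail) byte of a double-byte character can fall in the ASCII range. 能 is encoded as 0x94 0x5C, and 0x5C is the ASCII backslash, so a byte-wise `s{\\}{/}g` rewrites half of that character. Likewise ア is 0x83 0x41, whose trail byte is ASCII `A`, so byte-wise `lc` changes it too. The minimal sketch below uses only the core Encode module and assumes `cp932`, Microsoft's Shift-JIS variant used by Japanese Windows (for these characters its bytes are the same as `shiftjis`). Save it as UTF-8, since it uses `use utf8`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                    # string literals below are UTF-8 source text
use Encode qw(encode);

# A file name as Japanese Windows passes it to Perl: cp932 (Shift-JIS) bytes.
my $path = encode('cp932', "能率.c");

# Show the raw bytes; the first two (94 5C) belong to 能.
print 'bytes: ', join(' ', map { sprintf('%02X', ord($_)) } split(//, $path)), "\n";

# Byte-wise backslash-to-slash conversion, like the substitution quoted above.
(my $slashed = $path) =~ s{\\}{/}g;
print 'slash conversion altered the name: ', ($slashed ne $path ? 'yes' : 'no'), "\n";

# Byte-wise lowercasing: ア is 0x83 0x41, and 0x41 is ASCII 'A'.
my $katakana_a = encode('cp932', "ア");
my $folded     = lc $katakana_a;
print 'lc altered the name: ', ($folded ne $katakana_a ? 'yes' : 'no'), "\n";
```

Both checks print `yes`: each operation silently turns a valid Japanese file name into a different byte sequence, so the file can no longer be found.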

@AlDanial
Owner

This problem will be difficult for me to fix since I won't be able to test the correctness of my changes.

Can you install a Perl interpreter (e.g. https://strawberryperl.com/) on your computer? If so, I can send you my code changes and you can test them for me.

@IamKojiFurukawa
Author

Thank you for your response. I have already installed it.
I can use the pp command.
Best Regards,

AlDanial added a commit that referenced this issue Jan 20, 2025
eg "cloc --encoding shiftjis" to properly handle Japanese file paths
on Windows
@AlDanial
Owner

Let's tackle this in steps. At the moment cloc still does / -> \ and uses the upper_lower_map. However, I added a new option that takes an arbitrary encoding, so this
cloc.exe --encoding shiftjis 能率.c
should work on Windows with 3f7257f.

Also, if possible, please check whether the original cloc works from the Windows Subsystem for Linux on your computer. On my Linux computer there are no problems with Japanese file names or directories:

» find . -type f
./能.c
./日本語/能率.c


» cloc --by-file .        
       2 text files.
       2 unique files.                              
       0 files ignored.

github.com/AlDanial/cloc v 2.03  T=0.00 s (443.8 files/s, 1109.4 lines/s)
----------------------------------------------------------------------------------
File                                blank        comment           code
----------------------------------------------------------------------------------
./日本語/能率.c                    0              0              2
./能.c                                 1              0              2
----------------------------------------------------------------------------------
SUM:                                    1              0              4
----------------------------------------------------------------------------------

@IamKojiFurukawa
Author

Thank you for updating.
I will check with Japanese Windows.
Regards.
Koji Furukawa

@IamKojiFurukawa
Author

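I have tested with --encoding shiftjis, but it does not work correctly.

A file path with double-byte characters contains bytes with special values in the range 0x80 to 0xFF, so the code needs to check whether each character starts with an ASCII byte or not.
Don't replace "\" with "/", and don't convert between lowercase and uppercase, inside double-byte characters, as below.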

Instead of $file =~ s{\\}{/}g if $ON_WINDOWS; use:

$file =~ s{
    ([\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC])  # double-byte char: consume lead + trail byte together and keep it
  |
    (\\)                                         # single-byte backslash: a real path separator
}{
    defined $1 ? $1 : '/'
}egx if $ON_WINDOWS;

I have been debugging, but I'm not finished yet.
I will keep trying.
Thank you.
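An alternative to matching Shift-JIS lead and trail bytes by hand is to decode the path into Perl characters, do the separator conversion on characters, and encode the result back to bytes. The sketch below assumes the core Encode module and Microsoft's `cp932` variant of Shift-JIS; `sjis_backslash_to_slash` is a hypothetical helper name for illustration, not a cloc function.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                    # string literals below are UTF-8 source text
use Encode qw(encode decode);

# Hypothetical helper, not part of cloc: convert "\" to "/" in a cp932
# (Shift-JIS) byte path without touching bytes inside double-byte characters.
sub sjis_backslash_to_slash {
    my ($bytes) = @_;
    # After decoding, 能 is a single Perl character, so its 0x5C trail
    # byte is no longer visible to the substitution.
    my $chars = decode('cp932', $bytes);
    $chars =~ s{\\}{/}g;     # only genuine path separators match now
    # Encode back to bytes so the result can be passed to open(), -f, etc.
    return encode('cp932', $chars);
}

my $in  = encode('cp932', "日本語\\能率.c");
my $out = sjis_backslash_to_slash($in);
my $ok  = decode('cp932', $out) eq "日本語/能率.c";
print($ok ? "ok\n" : "mismatch\n");
```

Working on characters avoids hard-coding byte ranges. The trade-off is that a name which is not valid cp932 is decoded lossily (invalid bytes become substitution characters) unless a CHECK argument such as `Encode::FB_CROAK` is passed to `decode`.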

@AlDanial
Owner

Let's start with the basics. Save this small program to your Windows computer and let me know if it can read the problematic files you mentioned above:

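#!/usr/bin/env perl
use warnings;
use strict;
# For each path on the command line: if open() succeeds, Perl handled the
# Japanese file name; the :encoding(shiftjis) layer then decodes the file's
# contents (not its name) from Shift-JIS while reading.
foreach my $in_file (@ARGV) {
    my $fh;
    open($fh,"<:encoding(shiftjis)",$in_file) or die "Can't open $in_file: $!";
    while (<$fh>) {
        print "$in_file -> $_";   # echo every line, prefixed with its file name
    }
    close($fh);
}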
