Using .gitattributes to encode file as IBM-1047 results in some characters being converted incorrectly #63

@zsw007

Description

The file is stored as UTF-8 in git, and I'm using .gitattributes to set the working-tree encoding to ibm-1047. When the repo is cloned, git converts the file and tags it as IBM-1047. The problem is that some characters, such as ® (the registered symbol), appear as ▒ after the conversion.

Reproduce

  1. Create a new repository.
  2. Create file.txt containing the ® character.
  3. Create a .gitattributes file containing either of the following lines:
       file.txt zos-working-tree-encoding=ibm-1047 git-encoding=iso8859-1
       file.txt working-tree-encoding=ibm-1047
  4. Clone the repository.
  5. Read file.txt using vim or cat. It displays ▒ instead of ®.
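
For anyone who wants to script steps 2 and 3, here is a minimal Python sketch. The helper name write_repro_files and its use_zos_attribute parameter are made up for illustration; the two .gitattributes lines are the ones from step 3. It only writes the two files. Creating the repository, committing, and cloning (steps 1 and 4) still have to be done with git by hand.

```python
from pathlib import Path
import tempfile

REGISTERED = "\u00ae"  # the ® character from the report


def write_repro_files(repo_dir: Path, use_zos_attribute: bool = False) -> None:
    """Write file.txt (as UTF-8) and .gitattributes into repo_dir.

    Hypothetical helper: it does not run git, it only prepares the files.
    """
    # file.txt holds ® encoded as UTF-8, i.e. the bytes C2 AE plus a newline.
    (repo_dir / "file.txt").write_bytes(f"{REGISTERED}\n".encode("utf-8"))

    # Pick one of the two attribute lines from step 3.
    if use_zos_attribute:
        rule = "file.txt zos-working-tree-encoding=ibm-1047 git-encoding=iso8859-1\n"
    else:
        rule = "file.txt working-tree-encoding=ibm-1047\n"
    (repo_dir / ".gitattributes").write_text(rule, encoding="ascii")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        write_repro_files(repo)
        print((repo / "file.txt").read_bytes().hex(" "))
        print((repo / ".gitattributes").read_text(), end="")
```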

Additional info

A hex editor shows that the ® character is converted from C2 AE (its UTF-8 encoding) to the single byte AF, which is unreadable. To be readable, it would have to be 62 AF.

This is consistent with what iconv produces when converting from UTF-8 to IBM-1047. By contrast, converting from ISO8859-1 to IBM-1047 seems to produce the correct result.
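
The two byte sequences above can be reproduced off-platform with a small Python sketch. One big assumption: it uses Python's cp037 codec as a stand-in for IBM-1047, since cp037 is the closest EBCDIC code page I know of in the standard library, and I'm assuming the two code pages agree on the bytes involved here (62 and AF). The helper name convert is made up for illustration, and this is not how git or iconv do the conversion internally.

```python
# Assumption: cp037 stands in for IBM-1047. The two code pages differ for a
# handful of characters, but are assumed to agree on the bytes used here.
EBCDIC = "cp037"

# The ® character as stored in git (UTF-8): bytes C2 AE.
UTF8_BYTES = "\u00ae".encode("utf-8")


def convert(data: bytes, source_encoding: str) -> bytes:
    """Interpret data as source_encoding and re-encode it as EBCDIC."""
    return data.decode(source_encoding).encode(EBCDIC)


if __name__ == "__main__":
    # Source treated as UTF-8: C2 AE is one character and becomes one byte.
    print("from UTF-8:    ", convert(UTF8_BYTES, "utf-8").hex(" "))      # af
    # Source treated as ISO8859-1: C2 and AE are converted byte by byte.
    print("from ISO8859-1:", convert(UTF8_BYTES, "iso8859-1").hex(" "))  # 62 af
```

Under that assumption, the UTF-8 route yields AF and the ISO8859-1 route yields 62 AF, which matches the bytes seen in the hex editor and the bytes that display correctly.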
