fix: prefer message body fileName over Content-Disposition for non-ASCII filenames #379
Open
meijing0114 wants to merge 1 commit into
…CII filenames (larksuite#364)

Feishu returns raw UTF-8 bytes in the Content-Disposition header filename field. Node.js HTTP clients decode headers as Latin1 per HTTP/1.1 spec, producing garbled filenames for Chinese/CJK characters.

The message body already contains the correct UTF-8 file_name extracted during the converter phase. This fix simply swaps the priority: res.fileName (message body) is now preferred over result.fileName (Content-Disposition header).

Fixes larksuite#364
Problem
Fixes #364
When downloading files with Chinese/CJK filenames from Feishu, the saved filenames are garbled (e.g., `ã_ç_å_è_.pdf` instead of `助英台.pdf`).
Root cause: Feishu returns raw UTF-8 bytes in the `Content-Disposition` header `filename` field. Node.js HTTP clients decode headers as Latin1 per HTTP/1.1 spec, producing garbled strings for non-ASCII characters.
Fix
The message body already contains the correct UTF-8 `file_name`, extracted during the converter phase into `ResourceDescriptor.fileName`. The fix is a one-line priority swap in `media-resolver.ts`: `res.fileName` (from message body, always correct UTF-8) is now preferred over `result.fileName` (from Content-Disposition header, potentially garbled). The header value still serves as a fallback for cases where the message body doesn't include a filename (e.g., image messages).
Why not fix the Content-Disposition parsing?
An earlier approach attempted Latin1→UTF-8 byte recovery on the header value, but that relies on heuristic detection (guessing whether a string is misencoded). Using the message body as the primary source is deterministic and zero-guesswork.
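For reviewers who want to reproduce the behaviour locally, here is a minimal sketch of both the garbling and the priority order. It is not the code in `media-resolver.ts`: the interface shapes, the helper names (`decodeUtf8BytesAsLatin1`, `blindLatin1ToUtf8`, `pickFileName`) and the use of `||` are assumptions made for illustration. Only the `fileName` fields and the body-first, header-fallback order come from the description above.

```typescript
// Hypothetical shapes: only the `fileName` field names come from this PR.
interface ResourceDescriptor {
  fileName?: string; // taken from the message body's file_name (UTF-8)
}

interface DownloadResult {
  fileName?: string; // parsed from the Content-Disposition header
}

// Simulates the bug: raw UTF-8 bytes in a header read back as Latin1.
function decodeUtf8BytesAsLatin1(name: string): string {
  return Buffer.from(name, "utf8").toString("latin1");
}

// The rejected alternative, applied without any detection step:
// reinterpret the Latin1 string's bytes as UTF-8.
function blindLatin1ToUtf8(headerValue: string): string {
  return Buffer.from(headerValue, "latin1").toString("utf8");
}

// Priority order described in this PR: message body first, header as fallback.
// Using `||` (an assumption) also falls back when the body name is empty.
function pickFileName(
  res: ResourceDescriptor,
  result: DownloadResult,
): string | undefined {
  return res.fileName || result.fileName;
}

const original = "助英台.pdf";
const garbled = decodeUtf8BytesAsLatin1(original);

// Body name wins even though the header also supplied a (garbled) name.
console.log(pickFileName({ fileName: original }, { fileName: garbled }));

// No name in the message body (e.g. image messages): the header value is used.
console.log(pickFileName({}, { fileName: "image.png" }));

// Blind recovery repairs the garbled name...
console.log(blindLatin1ToUtf8(garbled) === original);

// ...but corrupts a header that really was Latin1, which is why that
// approach needs a detection heuristic in front of it.
console.log(blindLatin1ToUtf8("café.pdf") === "café.pdf");
```

The last two lines show the trade-off: byte recovery works only when the header is known to be misdecoded UTF-8, whereas the priority swap never has to inspect the header when the message body supplies a name.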