Basic information
Describe the bug
If a chat message in a VOD contains a non-ASCII character (for example, any 2-byte UTF-8 symbol), the emotes[].name field of the message JSON produced by the library is parsed incorrectly.
Command/Code used
chat_downloader --start_time 05:58:28 --end_time 05:58:30 --output test.jsonl --testing 'https://www.twitch.tv/videos/2184933543'
(I've patched the library with temporary debugging prints to see the raw GQL content for the message mapper, chat_downloader.sites.twitch.TwitchChatDownloader._parse_message_info().)
Actual content of test.jsonl (prettified)
Expected content of test.jsonl (prettified)
The name field of the emote should be filled:
Additional context/information
Twitch GQL expresses the start and end of each emote code as byte positions within the chat text, so for messages containing non-ASCII characters the locations should be applied to the byte form of the Python string rather than to the string itself.
The fix is straightforward:
instead of
chat-downloader/chat_downloader/sites/twitch.py
Line 258 in 94ed3fe
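To illustrate the mismatch described above, here is a minimal standalone sketch. It is not the library's code: the helper names slice_by_chars and slice_by_bytes are hypothetical, and it assumes the emote range is given as inclusive start/end byte offsets into the UTF-8 encoded message.

```python
def slice_by_chars(text: str, start: int, end: int) -> str:
    """Apply the offsets to the Python str (code points).

    This goes wrong when the offsets are byte positions and the
    text contains multi-byte characters before the emote.
    """
    return text[start:end + 1]


def slice_by_bytes(text: str, start: int, end: int) -> str:
    """Apply the offsets to the UTF-8 bytes, then decode the slice.

    Assumption: start/end are inclusive byte offsets.
    """
    raw = text.encode("utf-8")
    # errors="replace" keeps the sketch from raising if a range
    # happens to cut through a multi-byte character.
    return raw[start:end + 1].decode("utf-8", errors="replace")


# "\u00e9" (é) is 1 code point but 2 bytes in UTF-8, so the emote
# "Kappa" starts at byte 3 but at character index 2.
message = "\u00e9 Kappa"

print(repr(slice_by_chars(message, 3, 7)))  # 'appa'  (shifted by one)
print(repr(slice_by_bytes(message, 3, 7)))  # 'Kappa'
```

With longer runs of non-ASCII text before the emote, the character-based slice drifts further and can run past the end of the string entirely, which would leave the name empty.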