Open
Labels
P2: Medium priority issues to be scheduled in a future release
Description
Summary
Identification gets stuck for over a minute on bright production DPX images at:
trying b'(?s)\\xff[\\xfa\\xfb\\xf2\\xf3][\\x10-\\xeb].{46,1439}\\xff[\\xfa\\xfb\\xf2\\xf3][\\x10-\\xeb].{46,1439}\\Z'
The regex comes from the fmt/134 (MP3) entry in: https://raw.githubusercontent.com/openpreserve/fido/refs/heads/main/fido/conf/formats-v116.xml
$ curl -s https://raw.githubusercontent.com/openpreserve/fido/refs/heads/main/fido/conf/formats-v116.xml | grep -A 16 '<puid>fmt/134</puid>'
<puid>fmt/134</puid>
<mime>audio/mpeg</mime>
<name>MPEG 1/2 Audio Layer 3</name>
<version />
<alias>MP3</alias>
<pronom_id>687</pronom_id>
<extension>mp3</extension>
<apple_uti>public.mp3</apple_uti>
<signature>
<name>MPEG-1 Audio Layer 3 with ID3v2 Tag</name>
<note>Regularly-spaced frame headers should always be discoverable near EOF. An ID3v1 tag of up to 355 bytes may be present at EOF.</note>
<pattern>
<position>EOF</position>
<pronom_pattern>FFFB[10:EB]{46-1439}FFFB[10:EB]{46-1439}FFFB[10:EB]{46-1439}FFFB[10:EB]{46-1439}FFFB[10:EB]{46-1439}FFFB[10:EB]{46-1439}FFFB[10:EB]{47-1795}</pronom_pattern>
<regex>(?s)\xff\xfb[\x10-\xeb].{46,1439}\xff\xfb[\x10-\xeb].{46,1439}\xff\xfb[\x10-\xeb].{46,1439}\xff\xfb[\x10-\xeb].{46,1439}\xff\xfb[\x10-\xeb].{46,1439}\xff\xfb[\x10-\xeb].{46,1439}\xff\xfb[\x10-\xeb].{47,1795}\Z</regex>
</pattern>
<pattern>
Brightness seems like a red herring, but it actually causes this regex to partially match, which makes it slow.
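One rough way to gauge the partial matching is to count how often the three-byte frame-header prefix of the regex occurs in a buffer. Each hit is an offset where the engine starts a match attempt and then has to work through the `.{46,1439}` gaps. The sketch below is illustrative only: `count_header_candidates` is a hypothetical helper (not part of fido), and the "bright-like" buffer is an assumption that mimics rand.py by forcing the high nibble of every fourth byte to `f`.

```python
import os
import re

# Three-byte frame-header prefix copied from the slow regex.
HEADER = re.compile(rb"\xff[\xfa\xfb\xf2\xf3][\x10-\xeb]")


def count_header_candidates(data: bytes) -> int:
    """Count offsets where the frame-header prefix matches.

    Matches cannot overlap (the second and third bytes are never 0xff),
    so a plain finditer scan sees every candidate offset.
    """
    return sum(1 for _ in HEADER.finditer(data))


if __name__ == "__main__":
    size = 1 << 20
    uniform = os.urandom(size)
    # Assumption: approximate bright data the way rand.py does, by
    # forcing the high nibble of every fourth byte to 0xf.
    bright_like = bytes(
        (b | 0xF0) if i % 4 == 0 else b
        for i, b in enumerate(os.urandom(size))
    )
    print("uniform candidates:    ", count_header_candidates(uniform))
    print("bright-like candidates:", count_header_candidates(bright_like))
```

Running it on the tail of a real production image would show whether the candidate density there is actually higher than in random data.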
I'm looking at improving this but any and all feedback would be much appreciated.
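For comparing candidate fixes, a small timing harness may help. This is a minimal sketch, not fido's actual matching code: `time_regex_on_tail` is a hypothetical helper, and `tail_size` is an assumed stand-in for however much of the end of the file the identifier scans, so the real buffer size should be checked in fido's settings.

```python
import re
import time

# The two-frame regex from the "trying ..." log line, with single
# backslashes as it would appear in the signature file.
SLOW_REGEX = re.compile(
    rb"(?s)\xff[\xfa\xfb\xf2\xf3][\x10-\xeb].{46,1439}"
    rb"\xff[\xfa\xfb\xf2\xf3][\x10-\xeb].{46,1439}\Z"
)


def time_regex_on_tail(data: bytes, tail_size: int = 128 * 1024) -> float:
    """Return the seconds spent searching the last tail_size bytes of data.

    tail_size is an assumption, not a value taken from fido.
    """
    tail = data[-tail_size:]
    start = time.perf_counter()
    SLOW_REGEX.search(tail)
    return time.perf_counter() - start
```

A possible use is `time_regex_on_tail(open("rand.dpx", "rb").read())`, run once against a production image and once against a generated one, to check whether the generated file reproduces the slowdown.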
Steps to reproduce
I have the production images and can reproduce this locally. I'm currently trying to generate similar images with random data that makes this regex slow, but haven't gotten it to work yet. I'll attach one such image and rough instructions to create it below:
# Create 4k all white png in GIMP
# Convert it to dpx:
$ ffmpeg -i white.png white.dpx
# Create 0x17bb00 * 0x10 bytes of random xxd formatted data:
$ cat rand.py
import os

def rand_line(offset):
    """
    <hex-offset>: (f[0-f]<rand> <rand><rand> ){4}
    """
    line = f"{offset:08x}: "
    rand = os.urandom(4)
    # Force the high nibble of the first byte to 0xf
    byte1 = f"{rand[0] | 0xf0:02x}"
    byte2 = f"{rand[1]:02x}"
    byte3 = f"{rand[2]:02x}"
    byte4 = f"{rand[3]:02x}"
    # Repeat the same four bytes four times to fill one 16-byte xxd line
    for i in range(4):
        line += f"{byte1}{byte2} {byte3}{byte4} "
    return f"{line[:-1]}\n"

with open("rand.xxd", "w") as outfile:
    for i in range(0x17bb00):
        offset = i * 0x10
        # Print progress once per MiB of output
        if offset % 1024**2 == 0:
            print(f"{offset:08x}/{0x17bb00*0x10:08x}\r", end="")
        outfile.write(rand_line(offset))
# Create the binary
$ xxd -r rand.xxd > rand.bin
# Remove all image data from white.dpx,
# i.e., all ff bytes from offset 0x680 onwards
# 00000680: ffff ffff ffff ffff ffff ffff ffff ffff ................
$ vim -b white.dpx
# Concatenate new random image data
$ cat white.dpx rand.bin > rand.dpx
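The xxd round trip and the manual vim edit can probably be collapsed into one script. The following is a sketch under stated assumptions rather than a tested replacement for the steps above: `rand_block` and `write_rand_dpx` are hypothetical names, and `HEADER_LEN = 0x680` assumes the image data in white.dpx starts at offset 0x680, as the xxd line in the instructions suggests.

```python
import os

# Assumption: the DPX header written by ffmpeg is 0x680 bytes long,
# based on the xxd line where the ff image bytes begin.
HEADER_LEN = 0x680


def rand_block() -> bytes:
    """Return 16 bytes: one random 4-byte group repeated four times.

    The high nibble of the group's first byte is forced to 0xf,
    matching what rand.py writes on each xxd line.
    """
    group = bytearray(os.urandom(4))
    group[0] |= 0xF0
    return bytes(group) * 4


def write_rand_dpx(header_path, out_path, blocks=0x17BB00, header_len=HEADER_LEN):
    """Copy the header from header_path, then append random blocks."""
    with open(header_path, "rb") as src:
        header = src.read(header_len)
    with open(out_path, "wb") as out:
        out.write(header)
        for _ in range(blocks):
            out.write(rand_block())


if __name__ == "__main__":
    # Expects the unmodified white.dpx produced by ffmpeg.
    write_rand_dpx("white.dpx", "rand.dpx")
```

Because the header is read straight from the original white.dpx, the file does not need to be edited by hand first.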