Description
We are doing digital preservation. In some cases we scrape metadata from thousands of image files in the same Python process. As far as I understand, pyexiftool handles multiple files in ExifTool's `-stay_open` mode. We are seeing the `ValueError: filedescriptor out of range in select()` error a lot in production:
```
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/exiftool/exiftool.py", line 812, in run
    self._ver = self._parse_ver()
  File "/usr/lib/python3.9/site-packages/exiftool/exiftool.py", line 1199, in _parse_ver
    return self.execute("-ver").strip()
  File "/usr/lib/python3.9/site-packages/exiftool/helper.py", line 132, in execute
    result: Union[str, bytes] = super().execute(*str_bytes_params, **kwargs)
  File "/usr/lib/python3.9/site-packages/exiftool/exiftool.py", line 1009, in execute
    raw_stdout = _read_fd_endswith(fdout, seq_ready.encode(self._encoding), self._block_size)
  File "/usr/lib/python3.9/site-packages/exiftool/exiftool.py", line 142, in _read_fd_endswith
    inputready, outputready, exceptready = select.select([fd], [], [])
ValueError: filedescriptor out of range in select()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/check-sip-digital-objects-3", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python3.9/site-packages/ipt/scripts/check_sip_digital_objects.py", line 39, in main
    report = validation_report(
  File "/usr/lib/python3.9/site-packages/ipt/scripts/check_sip_digital_objects.py", line 448, in validation_report
    for result in validation(mets_path=mets_path, catalog_path=catalog_path):
  File "/usr/lib/python3.9/site-packages/ipt/scripts/check_sip_digital_objects.py", line 356, in validation
    yield _validate(metadata_info)
  File "/usr/lib/python3.9/site-packages/ipt/scripts/check_sip_digital_objects.py", line 328, in _validate
    scraper_result, streams, grade = check_well_formed(
  File "/usr/lib/python3.9/site-packages/ipt/scripts/check_sip_digital_objects.py", line 174, in check_well_formed
    (mime, version) = scraper.detect_filetype()
  File "/usr/lib/python3.9/site-packages/file_scraper/scraper.py", line 242, in detect_filetype
    self._identify()
  File "/usr/lib/python3.9/site-packages/file_scraper/scraper.py", line 77, in _identify
    self._update_filetype(exiftool_detector)
  File "/usr/lib/python3.9/site-packages/file_scraper/scraper.py", line 89, in _update_filetype
    tool.detect()
  File "/usr/lib/python3.9/site-packages/file_scraper/detectors.py", line 339, in detect
    with exiftool.ExifToolHelper() as et:
  File "/usr/lib/python3.9/site-packages/exiftool/exiftool.py", line 317, in __enter__
    self.run()
  File "/usr/lib/python3.9/site-packages/exiftool/helper.py", line 150, in run
    super().run()
  File "/usr/lib/python3.9/site-packages/exiftool/exiftool.py", line 816, in run
    raise ExifToolVersionError(f"Error retrieving Exiftool info. Is your Exiftool version ('exiftool -ver') >= required version ('{constants.EXIFTOOL_MINIMUM_VERSION}')?")
exiftool.exceptions.ExifToolVersionError: Error retrieving Exiftool info. Is your Exiftool version ('exiftool -ver') >= required version ('12.15')?
```
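The `ValueError` at the bottom of the first traceback can be reproduced without ExifTool. CPython's `select.select()` refuses any descriptor number at or above `FD_SETSIZE` (1024 on Linux with glibc) before it ever calls select(2), so the chained `ExifToolVersionError` appears to be a side effect of that failure rather than a real version mismatch. The sketch below is a minimal reproduction, assuming Linux and a hard `RLIMIT_NOFILE` above 1034; the function name `demo_high_fd` and the descriptor number 1034 are hypothetical choices for illustration, not part of pyexiftool or file-scraper. It places a pipe's read end on a high descriptor, tries `select()` on it, then checks the same descriptor with `select.poll()`.

```python
import os
import resource
import select

FD_SETSIZE = 1024  # assumption: glibc's select() limit on Linux


def demo_high_fd(target_fd: int = FD_SETSIZE + 10):
    """Put a pipe's read end on target_fd, then try select() and poll() on it.

    Returns (select_outcome, poll_saw_data), or None if RLIMIT_NOFILE does not
    allow a descriptor that high. target_fd is a hypothetical number assumed
    to be unused (os.dup2() would silently close it otherwise).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft <= target_fd:
        if hard != resource.RLIM_INFINITY and hard <= target_fd:
            return None  # cannot open a descriptor this high at all
        # Raise only the soft limit so that os.dup2() may create target_fd.
        resource.setrlimit(resource.RLIMIT_NOFILE, (target_fd + 1, hard))
    r, w = os.pipe()
    try:
        os.dup2(r, target_fd)
        os.write(w, b"x")  # make the high descriptor readable
        try:
            select.select([target_fd], [], [], 0)
            select_outcome = "ok"
        except ValueError as exc:
            # CPython rejects descriptors >= FD_SETSIZE before calling select(2).
            select_outcome = str(exc)
        poller = select.poll()  # poll(2) has no FD_SETSIZE limit
        poller.register(target_fd, select.POLLIN)
        poll_saw_data = bool(poller.poll(0))
        return select_outcome, poll_saw_data
    finally:
        for fd in (r, w, target_fd):
            try:
                os.close(fd)
            except OSError:
                pass
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


print(demo_high_fd())
```

On a system where the limit can be raised, this should report the same select() error message together with `True` for poll(); it returns `None` when the descriptor limit cannot reach that high.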
If you happen to be interested in the check-sip-digital-objects(-3) command seen in the backtrace, that's here: https://github.com/Digital-Preservation-Finland/dpres-ipt
And our scraping tool is here: https://github.com/Digital-Preservation-Finland/file-scraper/
We are running pyexiftool 0.5.5, which we packaged ourselves. It seems that 0.5.6 does not change anything related to this issue.
ExifTool is version 12.70.
Someone has reported this same issue here earlier: https://exiftool.org/forum/index.php?topic=11067.0
`man 2 select` says:
WARNING: select() can monitor only file descriptors numbers that are less than FD_SETSIZE (1024)—an unreasonably low limit for many modern applications—and this limitation will not change. All modern applications should instead use poll(2) or epoll(7), which do not suffer this limitation.
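Following the man page's advice, here is a minimal sketch of a read-until-marker loop, the job that `_read_fd_endswith` appears to do in the traceback (it reads `fdout` until the output ends with `seq_ready`), written with `select.poll()` instead of `select.select()`. This is not pyexiftool's implementation: the function name `read_fd_endswith_poll`, the `timeout_ms` parameter, the 4096-byte default block size, and the use of `{ready}` (assumed here to be ExifTool's `-stay_open` ready marker) in the usage lines are all illustrative assumptions.

```python
import os
import select


def read_fd_endswith_poll(fd: int, suffix: bytes, block_size: int = 4096,
                          timeout_ms=None) -> bytes:
    """Read from fd until the collected output, minus trailing whitespace,
    ends with suffix. Hypothetical helper; poll(2) also accepts fds >= 1024."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)  # POLLERR/POLLHUP/POLLNVAL are always reported
    output = bytearray()
    tail_len = len(suffix) + 8  # re-check only the tail, not the whole buffer
    while not output[-tail_len:].rstrip().endswith(suffix):
        events = poller.poll(timeout_ms)  # None blocks until fd is ready
        if not events:
            raise TimeoutError(f"no data on fd {fd} within {timeout_ms} ms")
        for _, mask in events:
            if mask & (select.POLLERR | select.POLLNVAL):
                raise OSError(f"poll() reported an error on fd {fd}")
            chunk = os.read(fd, block_size)  # readable, or hung up (POLLHUP)
            if not chunk:
                raise EOFError(f"fd {fd} closed before {suffix!r} arrived")
            output += chunk
    return bytes(output)


# Usage: a plain pipe stands in for the exiftool process's stdout.
r, w = os.pipe()
os.write(w, b"12.70\n{ready}\n")
print(read_fd_endswith_poll(r, b"{ready}"))
os.close(w)
os.close(r)
```

Checking only the last few bytes keeps each loop iteration cheap for large outputs, and the explicit EOF and timeout errors avoid spinning forever if the child process dies, which a bare `select.select([fd], [], [])` loop does not guard against.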