jukuisma
Use the higher-level `selectors` module instead of calling `select.select()` directly:

https://docs.python.org/3/library/selectors.html

`selectors.DefaultSelector` is an alias for the most efficient implementation available on the current platform. On Linux, it resolves to `EpollSelector`:

```
$ python3 -c "import selectors; print(selectors.DefaultSelector())"
<selectors.EpollSelector object at 0x7a6a66f02120>
```
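For illustration, here is a minimal sketch of waiting on a file descriptor with `selectors` rather than `select.select()`. It is not the code from this PR: the helper name `wait_readable` and the pipe standing in for the exiftool process's stdout are assumptions made for the example, and it assumes a POSIX platform. The motivation is that `select.select()` raises `ValueError: filedescriptor out of range in select()` once a descriptor number reaches `FD_SETSIZE`, a limit that epoll-based selectors do not have.

```python
import os
import selectors


def wait_readable(fd, timeout):
    """Return True if fd becomes readable within timeout seconds (hypothetical helper)."""
    # DefaultSelector resolves to epoll, kqueue, poll, or select depending on the platform.
    with selectors.DefaultSelector() as sel:
        sel.register(fd, selectors.EVENT_READ)
        # select() returns a list of (key, events) tuples; an empty list means timeout.
        return bool(sel.select(timeout))


# A pipe stands in for the subprocess output stream (assumption for the demo).
read_fd, write_fd = os.pipe()
try:
    empty_ready = wait_readable(read_fd, 0.1)  # nothing written yet
    os.write(write_fd, b"{ready}")
    data_ready = wait_readable(read_fd, 0.1)   # data is now pending
    print(empty_ready, data_ready)
finally:
    os.close(read_fd)
    os.close(write_fd)
```

Creating a selector per call keeps the sketch short; a long-lived caller would probably register the descriptor once and reuse the selector.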

Fixes: #97

jukuisma commented Sep 24, 2024

TODO: Benchmarking. I don't expect this to be slower than `select.select()`, but better safe than sorry. Done here: #98 (comment)
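The polling cost can also be looked at in isolation from exiftool. The following is a rough micro-benchmark sketch, not part of the PR: the pipe, the function names, and the iteration count are all assumptions chosen for the example, and it only measures the readiness check itself on a single descriptor, assuming a POSIX platform.

```python
import os
import select
import selectors
import timeit

# A pipe with one pending byte keeps the read end permanently readable.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"x")

sel = selectors.DefaultSelector()
sel.register(read_fd, selectors.EVENT_READ)


def poll_with_select():
    # Timeout 0 makes the call non-blocking.
    readable, _, _ = select.select([read_fd], [], [], 0)
    return readable


def poll_with_selectors():
    # Timeout 0 makes the call non-blocking.
    return sel.select(0)


try:
    select_ready = poll_with_select()
    selectors_ready = poll_with_selectors()

    iterations = 10_000  # arbitrary; raise it for more stable numbers
    t_select = timeit.timeit(poll_with_select, number=iterations)
    t_selectors = timeit.timeit(poll_with_selectors, number=iterations)
    print(f"select.select():   {t_select:.4f} s for {iterations} calls")
    print(f"DefaultSelector(): {t_selectors:.4f} s for {iterations} calls")
finally:
    sel.close()
    os.close(read_fd)
    os.close(write_fd)
```

Any difference seen here should be weighed against the total run time, which in practice is likely dominated by the exiftool process rather than by the readiness check.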

Benchmarking

```
vagrant@almalinux:~/test$ cat benchmark.py
import exiftool

images = [ f"images/{i}" for i in range(512)]
with exiftool.ExifToolHelper() as et:
    print(et.get_metadata(images))

for i in range(512):
    print(exiftool.ExifToolHelper().get_metadata(f"images/{i}"))
```

Old:

```
vagrant@almalinux:~/test$ rpm -q python3-pyexiftool
python3-pyexiftool-0.5.6-1.el9.noarch
vagrant@almalinux:~/test$ python3 benchmark.py | md5sum
8cb4cc1e001e1abcfe6a4b4bae714288  -
vagrant@almalinux:~/test$ time python3 benchmark.py > /dev/null

real    0m48.851s
user    0m42.612s
sys     0m6.483s
```

New:

```
vagrant@almalinux:~/test$ rpm -q python3-pyexiftool
python3-pyexiftool-0.5.6-2.el9.noarch
vagrant@almalinux:~/test$ python3 benchmark.py | md5sum
8cb4cc1e001e1abcfe6a4b4bae714288  -
vagrant@almalinux:~/test$ time python3 benchmark.py > /dev/null

real    0m47.134s
user    0m41.186s
sys     0m6.156s
```

And with only the batched call:

```
vagrant@almalinux:~/test$ cat benchmark.py
import exiftool

images = [ f"images/{i}" for i in range(512)]
with exiftool.ExifToolHelper() as et:
    print(et.get_metadata(images))

# for i in range(512):
#     print(exiftool.ExifToolHelper().get_metadata(f"images/{i}"))
```

To get some rounds in, old:

```
vagrant@almalinux:~/test$ perf stat -r 10 python3 benchmark.py > /dev/null

 Performance counter stats for 'python3 benchmark.py' (10 runs):

           3450.78 msec task-clock:u                     #    1.009 CPUs utilized               ( +-  0.26% )
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
             11170      page-faults:u                    #    3.237 K/sec                       ( +-  1.23% )
       14374934023      cycles:u                         #    4.166 GHz                         ( +-  0.26% )
          17189602      stalled-cycles-frontend:u        #    0.12% frontend cycles idle        ( +-  2.14% )
          71707628      stalled-cycles-backend:u         #    0.50% backend cycles idle         ( +- 11.28% )
       21913527379      instructions:u                   #    1.52  insn per cycle
                                                  #    0.00  stalled cycles per insn     ( +-  0.04% )
        4602999890      branches:u                       #    1.334 G/sec                       ( +-  0.04% )
                 0      branch-misses:u

           3.41934 +- 0.00858 seconds time elapsed  ( +-  0.25% )
```

New:

```
vagrant@almalinux:~/test$ perf stat -r 10 python3 benchmark.py > /dev/null

 Performance counter stats for 'python3 benchmark.py' (10 runs):

           3450.74 msec task-clock:u                     #    1.010 CPUs utilized               ( +-  0.13% )
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
             11305      page-faults:u                    #    3.276 K/sec                       ( +-  0.81% )
       14332278239      cycles:u                         #    4.153 GHz                         ( +-  0.11% )
          16894272      stalled-cycles-frontend:u        #    0.12% frontend cycles idle        ( +-  1.35% )
          72376828      stalled-cycles-backend:u         #    0.50% backend cycles idle         ( +- 15.40% )
       21919017057      instructions:u                   #    1.53  insn per cycle
                                                  #    0.00  stalled cycles per insn     ( +-  0.03% )
        4604247475      branches:u                       #    1.334 G/sec                       ( +-  0.04% )
                 0      branch-misses:u

           3.41807 +- 0.00510 seconds time elapsed  ( +-  0.15% )
```

I'm seeing next to no difference between the two solutions on my alma9 VM: 3.41934 ± 0.00858 s elapsed with the old code versus 3.41807 ± 0.00510 s with the new, over 10 runs each. Testing larger images could be useful, but all our test images are small so as not to bloat the git repo. These 512 images are all copies of: https://github.com/Digital-Preservation-Finland/file-scraper/blob/master/tests/data/image_jpeg/valid_2.2.1_exif_metadata.jpg
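To reproduce the data set, something along these lines should work. This is a sketch under assumptions: the thread does not say how the `images/` directory was populated, so the copying approach and the helper name `make_image_set` are illustrative, and the demo uses a placeholder file rather than the real JPEG.

```python
import shutil
import tempfile
from pathlib import Path


def make_image_set(source, target_dir, count=512):
    """Copy source to target_dir/0 ... target_dir/<count-1> (hypothetical helper)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        # benchmark.py addresses the files as images/{i}, without an extension.
        shutil.copyfile(source, target / str(i))
    return target


# Demo with a placeholder file; substitute the real JPEG for an actual benchmark run.
with tempfile.TemporaryDirectory() as tmp:
    placeholder = Path(tmp) / "sample.jpg"
    placeholder.write_bytes(b"placeholder bytes, not a real JPEG")
    images = make_image_set(placeholder, Path(tmp) / "images")
    created = sorted(int(p.name) for p in images.iterdir())
    print(len(created), created[0], created[-1])
```

For a real run, the call would be `make_image_set("valid_2.2.1_exif_metadata.jpg", "images")` from the directory that contains `benchmark.py`, after downloading the file linked above.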

jukuisma changed the title from "Draft: Avoid explicit calls to select.select()" to "Avoid explicit calls to select.select()" on Sep 25, 2024
Development

Successfully merging this pull request may close these issues.

ValueError: filedescriptor out of range in select()
