asyncio download rewrite

Kiss, György edited this page Oct 26, 2018 · 1 revision

Here is what happened when I rewrote the downloading part from ThreadPoolExecutor to asyncio:

  • The code got somewhat more verbose and complicated in some places, because async code is more verbose by itself and because mixing synchronous code with async code (the event loop) is not always trivial.
  • Cancellation (CTRL-C) works reliably every time: there is no exception and no need to press it twice!
  • The download got faster, see the download speed measurements below.
  • The progress bar is way more accurate, because during the rewrite I could fine-tune it to step after every single finished download. Before, it was only updated at the end of every max_threads batch.
  • Only the download-related code had to be rewritten as async (download_rss and download_episodes). The event loop can be started at any time with loop.run_until_complete, so sync and async Python code can be mixed comfortably.
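
A minimal sketch of the last two points, not the actual podcast-dl code: a synchronous main() starts the event loop with loop.run_until_complete, and a progress callback fires after every single finished download via asyncio.as_completed. The names fake_download, download_all and on_done are hypothetical, and the downloads are simulated with asyncio.sleep instead of real HTTP requests.

```python
import asyncio


async def fake_download(episode, delay):
    # Hypothetical stand-in for a real HTTP download: it only sleeps.
    await asyncio.sleep(delay)
    return episode


async def download_all(episodes, on_done):
    # Schedule every download, then report each one as soon as it finishes.
    tasks = [
        asyncio.ensure_future(fake_download(name, delay))
        for name, delay in episodes
    ]
    finished = []
    for future in asyncio.as_completed(tasks):
        finished.append(await future)
        on_done(len(finished), len(tasks))  # step the progress bar by one
    return finished


def main():
    # Synchronous entry point: only this function touches the event loop.
    episodes = [("ep1", 0.03), ("ep2", 0.01), ("ep3", 0.02)]
    progress = []

    def on_done(done, total):
        progress.append((done, total))

    loop = asyncio.new_event_loop()
    try:
        finished = loop.run_until_complete(download_all(episodes, on_done))
    finally:
        loop.close()
    return finished, progress


if __name__ == "__main__":
    finished, progress = main()
    print(finished, progress)
```

Everything outside download_all stays ordinary synchronous code, which is why only the download functions need to become coroutines.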

Results for download speed measurements

  • Async without grouper:
    podcast-dl -v talkpython -d talkpython-async 79,11s user 37,11s system 55% cpu 3:30,37 total (1 failed download)

  • Async with grouper (groups of 10):
    podcast-dl -v talkpython -d talkpython-async2 87,30s user 42,23s system 44% cpu 4:52,98 total

  • Async with semaphore (limit of 10):
    podcast-dl -v talkpython 76,68s user 36,38s system 50% cpu 3:45,16 total
    podcast-dl -p talkpython 79,17s user 38,73s system 53% cpu 3:39,14 total
    podcast-dl -p talkpython 79,63s user 38,53s system 53% cpu 3:42,92 total

  • Sync with ThreadPoolExecutor (10 threads):
    podcast-dl -v talkpython -d talkpython-sync 59,53s user 46,14s system 31% cpu 5:32,59 total
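
One plausible reason the grouper variant is slower than the semaphore variant: with fixed groups, the next group starts only after the slowest download of the current group has finished, while a semaphore hands a free slot to the next download immediately. The sketch below illustrates the difference under simplifying assumptions; run_grouped, run_with_semaphore and timed are hypothetical names, the grouping is done with plain slicing, and downloads are simulated with asyncio.sleep.

```python
import asyncio
import time


async def fake_download(delay):
    # Hypothetical stand-in for a real download: it only sleeps.
    await asyncio.sleep(delay)


async def run_grouped(delays, size):
    # Fixed batches: the next batch waits for the slowest item of this one.
    for start in range(0, len(delays), size):
        batch = delays[start:start + size]
        await asyncio.gather(*(fake_download(d) for d in batch))


async def run_with_semaphore(delays, limit):
    # At most `limit` downloads in flight; a freed slot is reused right away.
    semaphore = asyncio.Semaphore(limit)

    async def guarded(delay):
        async with semaphore:
            await fake_download(delay)

    await asyncio.gather(*(guarded(d) for d in delays))


def timed(coro):
    # Run a coroutine from synchronous code and return the elapsed seconds.
    loop = asyncio.new_event_loop()
    start = time.perf_counter()
    try:
        loop.run_until_complete(coro)
    finally:
        loop.close()
    return time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.1, 0.01, 0.1, 0.01]  # alternating slow and fast downloads
    print("grouped:  ", timed(run_grouped(delays, 2)))
    print("semaphore:", timed(run_with_semaphore(delays, 2)))
```

With these simulated delays the grouped run has to wait for two slow downloads one after the other, whereas the semaphore run overlaps them.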