You can use concurrent.futures' ThreadPoolExecutor
to significantly speed up page downloads. It uses a pool of threads to execute calls asynchronously 💪
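If you haven't used it before, here is a minimal sketch of the executor API (the square function is just a stand-in for real work):

import concurrent.futures

def square(n):
    return n * n

# map() runs the calls on the pool's threads and yields results in input order
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    print(list(executor.map(square, range(5))))  # [0, 1, 4, 9, 16]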
Here I adapted the ThreadPoolExecutor example from the concurrent.futures module docs and added a little command line flag to compare downloading 10 Pybites articles synchronously versus asynchronously.
Note: when making more requests you probably want to limit the number of workers/threads to rate limit yourself (and to be respectful to the server you are hitting).
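One minimal way to throttle, assuming the _download_page helper and URLS list from the full snippet below (the max_workers=5 and 0.5 second pause are illustrative values, not recommendations):

import time
from concurrent.futures import ThreadPoolExecutor

def _polite_download(url):
    _download_page(url)  # helper defined in the snippet below
    time.sleep(0.5)      # arbitrary pause so each thread spaces out its requests

# fewer workers means fewer simultaneous connections to the server
with ThreadPoolExecutor(max_workers=5) as executor:
    list(executor.map(_polite_download, URLS))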
The code (first pip install feedparser requests
into a virtual environment):
import concurrent.futures
from pathlib import Path
import sys

import feedparser
import requests

# grab the article links from the Pybites RSS feed
URLS = [
    entry.link for entry in
    feedparser.parse("https://pybit.es/feed/").entries
]


def _download_page(url):
    resp = requests.get(url)
    resp.raise_for_status()
    # use the last part of the URL (the article slug) as the filename
    with open(Path(url).stem, "wb") as f:
        f.write(resp.content)


def download_urls_sync(urls=URLS):
    # download the pages one after the other
    for url in urls:
        _download_page(url)


def download_urls_async(urls=URLS):
    with concurrent.futures.ThreadPoolExecutor(
        max_workers=32
    ) as executor:
        # schedule all downloads, mapping each future back to its URL
        future_to_url = {
            executor.submit(_download_page, url): url
            for url in urls
        }
        # process the futures in the order they finish
        for future in concurrent.futures.as_completed(
            future_to_url
        ):
            url = future_to_url[future]
            try:
                # result() re-raises any exception from the worker thread
                future.result()
            except Exception as exc:
                print(f"{url} generated an exception: {exc}")


if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "async":
        download_urls_async()
    else:
        download_urls_sync()
Running this code locally I get an almost 4x performance improvement (downloading is I/O bound, so the threads spend most of their time waiting on the network and that waiting can overlap):
$ time python download.py async
python download.py async 0.31s user 0.08s system 22% cpu 1.719 total
$ time python download.py
python download.py 0.29s user 0.05s system 5% cpu 6.416 total
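If you'd rather measure inside Python than with the shell's time command, a quick sketch (assuming the script above is saved as download.py):

from time import perf_counter

from download import download_urls_async, download_urls_sync

for func in (download_urls_sync, download_urls_async):
    start = perf_counter()
    func()
    print(f"{func.__name__} took {perf_counter() - start:.2f}s")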
#threading #concurrent