Faster page downloads

You can use concurrent.futures' ThreadPoolExecutor to significantly speed up page downloads. It uses a pool of threads to execute calls asynchronously 💪 Since downloading pages is I/O-bound, the threads spend most of their time waiting on the network, so running the requests concurrently gives a big win.
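
As a minimal sketch of the API (not the script from this post — the urls list is just illustrative), executor.map schedules one call per item and yields the results in input order:

import concurrent.futures

import requests

urls = ["https://pybit.es/", "https://peps.python.org/"]  # any list of URLs

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # one requests.get call per URL, executed across the pool's threads
    for url, resp in zip(urls, executor.map(requests.get, urls)):
        print(url, resp.status_code)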

Here I adapted the ThreadPoolExecutor example from the concurrent.futures module docs and added a little command-line flag to compare downloading 10 Pybites articles synchronously versus asynchronously.

Note: when making more requests, you probably want to limit the number of workers / threads for rate limiting (and to be respectful to the server you are hitting).
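
For example, a rough sketch of one way to throttle (polite_download, the worker count, and the delay are illustrative, not from this post): cap max_workers and pause inside each worker:

import time
from concurrent.futures import ThreadPoolExecutor

import requests

urls = ["https://pybit.es/"]  # whatever you are downloading

def polite_download(url):
    requests.get(url)
    time.sleep(0.5)  # crude throttle: each thread pauses before taking its next URL

with ThreadPoolExecutor(max_workers=5) as executor:  # fewer workers = gentler load
    list(executor.map(polite_download, urls))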

First pip install feedparser requests into a virtual environment, for example:
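
$ python -m venv venv && source venv/bin/activate
$ pip install feedparser requests

The code: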

import concurrent.futures
from pathlib import Path
import sys

import feedparser
import requests

URLS = [
    entry.link for entry in
    feedparser.parse("https://pybit.es/feed/").entries
]


def _download_page(url):
    resp = requests.get(url)
    # name the file after the last segment of the URL path
    with open(Path(url).stem, "wb") as f:
        f.write(resp.content)


def download_urls_sync(urls=URLS):
    for url in urls:
        _download_page(url)


def download_urls_async(urls=URLS):
    with concurrent.futures.ThreadPoolExecutor(
        max_workers=32
    ) as executor:
        future_to_url = {
            executor.submit(_download_page, url): url
            for url in urls
        }
        for future in concurrent.futures.as_completed(
            future_to_url
        ):
            # surface any exception raised in the worker thread
            future.result()


if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "async":
        download_urls_async()
    else:
        download_urls_sync()

Running this code locally I get an almost 4x performance improvement (6.416s down to 1.719s total, ~3.7x):

$ time python download.py async
python download.py async  0.31s user 0.08s system 22% cpu 1.719 total

$ time python download.py
python download.py  0.29s user 0.05s system 5% cpu 6.416 total

#threading #concurrent