Web Scraping Tools

This is intended to grow into a library of useful data scraping tools scripted in Python.

Index

Images

Video and audio

These scripts are mainly based on the ffmpeg and youtube-dl libraries. Please note that youtube-dl can be very slow for downloading (~50 KB/s). The newer yt-dlp library is a fork of youtube-dl and features improved performance (~5 MiB/s downloads) and additional tools. More on these libraries in the links below.
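As a rough illustration of how yt-dlp and ffmpeg can work together, the sketch below downloads the audio track of a video and converts it to MP3. It assumes yt-dlp is installed (pip install yt-dlp) and ffmpeg is available on the PATH. The helper names build_audio_options and download_audio, the output template, and the choice of MP3 are illustrative assumptions, not taken from the scripts in this repo.

```python
from typing import Any, Dict, List


def build_audio_options(output_dir: str = ".") -> Dict[str, Any]:
    """Return yt-dlp options for an audio-only download (hypothetical helper)."""
    return {
        # Prefer the best audio-only stream, fall back to the best combined one.
        "format": "bestaudio/best",
        # Name files after the video title inside output_dir.
        "outtmpl": f"{output_dir}/%(title)s.%(ext)s",
        # Ask yt-dlp to call ffmpeg and convert the download to MP3.
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "mp3"},
        ],
    }


def download_audio(urls: List[str], output_dir: str = ".") -> None:
    """Download each URL as an MP3 file (hypothetical helper)."""
    # Imported lazily so this module still loads when yt-dlp is not installed.
    from yt_dlp import YoutubeDL

    with YoutubeDL(build_audio_options(output_dir)) as ydl:
        ydl.download(urls)
```

Calling download_audio(["<video URL>"], "downloads") would then save the converted files in a downloads folder. Keeping the options in a separate function makes them easy to inspect or adjust without touching the download call.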

Prerequisites

Requirements

  • git
    • You'll know you did it right if you can run git --version and you see a response like git version x.x.x
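The same check can be done from Python, which may be handy at the top of a setup script. This is only a sketch: the function name git_version is an assumption made for illustration and is not part of this repo.

```python
import shutil
import subprocess
from typing import Optional


def git_version() -> Optional[str]:
    """Return the output of `git --version`, or None if git is unavailable."""
    # shutil.which returns None when no git executable is found on the PATH.
    if shutil.which("git") is None:
        return None
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = git_version()
    print(version if version else "git not found; install it before cloning")
```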

Setup

Clone this repo

git clone https://github.com/VidiHawk/web-scraping-tools

cd web-scraping-tools

Then install dependencies

pip install -r requirements.txt

Adding your own tools

If you want to add packages to the requirements.txt file, I recommend using the pipreqs package. To install it:

pip install pipreqs

To generate your requirements.txt automatically, run the following command in the project directory:

pipreqs . --force

The --force flag will overwrite the existing requirements.txt file.

Notes

These scripts were created and tested on Ubuntu 20.04.4 LTS with Python 3.8.10.

Acknowledgements