Web Scraping Tools

This is intended to grow into a library of useful data-scraping tools written in Python.

Index

Images

Instagram images, comments, and data scraping
Get photos from Google Maps based on coordinates
Search and download photos from Google Images
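The Google Maps tool's source is not part of this README. The sketch below is a hedged illustration of one way to turn coordinates into a photo request. It assumes Google's Street View Static API endpoint (`https://maps.googleapis.com/maps/api/streetview`) and a Google Maps Platform API key. The function name `streetview_url` is hypothetical, and the repository's script may work differently.

```python
from urllib.parse import urlencode

# Assumption: Google's Street View Static API endpoint; the repository's own
# Google Maps script is not shown here and may use a different approach.
STREETVIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"


def streetview_url(lat: float, lng: float, api_key: str,
                   size: str = "640x640", heading: int = 0) -> str:
    """Build a Street View Static API image URL for a latitude/longitude pair (hypothetical helper)."""
    # Reject coordinates outside the valid latitude/longitude ranges
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lng <= 180.0:
        raise ValueError(f"longitude out of range: {lng}")
    params = {
        "size": size,                # image dimensions in pixels, "WIDTHxHEIGHT"
        "location": f"{lat},{lng}",  # where the photo should be taken
        "heading": heading,          # camera compass direction in degrees (0-360)
        "key": api_key,              # your Google Maps Platform API key
    }
    # urlencode percent-encodes the comma in "lat,lng" as %2C
    return f"{STREETVIEW_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder key; a real request needs a valid Google Maps Platform key
    print(streetview_url(48.8584, 2.2945, api_key="YOUR_API_KEY"))
```

The returned URL can be fetched with any HTTP client and the response bytes saved as an image file.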

Video and audio

Convert video files to audio files
Download videos from YouTube, Twitch, Vimeo, etc.
Download audio files from YouTube

These scripts are mainly based on the ffmpeg and youtube-dl libraries. Please note that youtube-dl downloads can be very slow (~50 kb/s). The newer yt-dlp library is a fork of youtube-dl with improved performance (~5 MiB/s downloads) and additional tools. More on these libraries in the Acknowledgements section below.
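The scripts themselves are not reproduced here. The sketch below is an illustration of how yt-dlp and ffmpeg are commonly driven from Python. The audio download uses yt-dlp's `YoutubeDL` class with its `FFmpegExtractAudio` post-processor, which calls ffmpeg to extract the audio track. The local conversion runs the ffmpeg command-line program with `-vn` (drop the video stream). All function names (`audio_download_options`, `download_audio`, `video_to_audio_cmd`, `convert_video_to_audio`) are hypothetical, and the repository's scripts may use different options.

```python
import os
import shutil
import subprocess
from typing import Any, Dict, List


def audio_download_options(output_dir: str = ".", codec: str = "mp3") -> Dict[str, Any]:
    """yt-dlp options that keep only the audio track of a downloaded video."""
    return {
        "format": "bestaudio/best",  # best audio-only stream, falling back to best overall
        "outtmpl": os.path.join(output_dir, "%(title)s.%(ext)s"),  # output file name template
        "postprocessors": [{
            "key": "FFmpegExtractAudio",  # post-processor that calls ffmpeg to extract audio
            "preferredcodec": codec,
        }],
    }


def download_audio(url: str, output_dir: str = ".", codec: str = "mp3") -> None:
    """Download a video's audio with yt-dlp (requires yt-dlp and ffmpeg to be installed)."""
    import yt_dlp  # imported here so the helpers above work without yt-dlp installed

    with yt_dlp.YoutubeDL(audio_download_options(output_dir, codec)) as ydl:
        ydl.download([url])


def video_to_audio_cmd(src: str, dst: str) -> List[str]:
    """ffmpeg arguments that drop the video stream; the output format follows dst's extension."""
    # -y overwrites dst if it already exists, -vn disables video in the output
    return ["ffmpeg", "-y", "-i", src, "-vn", dst]


def convert_video_to_audio(src: str, dst: str) -> None:
    """Convert a local video file to an audio file by running ffmpeg."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(video_to_audio_cmd(src, dst), check=True)
```

For example, `convert_video_to_audio("clip.mp4", "clip.mp3")` writes an MP3 next to the video. youtube-dl exposes a `YoutubeDL` class with the same options, so replacing `import yt_dlp` with `import youtube_dl` should also work for this subset. The speed difference noted above is one reason to prefer yt-dlp.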

Prerequisites

Python
pip

Requirements

git
- You'll know it is installed correctly if running git --version prints a response like git version x.x.x
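The git --version check above can be extended into a small Python script that reports every tool the scripts depend on. This is a hedged sketch. The function name `check_prerequisites` is hypothetical. Using Python 3.8 as the minimum version is an assumption based on the tested Python 3.8.10, not a documented requirement. ffmpeg is included because the video and audio scripts rely on it.

```python
import importlib.util
import shutil
import sys
from typing import Dict, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 8)) -> Dict[str, bool]:
    """Report whether Python, pip, git and ffmpeg are available on this machine."""
    return {
        # Assumption: 3.8 as a floor, since the scripts were tested on Python 3.8.10
        "python": sys.version_info[:2] >= min_python,
        # pip is importable as a module when it is installed for this interpreter
        "pip": importlib.util.find_spec("pip") is not None,
        # git and ffmpeg are command-line programs, so look them up on PATH
        "git": shutil.which("git") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool:<8}{'ok' if found else 'missing'}")
```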

Setup

Clone this repo

git clone https://github.com/VidiHawk/web-scraping-tools

cd web-scraping-tools

Then install dependencies

pip install -r requirements.txt

Adding your own tools

If you want to add packages to the requirements.txt file, I recommend using the pipreqs package. To install it:

pip install pipreqs

To automatically generate your requirements.txt, run the following command in the project directory:

pipreqs . --force

The --force flag will overwrite the existing requirements.txt file.

Notes

These scripts have been created and tested on the Ubuntu 20.04.4 LTS operating system with Python 3.8.10.

Acknowledgements

youtube-dl
yt-dlp
ffmpeg
