This repository holds web data mining examples that I implemented.
Currently, the covered cases are:
- Requesting a .csv using Python `requests` and scheduling it periodically using `cron` services (a minimal sketch follows this list)
- Consuming the League of Legends API (expiring token example)
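As a taste of the first case, here is a minimal sketch of downloading a .csv with `requests`; the URL and output filename are placeholders, not the ones the project actually uses:

```python
import requests

# Placeholder URL; the real project downloads its own dataset (see its README).
CSV_URL = "https://example.com/data.csv"

response = requests.get(CSV_URL, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors

# Write the raw bytes to disk.
with open("data.csv", "wb") as f:
    f.write(response.content)
```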
I'm currently working on providing examples for:
- Scraping a static page with `BeautifulSoup` (a small preview sketch follows this list)
- Orchestrating a whole-website scrape with `scrapy`
- Consuming hidden APIs by investigating network traffic
- Using a proxy to bypass IP restrictions
- Scraping dynamic webpages with Selenium
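As a small preview of the static-scraping case, a `BeautifulSoup` sketch could look like the following; the URL and the `<h2>` selector are hypothetical, not taken from the upcoming example:

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical target page; the finished example will document its own.
html = requests.get("https://example.com/articles", timeout=30).text
soup = BeautifulSoup(html, "html.parser")

# Print the text of every <h2> headline on the page.
for headline in soup.find_all("h2"):
    print(headline.get_text(strip=True))
```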
Each folder has a README.md file with instructions for its specific mining case.
To simplify things, a single file listing all projects' libraries is available at `requirements.txt`.
Before running a script, create a virtual environment, activate it, and install the requirements (this is only necessary once):

```bash
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
```
Then, enter the individual folder and read the further instructions in its README.md. For example, project 0 can be run with:

```bash
cd 0_schedule_database_download/
python script.py
```

However, it also uses crontab to schedule the script periodically; read its README for instructions on how to set that up.