scraper

29c19f3 · Apr 1, 2020

Name	Last commit message	Last commit date
graphs	do not commit package-lock.json	Apr 1, 2020
README.md	add result of scrape	Jul 10, 2019
package.json	add scraper	Jul 10, 2019
scrape.js	add scraper	Jul 10, 2019

README.md

scraper

Simple tool to scrape all the data in dd.meteo.gc.ca.

On 2019-07-10, we indexed 15 million files, which have the following file extensions:

How to use

Requirements

Node for scraping
Redis for queuing and distributing work across multiple workers
CouchDB to index all the entries

Usage

To start scraping dd.meteo.gc.ca from its root '/', add an entry to the Redis queue:

    redis-cli -n 2 rpush url-0 /

Then start the scraper:

    COUCHDB_URL=http://username:password@localhost:5984 node scrape.js
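
This README does not describe what the scraper does with each queued path. As a rough sketch only, one plausible step is to fetch the directory page, push sub-directories back onto the queue, and index the files. The helper below is hypothetical, not the actual code in this repository, and assumes the server returns Apache-style index pages with relative links. The name parseListing and the sample URL are illustrative assumptions.

```javascript
// Hypothetical helper (not taken from this repository): split a directory
// listing page into sub-directories (to queue) and files (to index).
// Assumes an Apache-style index page where entries are relative links.
function parseListing(html, baseUrl) {
  const dirs = [];
  const files = [];
  const hrefPattern = /<a\s+[^>]*href="([^"]+)"/gi;
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    const href = match[1];
    // Skip column-sort links ("?C=N;O=D"), absolute or parent links,
    // and links that carry their own scheme (external URLs).
    const isSortLink = href.startsWith("?");
    const isAbsoluteOrParent = href.startsWith("/") || href.startsWith("..");
    const hasScheme = /^[a-z][a-z0-9+.-]*:/i.test(href);
    if (isSortLink || isAbsoluteOrParent || hasScheme) continue;

    // Resolve the relative link against the page it was found on.
    const path = new URL(href, baseUrl).pathname;
    if (href.endsWith("/")) {
      dirs.push(path); // would be pushed back onto the queue
    } else {
      files.push(path); // would be indexed
    }
  }
  return { dirs, files };
}

// Example with a made-up listing page.
const sample = [
  '<a href="?C=N;O=D">Name</a>',
  '<a href="/">Parent Directory</a>',
  '<a href="swob-ml/">swob-ml/</a>',
  '<a href="readme.txt">readme.txt</a>',
].join("\n");
console.log(parseListing(sample, "https://dd.meteo.gc.ca/observations/"));
```

Keeping the parsing step a pure function means it can be checked without a running Redis or CouchDB instance. The queueing and indexing calls are left out here because their exact form in scrape.js is not shown in this README.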
