GitHub - wildercb/Internet-Interpreter: A wayback machine scraper, datastore and NLP evaluations for interpreting trends across sites stored within the wayback machine.

How to interact with this project

This project uses the Wayback Machine as a source of historical data from the internet. It provides a Wayback Machine scraper that gathers data from selected sites between two dates, along with a data pipeline for viewing reports on the state of the internet, or of those sites, based on the data collected.

Gather data from wayback machine

Scrape from wayback.py:

Use the Jupyter notebook, or run the script with Python, passing the arguments in this order:

scrapeFromWayback(url, datestamp from, datestamp to, output file name, elements_to_scrape, max requests)

Example:

python scrapeFromWayback.py reddit.com 20161122 2017 titles '[{"tag": "p", "class": "title", "id": null}, {"tag": "div", "class": "content"}, {"tag": "span"}]' 20
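The scraper's internals are not documented here, so the following is only a minimal sketch of how such a run could work, not the project's actual code. It assumes snapshots are listed through the public Wayback CDX API, and that a `null` or missing `class`/`id` in an element spec means "match any value". The names `build_cdx_query`, `snapshot_urls`, `spec_matches`, and `ElementScraper` are hypothetical and exist only for illustration.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
# Elements that never have a closing tag, so they are kept off the tag stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def build_cdx_query(url, date_from, date_to, limit):
    """Return a CDX API URL that lists snapshots of `url` in the date range."""
    params = {"url": url, "from": date_from, "to": date_to,
              "output": "json", "limit": limit}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_urls(cdx_json_text):
    """Turn a CDX JSON response (header row + records) into snapshot URLs."""
    rows = json.loads(cdx_json_text)
    if not rows:
        return []
    header, *records = rows
    ts, orig = header.index("timestamp"), header.index("original")
    return [f"https://web.archive.org/web/{r[ts]}/{r[orig]}" for r in records]


def spec_matches(spec, tag, attrs):
    """Assumption: a null or missing "class"/"id" places no constraint."""
    if spec.get("tag") != tag:
        return False
    wanted_class = spec.get("class")
    if wanted_class is not None:
        if wanted_class not in (attrs.get("class") or "").split():
            return False
    wanted_id = spec.get("id")
    return wanted_id is None or attrs.get("id") == wanted_id


class ElementScraper(HTMLParser):
    """Collect the text of every element matching one of the specs."""

    def __init__(self, specs):
        super().__init__()
        self.specs = specs
        self.stack = []      # currently open tags
        self.captures = []   # (stack depth, text parts) for open matches
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        if any(spec_matches(s, tag, dict(attrs)) for s in self.specs):
            self.captures.append((len(self.stack), []))

    def handle_data(self, data):
        for _, parts in self.captures:
            parts.append(data)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return
        while self.stack:
            depth = len(self.stack)
            closed = self.stack.pop()
            # Finish any capture that was opened at this depth.
            while self.captures and self.captures[-1][0] == depth:
                _, parts = self.captures.pop()
                text = " ".join("".join(parts).split())
                if text:
                    self.results.append(text)
            if closed == tag:
                break
```

In this sketch, the response body fetched from `build_cdx_query("reddit.com", "20161122", "2017", 20)` would be passed to `snapshot_urls`, and each snapshot's HTML would then be fed to an `ElementScraper` built from the parsed `elements_to_scrape` JSON.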

Analyze and report on the content

Clean Titles.ipynb

Data cleaning operations
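The notebook's exact cleaning steps are not listed here. As a rough illustration only, a cleaning pass over scraped titles might decode HTML entities, collapse whitespace, and drop empty or duplicate entries. The function name `clean_titles` and the choice of steps are assumptions, not taken from the notebook.

```python
import html
import re


def clean_titles(titles):
    """Return titles with entities decoded, whitespace collapsed,
    and empty or case-insensitive duplicate entries removed."""
    seen = set()
    cleaned = []
    for raw in titles:
        text = html.unescape(raw)                  # "&amp;" -> "&"
        text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
        key = text.casefold()                      # compare without regard to case
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned
```

Order is preserved, so the first occurrence of each title is the one that survives.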

Nlp.ipynb

Uses NLP to extract insights from the data and display them in readable graphs
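Which NLP techniques the notebook applies is not stated here. One simple way to surface trends, shown purely as a sketch, is to count terms per year from `(timestamp, title)` pairs, where the timestamp uses the Wayback `YYYYMMDDhhmmss` form. The function name `term_counts_by_year` and the small stopword list are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "for", "on", "is"}


def term_counts_by_year(records):
    """Map each year to a Counter of the non-stopword terms in its titles."""
    counts = defaultdict(Counter)
    for timestamp, title in records:
        year = timestamp[:4]  # Wayback timestamps start with the year
        words = re.findall(r"[a-z']+", title.lower())
        counts[year].update(w for w in words if w not in STOPWORDS)
    return dict(counts)
```

The per-year counters can then be passed to a plotting library of choice, for example by charting `most_common(n)` for each year.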

