A simple Web Crawler which is intended to crawl through several research paper publishing websites and extract relevant data.

typecaster/Chrome-Based-Web-Scraper

Chrome-Based-Web-Scraper

A simple web crawler intended to crawl several research paper publishing websites and extract relevant data. The data is extracted from manuscripts that are available online in PDF form. A query consists of keywords that are matched against the titles of the research articles. Each search result contains the author's name, affiliation, and email address.
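The matching and extraction steps described above could look roughly like the following sketch. It uses only the Python standard library and assumes the PDF text has already been converted to a plain string by some other tool. The function names `title_matches` and `extract_emails` are hypothetical, and requiring every keyword to appear in the title is an assumption, since the query semantics are not specified here.

```python
import re

# Simple email pattern; a deliberate approximation, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def title_matches(title, keywords):
    """Return True if every keyword occurs in the title (case-insensitive).

    Assumption: all keywords must match; an "any keyword" policy would
    replace all() with any().
    """
    lowered = title.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def extract_emails(text):
    """Return the unique email addresses found in text, in order of appearance."""
    seen = []
    for address in EMAIL_RE.findall(text):
        if address not in seen:
            seen.append(address)
    return seen
```

Author names and affiliations are much harder to pull out with a regular expression, because their layout differs between publishers, so this sketch covers only the title filter and the email field.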

Tools:

Scrapy:

Scrapy is a free and open-source web-crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler.

Selenium:

Selenium is an open-source browser automation tool. Its Python bindings send the same commands to different browsers through a common WebDriver interface, despite differences in how each browser is built.

Required:

ChromeDriver matching the version of your Chrome browser: https://chromedriver.chromium.org/downloads
