A simple web crawler intended to crawl several research-paper publishing websites and extract relevant data. The data is extracted from manuscripts that are available online in PDF form. A query consists of keywords that are matched against the titles of the research articles. Each search result contains the author's name, affiliation, and email address.
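The matching and extraction described above could look roughly like the sketch below. It is not taken from the repository: the helper names `title_matches` and `extract_emails` are hypothetical, the rule that every keyword must appear in the title is an assumption, and the email pattern is a deliberately simple approximation.

```python
import re

# Simple, permissive pattern for email-like strings (an approximation,
# not a full RFC 5322 validator).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def title_matches(title, keywords):
    """Return True if every keyword occurs in the title, ignoring case.

    Assumption: all keywords must match; the project may use a looser rule.
    """
    lowered = title.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def extract_emails(text):
    """Return the unique email-like strings in text, in order of appearance."""
    found = []
    for match in EMAIL_RE.findall(text):
        if match not in found:
            found.append(match)
    return found
```

In practice, `text` would be the text extracted from a manuscript PDF, and author names and affiliations would need their own, more layout-dependent extraction step.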
Scrapy is a free and open-source web-crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler.

Selenium is an open-source browser automation tool. Its Python bindings send the same commands to different browsers, regardless of differences in how those browsers are built.

ChromeDriver is also required. Download the release that matches the version of your Chrome browser from https://chromedriver.chromium.org/downloads
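As a rough illustration of how Selenium and ChromeDriver fit together, the sketch below opens a page in headless Chrome and collects the text of elements that might hold article titles. It assumes Selenium 4 is installed. The `q` query parameter, the `h3` CSS selector, and the function names `build_search_url` and `fetch_titles` are hypothetical; each publisher site will need its own URL scheme and selectors.

```python
from urllib.parse import urlencode


def build_search_url(base_url, keywords):
    """Build a search URL with the keywords in a 'q' parameter (hypothetical scheme)."""
    return base_url + "?" + urlencode({"q": " ".join(keywords)})


def fetch_titles(url, chromedriver_path, title_selector="h3"):
    """Load url in headless Chrome and return the text of matching elements."""
    # Imported here so that build_search_url works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    # chromedriver_path must point to a ChromeDriver matching your Chrome version.
    driver = webdriver.Chrome(service=Service(chromedriver_path), options=options)
    try:
        driver.get(url)
        elements = driver.find_elements(By.CSS_SELECTOR, title_selector)
        return [element.text for element in elements]
    finally:
        driver.quit()  # always release the browser process
```

A call might look like `fetch_titles(build_search_url("https://example.org/search", ["web", "crawler"]), "/path/to/chromedriver")`, where both the site and the driver path are placeholders.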
typecaster/Chrome-Based-Web-Scraper