Effortlessly scrape the web using just a few keywords!
Scraped PDFs are saved to a separate folder, and the data stored in the MongoDB database can be exported as PDFs or JSON.
This guide walks you through setting up a web scraping environment using Scrapy and MongoDB on Ubuntu/Debian Linux systems. With just a few keywords, you'll be able to scrape the web and store the results in a MongoDB database.
Create and activate a Python virtual environment, then install the project dependencies:
python3 -m venv scraper
source scraper/bin/activate
pip install -r requirements.txt
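To confirm the environment is ready, you can check that Scrapy resolves inside the virtualenv:
scrapy version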
Next, add the official MongoDB 6.0 repository and install the server. Note that apt-key is deprecated on Ubuntu 22.04 and later; the repository line below targets focal (Ubuntu 20.04):
wget -qO - https://www.mongodb.org/static/pgp/server-6.0.asc | sudo apt-key add -
echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-6.0.list
sudo apt-get update
sudo apt-get install -y mongodb-org
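You can verify the server installed correctly before moving on:
mongod --version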
Create a data directory and start the MongoDB server, adjusting the path to your project layout:
mkdir -p ~/path/to/your/project/data/db
mongod --dbpath ~/path/to/your/project/data/db
If you are using WSL, you may want to bind to all interfaces so the server is reachable from Windows, specify a different port, or use a configuration file:
mongod --dbpath ~/data/db --bind_ip 0.0.0.0 --port 27017
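If you go the configuration-file route, a minimal sketch might look like the following (the file location, user name, and dbPath are placeholders; adjust them to your setup):

# mongod.conf (YAML) -- minimal example
storage:
  dbPath: /home/youruser/data/db   # directory must exist before mongod starts
net:
  bindIp: 0.0.0.0                  # reachable from Windows under WSL; restrict outside local development
  port: 27017

Then start the server against it:
mongod --config ~/mongod.conf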
Open a new terminal window and connect to the MongoDB server. MongoDB 6.0 ships with the mongosh shell (the legacy mongo shell is no longer included):
mongosh
If mongosh is not found, install it from the MongoDB repository added above:
sudo apt-get update
sudo apt-get install -y mongodb-mongosh
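Once connected, you can confirm the server is responding from inside the shell:
db.runCommand({ ping: 1 })  // should return { ok: 1 }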
Finally, run the crawler with your keywords:
cd webcrawler
scrapy crawl spider -a keywords="climate change" # replace with any keywords you want to scrape for
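After a crawl finishes, one way to export the stored results from MongoDB as JSON is mongoexport, which is installed alongside the mongodb-org packages above. The database and collection names here are placeholders; use the names configured in your project's Scrapy settings:
mongoexport --db=webcrawler --collection=items --jsonArray --out=results.json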