This project is a web scraping application designed to scrape and store data efficiently using Docker, Redis, and MySQL. It uses Puppeteer for web scraping tasks and integrates with a WebSocket server to provide real-time notifications and notification counts upon connection.
- Web scraping using Puppeteer.
- Real-time notifications via WebSocket.
- Efficient data storage with MySQL.
- Redis for fast in-memory operations.
- Fully containerized setup with Docker and Docker Compose.
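To orient new readers, the scraping step might look roughly like the sketch below. This is an illustrative sketch only, not the project's actual code: the function names, the CSS selector, and the row shape are all assumptions.

```javascript
// Pure helper: keep only rows that have both a title and a link.
// (Hypothetical shape { title, url } -- an assumption, not the project's schema.)
function parseArticles(rows) {
  return rows
    .filter((r) => r.title && r.url)
    .map((r) => ({ title: r.title.trim(), url: r.url }));
}

// Hypothetical Puppeteer flow; the selector is a placeholder to adjust
// to the target site's markup.
async function scrapeTitles(pageUrl) {
  // Lazy require so parseArticles stays usable without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(pageUrl, { waitUntil: 'domcontentloaded' });
    const rows = await page.$$eval('.titleline > a', (links) =>
      links.map((a) => ({ title: a.textContent, url: a.href }))
    );
    return parseArticles(rows);
  } finally {
    await browser.close();
  }
}
```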
- Docker and Docker Compose installed.
- Node.js and npm (if running locally without Docker).
```
git clone https://github.com/navaljangir/newsWebScrapAndNotification.git
cd newsWebScrapAndNotification
cp .env.example .env
```
Update the `.env` file based on your environment:
For Localhost:

```
# MySQL Configuration
DB_HOST=localhost
DB_USER=root
DB_PASSWORD=your_db_password
DB_NAME=hacker_news
PORT=3000
REDIS_URL="redis://localhost:6379"
```
For Docker:

```
# MySQL Configuration
DB_HOST=mysql_scrap
DB_USER=root
DB_PASSWORD=your_db_password
DB_NAME=hacker_news
PORT=3000
REDIS_URL="redis://redis_scrap:6379"
```
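These variables might be read in Node roughly as follows. This is a sketch under assumptions: the variable names match the `.env` above, but the defaults and the `loadConfig` helper are illustrative, not the project's actual code.

```javascript
// Sketch of reading the .env values shown above (defaults are illustrative).
// If the project loads .env via dotenv, it would call require('dotenv').config()
// before reading process.env.
function loadConfig(env = process.env) {
  return {
    db: {
      host: env.DB_HOST || 'localhost',
      user: env.DB_USER || 'root',
      password: env.DB_PASSWORD || '',
      database: env.DB_NAME || 'hacker_news',
    },
    port: Number(env.PORT || 3000),
    redisUrl: env.REDIS_URL || 'redis://localhost:6379',
  };
}
```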
- Using Docker:

  ```
  docker-compose up
  ```

- Running Locally:

  Ensure MySQL and Redis are running on your local system (or start a Redis Docker container), then install dependencies and start the application:

  ```
  npm install
  npm run dev
  ```
Open your browser and navigate to `http://localhost:3000`.

The application connects to a WebSocket server at `ws://localhost:3000` for real-time notifications. Upon initial connection, the server sends the current notification count and continues to push updates as new notifications arrive.
Establish a connection to the WebSocket server:

```
const ws = new WebSocket('ws://localhost:3000');
```
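Incoming messages could then be handled along these lines. The message shapes below are assumptions (the README only states that the server sends a notification count on connect and pushes subsequent updates); the actual payload format may differ.

```javascript
// Hypothetical message shapes (not confirmed by the project):
//   { type: 'count', count: 5 }               -- sent on connect
//   { type: 'notification', title: 'Post' }   -- pushed for each new item
function handleMessage(state, raw) {
  const msg = JSON.parse(raw);
  if (msg.type === 'count') {
    // Initial count pushed by the server on connection.
    return { ...state, count: msg.count };
  }
  if (msg.type === 'notification') {
    // Each new notification bumps the count.
    return { ...state, count: state.count + 1, latest: msg.title };
  }
  return state; // ignore unknown message types
}

// Wiring it to the connection above:
// let state = { count: 0, latest: null };
// ws.onmessage = (event) => { state = handleMessage(state, event.data); };
```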
Ensure the Docker daemon is running before executing the commands below.
- Clone the repository and configure the `.env` file as described above.
- Run the following command to start all services:

  ```
  docker-compose up
  ```

- To verify the containers are running, use:

  ```
  docker ps
  ```

- To stop all services:

  ```
  docker-compose down
  ```
- Puppeteer Image: `nvlkishor/pptr:latest`
- Docker Hub Repository: `nvlkishor/pptr`
You can customize service names, ports, or credentials by editing the `.env` or `docker-compose.yml` files as needed.
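For orientation, a `docker-compose.yml` for this setup might look roughly like the sketch below. The service names are inferred from the Docker `.env` values above (`mysql_scrap`, `redis_scrap`), but the image tags for MySQL/Redis and the overall layout are assumptions; the file shipped with the repository is authoritative.

```yaml
# Illustrative sketch only -- the repository's docker-compose.yml is authoritative.
services:
  mysql_scrap:            # matches DB_HOST in the Docker .env above
    image: mysql:8
    environment:
      MYSQL_ROOT_PASSWORD: ${DB_PASSWORD}
      MYSQL_DATABASE: ${DB_NAME}
  redis_scrap:            # matches the REDIS_URL host above
    image: redis:7
  app:
    image: nvlkishor/pptr:latest
    env_file: .env
    ports:
      - "3000:3000"
    depends_on:
      - mysql_scrap
      - redis_scrap
```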
- Environment Variables: Ensure the `.env` file is configured correctly for smooth operation.
- Database Migrations: Ensure MySQL is properly set up and connected before scraping starts.
- Redis: Used for managing in-memory data efficiently.
Contributions are welcome! Feel free to submit issues or create pull requests to improve the project.