# ROV Real-Time Object Detection

A comprehensive Remotely Operated Vehicle (ROV) system with real-time object detection, tracking, and control capabilities. This project integrates embedded systems (ESP8266, ESP32S3), computer vision (YOLOv8), and a modern web interface for complete ROV operation.
## Table of Contents

- [Overview](#overview)
- [System Architecture](#system-architecture)
- [Features](#features)
- [Project Structure](#project-structure)
- [Hardware Requirements](#hardware-requirements)
- [Software Requirements](#software-requirements)
- [Installation](#installation)
- [Configuration](#configuration)
- [Usage](#usage)
- [API Documentation](#api-documentation)
- [Troubleshooting](#troubleshooting)
- [Contributing](#contributing)
- [License](#license)
## Overview

This project implements a complete ROV control and monitoring system that combines:
- Embedded Control: ESP8266-based motor and servo control
- Video Streaming: ESP32S3 camera module for live video feed
- Object Detection: Real-time YOLOv8 inference with TensorRT acceleration
- Object Tracking: Multi-object tracking using Norfair with Kalman filtering
- Web Interface: React-based control dashboard with real-time visualization
- Data Logging: Automatic detection logging with session management
The system is designed for real-time operation with low latency, making it suitable for applications requiring immediate feedback and control.
## System Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                   React Frontend (Web UI)                    │
│   - Control Interface   - Detection Charts   - Camera Feed   │
└───────────────────────┬─────────────────────────────────────┘
                        │ HTTP/WebSocket
┌───────────────────────▼─────────────────────────────────────┐
│              FastAPI Backend (rov_backend.py)                │
│   - Command Routing   - WebSocket Bridge   - Log Management  │
└───────┬───────────────────────────────┬─────────────────────┘
        │                               │
        │ WebSocket                     │ HTTP
        │                               │
┌───────▼──────────┐         ┌──────────▼──────────────┐
│  ESP8266 Motor   │         │  ESP32S3 Camera Module  │
│    Controller    │         │  (Video Stream Server)  │
└──────────────────┘         └──────────┬──────────────┘
                                        │ MJPEG Stream
                             ┌──────────▼──────────────┐
                             │    Object Detection     │
                             │  (camera_detector.py)   │
                             │  - YOLOv8 TensorRT      │
                             │  - Norfair Tracking     │
                             │  - Detection Logging    │
                             └─────────────────────────┘
```
- Control Flow: User → React UI → FastAPI → ESP8266 → Motors/Servos
- Video Flow: ESP32S3 → MJPEG Stream → Object Detection → Annotated Video
- Data Flow: Object Detection → Log File → FastAPI → React UI (Charts)
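The command-routing leg of the control flow can be illustrated with a minimal FastAPI sketch. The `/move` route, the payload fields, and the per-command reconnect below are illustrative assumptions, not the exact implementation in `rov_backend.py`:

```python
# Minimal sketch of the HTTP -> WebSocket command bridge. Route name,
# payload fields, and per-command reconnect are illustrative assumptions;
# the real backend likely keeps a persistent socket per ROV.
import websockets
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
ESP8266_WS_URL = "ws://192.168.4.2:81"  # motor controller (see Configuration)


class MoveCommand(BaseModel):
    left: int       # left motor speed, -255 to 255
    right: int      # right motor speed, -255 to 255
    pan: int = 90   # pan servo angle, 0-180
    tilt: int = 90  # tilt servo angle, 0-180


@app.post("/move")  # hypothetical route name
async def move(cmd: MoveCommand):
    # Relay the command as JSON to the ESP8266 over its WebSocket (port 81)
    async with websockets.connect(ESP8266_WS_URL) as ws:
        await ws.send(cmd.model_dump_json())
    return {"ok": True}
```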
## Features

### ROV Control
- Real-time Joystick Control: 8-directional movement with adjustable speed
- Path Planning: Visual grid-based path planner with automatic execution
- Pan/Tilt Camera Control: Interactive control for camera positioning
- Movement Settings: Configurable forward/backward and turn speeds/durations
- Button Controls: Direct forward, backward, left, right, and stop commands

### Object Detection & Tracking
- Real-time Object Detection: YOLOv8 model with TensorRT acceleration
- Multi-Object Tracking: Persistent tracking across frames using Norfair
- Detection Logging: Automatic logging of detected objects with timestamps
- Session Management: Organize detections into measurement sessions
- Visualization: Pie charts showing detection statistics by object type
- Line Crossing Detection: Tracks objects crossing defined vertical boundaries (see the sketch after this feature list)

### Web Interface
- Draggable UI Cards: Customizable dashboard layout
- Live Camera Feed: MJPEG stream display with configurable URL
- Real-time Statistics: FPS, latency, and detection counts
- WebSocket Telemetry: Real-time status updates from the ROV
- Responsive Design: Works on desktop and mobile devices
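Line crossing reduces to comparing each track's position against the boundary between consecutive frames. A minimal sketch, assuming Norfair-style tracked objects that expose an `id` and a point `estimate` (the constant `LINE_X` and the function name are illustrative):

```python
# Minimal sketch of vertical line-crossing detection over tracked objects.
# Assumes Norfair-style objects exposing .id and a point .estimate; LINE_X
# and all names here are illustrative.
LINE_X = 320   # x coordinate of the vertical boundary, in pixels
last_x = {}    # track id -> x position in the previous frame


def update_crossings(tracked_objects):
    """Yield (track_id, direction) for each object that crossed LINE_X."""
    for obj in tracked_objects:
        x = float(obj.estimate[0][0])  # Norfair: estimate is an array of points
        prev = last_x.get(obj.id)
        if prev is not None:
            if prev < LINE_X <= x:
                yield obj.id, "left_to_right"
            elif prev >= LINE_X > x:
                yield obj.id, "right_to_left"
        last_x[obj.id] = x
```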
## Project Structure

```
ROV-Real-Time-Object-Detection/
│
├── ARDUINO/                        # Embedded firmware
│   ├── ESP8266/                    # Motor and servo controller
│   │   └── sketch_apr2a/
│   │       └── sketch_apr2a.ino    # Main control firmware
│   │
│   └── XIAO ESP32S3/               # Camera module
│       └── CameraWebServer/
│           ├── CameraWebServer.ino # Camera server firmware
│           ├── app_httpd.cpp       # HTTP server implementation
│           ├── camera_pins.h       # Camera pin definitions
│           └── partitions.csv      # ESP32 partition table
│
├── Object detection/               # Computer vision module
│   ├── camera_detector.py          # Main detection script
│   ├── yolo12n.engine              # TensorRT model (generated)
│   ├── detections_log.txt          # Detection log file
│   └── package.json                # Node dependencies (for charts)
│
├── REACT+API/                      # Web application
│   ├── rov_backend.py              # FastAPI backend server
│   └── rov_frontend/               # React frontend
│       ├── src/
│       │   ├── App.js              # Main application component
│       │   ├── DetectionPieChart.jsx # Detection visualization
│       │   ├── Animations/         # UI animation components
│       │   └── Backgrounds/        # Background effects
│       ├── public/                 # Static assets
│       └── package.json            # Frontend dependencies
│
└── LICENSE                         # GPL v3 License
```
For detailed information about each component, see:
- `ARDUINO/README.md` - Embedded firmware documentation
- `Object detection/README.md` - Detection system documentation
- `REACT+API/README.md` - Web application documentation
## Hardware Requirements

### ROV Unit
- ESP8266 Development Board (e.g., NodeMCU, Wemos D1 Mini)
- Motor Driver (L298N or similar)
- 2x DC Motors for movement
- 2x Servo Motors for pan/tilt camera mount
- Power Supply (7-12V for motors, 5V for ESP8266)

### Camera Module
- ESP32S3 Development Board (XIAO ESP32S3 or similar)
- Camera Module compatible with ESP32 (OV2640, OV3660, or OV5640)
- PSRAM (recommended for better performance)

### Ground Station
- Computer with:
  - NVIDIA GPU (for TensorRT acceleration)
  - CUDA Toolkit 11.0+
  - Python 3.8+
  - Node.js 16+ (for frontend)
## Software Requirements

### Python Dependencies
- Python 3.8 or higher
- OpenCV (cv2)
- Ultralytics YOLO
- TensorRT
- NumPy
- CuPy (for GPU acceleration)
- Numba
- Norfair (for object tracking)
- FastAPI
- WebSockets
- Uvicorn

### Frontend Dependencies
- Node.js 16+ and npm
- React 18+
- Material-UI (MUI)
- Recharts
- Axios

### Arduino Dependencies
- Arduino IDE 1.8+ or PlatformIO
- ESP8266 Board Support Package
- ESP32 Board Support Package
- Required Libraries:
  - WebSocketsServer (for ESP8266)
  - ArduinoJson
  - Servo
## Installation

### 1. Clone the Repository

```bash
git clone <repository-url>
cd ROV-Real-Time-Object-Detection
```

### 2. Install Python Dependencies

```bash
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install opencv-python ultralytics numpy cupy numba norfair fastapi websockets uvicorn
```

### 3. Install Frontend Dependencies

```bash
cd REACT+API/rov_frontend
npm install
```

### 4. Flash the Firmware

See `ARDUINO/README.md` for detailed instructions on flashing the ESP8266 and ESP32S3 firmware.
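Note that the TensorRT engine (`yolo12n.engine`) is generated rather than shipped with the repository. One way to produce it with the Ultralytics exporter, assuming source weights named `yolo12n.pt` and the 320-pixel input size used in the detection configuration:

```python
# One way to generate the TensorRT engine used by the detector (requires an
# NVIDIA GPU with TensorRT installed). The source weights "yolo12n.pt" and
# imgsz=320 are assumptions; match MODEL_PATH and MODEL_INPUT_SIZE in
# camera_detector.py.
from ultralytics import YOLO

model = YOLO("yolo12n.pt")
model.export(format="engine", imgsz=320)  # writes yolo12n.engine alongside
```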
## Configuration

### Network Settings

The system uses WiFi Access Point (AP) mode. Configure the following:

ESP32S3 Camera (Access Point):
- SSID: `ESP32-CAM` (default)
- Password: `123456789` (default)
- IP: `192.168.4.1` (default)

ESP8266 Motor Controller:
- Connects to the ESP32-CAM network
- Static IP: `192.168.4.2` (or `.3`, `.4`, `.5` for multiple ROVs)
- WebSocket Port: `81`
### Object Detection Settings

Edit `Object detection/camera_detector.py`:

```python
VIDEO_STREAM_SOURCE = "http://192.168.4.1:81/stream"  # Camera stream URL
MODEL_PATH = "yolo12n.engine"  # TensorRT model path
MODEL_INPUT_SIZE = 320         # Input image size
DISPLAY = True                 # Show video window
```
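For orientation, this is roughly how those settings come together in a detection and tracking loop. This is a minimal sketch, not the actual contents of `camera_detector.py`, which also handles logging, line crossing, and performance timing; the tracker parameters are illustrative assumptions:

```python
# Minimal sketch of the detection + tracking loop implied by the settings
# above; tracker parameters are illustrative assumptions.
import cv2
import numpy as np
from norfair import Detection, Tracker
from ultralytics import YOLO

VIDEO_STREAM_SOURCE = "http://192.168.4.1:81/stream"
MODEL_PATH = "yolo12n.engine"
MODEL_INPUT_SIZE = 320
DISPLAY = True

model = YOLO(MODEL_PATH)  # Ultralytics can load TensorRT .engine files
tracker = Tracker(distance_function="euclidean", distance_threshold=50)
cap = cv2.VideoCapture(VIDEO_STREAM_SOURCE)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model.predict(frame, imgsz=MODEL_INPUT_SIZE, verbose=False)[0]
    # Convert YOLO boxes to Norfair centroid detections
    detections = [
        Detection(points=np.array([[(x1 + x2) / 2, (y1 + y2) / 2]]))
        for x1, y1, x2, y2 in result.boxes.xyxy.cpu().numpy()
    ]
    tracked_objects = tracker.update(detections=detections)
    if DISPLAY:
        cv2.imshow("ROV Detection", result.plot())
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

cap.release()
cv2.destroyAllWindows()
```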
### Backend Settings

Edit `REACT+API/rov_backend.py`:

```python
CAR_IPS = ["192.168.4.2", "192.168.4.3", "192.168.4.4", "192.168.4.5"]  # ROV IPs
CAR_PORT = 81                         # WebSocket port
LOG_FILE_PATH = "detections_log.txt"  # Log file path
```

### Frontend Settings

Edit `REACT+API/rov_frontend/src/App.js`:
```javascript
const API_URL = 'http://localhost:8000'; // Backend API URL
```

## Usage

### Running the System

- Start the Backend Server:

```bash
cd REACT+API
python rov_backend.py
# Or with uvicorn:
uvicorn rov_backend:app --host 0.0.0.0 --port 8000
```

- Start the Frontend:

```bash
cd REACT+API/rov_frontend
npm start
```

- Start Object Detection:

```bash
cd "Object detection"
python camera_detector.py
```

- Access the Web Interface:
  - Open a browser to `http://localhost:3000`
  - The ROV controller interface will load
### Controlling the ROV

- Joystick Control: Use the joystick card to control movement in real-time
- Path Planning:
  - Click dots on the grid to create a path
  - Click "Start" to execute the path automatically
- Pan/Tilt: Drag the pointer in the pan/tilt box to adjust the camera angle
- Movement Settings: Adjust speed and duration sliders for fine control

### Monitoring Detections

- Detection Chart: View a pie chart of detected object types
- Session Management: Start new measurement sessions with labels
- Log Viewing: Detection logs are updated automatically in real-time
## API Documentation

### Movement Command

Send a movement command to the ROV.

Request Body:

```json
{
  "left": 150,    // Left motor speed (-255 to 255)
  "right": -150,  // Right motor speed (-255 to 255, typically inverted)
  "pan": 90,      // Pan angle (0-180)
  "tilt": 90      // Tilt angle (0-180)
}
```

Response:

```json
{
  "ok": true
}
```

### ROV WebSocket

Real-time bidirectional communication with the ROV.

Messages: JSON strings with status updates from the ROV.
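A minimal Python client sketch exercising both endpoints. The `/move` route and the `/ws` WebSocket path are assumptions for illustration; check `rov_backend.py` for the actual endpoint names:

```python
# Minimal client sketch for the backend API. The /move route and /ws
# WebSocket path are assumptions for illustration.
import asyncio

import requests
import websockets

API_URL = "http://localhost:8000"

# Send a single movement command over HTTP
resp = requests.post(
    f"{API_URL}/move",
    json={"left": 150, "right": -150, "pan": 90, "tilt": 90},
)
print(resp.json())  # expected: {"ok": True}


async def listen_telemetry():
    # Print status updates relayed from the ROV by the backend bridge
    async with websockets.connect("ws://localhost:8000/ws") as ws:
        async for message in ws:
            print("telemetry:", message)


asyncio.run(listen_telemetry())
```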
### Start Log Session

Start a new detection logging session.

Response:

```json
{
  "ok": true,
  "start_pos": 1234
}
```

### Get Log Entries

Get new log entries since the last session start.
Response:

```json
{
  "ok": true,
  "entries": "2024-01-01 12:00:00.123 | ID: 1 | class: person | x: 100 | y: 200\n..."
}
```
### End Log Session

End the current logging session.
### Start Measurement Session

Start a new measurement session with an optional label.

Request Body:

```json
{
  "label": "Test Run 1"
}
```

Response:

```json
{
  "ok": true,
  "session_id": "20240101120000"
}
```
## Troubleshooting

### Camera Not Streaming
- Verify the ESP32S3 is powered and connected
- Check the WiFi connection to the `ESP32-CAM` network
- Verify the camera stream URL in the detection script
- Check the camera module connections

### ROV Not Responding
- Verify the ESP8266 is connected to the WiFi network
- Check the WebSocket connection in the backend logs
- Verify the motor driver connections
- Check the power supply voltage

### Detection Not Working
- Verify an NVIDIA GPU and CUDA are installed
- Check that the TensorRT model file exists
- Verify the camera stream is accessible
- Check GPU memory availability

### Web Interface Not Connecting
- Verify the backend server is running on port 8000
- Check CORS settings in the backend
- Verify `API_URL` in the frontend code
- Check the browser console for errors
### Performance Optimization
- Reduce Model Input Size: Lower `MODEL_INPUT_SIZE` for faster inference
- Adjust Confidence Threshold: Modify the `conf` parameter in the YOLO predict call
- Disable Display: Set `DISPLAY = False` to reduce CPU usage
- Optimize Network: Use a wired connection for lower latency
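For reference, the first two knobs map onto an Ultralytics predict call roughly as follows; the values are examples only, and note that a TensorRT `.engine` is compiled for a fixed input size, so a smaller `imgsz` means re-exporting the engine:

```python
# Illustrative Ultralytics predict call showing where the tuning knobs land;
# values are examples only. A TensorRT .engine has its input size fixed at
# export time, so changing imgsz requires re-exporting the engine.
import numpy as np
from ultralytics import YOLO

model = YOLO("yolo12n.pt")  # or a re-exported, smaller-input .engine
frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in camera frame

results = model.predict(
    frame,
    imgsz=224,   # smaller input size -> faster inference
    conf=0.5,    # higher threshold -> fewer low-confidence detections
    verbose=False,
)
```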
## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
## Acknowledgments

- Ultralytics YOLO for object detection
- Norfair for object tracking
- FastAPI for the backend framework