🤖 Pepper Robot Management System

A Cost-Effective Cloud-Offloading Architecture for Human-Robot Interaction
Integrating LLMs and Computer Vision on Resource-Constrained Humanoid Robots

Author: Gilang Hidayatullah


Overview

Pepper robots are resource-constrained humanoid platforms with limited onboard compute. This project works around that constraint with a cloud-offloading architecture: heavy AI workloads (speech recognition, language understanding, and face recognition) are offloaded to Google Cloud services, while the robot itself handles physical interaction and I/O. The result is a full-stack system that supports real-time conversation, face-based identity recognition, and programmable movement at a fraction of the cost of processing everything onboard.

The system is organized into three independently deployable microservices:

Service               Folder              Purpose
Backend Management    backend/            REST API, auth, robot control, admin UI
AI Conversation       ai-chat/            Voice-based dialogue via Gemini + GCP STT/TTS
Face Recognition      face-recognition/   Real-time face recognition backed by a cloud-synced identity database

Architecture

Cloud Logic Architecture

                          ┌─────────────────────────────────────┐
                          │        Google Cloud Platform        │
                          │  ┌──────────┐  ┌──────────────────┐ │
                          │  │  Vertex  │  │  Cloud Storage   │ │
                          │  │  AI      │  │  (face database) │ │
                          │  │ (Gemini) │  └──────────────────┘ │
                          │  └──────────┘  ┌──────────────────┐ │
                          │  ┌──────────┐  │  STT / TTS APIs  │ │
                          │  │Cloud Run │  └──────────────────┘ │
                          │  └──────────┘                       │
                          └────────────┬────────────────────────┘
                                       │ REST / gRPC
               ┌───────────────────────┼───────────────────────┐
               │                       │                       │
     ┌─────────▼─────────┐   ┌─────────▼─────────┐   ┌─────────▼─────────┐
     │      backend      │   │      ai-chat      │   │ face-recognition  │
     │     Flask API     │   │  Flask + Docker   │   │ Flask + DeepFace  │
     └─────────┬─────────┘   └─────────┬─────────┘   └─────────┬─────────┘
               │                       │                       │
               └───────────────────────┼───────────────────────┘
                                       │ SSH / NAOqi SDK
                                 ┌─────▼─────┐
                                 │   Pepper  │
                                 │   Robot   │
                                 └───────────┘

Services

1. Backend Management System (backend/)

A Flask-based REST API that serves as the central control plane for the robot:

  • JWT-authenticated user management and role-based access control
  • Robot movement and choreography sequence management (walk, dance patterns)
  • SSH-based command dispatch to the Pepper robot
  • Face identity database management (linked to GCS)
  • AI conversation session orchestration
  • Web-based admin dashboard (Bootstrap, HTML/CSS/JS)
  • Multi-language support (Indonesian / English)
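This README does not show how the backend implements its JWT authentication and role checks. As a rough, stdlib-only illustration of what HS256 token signing plus a role gate involve, here is a sketch; the claim names (`sub`, `role`, `exp`), the role names, and the `ROLE_LEVEL` hierarchy are hypothetical, and the actual backend may well rely on a library instead.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw: bytes) -> str:
    # JWTs use base64url without "=" padding (RFC 7515)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url_encode stripped
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: str, signing_input: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def issue_token(secret: str, user_id: str, role: str, ttl_s: int = 3600) -> str:
    """Create an HS256-signed JWT carrying the user id, role, and expiry."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "role": role, "exp": int(time.time()) + ttl_s}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url_encode(_sign(secret, signing_input))}"


def verify_token(secret: str, token: str) -> dict:
    """Return the payload if algorithm, signature, and expiry check out; else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")  # never trust a token-chosen alg
    expected = _sign(secret, f"{header_b64}.{payload_b64}")
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return payload


# Hypothetical role hierarchy: a higher level grants more privileges
ROLE_LEVEL = {"viewer": 0, "operator": 1, "admin": 2}


def has_role(payload: dict, required: str) -> bool:
    """Role-based access check: does the token's role meet the required level?"""
    return ROLE_LEVEL.get(payload.get("role"), -1) >= ROLE_LEVEL[required]
```

In a Flask app, `verify_token` and `has_role` would typically sit inside a route decorator that reads the `Authorization: Bearer <token>` header. For production code, a maintained JWT library is the safer choice than hand-rolled signing.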

Stack: Python 3.11, Flask, SQLAlchemy (SQLite), JWT, Google Cloud Storage
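For the SSH-based command dispatch listed above, a minimal sketch of assembling and running one remote command with the system `ssh` client follows. The default user `nao`, the chosen `ssh` options, and the function names are assumptions for illustration; this README does not say which SSH transport the backend actually uses.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_ssh_command(host: str, remote_cmd: Sequence[str], user: str = "nao",
                      port: int = 22, timeout_s: int = 5) -> List[str]:
    """Build an argv list that runs one command on the robot over SSH.

    Each remote argument is shell-quoted because the remote side re-parses
    the command string with its own shell.
    """
    remote = " ".join(shlex.quote(arg) for arg in remote_cmd)
    return [
        "ssh",
        "-p", str(port),
        "-o", "BatchMode=yes",               # fail instead of prompting for a password
        "-o", f"ConnectTimeout={timeout_s}",
        f"{user}@{host}",
        remote,
    ]


def dispatch(host: str, remote_cmd: Sequence[str], **kwargs) -> subprocess.CompletedProcess:
    """Run the command and capture its output (requires key-based SSH access)."""
    argv = build_ssh_command(host, remote_cmd, **kwargs)
    return subprocess.run(argv, capture_output=True, text=True, timeout=30)
```

Passing an argv list (rather than one string with `shell=True`) keeps the local side free of shell parsing, so only the remote shell needs quoting. A call might look like `dispatch(os.environ["PEPPER_HOST"], ["echo", "hello"])`.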


2. AI Conversation Service (ai-chat/)

A voice-first dialogue service that gives Pepper natural language capabilities without onboard ML:

  • Captures audio input → Google Cloud Speech-to-Text for transcription
  • Sends transcript to Vertex AI (Gemini 2.0 Flash) for response generation
  • Synthesizes response audio via Google Cloud Text-to-Speech
  • Maintains per-session conversation history for multi-turn context
  • Containerized and deployable to Cloud Run
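Per-session history can be kept as a bounded window of turns keyed by a session ID, so each Gemini request carries recent context without the prompt growing without limit. The in-memory sketch below is illustrative only; `SessionStore`, the 10-turn default, and the use of Gemini's `"user"`/`"model"` role labels are assumptions rather than the service's actual code.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class Turn:
    role: str  # "user" or "model" (Gemini's chat role names)
    text: str


@dataclass
class SessionStore:
    """In-memory conversation history per session, capped at max_turns."""
    max_turns: int = 10
    _sessions: Dict[str, Deque[Turn]] = field(default_factory=dict, init=False, repr=False)

    def append(self, session_id: str, role: str, text: str) -> None:
        # deque(maxlen=...) silently drops the oldest turn once the cap is hit
        history = self._sessions.setdefault(session_id, deque(maxlen=self.max_turns))
        history.append(Turn(role, text))

    def context(self, session_id: str) -> List[Turn]:
        """Turns to send with the next prompt, oldest first."""
        return list(self._sessions.get(session_id, ()))

    def reset(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)
```

An in-memory store is lost on restart and is not shared between Cloud Run instances, so a deployment that scales beyond one instance would need external session storage.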

Stack: Flask, Vertex AI (Gemini 2.0 Flash), Google Cloud STT/TTS, Docker

[Figure: Voice Processing Flow (assets/voice-processing.png)]
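The voice flow described above (Speech-to-Text, then Gemini, then Text-to-Speech) can be expressed as a single turn handler. In this sketch the three cloud calls are injected as plain callables so the orchestration can be read and tested without credentials; the function name, the adapter signatures, and the empty-transcript short-circuit are illustrative assumptions, not the service's actual interface.

```python
from typing import Callable, List, Tuple

# Hypothetical adapter signatures; real implementations would wrap the
# Google Cloud Speech-to-Text, Vertex AI (Gemini), and Text-to-Speech clients.
Transcribe = Callable[[bytes], str]
Generate = Callable[[List[Tuple[str, str]], str], str]
Synthesize = Callable[[str], bytes]


def handle_utterance(audio: bytes, history: List[Tuple[str, str]],
                     transcribe: Transcribe, generate: Generate,
                     synthesize: Synthesize) -> Tuple[str, bytes]:
    """One conversational turn: audio in -> transcript -> reply text -> reply audio."""
    transcript = transcribe(audio).strip()
    if not transcript:
        # Nothing recognized: skip the LLM and TTS calls entirely
        return "", b""
    reply = generate(history, transcript)
    # Record both sides so the next turn has multi-turn context
    history.append(("user", transcript))
    history.append(("model", reply))
    return reply, synthesize(reply)
```

Keeping the cloud clients behind small callables also makes it straightforward to swap models or voices without touching the turn logic.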


3. Face Recognition Service (face-recognition/)

A real-time face recognition service backed by cloud-synced identity storage:

  • Detects and identifies faces from live camera frames using DeepFace + VGGFace
  • Stores and retrieves identity embeddings via Google Cloud Storage (no local DB required)
  • Auto-syncs the local face cache with GCS on startup
  • Exposes a REST API for robot integration
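Face recognition with DeepFace works by mapping each face to an embedding vector and comparing vectors by distance. The sketch below isolates that matching step as a nearest-neighbour search by cosine distance over a gallery of known embeddings. The gallery layout (one embedding per name), the function names, and the `threshold` value are assumptions; this README does not state which distance metric or threshold the service uses, and suitable thresholds depend on the model and metric.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0.0 for vectors pointing the same way."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def identify(probe: Sequence[float], gallery: Dict[str, Sequence[float]],
             threshold: float) -> Tuple[Optional[str], float]:
    """Return (name, distance) of the closest known face, or (None, distance)
    when even the best match is beyond the threshold."""
    best_name, best_dist = None, math.inf
    for name, embedding in gallery.items():
        dist = cosine_distance(probe, embedding)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return None, best_dist
    return best_name, best_dist
```

Returning `None` for weak matches lets the robot treat a person as unknown instead of greeting them by the wrong name.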

Stack: Flask, DeepFace, OpenCV, VGGFace, Google Cloud Storage

[Figure: Face Processing Flow (assets/face-processing.png)]


Project Structure

pepper-robots/
├── backend/                         # Backend management system
│   ├── app/
│   │   ├── controller/              # API route handlers
│   │   ├── model/                   # SQLAlchemy database models
│   │   ├── services/                # Business logic
│   │   ├── templates/               # Jinja2 web UI templates
│   │   ├── static/                  # CSS, JS, images
│   │   └── utils/                   # Shared utilities
│   ├── tests/                       # Unit and integration tests
│   └── docs/                        # API documentation
│
├── ai-chat/                         # AI conversation microservice
│   ├── app.py                       # Flask application entry point
│   ├── Dockerfile                   # Container build config
│   └── requirements.txt
│
├── face-recognition/                # Face recognition microservice
│   ├── app.py                       # Flask application entry point
│   ├── gcs_handler.py               # Google Cloud Storage integration
│   └── pepper_client.py             # Pepper robot client library
│
├── assets/                          # Architecture & flow diagrams
│   ├── cloud-logic.png
│   ├── face-processing.png
│   └── voice-processing.png
│
├── .gitignore
├── AUTHORS
├── CITATION.cff
├── CONTRIBUTING.md
├── LICENSE                          # Apache 2.0
└── SECURITY.md

Getting Started

Prerequisites

  • Python 3.11+
  • A Google Cloud Platform project with these APIs enabled:
    • Vertex AI API
    • Cloud Speech-to-Text API
    • Cloud Text-to-Speech API
    • Cloud Storage API
  • A GCP service account key (JSON) with appropriate permissions
  • Docker (recommended for running the AI conversation service)
  • Access to a Pepper robot (optional — core services run without it)

1. Backend Management System

cd backend
python -m venv venv
source venv/bin/activate          # Linux/macOS
# venv\Scripts\activate           # Windows

pip install -r requirements.txt
cp .env.example .env              # Fill in your credentials
python run.py

Key .env variables:

SECRET_KEY=your_flask_secret
GCS_BUCKET_NAME=your_bucket
GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
PEPPER_HOST=192.168.x.x          # Robot IP (optional)
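One way to use these variables is to validate them once at startup, so a missing credential fails immediately instead of on the first cloud call. This sketch assumes the `.env` values have already been loaded into the process environment (for example by a loader such as python-dotenv, which this README does not mention); `BackendConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BackendConfig:
    secret_key: str
    gcs_bucket_name: str
    credentials_path: str
    pepper_host: Optional[str]  # None when running without a robot


def load_config(env: Mapping[str, str] = os.environ) -> BackendConfig:
    """Read the .env-style variables, failing fast if a required one is missing."""
    required = ("SECRET_KEY", "GCS_BUCKET_NAME", "GOOGLE_APPLICATION_CREDENTIALS")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return BackendConfig(
        secret_key=env["SECRET_KEY"],
        gcs_bucket_name=env["GCS_BUCKET_NAME"],
        credentials_path=env["GOOGLE_APPLICATION_CREDENTIALS"],
        # Optional: the services can run without a physical robot attached
        pepper_host=env.get("PEPPER_HOST") or None,
    )
```

Google Cloud client libraries read `GOOGLE_APPLICATION_CREDENTIALS` on their own through Application Default Credentials; checking it here only surfaces a misconfiguration earlier.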

2. AI Conversation Service

With Docker (recommended):

cd ai-chat
docker build -t pepper-ai .
docker run -p 5000:5000 \
  -e GOOGLE_APPLICATION_CREDENTIALS=/app/service-account.json \
  -v /path/to/service-account.json:/app/service-account.json \
  pepper-ai

Without Docker:

cd ai-chat
pip install -r requirements.txt
python app.py

3. Face Recognition Service

cd face-recognition
pip install -r requirements.txt
export GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
export GCS_BUCKET_NAME=your_bucket
python app.py

On startup, the service will sync the face database from GCS to a local cache automatically.
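A startup sync can be made incremental by comparing content hashes and fetching only what changed. The sketch below assumes that each face image is stored as a separate object, and that the bucket listing has already been fetched into a `{object_name: hex_md5}` dict; with the `google-cloud-storage` client, `Blob.md5_hash` is base64-encoded, so it would need converting (for example `base64.b64decode(blob.md5_hash).hex()`). The function names are hypothetical.

```python
import hashlib
import os
from typing import Dict, List, Tuple


def local_md5_index(cache_dir: str) -> Dict[str, str]:
    """Map relative file path (with "/" separators) -> hex MD5 for the local cache."""
    index = {}
    for root, _dirs, files in os.walk(cache_dir):
        for fname in files:
            path = os.path.join(root, fname)
            rel = os.path.relpath(path, cache_dir).replace(os.sep, "/")
            with open(path, "rb") as fh:
                index[rel] = hashlib.md5(fh.read()).hexdigest()
    return index


def plan_sync(remote: Dict[str, str], local: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare remote and local {name: md5} listings.

    Returns (to_download, to_delete): objects that are new or changed in the
    bucket, and cached files that no longer exist remotely.
    """
    to_download = sorted(n for n, h in remote.items() if local.get(n) != h)
    to_delete = sorted(n for n in local if n not in remote)
    return to_download, to_delete
```

Deleting stale files keeps the local cache from recognizing people who were removed from the central database.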


API Reference

Each service exposes its own REST API. For full endpoint documentation, refer to the README in each service's subdirectory (backend/, ai-chat/, face-recognition/); the backend also keeps API documentation in backend/docs/.


License

Licensed under the Apache License, Version 2.0.

Copyright 2026 Gilang Hidayatullah


Citation

If you use this work in academic research, please cite:

@software{hidayatullah2026pepper,
  author    = {Hidayatullah, Gilang},
  title     = {Pepper Robot Management System: A Cost-Effective Cloud-Offloading
               Architecture for Human-Robot Interaction},
  year      = {2026},
  url       = {https://github.com/hidatara-ds/pepper-robots}
}

Contributing

Contributions are welcome. Please read CONTRIBUTING.md before submitting a pull request.
