A comprehensive benchmark suite for testing Gemma3n language models (2.6B and 4.5B parameters) using llama.cpp native executables. This benchmark focuses on Spanish administrative tasks and compares CPU vs GPU performance.
- Model Support: Benchmarks both Gemma3n 2.6B and 4.5B models
- Performance Comparison: CPU-only vs GPU-accelerated inference
- Administrative Focus: 8 Spanish administrative prompts covering government/office workflows
- Comprehensive Metrics: Tokens/second, response time, success rates, and GPU speedup ratios
- Multiple Output Formats: Detailed JSON results and CSV summaries
- System Information: Captures hardware and software configuration for reproducible results
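As a sketch of what the system-information capture implies (the exact fields recorded by the benchmark scripts may differ, and the real capture likely adds GPU details via psutil or `nvidia-smi`):

```python
import os
import platform

# Illustrative only: field names are assumptions, not the benchmark's schema.
def system_snapshot():
    """Collect a minimal, reproducibility-oriented hardware/software summary."""
    return {
        "platform": platform.platform(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
        "python_version": platform.python_version(),
    }
```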
- llama.cpp executable: `build/bin/llama-cli`
- Model files:
  - `models/gemma-3n-E2B-it-Q4_K_M.gguf` (2.6B model)
  - `models/gemma-3n-E4B-it-Q4_K_M.gguf` (4.5B model)
```bash
# Install minimal dependencies
pip install -r requirements.txt

# Or install manually
pip install pandas psutil

# All llama.cpp tool dependencies (optional)
pip install -r requirements/requirements-all.txt
```

- Python 3.8+
- Built llama.cpp with CLI support
- CUDA-compatible GPU (optional, for GPU benchmarks)
- Sufficient RAM for model loading (8GB+ recommended)
```bash
# Clone and navigate to the repository
cd gemma3n-research

# Install dependencies
pip install -r requirements.txt

# Ensure llama.cpp is built
# (Follow llama.cpp build instructions if needed)
```

```bash
# Run comprehensive benchmark (CPU + GPU)
./run_benchmark.sh

# Run CPU-only benchmark
./run_benchmark.sh --cpu-only

# Run GPU-only benchmark
./run_benchmark.sh --gpu-only

# Custom configuration
./run_benchmark.sh --threads 8 --gpu-layers 50
```

Results are saved in the `benchmark_results/` directory:

- `gemma3n_benchmark_YYYYMMDD_HHMMSS.json` - Detailed results
- `gemma3n_benchmark_summary_YYYYMMDD_HHMMSS.csv` - Summary table
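Because the filenames are timestamped, the most recent run can be located programmatically; a small helper, assuming the default output directory:

```python
import glob
import os

def latest_result(outdir="benchmark_results", pattern="gemma3n_benchmark_*.json"):
    """Return the path of the most recently modified results file, or None."""
    files = glob.glob(os.path.join(outdir, pattern))
    return max(files, key=os.path.getmtime) if files else None
```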
```
Options:
  -c, --cpu-only          Run CPU-only benchmark
  -g, --gpu-only          Run GPU-accelerated benchmark only
  -a, --all               Run comprehensive benchmark (CPU + GPU) [DEFAULT]
  -t, --threads <N>       Set number of CPU threads (default: 4)
  -l, --gpu-layers <N>    Set number of GPU layers (default: 99)
  -o, --output-dir <DIR>  Set output directory (default: benchmark_results)
  -h, --help              Show help message
```

```bash
python benchmark_gemma3n.py [options]
```
```
Options:
  --cpu-threads N     Number of CPU threads to use
  --gpu-layers N      Maximum GPU layers to use (default: 32)
  --output-dir DIR    Output directory for results (default: benchmark_results)
  --executable PATH   Path to llama-cli executable
```

The benchmark tests models on 8 Spanish administrative tasks:
- Remote Work Policy: Draft remote work guidelines for government agencies
- Budget Proposal: Create IT infrastructure budget summary for municipal offices
- Incident Report: Develop security incident report template
- Document Approval: Generate standard operating procedure for document workflows
- Compliance Memo: Write data protection compliance requirements memo
- Employee Onboarding: Create public administration employee onboarding instructions
- Performance Evaluation: Develop performance evaluation checklist for administrative staff
- Procurement Request: Create equipment and supplies procurement form
Detailed results include:
- System information (CPU, memory, GPU, etc.)
- Per-prompt results with response text, timing, and token counts
- Configuration details for each test run
- Success/failure status for each inference
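The detailed JSON can be post-processed directly; in this sketch the key names (`results`, `success`, `response_time`) are assumptions for illustration, so check an actual `gemma3n_benchmark_*.json` for the real schema:

```python
import json

def summarize(path):
    """Compute success rate and mean response time from a detailed results file."""
    with open(path) as f:
        data = json.load(f)
    runs = data["results"]  # assumed key: list of per-prompt records
    ok = [r for r in runs if r.get("success")]
    return {
        "success_rate": len(ok) / len(runs) if runs else 0.0,
        "avg_response_time_s": (
            sum(r["response_time"] for r in ok) / len(ok) if ok else 0.0
        ),
    }
```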
Summary table with:
- Model name and configuration
- Average tokens per second
- Average response time
- Total tokens generated
- Success rate
- GPU speedup ratios
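With pandas (already a dependency), the CSV summary can be sliced further. The column names below (`model`, `config`, `avg_tokens_per_sec`) are illustrative assumptions; adjust them to the header row of the generated file:

```python
import pandas as pd

def speedup_per_model(csv_path):
    """GPU-over-CPU throughput ratio per model from the summary CSV.

    Assumes a "config" column with "cpu"/"gpu" values; verify against
    the actual gemma3n_benchmark_summary_*.csv header.
    """
    df = pd.read_csv(csv_path)
    tps = df.pivot_table(index="model", columns="config",
                         values="avg_tokens_per_sec")
    return tps["gpu"] / tps["cpu"]
```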
- Tokens/Second: Inference speed measurement
- Response Time: Total time per prompt (seconds)
- Success Rate: Percentage of successful inferences
- GPU Speedup: Performance improvement ratio (GPU vs CPU)
- Total Tokens: Cumulative tokens generated across all prompts
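The relationships between these metrics are plain arithmetic; spelled out:

```python
def tokens_per_second(total_tokens, elapsed_s):
    """Inference speed: generated tokens divided by wall-clock seconds."""
    return total_tokens / elapsed_s

def gpu_speedup(gpu_tps, cpu_tps):
    """Ratio > 1.0 means the GPU run was faster than the CPU run."""
    return gpu_tps / cpu_tps

# Example: 150 tokens in 5.0 s (CPU) vs 2.0 s (GPU)
# -> 30.0 and 75.0 tok/s respectively, a 2.5x speedup
```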
- Quantization: Q4_K_M for both models
- Max Tokens: 150 per prompt
- Temperature: 0.7
- Top-p: 0.9
- Context Size: 2048 tokens
- Batch Size: 512
- CPU Threads: Configurable (default: auto-detect)
- GPU Layers: Configurable (default: 32-99 depending on script)
- Timeout: 120 seconds per inference
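These settings map onto llama-cli flags roughly as sketched below. Flag spellings vary between llama.cpp versions, so verify against `build/bin/llama-cli --help` before relying on them:

```python
import subprocess

def build_command(model_path, prompt, threads=4, gpu_layers=99):
    """Assemble a llama-cli invocation matching the configuration above."""
    return [
        "build/bin/llama-cli",
        "-m", model_path,
        "-p", prompt,
        "-n", "150",              # max tokens per prompt
        "--temp", "0.7",          # temperature
        "--top-p", "0.9",
        "-c", "2048",             # context size
        "-b", "512",              # batch size
        "-t", str(threads),       # CPU threads
        "-ngl", str(gpu_layers),  # GPU layers (0 for CPU-only)
    ]

def run_once(model_path, prompt, **kwargs):
    # 120 s timeout per inference, as configured above
    return subprocess.run(build_command(model_path, prompt, **kwargs),
                          capture_output=True, text=True, timeout=120)
```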
- llama-cli not found:
  - Ensure llama.cpp is built: `make -C llama.cpp`
  - Check executable path: `build/bin/llama-cli`
- Model files missing:
  - Download or convert models to GGUF format
  - Place in `models/` directory with expected filenames
- GPU not detected:
  - Verify CUDA installation
  - Check GPU layers setting
  - Ensure llama.cpp is built with CUDA support
- Memory issues:
  - Reduce GPU layers for large models
  - Ensure sufficient system RAM
  - Monitor memory usage during benchmarks
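Before a long run, the basic existence checks can also be scripted; a minimal preflight sketch using the file layout this repository expects:

```python
import os

EXPECTED = [
    "build/bin/llama-cli",
    "models/gemma-3n-E2B-it-Q4_K_M.gguf",
    "models/gemma-3n-E4B-it-Q4_K_M.gguf",
]

def preflight(paths=EXPECTED):
    """Return the list of missing files; an empty list means ready to benchmark."""
    return [p for p in paths if not os.path.exists(p)]
```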
```bash
# Check system prerequisites
python benchmark_gemma3n.py --help

# Test llama.cpp installation
build/bin/llama-cli --help

# Verify model files
ls -la models/
```

When contributing to this benchmark:
- Maintain focus on administrative/government use cases
- Test changes with both CPU and GPU configurations
- Ensure output format compatibility
- Update documentation for new features
This benchmark suite is part of the montevive research project. Please refer to the project's main license file for usage terms.