montevive/gemma3n-benchmark

Gemma3n Models Benchmark

A comprehensive benchmark suite for testing Gemma3n language models (2.6B and 4.5B parameters) using llama.cpp native executables. This benchmark focuses on Spanish administrative tasks and compares CPU vs GPU performance.

Features

  • Model Support: Benchmarks both Gemma3n 2.6B and 4.5B models
  • Performance Comparison: CPU-only vs GPU-accelerated inference
  • Administrative Focus: 8 Spanish administrative prompts covering government/office workflows
  • Comprehensive Metrics: Tokens/second, response time, success rates, and GPU speedup ratios
  • Multiple Output Formats: Detailed JSON results and CSV summaries
  • System Information: Captures hardware and software configuration for reproducible results

Prerequisites

Required Files

  1. llama.cpp executable: build/bin/llama-cli
  2. Model files:
    • models/gemma-3n-E2B-it-Q4_K_M.gguf (2.6B model)
    • models/gemma-3n-E4B-it-Q4_K_M.gguf (4.5B model)

Dependencies

# Install minimal dependencies
pip install -r requirements.txt

# Or install manually
pip install pandas psutil

# All llama.cpp tool dependencies (optional)
pip install -r requirements/requirements-all.txt

System Requirements

  • Python 3.8+
  • Built llama.cpp with CLI support
  • CUDA-compatible GPU (optional, for GPU benchmarks)
  • Sufficient RAM for model loading (8GB+ recommended)

Quick Start

1. Setup

# Clone and navigate to the repository
git clone https://github.com/montevive/gemma3n-benchmark.git
cd gemma3n-benchmark

# Install dependencies
pip install -r requirements.txt

# Ensure llama.cpp is built
# (Follow llama.cpp build instructions if needed)

2. Run Benchmark

# Run comprehensive benchmark (CPU + GPU)
./run_benchmark.sh

# Run CPU-only benchmark
./run_benchmark.sh --cpu-only

# Run GPU-only benchmark
./run_benchmark.sh --gpu-only

# Custom configuration
./run_benchmark.sh --threads 8 --gpu-layers 50

3. View Results

Results are saved in the benchmark_results/ directory:

  • gemma3n_benchmark_YYYYMMDD_HHMMSS.json - Detailed results
  • gemma3n_benchmark_summary_YYYYMMDD_HHMMSS.csv - Summary table
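
Because the timestamp in each filename sorts lexicographically, the most recent run can be located with a few lines of Python. This is a sketch; it assumes only the filename pattern shown above:

```python
from pathlib import Path

def latest_results(output_dir="benchmark_results"):
    """Return the newest detailed-results JSON in output_dir, relying on
    the sortable YYYYMMDD_HHMMSS timestamp embedded in the filename."""
    files = sorted(Path(output_dir).glob("gemma3n_benchmark_*.json"))
    return files[-1] if files else None
```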

Usage

Command Line Options

run_benchmark.sh

Options:
  -c, --cpu-only           Run CPU-only benchmark
  -g, --gpu-only           Run GPU-accelerated benchmark only
  -a, --all                Run comprehensive benchmark (CPU + GPU) [DEFAULT]
  -t, --threads <N>        Set number of CPU threads (default: 4)
  -l, --gpu-layers <N>     Set number of GPU layers (default: 99)
  -o, --output-dir <DIR>   Set output directory (default: benchmark_results)
  -h, --help               Show help message

benchmark_gemma3n.py

python benchmark_gemma3n.py [options]

Options:
  --cpu-threads N          Number of CPU threads to use
  --gpu-layers N           Maximum GPU layers to use (default: 32)
  --output-dir DIR         Output directory for results (default: benchmark_results)
  --executable PATH        Path to llama-cli executable

Benchmark Tasks

The benchmark tests models on 8 Spanish administrative tasks:

  1. Remote Work Policy: Draft remote work guidelines for government agencies
  2. Budget Proposal: Create IT infrastructure budget summary for municipal offices
  3. Incident Report: Develop security incident report template
  4. Document Approval: Generate standard operating procedure for document workflows
  5. Compliance Memo: Write data protection compliance requirements memo
  6. Employee Onboarding: Create public administration employee onboarding instructions
  7. Performance Evaluation: Develop performance evaluation checklist for administrative staff
  8. Procurement Request: Create equipment and supplies procurement form

Output Format

JSON Results

Detailed results include:

  • System information (CPU, memory, GPU, etc.)
  • Per-prompt results with response text, timing, and token counts
  • Configuration details for each test run
  • Success/failure status for each inference
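
Since each inference carries a success/failure status, an aggregate success rate can be computed from the detailed JSON. A minimal sketch; the field names (`results`, `success`) are illustrative assumptions, so check them against the files your run actually produces:

```python
import json

def success_rate(path):
    """Fraction of successful inferences in a detailed-results JSON.
    The 'results' and 'success' keys are illustrative; verify them
    against the benchmark's real output schema."""
    with open(path) as f:
        data = json.load(f)
    runs = data["results"]
    return sum(1 for r in runs if r["success"]) / len(runs)
```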

CSV Summary

Summary table with:

  • Model name and configuration
  • Average tokens per second
  • Average response time
  • Total tokens generated
  • Success rate
  • GPU speedup ratios
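
The summary CSV can be inspected with the standard library alone; column names are whatever the benchmark wrote, so treat the sketch below as schema-agnostic and check the header row of your own output:

```python
import csv

def load_summary(path):
    """Read the summary CSV into a list of row dicts, one per
    model/configuration. Column names come from the file's header."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

pandas, already installed as a dependency, works equally well via `pandas.read_csv(path)`.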

Performance Metrics

  • Tokens/Second: Inference speed measurement
  • Response Time: Total time per prompt (seconds)
  • Success Rate: Percentage of successful inferences
  • GPU Speedup: Performance improvement ratio (GPU vs CPU)
  • Total Tokens: Cumulative tokens generated across all prompts
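
The two headline metrics reduce to simple ratios, sketched here for reference:

```python
def tokens_per_second(tokens, elapsed_s):
    """Inference speed: generated tokens divided by wall-clock seconds."""
    return tokens / elapsed_s

def gpu_speedup(gpu_tps, cpu_tps):
    """Ratio of GPU to CPU throughput; > 1.0 means the GPU run was faster."""
    return gpu_tps / cpu_tps

# e.g. 150 tokens in 5.0 s on CPU -> 30.0 tok/s;
#      the same 150 tokens in 1.5 s on GPU -> 100.0 tok/s
```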

Configuration

Model Settings

  • Quantization: Q4_K_M for both models
  • Max Tokens: 150 per prompt
  • Temperature: 0.7
  • Top-p: 0.9
  • Context Size: 2048 tokens
  • Batch Size: 512

Hardware Settings

  • CPU Threads: Configurable (default: 4 in run_benchmark.sh, auto-detected otherwise)
  • GPU Layers: Configurable (default: 99 in run_benchmark.sh, 32 in benchmark_gemma3n.py)
  • Timeout: 120 seconds per inference
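
Taken together, the model and hardware settings map onto a single llama-cli invocation. The sketch below assembles the argument list; the flags (-m, -p, -n, -t, -ngl, -c, -b, --temp, --top-p) are standard llama.cpp options, but verify them against your build, since this is not the benchmark's actual code:

```python
def build_command(prompt, model="models/gemma-3n-E2B-it-Q4_K_M.gguf",
                  threads=4, gpu_layers=99):
    """Assemble a llama-cli argument list matching the settings above.
    Flag names should be checked against your llama.cpp build."""
    return [
        "build/bin/llama-cli",
        "-m", model,
        "-p", prompt,
        "-n", "150",              # max tokens per prompt
        "-t", str(threads),       # CPU threads
        "-ngl", str(gpu_layers),  # layers offloaded to GPU
        "-c", "2048",             # context size
        "-b", "512",              # batch size
        "--temp", "0.7",
        "--top-p", "0.9",
    ]

# One inference, aborted after the 120 s timeout:
# subprocess.run(build_command("..."), capture_output=True,
#                text=True, timeout=120)
```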

Troubleshooting

Common Issues

  1. llama-cli not found

    • Ensure llama.cpp is built: make -C llama.cpp
    • Check executable path: build/bin/llama-cli
  2. Model files missing

    • Download or convert models to GGUF format
    • Place in models/ directory with expected filenames
  3. GPU not detected

    • Verify CUDA installation
    • Check GPU layers setting
    • Ensure llama.cpp built with CUDA support
  4. Memory issues

    • Reduce GPU layers for large models
    • Ensure sufficient system RAM
    • Monitor memory usage during benchmarks

Verification

# Check system prerequisites
python benchmark_gemma3n.py --help

# Test llama.cpp installation
build/bin/llama-cli --help

# Verify model files
ls -la models/
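
The same file checks can be scripted; this sketch uses only the paths listed in the Prerequisites section:

```python
from pathlib import Path

# Required files, as listed under Prerequisites
REQUIRED_FILES = [
    "build/bin/llama-cli",
    "models/gemma-3n-E2B-it-Q4_K_M.gguf",
    "models/gemma-3n-E4B-it-Q4_K_M.gguf",
]

def missing_files(root="."):
    """Return the required files that are absent under root."""
    return [p for p in REQUIRED_FILES if not (Path(root) / p).is_file()]
```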

Contributing

When contributing to this benchmark:

  1. Maintain focus on administrative/government use cases
  2. Test changes with both CPU and GPU configurations
  3. Ensure output format compatibility
  4. Update documentation for new features

License

This benchmark suite is part of the montevive research project. Please refer to the project's main license file for usage terms.
