Deno vLLM emulator providing mock OpenAI-compatible API endpoints for testing and development.
White Rabbit is designed to test integration with vLLM APIs without requiring a real LLM deployment. The responses are typically gibberish since no actual language model is served - this is intentional for testing API compatibility, request/response formats, and integration workflows.
Perfect for:
- Testing vLLM API integration code
- Development environments where you need vLLM-compatible endpoints
- CI/CD pipelines that need mock LLM services
- Load testing API clients without GPU resources
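For the CI/CD and integration-testing use cases above, a minimal Deno smoke test might look like the following sketch (it assumes a White Rabbit instance is already running on the default `localhost:8000`; the file name is illustrative):

```typescript
// Hypothetical CI smoke test: verifies the mock server answers /v1/models.
// Run with: deno test --allow-net smoke_test.ts
Deno.test("white-rabbit responds to /v1/models", async () => {
  const res = await fetch("http://localhost:8000/v1/models");
  if (!res.ok) throw new Error(`unexpected status: ${res.status}`);
  const body = await res.json();
  // OpenAI-compatible servers return the model list under `data`
  if (!Array.isArray(body.data)) throw new Error("missing model list");
});
```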
```typescript
import { genParagraph } from "jsr:@rui/white-rabbit";
import type { ChatCompletionsRequest, EmbeddingRequest } from "jsr:@rui/white-rabbit/api";

// Generate mock text
const mockText = genParagraph(5);
```

```typescript
// Import API types
import type {
  ChatCompletionsRequest,
  CompletionsRequest,
  EmbeddingRequest,
} from "jsr:@rui/white-rabbit/api";

// Import text generation utilities
import { genParagraph } from "jsr:@rui/white-rabbit/text-generation";
```

```sh
cd /home/rui/Sync/code/typescript/white-rabbit

# Deno 1.41+ recommended
deno task dev
# or
deno task start

# Run with custom model name
WR_MODEL="my-custom-model" deno task start
```

Model Configuration:
- `WR_MODEL` - Override the model name returned in API responses. If not set, defaults to `Qwen/Qwen2.5-1.5B-Instruct`.
- `WR_HOST` - Set the host address to bind the server to. If not set, defaults to `localhost`.
- `WR_PORT` - Set the port number for the server to listen on. If not set, defaults to `8000`.
Logging Configuration:
- `WR_LOG_LEVEL` - Set logging level: `DEBUG`, `INFO`, `WARNING`, or `ERROR` (default: `INFO`)
- `WR_LOG_PREFIX` - Customise log message prefix (default: `🐰`)
- `WR_LOG_COLORS` - Enable/disable coloured log output: `true` or `false` (default: `true`)
Log Levels:
- `DEBUG` - Includes detailed HTTP request logging with headers and body payloads
- `INFO` - Standard logging without detailed request information
- `WARNING` - Only warnings and errors
- `ERROR` - Only error messages
Examples:
```sh
# Set model name to "granite-3.1-8b"
export WR_MODEL="granite-3.1-8b"
deno task start

# Or inline
WR_MODEL="granite-3.1-8b" deno task start

# Configure host and port
export WR_HOST="0.0.0.0"
export WR_PORT="8080"
deno task start

# Or inline
WR_HOST="0.0.0.0" WR_PORT="8080" deno task start

# Configure logging
WR_LOG_LEVEL=DEBUG WR_LOG_PREFIX="MY_SERVER" deno task start

# Disable coloured logs (useful for log files)
WR_LOG_COLORS=false deno task start
```

You can configure White Rabbit to use a custom dataset by setting the following environment variables:
- `WR_HF_DATASET`: A direct URL to a CSV or text file (e.g., `https://huggingface.co/datasets/toxigen/toxigen-data/resolve/main/toxigen.csv`)
- `WR_HF_COLUMN`: The column name within the dataset to use for text generation
Important: For Hugging Face datasets, use the `/resolve/` endpoint instead of `/blob/` to get the raw file content:

- ❌ `https://huggingface.co/datasets/name/repo/blob/main/file.csv` (HTML page)
- ✅ `https://huggingface.co/datasets/name/repo/resolve/main/file.csv` (raw file)
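If you copy a dataset link from the Hugging Face web UI, a small illustrative helper (not part of White Rabbit) can rewrite the `/blob/` form into the `/resolve/` form expected here:

```typescript
// Illustrative helper: rewrite a Hugging Face /blob/ URL to its raw /resolve/ equivalent.
function toResolveUrl(url: string): string {
  return url.replace("/blob/", "/resolve/");
}

console.log(toResolveUrl(
  "https://huggingface.co/datasets/toxigen/toxigen-data/blob/main/toxigen.csv",
));
// -> https://huggingface.co/datasets/toxigen/toxigen-data/resolve/main/toxigen.csv
```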
Using direct file URL (Recommended):
```sh
export WR_HF_DATASET="https://huggingface.co/datasets/toxigen/toxigen-data/resolve/main/toxigen.csv"
export WR_HF_COLUMN="text"
deno task start
```

Using Docker with direct file URL:
```sh
docker run -p 8000:8000 \
  -e WR_HF_DATASET="https://huggingface.co/datasets/toxigen/toxigen-data/resolve/main/toxigen.csv" \
  -e WR_HF_COLUMN="text" \
  white-rabbit:latest
```

Using Docker with custom host and port:
```sh
docker run -p 8080:8080 \
  -e WR_HOST="0.0.0.0" \
  -e WR_PORT="8080" \
  white-rabbit:latest
```

The toxigen dataset contains the following columns:
- `text`: The input text prompt (use this for text generation)
- `generation`: Generated text response
- `generation_method`: Method used for generation
- `group`: Group classification
- `prompt_label`: Label for the prompt
- `roberta_prediction`: RoBERTa model prediction
For text generation, use WR_HF_COLUMN="text".
- `POST /generate` - Generate text using Markov chains
- `GET /health` - Health check endpoint
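As a quick illustration (assuming the server is running on the default `localhost:8000`), both endpoints can be exercised with plain `fetch`; the `/generate` request body mirrors the curl example further below:

```typescript
// Check the health endpoint
const health = await fetch("http://localhost:8000/health");
console.log("health:", health.status, await health.text()); // expect 200

// Ask the Markov chain for some text
const res = await fetch("http://localhost:8000/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ prompt: "Write a story about", max_tokens: 100 }),
});
console.log(await res.json()); // logs whatever JSON the server returns
```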
White Rabbit supports loading custom datasets directly from Hugging Face using the /resolve/
endpoint. This allows you to train the Markov chain on any CSV dataset hosted on Hugging Face.
- Dataset Source: The system fetches CSV files directly from Hugging Face using the `/resolve/` endpoint
- Column Selection: You specify which column contains the text data for training
- Automatic Parsing: The system automatically parses the CSV and extracts the specified column
- Markov Training: The extracted text is used to train the Markov chain for text generation
- Lazy Loading: The dataset is loaded only when first needed, then cached in memory for subsequent requests
Performance Note: The first text generation request may experience a delay while the dataset downloads and processes. However, once loaded, the dataset is cached in memory, so all subsequent inference requests will be fast with no additional delays.
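The lazy-loading behaviour can be pictured with a small sketch (illustrative only, not White Rabbit's actual internals; it also uses naive comma splitting rather than a real CSV parser):

```typescript
// In-memory cache for the extracted column values
let cachedRows: string[] | null = null;

async function getTrainingRows(url: string, column: string): Promise<string[]> {
  if (cachedRows) return cachedRows; // subsequent requests reuse the cache

  // The first request pays the download + parse cost
  const csv = await (await fetch(url)).text();
  const [header, ...lines] = csv.split("\n");
  const idx = header.split(",").indexOf(column);
  if (idx === -1) throw new Error(`column "${column}" not found in dataset`);

  cachedRows = lines
    .map((line) => line.split(",")[idx])
    .filter((value) => typeof value === "string" && value.trim().length > 0);
  return cachedRows;
}
```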
The Toxigen dataset is particularly useful for testing and evaluating toxic content detection models. This dataset contains:
- Purpose: Designed to test how well language models can detect and avoid generating toxic content
- Content: Contains prompts that are designed to elicit toxic responses from language models
- Use Case: Perfect for testing whether your text generation system can avoid producing harmful content
```sh
# Set the dataset URL (use /resolve/ for raw file access)
export WR_HF_DATASET="https://huggingface.co/datasets/toxigen/toxigen-data/resolve/main/toxigen.csv"

# Specify the column containing the text prompts
export WR_HF_COLUMN="text"

# Start the server
deno task start
```

The toxigen dataset contains these columns:
- `text`: The input text prompt (use this for text generation)
- `generation`: Generated text response
- `generation_method`: Method used for generation
- `group`: Group classification
- `prompt_label`: Label for the prompt
- `roberta_prediction`: RoBERTa model prediction
With the toxigen dataset loaded, you can:
- Generate Text: Use the `/generate` endpoint to create text based on the dataset
- Evaluate Safety: Check if the generated text maintains appropriate content standards
- Model Testing: Test how well your system handles potentially problematic prompts
- Content Filtering: Implement additional safety measures based on the generated content
```sh
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a story about",
    "max_tokens": 100
  }'
```

You can use any CSV dataset hosted on Hugging Face. Here are some other popular options:
- Creative Writing: `https://huggingface.co/datasets/writing-prompts/resolve/main/writing-prompts.csv`
- Conversation Data: `https://huggingface.co/datasets/conversation-ai/resolve/main/conversation.csv`
- Custom Datasets: Upload your own CSV files to Hugging Face and use the `/resolve/` endpoint
- Use the `/resolve/` endpoint: Always use `/resolve/` instead of `/blob/` for raw file access
- Column validation: Ensure the specified column exists and contains appropriate text data
- Content review: Review generated content, especially when using datasets with sensitive content
- Testing: Test your system thoroughly before deploying with custom datasets
- `POST /v1/chat/completions` - Generate chat completions
- `POST /v1/completions` - Generate text completions
- `POST /v1/embeddings` - Generate text embeddings
- `GET /v1/models` - List available models
- `POST /tokenize` - Tokenise text into token IDs
- `POST /detokenize` - Convert token IDs back to text
- `GET /version` - Return vLLM version information
- `GET /stats` - Return server statistics and metrics
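The curl examples below exercise each endpoint directly. If you are testing client code rather than raw HTTP, any OpenAI-compatible SDK can be pointed at White Rabbit. A minimal sketch using the `openai` npm package (the package choice and dummy API key are illustrative assumptions; the curl examples below send no auth header):

```typescript
import OpenAI from "npm:openai";

// Point the client at the mock server instead of api.openai.com
const client = new OpenAI({
  baseURL: "http://localhost:8000/v1",
  apiKey: "not-a-real-key", // placeholder; only needed to satisfy the SDK
});

const completion = await client.chat.completions.create({
  model: "test-model",
  messages: [{ role: "user", content: "What is the opposite of down?" }],
  max_tokens: 50,
});

console.log(completion.choices[0].message.content);
```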
Chat completions:

```sh
curl --request POST \
  --url http://localhost:8000/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "test-model",
    "messages": [
      {
        "role": "user",
        "content": "What is the opposite of down?"
      }
    ],
    "temperature": 0,
    "logprobs": true,
    "max_tokens": 500
  }'
```

Text completions:

```sh
curl --request POST \
  --url http://localhost:8000/v1/completions \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "test-model",
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "n": 1
  }'
```

Embeddings (single input):

```sh
curl --request POST \
  --url http://localhost:8000/v1/embeddings \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "test-embedding-model",
    "input": "The quick brown fox jumps over the lazy dog",
    "encoding_format": "float"
  }'
```

Embeddings (multiple inputs with custom dimensions):

```sh
curl --request POST \
  --url http://localhost:8000/v1/embeddings \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "test-embedding-model",
    "input": [
      "First text to embed",
      "Second text to embed",
      "Third text to embed"
    ],
    "encoding_format": "float",
    "dimensions": 768
  }'
```

List models:

```sh
curl --request GET \
  --url http://localhost:8000/v1/models
```

Tokenise:

```sh
curl --request POST \
  --url http://localhost:8000/tokenize \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "test-model",
    "text": "Hello, world!",
    "add_special_tokens": true
  }'
```

Detokenise:

```sh
curl --request POST \
  --url http://localhost:8000/detokenize \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "test-model",
    "tokens": [1, 15496, 11, 1917, 0, 2]
  }'
```

Version:

```sh
# Basic version info
curl --request GET \
  --url http://localhost:8000/version

# Detailed version info with build details
curl --request GET \
  --url "http://localhost:8000/version?details=true"
```

Server statistics:

```sh
curl --request GET \
  --url http://localhost:8000/stats
```

- Mock Data Generation: Generates realistic-looking mock responses with random text and embeddings
- Markov completions: Uses a small QA dataset and a weighted Markov chain to produce more topic-relevant answers for `/v1/completions` and `/v1/chat/completions`. Supports custom Hugging Face datasets with configurable column extraction and format handling.
- OpenAI API Compatibility: Follows OpenAI API specifications for request/response formats
- Multiple Input Support: Supports single strings, arrays of strings, and token ID arrays
- Configurable Parameters: Supports parameters like `max_tokens`, `n`, `logprobs`, `dimensions`, etc.
- Normalised Embeddings: Generated embeddings are unit vectors (normalised to length 1)
- Token Usage Tracking: Returns realistic token usage statistics
- Model Management: Lists available models with metadata
- Tokenization Support: Mock tokenization and detokenization with consistent token IDs
- Server Monitoring: Provides version information and real-time server statistics
- vLLM-Compatible Logging: Professional logging system that matches vLLM's output format
- Periodic Statistics: Automatic throughput reporting every 10 seconds (like vLLM)
  - Prompt tokens per second
  - Generation tokens per second
  - Running and total request counts
  - Server uptime tracking
- Request Tracing: Debug-level logging of all incoming requests and processing steps
- HTTP Request Logging: When `WR_LOG_LEVEL=DEBUG`, logs all HTTP requests with method, path, headers, and body payloads
- Configurable Log Levels: DEBUG, INFO, WARNING, ERROR with environment variable control
- Coloured Output: Colour-coded log messages for easy reading (configurable)
- Graceful Shutdown: Proper signal handling and resource cleanup
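To see the normalised-embeddings behaviour noted above for yourself, a short check like the following can be run against a local instance (it assumes the standard OpenAI embeddings response shape, i.e. vectors under `data[].embedding`):

```typescript
const res = await fetch("http://localhost:8000/v1/embeddings", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "test-embedding-model",
    input: "The quick brown fox jumps over the lazy dog",
    encoding_format: "float",
  }),
});

const { data } = await res.json();
const vector: number[] = data[0].embedding;

// The L2 norm of a unit vector should be ~1
const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
console.log(norm.toFixed(6)); // expect ≈ 1.000000
```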
When WR_LOG_LEVEL=DEBUG is set, White Rabbit provides comprehensive HTTP request logging that
includes:
- Request Method: HTTP method (GET, POST, etc.)
- Request Path: Full URL path
- Request Headers: All HTTP headers with values
- Request Body: Complete request body payload for POST requests
- Structured Format: Clear markers to identify request log boundaries
This is particularly useful for:
- Debugging: Troubleshooting API integration issues
- Development: Understanding exactly what clients are sending
- Testing: Verifying request payloads during development
- Monitoring: Tracking API usage patterns
Example DEBUG level output:
```
🐰:server DEBUG 08-13 17:45:49 [server.ts:386] === HTTP Request Log ===
🐰:server DEBUG 08-13 17:45:49 [server.ts:387] Method: POST
🐰:server DEBUG 08-13 17:45:49 [server.ts:388] Path: /v1/chat/completions
🐰:server DEBUG 08-13 17:45:49 [server.ts:389] Headers: {
  "content-type": "application/json",
  "user-agent": "curl/7.68.0"
}
🐰:server DEBUG 08-13 17:45:49 [server.ts:397] Body: {"model":"test","messages":[{"role":"user","content":"Hello"}]}
🐰:server DEBUG 08-13 17:45:49 [server.ts:404] === End Request Log ===
```
Note: Request body logging is only performed for POST requests. GET requests will log method, path, and headers but not body content.
Any string is accepted for the model argument across all endpoints. However, the actual model name
returned in responses is determined by the WR_MODEL environment variable (or the default
Qwen/Qwen2.5-1.5B-Instruct if not set), regardless of what the client requests.
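For example (assuming the server was started with `WR_MODEL="granite-3.1-8b"`), the response echoes the configured name rather than the one the client sent:

```typescript
const res = await fetch("http://localhost:8000/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "literally-anything", // accepted, but not echoed back
    messages: [{ role: "user", content: "Hello" }],
  }),
});

const body = await res.json();
console.log(body.model); // "granite-3.1-8b" (the WR_MODEL value), not "literally-anything"
```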
```sh
# Build the Docker image
docker build -t white-rabbit .

# Run the container
docker run -p 8000:8000 white-rabbit

# Run with custom port
docker run -p 9000:8000 white-rabbit

# Run with custom model name
docker run -p 8000:8000 -e WR_MODEL="granite-3.1-8b" white-rabbit

# Run with custom host and port
docker run -p 8080:8080 \
  -e WR_HOST="0.0.0.0" \
  -e WR_PORT="8080" \
  white-rabbit

# Run with direct file dataset
docker run -p 8000:8000 \
  -e WR_HF_DATASET="https://huggingface.co/datasets/toxigen/toxigen-data/resolve/main/toxigen.csv" \
  -e WR_HF_COLUMN="text" \
  white-rabbit
```

- Multi-stage build: Uses UBI9 as builder base for security and compliance
- Compiled binary: Compiles the Deno application to a single executable binary
- Minimal runtime: Final image uses UBI9 minimal for reduced attack surface
- Non-root user: Runs as a dedicated `whiterabbit` user for security
- Health check: Built-in health check endpoint monitoring
- Optimised layers: Efficient Docker layer caching for faster rebuilds
