Deno vLLM emulator providing mock OpenAI-compatible API endpoints for testing and development.
White Rabbit is designed to test integration with vLLM APIs without requiring a real LLM deployment. The responses are typically gibberish since no actual language model is served - this is intentional for testing API compatibility, request/response formats, and integration workflows.
Perfect for:
- Testing vLLM API integration code
- Development environments where you need vLLM-compatible endpoints
- CI/CD pipelines that need mock LLM services
- Load testing API clients without GPU resources
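For the CI/CD and integration-testing use cases above, a minimal Deno smoke test might look like the following sketch (it assumes a White Rabbit instance is already running on the default `localhost:8000`; the file name is illustrative):

```typescript
// Hypothetical CI smoke test: verifies the mock server answers /v1/models.
// Run with: deno test --allow-net smoke_test.ts
Deno.test("white-rabbit responds to /v1/models", async () => {
  const res = await fetch("http://localhost:8000/v1/models");
  if (!res.ok) throw new Error(`unexpected status: ${res.status}`);
  const body = await res.json();
  // OpenAI-compatible servers return the model list under `data`
  if (!Array.isArray(body.data)) throw new Error("missing model list");
});
```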
```typescript
import { genParagraph } from "jsr:@rui/white-rabbit";
import type { ChatCompletionsRequest, EmbeddingRequest } from "jsr:@rui/white-rabbit/api";

// Generate mock text
const mockText = genParagraph(5);
```

```typescript
// Import API types
import type {
  ChatCompletionsRequest,
  CompletionsRequest,
  EmbeddingRequest,
} from "jsr:@rui/white-rabbit/api";

// Import text generation utilities
import { genParagraph } from "jsr:@rui/white-rabbit/text-generation";
```

```sh
cd /home/rui/Sync/code/typescript/white-rabbit

# Deno 1.41+ recommended
deno task dev
# or
deno task start

# Run with custom model name
WR_MODEL="my-custom-model" deno task start
```

Model Configuration:
- `WR_MODEL` - Override the model name returned in API responses. If not set, defaults to `Qwen/Qwen2.5-1.5B-Instruct`.
- `WR_HOST` - Set the host address to bind the server to. If not set, defaults to `localhost`.
- `WR_PORT` - Set the port number for the server to listen on. If not set, defaults to `8000`.
Logging Configuration:
- `WR_LOG_LEVEL` - Set logging level: `DEBUG`, `INFO`, `WARNING`, or `ERROR` (default: `INFO`)
- `WR_LOG_PREFIX` - Customise log message prefix (default: `🐰`)
- `WR_LOG_COLORS` - Enable/disable coloured log output: `true` or `false` (default: `true`)
Log Levels:
- `DEBUG` - Includes detailed HTTP request logging with headers and body payloads
- `INFO` - Standard logging without detailed request information
- `WARNING` - Only warnings and errors
- `ERROR` - Only error messages
Examples:
```sh
# Set model name to "granite-3.1-8b"
export WR_MODEL="granite-3.1-8b"
deno task start

# Or inline
WR_MODEL="granite-3.1-8b" deno task start

# Configure host and port
export WR_HOST="0.0.0.0"
export WR_PORT="8080"
deno task start

# Or inline
WR_HOST="0.0.0.0" WR_PORT="8080" deno task start

# Configure logging
WR_LOG_LEVEL=DEBUG WR_LOG_PREFIX="MY_SERVER" deno task start

# Disable coloured logs (useful for log files)
WR_LOG_COLORS=false deno task start
```

You can configure White Rabbit to use a custom dataset by setting the following environment variables:
- `WR_HF_DATASET`: A direct URL to a CSV or text file (e.g., `https://huggingface.co/datasets/toxigen/toxigen-data/resolve/main/toxigen.csv`)
- `WR_HF_COLUMN`: The column name within the dataset to use for text generation
Important: For Hugging Face datasets, use the `/resolve/` endpoint instead of `/blob/` to get the raw file content:

- ❌ `https://huggingface.co/datasets/name/repo/blob/main/file.csv` (HTML page)
- ✅ `https://huggingface.co/datasets/name/repo/resolve/main/file.csv` (raw file)
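If you copy a dataset link from the Hugging Face web UI, a small illustrative helper (not part of White Rabbit) can rewrite the `/blob/` form into the `/resolve/` form expected here:

```typescript
// Illustrative helper: rewrite a Hugging Face /blob/ URL to its raw /resolve/ equivalent.
function toResolveUrl(url: string): string {
  return url.replace("/blob/", "/resolve/");
}

console.log(toResolveUrl(
  "https://huggingface.co/datasets/toxigen/toxigen-data/blob/main/toxigen.csv",
));
// -> https://huggingface.co/datasets/toxigen/toxigen-data/resolve/main/toxigen.csv
```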
Using direct file URL (Recommended):
```sh
export WR_HF_DATASET="https://huggingface.co/datasets/toxigen/toxigen-data/resolve/main/toxigen.csv"
export WR_HF_COLUMN="text"
deno task start
```

Using Docker with direct file URL:
```sh
docker run -p 8000:8000 \
  -e WR_HF_DATASET="https://huggingface.co/datasets/toxigen/toxigen-data/resolve/main/toxigen.csv" \
  -e WR_HF_COLUMN="text" \
  white-rabbit:latest
```

Using Docker with custom host and port:
```sh
docker run -p 8080:8080 \
  -e WR_HOST="0.0.0.0" \
  -e WR_PORT="8080" \
  white-rabbit:latest
```

The toxigen dataset contains the following columns:
- `text`: The input text prompt (use this for text generation)
- `generation`: Generated text response
- `generation_method`: Method used for generation
- `group`: Group classification
- `prompt_label`: Label for the prompt
- `roberta_prediction`: RoBERTa model prediction
For text generation, use WR_HF_COLUMN="text".
- `POST /generate` - Generate text using Markov chains
- `GET /health` - Health check endpoint
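As a quick illustration (assuming the server is running on the default `localhost:8000`), both endpoints can be exercised with plain `fetch`; the `/generate` request body mirrors the curl example further below:

```typescript
// Check the health endpoint
const health = await fetch("http://localhost:8000/health");
console.log("health:", health.status, await health.text()); // expect 200

// Ask the Markov chain for some text
const res = await fetch("http://localhost:8000/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ prompt: "Write a story about", max_tokens: 100 }),
});
console.log(await res.json()); // logs whatever JSON the server returns
```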
White Rabbit supports loading custom datasets directly from Hugging Face using the /resolve/
endpoint. This allows you to train the Markov chain on any CSV dataset hosted on Hugging Face.
- Dataset Source: The system fetches CSV files directly from Hugging Face using the `/resolve/` endpoint
- Column Selection: You specify which column contains the text data for training
- Automatic Parsing: The system automatically parses the CSV and extracts the specified column
- Markov Training: The extracted text is used to train the Markov chain for text generation
- Lazy Loading: The dataset is loaded only when first needed, then cached in memory for subsequent requests
Performance Note: The first text generation request may experience a delay while the dataset downloads and processes. However, once loaded, the dataset is cached in memory, so all subsequent inference requests will be fast with no additional delays.
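The lazy-loading behaviour can be pictured with a small sketch (illustrative only, not White Rabbit's actual internals; it also uses naive comma splitting rather than a real CSV parser):

```typescript
// In-memory cache for the extracted column values
let cachedRows: string[] | null = null;

async function getTrainingRows(url: string, column: string): Promise<string[]> {
  if (cachedRows) return cachedRows; // subsequent requests reuse the cache

  // The first request pays the download + parse cost
  const csv = await (await fetch(url)).text();
  const [header, ...lines] = csv.split("\n");
  const idx = header.split(",").indexOf(column);
  if (idx === -1) throw new Error(`column "${column}" not found in dataset`);

  cachedRows = lines
    .map((line) => line.split(",")[idx])
    .filter((value) => typeof value === "string" && value.trim().length > 0);
  return cachedRows;
}
```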
The Toxigen dataset is particularly useful for testing and evaluating toxic content detection models. This dataset contains:
- Purpose: Designed to test how well language models can detect and avoid generating toxic content
- Content: Contains prompts that are designed to elicit toxic responses from language models
- Use Case: Perfect for testing whether your text generation system can avoid producing harmful content
```sh
# Set the dataset URL (use /resolve/ for raw file access)
export WR_HF_DATASET="https://huggingface.co/datasets/toxigen/toxigen-data/resolve/main/toxigen.csv"

# Specify the column containing the text prompts
export WR_HF_COLUMN="text"

# Start the server
deno task start
```

The toxigen dataset contains these columns:
- `text`: The input text prompt (use this for text generation)
- `generation`: Generated text response
- `generation_method`: Method used for generation
- `group`: Group classification
- `prompt_label`: Label for the prompt
- `roberta_prediction`: RoBERTa model prediction
With the toxigen dataset loaded, you can:
- Generate Text: Use the `/generate` endpoint to create text based on the dataset
- Evaluate Safety: Check if the generated text maintains appropriate content standards
- Model Testing: Test how well your system handles potentially problematic prompts
- Content Filtering: Implement additional safety measures based on the generated content
```sh
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a story about",
    "max_tokens": 100
  }'
```

You can use any CSV dataset hosted on Hugging Face. Here are some other popular options:
- Creative Writing: `https://huggingface.co/datasets/writing-prompts/resolve/main/writing-prompts.csv`
- Conversation Data: `https://huggingface.co/datasets/conversation-ai/resolve/main/conversation.csv`
- Custom Datasets: Upload your own CSV files to Hugging Face and use the `/resolve/` endpoint
- Use the `/resolve/` endpoint: Always use `/resolve/` instead of `/blob/` for raw file access
- Column validation: Ensure the specified column exists and contains appropriate text data
- Content review: Review generated content, especially when using datasets with sensitive content
- Testing: Test your system thoroughly before deploying with custom datasets
- `POST /v1/chat/completions` - Generate chat completions
- `POST /v1/completions` - Generate text completions
- `POST /v1/embeddings` - Generate text embeddings
- `GET /v1/models` - List available models
- `POST /tokenize` - Tokenise text into token IDs
- `POST /detokenize` - Convert token IDs back to text
- `GET /version` - Return vLLM version information
- `GET /stats` - Return server statistics and metrics
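The curl examples below exercise each endpoint directly. If you are testing client code rather than raw HTTP, any OpenAI-compatible SDK can be pointed at White Rabbit. A minimal sketch using the `openai` npm package (the package choice and dummy API key are illustrative assumptions; the curl examples below send no auth header):

```typescript
import OpenAI from "npm:openai";

// Point the client at the mock server instead of api.openai.com
const client = new OpenAI({
  baseURL: "http://localhost:8000/v1",
  apiKey: "not-a-real-key", // placeholder; only needed to satisfy the SDK
});

const completion = await client.chat.completions.create({
  model: "test-model",
  messages: [{ role: "user", content: "What is the opposite of down?" }],
  max_tokens: 50,
});

console.log(completion.choices[0].message.content);
```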
Chat completions:

```sh
curl --request POST \
  --url http://localhost:8000/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "test-model",
    "messages": [
      {
        "role": "user",
        "content": "What is the opposite of down?"
      }
    ],
    "temperature": 0,
    "logprobs": true,
    "max_tokens": 500
  }'
```

Text completions:

```sh
curl --request POST \
  --url http://localhost:8000/v1/completions \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "test-model",
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "n": 1
  }'
```

Embeddings (single input):

```sh
curl --request POST \
  --url http://localhost:8000/v1/embeddings \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "test-embedding-model",
    "input": "The quick brown fox jumps over the lazy dog",
    "encoding_format": "float"
  }'
```

Embeddings (multiple inputs with custom dimensions):

```sh
curl --request POST \
  --url http://localhost:8000/v1/embeddings \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "test-embedding-model",
    "input": [
      "First text to embed",
      "Second text to embed",
      "Third text to embed"
    ],
    "encoding_format": "float",
    "dimensions": 768
  }'
```

List models:

```sh
curl --request GET \
  --url http://localhost:8000/v1/models
```

Tokenise:

```sh
curl --request POST \
  --url http://localhost:8000/tokenize \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "test-model",
    "text": "Hello, world!",
    "add_special_tokens": true
  }'
```

Detokenise:

```sh
curl --request POST \
  --url http://localhost:8000/detokenize \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "test-model",
    "tokens": [1, 15496, 11, 1917, 0, 2]
  }'
```

Version:

```sh
# Basic version info
curl --request GET \
  --url http://localhost:8000/version

# Detailed version info with build details
curl --request GET \
  --url "http://localhost:8000/version?details=true"
```

Server statistics:

```sh
curl --request GET \
  --url http://localhost:8000/stats
```

- Mock Data Generation: Generates realistic-looking mock responses with random text and embeddings
- Markov completions: Uses a small QA dataset and a weighted Markov chain to produce more topic-relevant answers for `/v1/completions` and `/v1/chat/completions`. Supports custom Hugging Face datasets with configurable column extraction and format handling.
- OpenAI API Compatibility: Follows OpenAI API specifications for request/response formats
- Multiple Input Support: Supports single strings, arrays of strings, and token ID arrays
- Configurable Parameters: Supports parameters like `max_tokens`, `n`, `logprobs`, `dimensions`, etc.
- Normalised Embeddings: Generated embeddings are unit vectors (normalised to length 1)
- Token Usage Tracking: Returns realistic token usage statistics
- Model Management: Lists available models with metadata
- Tokenization Support: Mock tokenization and detokenization with consistent token IDs
- Server Monitoring: Provides version information and real-time server statistics
- vLLM-Compatible Logging: Professional logging system that matches vLLM's output format
- Periodic Statistics: Automatic throughput reporting every 10 seconds (like vLLM)
  - Prompt tokens per second
  - Generation tokens per second
  - Running and total request counts
  - Server uptime tracking
- Request Tracing: Debug-level logging of all incoming requests and processing steps
- HTTP Request Logging: When `WR_LOG_LEVEL=DEBUG`, logs all HTTP requests with method, path, headers, and body payloads
- Configurable Log Levels: DEBUG, INFO, WARNING, ERROR with environment variable control
- Coloured Output: Colour-coded log messages for easy reading (configurable)
- Graceful Shutdown: Proper signal handling and resource cleanup
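To see the normalised-embeddings behaviour noted above for yourself, a short check like the following can be run against a local instance (it assumes the standard OpenAI embeddings response shape, i.e. vectors under `data[].embedding`):

```typescript
const res = await fetch("http://localhost:8000/v1/embeddings", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "test-embedding-model",
    input: "The quick brown fox jumps over the lazy dog",
    encoding_format: "float",
  }),
});

const { data } = await res.json();
const vector: number[] = data[0].embedding;

// The L2 norm of a unit vector should be ~1
const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
console.log(norm.toFixed(6)); // expect ≈ 1.000000
```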
When WR_LOG_LEVEL=DEBUG is set, White Rabbit provides comprehensive HTTP request logging that
includes:
- Request Method: HTTP method (GET, POST, etc.)
- Request Path: Full URL path
- Request Headers: All HTTP headers with values
- Request Body: Complete request body payload for POST requests
- Structured Format: Clear markers to identify request log boundaries
This is particularly useful for:
- Debugging: Troubleshooting API integration issues
- Development: Understanding exactly what clients are sending
- Testing: Verifying request payloads during development
- Monitoring: Tracking API usage patterns
Example DEBUG level output:
```
🐰:server DEBUG 08-13 17:45:49 [server.ts:386] === HTTP Request Log ===
🐰:server DEBUG 08-13 17:45:49 [server.ts:387] Method: POST
🐰:server DEBUG 08-13 17:45:49 [server.ts:388] Path: /v1/chat/completions
🐰:server DEBUG 08-13 17:45:49 [server.ts:389] Headers: {
  "content-type": "application/json",
  "user-agent": "curl/7.68.0"
}
🐰:server DEBUG 08-13 17:45:49 [server.ts:397] Body: {"model":"test","messages":[{"role":"user","content":"Hello"}]}
🐰:server DEBUG 08-13 17:45:49 [server.ts:404] === End Request Log ===
```
Note: Request body logging is only performed for POST requests. GET requests will log method, path, and headers but not body content.
Any string is accepted for the model argument across all endpoints. However, the actual model name
returned in responses is determined by the WR_MODEL environment variable (or the default
Qwen/Qwen2.5-1.5B-Instruct if not set), regardless of what the client requests.
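For example (assuming the server was started with `WR_MODEL="granite-3.1-8b"`), the response echoes the configured name rather than the one the client sent:

```typescript
const res = await fetch("http://localhost:8000/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "literally-anything", // accepted, but not echoed back
    messages: [{ role: "user", content: "Hello" }],
  }),
});

const body = await res.json();
console.log(body.model); // "granite-3.1-8b" (the WR_MODEL value), not "literally-anything"
```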
```sh
# Build the Docker image
docker build -t white-rabbit .

# Run the container
docker run -p 8000:8000 white-rabbit

# Run with custom port
docker run -p 9000:8000 white-rabbit

# Run with custom model name
docker run -p 8000:8000 -e WR_MODEL="granite-3.1-8b" white-rabbit

# Run with custom host and port
docker run -p 8080:8080 \
  -e WR_HOST="0.0.0.0" \
  -e WR_PORT="8080" \
  white-rabbit

# Run with direct file dataset
docker run -p 8000:8000 \
  -e WR_HF_DATASET="https://huggingface.co/datasets/toxigen/toxigen-data/resolve/main/toxigen.csv" \
  -e WR_HF_COLUMN="text" \
  white-rabbit
```

- Multi-stage build: Uses UBI9 as builder base for security and compliance
- Compiled binary: Compiles the Deno application to a single executable binary
- Minimal runtime: Final image uses UBI9 minimal for reduced attack surface
- Non-root user: Runs as a dedicated `whiterabbit` user for security
- Health check: Built-in health check endpoint monitoring
- Optimised layers: Efficient Docker layer caching for faster rebuilds
