| title | Introduction |
|---|---|
| description | M2M Protocol overview, goals, and scope |
This document defines the M2M (Machine-to-Machine) Protocol, a token-optimized compression scheme for Large Language Model (LLM) API traffic. Unlike traditional compression algorithms that reduce bytes but increase token count due to Base64 encoding, M2M Protocol achieves 25-40% token reduction through semantic key abbreviation, value substitution, and default parameter elimination.
This specification defines the wire format, compression mappings, session negotiation, and security considerations for M2M Protocol version 1.0.
LLM APIs charge based on token count, not bytes. Traditional compression algorithms (gzip, brotli, zstd) reduce byte size but produce binary output requiring Base64 encoding, which typically increases token count by 33%.
Example with gzip:
Original: 68 bytes → 42 tokens
Gzip+Base64: 52 bytes → 58 tokens (+38% tokens)
M2M Protocol applies semantic compression that preserves JSON structure while reducing both bytes and tokens:
Original: 68 bytes → 42 tokens
M2M Token: 45 bytes → 29 tokens (-31% tokens)
Key techniques:
- Key abbreviation:
"messages"→"m" - Value substitution:
"assistant"→"a" - Model abbreviation:
"gpt-4o"→"4o" - Default elimination: Remove
"temperature": 1.0(default)
M2M Protocol operates in two modes:
Direct compression/decompression without session establishment:
Client Server
| |
|--- Compressed Request ------->|
|<-- Compressed Response -------|
Full protocol with capability negotiation:
Client Server
| |
|-------- HELLO --------------->|
|<------- ACCEPT ---------------|
| |
|======= DATA (compressed) ====>|
|<====== DATA (compressed) =====|
| |
|-------- CLOSE --------------->|
- Token Reduction: Optimize for LLM tokenizer output, not just bytes
- JSON Compatibility: Compressed output is valid JSON
- Low Latency: Sub-millisecond compression overhead
- Backward Compatibility: Graceful fallback to uncompressed
- Extensibility: Support for new compression algorithms
- Security: Optional threat detection for prompt injection
- General-purpose compression: Optimized specifically for LLM API payloads
- Encryption: Transport security (TLS) is assumed
- Authentication: Delegated to transport layer
- Binary protocols: JSON-based wire format only
| Protocol | Relationship |
|---|---|
| HTTP/1.1, HTTP/2, HTTP/3 | M2M operates over HTTP as transport |
| QUIC (RFC 9000) | Preferred transport for agent-to-agent communication |
| TLS 1.2+ / TLS 1.3 | Required for transport security (built-in with QUIC) |
| JSON (RFC 8259) | Wire format is JSON-compatible |
| OpenAI API | Primary target for compression |
| SSE | Streaming responses supported |
| Section | Contents |
|---|---|
| 01-terminology | Definitions and RFC 2119 keywords |
| 02-wire-format | Message structure and encoding |
| 03-message-types | HELLO, ACCEPT, DATA, etc. |
| 04-compression | Algorithms and mappings |
| 05-session-management | State machine and lifecycle |
| 06-security | Threat model and mitigations |