Commit b64cc46

Doc: add serialization format description
1 parent 5dd874c commit b64cc46

1 file changed

+311
-0
lines changed

docs/serialization-format.md

Lines changed: 311 additions & 0 deletions
@@ -0,0 +1,311 @@
# Neo Serialization Format

This document describes the binary serialization format used by the Neo blockchain platform. The format is designed for efficient serialization and deserialization of blockchain data structures.

## Overview

Neo uses a custom binary serialization format that supports:
- Primitive data types (integers, booleans, bytes)
- Variable-length integers (VarInt)
- Strings (fixed and variable length)
- Arrays and collections
- Custom serializable objects
- Nullable objects

## Core Interfaces

### ISerializable

All serializable objects in Neo implement the `ISerializable` interface:

```csharp
public interface ISerializable
{
    int Size { get; }

    void Serialize(BinaryWriter writer);

    void Deserialize(ref MemoryReader reader);
}
```

- `Size`: Returns the serialized size in bytes
- `Serialize`: Writes the object to a `BinaryWriter`
- `Deserialize`: Reads the object from a `MemoryReader`

## Primitive Data Types

### Integers

Neo supports both little-endian and big-endian integer formats. Little-endian is the default for multi-byte types:

| Type | Size | Endianness | Description |
|------|------|------------|-------------|
| `sbyte` | 1 byte | N/A | Signed 8-bit integer |
| `byte` | 1 byte | N/A | Unsigned 8-bit integer |
| `short` | 2 bytes | Little-endian | Signed 16-bit integer |
| `ushort` | 2 bytes | Little-endian | Unsigned 16-bit integer |
| `int` | 4 bytes | Little-endian | Signed 32-bit integer |
| `uint` | 4 bytes | Little-endian | Unsigned 32-bit integer |
| `long` | 8 bytes | Little-endian | Signed 64-bit integer |
| `ulong` | 8 bytes | Little-endian | Unsigned 64-bit integer |

Big-endian variants are available for `short`, `ushort`, `int`, `uint`, `long`, and `ulong`.
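
As a minimal sketch of what the two byte orders mean on the wire, the following encodes the same `uint` both ways using `System.Buffers.Binary.BinaryPrimitives` from the .NET base library. The `EndianDemo` class and its method names are hypothetical and are not part of Neo's API; Neo's own writer extensions are not shown in this document.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration only; not part of the Neo API.
public static class EndianDemo
{
    // Encodes a 32-bit unsigned integer with the least-significant byte first.
    public static byte[] ToLittleEndian(uint value)
    {
        var buffer = new byte[sizeof(uint)];
        BinaryPrimitives.WriteUInt32LittleEndian(buffer, value);
        return buffer;
    }

    // Encodes the same integer with the most-significant byte first.
    public static byte[] ToBigEndian(uint value)
    {
        var buffer = new byte[sizeof(uint)];
        BinaryPrimitives.WriteUInt32BigEndian(buffer, value);
        return buffer;
    }
}
```

For `0x12345678`, `ToLittleEndian` yields the bytes `78 56 34 12` and `ToBigEndian` yields `12 34 56 78`.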

### Boolean

Booleans are serialized as single bytes:
- `false` → `0x00`
- `true` → `0x01`
- Any other value throws `FormatException`
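
The rule above can be sketched as a small decoder. `StrictBoolean.Decode` is a hypothetical name chosen for this example, not a Neo method; it only illustrates that bytes other than `0x00` and `0x01` are rejected rather than treated as `true`.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Neo API.
public static class StrictBoolean
{
    // Maps 0x00 to false and 0x01 to true; every other byte is a format error.
    public static bool Decode(byte b) => b switch
    {
        0x00 => false,
        0x01 => true,
        _ => throw new FormatException($"Invalid boolean byte: 0x{b:X2}")
    };
}
```

Note that this is stricter than `System.IO.BinaryReader.ReadBoolean`, which treats any non-zero byte as `true`.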

### Variable-Length Integers (VarInt)

Neo uses a compact variable-length integer format. The first byte is either the value itself or a prefix that selects the width of the little-endian integer that follows:

| Value Range | Format | Size |
|-------------|--------|------|
| 0-252 | Direct value | 1 byte |
| 253-65535 | `0xFD` + 2-byte little-endian | 3 bytes |
| 65536-4294967295 | `0xFE` + 4-byte little-endian | 5 bytes |
| 4294967296+ | `0xFF` + 8-byte little-endian | 9 bytes |

**Serialization:**
```csharp
// `value` is a ulong; the smallest encoding that fits is chosen.
if (value < 0xFD)
{
    writer.Write((byte)value);
}
else if (value <= 0xFFFF)
{
    writer.Write((byte)0xFD);
    writer.Write((ushort)value);
}
else if (value <= 0xFFFFFFFF)
{
    writer.Write((byte)0xFE);
    writer.Write((uint)value);
}
else
{
    writer.Write((byte)0xFF);
    writer.Write(value);
}
```

**Deserialization:**
```csharp
// The first byte is either the value itself or a width prefix.
var b = ReadByte();
var value = b switch
{
    0xfd => ReadUInt16(),
    0xfe => ReadUInt32(),
    0xff => ReadUInt64(),
    _ => b
};
```
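
To make the table concrete, here is a self-contained round-trip sketch built on `System.IO.BinaryWriter` and `BinaryReader`, both of which use little-endian byte order. `VarIntSketch` is a hypothetical class written for this example; it mirrors the logic shown above but is not Neo's implementation and performs no upper-bound check.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not part of the Neo API.
public static class VarIntSketch
{
    // Encodes a value using the shortest form from the VarInt table.
    public static byte[] Encode(ulong value)
    {
        using var stream = new MemoryStream();
        using var writer = new BinaryWriter(stream);
        if (value < 0xFD)
        {
            writer.Write((byte)value);
        }
        else if (value <= 0xFFFF)
        {
            writer.Write((byte)0xFD);
            writer.Write((ushort)value);
        }
        else if (value <= 0xFFFFFFFF)
        {
            writer.Write((byte)0xFE);
            writer.Write((uint)value);
        }
        else
        {
            writer.Write((byte)0xFF);
            writer.Write(value);
        }
        writer.Flush();
        return stream.ToArray();
    }

    // Decodes a VarInt from the start of the buffer.
    public static ulong Decode(byte[] data)
    {
        using var reader = new BinaryReader(new MemoryStream(data));
        var prefix = reader.ReadByte();
        return prefix switch
        {
            0xFD => reader.ReadUInt16(),
            0xFE => reader.ReadUInt32(),
            0xFF => reader.ReadUInt64(),
            _ => prefix
        };
    }
}
```

With this sketch, `Encode(252)` produces `FC`, `Encode(253)` produces `FD FD 00`, and `Encode(65536)` produces `FE 00 00 01 00`.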

## Strings

### Fixed-Length Strings

Fixed-length strings occupy exactly `length` bytes. The UTF-8 bytes come first, and any remaining space is padded with null bytes:

**Format:** `[UTF-8 bytes][zero padding]`

**Serialization:**
```csharp
var bytes = value.ToStrictUtf8Bytes();
// The encoded string must fit in the fixed-size field.
if (bytes.Length > length)
    throw new ArgumentException();
writer.Write(bytes);
// Fill the rest of the field with zero bytes.
if (bytes.Length < length)
    writer.Write(new byte[length - bytes.Length]);
```

**Deserialization:**
```csharp
var end = currentOffset + length;
var offset = currentOffset;
// Scan to the first null byte (or the end of the field).
while (offset < end && _span[offset] != 0) offset++;
var data = _span[currentOffset..offset];
// Everything after the first null byte must be zero padding.
for (; offset < end; offset++)
    if (_span[offset] != 0)
        throw new FormatException();
currentOffset = end;
return data.ToStrictUtf8String();
```
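
The fixed-length layout can be tried in isolation with the sketch below. `FixedStringSketch` is a hypothetical class for this example, and it uses a .NET `UTF8Encoding` configured to throw on invalid data as a stand-in for Neo's strict UTF-8 helpers, whose definition is not shown here.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of the Neo API.
public static class FixedStringSketch
{
    // throwOnInvalidBytes: true makes the encoding reject malformed data.
    private static readonly UTF8Encoding Strict = new UTF8Encoding(false, true);

    // Writes the UTF-8 bytes followed by zero padding up to `length` bytes.
    public static byte[] Encode(string value, int length)
    {
        var bytes = Strict.GetBytes(value);
        if (bytes.Length > length)
            throw new ArgumentException("String does not fit in the field.", nameof(value));
        var field = new byte[length]; // zero-initialized, so the tail is padding
        bytes.CopyTo(field, 0);
        return field;
    }

    // Reads up to the first null byte and verifies the rest is padding.
    public static string Decode(ReadOnlySpan<byte> field)
    {
        var end = field.IndexOf((byte)0);
        if (end < 0) end = field.Length;
        foreach (var b in field[end..])
            if (b != 0)
                throw new FormatException("Non-zero byte found inside the padding.");
        return Strict.GetString(field[..end]);
    }
}
```

For example, `Encode("neo", 5)` produces `6E 65 6F 00 00`, while decoding `6E 00 6F` fails with `FormatException` because a non-zero byte follows the first null byte.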

### Variable-Length Strings

Variable-length strings are prefixed with the length of their UTF-8 encoding in bytes, written as a VarInt:

**Format:** `[VarInt length][UTF-8 bytes]`

**Serialization:**
```csharp
// The prefix counts encoded UTF-8 bytes, not UTF-16 chars (value.Length),
// so encode first and measure the result.
var bytes = value.ToStrictUtf8Bytes();
writer.WriteVarInt(bytes.Length);
writer.Write(bytes);
```

**Deserialization:**
```csharp
// `max` bounds the accepted byte length.
var length = (int)ReadVarInt((ulong)max);
EnsurePosition(length);
var data = _span.Slice(currentOffset, length);
currentOffset += length;
return data.ToStrictUtf8String();
```
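
The following sketch illustrates that the length prefix counts bytes rather than characters. `VarStringSketch` is a hypothetical class for this example; as a stated simplification it only supports payloads shorter than 253 bytes, so the VarInt prefix is always a single byte.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of the Neo API.
public static class VarStringSketch
{
    private static readonly UTF8Encoding Strict = new UTF8Encoding(false, true);

    // Simplification: only the 1-byte VarInt form (length < 0xFD) is handled.
    public static byte[] Encode(string value)
    {
        var bytes = Strict.GetBytes(value);
        if (bytes.Length >= 0xFD)
            throw new NotSupportedException("Sketch supports payloads under 253 bytes.");
        var result = new byte[1 + bytes.Length];
        result[0] = (byte)bytes.Length; // byte count, not char count
        bytes.CopyTo(result, 1);
        return result;
    }

    public static string Decode(ReadOnlySpan<byte> data)
    {
        if (data.IsEmpty)
            throw new FormatException("Missing length prefix.");
        int length = data[0];
        if (length >= 0xFD)
            throw new NotSupportedException("Sketch supports only the 1-byte length form.");
        if (data.Length - 1 < length)
            throw new FormatException("Truncated string payload.");
        return Strict.GetString(data.Slice(1, length));
    }
}
```

The string `"h\u00E9llo"` has 5 characters but 6 UTF-8 bytes, so it encodes to `06 68 C3 A9 6C 6C 6F`.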

## Byte Arrays

### Fixed-Length Byte Arrays

**Format:** `[raw bytes]`

### Variable-Length Byte Arrays

**Format:** `[VarInt length][raw bytes]`

**Serialization:**
```csharp
writer.WriteVarInt(value.Length);
writer.Write(value);
```

**Deserialization:**
```csharp
return ReadMemory((int)ReadVarInt((ulong)max));
```

## Collections

### Serializable Arrays

**Format:** `[VarInt count][item1][item2]...[itemN]`

**Serialization:**
```csharp
writer.WriteVarInt(value.Count);
foreach (T item in value)
{
    item.Serialize(writer);
}
```

**Deserialization:**
```csharp
// `max` bounds the accepted element count.
var array = new T[reader.ReadVarInt((ulong)max)];
for (var i = 0; i < array.Length; i++)
{
    array[i] = new T();
    array[i].Deserialize(ref reader);
}
return array;
```

### Nullable Arrays

Each element is preceded by a boolean flag that is `true` when the element is present; a null element consists of the flag alone.

**Format:** `[VarInt count][bool1][item1?][bool2][item2?]...[boolN][itemN?]`

**Serialization:**
```csharp
writer.WriteVarInt(value.Length);
foreach (var item in value)
{
    var isNull = item is null;
    // The flag is true when an item follows.
    writer.Write(!isNull);
    if (isNull) continue;
    item!.Serialize(writer);
}
```

**Deserialization:**
```csharp
var array = new T[reader.ReadVarInt((ulong)max)];
for (var i = 0; i < array.Length; i++)
    array[i] = reader.ReadBoolean() ? reader.ReadSerializable<T>() : null;
return array;
```
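
The byte layout of a nullable array can be sketched without any Neo types. In this hypothetical `NullableArraySketch`, a 4-byte little-endian `int` stands in for a serializable item, and the element count is restricted to the 1-byte VarInt form; both are simplifications made for this example only.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; Neo's real helpers operate
// on ISerializable items rather than on int?.
public static class NullableArraySketch
{
    public static byte[] Encode(int?[] items)
    {
        // Simplification: only counts that fit the 1-byte VarInt form.
        if (items.Length >= 0xFD)
            throw new NotSupportedException("Sketch supports fewer than 253 items.");

        using var stream = new MemoryStream();
        using var writer = new BinaryWriter(stream);
        writer.Write((byte)items.Length);
        foreach (var item in items)
        {
            writer.Write(item.HasValue);                 // presence flag: 0x01 or 0x00
            if (item.HasValue) writer.Write(item.Value); // 4 bytes, little-endian
        }
        writer.Flush();
        return stream.ToArray();
    }

    public static int?[] Decode(byte[] data)
    {
        using var reader = new BinaryReader(new MemoryStream(data));
        var count = reader.ReadByte();
        if (count >= 0xFD)
            throw new NotSupportedException("Sketch supports only the 1-byte count form.");

        var items = new int?[count];
        for (var i = 0; i < items.Length; i++)
        {
            // Read the flag as a raw byte so values other than 0/1 are rejected.
            var flag = reader.ReadByte();
            if (flag > 1)
                throw new FormatException("Presence flag must be 0x00 or 0x01.");
            items[i] = flag == 1 ? reader.ReadInt32() : (int?)null;
        }
        return items;
    }
}
```

Encoding `{ 7, null }` produces `02 01 07 00 00 00 00`: the count, a `true` flag followed by the item, then a lone `false` flag for the null element.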

## UTF-8 Encoding

Neo uses strict UTF-8 encoding with the following characteristics:

- **Strict Mode**: Invalid UTF-8 sequences throw exceptions
- **No Fallback**: No replacement characters for invalid sequences
- **Exception Handling**: Detailed error messages for debugging

**String to Bytes:**
```csharp
public static byte[] ToStrictUtf8Bytes(this string value)
{
    return StrictUTF8.GetBytes(value);
}
```

**Bytes to String:**
```csharp
public static string ToStrictUtf8String(this ReadOnlySpan<byte> value)
{
    return StrictUTF8.GetString(value);
}
```
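
The definition of `StrictUTF8` is not shown in this document. One way to obtain comparable behavior in plain .NET, offered here as an assumption rather than as Neo's actual definition, is a `UTF8Encoding` constructed with `throwOnInvalidBytes: true`. The `StrictUtf8Sketch` class and its `IsValidUtf8` method are hypothetical names for this example.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not Neo's StrictUTF8 definition.
public static class StrictUtf8Sketch
{
    // No byte-order mark on output; invalid data raises an exception
    // instead of being replaced with U+FFFD.
    public static readonly Encoding Strict =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Returns true when the bytes form well-formed UTF-8.
    public static bool IsValidUtf8(ReadOnlySpan<byte> data)
    {
        try
        {
            Strict.GetCharCount(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

With this encoding, the bytes `C3 28` are rejected on decode, and a string containing a lone surrogate such as `"\uD800"` is rejected on encode.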

## Error Handling

Serialization and deserialization report problems through the following exception types:

- **FormatException**: Invalid data format or corrupted data
- **ArgumentNullException**: Null values where not allowed
- **ArgumentException**: Invalid arguments (e.g., string too long)
- **ArgumentOutOfRangeException**: Values outside allowed ranges
- **DecoderFallbackException**: Invalid UTF-8 sequences
- **EncoderFallbackException**: Characters that cannot be encoded

## Examples

### Simple Object Serialization

```csharp
public class SimpleData : ISerializable
{
    public string Name { get; set; }
    public int Value { get; set; }

    public int Size
    {
        get
        {
            // WriteVarString emits a VarInt length prefix before the UTF-8
            // bytes, so the prefix is counted along with the payload.
            var nameBytes = Name.GetStrictUtf8ByteCount();
            return GetVarSize(nameBytes) + nameBytes + sizeof(int);
        }
    }

    public void Serialize(BinaryWriter writer)
    {
        writer.WriteVarString(Name);
        writer.Write(Value);
    }

    public void Deserialize(ref MemoryReader reader)
    {
        Name = reader.ReadVarString();
        Value = reader.ReadInt32();
    }
}
```

### Array Serialization

```csharp
public class DataArray : ISerializable
{
    public SimpleData[] Items { get; set; }

    // VarInt count prefix plus the size of every item.
    public int Size => Items.Sum(item => item.Size) + GetVarSize(Items.Length);

    public void Serialize(BinaryWriter writer)
    {
        writer.Write(Items);
    }

    public void Deserialize(ref MemoryReader reader)
    {
        Items = reader.ReadSerializableArray<SimpleData>();
    }
}
```
