| 1 | +# Getting Started with BinaryParsing |
| 2 | + |
| 3 | +Get up to speed with a library designed to make parsing binary data safe, efficient, and easy to understand. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The BinaryParsing library provides a comprehensive set of tools for safely parsing binary data in Swift. It includes the ``ParserSpan`` type, a consumable, memory-safe view into binary data, and defines a convention for writing concise, composable parsing functions. |
| 8 | + |
| 9 | +Using the library's tools — including the span type, parser primitives, and operators for working with newly parsed values — you can prevent common pitfalls like buffer overruns, integer overflows, and type confusion that can lead to security vulnerabilities or crashes. |
| 10 | + |
| 11 | +### A span type for parsing |
| 12 | + |
| 13 | +A ``ParserSpan`` is a view into binary data that tracks your current position and the remaining number of bytes. All the provided parsers consume data from the start of the span, shrinking its size as they produce values. Unlike unsafe pointer operations, `ParserSpan` automatically prevents you from reading past the end of your data. |
| 14 | + |
| 15 | +### Library-provided parsers |
| 16 | + |
| 17 | +The library provides parsers for standard library integers, strings, ranges, and arrays of bytes or custom-parsed types. The convention for these is an initializer with an `inout ParserSpan` parameter, along with any other configuration parameters that are required. These parsers throw a `ParsingError` when they encounter memory safety, type safety, or integer overflow errors. |
| 18 | + |
| 19 | +For example, the parsing initializers for `Int` take the parser span as well as storage type or storage size and endianness: |
| 20 | + |
| 21 | +```swift |
| 22 | +let values = try myData.withParserSpan { input in |
| 23 | + let value1 = try Int(parsing: &input, storedAsBigEndian: Int32.self) |
| 24 | + let value2 = try Int(parsing: &input, byteCount: 4, endianness: .big) |
| 25 | +} |
| 26 | +``` |
| 27 | + |
| 28 | +Designing parser APIs as initializers is only a convention. If it feels more natural to write some parsers as free functions, static functions, or even as a parsing type, that's okay! You'll find cases of each of these in the project's [Examples directory][examples]. |
| 29 | + |
| 30 | +## Example: QOI Header |
| 31 | + |
| 32 | +Let's explore BinaryParsing through a real-world example: parsing the header for an image stored in the QOI ([Quite OK Image][qoi]) format. QOI is a simple lossless image format that demonstrates many common patterns in binary parsing. |
| 33 | + |
| 34 | +### The QOI header structure |
| 35 | + |
| 36 | +A QOI file begins with a 14-byte header, as shown in the specification: |
| 37 | + |
| 38 | +```c |
| 39 | +qoi_header { |
| 40 | + char magic[4]; // magic bytes "qoif" |
| 41 | + uint32_t width; // image width in pixels (BE) |
| 42 | + uint32_t height; // image height in pixels (BE) |
| 43 | + uint8_t channels; // 3 = RGB, 4 = RGBA |
| 44 | + uint8_t colorspace; // 0 = sRGB with linear alpha |
| 45 | + // 1 = all channels linear |
| 46 | +}; |
| 47 | +``` |
| 48 | + |
| 49 | +### Parser implementation |
| 50 | + |
| 51 | +Our declaration for the header in Swift corresponds to the specification, with `width` and `height` defined as `Int` and custom enumerations for the channels and colorspace: |
| 52 | + |
| 53 | +```swift |
| 54 | +extension QOI { |
| 55 | + struct Header { |
| 56 | + var width: Int |
| 57 | + var height: Int |
| 58 | + var channels: Channels |
| 59 | + var colorspace: ColorSpace |
| 60 | + } |
| 61 | + |
| 62 | + enum Channels: UInt8 { |
| 63 | + case rgb = 3, rgba = 4 |
| 64 | + } |
| 65 | + |
| 66 | + enum ColorSpace: UInt8 { |
| 67 | + case sRGB = 0, linear = 1 |
| 68 | + } |
| 69 | +} |
| 70 | +``` |
| 71 | + |
| 72 | +The parsing initializer follows the convention set by the library, with an `inout ParserSpan` parameter: |
| 73 | + |
| 74 | +```swift |
| 75 | +extension QOI.Header { |
| 76 | + init(parsing input: inout ParserSpan) throws { |
| 77 | + // Parsing goes here! |
| 78 | + } |
| 79 | +} |
| 80 | +``` |
| 81 | + |
| 82 | +Next, we'll walk through the implementation of that initializer, line by line, to look at the safety and ease of use in the BinaryParsing library APIs. |
| 83 | + |
| 84 | +#### Magic number validation |
| 85 | + |
| 86 | +The first value in the binary data is a "magic number" – a common practice in binary formats that acts as a quick check that you're reading the right kind of file and working with the correct endianness. The code uses a `UInt32` initializer to load a 32-bit big-endian value, and then checks it for correctness using `guard`: |
| 87 | + |
| 88 | +```swift |
| 89 | +let magic = try UInt32(parsingBigEndian: &input) |
| 90 | +guard magic == 0x71_6f_69_66 else { |
| 91 | + throw QOIError() |
| 92 | +} |
| 93 | +``` |
| 94 | + |
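To see where the constant `0x71_6f_69_66` comes from, here is a minimal sketch that folds the ASCII bytes of `"qoif"` into a big-endian 32-bit value. It uses only the Swift standard library, not BinaryParsing, and the names `magicBytes` and `expectedMagic` are illustrative assumptions.

```swift
// Assumption: standard library only; this is not a BinaryParsing API.
// The four ASCII bytes of "qoif" are 0x71, 0x6f, 0x69, 0x66.
let magicBytes = Array("qoif".utf8)

// Big-endian means the first byte is the most significant one,
// so shift the accumulated value left before adding each new byte.
let expectedMagic = magicBytes.reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }

assert(expectedMagic == 0x71_6f_69_66)
```

Reading the same four bytes as little-endian would give a different value, which is why a magic number also works as an endianness check.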
| 95 | +#### Parsing dimensions |
| 96 | + |
| 97 | +Next, the width and height are also stored as 32-bit values, but we want to use them in our type as `Int` values. Instead of parsing `UInt32` values and _then_ converting them to `Int`, we'll use an `Int` parser that specifies the storage type, handling any possible overflow: |
| 98 | + |
| 99 | +```swift |
| 100 | +self.width = try Int(parsing: &input, storedAsBigEndian: UInt32.self) |
| 101 | +self.height = try Int(parsing: &input, storedAsBigEndian: UInt32.self) |
| 102 | +``` |
| 103 | + |
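As a rough illustration of the overflow that a storage-typed parser has to guard against, the sketch below uses only the standard library's failable `init(exactly:)` conversions. The value `stored` is a made-up example, and nothing here is claimed to be how BinaryParsing implements the check.

```swift
// Assumption: standard library only; a stand-in for the check a parser must make.
let stored: UInt32 = 3_000_000_000   // valid as UInt32, too large for Int32

// `init(exactly:)` returns nil instead of trapping when the value doesn't fit.
let narrow = Int32(exactly: stored)
assert(narrow == nil)

// Every UInt32 fits in Int on 64-bit platforms; on 32-bit platforms this can be nil.
if let wide = Int(exactly: stored) {
    print("converted:", wide)
} else {
    print("value does not fit in Int on this platform")
}
```

A plain `Int32(stored)` conversion would trap at runtime for this value, so an unchecked conversion of untrusted input can crash the program.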
| 104 | +#### Parsing `RawRepresentable` types |
| 105 | + |
| 106 | +Because the `Channels` and `ColorSpace` enumerations are backed by a `FixedWidthInteger` type, the library provides parsers that load and validate the parsed values. These parsers throw an error if the parsed value isn't one of the type's declared cases: |
| 107 | + |
| 108 | +```swift |
| 109 | +self.channels = try Channels(parsing: &input) |
| 110 | +self.colorspace = try ColorSpace(parsing: &input) |
| 111 | +``` |
| 112 | + |
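The validation described above relies on how raw-value enumerations behave in the standard library: `init(rawValue:)` is failable and returns `nil` for any value that isn't a declared case. This sketch redeclares `Channels` outside the `QOI` namespace so that it stands alone. It does not use the library's parsing initializer.

```swift
// Assumption: standard library only; mirrors the article's `Channels` enumeration.
enum Channels: UInt8 {
    case rgb = 3, rgba = 4
}

// A declared raw value maps to its case.
assert(Channels(rawValue: 4) == .rgba)

// Any other byte is rejected, which is the condition a validating parser
// would turn into a thrown error.
assert(Channels(rawValue: 5) == nil)
```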
| 113 | +#### Safe arithmetic |
| 114 | + |
| 115 | +After parsing all of the header's values, the last step is to validate them: both dimensions must be positive, and their product must not exceed `maxPixelCount`. Using the library's optional multiplication operator (`*?`) allows for concise arithmetic while preventing integer overflow errors: |
| 116 | + |
| 117 | +```swift |
| 118 | +guard let pixelCount = width *? height, |
| 119 | + pixelCount <= maxPixelCount, |
| 120 | + width > 0, height > 0 |
| 121 | +else { throw QOIError() } |
| 122 | +``` |
| 123 | + |
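For comparison, an overflow-checked multiplication can be written with the standard library alone. The helper `checkedProduct` below is hypothetical and only approximates the idea behind `*?`. It is not the library's implementation.

```swift
// Hypothetical helper; an approximation of overflow-checked multiplication.
func checkedProduct(_ lhs: Int, _ rhs: Int) -> Int? {
    // `multipliedReportingOverflow(by:)` returns the wrapped result
    // together with a flag that says whether overflow occurred.
    let (product, overflow) = lhs.multipliedReportingOverflow(by: rhs)
    return overflow ? nil : product
}

assert(checkedProduct(640, 480) == 307_200)
assert(checkedProduct(Int.max, 2) == nil)
```

Returning an optional lets the result feed directly into `guard let`, so an overflowing product fails the validation instead of trapping.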
| 124 | +### Bringing it together |
| 125 | + |
| 126 | +The full parser implementation, as shown below, protects against buffer overruns, integer overflow, arithmetic overflow, type invalidity, and pointer lifetime errors: |
| 127 | + |
| 128 | +```swift |
| 129 | +extension QOI.Header { |
| 130 | + init(parsing input: inout ParserSpan) throws { |
| 131 | + let magic = try UInt32(parsingBigEndian: &input) |
| 132 | + guard magic == 0x71_6f_69_66 else { |
| 133 | + throw QOIError() |
| 134 | + } |
| 135 | + |
| 136 | + self.width = try Int(parsing: &input, storedAsBigEndian: UInt32.self) |
| 137 | + self.height = try Int(parsing: &input, storedAsBigEndian: UInt32.self) |
| 138 | + self.channels = try Channels(parsing: &input) |
| 139 | + self.colorspace = try ColorSpace(parsing: &input) |
| 140 | + |
| 141 | + guard let pixelCount = width *? height, |
| 142 | + pixelCount <= maxPixelCount, |
| 143 | + width > 0, height > 0 |
| 144 | + else { throw QOIError() } |
| 145 | + } |
| 146 | +} |
| 147 | +``` |
| 148 | + |
| 149 | +[qoi]: https://qoiformat.org/ |
| 150 | +[examples]: https://github.com/apple/swift-binary-parsing/tree/main/Examples |