|
| 1 | +# Overflow |
| 2 | + |
| 3 | +To convert a quantity in a program to different units, we need to multiply or divide by a conversion |
| 4 | +factor. Sometimes, the result is too big to fit in the type: a problem known as _overflow_. |
| 5 | + |
| 6 | +Units libraries generate these conversion factors automatically when the program is built, and apply |
| 7 | +them invisibly. This amazing convenience comes with a risk: since users don't see the conversion |
| 8 | +factors, it's easy to overlook the multiplication that's taking place under the hood. This is even |
| 9 | +more true in certain "hidden" conversions, where most users don't even realize that a conversion is |
| 10 | +taking place! |
| 11 | + |
| 12 | +## Hidden overflow risks |
| 13 | + |
| 14 | +Consider this comparison: |
| 15 | + |
| 16 | +```cpp |
| 17 | +constexpr bool result = (meters(11) > yards(12)); |
| 18 | +``` |
| 19 | +
|
| 20 | +Even though the quantities have different units, this code compiles and produces a correct result. |
| 21 | +It turns out that `meters(11)` is roughly 0.2% larger than `yards(12)`, so `result` is `true`. But |
| 22 | +how exactly do we compute that result from these starting numeric values of `11` and `12`? |
| 23 | +
|
| 24 | +The key is to understand that _comparison_ is a [_common unit |
| 25 | +operation_](./arithmetic.md#common-unit). Before we can carry it out, we must convert both inputs |
| 26 | +to their [_common unit_](./common_unit.md) --- that is, the largest unit that evenly divides both |
| 27 | +`meters` and `yards`. In this case, the size of that unit is 800 micrometers, giving a conversion |
| 28 | +factor of 1250 for `meters`, and 1143 for `yards`. The library multiplies the underlying values 11 |
| 29 | +and 12 by these respective factors, and then simply compares the results. |
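The arithmetic above can be reproduced with plain integers. This is a minimal sketch rather than the library's implementation: it assumes C++17 (for `std::gcd`), and every name in it (`kMicrometersPerMeter`, `kCommonUnit`, `meters_exceed_yards`, and so on) is hypothetical.

```cpp
#include <cstdint>
#include <numeric>  // std::gcd (C++17)

// Size of each unit in micrometers (1 yd is exactly 0.9144 m).
constexpr std::int64_t kMicrometersPerMeter = 1'000'000;
constexpr std::int64_t kMicrometersPerYard = 914'400;

// The common unit is the largest unit that evenly divides both inputs.
constexpr std::int64_t kCommonUnit = std::gcd(kMicrometersPerMeter, kMicrometersPerYard);

// Conversion factors from each unit to the common unit.
constexpr std::int64_t kMeterFactor = kMicrometersPerMeter / kCommonUnit;
constexpr std::int64_t kYardFactor = kMicrometersPerYard / kCommonUnit;

// Compare meters against yards by scaling both values to the common unit.
constexpr bool meters_exceed_yards(std::int64_t m, std::int64_t yd) {
    return m * kMeterFactor > yd * kYardFactor;
}

static_assert(kCommonUnit == 800, "the common unit is 800 micrometers");
static_assert(kMeterFactor == 1250 && kYardFactor == 1143, "factors quoted in the text");
static_assert(meters_exceed_yards(11, 12), "11 * 1250 = 13750 exceeds 12 * 1143 = 13716");
```

The sketch uses 64-bit integers so that the multiplications have plenty of headroom; with a narrower type, the same two multiplications are where overflow would occur.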
| 30 | +
|
| 31 | +Now that we have a fuller understanding of what's going on under the hood, let's take another look |
| 32 | +at the code. When we see something like `meters(11) > yards(12)`, it's certainly not obvious at |
| 33 | +a glance that this will multiply each underlying value by a factor of over 1,000! Whatever approach |
| 34 | +we take to mitigating overflow risk, it will need to handle these kinds of "hidden" cases as well. |
| 35 | +
|
| 36 | +## Mitigation strategies
| 37 | +
|
| 38 | +Over the decades that people have been writing units libraries, several approaches have emerged for |
| 39 | +dealing with this category of risk. That said, there isn't a consensus about the best approach to |
| 40 | +take --- in fact, at the time of writing, new strategies are still being developed and tested! |
| 41 | +
|
| 42 | +It's also worth noting that this problem mainly applies to integral types. Floating point types can |
| 43 | +overflow too, but it happens far less often in practice. Even the smallest, `float`, can represent
| 44 | +magnitudes of more than $10^{38}$, while the diameter of the observable universe measured in atomic
| 45 | +diameters is "only" about $10^{37}$![^1]
| 46 | +
|
| 47 | +[^1]: Here, we take the radius of the observable universe as 46.6 billion light years, and the |
| 48 | + diameter of a hydrogen atom as 0.1 nanometers. |
| 49 | +
|
| 50 | +Of course, many domains prefer the simplicity and interpretability of integral types. This avoids |
| 51 | +some of the more counterintuitive aspects of floating point arithmetic --- for example, did you know |
| 52 | +that the difference between consecutive representable `double` values can be greater than |
| 53 | +$10^{292}$? With integers, we can bypass all this complexity, but the price we pay is the need to |
| 54 | +handle overflow. Here are the main strategies we've seen for doing so. |
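The floating point figures quoted above can be checked at compile time. This is a sketch that assumes IEEE 754 `float` and `double`; the helper `pow2` and the constant names are hypothetical.

```cpp
#include <limits>

// Returns 2^n for n >= 0. Powers of two are exactly representable, so no rounding occurs.
constexpr double pow2(int n) {
    double result = 1.0;
    for (int i = 0; i < n; ++i) {
        result *= 2.0;
    }
    return result;
}

constexpr float kMaxFloat = std::numeric_limits<float>::max();
constexpr double kMaxDouble = std::numeric_limits<double>::max();

// Spacing between adjacent doubles just below the maximum: 2^(max_exponent - digits).
// For IEEE 754 doubles this is 2^(1024 - 53) = 2^971, roughly 2 x 10^292.
constexpr double kTopSpacing =
    pow2(std::numeric_limits<double>::max_exponent - std::numeric_limits<double>::digits);

static_assert(std::numeric_limits<double>::is_iec559, "this sketch assumes IEEE 754 doubles");
static_assert(kMaxFloat > 1e38f, "float reaches beyond 10^38");
static_assert(kTopSpacing > 1e292, "neighboring doubles can be more than 10^292 apart");
static_assert(kMaxDouble - kTopSpacing < kMaxDouble, "one full step down is a different value");
static_assert(kMaxDouble - kTopSpacing / 4 == kMaxDouble, "a quarter step is rounded away");
```

The last assertion is the counterintuitive part: subtracting a number of about $5 \times 10^{291}$ from the largest `double` leaves it unchanged.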
| 55 | +
|
| 56 | +### Do nothing |
| 57 | +
|
| 58 | +This is the simplest approach, and probably also the most popular: make the users responsible for |
| 59 | +avoiding overflow. The documentation may simply warn them to check their values ahead of time, as |
| 60 | +in this [example from the bernedom/SI |
| 61 | +library](https://github.com/bernedom/SI/blob/main/doc/implementation-details.md#implicit-ratio-conversion--possible-loss-of-precision). |
| 62 | +
|
| 63 | +While this approach is perfectly valid, it does put a lot of responsibility onto the end users, many |
| 64 | +of whom may not realize that they have incurred it. Even for those who do, we've seen above that |
| 65 | +many unit conversions are hard to spot. It's reasonable to assume that this approach leads to the |
| 66 | +highest incidence of overflow bugs. |
| 67 | +
|
| 68 | +### Curate user-facing types |
| 69 | +
|
| 70 | +The [`std::chrono`](https://en.cppreference.com/w/cpp/chrono/duration) library, a time-only units |
| 71 | +library, takes a different approach. It uses intimate knowledge of the domain to craft its |
| 72 | +user-facing types such that they all cover the same (very generous) range of values. Specifically, |
| 73 | +every predefined `std::chrono::duration` type with a unit shorter than a day --- everything from `std::chrono::hours`, all
| 74 | +the way down to `std::chrono::nanoseconds` --- is guaranteed to be able to represent _at least_ ±292 |
| 75 | +years. |
| 76 | +
|
| 77 | +As long as users' durations are within this range --- _and_, as long as they _stick to these primary |
| 78 | +user-facing types_ --- they can be confident that their values won't overflow. |
| 79 | +
|
| 80 | +This approach works very well in practice for the (great many) users who can meet both of these |
| 81 | +conditions. However, it doesn't translate well to a _multi-dimensional_ units library: since there |
| 82 | +are many dimensions, and new ones can be created on the fly, it's infeasible to try to define |
| 83 | +a "practical range" for _all_ of them. Besides, users can still form arbitrary |
| 84 | +`std::chrono::duration` types, and they may not realize the safety they have given up in doing so. |
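Both halves of this observation can be checked at compile time. The sketch below assumes C++17 and counts a year as 8766 hours (365.25 days); `Hours64`, `covers_292_years`, and `Nanos32` are hypothetical names introduced for illustration.

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>

// Hours with a 64-bit rep, so that casting any predefined duration to it cannot narrow.
using Hours64 = std::chrono::duration<std::int64_t, std::ratio<3600>>;

// 292 years, counting 8766 hours (365.25 days) per year.
constexpr std::int64_t kHoursIn292Years = 292 * 8766;

// True if the largest value of duration type D is at least 292 years.
template <typename D>
constexpr bool covers_292_years() {
    return std::chrono::duration_cast<Hours64>(D::max()).count() >= kHoursIn292Years;
}

// The predefined types with units shorter than a day all clear the bar.
static_assert(covers_292_years<std::chrono::nanoseconds>(), "nanoseconds");
static_assert(covers_292_years<std::chrono::microseconds>(), "microseconds");
static_assert(covers_292_years<std::chrono::milliseconds>(), "milliseconds");
static_assert(covers_292_years<std::chrono::seconds>(), "seconds");
static_assert(covers_292_years<std::chrono::minutes>(), "minutes");
static_assert(covers_292_years<std::chrono::hours>(), "hours");

// A user-formed duration type gives up that range: 32-bit nanoseconds top out near 2 seconds.
using Nanos32 = std::chrono::duration<std::int32_t, std::nano>;
static_assert(!covers_292_years<Nanos32>(), "32-bit nanoseconds cannot cover 292 years");
static_assert(std::chrono::duration_cast<std::chrono::seconds>(Nanos32::max()).count() == 2,
              "the largest Nanos32 value is just over 2 seconds");
```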
| 85 | +
|
| 86 | +### Adapt to risk |
| 87 | +
|
| 88 | +Fundamentally, there are two contributions to the level of overflow risk: |
| 89 | +
|
| 90 | +1. The _size of the conversion factor_: **bigger factors** mean **more risk**.[^2] |
| 91 | +
|
| 92 | +2. The _largest representable value in the destination type_: **larger max values** mean **less |
| 93 | + risk**. |
| 94 | +
|
| 95 | +[^2]: Note that we're implicitly assuming that the conversion factor is simply an integer. This is |
| 96 | +always true for the cases discussed in this section, because we're talking about converting quantity |
| 97 | +types with integral rep. If the conversion factor were _not_ an integer, then we would already |
| 98 | +forbid this conversion due to _truncation_, so we wouldn't need to bother considering overflow. |
| 99 | +
|
| 100 | +Therefore, we should be able to create an _adaptive policy_ that takes these factors into account. |
| 101 | +The key concept is the "smallest overflowing value". For every combination of "conversion factor" |
| 102 | +and "type," there is some smallest starting-value that will overflow. The simplest adaptive policy |
| 103 | +is to forbid conversions when that smallest value is "small enough to be scary". |
| 104 | +
|
| 105 | +How small is "scary"? Here are some considerations. |
| 106 | +
|
| 107 | +- Once our values get over 1,000, we can consider switching to a larger SI-prefixed version of the |
| 108 | + unit. (For example, lengths over $1000\,\text{m}$ can be more concisely expressed in |
| 109 | + $\text{km}$.) This means that if a value as small as 1,000 would overflow --- so small that we |
| 110 | + haven't even _reached_ the next unit --- we should _definitely_ forbid the conversion. |
| 111 | +
|
| 112 | +- On the other hand, we've found it useful to initialize, say, `QuantityI32<Hertz>` variables with |
| 113 | + something like `mega(hertz)(500)`. Thus, we'd like this operation to succeed (although it should |
| 114 | + probably be near the border of what's allowed). |
| 115 | +
|
| 116 | +Putting it all together, we settled on [a value threshold of 2'147][threshold]. If we can convert |
| 117 | +this value without overflow, then we permit the operation; otherwise, we don't. We picked this |
| 118 | +value because it satisfies our above criteria nicely. It will prevent operations that can't handle |
| 119 | +values of 1,000, but it still lets us use $\text{MHz}$ freely when storing $\text{Hz}$ quantities in |
| 120 | +`int32_t`. |
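One way to express this rule in code is shown below. It is a simplified sketch, not Au's implementation: it only considers positive values and integer factors of at least 1, and the names `largest_safe_value` and `permits_conversion` are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Threshold quoted in the text: a conversion is permitted only if this value survives it.
constexpr std::uintmax_t kThreshold = 2'147;

// Largest starting value that can be multiplied by `factor` without exceeding T's maximum.
// The "smallest overflowing value" is one more than this. Requires factor >= 1.
template <typename T>
constexpr std::uintmax_t largest_safe_value(std::uintmax_t factor) {
    return static_cast<std::uintmax_t>(std::numeric_limits<T>::max()) / factor;
}

// Permit the conversion only if the threshold value itself converts without overflow.
template <typename T>
constexpr bool permits_conversion(std::uintmax_t factor) {
    return largest_safe_value<T>(factor) >= kThreshold;
}

// MHz to Hz in int32_t: 2'147 * 1'000'000 fits, so the conversion is (just barely) allowed.
static_assert(largest_safe_value<std::int32_t>(1'000'000) == 2'147, "right at the border");
static_assert(permits_conversion<std::int32_t>(1'000'000), "mega fits in int32_t");

// GHz to Hz in int32_t: even a value of 3 would overflow, so the conversion is forbidden.
static_assert(!permits_conversion<std::int32_t>(1'000'000'000), "giga does not fit in int32_t");
```

Under this sketch, the MHz-in-`int32_t` case sits exactly on the border, since 2,147 is also the largest value that survives a factor of one million.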
| 121 | +
|
| 122 | +#### Plot: the Overflow Safety Surface |
| 123 | +
|
| 124 | +This policy lends itself well to visualization. For each integral type, there is some _highest |
| 125 | +permitted conversion factor_ under this policy. We can plot these factors for each of the common |
| 126 | +integral types (`int8_t`, `uint32_t`, and so on). If we then "connect the dots", we get a boundary |
| 127 | +that separates allowed conversions from forbidden ones, permitting bigger conversions for bigger |
| 128 | +types. We call this abstract boundary the **"overflow safety surface"**, and it's the secret |
| 129 | +ingredient that lets Au users use a wide variety of integral types with confidence. |
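The points on such a plot can be computed directly. The following sketch assumes the simplified rule "the threshold value times the factor must not exceed the type's maximum"; `highest_permitted_factor` is a hypothetical name, and Au's real policy may treat some cases differently (for 8-bit types the threshold itself does not fit, so this sketch simply yields 0).

```cpp
#include <cstdint>
#include <limits>

// Threshold quoted in the text.
constexpr std::uintmax_t kThreshold = 2'147;

// Largest integer factor f for which kThreshold * f still fits in T.
template <typename T>
constexpr std::uintmax_t highest_permitted_factor() {
    return static_cast<std::uintmax_t>(std::numeric_limits<T>::max()) / kThreshold;
}

static_assert(highest_permitted_factor<std::int16_t>() == 15, "int16_t");
static_assert(highest_permitted_factor<std::uint16_t>() == 30, "uint16_t");
static_assert(highest_permitted_factor<std::int32_t>() == 1'000'225, "int32_t");
static_assert(highest_permitted_factor<std::uint32_t>() == 2'000'450, "uint32_t");
static_assert(highest_permitted_factor<std::int64_t>() > 1'000'000'000'000'000, "int64_t");
```

Each doubling of the type's width pushes the boundary out by many orders of magnitude, which is why bigger types admit bigger conversions.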
| 130 | +
|
| 131 | + |
| 132 | +
|
| 133 | +### Check every conversion at runtime |
| 134 | +
|
| 135 | +While the overflow safety surface is a leap forward in safety and flexibility, it's still only |
| 136 | +a heuristic. There will always be valid conversions which it forbids, and invalid ones which it |
| 137 | +permits. On the latter point, note that adding an intermediate conversion can defeat the safety |
| 138 | +check: the overflow in `meters(10u).as(nano(meters))` would be caught, but the overflow in |
| 139 | +`meters(10u).as(milli(meters)).as(nano(meters))` would not. |
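The two-step case can be traced with plain `uint32_t` arithmetic. This sketch assumes the simplified threshold rule (the value 2,147 must survive the factor); `passes_heuristic` and the constants are hypothetical names, not Au's API.

```cpp
#include <cstdint>
#include <limits>

constexpr std::uint32_t kThreshold = 2'147;

// Simplified per-conversion check for uint32_t: does the threshold value survive `factor`?
constexpr bool passes_heuristic(std::uint32_t factor) {
    return std::numeric_limits<std::uint32_t>::max() / factor >= kThreshold;
}

// One step, meters to nanometers: the factor of 10^9 is rejected.
static_assert(!passes_heuristic(1'000'000'000u), "m to nm is caught");

// Two steps, meters to millimeters to nanometers: each factor passes on its own.
static_assert(passes_heuristic(1'000u), "m to mm is allowed");
static_assert(passes_heuristic(1'000'000u), "mm to nm is allowed");

// Unsigned arithmetic wraps modulo 2^32, so these products are well defined even on overflow.
constexpr std::uint32_t kMeters = 10u;
constexpr std::uint32_t kMillimeters = kMeters * 1'000u;          // 10'000: fits
constexpr std::uint32_t kNanometers = kMillimeters * 1'000'000u;  // 10^10 does not fit

static_assert(kMillimeters == 10'000u, "first step is exact");
static_assert(kNanometers == 1'410'065'408u, "second step wrapped: 10^10 mod 2^32");
```

The second step passes the heuristic because 2,147 mm is safe to convert, but the actual value of 10,000 mm is not.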
| 140 | +
|
| 141 | +One way to _guarantee_ doing better is to check every conversion at runtime. Some users may recoil |
| 142 | +at the idea of doing _runtime_ work in a units library, but it's easy to show that _this_ use case |
| 143 | +is innocuous. Consider: it's very hard to imagine a valid use case for needing to perform unit |
| 144 | +conversions in a "hot loop". Therefore, the extra runtime cost --- merely a few cycles at most --- |
| 145 | +won't _meaningfully_ affect the performance of the program: it's a bargain price to pay for the |
| 146 | +added safety. |
| 147 | +
|
| 148 | +Of course, in order to check every conversion at runtime, you need to decide what to do when |
| 149 | +a conversion _doesn't_ work. This is hard in general, because there is no "one true error handling |
| 150 | +strategy". Exceptions, C++17's `std::optional`, C++23's `std::expected`, and other strategies each |
| 151 | +have their place. For a library that aims to support a wide variety of projects, it's an impossible |
| 152 | +choice. |
| 153 | +
|
| 154 | +Fortunately, the problem decomposes favorably into two steps. |
| 155 | +
|
| 156 | +1. Figure out **which specific conversions** are lossy. This is the hard part, but Au can do it! |
| 157 | +
|
| 158 | +2. Write a generic **checked conversion function** using the preferred error handling mechanism. |
| 159 | + The owners of a project will have to do this, but this is easy if Au provides the first part. |
| 160 | +
|
| 161 | +Here's a complete worked example of how you would do this in a codebase using C++17's |
| 162 | +`std::optional`. |
| 163 | +
|
| 164 | +```cpp |
| 165 | +template <typename U, typename R, typename TargetUnitSlot> |
| 166 | +constexpr auto try_converting(au::Quantity<U, R> q, TargetUnitSlot target) { |
| 167 | + return is_conversion_lossy(q, target) |
| 168 | + ? std::nullopt |
| 169 | + : std::make_optional(q.coerce_as(target)); |
| 170 | +} |
| 171 | +``` |
| 172 | + |
| 173 | +The goal of `is_conversion_lossy` is to produce an implementation for each individual conversion |
| 174 | +(based on both the numeric type, and the conversion factor) that is as _accurate and efficient_ as |
| 175 | +an expertly hand-written implementation. If a value passes that check, then it's safe and correct to
| 176 | +call `.coerce_as` instead of simply `.as`: we can override the _approximate_ safety checks of the |
| 177 | +latter because we've performed an _exact_ safety check. |
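To give a feel for what a hand-written exact check involves, here is a library-free sketch for the simplest case: multiplying by a positive integer factor. It does not use Au, and `try_scaling` is a hypothetical name; it assumes C++17 and a factor of at least 1.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Multiplies `value` by `factor` (factor >= 1), or returns std::nullopt if the
// product would not fit in T. The bounds are checked by division, before multiplying.
template <typename T>
constexpr std::optional<T> try_scaling(T value, T factor) {
    constexpr T kMax = std::numeric_limits<T>::max();
    constexpr T kMin = std::numeric_limits<T>::min();
    if (value > kMax / factor || value < kMin / factor) {
        return std::nullopt;
    }
    return static_cast<T>(value * factor);
}

// MHz to Hz in int32_t: 2'147 MHz fits, 2'148 MHz does not.
static_assert(try_scaling<std::int32_t>(2'147, 1'000'000).has_value(), "fits");
static_assert(*try_scaling<std::int32_t>(2'147, 1'000'000) == 2'147'000'000, "exact result");
static_assert(!try_scaling<std::int32_t>(2'148, 1'000'000).has_value(), "would overflow");
```

Unlike a threshold heuristic, this check looks at the actual value, so it never rejects a conversion that would have succeeded and never accepts one that would have overflowed.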
| 178 | + |
| 179 | +??? note "An example of the kind of details we take care of" |
| 180 | + When we say "expertly hand-written", we mean it. We even handle obscure C++ minutiae such as
| 181 | + [integer promotion]! |
| 182 | + |
| 183 | + Consider the conversion from `yards(int16_t{1250})` to `meters`. Under the hood, this |
| 184 | + conversion first multiplies by `int16_t{1143}`, and then divides by `int16_t{1250}`. The |
| 185 | + multiplication produces 1,428,750 --- but the maximum `int16_t` value is only 32,767. Looks |
| 186 | + like a pretty clear case of overflow. |
| 187 | + |
| 188 | + However, the product of two `int16_t` values is _not_ (usually) an `int16_t` value! Integer
| 189 | + promotion converts both operands to `int` first, and on most architectures `int` is 32 bits wide.
| 190 | + This intermediate type _can_ hold the result of the multiplication. What's more, the subsequent
| 191 | + division by `int16_t{1250}` brings the final result back into the range of `int16_t`.
| 192 | + |
| 193 | + Au's implementation of `is_conversion_lossy` will correctly return `false` on architectures |
| 194 | + where this promotion happens, and `true` on architectures where it doesn't. If this sounds like |
| 195 | + the kind of detail you'd rather not worry about, go ahead and use Au's utilities! |
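The promotion behavior described in the note can be observed directly. This sketch assumes a platform where `int` has at least 32 bits, and the constant names are hypothetical.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

static_assert(std::numeric_limits<int>::digits >= 31,
              "this sketch assumes int is at least 32 bits wide");

constexpr std::int16_t kYards = 1250;
constexpr std::int16_t kNumerator = 1143;
constexpr std::int16_t kDenominator = 1250;

// Both operands are promoted to int before multiplying, so the product has type int.
static_assert(std::is_same<decltype(kYards * kNumerator), int>::value,
              "int16_t * int16_t yields int");

// The intermediate product exceeds the int16_t maximum of 32'767, but fits in int.
constexpr auto kProduct = kYards * kNumerator;
static_assert(kProduct == 1'428'750, "intermediate value from the text");

// Dividing brings the result back into int16_t range: 1250 yd is exactly 1143 m.
constexpr std::int16_t kMeters = static_cast<std::int16_t>(kProduct / kDenominator);
static_assert(kMeters == 1143, "final result fits in int16_t");
```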
| 196 | + |
| 197 | +At the time of writing, Au is the only units library we know of that provides conversion checkers to do
| 198 | +this heavy lifting. We'd like to see other units libraries try it out as well! Meanwhile, even on |
| 199 | +our end, there's still more work to do --- such as adding "explicit rep" versions of these |
| 200 | +utilities, and supporting `QuantityPoint`. You can track our progress on this feature in issue |
| 201 | +[#110]. |
| 202 | + |
| 203 | +## Summary |
| 204 | + |
| 205 | +The hazard of overflow lurks behind every unit conversion --- even the "hidden" conversions that are |
| 206 | +hard to spot. To maximize safety, we need a strategy to mitigate this risk. Au's novel overflow |
| 207 | +safety surface is a big step forward, adapting to the level of risk actually present in each |
| 208 | +specific conversion. But the most robust solution of all is to make it as easy as possible to check |
| 209 | +every conversion as it happens, and be prepared for it to fail. |
| 210 | + |
| 211 | +[threshold]: https://github.com/aurora-opensource/au/blob/dbd79b2/au/conversion_policy.hh#L27-L28 |
| 212 | +[#110]: https://github.com/aurora-opensource/au/issues/110 |
| 213 | +[integer promotion]: https://en.cppreference.com/w/c/language/conversion#Integer_promotions |