Commit ea08978

Add discussion of overflow and how to mitigate it (#203)
This finally --- _finally!_ --- gives us an authoritative place to link for explaining the "overflow safety surface". I was going to make that the only topic of the page, but as I wrote I realized that there's a lot more value in discussing the overflow problem more generally! I expect this link will be a useful reference for other units libraries as well. I adapted some contents that were hidden away a couple layers deep in the 103 tutorial, and replaced those contents with a link to the new page.
1 parent 7d8a544 commit ea08978

6 files changed

+233
-38
lines changed

docs/alternatives/index.md

+8-1
@@ -298,7 +298,14 @@ features.
298298
href="https://mpusz.github.io/units/framework/conversions_and_casting.html">consistent
299299
with <code>std::chrono</code> library</a>
300300
</td>
301-
<td class="best">Automatically adapts to level of overflow risk</td>
301+
<td class="best">
302+
Meets `std::chrono` baseline, plus:
303+
<ul>
304+
<li class="check">Automatically adapts to level of overflow risk</li>
305+
<li class="check">Runtime conversion checkers</li>
306+
<li class="check">Constants have perfect conversion policy</li>
307+
</ul>
308+
</td>
302309
</tr>
303310
<tr>
304311
<td>

docs/discussion/concepts/index.md

+5
@@ -18,6 +18,11 @@ and help you use units libraries more effectively.
1818
thing as "unitless"; we support dimensionless units, like `Percent`. Here we explain how the
1919
library handles these situations, and avoids common pitfalls.
2020

21+
- **[Overflow](./overflow.md)**. Unit conversions risk overflow. The degree of risk depends on
22+
both the conversion factor, and the range of values that fit in the destination type. Learn how
23+
different units libraries have approached this problem, including Au's novel contribution, the
24+
"overflow safety surface".
25+
2126
- **[Quantity Point](./quantity_point.md)**. An abstraction for "point types" that have units.
2227
Most use cases don't need this, but for a few --- including temperatures --- it's indispensable.
2328

docs/discussion/concepts/overflow.md

+213
@@ -0,0 +1,213 @@
1+
# Overflow
2+
3+
To convert a quantity in a program to different units, we need to multiply or divide by a conversion
4+
factor. Sometimes, the result is too big to fit in the type: a problem known as _overflow_.
5+
6+
Units libraries generate these conversion factors automatically when the program is built, and apply
7+
them invisibly. This amazing convenience comes with a risk: since users don't see the conversion
8+
factors, it's easy to overlook the multiplication that's taking place under the hood. This is even
9+
more true in certain "hidden" conversions, where most users don't even realize that a conversion is
10+
taking place!
11+
12+
## Hidden overflow risks
13+
14+
Consider this comparison:
15+
16+
```cpp
17+
constexpr bool result = (meters(11) > yards(12));
18+
```
19+
20+
Even though the quantities have different units, this code compiles and produces a correct result.
21+
It turns out that `meters(11)` is roughly 0.2% larger than `yards(12)`, so `result` is `true`. But
22+
how exactly do we compute that result from these starting numeric values of `11` and `12`?
23+
24+
The key is to understand that _comparison_ is a [_common unit
25+
operation_](./arithmetic.md#common-unit). Before we can carry it out, we must convert both inputs
26+
to their [_common unit_](./common_unit.md) --- that is, the largest unit that evenly divides both
27+
`meters` and `yards`. In this case, the size of that unit is 800 micrometers, giving a conversion
28+
factor of 1250 for `meters`, and 1143 for `yards`. The library multiplies the underlying values 11
29+
and 12 by these respective factors, and then simply compares the results.
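To make the arithmetic concrete, here is a minimal sketch in plain C++ of what this comparison amounts to. It is not Au's implementation: the constant and function names are hypothetical, and it uses a 64-bit type so that the scaling cannot overflow for these small inputs. The second helper shows how quickly the same scaling would exceed a 16-bit type.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical illustration (not Au's implementation): compare a length in
// meters to a length in yards by scaling both to a common unit of 800 um.
constexpr std::int64_t kMeterIn800Um = 1250;  // 1 m  = 1250 * 800 um
constexpr std::int64_t kYardIn800Um = 1143;   // 1 yd = 1143 * 800 um (0.9144 m)

constexpr bool meters_greater_than_yards(std::int64_t m, std::int64_t yd) {
    return m * kMeterIn800Um > yd * kYardIn800Um;
}

// Would the scaled meter count still fit if the values were stored as int16_t?
constexpr bool scaled_meters_fit_in_int16(std::int64_t m) {
    return m * kMeterIn800Um <= std::numeric_limits<std::int16_t>::max();
}

static_assert(meters_greater_than_yards(11, 12), "11 * 1250 = 13750 exceeds 12 * 1143 = 13716");
static_assert(scaled_meters_fit_in_int16(26), "26 * 1250 = 32500 fits in int16_t");
static_assert(!scaled_meters_fit_in_int16(27), "27 * 1250 = 33750 does not fit in int16_t");
```

In this model, a 16-bit comparison involving as little as 27 meters would already overflow, which is why the "hidden" multiplication matters.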
30+
31+
Now that we have a fuller understanding of what's going on under the hood, let's take another look
32+
at the code. When we see something like `meters(11) > yards(12)`, it's certainly not obvious at
33+
a glance that this will multiply each underlying value by a factor of over 1,000! Whatever approach
34+
we take to mitigating overflow risk, it will need to handle these kinds of "hidden" cases as well.
35+
36+
## Mitigation Strategies
37+
38+
Over the decades that people have been writing units libraries, several approaches have emerged for
39+
dealing with this category of risk. That said, there isn't a consensus about the best approach to
40+
take --- in fact, at the time of writing, new strategies are still being developed and tested!
41+
42+
It's also worth noting that this problem mainly applies to integral types. Floating point types can
43+
overflow too, but it happens far less often in practice. Even the smallest, `float`, can represent
44+
magnitudes up to about $10^{38}$, while the diameter of the observable universe measured in atomic
45+
diameters is "only" about $10^{37}$![^1]
46+
47+
[^1]: Here, we take the radius of the observable universe as 46.6 billion light years, and the
48+
diameter of a hydrogen atom as 0.1 nanometers.
49+
50+
Of course, many domains prefer the simplicity and interpretability of integral types. This avoids
51+
some of the more counterintuitive aspects of floating point arithmetic --- for example, did you know
52+
that the difference between consecutive representable `double` values can be greater than
53+
$10^{292}$? With integers, we can bypass all this complexity, but the price we pay is the need to
54+
handle overflow. Here are the main strategies we've seen for doing so.
55+
56+
### Do nothing
57+
58+
This is the simplest approach, and probably also the most popular: make the users responsible for
59+
avoiding overflow. The documentation may simply warn them to check their values ahead of time, as
60+
in this [example from the bernedom/SI
61+
library](https://github.com/bernedom/SI/blob/main/doc/implementation-details.md#implicit-ratio-conversion--possible-loss-of-precision).
62+
63+
While this approach is perfectly valid, it does put a lot of responsibility onto the end users, many
64+
of whom may not realize that they have incurred it. Even for those who do, we've seen above that
65+
many unit conversions are hard to spot. It's reasonable to assume that this approach leads to the
66+
highest incidence of overflow bugs.
67+
68+
### Curate user-facing types
69+
70+
The [`std::chrono`](https://en.cppreference.com/w/cpp/chrono/duration) library, a time-only units
71+
library, takes a different approach. It uses intimate knowledge of the domain to craft its
72+
user-facing types such that they all cover the same (very generous) range of values. Specifically,
73+
every `std::chrono::duration` type shorter than a day --- everything from `std::chrono::hours`, all
74+
the way down to `std::chrono::nanoseconds` --- is guaranteed to be able to represent _at least_ ±292
75+
years.
76+
77+
As long as users' durations are within this range --- _and_, as long as they _stick to these primary
78+
user-facing types_ --- they can be confident that their values won't overflow.
79+
80+
This approach works very well in practice for the (great many) users who can meet both of these
81+
conditions. However, it doesn't translate well to a _multi-dimensional_ units library: since there
82+
are many dimensions, and new ones can be created on the fly, it's infeasible to try to define
83+
a "practical range" for _all_ of them. Besides, users can still form arbitrary
84+
`std::chrono::duration` types, and they may not realize the safety they have given up in doing so.
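The range guarantee can be checked at compile time. The sketch below is illustrative only: `kHoursPerYear` and `TinyNanos` are hypothetical names introduced here, and a year is approximated as 365.25 days purely to express the range in years.

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>

// Hours in a 365.25-day year; used only to express the range in years.
constexpr long long kHoursPerYear = 8766;

// Largest value of std::chrono::nanoseconds, expressed in whole years.
constexpr long long kNanosecondsRangeInYears =
    std::chrono::duration_cast<std::chrono::hours>(std::chrono::nanoseconds::max()).count() /
    kHoursPerYear;
static_assert(kNanosecondsRangeInYears >= 292, "nanoseconds covers at least 292 years");

// A user-formed duration type (hypothetical) gives up that range: with a
// 16-bit rep, it cannot even represent 33 microseconds.
using TinyNanos = std::chrono::duration<std::int16_t, std::nano>;
static_assert(TinyNanos::max().count() == 32767, "int16_t rep tops out at 32767 ns");
```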
85+
86+
### Adapt to risk
87+
88+
Fundamentally, there are two contributions to the level of overflow risk:
89+
90+
1. The _size of the conversion factor_: **bigger factors** mean **more risk**.[^2]
91+
92+
2. The _largest representable value in the destination type_: **larger max values** mean **less
93+
risk**.
94+
95+
[^2]: Note that we're implicitly assuming that the conversion factor is simply an integer. This is
96+
always true for the cases discussed in this section, because we're talking about converting quantity
97+
types with integral rep. If the conversion factor were _not_ an integer, then we would already
98+
forbid this conversion due to _truncation_, so we wouldn't need to bother considering overflow.
99+
100+
Therefore, we should be able to create an _adaptive policy_ that takes these factors into account.
101+
The key concept is the "smallest overflowing value". For every combination of "conversion factor"
102+
and "type," there is some smallest starting-value that will overflow. The simplest adaptive policy
103+
is to forbid conversions when that smallest value is "small enough to be scary".
104+
105+
How small is "scary"? Here are some considerations.
106+
107+
- Once our values get over 1,000, we can consider switching to a larger SI-prefixed version of the
108+
unit. (For example, lengths over $1000\,\text{m}$ can be more concisely expressed in
109+
$\text{km}$.) This means that if a value as small as 1,000 would overflow --- so small that we
110+
haven't even _reached_ the next unit --- we should _definitely_ forbid the conversion.
111+
112+
- On the other hand, we've found it useful to initialize, say, `QuantityI32<Hertz>` variables with
113+
something like `mega(hertz)(500)`. Thus, we'd like this operation to succeed (although it should
114+
probably be near the border of what's allowed).
115+
116+
Putting it all together, we settled on [a value threshold of 2'147][threshold]. If we can convert
117+
this value without overflow, then we permit the operation; otherwise, we don't. We picked this
118+
value because it satisfies our above criteria nicely. It will prevent operations that can't handle
119+
values of 1,000, but it still lets us use $\text{MHz}$ freely when storing $\text{Hz}$ quantities in
120+
`int32_t`.
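As a rough model of this policy, the following sketch permits an integer conversion factor only if the threshold value 2'147 can be multiplied by it without exceeding the type's maximum. The function names are hypothetical, and this is a simplification rather than Au's actual implementation; in particular, it makes no attempt to handle types too small to hold the threshold itself.

```cpp
#include <cstdint>
#include <limits>

// The value threshold named in the text.
constexpr std::uintmax_t kValueThreshold = 2147;

// Largest integer factor that can scale kValueThreshold without exceeding
// the maximum value of Rep (simplified model of the policy).
template <typename Rep>
constexpr std::uintmax_t max_permitted_factor() {
    return static_cast<std::uintmax_t>(std::numeric_limits<Rep>::max()) / kValueThreshold;
}

// True if this simplified policy would allow multiplying a Rep by `factor`.
template <typename Rep>
constexpr bool permits_factor(std::uintmax_t factor) {
    return factor <= max_permitted_factor<Rep>();
}

static_assert(permits_factor<std::int32_t>(1000000), "MHz to Hz in int32_t is allowed");
static_assert(!permits_factor<std::int32_t>(1000000000), "GHz to Hz in int32_t is forbidden");
static_assert(!permits_factor<std::int16_t>(1000), "kilo to base unit in int16_t is forbidden");
static_assert(permits_factor<std::int64_t>(1000000000), "giga to base unit in int64_t is allowed");
```

Under this model, the largest factor permitted for `int32_t` is 1,000,225, which sits just above the factor of one million needed for the $\text{MHz}$ example.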
121+
122+
#### Plot: the Overflow Safety Surface
123+
124+
This policy lends itself well to visualization. For each integral type, there is some _highest
125+
permitted conversion factor_ under this policy. We can plot these factors for each of the common
126+
integral types (`int8_t`, `uint32_t`, and so on). If we then "connect the dots", we get a boundary
127+
that separates allowed conversions from forbidden ones, permitting bigger conversions for bigger
128+
types. We call this abstract boundary the **"overflow safety surface"**, and it's the secret
129+
ingredient that lets Au users use a wide variety of integral types with confidence.
130+
131+
![The overflow safety surface](../../assets/overflow-safety-surface.png)
132+
133+
### Check every conversion at runtime
134+
135+
While the overflow safety surface is a leap forward in safety and flexibility, it's still only
136+
a heuristic. There will always be valid conversions which it forbids, and invalid ones which it
137+
permits. On the latter point, note that adding an intermediate conversion can defeat the safety
138+
check: the overflow in `meters(10u).as(nano(meters))` would be caught, but the overflow in
139+
`meters(10u).as(milli(meters)).as(nano(meters))` would not.
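The blind spot can be reproduced with a simplified model of the threshold policy for `uint32_t`. The helper names below are hypothetical, and the model is an assumption for illustration, not Au's code: each step of the two-step chain has a factor the heuristic accepts, yet the true result does not fit.

```cpp
#include <cstdint>
#include <limits>

constexpr std::uint64_t kU32Max = std::numeric_limits<std::uint32_t>::max();
constexpr std::uint64_t kValueThreshold = 2147;

// Simplified model: accept a factor if 2'147 times the factor fits in uint32_t.
constexpr bool heuristic_permits(std::uint64_t factor) {
    return factor <= kU32Max / kValueThreshold;
}

// Exact check, done in 64-bit arithmetic so the product itself cannot wrap here.
constexpr bool fits_in_u32(std::uint64_t value) { return value <= kU32Max; }

// Direct conversion, meters to nanometers: the factor of 1e9 is rejected.
static_assert(!heuristic_permits(1000000000), "direct conversion is caught");

// Two-step conversion: factors of 1e3 and then 1e6 are each accepted...
static_assert(heuristic_permits(1000) && heuristic_permits(1000000), "each step passes");

// ...but the true result, 10 * 1e3 * 1e6 = 1e10, exceeds the uint32_t maximum.
static_assert(!fits_in_u32(std::uint64_t{10} * 1000 * 1000000), "the chain overflows");
```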
140+
141+
One way to _guarantee_ doing better is to check every conversion at runtime. Some users may recoil
142+
at the idea of doing _runtime_ work in a units library, but it's easy to show that _this_ use case
143+
is innocuous. Consider: it's very hard to imagine a valid use case for needing to perform unit
144+
conversions in a "hot loop". Therefore, the extra runtime cost --- merely a few cycles at most ---
145+
won't _meaningfully_ affect the performance of the program: it's a bargain price to pay for the
146+
added safety.
147+
148+
Of course, in order to check every conversion at runtime, you need to decide what to do when
149+
a conversion _doesn't_ work. This is hard in general, because there is no "one true error handling
150+
strategy". Exceptions, C++17's `std::optional`, C++23's `std::expected`, and other strategies each
151+
have their place. For a library that aims to support a wide variety of projects, it's an impossible
152+
choice.
153+
154+
Fortunately, the problem decomposes favorably into two steps.
155+
156+
1. Figure out **which specific conversions** are lossy. This is the hard part, but Au can do it!
157+
158+
2. Write a generic **checked conversion function** using the preferred error handling mechanism.
159+
The owners of a project will have to do this, but this is easy if Au provides the first part.
160+
161+
Here's a complete worked example of how you would do this in a codebase using C++17's
162+
`std::optional`.
163+
164+
```cpp
165+
template <typename U, typename R, typename TargetUnitSlot>
166+
constexpr auto try_converting(au::Quantity<U, R> q, TargetUnitSlot target) {
167+
    return is_conversion_lossy(q, target)
168+
        ? std::nullopt
169+
        : std::make_optional(q.coerce_as(target));
170+
}
171+
```
172+
173+
The goal of `is_conversion_lossy` is to produce an implementation for each individual conversion
174+
(based on both the numeric type, and the conversion factor) that is as _accurate and efficient_ as
175+
an expertly hand-written implementation. If it passes those checks, then it's safe and correct to
176+
call `.coerce_as` instead of simply `.as`: we can override the _approximate_ safety checks of the
177+
latter because we've performed an _exact_ safety check.
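For intuition about what an exact check involves in the simplest possible case, here is a stand-alone sketch that does not use Au at all. The name `checked_scale` is hypothetical, and the function only covers multiplying an unsigned value by an integer factor; it is an assumption-laden illustration, not a description of how `is_conversion_lossy` is implemented.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <type_traits>

// Hypothetical helper: multiply `value` by `factor`, returning std::nullopt
// if the exact product would not fit in Rep.
template <typename Rep>
constexpr std::optional<Rep> checked_scale(Rep value, Rep factor) {
    static_assert(std::is_unsigned<Rep>::value, "this sketch covers unsigned types only");
    // Compare against max / factor, so the check itself never overflows.
    if (factor != 0 && value > std::numeric_limits<Rep>::max() / factor) {
        return std::nullopt;
    }
    return static_cast<Rep>(value * factor);
}

// 4 * 1e9 fits in uint32_t, but 10 * 1e9 does not.
static_assert(checked_scale<std::uint32_t>(4u, 1000000000u).value() == 4000000000u, "fits");
static_assert(!checked_scale<std::uint32_t>(10u, 1000000000u).has_value(), "overflows");

// Checking each step also catches the overflow in a chained conversion.
static_assert(checked_scale<std::uint32_t>(10u, 1000u).value() == 10000u, "first step fits");
static_assert(!checked_scale<std::uint32_t>(10000u, 1000000u).has_value(), "second step fails");
```

Because the check looks at the actual value rather than a fixed threshold, it rejects exactly the inputs that would overflow and no others.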
178+
179+
??? note "An example of the kind of details we take care of"
180+
When we say "expertly hand-written", we mean it. We even handle obscure C++ minutiae such as
181+
[integer promotion]!
182+
183+
Consider the conversion from `yards(int16_t{1250})` to `meters`. Under the hood, this
184+
conversion first multiplies by `int16_t{1143}`, and then divides by `int16_t{1250}`. The
185+
multiplication produces 1,428,750 --- but the maximum `int16_t` value is only 32,767. Looks
186+
like a pretty clear case of overflow.
187+
188+
However, the product of two `int16_t` values is _not_ (usually) an `int16_t` value! On most
189+
architectures, it gets promoted to `int`, which is typically 32 bits wide. This intermediate type
190+
_can_ hold the result of the multiplication. What's more, the subsequent division by
191+
`int16_t{1250}` brings the final result back into the range of `int16_t`.
192+
193+
Au's implementation of `is_conversion_lossy` will correctly return `false` on architectures
194+
where this promotion happens, and `true` on architectures where it doesn't. If this sounds like
195+
the kind of detail you'd rather not worry about, go ahead and use Au's utilities!
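The promotion described in this note can be observed directly. The sketch below assumes a platform where `int` is wide enough to hold 1,428,750 (true wherever `int` is 32 bits); `yards_to_meters_rep` is a hypothetical name, and this is an illustration of the arithmetic rather than Au's code.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Multiplying two int16_t values yields int, not int16_t, due to integer promotion.
static_assert(std::is_same<decltype(std::int16_t{1250} * std::int16_t{1143}), int>::value,
              "operands are promoted to int before multiplying");

// Assumption of this sketch: int can hold the intermediate product.
static_assert(std::numeric_limits<int>::max() >= 1428750, "int must hold the product");

// Hypothetical helper: convert a raw yards count to a raw meters count.
constexpr std::int16_t yards_to_meters_rep(std::int16_t yards_value) {
    // The product is computed in int; only the final quotient is narrowed back.
    return static_cast<std::int16_t>(yards_value * std::int16_t{1143} / std::int16_t{1250});
}

static_assert(std::int16_t{1250} * std::int16_t{1143} == 1428750, "intermediate product");
static_assert(yards_to_meters_rep(1250) == 1143, "final result fits in int16_t again");
```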
196+
197+
At the time of writing, Au is the only units library we know of that provides conversion checkers to do
198+
this heavy lifting. We'd like to see other units libraries try it out as well! Meanwhile, even on
199+
our end, there's still more work to do --- such as adding "explicit rep" versions of these
200+
utilities, and supporting `QuantityPoint`. You can track our progress on this feature in issue
201+
[#110].
202+
203+
## Summary
204+
205+
The hazard of overflow lurks behind every unit conversion --- even the "hidden" conversions that are
206+
hard to spot. To maximize safety, we need a strategy to mitigate this risk. Au's novel overflow
207+
safety surface is a big step forward, adapting to the level of risk actually present in each
208+
specific conversion. But the most robust solution of all is to make it as easy as possible to check
209+
every conversion as it happens, and be prepared for it to fail.
210+
211+
[threshold]: https://github.com/aurora-opensource/au/blob/dbd79b2/au/conversion_policy.hh#L27-L28
212+
[#110]: https://github.com/aurora-opensource/au/issues/110
213+
[integer promotion]: https://en.cppreference.com/w/c/language/conversion#Integer_promotions

docs/reference/constant.md

+5-4
@@ -9,9 +9,9 @@ single value it can represent is fully encoded in its type. This makes it an ex
99
a [monovalue type](./detail/monovalue_types.md).
1010

1111
Because the value is always fully known at compile time, we do not need to use a heuristic like the
12-
overflow safety surface to determine which conversions are allowed. Instead, we can achieve
13-
a perfect conversion policy: we allow converting to any `Quantity` that can represent the value
14-
exactly, and disallow all other conversions.
12+
[overflow safety surface](../discussion/concepts/overflow.md) to determine which conversions are
13+
allowed. Instead, we can achieve a perfect conversion policy: we allow converting to any `Quantity`
14+
that can represent the value exactly, and disallow all other conversions.
1515

1616
The main use of `Constant` is to multiply and divide raw numbers or `Quantity` values. When we do
1717
this, the constant is applied _symbolically_, and affects the _units_ of the resulting quantity.
@@ -173,7 +173,8 @@ This provides great flexibility and confidence in passing `Constant` values to A
173173
!!! note
174174
The fact that `Constant` has a perfect conversion policy means that we can use it with APIs
175175
where the corresponding `Quantity` would not work, because `Quantity` is forced to use the
176-
overflow safety surface, which is a more conservative heuristic.
176+
[overflow safety surface](../discussion/concepts/overflow.md), which is a more conservative
177+
heuristic.
177178

178179
For example, suppose you have an API accepting `Quantity<UnitQuotientT<Meters, Seconds>, int>`,
179180
and a constant `c` representing the speed of light.

docs/tutorial/103-unit-conversions.md

+2-33
@@ -170,39 +170,8 @@ types.
170170
```
171171

172172
Since `long long` is at least 64 bits, we could handle values into the tens of billions of
173-
feet before overflowing!
174-
175-
??? info "In more detail: the \"Overflow Safety Surface\""
176-
Here is how to reason about which integral-Rep conversions the library supports.
177-
178-
For every conversion operation, there is _some smallest value which would overflow_.
179-
This depends on both the size of the conversion factor, and the range of values which
180-
the type can hold. If that smallest value is small enough to be "scary", we forbid the
181-
conversion.
182-
183-
How small is "scary"? Here are some considerations.
184-
185-
- Once our values get over 1,000, we can consider switching to a larger SI-prefixed
186-
version of the unit. (For example, lengths over $1000\,\text{m}$ can be
187-
approximated in $\text{km}$.) This means that if a value as small as 1,000 would
188-
overflow --- so small that we haven't even _reached_ the next unit --- we should
189-
_definitely_ forbid the conversion.
190-
191-
- On the other hand, we've found it useful to initialize, say, `QuantityI32<Hertz>`
192-
variables with something like `mega(hertz)(500)`. Thus, we'd like this operation
193-
to succeed (although it should probably be near the border of what's allowed).
194-
195-
Putting it all together, we settled on [a value threshold of 2'147][threshold]. If we
196-
can convert this value without overflow, then we permit the operation; otherwise, we
197-
don't. We picked this value because it satisfies our above criteria nicely. It will
198-
prevent operations that can't handle values of 1,000, but it still lets us use
199-
$\text{MHz}$ freely when storing $\text{Hz}$ quantities in `int32_t`.
200-
201-
We can picture this relationship in terms of the _biggest allowable conversion factor_,
202-
as a function of the _max value of the type_. This function separates the allowed
203-
conversions from the forbidden ones, permitting bigger conversions for bigger types.
204-
We call this abstract boundary the **"overflow safety surface"**, and it's the secret
205-
ingredient that lets us use a wide variety of integral types with confidence.
173+
feet before overflowing! (For more details on the overflow problem, and Au's strategies for
174+
mitigating it, read our [overflow discussion](../discussion/concepts/overflow.md).)
206175

207176
As for the **floating point** value, this is again very safe, so we **allow** it without
208177
complaint.
