Flat error representation #67

jprochazk · 2023-09-01T22:03:54Z

Closes #64

Implement the change
Update tests
Documentation
Benchmark (time to validate a somewhat large struct + memory usage)

This PR changes the error type from an recursive enum to a flat list. Each Error has an associated Path.

The Path is implemented as a reverse singly-linked list with reference counted nodes. This ensures that list.clone() is O(1) and does not allocate, and that concatentation of paths is also O(1) and only allocates space for the new node. These are really important performance guarantees.

There is a significant downside, which is that we can only iterate this linked list in reverse. This means that anywhere iteration is needed, such in various Display impls, we first have to push all the items onto a separate dynamically allocated array, and only then can we iterate in the correct direction.
I opted to use smallvec::SmallVec with an inline storage of 8 items instead of a plain Vec, as it's unlikely that the extra stack space requirement will be problematic for anyone, and most paths are sure to be under 8 items in length.

To get rid of even more allocations, I replaced usage of Cow<'static, str> with compact_str::CompactString. This means significantly fewer allocations for any field or list index whose representation takes up less than 24 bytes, which as it turns out, is most fields and list indices! In the entire test suite, there is only one field (byte_length_min1_u8_slice on rules::option::Test) which takes up more than that.

I found that despite the immutability and cheap clone/concatenation, Path::join still made up a large chunk of the time spent running validation, which made everything ~25% slower than on main. This is because Path::join was called eagerly on every nested field, inner::apply, etc.
The solution was to join paths lazily, and cache them. The crate::util::nested_path does this by creating a closure which captures an Option<Path>. The Option<Path> is lazily initialized using a parent path() (which itself is lazy) and a component, such as an index or a key. This means that by default, the validation logic performs zero allocations, instead trading them for branches. This turned out to be a win in the common case where most data is valid, and a slight loss when it isn't, which is exactly the trade-off we want to make here.

Breaking changes

garde::error has been completely overhauled.
garde::Validate has a new validate_into method, which is the primary entrypoint for custom implementations now. garde::Validate::validate now has a default implementation which is simply a shorthand for initializing a garde::Report, passing it into validate_into, and then returning it.

New features

You can now easily select! errors for specific fields on a given Report:

#[derive(garde::Validate)]
struct Foo<'a> {
  #[garde(dive)]
  bar: Bar<'a>,
}

#[derive(garde::Validate)]
struct Bar<'a> {
  #[garde(length(min = 1))]
  value: &'a str,
}

let v = Foo { bar: Bar { value: "" } };
let report = v.validate(&()).unwrap_err();

println!("errors for `Foo.bar.value`");
for error in garde::error::select!(report, bar) {
  println!("{error}");
}

Performance

Measured on my M2 Macbook Air

benchmark           main        flat-error-repr
validate `valid`    2.9812 µs   2.5029 µs
validate `invalid`  6.2781 µs   6.9518 µs
display `valid`     367.57 ps   367.63 ps
display `invalid`   10.360 µs   4.3824 µs

This is in line with what I expected. Display became significantly cheaper, validate is roughly the same.

jprochazk · 2023-09-02T09:51:58Z

@jirutka is the select! macro a good solution for #63? I'm guessing it isn't, but never hurts to ask!

@aumetra could you try using this branch in kitsune to see how much of a pain it is to upgrade?

aumetra · 2023-09-02T15:34:53Z

could you try using this branch in kitsune to see how much of a pain it is to upgrade?

Doesn't seem to be that much of a pain to upgrade, especially since I can just throw the native serde support at it to serialize the errors, but the issue is that the chained Rc solution you chose doesn't play too well with work-stealing runtimes such as Tokio.

jprochazk · 2023-09-02T15:58:18Z

Updated it to use Arc instead, doesn't seem to have made a difference in performance, but everything is Send again.

jirutka · 2023-09-02T18:22:59Z

@jirutka is the select! macro a good solution for #63? I'm guessing it isn't, but never hurts to ask!

I don’t see how it could be a solution for #63. What I need is to print a list of all validation errors, each with the actual value that failed the validation.

jprochazk added 8 commits September 1, 2023 23:27

switch to flat error representation

036b0e9

update snapshots

62e82a0

remove dead code

3f7c325

update docs + impl Serialize for Report

a989bcb

add benchmarks

a8e4ae3

split benchmark

4d2dc18

downgrade criterion

68c9141

implement lazy path joining

6971755

jprochazk marked this pull request as ready for review September 2, 2023 09:38

update docs

b8a7215

jprochazk added 2 commits September 2, 2023 17:36

make rc_list send

75b757a

assert Report is Send

ceed749

jprochazk merged commit 042fbc6 into main Sep 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flat error representation #67

Flat error representation #67

jprochazk commented Sep 1, 2023 •

edited

Loading

jprochazk commented Sep 2, 2023 •

edited

Loading

aumetra commented Sep 2, 2023

jprochazk commented Sep 2, 2023

jirutka commented Sep 2, 2023

Flat error representation #67

Flat error representation #67

Conversation

jprochazk commented Sep 1, 2023 • edited Loading

Breaking changes

New features

Performance

jprochazk commented Sep 2, 2023 • edited Loading

aumetra commented Sep 2, 2023

jprochazk commented Sep 2, 2023

jirutka commented Sep 2, 2023

jprochazk commented Sep 1, 2023 •

edited

Loading

jprochazk commented Sep 2, 2023 •

edited

Loading