Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flat error representation #67

Merged
merged 11 commits into from
Sep 2, 2023
Merged

Flat error representation #67

merged 11 commits into from
Sep 2, 2023

Conversation

jprochazk
Copy link
Owner

@jprochazk jprochazk commented Sep 1, 2023

Closes #64

  • Implement the change
  • Update tests
  • Documentation
  • Benchmark (time to validate a somewhat large struct + memory usage)

This PR changes the error type from an recursive enum to a flat list. Each Error has an associated Path.

The Path is implemented as a reverse singly-linked list with reference counted nodes. This ensures that list.clone() is O(1) and does not allocate, and that concatentation of paths is also O(1) and only allocates space for the new node. These are really important performance guarantees.

There is a significant downside, which is that we can only iterate this linked list in reverse. This means that anywhere iteration is needed, such in various Display impls, we first have to push all the items onto a separate dynamically allocated array, and only then can we iterate in the correct direction.
I opted to use smallvec::SmallVec with an inline storage of 8 items instead of a plain Vec, as it's unlikely that the extra stack space requirement will be problematic for anyone, and most paths are sure to be under 8 items in length.

To get rid of even more allocations, I replaced usage of Cow<'static, str> with compact_str::CompactString. This means significantly fewer allocations for any field or list index whose representation takes up less than 24 bytes, which as it turns out, is most fields and list indices! In the entire test suite, there is only one field (byte_length_min1_u8_slice on rules::option::Test) which takes up more than that.

I found that despite the immutability and cheap clone/concatenation, Path::join still made up a large chunk of the time spent running validation, which made everything ~25% slower than on main. This is because Path::join was called eagerly on every nested field, inner::apply, etc.
The solution was to join paths lazily, and cache them. The crate::util::nested_path does this by creating a closure which captures an Option<Path>. The Option<Path> is lazily initialized using a parent path() (which itself is lazy) and a component, such as an index or a key. This means that by default, the validation logic performs zero allocations, instead trading them for branches. This turned out to be a win in the common case where most data is valid, and a slight loss when it isn't, which is exactly the trade-off we want to make here.

Breaking changes

  • garde::error has been completely overhauled.
  • garde::Validate has a new validate_into method, which is the primary entrypoint for custom implementations now. garde::Validate::validate now has a default implementation which is simply a shorthand for initializing a garde::Report, passing it into validate_into, and then returning it.

New features

You can now easily select! errors for specific fields on a given Report:

#[derive(garde::Validate)]
struct Foo<'a> {
  #[garde(dive)]
  bar: Bar<'a>,
}

#[derive(garde::Validate)]
struct Bar<'a> {
  #[garde(length(min = 1))]
  value: &'a str,
}

let v = Foo { bar: Bar { value: "" } };
let report = v.validate(&()).unwrap_err();

println!("errors for `Foo.bar.value`");
for error in garde::error::select!(report, bar) {
  println!("{error}");
}

Performance

Measured on my M2 Macbook Air

benchmark           main        flat-error-repr
validate `valid`    2.9812 µs   2.5029 µs
validate `invalid`  6.2781 µs   6.9518 µs
display `valid`     367.57 ps   367.63 ps
display `invalid`   10.360 µs   4.3824 µs

This is in line with what I expected. Display became significantly cheaper, validate is roughly the same.

@jprochazk jprochazk marked this pull request as ready for review September 2, 2023 09:38
@jprochazk
Copy link
Owner Author

jprochazk commented Sep 2, 2023

@jirutka is the select! macro a good solution for #63? I'm guessing it isn't, but never hurts to ask!

@aumetra could you try using this branch in kitsune to see how much of a pain it is to upgrade?

@aumetra
Copy link
Contributor

aumetra commented Sep 2, 2023

could you try using this branch in kitsune to see how much of a pain it is to upgrade?

Doesn't seem to be that much of a pain to upgrade, especially since I can just throw the native serde support at it to serialize the errors, but the issue is that the chained Rc solution you chose doesn't play too well with work-stealing runtimes such as Tokio.

@jprochazk
Copy link
Owner Author

Updated it to use Arc instead, doesn't seem to have made a difference in performance, but everything is Send again.

@jprochazk jprochazk merged commit 042fbc6 into main Sep 2, 2023
@jirutka
Copy link
Contributor

jirutka commented Sep 2, 2023

@jirutka is the select! macro a good solution for #63? I'm guessing it isn't, but never hurts to ask!

I don’t see how it could be a solution for #63. What I need is to print a list of all validation errors, each with the actual value that failed the validation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve error representation
3 participants