Skip to content

Conversation

@bal-e
Copy link
Contributor

@bal-e bal-e commented Mar 3, 2025

This is a work-in-progress PR for overhauling domain::zonefile with a brand-new, out-of-place zonefile parser. It integrates a decent number of unit tests, but is currently lacking many record types (because new-base doesn't have them yet).

  • Add an example program which uses new_zonefile for parsing.
  • Test the implementation against large sample zonefiles.
  • Benchmark this against the old implementation.
  • Add support for more record data types (in this PR or new-base).

The new zonefile parser has a few basic differences from the existing implementation:

  • Records borrow from the parser object to avoid unnecessary allocations.
  • The actual parsing logic operates on strings instead of symbols.
  • Entry-parsing code is separated from per-entry parsing code.

The new zonefile parser has been designed to support parallelized zonefile parsing in the future. I am leaving this to a later PR.

@bal-e bal-e changed the base branch from main to new-base March 3, 2025 13:01
@bal-e bal-e requested a review from partim March 3, 2025 14:09
Copy link
Member

@partim partim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks all great, no real notes.

You might want to try your hand on the DNSSEC record types, though, to see how smooth scanning more complicated tokens (like Base64) is.


/// An error in scanning a zonefile record.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum RecordError {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to leave the error types as (open) enums or rather wrap them away into an opaque type? I don’t think a user would want to match on the variants here?

@bal-e
Copy link
Contributor Author

bal-e commented Mar 7, 2025

Migrated some DNSSEC related commits to new-base, and had to force-push to resolve some mistakes during the cherry-pick.

@ximon18
Copy link
Member

ximon18 commented Apr 7, 2025

While testing this PR I noticed that on parsing failure the reported error gives the line number of the problematic record but says things like "could not parse the owner name: irregular character in domain name" without saying which character was considered irregular, or "could not parse the record data: unrecognized record type" without saying which record type.

I appreciate that the reason for this is to be able to use &'static str as the error message type, but am generally in favour of giving as much context on error as possible.

For the irregular character case one cannot know which character is the problem without a trial and error / divide and conquer approach to change the record and reload, change the record and reload, which could be very frustrating.

For the unrecognized record type the line number should be enough to make this clear, but only if parsing a file as input. If the input were XFR and not written out prior to parsing the reader of the error would have no idea which record caused the problem.

break;
}

if !c.is_ascii_alphanumeric() && !b"\\-_\"".contains(&c) {
Copy link
Member

@ximon18 ximon18 Apr 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails for a record containing a wildcard such as:

*.w.example.    3600    IN      MX      1 ai.example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I forgot to account for wildcards. Will fix.

@bal-e
Copy link
Contributor Author

bal-e commented Apr 7, 2025

@ximon18: I agree that the user experience here isn't the best. But the parsing system has a lot of inherent limitations with regards to error reporting. It can only report one error at a time, so a zonefile with multiple errors would still require many invocations to get working. It also provides very limited context: most parsing functions can't tell which part of a record they're being used in (e.g. RType can be parsed for the type of the record, but also for an NSEC type bitmap). Providing useful, contextual error messages is going to be hard.

Instead of making the parsing system more complicated, I think users are better served if we can show them the specification the parser is following. I've documented the zonefile spec (that I'm using) in this PR; perhaps we can print it or link to it from user-facing programs that perform zonefile parsing. For example, dnst read-zone could have a --spec option which prints out the specification.

With regards to the correctness of the specification (i.e. that the actual implementation follows the documented specification correctly): perhaps we can use fuzzing here. It seems that grammar-based fuzzers do exist, and we could set that up. I'd like to do that in a separate PR, though -- I suspect it'll get pretty complicated.

@ximon18
Copy link
Member

ximon18 commented Apr 7, 2025

@ximon18: I agree that the user experience here isn't the best. But the parsing system has a lot of inherent limitations with regards to error reporting. It can only report one error at a time, so a zonefile with multiple errors would still require many invocations to get working. It also provides very limited context: most parsing functions can't tell which part of a record they're being used in (e.g. RType can be parsed for the type of the record, but also for an NSEC type bitmap). Providing useful, contextual error messages is going to be hard.

Instead of making the parsing system more complicated, I think users are better served if we can show them the specification the parser is following. I've documented the zonefile spec (that I'm using) in this PR; perhaps we can print it or link to it from user-facing programs that perform zonefile parsing. For example, dnst read-zone could have a --spec option which prints out the specification.

With regards to the correctness of the specification (i.e. that the actual implementation follows the documented specification correctly): perhaps we can use fuzzing here. It seems that grammar-based fuzzers do exist, and we could set that up. I'd like to do that in a separate PR, though -- I suspect it'll get pretty complicated.

I'm concerned that referring operators to a spec document somewhere when there's an error due to something in the received zonefile bytes asking them to somehow work it out is going to be usable. In the previous parser implementation I added the missing context by passing out a String instead of &'static str, as the parser knew which value it just failed to parse. Is that not possible here?

@bal-e
Copy link
Contributor Author

bal-e commented Apr 7, 2025

I think I misunderstood the use case you're thinking about. I was considering this API from the perspective of a user manually executing a program that is loading a zonefile (which is the perspective of us as developers trying to test out the zonefile functionality). What information would we need to provide DNS operators (who are presumably working with a large number of zones, and have an automated system that is loading zonefiles) when an error occurs? What would their immediate response to such a failure be?

To answer the concrete question: it is possible to return a String for the error message here. It's just that the parser has very little context about where it's getting called from (e.g. whether the parser is located before or within the RDATA field).

@ximon18
Copy link
Member

ximon18 commented Apr 7, 2025

I think I misunderstood the use case you're thinking about. I was considering this API from the perspective of a user manually executing a program that is loading a zonefile (which is the perspective of us as developers trying to test out the zonefile functionality). What information would we need to provide DNS operators (who are presumably working with a large number of zones, and have an automated system that is loading zonefiles) when an error occurs? What would their immediate response to such a failure be?

To answer the concrete question: it is possible to return a String for the error message here. It's just that the parser has very little context about where it's getting called from (e.g. whether the parser is located before or within the RDATA field).

The use case is that a running server attempts to read a zone file from disk or records from an incoming XFR and for some reason is unable to parse the zone. The operator then needs to be able to know which record could not be parsed, which may require more than a line number and a generic error message not specific to the record in question. Am I missing something here that makes this wish seem strange?

@Philip-NLnetLabs
Copy link
Member

I think it makes sense. A parser should give at least the string value that could not be parsed, but ideally also the context surrounding that string.

@bal-e
Copy link
Contributor Author

bal-e commented Apr 7, 2025

My concern is, DNS operators are probably not writing zonefiles by hand; if they're generating it from an automated system, and the output is malformed (at least according to this zonefile parser implementation), they can't just modify the zonefile by hand in response to the error message (although perhaps that is helpful for fixing the immediate issue). They will need to update their automated zonefile generation to avoid making the same mistake in the future. From that perspective, I'd imagine that having a formal specification of the acceptable grammar is far more useful than a thorough description of a single parse error.

@ximon18
Copy link
Member

ximon18 commented Apr 7, 2025

My concern is, DNS operators are probably not writing zonefiles by hand; if they're generating it from an automated system, and the output is malformed (at least according to this zonefile parser implementation), they can't just modify the zonefile by hand in response to the error message (although perhaps that is helpful for fixing the immediate issue). They will need to update their automated zonefile generation to avoid making the same mistake in the future. From that perspective, I'd imagine that having a formal specification of the acceptable grammar is far more useful than a thorough description of a single parse error.

Indeed I don't expect operators to write zone files by hand. They will need to report the error to someone to get it solved in our software. We will then ask them "which record caused the error" and they will not know and may not wish to give out their raw data if they even have it.

@bal-e
Copy link
Contributor Author

bal-e commented Apr 7, 2025

We will then ask them "which record caused the error" and they will not know [...]

The parse error still reports the line number, so it's possible to find the failing record. Maybe this is harder if the zonefile is ephemeral, but I'd still expect it to be found on disk.

and may not wish to give out their raw data if they even have it.

Suppose an operator does encounter a zonefile parsing issue, and they are unable to share the original record (or the specific component of it that appeared to fail) because it is sensitive data. If we were to include that information in the error message, doesn't that just make the error message sensitive data too?

Still, I understand that having a bit more context in parse errors would be helpful. I'll update ParseError to return heap-allocated strings, and then use format!() to provide some additional information. I'll also fix the specific example that motivated this discussion, but there are other places which would benefit from this (e.g. reporting parse failures for IP addresses, which delegate to functions in std).

For providing additional context: calculating the character offset of a parse failure is surprisingly difficult because of parentheses. Parentheses are handled outside the Scanner type (in Entries) and are stripped by it; this simplifies the parsing logic in Scanner, but makes it impossible to determine character offsets in the original input. Changing this will not be pleasant, but it can be done if it would seriously benefit end users.


impl<R: io::BufRead> ZonefileScanner<R> {
/// Scan the next entry and return a reference to it.
pub fn scan(&mut self) -> Result<Option<Entry<'_>>, ZonefileError> {
Copy link
Member

@ximon18 ximon18 Apr 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work for me unless I move the alloc: bumpalo::Bump field to be a alloc: &bumpalo::Bump argument passed in to this function. I think it's because the &mut self causes the passed &self.alloc to retain a reference to &mut self via the returned Entry causing error scanner was mutably borrowed here in the previous iteration of the loop. It's fine if I don't use Entry, but once I try to store the record referenced by the returned Entry in the loop over scanner.scan() I hit that lifetime error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an intentional part of the design -- you can only access one Entry at a time, so that each Entry can use the same block of memory, and thus the total memory overhead is O(1). If you need to retain those entries, you would write them into a zone tree or similar structure (which I am yet to define), or just clone them into an external bump allocator. I need to make the latter case more convenient right now, but it can be done manually using RecordData::clone_to_bump() etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a simple zone I am storing the references into, just a BTreeMap. I can't write them into it because of the problem above, and I thought the idea was zero copy so I didn't try cloning them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, how do I ensure that the bump arena lives long enough, I have to make sure the scanner is not dropped (as it owns the bump arena), or clone entries...

Copy link
Member

@ximon18 ximon18 Apr 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll have to show me how to do this, after several different attempts also using an external bump allocator with clone_to_bump() I'm not getting anywhere, I still have the scanner was mutably borrowed here in the previous iteration of the loop error.

Base automatically changed from new-base to main May 21, 2025 10:57
bal-e added a commit that referenced this pull request Jul 24, 2025
This is a manual squash of #491 for internal testing.
bal-e added a commit that referenced this pull request Aug 1, 2025
This is a manual squash of #491 for internal testing.
bal-e added 4 commits August 1, 2025 17:36
This is a manual squash of #491 for internal testing.
Record data usually needs 'Name' (e.g. for canonical ordering), while
we continue using 'RevName' for record owner names.
This is a breaking change to 'domain::new', switching from 'RevName'
to the more suitable 'Name' for representing domain names within record
data.
bal-e added 4 commits August 26, 2025 06:49
- Also enhance 'FromStr' impls to parse more of the zonefile format,
  for parity with the existing 'Display' impls

- Also impl conversions between 'NameBuf' and 'RevNameBuf'
Improves the 'FromStr' impls for the name types.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants