
prototype using pest-typed #1

Merged · 60 commits · Feb 20, 2024

Conversation

@tomtau (Contributor) commented Feb 7, 2024

see the discussion on pest-parser/ast#29

Summary by CodeRabbit

  • New Features
    • Introduced a macro for deriving typed parsers.
    • Added functionality for parsing sequences with increasing length constraints.
    • Implemented parsers for skipping specific patterns, including comments.
    • Introduced parsing rules and constructs for string parsing and more.
    • Added functionality for generating typed parsers based on grammar rules.
    • Introduced modules and functionality for managing cursor positions, handling spans, and visualizing whitespace in inputs.
    • Added traits for defining parser rules and typed nodes, enhancing error handling during parsing.
  • Documentation
    • Added documentation comments for parsing rules.
  • Bug Fixes
    • Fixed issue with trailing comma in macro definition.
  • Chores
    • Updated .gitignore to ignore .vscode directory and generated.rs in tests.


coderabbitai bot commented Feb 7, 2024

Walkthrough

This update introduces a comprehensive enhancement to a Rust-based parser generation library, focusing on the integration of a typed parser derivation system using pest_generator and pest. It improves grammar parsing, error handling, and documentation, and introduces new testing capabilities. The changes streamline the process of generating typed parsers from grammar specifications, enriching the library's functionality with more robust and user-friendly parsing solutions.

Changes

  • .gitignore, generator/tests/.gitignore: added .vscode and generated.rs to the ignore list.
  • derive/src/lib.rs: added a macro for deriving typed parsers using pest_generator.
  • derive/tests/sequence.rs, derive/tests/skip.rs, pest/tests/predefined_node.rs: introduced parsers for sequences, skipping patterns, and predefined nodes, with tests.
  • derive/tests/grammar.pest, derive/tests/grammar_typed.rs, generator/tests/syntax.pest, pest/tests/syntax.used.rules.txt: specified parsing rules and constructs; introduced functionality for parsing based on defined grammars.
  • generator/src/common.rs, generator/src/typed/..., pest/src/...: enhanced functionality for generating typed parsers, handling tokens, sequences, choices, and more.
  • meta/src/doc.rs, meta/src/error.rs, meta/src/parser.rs: improved documentation, error handling, and parsing rule management.

🐇✨
In the realm of code, where parsers play,
A rabbit hopped, crafting rules to convey.
With typed parsers derived, and grammars designed,
It leaped through lines, leaving bugs behind.
"To parse," it cheered, "is to understand,
Each token and rule, meticulously planned."
🌟📚🐾


Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository from git and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json


@coderabbitai bot left a comment
Review Status

Actionable comments generated: 62

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 7e1873f and dd145db.
Files ignored due to path filters (5)
  • Cargo.toml is excluded by: !**/*.toml
  • derive/Cargo.toml is excluded by: !**/*.toml
  • generator/Cargo.toml is excluded by: !**/*.toml
  • meta/Cargo.toml is excluded by: !**/*.toml
  • pest/Cargo.toml is excluded by: !**/*.toml
Files selected for processing (33)
  • .gitignore (1 hunks)
  • derive/src/lib.rs (1 hunks)
  • derive/tests/sequence.rs (1 hunks)
  • generator/src/common.rs (1 hunks)
  • generator/src/config.rs (1 hunks)
  • generator/src/lib.rs (1 hunks)
  • generator/src/state_builder.rs (2 hunks)
  • generator/src/typed/attr.rs (1 hunks)
  • generator/src/typed/config.rs (1 hunks)
  • generator/src/typed/generator.rs (1 hunks)
  • generator/src/typed/mod.rs (1 hunks)
  • generator/src/typed/output.rs (1 hunks)
  • generator/src/types.rs (1 hunks)
  • generator/tests/.gitignore (1 hunks)
  • generator/tests/generated.rs (1 hunks)
  • generator/tests/generated.txt (1 hunks)
  • generator/tests/syntax.pest (1 hunks)
  • meta/src/doc.rs (1 hunks)
  • meta/src/error.rs (1 hunks)
  • meta/src/lib.rs (1 hunks)
  • meta/src/parser.rs (10 hunks)
  • pest/src/choice.rs (1 hunks)
  • pest/src/formatter.rs (1 hunks)
  • pest/src/lib.rs (1 hunks)
  • pest/src/position.rs (1 hunks)
  • pest/src/sequence.rs (1 hunks)
  • pest/src/span.rs (1 hunks)
  • pest/src/typed/mod.rs (1 hunks)
  • pest/src/typed/template/mod.rs (1 hunks)
  • pest/src/typed/template/repetition.rs (1 hunks)
  • pest/src/typed/tracker.rs (1 hunks)
  • pest/src/typed/traits.rs (1 hunks)
  • pest/src/typed/wrapper.rs (1 hunks)
Files skipped from review due to trivial changes (2)
  • .gitignore
  • generator/tests/.gitignore
Additional comments: 66
meta/src/lib.rs (1)
  • 1-2: Publicly exposing the doc and error modules aligns with the PR's objective to enhance documentation and error handling capabilities.
pest/src/typed/mod.rs (1)
  • 1-7: The introduction of template, tracker, traits, and wrapper modules, along with their public exposure, is consistent with the PR's goal to support typed parser generation.
generator/src/types.rs (1)
  • 4-9: The functions _str and option_type provide utility for generating token streams for common Rust types, which is a useful addition for typed parser generation.
generator/src/typed/mod.rs (1)
  • 1-13: The structure of the typed module with attr, config, generator, and output submodules, and the public exposure of derive_typed_parser function, aligns with the PR's objective to facilitate typed parser generation.
pest/src/lib.rs (1)
  • 1-14: Adding modules formatter, position, sequence, span, typed, and integrating the alloc crate, along with exposing entities from pest2, enhances the pest library's functionality in line with the PR's objectives.
meta/src/doc.rs (1)
  • 1-10: The DocComment struct with fields for grammar documentation and rule-specific documentation supports the PR's goal of improving documentation capabilities.
derive/src/lib.rs (1)
  • 3-15: The derive_typed_parser macro with attributes for parser configuration is a key addition for generating typed parsers, aligning with the PR's objectives.
generator/src/typed/config.rs (1)
  • 1-15: The Config struct with flags for parser generation options is a necessary addition for supporting configurable typed parser generation.
pest/src/typed/wrapper.rs (1)
  • 1-20: The introduction of traits String, Rule, and Bound for typed parser generation is consistent with the PR's goal of enhancing the pest ecosystem's capabilities.
generator/tests/syntax.pest (1)
  • 1-40: The syntax file for testing demonstrates the use of various grammar constructs, which is essential for validating the functionality of the typed parser generation.
generator/src/lib.rs (1)
  • 1-6: Adding Rust compatibility and idiom warnings, along with the new typed module, improves code quality and supports the typed parser generation feature.
generator/tests/generated.rs (1)
  • 1-41: The test for validating generated rules against expected outputs, including formatting the generated code, is crucial for ensuring the correctness of the typed parser generation feature.
meta/src/error.rs (1)
  • 1-31: The rename_meta_rule function for improving error messages by renaming verbose rules aligns with the PR's objective to enhance error handling.
generator/src/typed/attr.rs (1)
  • 1-32: Parsing derive attributes to configure the typed parser generation process is a necessary addition for achieving the PR's objectives.
derive/tests/sequence.rs (1)
  • 1-67: The tests for sequence parsing validate the functionality of the typed parser generation, ensuring it meets the PR's objectives.
generator/src/common.rs (1)
  • 1-68: The common backend code for generating include_str! tokens and rule enums supports the PR's goal of simplifying and enhancing the parser generation process.
pest/src/typed/traits.rs (1)
  • 1-76: Introducing traits RuleType and TypedNode for typed parser generation is consistent with the PR's objectives of enhancing type safety and usability.
pest/src/choice.rs (1)
  • 1-3: Adding a trailing comma in the choice_type macro definition improves the macro's usability by allowing optional trailing commas in its invocation.
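The `$(,)?` fragment is the usual way to allow an optional trailing comma in a `macro_rules!` definition. A minimal illustration of the idiom (an invented `sum!` macro, not the actual `choice_type` definition):

```rust
// Illustrative macro: `$(,)?` accepts an optional trailing comma,
// so both `sum!(1, 2, 3)` and `sum!(1, 2, 3,)` expand cleanly.
macro_rules! sum {
    ($($x:expr),+ $(,)?) => { 0 $(+ $x)+ };
}
```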
generator/src/config.rs (1)
  • 1-91: The configuration utilities for handling grammar sources and attributes support the PR's goal of making parser generation more configurable and user-friendly.
pest/src/typed/template/repetition.rs (1)
  • 1-125: The repetition template structures and their implementations for typed nodes align with the PR's objectives of enhancing the pest ecosystem's capabilities for typed parser generation.
pest/src/sequence.rs (1)
  • 1-3: The sequence_type macro definition with support for optional trailing commas improves usability by allowing more flexible macro invocations.
generator/src/state_builder.rs (4)
  • 20-20: The clone() call in GeneratedExpr::Inline's to_tokens method has been removed. Confirm that this change does not affect the correctness of token stream generation.
  • 229-229: The gen method's signature has been changed to take self instead of &mut self. Ensure this change aligns with the intended usage patterns and does not introduce any issues with state management or mutability.
  • 17-27 (📝 note: outside the diff hunks; original lines 65-67): The method clear only clears the attempts map. Verify that this is the intended behavior and that no other state should be reset in this context.
  • 17-27 (📝 note: outside the diff hunks; original lines 79-85): The method during temporarily changes the positive state. Confirm that this approach is thread-safe and does not introduce any unintended side effects when used in concurrent contexts.

pest/src/typed/tracker.rs (1)
  • 79-85: The method during temporarily changes the positive state. As previously mentioned, confirm that this approach is safe and does not introduce side effects.
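The save-and-restore pattern the comment refers to can be sketched as follows; the names (`Tracker`, `positive`) follow the review, but the body is a simplified model, not the actual implementation. Because the method takes `&mut self`, the tracker is not shared between threads unless explicitly wrapped in shared state; the more realistic hazard is that a panic inside the closure skips the restore.

```rust
// Simplified model of a `during`-style method: flip a flag for the
// duration of a closure, then restore the saved value. Note that the
// flag is NOT restored if `f` panics; a Drop-based guard would fix that.
struct Tracker {
    positive: bool,
}

impl Tracker {
    fn during<T>(&mut self, positive: bool, f: impl FnOnce(&mut Self) -> T) -> T {
        let saved = self.positive;
        self.positive = positive;
        let out = f(self);
        self.positive = saved;
        out
    }
}
```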
pest/src/typed/template/mod.rs (20)
  • 21-47: The implementation of Str<T> and its associated methods are correct and follow Rust's idiomatic practices. The use of PhantomData to associate generic type T without storing it is appropriate.
  • 50-88: The Insens<'i, T> struct and its implementation correctly provide case-insensitive string matching functionality. The use of PhantomData and the implementation of TypedNode for case-insensitive matching are correctly done.
  • 91-111: The SkipChar<'i, const N: usize> struct and its implementation for skipping a fixed number of characters are correct. The use of a constant generic for N is a good use of Rust's type system for compile-time parameterization.
  • 114-136: The CharRange<const MIN: char, const MAX: char> struct and its implementation for matching a character within a specified range are correct. The use of constant generics for MIN and MAX is appropriate, and the implementation correctly matches a single character within the range.
  • 139-161: The functions constrain_idx and constrain_idxs are utility functions for constraining indices within a certain range. Ensure these functions are covered by tests to validate their correctness across various edge cases.
  • 163-183: The stack_slice function correctly creates a slice of the stack based on start and end indices, handling out-of-bound scenarios appropriately. The use of Option for the end parameter and the early return for empty sequences are good practices.
  • 186-203: The peek_spans function for matching a sequence of spans without consuming input is correctly implemented. It iterates through spans and matches them against the input, returning early if any span does not match.
  • 206-248: The Positive<N> struct and its implementation for positive lookahead are correctly implemented. The use of Deref and DerefMut traits to provide easy access to the inner content is a good practice.
  • 252-284: The Negative<T> struct and its implementation for negative lookahead are correctly implemented. The use of PhantomData to type-safely associate generic type T without storing it is appropriate.
  • 288-309: The ANY struct for matching any character is correctly implemented. The use of match_char_by with a closure that always returns true is a simple and effective way to match any single character.
  • 312-322: The NONE struct for representing a no-op or empty match is correctly implemented. It correctly returns None for any input, indicating that it does not consume any input.
  • 326-340: The SOI (Start of Input) and EOI (End of Input) structs are correctly implemented to match the start and end of input, respectively. These are essential constructs for many parsing scenarios.
  • 364-400: The NEWLINE struct for matching newline characters and the predefined ASCII character and digit types (ASCII_DIGIT, ASCII_NONZERO_DIGIT, etc.) are correctly implemented. These utilities enhance the expressiveness and convenience of the parser.
  • 402-454: The PEEK and PEEK_ALL structs for peeking at the top or all spans in the stack without consuming input are correctly implemented. The use of reverse iteration for PEEK_ALL and the matching logic in PEEK are appropriate.
  • 457-498: The Skipped<T, Skip, const SKIP: usize> and DROP structs for handling skipped content and dropping the top of the stack are correctly implemented. The use of a constant generic for SKIP in Skipped and the straightforward implementation of DROP are good practices.
  • 500-529: The POP and POP_ALL structs for matching and popping the top or all spans from the stack are correctly implemented. The logic for matching spans against the input and the loop for popping all spans in POP_ALL are correct.
  • 557-591: The Push<T> struct for matching an expression and pushing it to the stack is correctly implemented. The use of Deref and DerefMut for easy access to the inner content and the logic for pushing the span to the stack are appropriate.
  • 593-625: The PeekSlice1 and PeekSlice2 structs for matching slices of the stack in top-to-bottom order are correctly implemented. The use of stack_slice and peek_spans functions within their implementations ensures correct behavior.
  • 627-666: The type aliases for ASCII character classes (ASCII_DIGIT, ASCII_NONZERO_DIGIT, ASCII_BIN_DIGIT, etc.) are correctly defined using CharRange and Choice constructs. These aliases provide convenient shorthand for common character classes.
  • 668-695: The match_char_by utility function for matching a character based on a predicate and the restore_on_none function for error handling are correctly implemented. These utilities enhance the flexibility and robustness of the parser.
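Const generics are what make types like `CharRange<const MIN: char, const MAX: char>` possible, since `char` is an allowed const-parameter type. A self-contained sketch of the idea (illustrative, not the crate's actual type):

```rust
// A character-range matcher parameterized at compile time.
// No PhantomData is needed: const parameters carry no runtime data.
struct CharRange<const MIN: char, const MAX: char>;

impl<const MIN: char, const MAX: char> CharRange<MIN, MAX> {
    fn matches(c: char) -> bool {
        (MIN..=MAX).contains(&c)
    }
}
```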
pest/src/position.rs (6)
  • 61-62: The method new correctly returns None for invalid positions, aligning with Rust's safety and error handling conventions.
  • 75-77: The method from_start is correctly implemented, ensuring safety by leveraging the fact that position 0 is always a valid UTF-8 border.
  • 265-282: The skip method is correctly implemented, ensuring that pos is only updated if the skip is possible. This maintains the integrity of the Position object.
  • 390-392: The match_char method is concise and correctly implemented using pattern matching.
  • 417-426: The match_string method is implemented correctly, ensuring atomicity in updating pos only if the match is successful.
  • 452-460: The match_range method is correctly implemented, ensuring that pos is updated only on a successful match within the specified range.
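The "atomicity" praised above is the pattern of advancing the cursor only after a full successful match, so a failed attempt leaves the position untouched. A free-function sketch of the idea (the real method lives on `Position` and also upholds the UTF-8 boundary invariant):

```rust
// Advance `pos` only if the literal matches; on failure, leave it unchanged.
// Assumes `pos` already sits on a char boundary (the Position invariant).
fn match_string(input: &str, pos: &mut usize, s: &str) -> bool {
    if input[*pos..].starts_with(s) {
        *pos += s.len();
        true
    } else {
        false
    }
}
```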
meta/src/parser.rs (14)
  • 20-20: Change grammar module visibility to pub(crate) limits its accessibility outside of this crate. Confirm this aligns with design intentions.
  • 31-31: Added doc module import. Ensure that the doc module is utilized effectively in the parser logic.
  • 28-36 (📝 note: mapped to the diff hunk with the greatest overlap; original lines 33-77): Span struct and its methods are correctly implemented. Ensure that the union method's logic correctly handles spans from different paths.
  • 75-81 (📝 note: mapped to the diff hunk with the greatest overlap; original lines 78-101): Range<T> struct and its from method are correctly implemented. Verify that the unimplemented branches for start_bound and end_bound are intentional and won't cause runtime issues.

  • 99-105 (📝 note: mapped to the diff hunk with the greatest overlap; original lines 102-110): ParseRule struct correctly captures parsing rules. Ensure that the silent field's usage aligns with the parsing logic requirements.

  • 111-116: ParseNode struct correctly encapsulates a node in the parse tree. Verify that the structure is adequately used throughout the parsing process.
  • 117-122: PathArgs enum variants are correctly defined. Ensure that all possible path argument types are covered.
  • 123-132: Trivia enum correctly introduces trivia handling. Ensure that the usage of Trivia aligns with the grammar and parsing requirements.
  • 134-147: ParseExpr enum modifications introduce Trivia handling for Seq and Separated variants. Ensure that these changes are correctly integrated into the parsing logic.
  • 161-240: Refactoring of parse function into _parse with an additional doc parameter and the introduction of parse_with_doc_comment function. Ensure that the new structure improves modularity and maintainability without introducing regressions.
  • 541-542: Introduction of Trivia handling in parse_postfix function. Verify that the Trivia enum is correctly used to represent optional and mandatory trivia in the grammar.
  • 608-613: Modification in parse_node function to handle Trivia for Seq expressions. Ensure that the introduction of Trivia handling aligns with the grammar specifications and parsing requirements.
  • 1572-1576: In the test parse_infixes, the use of Trivia::Mandatory and Trivia::Optional is correctly demonstrated. Verify that the test cases adequately cover the expected behavior of trivia handling in sequences.
  • 1624-1624: In the test parse_postfixes, the use of Trivia::Optional is correctly demonstrated. Ensure that the test case accurately reflects the intended parsing behavior for optional trivia.
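The Optional/Mandatory distinction discussed for trivia can be modeled as a small enum plus a helper that says how many trivia matches a separator position requires; this is an illustrative reduction of the idea, not the crate's actual API.

```rust
// Optional trivia may match zero whitespace/comment tokens between
// sequence elements; Mandatory trivia must match at least one.
#[derive(Debug, PartialEq)]
enum Trivia {
    Optional,
    Mandatory,
}

// Hypothetical helper: minimum number of trivia matches a separator
// position requires under each mode.
fn min_required(trivia: Option<&Trivia>) -> usize {
    match trivia {
        Some(Trivia::Mandatory) => 1,
        Some(Trivia::Optional) | None => 0,
    }
}
```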

Comment on lines +28 to +36
/// A cursor position in a `&str` which provides useful methods to manually parse that string.
#[derive(Clone, Copy)]
pub struct Position<'i> {
pub(crate) input: &'i str,
/// # Safety:
///
/// `input[pos..]` must be a valid codepoint boundary (should not panic when indexing thus).
pos: usize,
}

Ensure the safety comment on line 34 is accurate and consider providing more detail on what constitutes a valid codepoint boundary.
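Concretely, a "valid codepoint boundary" is what `str::is_char_boundary` checks: slicing a `&str` at any other byte offset panics. A small demonstration:

```rust
// Enumerate which byte offsets of a string are valid char boundaries.
// In "hé", 'h' is 1 byte and 'é' is 2, so offset 2 falls inside 'é'.
fn boundaries(s: &str) -> Vec<bool> {
    (0..=s.len()).map(|i| s.is_char_boundary(i)).collect()
}
```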

Comment on lines +44 to +45
pub(crate) unsafe fn new_unchecked(input: &str, pos: usize) -> Position<'_> {
debug_assert!(input.get(pos..).is_some());

Use of debug_assert! for validating pos is a good practice in debug mode, but consider adding a fallback error handling for release builds to prevent potential runtime panics.

Comment on lines +114 to +123
pub fn span(&self, other: &Position<'i>) -> span::Span<'i> {
if ptr::eq(self.input, other.input)
/* && self.input.get(self.pos..other.pos).is_some() */
{
// This is safe because the pos field of a Position should always be a valid str index.
unsafe { span::Span::new_unchecked(self.input, self.pos, other.pos) }
} else {
// TODO: maybe a panic if self.pos < other.pos
panic!("span created from positions from different inputs")
}

The span method's use of ptr::eq for input comparison is efficient, but the commented-out validation check on line 116 suggests incomplete error handling. Consider implementing or removing this check based on intended functionality.

Comment on lines +146 to +186
pub fn line_col(&self) -> (usize, usize) {
if self.pos > self.input.len() {
panic!("position out of bounds");
}
let mut pos = self.pos;
let slice = &self.input[..pos];
let mut chars = slice.chars().peekable();

let mut line_col = (1, 1);

while pos != 0 {
match chars.next() {
Some('\r') => {
if let Some(&'\n') = chars.peek() {
chars.next();

if pos == 1 {
pos -= 1;
} else {
pos -= 2;
}

line_col = (line_col.0 + 1, 1);
} else {
pos -= 1;
line_col = (line_col.0, line_col.1 + 1);
}
}
Some('\n') => {
pos -= 1;
line_col = (line_col.0 + 1, 1);
}
Some(c) => {
pos -= c.len_utf8();
line_col = (line_col.0, line_col.1 + 1);
}
None => unreachable!(),
}
}

line_col

The line_col method's implementation is correct but potentially inefficient due to its O(n) complexity. Consider documenting this explicitly or exploring optimization opportunities.
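One standard way to avoid the O(n) scan per lookup is to precompute line-start offsets once and binary-search them per query. A sketch under simplifying assumptions (only `\n` terminates lines, and `pos` is an in-bounds char boundary); the actual `Position` code also handles bare `\r`:

```rust
// O(n) precomputation, then O(log n) line lookup per query.
struct LineIndex {
    line_starts: Vec<usize>, // byte offset of each line's first byte
}

impl LineIndex {
    fn new(input: &str) -> Self {
        let mut line_starts = vec![0];
        for (i, b) in input.bytes().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        Self { line_starts }
    }

    /// 1-based (line, column); column counts chars, not bytes.
    fn line_col(&self, input: &str, pos: usize) -> (usize, usize) {
        let line = self.line_starts.partition_point(|&s| s <= pos) - 1;
        let col = input[self.line_starts[line]..pos].chars().count() + 1;
        (line + 1, col)
    }
}
```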

Comment on lines +206 to +211
pub fn line_of(&self) -> &'i str {
if self.pos > self.input.len() {
panic!("position out of bounds");
};
// Safe since start and end can only be valid UTF-8 borders.
&self.input[self.find_line_start()..self.find_line_end()]

The method line_of correctly calculates the line containing the position. Ensure that the panic condition on line 208 is desirable and consider handling this case more gracefully if applicable.

Comment on lines 302 to 344
fn collect_used_rule<'g>(
rule: &'g ParseRule,
rule_trivia: Option<&'g ParseRule>,
res: &mut BTreeSet<RuleRef<'g>>,
) {
let mut nodes = vec![&rule.node];
let expect_trivia = format!(
"Please define trivia with `~ = \"...\"`. It's used in rule `{}`.",
rule.name
);
while let Some(expr) = nodes.pop() {
match &expr.expr {
ParseExpr::Str(_) | ParseExpr::Insens(_) | ParseExpr::Range(_, _) => (),
ParseExpr::PosPred(node) | ParseExpr::NegPred(node) => nodes.push(node),
ParseExpr::Seq(lhs, rhs, trivia) => {
nodes.push(lhs);
nodes.push(rhs);
if let Some(Trivia::Mandatory) | Some(Trivia::Optional) = trivia {
let rule_trivia = rule_trivia.expect(&expect_trivia);
nodes.push(&rule_trivia.node);
}
}
ParseExpr::Choice(lhs, rhs) => {
nodes.push(lhs);
nodes.push(rhs);
}
ParseExpr::Opt(node)
| ParseExpr::Rep(node)
| ParseExpr::RepOnce(node)
| ParseExpr::RepRange(node, _) => nodes.push(node),
ParseExpr::Path(path, args) => {
res.insert(RuleRef::new(path, args));
}
ParseExpr::Separated(node, trivia) => {
nodes.push(node);
if let Trivia::Mandatory | Trivia::Optional = trivia {
let rule_trivia = rule_trivia.expect(&expect_trivia);
nodes.push(&rule_trivia.node);
}
}
}
}
}

The function collect_used_rule collects rules used within a given rule. Review for correctness in identifying all relevant rules and ensuring no rules are missed.

Comment on lines 346 to 373
fn collect_reachability<'g>(rules: &'g [ParseRule]) -> BTreeMap<RuleRef, BTreeSet<RuleRef>> {
let mut res = BTreeMap::new();
let rule_trivia = rules.iter().find(|rule| rule.name == "~");
let path = vec![];
for rule in rules {
let entry = res
.entry(RuleRef::from_current(rule.name.as_str(), path.clone()))
.or_default();
collect_used_rule(rule, rule_trivia, entry);
}
for _ in 0..rules.len() {
for rule in rules {
let rule_ref = RuleRef::from_current(rule.name.as_str(), path.clone());
if let Some(cur) = res.remove(&rule_ref) {
let mut new = cur.clone();
for referenced in cur {
if let Some(iter) = res.get(&referenced) {
new.extend(iter.iter().cloned());
}
}
if !new.contains(&rule_ref) {
res.insert(rule_ref, new);
}
}
}
}
res
}

The function collect_reachability calculates rule reachability. Ensure that the logic correctly identifies all reachable rules and does not introduce any inaccuracies.
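The fixed-point loop in `collect_reachability` computes a transitive closure of the rule-reference graph. A simplified, self-contained model using rule names in place of `RuleRef` (it omits the self-reference filtering the real code performs):

```rust
use std::collections::{BTreeMap, BTreeSet};

// Repeatedly fold each rule's direct references' own reference sets into it
// until nothing changes: a plain fixed-point transitive closure.
fn reachability(direct: &BTreeMap<&str, BTreeSet<&str>>) -> BTreeMap<String, BTreeSet<String>> {
    let mut res: BTreeMap<String, BTreeSet<String>> = direct
        .iter()
        .map(|(k, v)| (k.to_string(), v.iter().map(|s| s.to_string()).collect()))
        .collect();
    loop {
        let mut changed = false;
        let snapshot = res.clone();
        for set in res.values_mut() {
            let extra: BTreeSet<String> = set
                .iter()
                .filter_map(|r| snapshot.get(r))
                .flatten()
                .cloned()
                .collect();
            for e in extra {
                changed |= set.insert(e);
            }
        }
        if !changed {
            break;
        }
    }
    res
}
```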

Comment on lines 375 to 380
fn generate_typed_pair_from_rule(rules: &[ParseRule], config: Config) -> TokenStream {
let defined_rules: BTreeSet<&str> = rules.iter().map(|rule| rule.name.as_str()).collect();
let not_boxed = collect_reachability(rules).keys().cloned().collect();
let graph = generate_graph(rules, &defined_rules, &not_boxed, config);
graph.collect()
}

The function generate_typed_pair_from_rule initiates the generation process for a set of rules. Review the integration of different generation steps to ensure they form a coherent and correct output.

Comment on lines 382 to 407
/// Generate codes for Parser.
fn generate_typed(
name: Ident,
generics: &Generics,
paths: Vec<PathBuf>,
rules: Vec<ParseRule>,
doc_comment: &DocComment,
include_grammar: bool,
config: Config,
) -> TokenStream {
let include_fix = if include_grammar {
generate_include(&name, paths)
} else {
quote!()
};
let rule_enum = generate_rule_enum(&rules, doc_comment);
let definition = generate_typed_pair_from_rule(&rules, config);
let (impl_generics, ty_generics, where_clause) = generics.split_for_impl();

let res = quote! {
#include_fix
#rule_enum
#definition
};
res
}

The function generate_typed orchestrates the generation of typed parsers. Ensure that the overall structure and flow of generation are clear and result in correct parser code.

Comment on lines 409 to 421
pub fn derive_typed_parser(input: TokenStream, include_grammar: bool) -> TokenStream {
let ast: DeriveInput = syn::parse2(input).unwrap();
let (name, generics, contents, config) = parse_derive(ast);

let (data, paths) = collect_data(contents);

let (rules, doc) = match parser::parse_with_doc_comment(&data, &data) {
Ok(pairs) => pairs,
Err(error) => panic!("error parsing \n{}", error.renamed_rules(rename_meta_rule)),
};

generate_typed(name, &generics, paths, rules, &doc, include_grammar, config)
}

The function derive_typed_parser is the entry point for deriving typed parsers. Review for clarity and ensure it correctly sets up and executes the generation process.

@coderabbitai bot left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between dd145db and 0fd7f6e.
Files selected for processing (1)
  • generator/tests/generator.rs (1 hunks)
Additional comments: 1
generator/tests/generator.rs (1)
  • 4-8: Ensure the documentation comment accurately reflects the file path used in the rustfmt command.
Verification successful

The documentation comment accurately reflects the file path used in the rustfmt command as verified by the shell script output. The file path generator/tests/generated.txt matches the actual file path used in tests.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify if the file path in the documentation matches the actual file path used in tests.
fd 'generated.txt' --exec echo "File path found: {}"

Length of output: 101

Comment on lines +1 to +2
use pest_generator::typed::derive_typed_parser;
use quote::quote;

Consider grouping imports from the same crate together to improve readability.

- use pest_generator::typed::derive_typed_parser;
- use quote::quote;
+ use pest_generator::typed::{derive_typed_parser};
+ use quote::{quote};


Comment on lines 11 to 40
let path_generated = "tests/generated.rs";
let path_expected = "tests/expected.rs";
let actual = derive_typed_parser(
quote! {
#[grammar = "tests/syntax.pest"]
#[emit_rule_reference]
#[emit_tagged_node_reference]
#[no_warnings]
struct Parser;
},
false,
);
let actual = actual.to_string();
std::fs::write(path_generated, &actual).unwrap();
let output = std::process::Command::new("rustfmt")
.arg(path_generated)
.arg("--config")
.arg("use_small_heuristics=Max,max_width=1000")
.output()
.unwrap();
assert!(
output.status.success(),
"STDOUT:\n{}\nSTDERR:\n{}",
String::from_utf8(output.stdout).unwrap(),
String::from_utf8(output.stderr).unwrap(),
);

if std::fs::read(path_generated).unwrap() != std::fs::read(path_expected).unwrap() {
panic!("Generated codes have changed.")
}

  • Use expect instead of unwrap for better error messages when file operations or command executions fail.
  • Consider extracting the rustfmt configuration and command execution into a separate function for better modularity and reusability.
- std::fs::write(path_generated, &actual).unwrap();
+ std::fs::write(path_generated, &actual).expect("Failed to write generated code to file");
...
- std::process::Command::new("rustfmt")
+ format_code(path_generated); // Implement this function to encapsulate formatting logic
...
- String::from_utf8(output.stdout).unwrap(),
- String::from_utf8(output.stderr).unwrap(),
+ String::from_utf8(output.stdout).expect("Failed to convert stdout to UTF-8"),
+ String::from_utf8(output.stderr).expect("Failed to convert stderr to UTF-8"),
...
- std::fs::read(path_generated).unwrap() != std::fs::read(path_expected).unwrap()
+ std::fs::read(path_generated).expect("Failed to read generated file") != std::fs::read(path_expected).expect("Failed to read expected file")
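The extracted helper the review hints at could look roughly like this (a sketch only; `format_code` and `run_tool` are hypothetical names, not part of the PR):

```rust
use std::process::Command;

/// Hypothetical helper encapsulating the rustfmt invocation from the test,
/// so the formatting logic is reusable and failures carry context.
fn format_code(path: &str) -> Result<(), String> {
    run_tool(
        "rustfmt",
        &[path, "--config", "use_small_heuristics=Max,max_width=1000"],
    )
}

/// Generic runner: spawns `bin` with `args` and reports captured output on failure.
fn run_tool(bin: &str, args: &[&str]) -> Result<(), String> {
    let output = Command::new(bin)
        .args(args)
        .output()
        .map_err(|e| format!("failed to spawn {bin}: {e}"))?;
    if output.status.success() {
        Ok(())
    } else {
        Err(format!(
            "STDOUT:\n{}\nSTDERR:\n{}",
            String::from_utf8_lossy(&output.stdout),
            String::from_utf8_lossy(&output.stderr)
        ))
    }
}
```

The test body would then shrink to `format_code(path_generated).expect("rustfmt failed")`.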



@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 3

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 326f6d4 and 2f2d341.
Files selected for processing (12)
  • derive/examples/reader.rs (1 hunks)
  • derive/src/lib.rs (1 hunks)
  • derive/tests/skip.rs (1 hunks)
  • generator/src/lib.rs (3 hunks)
  • generator/tests/expected.rs (1 hunks)
  • meta/src/lib.rs (1 hunks)
  • pest/src/choice.rs (1 hunks)
  • pest/src/lib.rs (1 hunks)
  • pest/src/sequence.rs (1 hunks)
  • pest/src/token/mod.rs (1 hunks)
  • pest/src/typed/template/mod.rs (1 hunks)
  • pest/src/typed/template/repetition.rs (1 hunks)
Files not summarized due to errors (1)
  • generator/tests/expected.rs: Error: Message exceeds token limit
Files skipped from review as they are similar to previous changes (8)
  • derive/src/lib.rs
  • derive/tests/skip.rs
  • generator/src/lib.rs
  • meta/src/lib.rs
  • pest/src/lib.rs
  • pest/src/sequence.rs
  • pest/src/token/mod.rs
  • pest/src/typed/template/repetition.rs
Additional comments: 32
derive/examples/reader.rs (1)
  • 6-9: The grammar definition uses a combination of x and whitespace/comment rules. Ensure that the whitespace/comment rule (~) is correctly defined to handle nested comments if that's the intended behavior.
pest/src/choice.rs (3)
  • 4-5: The macro definition correctly allows for optional trailing commas in the macro invocation, aligning with Rust's syntax conventions for enums and structs.
  • 10-39: The implementation of TypedNode for the choice_type enum introduces a pattern matching mechanism for each variant. Ensure that the logic correctly restores the stack snapshot in case of a match failure to prevent side effects on the parsing state.
  • 40-53: The implementation of PairContainer for the choice_type enum correctly delegates the for_each_child_pair method to the variant's implementation. This ensures that child pairs are correctly iterated over, regardless of the choice made.
pest/src/typed/template/mod.rs (24)
  • 25-51: The Str struct and its implementation for TypedNode correctly match a given string case sensitively. Ensure that input.match_string(Self::CONTENT) correctly advances the input position upon a successful match.
  • 60-98: The Insens struct and its implementation for TypedNode correctly match a given string case insensitively. The use of input.match_insensitive(Self::CONTENT) and capturing the matched span is correctly implemented.
  • 110-130: The SkipChar struct and its implementation for TypedNode provide functionality for skipping a fixed number of characters. Ensure that input.skip(N) correctly updates the input position and that the span is accurately captured.
  • 142-165: The CharRange struct and its implementation for TypedNode correctly match a character within a specified range. Ensure that input.match_range(MIN..MAX) correctly advances the input position upon a successful match.
  • 178-191: The Char struct and its implementation for TypedNode correctly match an exact character. Ensure that input.match_char_by(|c| c == CHAR) correctly advances the input position upon a successful match.
  • 201-223: The utility functions constrain_idx and constrain_idxs are used for safely constraining indices within a valid range for slicing operations. Ensure that the logic correctly handles negative indices as offsets from the end of the sequence.
  • 225-245: The stack_slice function correctly creates a slice of the stack based on provided start and end indices. Ensure that the function correctly handles out-of-bound indices by invoking tracker.out_of_bound.
  • 248-265: The peek_spans function correctly matches a sequence of spans against the input without consuming it. Ensure that the function correctly advances the input position upon a successful match.
  • 268-311: The Positive struct and its implementation for TypedNode correctly represent a positive lookahead assertion. Ensure that the stack snapshot is correctly restored regardless of the match outcome.
  • 315-348: The Negative struct and its implementation for TypedNode correctly represent a negative lookahead assertion. Ensure that the stack snapshot is correctly restored regardless of the match outcome.
  • 357-383: The ANY struct and its implementation for TypedNode correctly match any single character. Ensure that input.match_char_by correctly advances the input position upon a successful match.
  • 387-398: The NONE struct and its implementation for TypedNode correctly represent a failure to match any input. This is consistent with the expected behavior for a node that always fails.
  • 402-417: The SOI struct and its implementation for TypedNode correctly match the start of input. Ensure that input.at_start() is correctly used to determine if the current position is at the start.
  • 421-437: The EOI struct and its implementation for TypedNode correctly match the end of input. Ensure that input.at_end() is correctly used to determine if the current position is at the end.
  • 453-477: The NEWLINE struct and its implementation for TypedNode correctly match newline characters with support for different newline types. Ensure that the logic correctly advances the input position upon a successful match.
  • 486-503: The PEEK_ALL struct and its implementation for TypedNode correctly peeks all spans in the stack in reverse order. Ensure that the logic correctly matches the sequence of spans against the input without consuming it.
  • 513-542: The PEEK struct and its implementation for TypedNode correctly peeks the top span in the stack. Ensure that the logic correctly matches the top span against the input without consuming it.
  • 552-570: The DROP struct and its implementation for TypedNode correctly represents an operation to drop the top of the stack. Ensure that the stack is correctly manipulated and that an empty stack is handled gracefully.
  • 575-603: The POP struct and its implementation for TypedNode correctly matches and pops the top span of the stack. Ensure that the logic correctly matches the top span against the input and updates the stack accordingly.
  • 613-634: The POP_ALL struct and its implementation for TypedNode correctly matches and pops all spans in the stack. Ensure that the logic correctly matches the sequence of spans against the input and clears the stack.
  • 644-665: The PUSH struct and its implementation for TypedNode correctly matches an expression and pushes it to the stack. Ensure that the stack is correctly updated with the span of the matched expression.
  • 693-707: The PeekSlice2 struct and its implementation for TypedNode correctly matches a slice of the stack based on provided start and end indices. Ensure that the logic correctly peeks the specified range of spans without consuming input.
  • 712-724: The PeekSlice1 struct and its implementation for TypedNode correctly matches a slice of the stack based on a provided start index. Ensure that the logic correctly peeks the specified range of spans without consuming input.
  • 799-828: The try_handle_trivia function correctly handles optional and mandatory trivia based on the TRIVIA constant. Ensure that the logic correctly parses trivia according to the specified mode and advances the input position accordingly.
generator/tests/expected.rs (4)
  • 37-39: The Trivia module is concise and correctly defines a type alias. No issues here.
  • 40-106: The wrapper module correctly defines wrapper structs for string literals used in parsing. Ensure that all new wrappers are necessary and correctly used in the parsing rules.
  • 109-1028: Each struct representing a rule (e.g., r#Regular, r#Atomic, etc.) correctly implements the necessary traits for typed parsing. Ensure that the grammar definitions in documentation comments accurately reflect the implemented parsing logic.
  • 1030-1037: The generics module correctly re-exports necessary constructs from the pest crate. Ensure that all re-exports are used in the typed parsing rules and that there are no unnecessary re-exports.

Comment on lines 33 to 36
impl ::pest::typed::RuleType for Rule {
const EOI: Self = Rule::EOI;
type Trivia<'i> = trivia::Trivia<'i>;
}

The implementation of RuleType for Rule is correct but consider adding documentation for public traits and their implementations for clarity.

Comment on lines +13 to +27
fn main() -> anyhow::Result<()> {
let mut stdin = stdin().lock();
let mut buf = String::new();
loop {
buf.clear();
stdin.read_line(&mut buf)?;
buf.pop();
if buf.is_empty() {
return Ok(());
}
match rules::main::try_parse(&buf) {
Ok(res) => println!("{}", res.as_pair_tree()),
Err(err) => println!("{err}"),
}
}

The main function reads lines from stdin and attempts to parse them until an empty line is encountered. Consider adding error handling for stdin.read_line(&mut buf) to gracefully handle potential I/O errors.

-        stdin.read_line(&mut buf)?;
+        if stdin.read_line(&mut buf).is_err() {
+            eprintln!("Error reading from stdin");
+            return Ok(());
+        }
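A different sketch (hypothetical, not part of the PR): making the loop generic over any `BufRead` lets I/O errors propagate with `?` and makes the reader unit-testable without a real stdin:

```rust
use std::io::BufRead;

// Collects non-empty lines until the first empty line or EOF; any I/O error
// propagates to the caller instead of being swallowed.
fn read_until_blank<R: BufRead>(input: R) -> std::io::Result<Vec<String>> {
    let mut lines = Vec::new();
    for line in input.lines() {
        let line = line?; // surfaces read errors
        if line.is_empty() {
            break;
        }
        // in the example, this is where rules::main::try_parse(&line) would run
        lines.push(line);
    }
    Ok(lines)
}
```

In `main`, `read_until_blank(stdin().lock())?` would then replace the manual buffer handling.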


@TheVeryDarkness
Contributor

@tomtau Hello! I've added a grammar test in grammar.pest. Would you like to check it, or ask someone else to review it?

And could you help me by adding more tests or examples?

@tomtau
Contributor Author

tomtau commented Feb 15, 2024

@TheVeryDarkness awesome work! I'll try to check it out this weekend. Maybe @dragostis has some comments? I'll also ask on Discord.

@tomtau
Contributor Author

tomtau commented Feb 17, 2024

@TheVeryDarkness I tried porting the json parser example; I cannot push to this PR or to your master, so I think I'll merge this PR and add it there later. FYI, this is the sample test I wrote (not sure if the typed API is used properly):

json.pest

//! A parser for JSON file.
//!
//! And this is a example for JSON parser.
json = pest::SOI ~ value ~ pest::EOI

/// Matches object, e.g.: `{ "foo": "bar" }`
/// Foobar
object = "{" ~ pair ~ ("," ~ pair)* ~ "}" | "{" ~ "}"
pair   = string ~ ":" ~ value

array = "[" ~ value ~ ("," ~ value)* ~ "]" | "[" ~ "]"


//////////////////////
/// Matches value, e.g.: `"foo"`, `42`, `true`, `null`, `[]`, `{}`.
//////////////////////
value = string | number | object | array | bool | null

string  = "\"" - inner - "\""
inner   = (!("\"" | "\\") - pest::any)* - (escape ~ inner)?
escape  = "\\" - ("\"" | "\\" | "/" | "b" | "f" | "n" | "r" | "t" | unicode)
// FIXME: should be under pest::*
ascii_hex_digit = ascii_digit | "A" | "B" | "C" | "D" | "E" | "F"
ascii_digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
ascii_nonzero_digit = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
unicode = "u" - ascii_hex_digit{4}

number = "-"? - int - ("." ~ ascii_digit+ ~ exp? | exp)?
int    = "0" | ascii_nonzero_digit - ascii_digit*
exp    = ("E" | "e") - ("+" | "-")? - ascii_digit+

bool = "true" | "false"

null = "null"

~ = (" " | "\t" | "\r" | "\n")

lib.rs

/// Grammar rules of a sample JSON parser
#[allow(missing_docs)]
pub mod json {
    use pest_derive::Parser;
 
    /// JSON parser.
    #[derive(Parser)]
    #[grammar = "grammars/json.pest"]
    pub struct JsonParser;

} 

#[cfg(test)]
mod tests {
    use pest::typed::{PairContainer, TypedParser};

    use crate::json;
    #[test]
    fn json_basic_test() {
        let sample1 = "{\"key\": \"value\"}";
        let s1: json::rules::json = json::JsonParser::try_parse(sample1).unwrap();
        let values = s1.vec_children_pairs();
        assert_eq!(&values[0].rule, &json::Rule::value);
        let obj = &values[0].children;
        assert_eq!(obj[0].rule, json::Rule::object);
        let pair = &obj[0].children;
        assert_eq!(pair[0].rule, json::Rule::pair);
        let key = &pair[0].children;
        assert_eq!(key[0].rule, json::Rule::string);
        assert_eq!(&sample1[key[0].start..key[0].end], "\"key\"");
        let value = &key[1].children;
        assert_eq!(value[0].rule, json::Rule::string);
        assert_eq!(&sample1[value[0].start..value[0].end], "\"value\"");
    }

}

@tomtau
Contributor Author

tomtau commented Feb 17, 2024

(just note to myself -- I made a mistake of writing ~ = (" " | "\t" | "\r" | "\n")* which wouldn't terminate)

@TheVeryDarkness
Contributor

TheVeryDarkness commented Feb 17, 2024


That's quite similar to pest2's pair API, which is available but not my favorite. And it's not so important how we write those tests.
By the way, my favorite ways (accesser api and structural api) haven't been fully implemented yet. There are still several functions or impls to be added.

@TheVeryDarkness
Contributor

TheVeryDarkness commented Feb 18, 2024

@tomtau I've invited you to be a collaborator of the fork repo.
https://github.com/TheVeryDarkness/pest3/invitations

@TheVeryDarkness
Contributor

TheVeryDarkness commented Feb 18, 2024

There may be several things to do.

  • Currently, the ~ rule and the ^ rule (I mean ~ = ... and ^ = ...) have the same effect, and only one of them can be defined. I don't fully understand these rules, and I haven't found detailed examples of them. If you are familiar with their difference, could you please explain it to me with grammar examples?
  • There is a prototype for module system, but use statements are not processed.
  • Generics rules are not implemented.
  • Typed api can be improved.

@TheVeryDarkness
Contributor

(just note to myself -- I made a mistake of writing ~ = (" " | "\t" | "\r" | "\n")* which wouldn't terminate)

Maybe show an error or warning when the trivia is used in some discouraged ways?

@tomtau
Contributor Author

tomtau commented Feb 18, 2024

I've invited you to be a collaborator of the fork repo.

Thanks! I may still merge this PR soon, so the follow up fixes are easier to review.

If you are familiar with their difference, could you please explain it with grammar examples for me?

I think the difference is mandatory vs optional trivia. In pest2, those operators would roughly be defined as rules like:

~ = { (COMMENT | WHITESPACE)* }
^ = { (COMMENT | WHITESPACE)+ }

For mandatory trivia (^), one example where it could be used is to solve this grammar issue, I think:
pest-parser/pest#967
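To illustrate the intended difference, here is a hypothetical pest3-style sketch (following the convention above that ~ inserts optional trivia and ^ mandatory trivia; the rule names are made up):

```pest
ident  = ('a'..'z')+
// `^` would require at least one trivia token between the keyword and the
// identifier: "return x" matches, "returnx" does not.
strict = "return" ^ ident
// `~` makes the trivia optional: both "return x" and "returnx" would match.
loose  = "return" ~ ident

~ = " " | "\t" | "\r" | "\n"
```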

Maybe show an error or warning when the trivia is used in some discouraged ways?

Yes, that's not a priority for the initial prototype, but later, we'd want some analysis checking the grammar for common errors such as non-terminating rules (like what pest2 does, but hopefully it can be done better with a simpler surface grammar).

@dragostis
Collaborator

Hi all,

Sorry for the late reply.

I want to start by saying that I'm humbled by the amount of work you put into the project and love the direction! ^_^

Here are the things off the top of my mind after a high-level review of the code:

  • I like the mandatory trivia idea
  • the pest-type work and the way it integrated with the grammar is a great way forward

What I think is potentially a bit of a missed opportunity for pest3 is still making use of the pairs API. Using a just-in-time pull-type parser that would simply be called in the methods generated by pest-typed would very likely give much better results. I'm also trying to keep in mind the case where someone might want to do extremely light processing of the parsed results, e.g. increment a bunch of counters depending on the rules; with an approach like this, that use case should IMO be easy to optimize to very close to ideal performance.


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 3

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 2f2d341 and 9831d1f.
Files selected for processing (12)
  • derive/tests/grammar.pest (1 hunks)
  • derive/tests/skip.rs (1 hunks)
  • generator/src/common.rs (1 hunks)
  • generator/src/typed/generator.rs (1 hunks)
  • generator/src/typed/output.rs (1 hunks)
  • generator/tests/expected.rs (1 hunks)
  • generator/tests/syntax.pest (1 hunks)
  • pest/src/typed/template/mod.rs (1 hunks)
  • pest/src/typed/template/repetition.rs (1 hunks)
  • pest/src/typed/tracker.rs (1 hunks)
  • pest/src/typed/traits.rs (1 hunks)
  • pest/tests/predefined_node.rs (1 hunks)
Files not summarized due to errors (1)
  • generator/tests/expected.rs: Error: Message exceeds token limit
Files skipped from review as they are similar to previous changes (10)
  • derive/tests/skip.rs
  • generator/src/common.rs
  • generator/src/typed/generator.rs
  • generator/src/typed/output.rs
  • generator/tests/syntax.pest
  • pest/src/typed/template/mod.rs
  • pest/src/typed/template/repetition.rs
  • pest/src/typed/tracker.rs
  • pest/src/typed/traits.rs
  • pest/tests/predefined_node.rs
Additional comments: 7
derive/tests/grammar.pest (3)
  • 12-50: The grammar rules defined from lines 12 to 50 appear to be well-structured and follow the pest syntax correctly. However, it's important to ensure that these rules align with the intended parsing logic and that they are effectively utilized in the typed parsing capabilities introduced in this PR. Additionally, consider adding comments to complex rules to improve maintainability and understandability for future contributors.
  • 73-75: The definitions for the ~ (optional trivia) and ^ (mandatory trivia) on lines 73 and 75 are concise and clear. These rules are essential for handling whitespace and special characters in the grammar. Ensure that their usage throughout the grammar is consistent and that they are applied appropriately in the context of typed parsing.
  • 76-90: The comments and multiline comments from lines 76 to 90 are correctly formatted and demonstrate the use of both line and block comments in pest grammar files. These examples are useful for documentation purposes and for guiding new contributors on how to comment their grammar rules effectively.
generator/tests/expected.rs (4)
  • 33-37: The implementation of the RuleType trait for Rule is correct and follows the expected pattern for typed parsing in pest. This setup allows for the integration of typed parsing capabilities with the existing pest infrastructure.
  • 38-108: The wrapper structs (W0 to W10) and their implementations are correctly defined, providing a typed interface for string literals used in the grammar. Each wrapper struct is documented with the string it represents, which is a good practice for clarity and maintainability.
  • 111-1030: The typed node structures for each rule (r#Regular, r#Atomic, etc.) are implemented correctly, with each struct providing a try_parse_with_partial method for parsing according to the rule's grammar. The use of generics (RepOnce, Sequence3, etc.) in these implementations is appropriate and leverages Rust's type system for compile-time correctness checks. Ensure that each rule's grammar is accurately represented in the implementation and that the documentation comments accurately describe the rule and its grammar.
  • 1032-1038: The generics module correctly re-exports the necessary types from the pest crate, facilitating their use in the typed parsing implementations. This approach keeps the code DRY and makes it easier to maintain and update.

Comment on lines +51 to +71
/*
ascii_digits = ASCII_DIGIT+
ascii_nonzero_digits = ASCII_NONZERO_DIGIT+
ascii_bin_digits = ASCII_BIN_DIGIT+
ascii_oct_digits = ASCII_OCT_DIGIT+
ascii_hex_digits = ASCII_HEX_DIGIT+
ascii_alpha_lowers = ASCII_ALPHA_LOWER+
ascii_alpha_uppers = ASCII_ALPHA_UPPER+
ascii_alphas = ASCII_ALPHA+
ascii_alphanumerics = ASCII_ALPHANUMERIC+
asciis = ASCII+
newline = NEWLINE+
unicode = XID_START - XID_CONTINUE*
SYMBOL = "shadows builtin"

han = HAN+
hangul = HANGUL+
hiragana = HIRAGANA+
arabic = ARABIC+
emoji = EMOJI+
*/

The section from lines 51 to 71 is commented out and includes definitions for various ASCII and Unicode character classes, as well as some specific language characters and emoji. If these rules are not intended for use in the current version of the grammar, it's good practice to remove them to avoid confusion. However, if they are planned for future use, consider adding a comment explaining their purpose and the conditions under which they should be uncommented and utilized.

Comment on lines +91 to +94
Invalid segment of grammar below (repeated rule)

WHITESPACE = _"hi"
*/

The segment from lines 91 to 94 is commented out and mentions an "Invalid segment of grammar below (repeated rule)" with a rule definition for WHITESPACE. If this rule is indeed invalid or not required, it's best to remove this segment to prevent confusion. If it's kept for educational or documentation purposes, consider enhancing the comment to clarify the context and reason for its inclusion.

Comment on lines +1 to +32
#[allow(dead_code, missing_docs, non_camel_case_types, clippy::upper_case_acronyms)]
#[derive(Clone, Copy, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]
pub enum Rule {
EOI,
r#Regular,
r#Atomic,
r#NonAtomic,
r#ExactString,
r#CharRange,
r#Any,
r#Seq,
r#Choice,
r#Rep,
r#RepAtLeastOnce,
r#Opt,
r#RepExact,
r#RepLeft,
r#RepRight,
r#RepLeftRight,
r#Pos,
r#Neg,
r#Push,
r#Pop,
r#PopAll,
r#Peek,
r#PeekUnlimited,
r#PeekLeft,
r#PeekRight,
r#PeekLeftRight,
r#Drop,
r#PeekAll,
}

The Rule enum is well-defined with a comprehensive set of rules. However, ensure that all rules are documented, especially the newly introduced ones, to maintain consistency with existing documentation practices and improve code readability.

@TheVeryDarkness
Contributor

(just note to myself -- I made a mistake of writing ~ = (" " | "\t" | "\r" | "\n")* which wouldn't terminate)

Sorry, it was caused by my incorrect implementation, and it should have been fixed in fcf8f7d.

@tomtau
Contributor Author

tomtau commented Feb 20, 2024


Thanks for the comments, @dragostis! As for the API, I think that's what @TheVeryDarkness mentioned in "By the way, my favorite ways (accesser api and structural api) haven't been fully implemented yet", so we can try to rewrite the tests/examples once they are there.
I'll merge this PR -- @TheVeryDarkness great work! Feel free to open follow-up PRs with those TODO changes you mentioned in #1 (comment)

@tomtau tomtau merged commit 7d8add0 into pest-parser:master Feb 20, 2024
TheVeryDarkness added a commit to TheVeryDarkness/pest3 that referenced this pull request Feb 20, 2024
tomtau pushed a commit to TheVeryDarkness/pest3 that referenced this pull request Feb 21, 2024
tomtau pushed a commit that referenced this pull request Mar 12, 2024