
prototype using pest-typed #1

Merged · 60 commits · Feb 20, 2024

Conversation

@tomtau (Contributor) commented Feb 7, 2024

see the discussion on pest-parser/ast#29

Summary by CodeRabbit

  • New Features
    • Introduced a macro for deriving typed parsers.
    • Added functionality for parsing sequences with increasing length constraints.
    • Implemented parsers for skipping specific patterns, including comments.
    • Introduced parsing rules and constructs for string parsing and more.
    • Added functionality for generating typed parsers based on grammar rules.
    • Introduced modules and functionality for managing cursor positions, handling spans, and visualizing whitespace in inputs.
    • Added traits for defining parser rules and typed nodes, enhancing error handling during parsing.
  • Documentation
    • Added documentation comments for parsing rules.
  • Bug Fixes
    • Fixed issue with trailing comma in macro definition.
  • Chores
    • Updated .gitignore to ignore .vscode directory and generated.rs in tests.


coderabbitai bot commented Feb 7, 2024

Walkthrough

This update introduces a comprehensive enhancement to a Rust-based parser generation library, focusing on the integration of a typed parser derivation system using pest_generator and pest. It improves grammar parsing, error handling, and documentation, and introduces new testing capabilities. The changes streamline the process of generating typed parsers from grammar specifications, enriching the library's functionality with more robust and user-friendly parsing solutions.

Changes

  • .gitignore, generator/tests/.gitignore: added .vscode and generated.rs to the ignore list.
  • derive/src/lib.rs: added a macro for deriving typed parsers using pest_generator.
  • derive/tests/sequence.rs, derive/tests/skip.rs, pest/tests/predefined_node.rs: introduced parsers for sequences, skipping patterns, and predefined nodes, with tests.
  • derive/tests/grammar.pest, derive/tests/grammar_typed.rs, generator/tests/syntax.pest, pest/tests/syntax.used.rules.txt: specified parsing rules and constructs; introduced functionality for parsing based on defined grammars.
  • generator/src/common.rs, generator/src/typed/..., pest/src/...: enhanced functionality for generating typed parsers, handling tokens, sequences, choices, and more.
  • meta/src/doc.rs, meta/src/error.rs, meta/src/parser.rs: improved documentation, error handling, and parsing rule management.

🐇✨
In the realm of code, where parsers play,
A rabbit hopped, crafting rules to convey.
With typed parsers derived, and grammars designed,
It leaped through lines, leaving bugs behind.
"To parse," it cheered, "is to understand,
Each token and rule, meticulously planned."
🌟📚🐾


Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository from git and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json


@coderabbitai bot left a comment
Review Status

Actionable comments generated: 62

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 7e1873f and dd145db.
Files ignored due to path filters (5)
  • Cargo.toml is excluded by: !**/*.toml
  • derive/Cargo.toml is excluded by: !**/*.toml
  • generator/Cargo.toml is excluded by: !**/*.toml
  • meta/Cargo.toml is excluded by: !**/*.toml
  • pest/Cargo.toml is excluded by: !**/*.toml
Files selected for processing (33)
  • .gitignore (1 hunks)
  • derive/src/lib.rs (1 hunks)
  • derive/tests/sequence.rs (1 hunks)
  • generator/src/common.rs (1 hunks)
  • generator/src/config.rs (1 hunks)
  • generator/src/lib.rs (1 hunks)
  • generator/src/state_builder.rs (2 hunks)
  • generator/src/typed/attr.rs (1 hunks)
  • generator/src/typed/config.rs (1 hunks)
  • generator/src/typed/generator.rs (1 hunks)
  • generator/src/typed/mod.rs (1 hunks)
  • generator/src/typed/output.rs (1 hunks)
  • generator/src/types.rs (1 hunks)
  • generator/tests/.gitignore (1 hunks)
  • generator/tests/generated.rs (1 hunks)
  • generator/tests/generated.txt (1 hunks)
  • generator/tests/syntax.pest (1 hunks)
  • meta/src/doc.rs (1 hunks)
  • meta/src/error.rs (1 hunks)
  • meta/src/lib.rs (1 hunks)
  • meta/src/parser.rs (10 hunks)
  • pest/src/choice.rs (1 hunks)
  • pest/src/formatter.rs (1 hunks)
  • pest/src/lib.rs (1 hunks)
  • pest/src/position.rs (1 hunks)
  • pest/src/sequence.rs (1 hunks)
  • pest/src/span.rs (1 hunks)
  • pest/src/typed/mod.rs (1 hunks)
  • pest/src/typed/template/mod.rs (1 hunks)
  • pest/src/typed/template/repetition.rs (1 hunks)
  • pest/src/typed/tracker.rs (1 hunks)
  • pest/src/typed/traits.rs (1 hunks)
  • pest/src/typed/wrapper.rs (1 hunks)
Files skipped from review due to trivial changes (2)
  • .gitignore
  • generator/tests/.gitignore
Additional comments: 66
meta/src/lib.rs (1)
  • 1-2: Publicly exposing the doc and error modules aligns with the PR's objective to enhance documentation and error handling capabilities.
pest/src/typed/mod.rs (1)
  • 1-7: The introduction of template, tracker, traits, and wrapper modules, along with their public exposure, is consistent with the PR's goal to support typed parser generation.
generator/src/types.rs (1)
  • 4-9: The functions _str and option_type provide utility for generating token streams for common Rust types, which is a useful addition for typed parser generation.
generator/src/typed/mod.rs (1)
  • 1-13: The structure of the typed module with attr, config, generator, and output submodules, and the public exposure of derive_typed_parser function, aligns with the PR's objective to facilitate typed parser generation.
pest/src/lib.rs (1)
  • 1-14: Adding modules formatter, position, sequence, span, typed, and integrating the alloc crate, along with exposing entities from pest2, enhances the pest library's functionality in line with the PR's objectives.
meta/src/doc.rs (1)
  • 1-10: The DocComment struct with fields for grammar documentation and rule-specific documentation supports the PR's goal of improving documentation capabilities.
derive/src/lib.rs (1)
  • 3-15: The derive_typed_parser macro with attributes for parser configuration is a key addition for generating typed parsers, aligning with the PR's objectives.
generator/src/typed/config.rs (1)
  • 1-15: The Config struct with flags for parser generation options is a necessary addition for supporting configurable typed parser generation.
pest/src/typed/wrapper.rs (1)
  • 1-20: The introduction of traits String, Rule, and Bound for typed parser generation is consistent with the PR's goal of enhancing the pest ecosystem's capabilities.
generator/tests/syntax.pest (1)
  • 1-40: The syntax file for testing demonstrates the use of various grammar constructs, which is essential for validating the functionality of the typed parser generation.
generator/src/lib.rs (1)
  • 1-6: Adding Rust compatibility and idiom warnings, along with the new typed module, improves code quality and supports the typed parser generation feature.
generator/tests/generated.rs (1)
  • 1-41: The test for validating generated rules against expected outputs, including formatting the generated code, is crucial for ensuring the correctness of the typed parser generation feature.
meta/src/error.rs (1)
  • 1-31: The rename_meta_rule function for improving error messages by renaming verbose rules aligns with the PR's objective to enhance error handling.
generator/src/typed/attr.rs (1)
  • 1-32: Parsing derive attributes to configure the typed parser generation process is a necessary addition for achieving the PR's objectives.
derive/tests/sequence.rs (1)
  • 1-67: The tests for sequence parsing validate the functionality of the typed parser generation, ensuring it meets the PR's objectives.
generator/src/common.rs (1)
  • 1-68: The common backend code for generating include_str! tokens and rule enums supports the PR's goal of simplifying and enhancing the parser generation process.
pest/src/typed/traits.rs (1)
  • 1-76: Introducing traits RuleType and TypedNode for typed parser generation is consistent with the PR's objectives of enhancing type safety and usability.
pest/src/choice.rs (1)
  • 1-3: Adding a trailing comma in the choice_type macro definition improves the macro's usability by allowing optional trailing commas in its invocation.
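The `$(,)?` fragment is the usual way to allow an optional trailing comma in a `macro_rules!` definition. A minimal illustration of the idiom (an invented `sum!` macro, not the actual `choice_type` definition):

```rust
// Illustrative macro: `$(,)?` accepts an optional trailing comma,
// so both `sum!(1, 2, 3)` and `sum!(1, 2, 3,)` expand cleanly.
macro_rules! sum {
    ($($x:expr),+ $(,)?) => { 0 $(+ $x)+ };
}
```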
generator/src/config.rs (1)
  • 1-91: The configuration utilities for handling grammar sources and attributes support the PR's goal of making parser generation more configurable and user-friendly.
pest/src/typed/template/repetition.rs (1)
  • 1-125: The repetition template structures and their implementations for typed nodes align with the PR's objectives of enhancing the pest ecosystem's capabilities for typed parser generation.
pest/src/sequence.rs (1)
  • 1-3: The sequence_type macro definition with support for optional trailing commas improves usability by allowing more flexible macro invocations.
generator/src/state_builder.rs (4)
  • 20-20: The clone() call in GeneratedExpr::Inline's to_tokens method has been removed. Confirm that this change does not affect the correctness of token stream generation.
  • 229-229: The gen method's signature has been changed to take self instead of &mut self. Ensure this change aligns with the intended usage patterns and does not introduce any issues with state management or mutability.
  • 17-27 (📝 note: outside the diff hunks; original lines 65-67): The method clear only clears the attempts map. Verify that this is the intended behavior and that no other state should be reset in this context.
  • 17-27 (📝 note: outside the diff hunks; original lines 79-85): The method during temporarily changes the positive state. Confirm that this approach is thread-safe and does not introduce any unintended side effects when used in concurrent contexts.

pest/src/typed/tracker.rs (1)
  • 79-85: The method during temporarily changes the positive state. As previously mentioned, confirm that this approach is safe and does not introduce side effects.
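The save-and-restore pattern the comment refers to can be sketched as follows; the names (`Tracker`, `positive`) follow the review, but the body is a simplified model, not the actual implementation. Because the method takes `&mut self`, the tracker is not shared between threads unless explicitly wrapped in shared state; the more realistic hazard is that a panic inside the closure skips the restore.

```rust
// Simplified model of a `during`-style method: flip a flag for the
// duration of a closure, then restore the saved value. Note that the
// flag is NOT restored if `f` panics; a Drop-based guard would fix that.
struct Tracker {
    positive: bool,
}

impl Tracker {
    fn during<T>(&mut self, positive: bool, f: impl FnOnce(&mut Self) -> T) -> T {
        let saved = self.positive;
        self.positive = positive;
        let out = f(self);
        self.positive = saved;
        out
    }
}
```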
pest/src/typed/template/mod.rs (20)
  • 21-47: The implementation of Str<T> and its associated methods are correct and follow Rust's idiomatic practices. The use of PhantomData to associate generic type T without storing it is appropriate.
  • 50-88: The Insens<'i, T> struct and its implementation correctly provide case-insensitive string matching functionality. The use of PhantomData and the implementation of TypedNode for case-insensitive matching are correctly done.
  • 91-111: The SkipChar<'i, const N: usize> struct and its implementation for skipping a fixed number of characters are correct. The use of a constant generic for N is a good use of Rust's type system for compile-time parameterization.
  • 114-136: The CharRange<const MIN: char, const MAX: char> struct and its implementation for matching a character within a specified range are correct. The use of constant generics for MIN and MAX is appropriate, and the implementation correctly matches a single character within the range.
  • 139-161: The functions constrain_idx and constrain_idxs are utility functions for constraining indices within a certain range. Ensure these functions are covered by tests to validate their correctness across various edge cases.
  • 163-183: The stack_slice function correctly creates a slice of the stack based on start and end indices, handling out-of-bound scenarios appropriately. The use of Option for the end parameter and the early return for empty sequences are good practices.
  • 186-203: The peek_spans function for matching a sequence of spans without consuming input is correctly implemented. It iterates through spans and matches them against the input, returning early if any span does not match.
  • 206-248: The Positive<N> struct and its implementation for positive lookahead are correctly implemented. The use of Deref and DerefMut traits to provide easy access to the inner content is a good practice.
  • 252-284: The Negative<T> struct and its implementation for negative lookahead are correctly implemented. The use of PhantomData to type-safely associate generic type T without storing it is appropriate.
  • 288-309: The ANY struct for matching any character is correctly implemented. The use of match_char_by with a closure that always returns true is a simple and effective way to match any single character.
  • 312-322: The NONE struct for representing a no-op or empty match is correctly implemented. It correctly returns None for any input, indicating that it does not consume any input.
  • 326-340: The SOI (Start of Input) and EOI (End of Input) structs are correctly implemented to match the start and end of input, respectively. These are essential constructs for many parsing scenarios.
  • 364-400: The NEWLINE struct for matching newline characters and the predefined ASCII character and digit types (ASCII_DIGIT, ASCII_NONZERO_DIGIT, etc.) are correctly implemented. These utilities enhance the expressiveness and convenience of the parser.
  • 402-454: The PEEK and PEEK_ALL structs for peeking at the top or all spans in the stack without consuming input are correctly implemented. The use of reverse iteration for PEEK_ALL and the matching logic in PEEK are appropriate.
  • 457-498: The Skipped<T, Skip, const SKIP: usize> and DROP structs for handling skipped content and dropping the top of the stack are correctly implemented. The use of a constant generic for SKIP in Skipped and the straightforward implementation of DROP are good practices.
  • 500-529: The POP and POP_ALL structs for matching and popping the top or all spans from the stack are correctly implemented. The logic for matching spans against the input and the loop for popping all spans in POP_ALL are correct.
  • 557-591: The Push<T> struct for matching an expression and pushing it to the stack is correctly implemented. The use of Deref and DerefMut for easy access to the inner content and the logic for pushing the span to the stack are appropriate.
  • 593-625: The PeekSlice1 and PeekSlice2 structs for matching slices of the stack in top-to-bottom order are correctly implemented. The use of stack_slice and peek_spans functions within their implementations ensures correct behavior.
  • 627-666: The type aliases for ASCII character classes (ASCII_DIGIT, ASCII_NONZERO_DIGIT, ASCII_BIN_DIGIT, etc.) are correctly defined using CharRange and Choice constructs. These aliases provide convenient shorthand for common character classes.
  • 668-695: The match_char_by utility function for matching a character based on a predicate and the restore_on_none function for error handling are correctly implemented. These utilities enhance the flexibility and robustness of the parser.
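Const generics are what make types like `CharRange<const MIN: char, const MAX: char>` possible, since `char` is an allowed const-parameter type. A self-contained sketch of the idea (illustrative, not the crate's actual type):

```rust
// A character-range matcher parameterized at compile time.
// No PhantomData is needed: const parameters carry no runtime data.
struct CharRange<const MIN: char, const MAX: char>;

impl<const MIN: char, const MAX: char> CharRange<MIN, MAX> {
    fn matches(c: char) -> bool {
        (MIN..=MAX).contains(&c)
    }
}
```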
pest/src/position.rs (6)
  • 61-62: The method new correctly returns None for invalid positions, aligning with Rust's safety and error handling conventions.
  • 75-77: The method from_start is correctly implemented, ensuring safety by leveraging the fact that position 0 is always a valid UTF-8 border.
  • 265-282: The skip method is correctly implemented, ensuring that pos is only updated if the skip is possible. This maintains the integrity of the Position object.
  • 390-392: The match_char method is concise and correctly implemented using pattern matching.
  • 417-426: The match_string method is implemented correctly, ensuring atomicity in updating pos only if the match is successful.
  • 452-460: The match_range method is correctly implemented, ensuring that pos is updated only on a successful match within the specified range.
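The "atomicity" praised above is the pattern of advancing the cursor only after a full successful match, so a failed attempt leaves the position untouched. A free-function sketch of the idea (the real method lives on `Position` and also upholds the UTF-8 boundary invariant):

```rust
// Advance `pos` only if the literal matches; on failure, leave it unchanged.
// Assumes `pos` already sits on a char boundary (the Position invariant).
fn match_string(input: &str, pos: &mut usize, s: &str) -> bool {
    if input[*pos..].starts_with(s) {
        *pos += s.len();
        true
    } else {
        false
    }
}
```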
meta/src/parser.rs (14)
  • 20-20: Change grammar module visibility to pub(crate) limits its accessibility outside of this crate. Confirm this aligns with design intentions.
  • 31-31: Added doc module import. Ensure that the doc module is utilized effectively in the parser logic.
  • 28-36 (📝 note: mapped to the diff hunk with the greatest overlap; original lines 33-77): Span struct and its methods are correctly implemented. Ensure that the union method's logic correctly handles spans from different paths.
  • 75-81 (📝 note: mapped to the diff hunk with the greatest overlap; original lines 78-101): Range<T> struct and its from method are correctly implemented. Verify that the unimplemented branches for start_bound and end_bound are intentional and won't cause runtime issues.

  • 99-105 (📝 note: mapped to the diff hunk with the greatest overlap; original lines 102-110): ParseRule struct correctly captures parsing rules. Ensure that the silent field's usage aligns with the parsing logic requirements.

  • 111-116: ParseNode struct correctly encapsulates a node in the parse tree. Verify that the structure is adequately used throughout the parsing process.
  • 117-122: PathArgs enum variants are correctly defined. Ensure that all possible path argument types are covered.
  • 123-132: Trivia enum correctly introduces trivia handling. Ensure that the usage of Trivia aligns with the grammar and parsing requirements.
  • 134-147: ParseExpr enum modifications introduce Trivia handling for Seq and Separated variants. Ensure that these changes are correctly integrated into the parsing logic.
  • 161-240: Refactoring of parse function into _parse with an additional doc parameter and the introduction of parse_with_doc_comment function. Ensure that the new structure improves modularity and maintainability without introducing regressions.
  • 541-542: Introduction of Trivia handling in parse_postfix function. Verify that the Trivia enum is correctly used to represent optional and mandatory trivia in the grammar.
  • 608-613: Modification in parse_node function to handle Trivia for Seq expressions. Ensure that the introduction of Trivia handling aligns with the grammar specifications and parsing requirements.
  • 1572-1576: In the test parse_infixes, the use of Trivia::Mandatory and Trivia::Optional is correctly demonstrated. Verify that the test cases adequately cover the expected behavior of trivia handling in sequences.
  • 1624-1624: In the test parse_postfixes, the use of Trivia::Optional is correctly demonstrated. Ensure that the test case accurately reflects the intended parsing behavior for optional trivia.
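The Optional/Mandatory distinction discussed for trivia can be modeled as a small enum plus a helper that says how many trivia matches a separator position requires; this is an illustrative reduction of the idea, not the crate's actual API.

```rust
// Optional trivia may match zero whitespace/comment tokens between
// sequence elements; Mandatory trivia must match at least one.
#[derive(Debug, PartialEq)]
enum Trivia {
    Optional,
    Mandatory,
}

// Hypothetical helper: minimum number of trivia matches a separator
// position requires under each mode.
fn min_required(trivia: Option<&Trivia>) -> usize {
    match trivia {
        Some(Trivia::Mandatory) => 1,
        Some(Trivia::Optional) | None => 0,
    }
}
```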

Comment on lines +28 to +36
/// A cursor position in a `&str` which provides useful methods to manually parse that string.
#[derive(Clone, Copy)]
pub struct Position<'i> {
pub(crate) input: &'i str,
/// # Safety:
///
/// `input[pos..]` must be a valid codepoint boundary (should not panic when indexing thus).
pos: usize,
}

Ensure the safety comment on line 34 is accurate and consider providing more detail on what constitutes a valid codepoint boundary.
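Concretely, a "valid codepoint boundary" is what `str::is_char_boundary` checks: slicing a `&str` at any other byte offset panics. A small demonstration:

```rust
// Enumerate which byte offsets of a string are valid char boundaries.
// In "hé", 'h' is 1 byte and 'é' is 2, so offset 2 falls inside 'é'.
fn boundaries(s: &str) -> Vec<bool> {
    (0..=s.len()).map(|i| s.is_char_boundary(i)).collect()
}
```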

Comment on lines +44 to +45
pub(crate) unsafe fn new_unchecked(input: &str, pos: usize) -> Position<'_> {
debug_assert!(input.get(pos..).is_some());

Use of debug_assert! for validating pos is a good practice in debug mode, but consider adding a fallback error handling for release builds to prevent potential runtime panics.

Comment on lines +114 to +123
pub fn span(&self, other: &Position<'i>) -> span::Span<'i> {
if ptr::eq(self.input, other.input)
/* && self.input.get(self.pos..other.pos).is_some() */
{
// This is safe because the pos field of a Position should always be a valid str index.
unsafe { span::Span::new_unchecked(self.input, self.pos, other.pos) }
} else {
// TODO: maybe a panic if self.pos < other.pos
panic!("span created from positions from different inputs")
}

The span method's use of ptr::eq for input comparison is efficient, but the commented-out validation check on line 116 suggests incomplete error handling. Consider implementing or removing this check based on intended functionality.

Comment on lines +146 to +186
pub fn line_col(&self) -> (usize, usize) {
if self.pos > self.input.len() {
panic!("position out of bounds");
}
let mut pos = self.pos;
let slice = &self.input[..pos];
let mut chars = slice.chars().peekable();

let mut line_col = (1, 1);

while pos != 0 {
match chars.next() {
Some('\r') => {
if let Some(&'\n') = chars.peek() {
chars.next();

if pos == 1 {
pos -= 1;
} else {
pos -= 2;
}

line_col = (line_col.0 + 1, 1);
} else {
pos -= 1;
line_col = (line_col.0, line_col.1 + 1);
}
}
Some('\n') => {
pos -= 1;
line_col = (line_col.0 + 1, 1);
}
Some(c) => {
pos -= c.len_utf8();
line_col = (line_col.0, line_col.1 + 1);
}
None => unreachable!(),
}
}

line_col

The line_col method's implementation is correct but potentially inefficient due to its O(n) complexity. Consider documenting this explicitly or exploring optimization opportunities.
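One standard way to avoid the O(n) scan per lookup is to precompute line-start offsets once and binary-search them per query. A sketch under simplifying assumptions (only `\n` terminates lines, and `pos` is an in-bounds char boundary); the actual `Position` code also handles bare `\r`:

```rust
// O(n) precomputation, then O(log n) line lookup per query.
struct LineIndex {
    line_starts: Vec<usize>, // byte offset of each line's first byte
}

impl LineIndex {
    fn new(input: &str) -> Self {
        let mut line_starts = vec![0];
        for (i, b) in input.bytes().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        Self { line_starts }
    }

    /// 1-based (line, column); column counts chars, not bytes.
    fn line_col(&self, input: &str, pos: usize) -> (usize, usize) {
        let line = self.line_starts.partition_point(|&s| s <= pos) - 1;
        let col = input[self.line_starts[line]..pos].chars().count() + 1;
        (line + 1, col)
    }
}
```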

Comment on lines +206 to +211
pub fn line_of(&self) -> &'i str {
if self.pos > self.input.len() {
panic!("position out of bounds");
};
// Safe since start and end can only be valid UTF-8 borders.
&self.input[self.find_line_start()..self.find_line_end()]

The method line_of correctly calculates the line containing the position. Ensure that the panic condition on line 208 is desirable and consider handling this case more gracefully if applicable.

Comment on lines 302 to 344
fn collect_used_rule<'g>(
rule: &'g ParseRule,
rule_trivia: Option<&'g ParseRule>,
res: &mut BTreeSet<RuleRef<'g>>,
) {
let mut nodes = vec![&rule.node];
let expect_trivia = format!(
"Please define trivia with `~ = \"...\"`. It's used in rule `{}`.",
rule.name
);
while let Some(expr) = nodes.pop() {
match &expr.expr {
ParseExpr::Str(_) | ParseExpr::Insens(_) | ParseExpr::Range(_, _) => (),
ParseExpr::PosPred(node) | ParseExpr::NegPred(node) => nodes.push(node),
ParseExpr::Seq(lhs, rhs, trivia) => {
nodes.push(lhs);
nodes.push(rhs);
if let Some(Trivia::Mandatory) | Some(Trivia::Optional) = trivia {
let rule_trivia = rule_trivia.expect(&expect_trivia);
nodes.push(&rule_trivia.node);
}
}
ParseExpr::Choice(lhs, rhs) => {
nodes.push(lhs);
nodes.push(rhs);
}
ParseExpr::Opt(node)
| ParseExpr::Rep(node)
| ParseExpr::RepOnce(node)
| ParseExpr::RepRange(node, _) => nodes.push(node),
ParseExpr::Path(path, args) => {
res.insert(RuleRef::new(path, args));
}
ParseExpr::Separated(node, trivia) => {
nodes.push(node);
if let Trivia::Mandatory | Trivia::Optional = trivia {
let rule_trivia = rule_trivia.expect(&expect_trivia);
nodes.push(&rule_trivia.node);
}
}
}
}
}

The function collect_used_rule collects rules used within a given rule. Review for correctness in identifying all relevant rules and ensuring no rules are missed.

Comment on lines 346 to 373
fn collect_reachability<'g>(rules: &'g [ParseRule]) -> BTreeMap<RuleRef, BTreeSet<RuleRef>> {
let mut res = BTreeMap::new();
let rule_trivia = rules.iter().find(|rule| rule.name == "~");
let path = vec![];
for rule in rules {
let entry = res
.entry(RuleRef::from_current(rule.name.as_str(), path.clone()))
.or_default();
collect_used_rule(rule, rule_trivia, entry);
}
for _ in 0..rules.len() {
for rule in rules {
let rule_ref = RuleRef::from_current(rule.name.as_str(), path.clone());
if let Some(cur) = res.remove(&rule_ref) {
let mut new = cur.clone();
for referenced in cur {
if let Some(iter) = res.get(&referenced) {
new.extend(iter.iter().cloned());
}
}
if !new.contains(&rule_ref) {
res.insert(rule_ref, new);
}
}
}
}
res
}

The function collect_reachability calculates rule reachability. Ensure that the logic correctly identifies all reachable rules and does not introduce any inaccuracies.
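The fixed-point loop in `collect_reachability` computes a transitive closure of the rule-reference graph. A simplified, self-contained model using rule names in place of `RuleRef` (it omits the self-reference filtering the real code performs):

```rust
use std::collections::{BTreeMap, BTreeSet};

// Repeatedly fold each rule's direct references' own reference sets into it
// until nothing changes: a plain fixed-point transitive closure.
fn reachability(direct: &BTreeMap<&str, BTreeSet<&str>>) -> BTreeMap<String, BTreeSet<String>> {
    let mut res: BTreeMap<String, BTreeSet<String>> = direct
        .iter()
        .map(|(k, v)| (k.to_string(), v.iter().map(|s| s.to_string()).collect()))
        .collect();
    loop {
        let mut changed = false;
        let snapshot = res.clone();
        for set in res.values_mut() {
            let extra: BTreeSet<String> = set
                .iter()
                .filter_map(|r| snapshot.get(r))
                .flatten()
                .cloned()
                .collect();
            for e in extra {
                changed |= set.insert(e);
            }
        }
        if !changed {
            break;
        }
    }
    res
}
```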

Comment on lines 375 to 380
fn generate_typed_pair_from_rule(rules: &[ParseRule], config: Config) -> TokenStream {
let defined_rules: BTreeSet<&str> = rules.iter().map(|rule| rule.name.as_str()).collect();
let not_boxed = collect_reachability(rules).keys().cloned().collect();
let graph = generate_graph(rules, &defined_rules, &not_boxed, config);
graph.collect()
}

The function generate_typed_pair_from_rule initiates the generation process for a set of rules. Review the integration of different generation steps to ensure they form a coherent and correct output.

Comment on lines 382 to 407
/// Generate codes for Parser.
fn generate_typed(
name: Ident,
generics: &Generics,
paths: Vec<PathBuf>,
rules: Vec<ParseRule>,
doc_comment: &DocComment,
include_grammar: bool,
config: Config,
) -> TokenStream {
let include_fix = if include_grammar {
generate_include(&name, paths)
} else {
quote!()
};
let rule_enum = generate_rule_enum(&rules, doc_comment);
let definition = generate_typed_pair_from_rule(&rules, config);
let (impl_generics, ty_generics, where_clause) = generics.split_for_impl();

let res = quote! {
#include_fix
#rule_enum
#definition
};
res
}

The function generate_typed orchestrates the generation of typed parsers. Ensure that the overall structure and flow of generation are clear and result in correct parser code.

Comment on lines 409 to 421
pub fn derive_typed_parser(input: TokenStream, include_grammar: bool) -> TokenStream {
let ast: DeriveInput = syn::parse2(input).unwrap();
let (name, generics, contents, config) = parse_derive(ast);

let (data, paths) = collect_data(contents);

let (rules, doc) = match parser::parse_with_doc_comment(&data, &data) {
Ok(pairs) => pairs,
Err(error) => panic!("error parsing \n{}", error.renamed_rules(rename_meta_rule)),
};

generate_typed(name, &generics, paths, rules, &doc, include_grammar, config)
}

The function derive_typed_parser is the entry point for deriving typed parsers. Review for clarity and ensure it correctly sets up and executes the generation process.

@coderabbitai bot left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between dd145db and 0fd7f6e.
Files selected for processing (1)
  • generator/tests/generator.rs (1 hunks)
Additional comments: 1
generator/tests/generator.rs (1)
  • 4-8: Ensure the documentation comment accurately reflects the file path used in the rustfmt command.
Verification successful

The documentation comment accurately reflects the file path used in the rustfmt command as verified by the shell script output. The file path generator/tests/generated.txt matches the actual file path used in tests.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify if the file path in the documentation matches the actual file path used in tests.
fd 'generated.txt' --exec echo "File path found: {}"

Length of output: 101

Comment on lines +1 to +2
use pest_generator::typed::derive_typed_parser;
use quote::quote;

Consider grouping imports from the same crate together to improve readability.

- use pest_generator::typed::derive_typed_parser;
- use quote::quote;
+ use pest_generator::typed::{derive_typed_parser};
+ use quote::{quote};


Comment on lines 11 to 40
let path_generated = "tests/generated.rs";
let path_expected = "tests/expected.rs";
let actual = derive_typed_parser(
quote! {
#[grammar = "tests/syntax.pest"]
#[emit_rule_reference]
#[emit_tagged_node_reference]
#[no_warnings]
struct Parser;
},
false,
);
let actual = actual.to_string();
std::fs::write(path_generated, &actual).unwrap();
let output = std::process::Command::new("rustfmt")
.arg(path_generated)
.arg("--config")
.arg("use_small_heuristics=Max,max_width=1000")
.output()
.unwrap();
assert!(
output.status.success(),
"STDOUT:\n{}\nSTDERR:\n{}",
String::from_utf8(output.stdout).unwrap(),
String::from_utf8(output.stderr).unwrap(),
);

if std::fs::read(path_generated).unwrap() != std::fs::read(path_expected).unwrap() {
panic!("Generated codes have changed.")
}

  • Use expect instead of unwrap for better error messages when file operations or command executions fail.
  • Consider extracting the rustfmt configuration and command execution into a separate function for better modularity and reusability.
- std::fs::write(path_generated, &actual).unwrap();
+ std::fs::write(path_generated, &actual).expect("Failed to write generated code to file");
...
- std::process::Command::new("rustfmt")
+ format_code(path_generated); // Implement this function to encapsulate formatting logic
...
- String::from_utf8(output.stdout).unwrap(),
- String::from_utf8(output.stderr).unwrap(),
+ String::from_utf8(output.stdout).expect("Failed to convert stdout to UTF-8"),
+ String::from_utf8(output.stderr).expect("Failed to convert stderr to UTF-8"),
...
- std::fs::read(path_generated).unwrap() != std::fs::read(path_expected).unwrap()
+ std::fs::read(path_generated).expect("Failed to read generated file") != std::fs::read(path_expected).expect("Failed to read expected file")
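The extracted helper the review hints at could look roughly like this (a sketch only; `format_code` and `run_tool` are hypothetical names, not part of the PR):

```rust
use std::process::Command;

/// Hypothetical helper encapsulating the rustfmt invocation from the test,
/// so the formatting logic is reusable and failures carry context.
fn format_code(path: &str) -> Result<(), String> {
    run_tool(
        "rustfmt",
        &[path, "--config", "use_small_heuristics=Max,max_width=1000"],
    )
}

/// Generic runner: spawns `bin` with `args` and reports captured output on failure.
fn run_tool(bin: &str, args: &[&str]) -> Result<(), String> {
    let output = Command::new(bin)
        .args(args)
        .output()
        .map_err(|e| format!("failed to spawn {bin}: {e}"))?;
    if output.status.success() {
        Ok(())
    } else {
        Err(format!(
            "STDOUT:\n{}\nSTDERR:\n{}",
            String::from_utf8_lossy(&output.stdout),
            String::from_utf8_lossy(&output.stderr)
        ))
    }
}
```

The test body would then shrink to `format_code(path_generated).expect("rustfmt failed")`.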



@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 3

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 326f6d4 and 2f2d341.
Files selected for processing (12)
  • derive/examples/reader.rs (1 hunks)
  • derive/src/lib.rs (1 hunks)
  • derive/tests/skip.rs (1 hunks)
  • generator/src/lib.rs (3 hunks)
  • generator/tests/expected.rs (1 hunks)
  • meta/src/lib.rs (1 hunks)
  • pest/src/choice.rs (1 hunks)
  • pest/src/lib.rs (1 hunks)
  • pest/src/sequence.rs (1 hunks)
  • pest/src/token/mod.rs (1 hunks)
  • pest/src/typed/template/mod.rs (1 hunks)
  • pest/src/typed/template/repetition.rs (1 hunks)
Files not summarized due to errors (1)
  • generator/tests/expected.rs: Error: Message exceeds token limit
Files skipped from review as they are similar to previous changes (8)
  • derive/src/lib.rs
  • derive/tests/skip.rs
  • generator/src/lib.rs
  • meta/src/lib.rs
  • pest/src/lib.rs
  • pest/src/sequence.rs
  • pest/src/token/mod.rs
  • pest/src/typed/template/repetition.rs
Additional comments: 32
derive/examples/reader.rs (1)
  • 6-9: The grammar definition uses a combination of x and whitespace/comment rules. Ensure that the whitespace/comment rule (~) is correctly defined to handle nested comments if that's the intended behavior.
pest/src/choice.rs (3)
  • 4-5: The macro definition correctly allows for optional trailing commas in the macro invocation, aligning with Rust's syntax conventions for enums and structs.
  • 10-39: The implementation of TypedNode for the choice_type enum introduces a pattern matching mechanism for each variant. Ensure that the logic correctly restores the stack snapshot in case of a match failure to prevent side effects on the parsing state.
  • 40-53: The implementation of PairContainer for the choice_type enum correctly delegates the for_each_child_pair method to the variant's implementation. This ensures that child pairs are correctly iterated over, regardless of the choice made.
pest/src/typed/template/mod.rs (24)
  • 25-51: The Str struct and its implementation for TypedNode correctly match a given string case sensitively. Ensure that input.match_string(Self::CONTENT) correctly advances the input position upon a successful match.
  • 60-98: The Insens struct and its implementation for TypedNode correctly match a given string case insensitively. The use of input.match_insensitive(Self::CONTENT) and capturing the matched span is correctly implemented.
  • 110-130: The SkipChar struct and its implementation for TypedNode provide functionality for skipping a fixed number of characters. Ensure that input.skip(N) correctly updates the input position and that the span is accurately captured.
  • 142-165: The CharRange struct and its implementation for TypedNode correctly match a character within a specified range. Ensure that input.match_range(MIN..MAX) correctly advances the input position upon a successful match.
  • 178-191: The Char struct and its implementation for TypedNode correctly match an exact character. Ensure that input.match_char_by(|c| c == CHAR) correctly advances the input position upon a successful match.
  • 201-223: The utility functions constrain_idx and constrain_idxs are used for safely constraining indices within a valid range for slicing operations. Ensure that the logic correctly handles negative indices as offsets from the end of the sequence.
  • 225-245: The stack_slice function correctly creates a slice of the stack based on provided start and end indices. Ensure that the function correctly handles out-of-bound indices by invoking tracker.out_of_bound.
  • 248-265: The peek_spans function correctly matches a sequence of spans against the input without consuming it. Ensure that the function correctly advances the input position upon a successful match.
  • 268-311: The Positive struct and its implementation for TypedNode correctly represent a positive lookahead assertion. Ensure that the stack snapshot is correctly restored regardless of the match outcome.
  • 315-348: The Negative struct and its implementation for TypedNode correctly represent a negative lookahead assertion. Ensure that the stack snapshot is correctly restored regardless of the match outcome.
  • 357-383: The ANY struct and its implementation for TypedNode correctly match any single character. Ensure that input.match_char_by correctly advances the input position upon a successful match.
  • 387-398: The NONE struct and its implementation for TypedNode correctly represent a failure to match any input. This is consistent with the expected behavior for a node that always fails.
  • 402-417: The SOI struct and its implementation for TypedNode correctly match the start of input. Ensure that input.at_start() is correctly used to determine if the current position is at the start.
  • 421-437: The EOI struct and its implementation for TypedNode correctly match the end of input. Ensure that input.at_end() is correctly used to determine if the current position is at the end.
  • 453-477: The NEWLINE struct and its implementation for TypedNode correctly match newline characters with support for different newline types. Ensure that the logic correctly advances the input position upon a successful match.
  • 486-503: The PEEK_ALL struct and its implementation for TypedNode correctly peeks all spans in the stack in reverse order. Ensure that the logic correctly matches the sequence of spans against the input without consuming it.
  • 513-542: The PEEK struct and its implementation for TypedNode correctly peeks the top span in the stack. Ensure that the logic correctly matches the top span against the input without consuming it.
  • 552-570: The DROP struct and its implementation for TypedNode correctly represents an operation to drop the top of the stack. Ensure that the stack is correctly manipulated and that an empty stack is handled gracefully.
  • 575-603: The POP struct and its implementation for TypedNode correctly matches and pops the top span of the stack. Ensure that the logic correctly matches the top span against the input and updates the stack accordingly.
  • 613-634: The POP_ALL struct and its implementation for TypedNode correctly matches and pops all spans in the stack. Ensure that the logic correctly matches the sequence of spans against the input and clears the stack.
  • 644-665: The PUSH struct and its implementation for TypedNode correctly matches an expression and pushes it to the stack. Ensure that the stack is correctly updated with the span of the matched expression.
  • 693-707: The PeekSlice2 struct and its implementation for TypedNode correctly matches a slice of the stack based on provided start and end indices. Ensure that the logic correctly peeks the specified range of spans without consuming input.
  • 712-724: The PeekSlice1 struct and its implementation for TypedNode correctly matches a slice of the stack based on a provided start index. Ensure that the logic correctly peeks the specified range of spans without consuming input.
  • 799-828: The try_handle_trivia function correctly handles optional and mandatory trivia based on the TRIVIA constant. Ensure that the logic correctly parses trivia according to the specified mode and advances the input position accordingly.
generator/tests/expected.rs (4)
  • 37-39: The Trivia module is concise and correctly defines a type alias. No issues here.
  • 40-106: The wrapper module correctly defines wrapper structs for string literals used in parsing. Ensure that all new wrappers are necessary and correctly used in the parsing rules.
  • 109-1028: Each struct representing a rule (e.g., r#Regular, r#Atomic, etc.) correctly implements the necessary traits for typed parsing. Ensure that the grammar definitions in documentation comments accurately reflect the implemented parsing logic.
  • 1030-1037: The generics module correctly re-exports necessary constructs from the pest crate. Ensure that all re-exports are used in the typed parsing rules and that there are no unnecessary re-exports.

Comment on lines 33 to 36
impl ::pest::typed::RuleType for Rule {
const EOI: Self = Rule::EOI;
type Trivia<'i> = trivia::Trivia<'i>;
}

The implementation of RuleType for Rule is correct but consider adding documentation for public traits and their implementations for clarity.

Comment on lines +13 to +27
fn main() -> anyhow::Result<()> {
let mut stdin = stdin().lock();
let mut buf = String::new();
loop {
buf.clear();
stdin.read_line(&mut buf)?;
buf.pop();
if buf.is_empty() {
return Ok(());
}
match rules::main::try_parse(&buf) {
Ok(res) => println!("{}", res.as_pair_tree()),
Err(err) => println!("{err}"),
}
}

The main function reads lines from stdin and attempts to parse them until an empty line is encountered. Consider adding error handling for stdin.read_line(&mut buf) to gracefully handle potential I/O errors.

-        stdin.read_line(&mut buf)?;
+        if stdin.read_line(&mut buf).is_err() {
+            eprintln!("Error reading from stdin");
+            return Ok(());
+        }
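A different sketch (hypothetical, not part of the PR): making the loop generic over any `BufRead` lets I/O errors propagate with `?` and makes the reader unit-testable without a real stdin:

```rust
use std::io::BufRead;

// Collects non-empty lines until the first empty line or EOF; any I/O error
// propagates to the caller instead of being swallowed.
fn read_until_blank<R: BufRead>(input: R) -> std::io::Result<Vec<String>> {
    let mut lines = Vec::new();
    for line in input.lines() {
        let line = line?; // surfaces read errors
        if line.is_empty() {
            break;
        }
        // in the example, this is where rules::main::try_parse(&line) would run
        lines.push(line);
    }
    Ok(lines)
}
```

In `main`, `read_until_blank(stdin().lock())?` would then replace the manual buffer handling.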


@TheVeryDarkness
Contributor

@tomtau Hello! I've added a grammar test in grammar.pest. Would you like to check it, or ask someone else to review it?

And could you help me by adding more tests or examples?

@tomtau
Contributor Author

tomtau commented Feb 15, 2024

@TheVeryDarkness awesome work! I'll try to check it out this weekend. Maybe @dragostis has some comments? I'll also ask on Discord.

@tomtau
Contributor Author

tomtau commented Feb 17, 2024

@TheVeryDarkness I tried porting the json parser example; I cannot push to this PR or to your master, so I think I'll merge this PR and add it there later. FYI, this is the sample test I wrote (not sure if the typed API is used properly):

json.pest

//! A parser for JSON file.
//!
//! And this is a example for JSON parser.
json = pest::SOI ~ value ~ pest::EOI

/// Matches object, e.g.: `{ "foo": "bar" }`
/// Foobar
object = "{" ~ pair ~ ("," ~ pair)* ~ "}" | "{" ~ "}"
pair   = string ~ ":" ~ value

array = "[" ~ value ~ ("," ~ value)* ~ "]" | "[" ~ "]"


//////////////////////
/// Matches value, e.g.: `"foo"`, `42`, `true`, `null`, `[]`, `{}`.
//////////////////////
value = string | number | object | array | bool | null

string  = "\"" - inner - "\""
inner   = (!("\"" | "\\") - pest::any)* - (escape ~ inner)?
escape  = "\\" - ("\"" | "\\" | "/" | "b" | "f" | "n" | "r" | "t" | unicode)
// FIXME: should be under pest::*
ascii_hex_digit = ascii_digit | "A" | "B" | "C" | "D" | "E" | "F"
ascii_digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
ascii_nonzero_digit = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
unicode = "u" - ascii_hex_digit{4}

number = "-"? - int - ("." ~ ascii_digit+ ~ exp? | exp)?
int    = "0" | ascii_nonzero_digit - ascii_digit*
exp    = ("E" | "e") - ("+" | "-")? - ascii_digit+

bool = "true" | "false"

null = "null"

~ = (" " | "\t" | "\r" | "\n")

lib.rs

/// Grammar rules of a sample JSON parser
#[allow(missing_docs)]
pub mod json {
    use pest_derive::Parser;
 
    /// JSON parser.
    #[derive(Parser)]
    #[grammar = "grammars/json.pest"]
    pub struct JsonParser;

} 

#[cfg(test)]
mod tests {
    use pest::typed::{PairContainer, TypedParser};

    use crate::json;
    #[test]
    fn json_basic_test() {
        let sample1 = "{\"key\": \"value\"}";
        let s1: json::rules::json = json::JsonParser::try_parse(sample1).unwrap();
        let values = s1.vec_children_pairs();
        assert_eq!(&values[0].rule, &json::Rule::value);
        let obj = &values[0].children;
        assert_eq!(obj[0].rule, json::Rule::object);
        let pair = &obj[0].children;
        assert_eq!(pair[0].rule, json::Rule::pair);
        let key = &pair[0].children;
        assert_eq!(key[0].rule, json::Rule::string);
        assert_eq!(&sample1[key[0].start..key[0].end], "\"key\"");
        let value = &key[1].children;
        assert_eq!(value[0].rule, json::Rule::string);
        assert_eq!(&sample1[value[0].start..value[0].end], "\"value\"");
    }

}

@tomtau
Contributor Author

tomtau commented Feb 17, 2024

(just note to myself -- I made a mistake of writing ~ = (" " | "\t" | "\r" | "\n")* which wouldn't terminate)

@TheVeryDarkness
Contributor

TheVeryDarkness commented Feb 17, 2024


That's quite similar to pest2's pair API, which is available but not my favorite. And it's not so important how we write those tests.
By the way, my favorite ways (accesser api and structural api) haven't been fully implemented yet. There are still several functions or impls to be added.

@TheVeryDarkness
Contributor

TheVeryDarkness commented Feb 18, 2024

@tomtau I've invited you to be a collaborator of the fork repo.
https://github.com/TheVeryDarkness/pest3/invitations

@TheVeryDarkness
Contributor

TheVeryDarkness commented Feb 18, 2024

There may be several things to do.

  • Currently, the ~ rule and the ^ rule (I mean ~ = ... and ^ = ...) have the same effect, and only one of them can be defined. I don't fully understand these rules, and I haven't found detailed examples of them. If you are familiar with their difference, could you please explain it to me with grammar examples?
  • There is a prototype for module system, but use statements are not processed.
  • Generics rules are not implemented.
  • Typed api can be improved.

@TheVeryDarkness
Contributor

(just note to myself -- I made a mistake of writing ~ = (" " | "\t" | "\r" | "\n")* which wouldn't terminate)

Maybe show an error or warning when the trivia is used in some discouraged ways?

@tomtau
Contributor Author

tomtau commented Feb 18, 2024

I've invited you to be a collaborator of the fork repo.

Thanks! I may still merge this PR soon, so the follow up fixes are easier to review.

If you are familiar with their difference, could you please explain it with grammar examples for me?

I think the difference is mandatory vs optional trivia. In pest2, those operators would roughly be defined as rules like:

~ = { (COMMENT | WHITESPACE)* }
^ = { (COMMENT | WHITESPACE)+ }

For mandatory trivia (^), one example where it could be used is to solve this grammar issue, I think:
pest-parser/pest#967
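To illustrate the intended difference, here is a hypothetical pest3-style sketch (following the convention above that ~ inserts optional trivia and ^ mandatory trivia; the rule names are made up):

```pest
ident  = ('a'..'z')+
// `^` would require at least one trivia token between the keyword and the
// identifier: "return x" matches, "returnx" does not.
strict = "return" ^ ident
// `~` makes the trivia optional: both "return x" and "returnx" would match.
loose  = "return" ~ ident

~ = " " | "\t" | "\r" | "\n"
```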

Maybe show an error or warning when the trivia is used in some discouraged ways?

Yes, that's not a priority for the initial prototype, but later, we'd want some analysis checking the grammar for common errors such as non-terminating rules (like what pest2 does, but hopefully it can be done better with a simpler surface grammar).

@dragostis
Collaborator

Hi all,

Sorry for the late reply.

I want to start by saying that I'm humbled by the amount of work you put into the project and love the direction! ^_^

Here are the things off the top of my mind after a high-level review of the code:

  • I like the mandatory trivia idea
  • the pest-type work and the way it integrated with the grammar is a great way forward

What I think is potentially a bit of a missed opportunity for pest3 is still making use of the pairs API. Using a just-in-time pull-type parser that would simply be called in the methods generated by pest-typed would very likely give much better results. I'm also trying to keep in mind the case where someone might want to do extremely light processing of the parsed results, e.g. increment a bunch of counters depending on the rules; with an approach like this, that use case should IMO be easy to optimize to very close to ideal performance.


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 3

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 2f2d341 and 9831d1f.
Files selected for processing (12)
  • derive/tests/grammar.pest (1 hunks)
  • derive/tests/skip.rs (1 hunks)
  • generator/src/common.rs (1 hunks)
  • generator/src/typed/generator.rs (1 hunks)
  • generator/src/typed/output.rs (1 hunks)
  • generator/tests/expected.rs (1 hunks)
  • generator/tests/syntax.pest (1 hunks)
  • pest/src/typed/template/mod.rs (1 hunks)
  • pest/src/typed/template/repetition.rs (1 hunks)
  • pest/src/typed/tracker.rs (1 hunks)
  • pest/src/typed/traits.rs (1 hunks)
  • pest/tests/predefined_node.rs (1 hunks)
Files not summarized due to errors (1)
  • generator/tests/expected.rs: Error: Message exceeds token limit
Files skipped from review as they are similar to previous changes (10)
  • derive/tests/skip.rs
  • generator/src/common.rs
  • generator/src/typed/generator.rs
  • generator/src/typed/output.rs
  • generator/tests/syntax.pest
  • pest/src/typed/template/mod.rs
  • pest/src/typed/template/repetition.rs
  • pest/src/typed/tracker.rs
  • pest/src/typed/traits.rs
  • pest/tests/predefined_node.rs
Additional comments: 7
derive/tests/grammar.pest (3)
  • 12-50: The grammar rules defined from lines 12 to 50 appear to be well-structured and follow the pest syntax correctly. However, it's important to ensure that these rules align with the intended parsing logic and that they are effectively utilized in the typed parsing capabilities introduced in this PR. Additionally, consider adding comments to complex rules to improve maintainability and understandability for future contributors.
  • 73-75: The definitions for the ~ (optional trivia) and ^ (mandatory trivia) on lines 73 and 75 are concise and clear. These rules are essential for handling whitespace and special characters in the grammar. Ensure that their usage throughout the grammar is consistent and that they are applied appropriately in the context of typed parsing.
  • 76-90: The comments and multiline comments from lines 76 to 90 are correctly formatted and demonstrate the use of both line and block comments in pest grammar files. These examples are useful for documentation purposes and for guiding new contributors on how to comment their grammar rules effectively.
generator/tests/expected.rs (4)
  • 33-37: The implementation of the RuleType trait for Rule is correct and follows the expected pattern for typed parsing in pest. This setup allows for the integration of typed parsing capabilities with the existing pest infrastructure.
  • 38-108: The wrapper structs (W0 to W10) and their implementations are correctly defined, providing a typed interface for string literals used in the grammar. Each wrapper struct is documented with the string it represents, which is a good practice for clarity and maintainability.
  • 111-1030: The typed node structures for each rule (r#Regular, r#Atomic, etc.) are implemented correctly, with each struct providing a try_parse_with_partial method for parsing according to the rule's grammar. The use of generics (RepOnce, Sequence3, etc.) in these implementations is appropriate and leverages Rust's type system for compile-time correctness checks. Ensure that each rule's grammar is accurately represented in the implementation and that the documentation comments accurately describe the rule and its grammar.
  • 1032-1038: The generics module correctly re-exports the necessary types from the pest crate, facilitating their use in the typed parsing implementations. This approach keeps the code DRY and makes it easier to maintain and update.

Comment on lines +51 to +71
/*
ascii_digits = ASCII_DIGIT+
ascii_nonzero_digits = ASCII_NONZERO_DIGIT+
ascii_bin_digits = ASCII_BIN_DIGIT+
ascii_oct_digits = ASCII_OCT_DIGIT+
ascii_hex_digits = ASCII_HEX_DIGIT+
ascii_alpha_lowers = ASCII_ALPHA_LOWER+
ascii_alpha_uppers = ASCII_ALPHA_UPPER+
ascii_alphas = ASCII_ALPHA+
ascii_alphanumerics = ASCII_ALPHANUMERIC+
asciis = ASCII+
newline = NEWLINE+
unicode = XID_START - XID_CONTINUE*
SYMBOL = "shadows builtin"

han = HAN+
hangul = HANGUL+
hiragana = HIRAGANA+
arabic = ARABIC+
emoji = EMOJI+
*/

The section from lines 51 to 71 is commented out and includes definitions for various ASCII and Unicode character classes, as well as some specific language characters and emoji. If these rules are not intended for use in the current version of the grammar, it's good practice to remove them to avoid confusion. However, if they are planned for future use, consider adding a comment explaining their purpose and the conditions under which they should be uncommented and utilized.

Comment on lines +91 to +94
Invalid segment of grammar below (repeated rule)

WHITESPACE = _"hi"
*/

The segment from lines 91 to 94 is commented out and mentions an "Invalid segment of grammar below (repeated rule)" with a rule definition for WHITESPACE. If this rule is indeed invalid or not required, it's best to remove this segment to prevent confusion. If it's kept for educational or documentation purposes, consider enhancing the comment to clarify the context and reason for its inclusion.

Comment on lines +1 to +32
#[allow(dead_code, missing_docs, non_camel_case_types, clippy::upper_case_acronyms)]
#[derive(Clone, Copy, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]
pub enum Rule {
EOI,
r#Regular,
r#Atomic,
r#NonAtomic,
r#ExactString,
r#CharRange,
r#Any,
r#Seq,
r#Choice,
r#Rep,
r#RepAtLeastOnce,
r#Opt,
r#RepExact,
r#RepLeft,
r#RepRight,
r#RepLeftRight,
r#Pos,
r#Neg,
r#Push,
r#Pop,
r#PopAll,
r#Peek,
r#PeekUnlimited,
r#PeekLeft,
r#PeekRight,
r#PeekLeftRight,
r#Drop,
r#PeekAll,
}

The Rule enum is well-defined with a comprehensive set of rules. However, ensure that all rules are documented, especially the newly introduced ones, to maintain consistency with existing documentation practices and improve code readability.

@TheVeryDarkness
Contributor

(just note to myself -- I made a mistake of writing ~ = (" " | "\t" | "\r" | "\n")* which wouldn't terminate)

Sorry, it was caused by my incorrect implementation, and it should have been fixed in fcf8f7d.

@tomtau
Contributor Author

tomtau commented Feb 20, 2024


Thanks for the comments, @dragostis! As for the API, I think that's what @TheVeryDarkness mentioned in "By the way, my favorite ways (accesser api and structural api) haven't been fully implemented yet", so we can try to rewrite the tests/examples once they are there.
I'll merge this PR -- @TheVeryDarkness great work! Feel free to open follow-up PRs with those TODO changes you mentioned in #1 (comment)

@tomtau tomtau merged commit 7d8add0 into pest-parser:master Feb 20, 2024
TheVeryDarkness added a commit to TheVeryDarkness/pest3 that referenced this pull request Feb 20, 2024
tomtau pushed a commit to TheVeryDarkness/pest3 that referenced this pull request Feb 21, 2024
tomtau pushed a commit that referenced this pull request Mar 12, 2024