- 
                Notifications
    You must be signed in to change notification settings 
- Fork 10
Adds Pest Grammar to Ion Conversion #38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The purpose of this commit is the start of a tool to generate Ion formatted ASTs from Pest grammars, particularly that of PartiQL. We could then easily process the grammar outside of Rust in scripts that could be used to generate things like TextMate grammars for IDE support or LaTeX fragments for the specification document. Adds the `pestion` crate to provide a simple trait `PestToElement` and implementations over the Pest AST and `&str` syntax definitions to create Ion `Element` serialized forms. Resolves partiql#35.
| Codecov Report
 @@            Coverage Diff             @@
##             main      #38      +/-   ##
==========================================
- Coverage   90.66%   89.42%   -1.24%     
==========================================
  Files           7       11       +4     
  Lines         482      700     +218     
==========================================
+ Hits          437      626     +189     
- Misses         45       74      +29     
 Continue to review full report at Codecov. 
 | 
        
          
                pestion/Cargo.toml
              
                Outdated
          
        
      | @@ -0,0 +1,29 @@ | |||
| [package] | |||
| name = "pestion" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless this is a pun/reference that I'm missing, I suggest naming this pest-ion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We chatted about this offline, but the idea here was just a portmanteau of Pest and Ion into something that is pronounced like "question". I was trying to avoid a hyphenated name, so let's get a second opinion from someone like @dlurton / @alancai98 / @abhikuhikar to see if they don't like the name and suggest any alternatives.
If we do like it, I will add something to the README. If we don't I can refactor the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slight preference to pest-ion so it doesn't get too closely associated with something David sent offline: https://www.pestions.com/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another possibility is pest2ion. It avoids the hyphen and any association with that pest control website.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the consensus is pest-ion so I will refactor it.  The problem with pest2ion is that it will look very jarring as a crate (e.g. use pest2ion).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to pest-ion.
        
          
                pestion/src/lib.rs
              
                Outdated
          
        
      | //! // parse a Pest grammar and convert it to Ion element | ||
| //! let element = r#"a = @{ "a" | "b" ~ "c" }"#.pest_to_element()?; | ||
| //! | ||
| //! // the grammar is a struct containing a field for each rule | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These examples are fine, but I think it would be more immediately illuminating to dump the element to pretty Ion text so folks could see how it's being modeled.
I'm not sure what the most idiomatic way to show that in a doc comment is. With a println! and a comment showing the expected output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I can replicate this by parsing an element from Ion text and comparing the two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 783de7f
        
          
                pestion/src/lib.rs
              
                Outdated
          
        
      | (text_token("type"), self.ty.pest_to_element()?), | ||
| (text_token("expression"), self.expr.pest_to_element()?), | ||
| ]; | ||
| Ok(Self::Element::new_struct(fields)) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Element::new_struct takes an iterator:
    /// Builds a `struct` from Struct using Builder.
    fn new_struct<K, V, I>(structure: I) -> Self::Element
    where
        K: Into<Self::SymbolToken>,
        V: Into<Self::Element>,
        I: IntoIterator<Item = (K, V)>;You could make fields into a slice to save yourself an allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not make it into a slice because I need ownership of the pair (I would get a &(_, _)) that we construct the struct with (without a clone), However, I am able to write it as:
        let fields = std::array::IntoIter::new([
            (text_token("type"), self.ty.pest_to_element()?),
            (text_token("expression"), self.expr.pest_to_element()?),
        ]);
        Ok(Self::Element::new_struct(fields))Note that the explicit IntoIter::new is needed because sadly, arrays don't implement IntoIter...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 783de7f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The explicit IntoIter::new is a bit unfortunate, but happily it looks like arrays will get a stable implementation of IntoIterator in the next Rust release (v1.53), which I believe happens in ~2 weeks.
        
          
                pestion/src/lib.rs
              
                Outdated
          
        
      | Expr::Str(text) => vec![ | ||
| text_token("string").into(), | ||
| text_token("exact").into(), | ||
| String(text).into(), | ||
| ], | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Element::new_sexp takes an iterator:
    /// Builds a `sexp` from Sequence using Builder.
    fn new_sexp<I: IntoIterator<Item = Self::Element>>(seq: I) -> Self::Element;so each of these vec![]s could be a slice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, this is challenging because the slice iterator will yield &OwnedElement which is not compatible to Iterator<Item = OwnedElement>, I could rewrite this using ArrayVec/SmallVec, but I don't know if that is worth it.  What do you think about that, is there something I am missing with the slice suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a factoring with SmallVec, could not do it with ArrayVec because of the limitations in that data type (specifically Copy requirements).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 783de7f
| { | ||
| a: { | ||
| type: non_atomic, | ||
| expression: (predicate positive (identifier "b")) | ||
| }, | ||
| b: { | ||
| type: non_atomic, | ||
| expression: (predicate negative (string exact "hi")) | ||
| } | ||
| }"# | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the intent is to generate BNF, it would make sense output that in the same order they were defined in the pest grammar, but structs are unordered... so... just wondering how you would control the ordering of the production rules in your BNF in that case.
I guess one could work around this by using the reader API of their favorite Ion library, but in the case of an element API the order may not be preserved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rules are not ordered and PEG (and BNF) rules are declarative.  I could've modeled this as a list of rules where the name was a field in the struct, but I hoisted them up to the top-level because I figured that made more logical sense.
In the above example, it shouldn't matter what the order of a and b, the meaning is the same.
| (sequence | ||
| (sequence | ||
| (string exact "a") | ||
| (string insensitive "b") | ||
| ) | ||
| (string exact "c") | ||
| ) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be flattened?
i.e. something like:
(sequence (string exact "a") (string insensitive "b") (string exact "c"))
It seems like that would be a more useful representation of this production rule. I'm guess not sure why the nesting is needed. The BNF would be flattened, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a TODO (see #37).
As per feedback from @zslayton. Renames `PestToElement` to `TryPestToElement` which captures the fallible conversion (e.g. from `str` slices) and makes `PestToElement` be an infallible version of the API. This is a more correct factoring, but also needed to avoid the intermediate copy for `Vec<AstNode>` conversion. Changes `AstRule` structure generation to use an array as it has a fixed number of fields. Adds `smallvec` to make it easier to model dynamically sized yet stack allocated fields for sub-structures for `Expr` conversion. Adjusts the doc test to be more illustrative.
Updates surrounding meta-data and README. Also renames the following types: * `PestionResult` => `PestToIonResult` * `PestionError` => `PestToIonError`
The purpose of this is the beginning of a tool to generate Ion ASTs for Pest grammars. This leverages the Pest AST in the
pest_metacrate to construct a reasonable Ion representation. The idea is that this could be bridged to a CLI and then piped into a script to generate things like TextMate grammars and the LaTeX fragments for our specification.Adds the
pestioncrate to provide a simple traitPestToElementandimplementations over the Pest AST and
&strsyntax definitions tocreate Ion
Elementserialized forms.Resolves #35.
Here is a rendering of the doc comment example:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.