Refactor KQL parser into common base and language extensions #1734

albertlockett · 2026-01-07T17:12:00Z

In #1722 we added an if/else expression to the KQL parser that may not be supported by all engines that execute some KQL query.

Seeing as this is a departure from standard KQL, and there may be other new expressions added in the future (such as route_ro), a need was identified to have a common base grammar with the ability to add/parse language extensions. This PR implements this capability.

The pest grammar is split into base.pest and kql.pest, and the KQL parser now uses both these grammar files using pest's "load multiple grammars" feature.

A second parser is added for a hypothetical, KQL-inspired query language that will support the if/else expression and other future extensions. This is added as a new crate under rust/otap-dataflow/crates/opl. It also uses base.pest, along with another grammar file called opl.pest where future language extensions will be added.

Making the parser functions generic:

There are many parser utility functions in the kql-parser crate that ideally all KQL-derived parsers could use. Inside these functions, there are many checks for which variant of the Rule enum (derived by pest_derive::Parser) is being handled. For example:

otel-arrow/rust/experimental/query_engine/kql-parser/src/scalar_expression.rs

Lines 353 to 361 in 5d48012

    Rule::conditional_unary_expressions => parse_conditional_unary_expressions(rule, scope)?,
    Rule::conversion_unary_expressions => parse_conversion_unary_expressions(rule, scope)?,
    Rule::string_unary_expressions => parse_string_unary_expressions(rule, scope)?,
    Rule::parse_unary_expressions => parse_parse_unary_expressions(rule, scope)?,
    Rule::array_unary_expressions => parse_array_unary_expressions(rule, scope)?,
    Rule::math_unary_expressions => parse_math_unary_expressions(rule, scope)?,
    Rule::temporal_unary_expressions => parse_temporal_unary_expressions(rule, scope)?,
    Rule::logical_unary_expressions => parse_logical_unary_expressions(rule, scope)?,
    Rule::extract_json_expression => {

One challenge in making these functions generic is that they need to operate on a concrete enum type, and pest_derive::Parser proc macro generates a different Rule enum for every parser.

The solution in this PR is to derive a base pest_derive::Parser (and hence, the associated Rule enum), for the base.pest grammar. This is done in kql-parser/src/base_parser.rs. The parser utility functions for each expression are then made generic over a type of pest Rule that can be converted into base_parser::Rule. A trait is provided called TryAsBaseRule that encapsulates the conversion logic, so in this PR we see many functions changed to take a generic R where pest::iterators::Pair<'_, R> implements TryAsBaseRule.
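The shape of this abstraction can be sketched without pest. In the sketch below, `BaseRule`, `KqlRule`, the simplified `Pair` struct, the method name `try_as_base_rule`, and the helper `describe_literal` are all assumptions made for illustration; the real trait is implemented for `pest::iterators::Pair<'_, R>` and its signature may differ.

```rust
/// Stand-in for the `Rule` enum derived from base.pest (hypothetical variants).
#[derive(Debug, Clone, Copy, PartialEq)]
enum BaseRule {
    IntegerLiteral,
    StringLiteral,
}

/// Stand-in for a derived parser's `Rule` enum: the base rules plus one extension.
#[derive(Debug, Clone, Copy, PartialEq)]
enum KqlRule {
    IntegerLiteral,
    StringLiteral,
    KqlOnlyExpression,
}

/// Simplified stand-in for `pest::iterators::Pair<'_, R>`.
struct Pair<'a, R> {
    rule: R,
    text: &'a str,
}

/// Assumed shape of the conversion trait described above.
trait TryAsBaseRule {
    fn try_as_base_rule(&self) -> Result<BaseRule, String>;
}

impl TryAsBaseRule for Pair<'_, KqlRule> {
    fn try_as_base_rule(&self) -> Result<BaseRule, String> {
        match self.rule {
            KqlRule::IntegerLiteral => Ok(BaseRule::IntegerLiteral),
            KqlRule::StringLiteral => Ok(BaseRule::StringLiteral),
            // Rules that exist only in the derived grammar cannot be converted.
            other => Err(format!("{other:?} has no base rule equivalent")),
        }
    }
}

/// A parser utility written once against `BaseRule`, usable by any derived
/// parser whose `Pair` implements `TryAsBaseRule`.
fn describe_literal<'a, R>(pair: &Pair<'a, R>) -> Result<String, String>
where
    Pair<'a, R>: TryAsBaseRule,
{
    Ok(match pair.try_as_base_rule()? {
        BaseRule::IntegerLiteral => format!("integer({})", pair.text),
        BaseRule::StringLiteral => format!("string({})", pair.text),
    })
}
```

Calling `describe_literal` with a `Pair` whose rule is `KqlRule::IntegerLiteral` takes the base-rule branch, while a pair carrying `KqlRule::KqlOnlyExpression` yields an error instead of reaching the shared match.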

Implementing the TryInto for some derived Rule into base_parser::Rule would be somewhat tedious to do by hand, because we'd need to write a match with a branch for every variant of the enum, and update these conversion functions each time we add a new rule to base.pest. To avoid this, a procedural macro is created to generate this conversion code. The macro lives in kql-parser/src/macros, and parsers simply need to use this proc macro to make their rules compatible with the generic parser functions:

#[derive(Parser, BaseRuleCompatible)]
#[grammar = "base.pest"]
#[grammar = "kql.pest"]
pub(crate) struct KqlPestParser;
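The output of the BaseRuleCompatible macro is not shown in this description. As a rough approximation of the kind of code such a derive could generate, the declarative macro below maps variants by name. The enum names, the variants, and `impl_base_rule_conversion!` are hypothetical, and unlike the proc macro this sketch needs the shared variants listed by hand.

```rust
use std::convert::TryFrom;

/// Stand-in for the `Rule` enum derived from base.pest (hypothetical variants).
#[derive(Debug, Clone, Copy, PartialEq)]
enum BaseRule {
    NullLiteral,
    StringLiteral,
}

/// Stand-in for a derived `Rule` enum that adds a language extension.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OplRule {
    NullLiteral,
    StringLiteral,
    IfElseExpression,
}

/// Generates `TryFrom<$derived> for $base`, mapping the listed variants by name.
macro_rules! impl_base_rule_conversion {
    ($derived:ident => $base:ident { $($variant:ident),* $(,)? }) => {
        impl TryFrom<$derived> for $base {
            type Error = $derived;

            fn try_from(rule: $derived) -> Result<Self, Self::Error> {
                match rule {
                    $($derived::$variant => Ok($base::$variant),)*
                    // Anything not listed exists only in the derived grammar,
                    // so it is handed back to the caller as the error value.
                    other => Err(other),
                }
            }
        }
    };
}

impl_base_rule_conversion!(OplRule => BaseRule { NullLiteral, StringLiteral });
```

With this in place, `BaseRule::try_from(OplRule::NullLiteral)` succeeds, and `BaseRule::try_from(OplRule::IfElseExpression)` returns the unconverted rule as the error.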

One additional challenge in making the parser functions generic is that scalar_expression::parse_scalar_expression uses a PrattParser, which takes the Pair as its argument. Pest doesn't provide any simple way to convert a Pair<'_, R> to Pair<'_, base_parser::Rule>, which means that the generic parse_scalar_expression function also needs a way to generically create a PrattParser that accepts Pair<'_, R>.

To handle that challenge, the BaseRuleCompatible procedural macro also creates a PrattParser for the derived Rule, and implements a trait for the Rule enum that can be used to access the PrattParser. The trait is called ScalarExprRules, and so this trait bound is also added to the generic parser function signatures.
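The idea of reaching a per-Rule operator table through a trait can be sketched as follows. This is not the generated code: `PrattTable` is a deliberately tiny stand-in for pest's `PrattParser`, and the method name `pratt_parser`, the `OplRule` variants, and the helper `binds_tighter` are assumptions for illustration.

```rust
use std::sync::OnceLock;

/// Minimal stand-in for pest's `PrattParser<R>`: it only records infix
/// operator rules in ascending precedence order.
struct PrattTable<R> {
    infix_ops: Vec<R>,
}

impl<R: PartialEq> PrattTable<R> {
    /// Returns the precedence level of `rule`, if it is a registered operator.
    fn precedence(&self, rule: &R) -> Option<usize> {
        self.infix_ops.iter().position(|op| op == rule)
    }
}

/// Assumed shape of the trait: each derived `Rule` enum exposes its own table.
trait ScalarExprRules: Sized + 'static {
    fn pratt_parser() -> &'static PrattTable<Self>;
}

/// Hypothetical derived `Rule` enum.
#[derive(Debug, PartialEq)]
enum OplRule {
    Plus,
    Multiply,
    IntegerLiteral,
}

impl ScalarExprRules for OplRule {
    fn pratt_parser() -> &'static PrattTable<Self> {
        // One lazily initialized table per `Rule` type; a derive macro could
        // emit an impl like this for every parser.
        static TABLE: OnceLock<PrattTable<OplRule>> = OnceLock::new();
        TABLE.get_or_init(|| PrattTable {
            infix_ops: vec![OplRule::Plus, OplRule::Multiply],
        })
    }
}

/// Generic code can now consult the table without naming a concrete `Rule` type.
fn binds_tighter<R: ScalarExprRules + PartialEq>(a: &R, b: &R) -> bool {
    let table = R::pratt_parser();
    table.precedence(a) > table.precedence(b)
}
```

The static lives inside each impl because Rust has no generic statics, which is one reason generating this per parser from a macro is convenient.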

codecov · 2026-01-07T17:19:10Z

Codecov Report

❌ Patch coverage is 95.58600% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.07%. Comparing base (32a6fbb) to head (f608b86).
⚠️ Report is 18 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1734      +/-   ##
==========================================
- Coverage   84.08%   84.07%   -0.02%     
==========================================
  Files         469      474       +5     
  Lines      137651   136635    -1016     
==========================================
- Hits       115746   114876     -870     
+ Misses      21371    21225     -146     
  Partials      534      534

Components	Coverage Δ
otap-dataflow	`85.32% <81.05%> (-0.03%)`	⬇️
query_abstraction	`80.61% <ø> (ø)`
query_engine	`90.42% <98.04%> (+0.03%)`	⬆️
syslog_cef_receivers	`∅ <ø> (∅)`
otel-arrow-go	`53.50% <ø> (ø)`


lquerel

LGTM.

But before merging, I would like to get feedback from @CodeBlanch and @drewrelmas given the scale of the impact on KQL.

drewrelmas

I've not had sufficient bandwidth to do an in-depth review of this change yet, but my initial impression is that this is the right direction! I would definitely prefer moving forward with this first over #1722.

I think this is some confirmation that the expression tree / IL approach we took initially will pay off and allow any future extension to other languages. I envision the eventual processor to allow selection of parser in input configuration (or multiple 'contrib' processors built around specific parsers that leverage the same engine internally) to meet different end-user requirements.

albertlockett · 2026-01-07T22:26:35Z

Converting to draft after feedback from Jan 7th SIG meeting - this needs some refactoring

albertlockett · 2026-01-08T14:40:33Z

On the Jan 7th SIG call, we came to the conclusion that sharing the Pest grammar was untenable and it would be best if an OPL parser had its own Pest grammar.

There was thinking at the time that maybe we could still share the parser code, which translates the Pest rules into our expression AST. On the surface, this seems desirable because there's a lot of code in the kql-parser crate that would need to be duplicated otherwise.

However as I dig into this more, I'm realizing that without a shared grammar, sharing this parser code is probably not the best approach, and OPL should probably just implement its own parser. In the paragraphs below I'll explain why.

In the shared parser code model, I imagine we'll make our parser utilities accept derived Rules from either the KQL or OPL parser (as was implemented in this PR). Effectively, this would mean that we implement TryInto<kql_parser::Rule> for opl_parser::Rule, which simply converts one Rule to the other for the enum variant with the same name. Our parser functions are generic over R: TryInto<kql_parser::Rule>.

What's not captured anywhere in this scheme is the hierarchical relationship between the rules and how the parser handles them, and this could result in subtle bugs.

For example, consider how kql-parser parses null literals. The rule hierarchy looks like scalar_expression -> scalar_unary_expression -> type_unary_expressions -> null_literal. Accordingly, when parsing we call scalar_expression::parse_scalar_unary_expression, which matches the rule to type_unary_expressions and then calls scalar_primitive_expressions::parse_type_unary_expressions, which in turn handles each typed scalar variant:

otel-arrow/rust/experimental/query_engine/kql-parser/src/scalar_expression.rs

Lines 69 to 75 in 9b0aa84

    pub(crate) fn parse_scalar_expression(
        scalar_expression_rule: Pair<Rule>,
        scope: &dyn ParserScope,
    ) -> Result<ScalarExpression, ParserError> {
        PRATT_PARSER
            .map_primary(|primary| match primary.as_rule() {
                Rule::scalar_unary_expression => parse_scalar_unary_expression(primary, scope),

otel-arrow/rust/experimental/query_engine/kql-parser/src/scalar_primitive_expressions.rs

Lines 13 to 31 in 9b0aa84

    pub(crate) fn parse_type_unary_expressions(
        type_unary_expressions_rule: Pair<Rule>,
    ) -> Result<StaticScalarExpression, ParserError> {
        let rule = type_unary_expressions_rule.into_inner().next().unwrap();
        Ok(match rule.as_rule() {
            Rule::null_literal => parse_standard_null_literal(rule),
            Rule::real_expression => parse_real_expression(rule)?,
            Rule::datetime_expression => parse_datetime_expression(rule)?,
            Rule::time_expression => parse_timespan_expression(rule)?,
            Rule::regex_expression => parse_regex_expression(rule)?,
            Rule::dynamic_expression => parse_dynamic_expression(rule)?,
            Rule::true_literal | Rule::false_literal => parse_standard_bool_literal(rule),
            Rule::double_literal => parse_standard_double_literal(rule, None)?,
            Rule::integer_literal => parse_standard_integer_literal(rule)?,
            Rule::string_literal => parse_string_literal(rule),
            _ => panic!("Unexpected rule in type_unary_expressions: {rule}"),
        })
    }

Let's say for some reason that KQL needs to reorganize its grammar, and null_literal becomes a child of scalar_unary_expression:

diff --git a/rust/experimental/query_engine/kql-parser/src/kql.pest b/rust/experimental/query_engine/kql-parser/src/kql.pest
index c2bc4ced..9d65cd1f 100644
--- a/rust/experimental/query_engine/kql-parser/src/kql.pest
+++ b/rust/experimental/query_engine/kql-parser/src/kql.pest
@@ -134,8 +134,7 @@ dynamic_map_expression = { "{" ~ (dynamic_map_item_expression ~ ("," ~ dynamic_m
 dynamic_inner_expression = _{ dynamic_array_expression|dynamic_map_expression|type_unary_expressions }
 dynamic_expression = { "dynamic" ~ "(" ~ dynamic_inner_expression ~ ")" }
 type_unary_expressions = {
-    null_literal
-    | real_expression
+    real_expression
     | datetime_expression
     | time_expression
     | regex_expression
@@ -237,7 +236,8 @@ backwards. For example if integer_literal is defined before time_expression "1h"
 would be parsed as integer_literal(1) and the remaining "h" would be fed into
 the next rule. */
 scalar_unary_expression = {
-    type_unary_expressions
+    null_literal
+    | type_unary_expressions
     | get_type_expression
     | conditional_unary_expressions
     | conversion_unary_expressions
diff --git a/rust/experimental/query_engine/kql-parser/src/scalar_expression.rs b/rust/experimental/query_engine/kql-parser/src/scalar_expression.rs
index 0f45a327..36d56325 100644
--- a/rust/experimental/query_engine/kql-parser/src/scalar_expression.rs
+++ b/rust/experimental/query_engine/kql-parser/src/scalar_expression.rs
@@ -296,6 +296,9 @@ pub(crate) fn parse_scalar_unary_expression(
     let rule = scalar_unary_expression_rule.into_inner().next().unwrap();
 
     Ok(match rule.as_rule() {
+        Rule::null_literal => {
+            ScalarExpression::Static(parse_standard_null_literal(rule))
+        }
         Rule::type_unary_expressions => {
             ScalarExpression::Static(parse_type_unary_expressions(rule)?)
         }
diff --git a/rust/experimental/query_engine/kql-parser/src/scalar_primitive_expressions.rs b/rust/experimental/query_engine/kql-parser/src/scalar_primitive_expressions.rs
index 58db3601..6a33990a 100644
--- a/rust/experimental/query_engine/kql-parser/src/scalar_primitive_expressions.rs
+++ b/rust/experimental/query_engine/kql-parser/src/scalar_primitive_expressions.rs
@@ -16,7 +16,6 @@ pub(crate) fn parse_type_unary_expressions(
     let rule = type_unary_expressions_rule.into_inner().next().unwrap();
 
     Ok(match rule.as_rule() {
-        Rule::null_literal => parse_standard_null_literal(rule),
         Rule::real_expression => parse_real_expression(rule)?,
         Rule::datetime_expression => parse_datetime_expression(rule)?,
         Rule::time_expression => parse_timespan_expression(rule)?,

This works, and all the kql-parser tests will pass. However, if OPL had been using the original organization of the grammar rules, parsing with the shared code would fail for OPL unless it also made the same adjustment to its own grammar (not only would it fail, it would panic at scalar_primitive_expressions.rs::28).
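The failure mode can be reproduced in miniature without pest. In the sketch below, each parse is modelled as the path of rules found under scalar_unary_expression; the enums, the helper `next_rule`, and the simplified function signatures are hypothetical, and the shared code returns an error where the real code would panic so that the outcome can be observed.

```rust
use std::convert::{TryFrom, TryInto};

/// Hypothetical subset of the KQL rules.
#[derive(Debug, Clone, Copy, PartialEq)]
enum KqlRule {
    NullLiteral,
    TypeUnaryExpressions,
    StringLiteral,
}

/// Hypothetical OPL rules with the same names as their KQL counterparts.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OplRule {
    NullLiteral,
    TypeUnaryExpressions,
    StringLiteral,
}

/// Name-by-name conversion: it knows nothing about how the rules nest.
impl TryFrom<OplRule> for KqlRule {
    type Error = String;

    fn try_from(rule: OplRule) -> Result<Self, Self::Error> {
        Ok(match rule {
            OplRule::NullLiteral => KqlRule::NullLiteral,
            OplRule::TypeUnaryExpressions => KqlRule::TypeUnaryExpressions,
            OplRule::StringLiteral => KqlRule::StringLiteral,
        })
    }
}

/// Pops the next rule off the path and converts it to a `KqlRule`.
fn next_rule<R: Copy + TryInto<KqlRule>>(path: &[R]) -> Result<(KqlRule, &[R]), String> {
    let (first, rest) = path
        .split_first()
        .ok_or_else(|| "empty rule path".to_string())?;
    let rule: KqlRule = (*first)
        .try_into()
        .map_err(|_| "rule has no KQL equivalent".to_string())?;
    Ok((rule, rest))
}

/// Shared code after the reorganization: null_literal is handled here only.
fn parse_scalar_unary_expression<R: Copy + TryInto<KqlRule>>(
    path: &[R],
) -> Result<&'static str, String> {
    let (rule, rest) = next_rule(path)?;
    match rule {
        KqlRule::NullLiteral => Ok("null"),
        KqlRule::TypeUnaryExpressions => parse_type_unary_expressions(rest),
        other => Err(format!("Unexpected rule in scalar_unary_expression: {other:?}")),
    }
}

/// Shared code after the reorganization: the null_literal branch is gone.
fn parse_type_unary_expressions<R: Copy + TryInto<KqlRule>>(
    path: &[R],
) -> Result<&'static str, String> {
    let (rule, _) = next_rule(path)?;
    match rule {
        KqlRule::StringLiteral => Ok("string"),
        // The real code panics in this branch.
        other => Err(format!("Unexpected rule in type_unary_expressions: {other:?}")),
    }
}
```

A KQL parse of `null` now produces the path `[NullLiteral]` and succeeds, while an OPL grammar that still nests null_literal under type_unary_expressions produces `[TypeUnaryExpressions, NullLiteral]` and falls into the unexpected-rule branch, even though every individual rule converted cleanly.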

This brings me to my first point, which is that without a shared grammar either:

a) the organization of the rules becomes an immutable contract between kql-parser and crates that share its parser code. We'd probably consider this untenable, as it places undue restrictions on kql-parser's ability to adapt its own grammar
b) OPL parser needs to have its own set of parser tests to catch these issues, which means that all the test cases from kql-parser get duplicated anyway even if we share the parser code.

The second difficulty I see in sharing this parser code is that OPL may wish to make modifications to expressions relatively deep within the expression tree. For example, OPL might wish to support an untyped null literal (whereas KQL requires parsing null as string(null)), or string interpolation.

When parsing strings, for example, we wind up in a call stack like:

...
scalar_expression::parse_scalar_expression
scalar_expression::parse_scalar_unary_expression
scalar_primitive_expressions::parse_type_unary_expressions
scalar_primitive_expressions::parse_string_literal

and at the bottom of this call stack, we need to call some custom OPL-specific string parsing code. To accommodate this, parse_string_literal could become generic over a Rule that implements some trait for string parsing. From the perspective of the kql-parser crate this adds complexity, especially as more trait methods are added for custom parsing behaviour. Note that the more custom parsing behaviour is introduced, the more complex it becomes to support and the more dubious the benefits of sharing the parser code in the first place. And by the point it stops being worth it, the complexity actually makes it harder to back out.
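To make the cost concrete, here is a minimal sketch of what such a hook could look like. Nothing in it comes from the PR: the trait `StringLiteralRules`, the `Segment` type, the marker types, and the `{name}` interpolation syntax are all assumptions chosen only to illustrate the shape of the extension point.

```rust
/// Hypothetical result of parsing a string literal's contents.
#[derive(Debug, PartialEq)]
enum Segment {
    Text(String),
    Placeholder(String),
}

/// Hypothetical hook trait implemented by each parser's `Rule` type.
trait StringLiteralRules {
    /// Default behaviour in this sketch: the whole string is plain text.
    fn parse_string_contents(raw: &str) -> Vec<Segment> {
        vec![Segment::Text(raw.to_string())]
    }
}

/// Marker types standing in for the two derived `Rule` enums.
struct KqlRule;
struct OplRule;

/// KQL keeps the default behaviour.
impl StringLiteralRules for KqlRule {}

/// OPL overrides the hook to recognize `{name}` placeholders (assumed syntax).
impl StringLiteralRules for OplRule {
    fn parse_string_contents(raw: &str) -> Vec<Segment> {
        let mut segments = Vec::new();
        let mut rest = raw;
        while let Some(open) = rest.find('{') {
            // An unclosed brace is left in place and treated as plain text.
            let Some(close) = rest[open..].find('}') else { break };
            if open > 0 {
                segments.push(Segment::Text(rest[..open].to_string()));
            }
            segments.push(Segment::Placeholder(rest[open + 1..open + close].to_string()));
            rest = &rest[open + close + 1..];
        }
        if !rest.is_empty() {
            segments.push(Segment::Text(rest.to_string()));
        }
        segments
    }
}

/// The shared function at the bottom of the call stack becomes generic over
/// the hook, as does every caller above it.
fn parse_string_literal<R: StringLiteralRules>(raw: &str) -> Vec<Segment> {
    R::parse_string_contents(raw)
}
```

Every shared function between parse_scalar_expression and parse_string_literal would have to carry the extra type parameter, and each further customization adds another method or trait bound along the same path.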

TL;DR - without a shared grammar, sharing the parser code would lead to a brittle parser implementation for OPL unless it implements its own test suite (which means half the code from kql-parser kind of gets duplicated anyway), and it would also introduce extra complexity into kql-parser to support custom behaviour for certain types. Given these drawbacks, and the benefits of (and desire for) being masters of our own destiny, I propose OPL just implement its own parser.

albertlockett added 9 commits January 7, 2026 11:37

stash if/else code to avoid conflict

97175df

started splitting parser to add base grammar

9a6f01b

split parser into generic base and kql overrides

0e03708

added proc macro to generate rule conversion

11c4b7e

Add the pratt parser derivation to the proc macro

acad07d

there's a working OPL parser that uses common code

57a9d9d

code cleanup and comments

1c597c8

clippy, format and tests for macro

ddcdd9a

fix docs test

d7b8740

albertlockett requested a review from a team as a code owner January 7, 2026 17:12

github-project-automation bot added this to OTel-Arrow Jan 7, 2026

github-actions bot added rust Pull requests that update Rust code query-engine Query Engine / Transform related tasks query-engine-kql KQL usage of Query Engine labels Jan 7, 2026

albertlockett mentioned this pull request Jan 7, 2026

Add if/else if/else expression to KQL parser #1722

Closed

format parser abstraction

f608b86

lquerel approved these changes Jan 7, 2026


drewrelmas reviewed Jan 7, 2026


albertlockett marked this pull request as draft January 7, 2026 22:26

albertlockett closed this Jan 8, 2026

github-project-automation bot moved this to Done in OTel-Arrow Jan 8, 2026

This was referenced Jan 8, 2026

OPL Parser #1743

Closed

KQL Parsing support different 'flavors' of language when parsing #1728

Closed
