Rewrite the parser without Jison #94

RubenVerborgh · 2019-11-06T22:39:47Z

It's simply too slow.

jacoscaz · 2022-02-27T09:53:07Z

Interesting benchmark: https://chevrotain.io/performance/

jacoscaz · 2024-10-24T16:26:55Z

@RubenVerborgh any reason to prefer parser generators that do not require a code generation step (e.g. Chevrotain), or would you be ok with code generation (e.g. ANTLR4)?

rubensworks · 2024-10-24T18:38:07Z

Aha, I wasn't aware of this issue.
As a coincidence, our new colleague @jitsedesmet just started investigating this a couple of days ago.

@RubenVerborgh Jitse will gather your thoughts about this soon.
@jacoscaz If you would like to be involved in this effort, do let us know :-)

RubenVerborgh · 2024-10-24T19:20:24Z

@jacoscaz No reason whatsoever; back in the day, Jison was just the quickest thing that helped me push this out.
I always wanted to hand-roll a parser (like with N3.js), so the Jison code was only meant to last one summer anyway.

That summer was 10 years ago last August 😂

jacoscaz · 2024-10-24T19:58:15Z

Hello @RubenVerborgh and @rubensworks (and @jitsedesmet)!

If someone's already looking at this from your side @rubensworks it's probably a bad idea for me to get involved as (I think?) you have the advantage of geographical proximity. I'm also much less competent in SPARQL; I should be considered a last resort :)

I came back to this as, coincidentally, I've recently worked on a completely unrelated project that also required a parser; taking care of this one too would not have been that great of a context switch. We went with ANTLR4 after looking at quite a few metrics, mainly performance but also long-term maintainability, and we're really happy with the outcome.

jitsedesmet · 2024-10-24T20:13:51Z

Hey @jacoscaz you already know my work though ;)

I started with Chevrotain, mainly for the performance, but I also like the other benefits such as expandability and fault tolerance, and those visualizations look nice too. That being said, I have not yet looked at ANTLR4 beyond the code example implemented by Chevrotain, so I will look closer tomorrow :D

My first impression of it was "oh but then you have new syntax".
(Being a few hours into writing Chevrotain code, I am starting to wonder whether I might want new syntax after all xD)

jacoscaz · 2024-10-24T20:31:50Z

you already know my work though ;)

That's precisely why I should be considered a last resort: your domain expertise is vastly greater than mine @jitsedesmet!

As for Chevrotain, we found it to be much faster than everything else on V8 (Node.js, Chrome and Chromium-based browsers). ANTLR4, however, performed much more consistently across different runtimes, even though it was roughly 1.5x slower in absolute terms than Chevrotain on V8. We also noticed that results were highly specific to the grammar we were using, with different combinations of grammar and runtime leading to significant performance differences.

If there's a subset of SPARQL that manages to be small enough so that developing the respective grammar doesn't take too long while still remaining representative of the entirety of SPARQL from a complexity perspective I would try to get that going in ANTLR4, Chevrotain and a few others and see how they fare.

EDIT: even now, running the Chevrotain benchmarks linked above in Safari on a MacBook Pro M1 has ANTLR4 almost twice as fast as Chevrotain whereas running in Chrome on the same machine sees Chevrotain hitting almost 4x the amount of ops/sec that ANTLR4 hits.
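A comparison like the one proposed above could start from a small, runtime-agnostic harness that is run unchanged in Node.js, Chrome and Safari. The following is only a minimal sketch under stated assumptions: `ParseFn`, `bench` and `countTokens` are hypothetical names, and the token counter is a trivial stand-in for a real parser, not the Chevrotain or ANTLR4 API.

```typescript
// Hypothetical harness: none of these names come from Chevrotain or ANTLR4.
type ParseFn = (input: string) => unknown;

interface BenchResult {
  name: string;
  opsPerSec: number;
}

// Trivial stand-in "parser" so the sketch runs on its own; a real comparison
// would wrap each candidate library's parse call behind ParseFn.
const countTokens: ParseFn = (input) => input.trim().split(/\s+/).length;

function bench(name: string, parse: ParseFn, input: string, durationMs = 200): BenchResult {
  // Warm-up so the JIT has a chance to optimise before measuring.
  for (let i = 0; i < 100; i++) parse(input);

  let ops = 0;
  let elapsed = 0;
  const start = Date.now();
  // Run for at least durationMs, so elapsed is never zero when dividing.
  while (elapsed < durationMs) {
    parse(input);
    ops++;
    elapsed = Date.now() - start;
  }
  return { name, opsPerSec: (ops * 1000) / elapsed };
}

const query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10";
const results = [bench("token-counter", countTokens, query)];
for (const r of results) {
  console.log(`${r.name}: ${Math.round(r.opsPerSec)} ops/sec`);
}
```

Because grammar and runtime interact, the same harness would need to be fed the same representative SPARQL subset for every candidate, and the numbers compared per runtime rather than averaged across them.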

RubenVerborgh · 2024-10-25T08:22:58Z

@jitsedesmet We do need to make a good cost/benefit analysis for this specific case (and our future plans with SPARQL) of using a parser generator versus hand-rolling a parser.

Both options come with different maintainability characteristics. I.e., some kinds of maintenance/evolution tasks are easier with one codebase versus another. So let's have a close look, because as you can see above, codebases tend to live longer than you'd want 😅

RubenVerborgh · 2024-10-25T08:24:42Z

Although Chevrotain seems to strike a great balance; there is always the option to conditionally (!) include certain rules, or auto-generate some, which is… powerful.

jacoscaz · 2024-10-25T08:45:53Z

we do need to make a good cost/benefit analysis for this specific case—and our future plans with SPARQL—of using parser generator versus hand-rolling a parser

Contrary to a lot of popular wisdom, I personally think that hand-rolling is (almost) always going to prove superior in the long term (assuming competent programmers, which in this case is a more than safe assumption). However, how long that long term might be is very context-specific.

Given the success of n3, which manages to deal with the complexities of supporting multiple similar formats, the fact that SPARQL doesn't change all that quickly and the fact that the current Jison implementation lasted more than most web frameworks do, hand-rolling should be considered first IMHO.

rubensworks · 2024-10-25T08:49:07Z

the fact that SPARQL doesn't change all that quickly

This assumption is likely to change after SPARQL 1.2 comes out, as the W3C WG plans to become a maintenance group, which could lead to more quickly evolving SPARQL (and RDF) spec versions.
(this is one of the main reasons why @jitsedesmet is investigating this)

RubenVerborgh · 2024-10-25T08:52:59Z

the fact that SPARQL doesn't change all that quickly

This assumption is likely to change after SPARQL 1.2 comes out,

And we're also writing this parser for research purposes; we want the liberty to quickly add keywords and constructs to test things and make proposals.

jitsedesmet · 2024-10-25T08:59:00Z

Correct, one of the main requirements of this project is expandability/modularity/modifications.
Example: Comunica has some new date implementations but cannot support the new adjust function because the parser does not support it.
It would be nice to be able to swap in different parser implementations. (and those implementations not to constantly be written from scratch)
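The requirement above, deriving a variant parser from an existing one instead of rewriting it, can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: `FunctionCallParser` and its `extend` method are hypothetical and are not the API of SPARQL.js, Comunica or any parser library, and the parser only handles flat function calls rather than real SPARQL expressions.

```typescript
// Hypothetical sketch: not the API of SPARQL.js, Comunica, or any parser library.
interface FunctionCall {
  name: string;
  args: string[];
}

class FunctionCallParser {
  private readonly known: string[];

  constructor(functionNames: string[]) {
    // Function names are matched case-insensitively.
    this.known = functionNames.map((n) => n.toUpperCase());
  }

  // Returns a new parser that also accepts the extra names; the base parser is untouched.
  extend(...functionNames: string[]): FunctionCallParser {
    return new FunctionCallParser(this.known.concat(functionNames));
  }

  // Parses flat calls such as `NOW()` or `ADJUST(?date, ?tz)`; nested calls are not supported.
  parse(input: string): FunctionCall {
    const match = /^\s*([A-Za-z_]\w*)\s*\(([^()]*)\)\s*$/.exec(input);
    if (!match) throw new Error(`Not a function call: ${input}`);
    const name = match[1].toUpperCase();
    if (this.known.indexOf(name) === -1) throw new Error(`Unknown function: ${name}`);
    const args = match[2]
      .split(",")
      .map((a) => a.trim())
      .filter((a) => a.length > 0);
    return { name, args };
  }
}

// The base parser rejects ADJUST; the derived one accepts it without a rewrite.
const base = new FunctionCallParser(["NOW", "STR"]);
const experimental = base.extend("ADJUST");
console.log(experimental.parse("ADJUST(?date, ?tz)"));
```

The point of the design is that `base` stays usable as-is while `experimental` adds one construct, which is the kind of swap described above, only at toy scale.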

At first I also thought "hand writing this cannot be that hard", but the fault tolerance, for example, already looks hard to me. A lib that can ease our lives in that regard will be nice.

jacoscaz · 2024-10-25T09:16:10Z

This assumption is likely to change after SPARQL 1.2 comes out, as the W3C WG is planning to be transformed into a maintenance group that could lead to more quickly evolving SPARQL (and RDF) spec versions.

And we're also writing this parser for research purposes; we want the liberty to quickly add keywords and constructs to test things and make proposals.

Correct, one of the main requirements of this project is expandability/modularity/modifications.

Nothing like additional context to change my mind! Yeah, given the above hand-rolling would be much harder to justify. I'm somewhat worried to hear that SPARQL might be subject to a faster pace of change but that's a different matter entirely.

All right, I'll follow this from the sidelines but happy to exchange notes whenever!

RubenVerborgh added the enhancement label Nov 6, 2019

RubenVerborgh mentioned this issue Nov 6, 2019

Browser performance issue on iterative property access LDflex/Query-Solid#45

Open

jitsedesmet mentioned this issue Oct 30, 2024

Custom / Extension Aggregates comunica/comunica#1456

Open
