Rewrite the parser without Jison #94
Comments
Interesting benchmark: https://chevrotain.io/performance/
@RubenVerborgh any reason to prefer parser generators that do not require a code generation step (e.g. Chevrotain), or would you be ok with code generation (e.g. ANTLR4)?
Aha, I wasn't aware of this issue. @RubenVerborgh Jitse will gather your thoughts about this soon.
@jacoscaz No reason whatsoever, back in the day jison was just the quickest thing that helped me push this out. That summer was 10 years ago last August 😂
Hello @RubenVerborgh and @rubensworks (and @jitsedesmet)! If someone's already looking at this from your side @rubensworks it's probably a bad idea for me to get involved as (I think?) you have the advantage of geographical proximity. I'm also much less competent in SPARQL; I should be considered a last resort :) I came back to this as, coincidentally, I've recently worked on a completely unrelated project that also required a parser; taking care of this one too would not have been that great of a context switch. We went with ANTLR4 after looking at quite a few metrics, mainly performance but also long-term maintainability, and we're really happy with the outcome.
Hey @jacoscaz you already know my work though ;) I started with Chevrotain, mainly for the performance, but I like the other benefits like expandability and fault tolerance, and those visualizations look nice too. That being said, I have not yet looked at ANTLR4 beyond the code example implemented by Chevrotain, so I will look closer tomorrow :D My first impression of it was "oh but then you have new syntax".
That's precisely why I should be considered a last resort, your domain expertise is vastly greater than mine @jitsedesmet! As for Chevrotain, we found it to be much faster than everything else on V8 (Node.js, Chrome and Chromium-based browsers) but we found ANTLR4 to perform much more consistently across different runtimes, even though it was roughly 1.5x slower in absolute terms than Chevrotain on V8. However, we also noticed a high degree of specificity to the grammar we were using, with different combinations of grammar and runtime leading to significant performance differences. If there's a subset of SPARQL that is small enough that developing the respective grammar doesn't take too long, while still being representative of the entirety of SPARQL from a complexity perspective, I would try to get that going in ANTLR4, Chevrotain and a few others and see how they fare. EDIT: even now, running the Chevrotain benchmarks linked above in Safari on a MacBook Pro M1 has ANTLR4 almost twice as fast as Chevrotain, whereas running in Chrome on the same machine sees Chevrotain hitting almost 4x the ops/sec that ANTLR4 hits.
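A comparison like the one suggested above could start from a small harness along these lines. This is only a sketch: `bench`, `candidates` and `sampleQuery` are made-up names, and the two "parsers" are trivial stand-ins for grammars built with ANTLR4, Chevrotain or others, not real SPARQL parsers. It uses `Date.now()` so that it runs unchanged in Node.js and in browsers, which matters given the runtime differences reported above.

```typescript
// Hypothetical harness: times any parse function on a fixed input string.
type ParseFn = (input: string) => unknown;

interface BenchResult {
  name: string;
  opsPerSec: number;
}

function bench(name: string, parse: ParseFn, input: string, durationMs = 200): BenchResult {
  // Warm up so the JIT has a chance to optimize before measuring.
  for (let i = 0; i < 50; i++) parse(input);

  let iterations = 0;
  let elapsed = 0;
  const start = Date.now();
  // Run for at least durationMs, so elapsed is always positive afterwards.
  while (elapsed < durationMs) {
    parse(input);
    iterations++;
    elapsed = Date.now() - start;
  }
  return { name, opsPerSec: (iterations / elapsed) * 1000 };
}

// Stand-in "parsers" (assumptions): replace with the real candidates.
const candidates: { name: string; parse: ParseFn }[] = [
  { name: "splitOnWhitespace", parse: (q) => q.trim().split(/\s+/) },
  { name: "regexTokenizer", parse: (q) => q.match(/[?$]\w+|\w+|[{}.*]/g) || [] },
];

const sampleQuery = "SELECT ?s ?p ?o WHERE { ?s ?p ?o . }";

const results = candidates
  .map((c) => bench(c.name, c.parse, sampleQuery, 100))
  .sort((a, b) => b.opsPerSec - a.opsPerSec);

for (const r of results) {
  console.log(`${r.name}: ${Math.round(r.opsPerSec)} ops/sec`);
}
```

Running the same file in several runtimes (Node.js, Chrome, Safari) and with several representative queries would show whether the ranking is stable or depends on the grammar and engine combination.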
@jitsedesmet We do need to make a good cost/benefit analysis for this specific case (and our future plans with SPARQL) of using a parser generator versus hand-rolling a parser. Both options come with different maintainability characteristics, i.e., some kinds of maintenance/evolution tasks are easier with one codebase versus another. So let's have a close look, because as you can see above, codebases tend to live longer than you'd want 😅
Chevrotain does seem to strike a great balance, though; there is always the option to conditionally (!) include certain rules, or auto-generate some, which is… powerful.
Contrary to a lot of popular wisdom, I personally think that hand-rolling is (almost) always going to prove superior in the long term (assuming competent programmers, which in this case is a more than safe assumption). However, how long that long term might be is very context-specific. Given the success of
This assumption is likely to change after SPARQL 1.2 comes out, as the W3C WG is planned to be transformed into a maintenance group, which could lead to more quickly evolving SPARQL (and RDF) spec versions.
And we're also writing this parser for research purposes; we want the liberty to quickly add keywords and constructs to test things and make proposals.
Correct, one of the main requirements of this project is expandability, modularity, and ease of modification. At first I also thought "hand writing this cannot be that hard", but the fault tolerance, for example, already looks hard to me. A lib that can ease our lives in that regard will be nice.
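To make the two requirements above concrete, here is a toy sketch of what "quickly add keywords" and "fault tolerance" might look like in a hand-written parser. Everything in it is an assumption for illustration: `ToyParser`, `addRule` and the statement format (a keyword, some arguments, a closing `.`) are invented and are neither SPARQL nor the API of Chevrotain or any other library. Fault tolerance is approximated with simple panic-mode recovery: on an unknown keyword the parser records an error, skips to the next `.` and carries on.

```typescript
interface Statement { keyword: string; args: string[] }
interface ParseError { message: string; tokenIndex: number }
type Rule = (args: string[]) => Statement;

class ToyParser {
  private rules = new Map<string, Rule>();

  // Register (or override) a keyword at runtime, e.g. an experimental construct.
  addRule(keyword: string, rule: Rule): this {
    this.rules.set(keyword.toUpperCase(), rule);
    return this;
  }

  parse(input: string): { statements: Statement[]; errors: ParseError[] } {
    const tokens = input.split(/\s+/).filter((t) => t.length > 0);
    const statements: Statement[] = [];
    const errors: ParseError[] = [];
    let i = 0;
    while (i < tokens.length) {
      const startIndex = i;
      const keyword = tokens[i].toUpperCase();
      i++;
      // Collect everything up to the "." terminator (the synchronization token).
      const args: string[] = [];
      while (i < tokens.length && tokens[i] !== ".") {
        args.push(tokens[i]);
        i++;
      }
      i++; // step past "." (or past the end of the input)
      const rule = this.rules.get(keyword);
      if (rule) {
        statements.push(rule(args));
      } else {
        // Panic-mode recovery: report, drop this statement, keep parsing.
        errors.push({ message: `Unknown keyword "${tokens[startIndex]}"`, tokenIndex: startIndex });
      }
    }
    return { statements, errors };
  }
}

const parser = new ToyParser()
  .addRule("PREFIX", (args) => ({ keyword: "PREFIX", args }))
  .addRule("SELECT", (args) => ({ keyword: "SELECT", args }));

const source = "PREFIX ex: <http://example.org/> . BOGUS x y . SELECT ?s .";
const before = parser.parse(source);
console.log(before.statements.length, before.errors.length);

// Conditionally enable an experimental keyword, then parse again.
const enableExperimental = true;
if (enableExperimental) {
  parser.addRule("BOGUS", (args) => ({ keyword: "BOGUS", args }));
}
const after = parser.parse(source);
console.log(after.statements.length, after.errors.length);
```

Even at this size, the recovery logic is the part that needs the most care, and a real grammar has nested constructs where a single synchronization token is not enough. That is roughly the work a library with built-in error recovery would take over.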
Nothing like additional context to change my mind! Yeah, given the above, hand-rolling would be much harder to justify. I'm somewhat worried to hear that SPARQL might be subject to a faster pace of change, but that's a different matter entirely. All right, I'll follow this from the sidelines but happy to exchange notes whenever!
It's simply too slow.