I think this might help to understand what is going on: https://pointfreeco.github.io/swift-parsing/main/documentation/parsing/stringabstractions
Hi! I have written a small HTML lexer to convert any string to tokens representing whatever text and HTML elements it contains. The intended use case is to parse strings that use HTML for formatting, but it can also tokenize full HTML documents. I have previously written a handcrafted lexer using an API similar to Scanner, but which uses a custom collection wrapper instead of Scanner to gain some performance.
The code for both lexers is in the swift-parsing branch of this repo: https://github.com/BjornRuud/HTMLLexer/tree/swift-parsing
When benchmarking the two lexers, I found that the swift-parsing one is much slower than the handcrafted one. Tokenizing the HTML specification web page, a document that is 85 KB in size, takes 77 ms for the handcrafted lexer and 14.8 seconds for the swift-parsing lexer on my Mac. This is my first attempt at using swift-parsing, so I'm sure there is much that can be optimized in my code, but I profiled the code and found something surprising.
The code spends most of its runtime performing Substring.distance(from:to:), which is being called by Collection.count. Is that normal behaviour for Substring? Does it calculate count on each Substring allocation? Can this be avoided?
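For context on the question above, here is a minimal sketch of why count can be expensive on a Substring. Substring is a Collection of Character but not a RandomAccessCollection, so count has to walk the slice and find grapheme cluster boundaries every time it is called. The sample string and variable names are hypothetical and not taken from the HTMLLexer repo. The utf8 view counts code units instead of Characters, which is generally cheaper, though how much cheaper in a given parser is an assumption to verify with a profiler.

```swift
// Hypothetical sample input, not from the HTMLLexer repo.
// The "é" is spelled as "e" followed by a combining acute accent (U+0301).
let html = "<p>caf\u{65}\u{301}</p>"
let slice: Substring = html[...]

// Character view: every call to count traverses the slice and
// groups code units into grapheme clusters (Characters).
let characterCount = slice.count

// UTF-8 view: counts code units, so no grapheme breaking is needed.
let byteCount = slice.utf8.count

print(characterCount) // 11: "e" + U+0301 form a single Character
print(byteCount)      // 13: "e" is 1 byte, U+0301 is 2 bytes
```

If a parser only needs to know whether input remains, checking isEmpty avoids the traversal entirely; whether that applies to the parsers used here is not something this sketch can confirm.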