Any tips on how to speed up build of a _very_ large table? (~1500 rows) #1374
With #1375 I'm able to see where it is getting stuck. Looks like it is in the

Now to figure out which treeprocessor it is within

Looks like it's the

Ah, it looks like the InlineTreeprocessor skips AtomicStrings. I'll see if I can adjust our table generation logic to only use atomic strings then.
I doubt that will work. Atomic strings are simply instances of a custom Python class. They would not be retained in an external text file. They are intended to be used internally, to instruct the parser that a given string is fully parsed and should be ignored in any future parsing.
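For context, the point above can be sketched in plain Python. `AtomicString` (defined in `markdown.util`) is essentially a `str` subclass used as an in-memory "already parsed" marker; the sketch below uses a stand-in class rather than the real import to show why the marker cannot survive a round trip through a text file:

```python
import io

# Stand-in for markdown.util.AtomicString: a plain str subclass used
# purely as an in-process "fully parsed, skip me" marker.
class AtomicString(str):
    pass

s = AtomicString("*do not emphasize*")

# The inline processor can check the type and skip such strings...
assert isinstance(s, AtomicString)

# ...but the marker is pure Python type information. Writing the text
# out and reading it back yields an ordinary str, so the flag is lost
# outside the running process.
buf = io.StringIO()
buf.write(s)
roundtripped = buf.getvalue()
assert type(roundtripped) is str
assert not isinstance(roundtripped, AtomicString)
```

This is why pre-marking cells as atomic in a generated file cannot work: the type information never reaches the parser.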
@waylan ok, thanks! Any suggestions on how to fix this issue then? Or speed up the InlineTreeprocessor?
The InlineTreeProcessor is simply a wrapper around all inline processors (see inlinepatterns.py). It could be anything in there. Although, I have to wonder if perhaps there is an issue with the syntax of your tables. If you have the table extension enabled, then the large table should have had all of the table syntax converted already by the time we get to the InlineTreeProcessor, and we would simply be parsing the contents of each cell at this point. Table cells usually tend to have pretty simple content, so this is not usually an issue. Unless you have some unusual cell content... Or it could be that stepping through all of the many thousands of cells is what is slowing things down. And come to think of it, the InlineTreeProcessor does some extra shenanigans to support some sophisticated nesting (as explained here); in fact, it could be that the entire discussion in #798 is related to this - I don't have a way to know without some sample input. Running that for each cell on thousands of cells could add up. Although, I can't imagine ~30 minutes for that.
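One way to narrow down which processor is eating the time is to run the conversion under the standard library's `cProfile` module. This is a hedged sketch: the commented-out `markdown.markdown` call assumes the third-party `markdown` package and the `tables` extension discussed in this thread, and `big_table_text` is a hypothetical variable holding the slow document:

```python
import cProfile
import io
import pstats

def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, top-10 cumulative report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()

# Hypothetical usage against the slow document:
#   import markdown
#   html, report = profile_call(
#       markdown.markdown, big_table_text, extensions=["tables"]
#   )
#   print(report)  # the dominant treeprocessor/pattern appears near the top
```

Sorting by cumulative time makes the offending treeprocessor or inline pattern stand out even when the per-call cost is tiny but multiplied across thousands of cells.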
@waylan inside each cell is either one link (e.g.

Not sure if it matters either, but the VS Code markdown preview utility can render the table just fine, very quickly.
I don't see any obvious reason for the slowdown from what you have provided. Again, take a look at #798. Even if the specific issue there is not relevant, our general approach to performance issues and priorities is discussed in detail. It could be that you have hit an edge case in some regex which we could tweak, or it could be that a fix would require completely rewriting how the inline parser works. Python-Markdown is very old, and back when its structure was first designed, very few extensions existed in any implementation. Therefore, it doesn't always work well for some newer syntax. We haven't rewritten it, as that would require us to abandon the rich ecosystem of existing third-party extensions.
Correct me if I'm wrong, and I definitely might be, as I don't have enough information on what you're trying to achieve, but if you're generating the table, why not just generate HTML and omit Markdown processing completely?
@squidfunk this was going to be my "plan b" if we can't figure it out here 😅
@flynneva if you're dealing with a lot of data, it might be a scalable approach 😉 However, I understand that you might want to keep Markdown parsing inside of table cells, e.g. text formatting, icons, etc., which would require some extra work.
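If the table really is pure data, the suggestion above can be as simple as having the generator emit the `<table>` markup itself with the standard library. A minimal sketch; the column names and sample row are made up for illustration:

```python
from html import escape

def rows_to_html_table(headers, rows):
    """Render headers and rows as a raw HTML table, escaping cell text."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return (
        f"<table><thead><tr>{head}</tr></thead>"
        f"<tbody>{body}</tbody></table>"
    )

# Hypothetical columns/data:
html = rows_to_html_table(
    ["Name", "Link", "Status"],
    [["pkg_a", "https://example.com/a", "ok"]],
)
```

Markdown leaves raw HTML blocks untouched, so the inline parser never walks the ~4500 cells; the trade-off is losing Markdown formatting (links, icons) inside the cells unless the generator produces that HTML too.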
@waylan just figured it out 🙃 it isn't an issue at all with this repo or the table itself... it's the generator we have. The issue was that just before the md table, there is an HTML link in the file (e.g.

Removing that link, or adding a space between it and the table, fixes my issue. Closing this as it is resolved. Big thanks for helping me find the root cause 🙏🏼 @waylan @squidfunk
That may or may not help; it depends on what the issue is. The Markdown parser will still parse all of the raw HTML (as HTML) only to find the end of the block of text it should ignore. However, it uses the HTML parser in the Python standard library, which is generally fast enough but does have a few weird edge cases of its own. That said, all inline processing is avoided, so the current slowdown would be avoided.
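For future readers, the problematic layout presumably looked something like the following. This is a hypothetical reconstruction based on the description above; the actual link and table content were not shared in the thread:

```markdown
<a href="https://example.com">generated link</a>
| Name  | Link | Status |
| ----- | ---- | ------ |
| ...   | ...  | ...    |
```

With no blank line after the raw HTML, the table rows can get swallowed into raw-HTML block handling; inserting a blank line between the link and the first table row lets the `tables` extension recognize the table as its own block.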
First off - thanks for the awesome project 🙏🏼
I use the mkdocs-material + python-markdown integration quite heavily with the docs I build, and I am running into a corner case that I'm not quite sure how best to improve / optimize.
We have a super large markdown table (~1500 rows with 3 columns, with md links in each cell) that is auto-generated, and when trying to build with `mkdocs build`, it hangs specifically on the `python-markdown` bit and takes ~30min just to build that one file 😅

The last debug prints I see before the ~30min wait until the next file are some `Successfully imported extension` and `Successfully loaded extension` messages from `core.py` within this package. After this last print it hangs for ~30min until moving onto the next file.
Main questions