Skip to content
This repository has been archived by the owner on Apr 4, 2022. It is now read-only.

Index bloat #104

Open
paulhuggett opened this issue Aug 2, 2021 · 0 comments
Open

Index bloat #104

paulhuggett opened this issue Aug 2, 2021 · 0 comments
Labels
inefficiency The results are correct, but less efficient than they should be.

Comments

@paulhuggett
Copy link
Contributor

paulhuggett commented Aug 2, 2021

When a pstore transaction is committed, all of the modified indices are updated. This principally involves flushing the updated branch records to the store. As with traditional B-Tree-style indices, we want our index trees to be wide and shallow to minimize the number of different branches that must be visited during a search. The downside of this is that the records being flushed during commit are rather large.

I did an experiment by instrumenting pstore-write so that it prints the amount of data being written during transaction commt and then adding a large number of files to a database with one transaction for each. Here is a graph of the data that this produced:

image

As you can see, the result is nicely logarithmic (good), but that after the first 5000 or so insertions, each commit requires ~1200 bytes. Not good. We need to find a way of reducing that overhead.

This is worse in the compiler because each compilation is typically updating three indexes (those for compilations, names, and fragments).

@paulhuggett paulhuggett added the inefficiency The results are correct, but less efficient than they should be. label Aug 2, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
inefficiency The results are correct, but less efficient than they should be.
Projects
None yet
Development

No branches or pull requests

1 participant