MMap or memory-based access to content #298
| This looks great and should be super helpful! I wonder if it'd be good to have the default file mode loaded from an environment variable to make it a bit easier? | 
| As you wish, but I would rather not rely on environment variables alone, since MEMORY is fine for some datasets but not for others. The environment variable could control the default. I got a 5x speedup on a big cluster even with the MMAP strategy. So if that is OK, I will proceed with updating all the docs_store() methods (I will only implement the strategy for LZ4-based doc stores, leaving a warning for the others). | 
| And OK for having options like this (or you prefer a more direct  | 
| OK, just implemented the change for all the  Could you run the full test (I don't have all the collections installed) so as to check if anything's wrong? | 
| Amazing, thanks a bunch! Still running through the tests, but these ones with   | 
| OK, this should be fixed. | 
| Thanks! I got the following test failures that look related to this change:  | 
| For 
 | 
| Gentle ping here – the tests should be fixed ( | 
With my current cluster, disk access can be painfully slow. What I propose here is to add an argument to docs_store() that takes a generic options object (it only has a file_access field for now, but keeping it generic leaves room for future settings) and controls how the content (LZ4 pickled for now) is accessed.
So far I have only made the modifications for msmarco-passages.
Before continuing, I wanted to check whether you would like such a PR, and whether the current design needs any changes (since it involves modifying all the docs_store() methods).
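To make the three access strategies discussed here concrete, below is a minimal, self-contained sketch. It is not the code from this PR: the names `FileAccess`, `RecordReader`, `default_file_access` and the environment variable `DOCSTORE_FILE_ACCESS` are hypothetical and chosen only for illustration. It assumes records are addressed by byte offset and length within a single file, and it only shows the difference between reading from disk on each lookup, memory-mapping the file, and loading it fully into RAM.

```python
import enum
import mmap
import os
import tempfile


class FileAccess(enum.Enum):
    # Hypothetical names, for illustration only.
    FILE = "file"      # seek + read on the file for every lookup
    MMAP = "mmap"      # map the file; the OS pages data in on demand
    MEMORY = "memory"  # read the whole file into RAM up front


def default_file_access(environ=os.environ):
    """Pick the default strategy from a (hypothetical) environment variable."""
    return FileAccess(environ.get("DOCSTORE_FILE_ACCESS", "file").lower())


class RecordReader:
    """Reads byte ranges from one file using the chosen access strategy."""

    def __init__(self, path, file_access=None):
        self.file_access = file_access or default_file_access()
        self._file = open(path, "rb")
        self._buffer = None
        if self.file_access is FileAccess.MMAP:
            # Length 0 maps the whole file (the file must not be empty).
            self._buffer = mmap.mmap(
                self._file.fileno(), 0, access=mmap.ACCESS_READ
            )
        elif self.file_access is FileAccess.MEMORY:
            self._buffer = self._file.read()

    def read(self, offset, length):
        if self._buffer is not None:
            # Both mmap objects and bytes support slicing.
            return bytes(self._buffer[offset:offset + length])
        self._file.seek(offset)
        return self._file.read(length)

    def close(self):
        if isinstance(self._buffer, mmap.mmap):
            self._buffer.close()
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.bin")
        with open(path, "wb") as f:
            f.write(b"hello world")
        for mode in FileAccess:
            reader = RecordReader(path, mode)
            print(mode.name, reader.read(6, 5))
            reader.close()
```

Passing the strategy explicitly per reader, with the environment variable only supplying the default, mirrors the trade-off raised in the conversation: MEMORY may be fine for a small collection but not for a large one, so a single global switch would be too coarse.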