Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OFS isn't safe for multi-process/thread access #11

Open
rossjones opened this issue Feb 4, 2014 · 5 comments
Open

OFS isn't safe for multi-process/thread access #11

rossjones opened this issue Feb 4, 2014 · 5 comments

Comments

@rossjones
Copy link

The persisting of state in https://github.com/okfn/ofs/blob/master/ofs/local/storedjson.py#L12 has been causing issues where the file will be corrupted, or overwritten with incorrect data by another process. There is nothing stopping one process/thread from writing to the file whilst another process is also trying to write (there is a race between reading/writing).

State should be persisted somewhere other than a file on disk. Setting O_EXCL on the fd isn't a safe solution (it'll just show up as blocked/sleeping processes waiting for access).

@rufuspollock
Copy link
Member

tbh the storedjson lib here was very alpha and not much used or tested (I think this was an experiment from @benosteen).

@benosteen
Copy link
Contributor

(Disclaimer: I wrote the original code for storedjson, but I'm not aware of how it is currently in use in ofs!)

The storedjson was written from the point of view as a fallback - when you don't have a persistent store to use (redis, mongo, etc) but need to try something out, run a test suite without damaging/interacting with your current store or - this was my particular use case - provide read-only access to a dataset without spinning up a persistent store. If you have a naturally parallelisable, independent set of tasks, then this non-threadsafe file access won't readily show up as the problem it is.

Wrapping the storedjson file access with something more intelligent is the way forward, but if we have got to the point where threadsafe access is important, then using a different backend might be in order?

@rossjones
Copy link
Author

Wish there was a universal well-known way of marking stuff as unsafe for certain uses - I'd use it lots.

I don't think the latest CKAN depends on it any more, but it is still currently there for backwards compat. I think time will cure that specific problem, but I reported it just in case ;) Perhaps storing the metadata next to the file /x/y/myfile.csv and /x/y/myfile.json (or something similar) might work as a way of avoiding more heavyweight state storage? Presumably this is only written once when the file is written/modified and so won't suffer the write contention as it'll only be read after that.

@lwcolton
Copy link

I am implementing the missing filestore plugin and the way I'm doing it is having a root dir where all the OFS stuff goes such as /var/ofs and then a metadata dir that mirrors the actual storage dir, such that a file like
/var/ofs/buckets/bucket_name/label_name
would have its metadata in
/var/ofs/metadata/bucket_name/labal_name.json

@rossjones
Copy link
Author

Sounds good. Before writing the file/metadata, it's probably worth writing to an alternate file and then using os.rename as it's atomic (an uninterruptible syscall) on most platforms and will save you needing to flock anything before reading/writing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants