Combine MIT Engaging dandi/001 and dandi/002 storage directories into a single virtual file system/namespace?
#8
Comments
single virtual filesystem would be much better -- I do not think |
@kabilar - check in with michel about the virtual layer. i'm not sure that's an easy solution. i agree that s3invsync shouldn't be in the business of volume management. however, it should be able to take a set of paths and spill over if it detects out of space in any location and go to the next location to continue. users could specify a path to store the index, but the downloaded objects could be across filesystems. |
Just sent an email. |
it isn't as easy as simply generating a single file, and going to the next "part" when first one is "full". there is no "objects" -- I guess it would be possible to design logic of inspecting/dealing with multiple leading paths, but this would likely have many negative impacts on performance etc. |
I can certainly appreciate that this would be additional work and could affect performance. Given that it's going to cost 60k + 6k recurring after the first year, perhaps it is most cost effective to implement this feature in @yarikoptic @jwodder Can we map out what it would take to implement this feature? And then we can make a decision. |
@kabilar Exactly how do you want this feature to behave? The most obvious option would be to download files to the first filesystem until it's full, then move on to the next file system and so forth, all the while retaining intermediate directory components in paths (e.g., two adjacent keys |
Hi @jwodder, this plan makes sense to me. How and how often would you check for the space available on the filesystem? With parallel jobs this could get a bit tricky to make sure there is enough space remaining as the filesystem approaches capacity. Additionally, we can make it so that |
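One way to picture the parallel-job concern is a small reservation layer: each job reserves its object's size before it starts writing, so that two jobs cannot both claim the last free bytes on a nearly full filesystem. The sketch below is in Python for illustration only and is not s3invsync code. `SpaceReserver`, `reserve`, `release`, and the injectable `free_bytes` callback are hypothetical names; the only real API used is `shutil.disk_usage`.

```python
import shutil
import threading


class SpaceReserver:
    """Track bytes promised to in-flight downloads on each root (hypothetical helper)."""

    def __init__(self, roots, free_bytes=None):
        self.roots = list(roots)
        # free_bytes is injectable so the logic can be exercised without real disks.
        self._free_bytes = free_bytes or (lambda root: shutil.disk_usage(root).free)
        self._reserved = {root: 0 for root in self.roots}
        self._lock = threading.Lock()

    def reserve(self, size):
        """Return the first root with room for `size` bytes, or None if none has room."""
        with self._lock:
            for root in self.roots:
                if self._free_bytes(root) - self._reserved[root] >= size:
                    self._reserved[root] += size
                    return root
        return None

    def release(self, root, size):
        """Drop a reservation once the download has finished or failed."""
        with self._lock:
            self._reserved[root] -= size
```

This checks free space once per object rather than on a timer. It only accounts for writers that go through the same reserver, so other users of the filesystem can still cause an out-of-space error at write time.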
I was thinking of just checking whether any write failures had an |
Thanks. I think that would be fine for our use case. |
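A minimal sketch of the reactive approach discussed above: try the first target directory, and on an out-of-space write failure move on to the next one while keeping the key's intermediate directory components. It assumes the failure surfaces as the POSIX `ENOSPC` error. This is Python for illustration only, not s3invsync code; `write_with_spillover` and its `start` and `opener` parameters are hypothetical.

```python
import errno
import os
from pathlib import Path


def write_with_spillover(roots, key, data, start=0, opener=open):
    """Write `data` for `key` under the first root (from index `start`) that has space.

    Returns the index of the root that accepted the file, so the caller can
    skip roots already known to be full.
    """
    for index in range(start, len(roots)):
        target = Path(roots[index]) / key
        try:
            # Recreate the key's intermediate directories on whichever root is used.
            target.parent.mkdir(parents=True, exist_ok=True)
            with opener(target, "wb") as handle:
                handle.write(data)
            return index
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise
            # Remove any partial file before trying the next root.
            try:
                os.unlink(target)
            except FileNotFoundError:
                pass
    raise OSError(errno.ENOSPC, "all target roots are full", key)
```

For example, `write_with_spillover(["/orcd/data/dandi/001", "/orcd/data/dandi/002"], key, data)` would place the file under `002` only after `001` reports that it is full. Looking a key up later then means checking each root in turn, which is one of the performance costs raised earlier in the thread.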
some thoughts:
altogether - sounds feasible, but I have fears of hidden obstacles coming up often, which could have been avoided by relying on proper volume management at filesystem level -- could some LVM or similar be put over those storage volumes? |
Hi @satra, based on the email discussions with the ORCD team from September 2024, each storage server has 1.1 PiB and DANDI requested 1.5 PB so they had to split up the DANDI space between the two storage servers, as shown below:
Michel had proposed a couple of options to create a single virtual file system/namespace for the DANDI storage.
Should we pursue these options or just try to get s3invsync to work with multiple target directories (i.e., /orcd/data/dandi/001 and /orcd/data/dandi/002)? I am inclined to the latter option since we have more control over the timeline, but perhaps @jwodder and @yarikoptic have a preference here? Thanks.