Hard link untranscoded files when copied within filesystem #61

Open
nichobi opened this issue Oct 19, 2020 · 9 comments

@nichobi

nichobi commented Oct 19, 2020

When copying files from the library to a new directory, within the same filesystem, the files should be hard linked rather than copied. This would take less time than copying and use less space, without any obvious downsides.

Use case:
I keep a lossy version of my library next to my regular library, for easy copying to other devices. Any songs that are lossless are transcoded but the lossy ones are copied over as is. Currently all lossy files end up taking up space in both directories.

@geigerzaehler
Owner

Seems like a good idea. os.link() is what we would need to use.

If you feel like it, please open a PR, @nichobi. I’m happy to help with any questions.

@nichobi
Author

nichobi commented Oct 20, 2020

I had a look around and found a hardlink() function in beets util that simplifies implementation. What I'm unsure of is how to determine when to hardlink vs copy. Some options I've considered:

  • Somehow check whether files are on the same filesystem and hardlinks are supported. Have not found any easy way to check this.
  • Add a config option (per alternatives directory). What should be done if hardlink is set to true but fails?
  • Try to hardlink whenever a file is copied and fall back to copying on errors. This may be inefficient when copying many files.
  • Attempt a hardlink for the first file or a test file, and use the result to determine whether to hardlink or copy the remaining files. What if our test file succeeds but future files fail?

Do you have any opinion on which to go with? I'm happy to open a PR, just not sure in what manner to start working.
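
Two of the options above can be sketched with the standard library alone. This is a minimal illustration, not the plugin's code: `same_device` and `link_or_copy` are hypothetical names, and comparing `st_dev` is only a heuristic, since equal device IDs do not guarantee that the filesystem supports hard links.

```python
import os
import shutil


def same_device(path_a, path_b):
    """Return True if both paths report the same device ID (a heuristic only)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def link_or_copy(src, dest):
    """Try to hard link src to dest and copy instead if linking fails.

    Returns "link" or "copy" so the caller can see which one happened.
    """
    try:
        os.link(src, dest)
        return "link"
    except OSError:
        # Raised, for example, for cross-device links or when the
        # filesystem does not support hard links.
        shutil.copy2(src, dest)
        return "copy"
```

With the try-and-fall-back variant, the cost of a failed attempt is one extra system call per file, and no up-front filesystem check is needed.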

@geigerzaehler
Owner

I wasn’t aware of beets.util.hardlink(). Makes sense to use it.

Thanks for the analysis on the different approaches. I think the last one makes the most sense. We could also add a config option that disables this behavior, for example if the user knows that the collection is on a different filesystem. But this is something we can always add later.

@wisp3rwind
Collaborator

There's also reflinking of files (for some filesystems, such as btrfs). I think this is one more reason why no hard-/ref-linking should happen by default (because it's not clear what should be preferred), but rather only if a per-alternative config option is set. In addition, hardlinking by default changes the behaviour when the files in the alternative collection are modified (e.g. by a player writing rating tags): Currently, this will not affect files in the main beets library.

The cp command has a --reflink=[never|auto|always] flag, maybe the hardlink option could also take these values, with the same meaning.

> Try to hardlink whenever a file is copied and fall back to copying on errors. This may be inefficient when copying many files.

I doubt that this would really be the bottleneck when updating alternatives (of course, I haven't measured it). I suppose that the system call to hardlink fails rather quickly if it is not supported by the filesystem.
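
A minimal sketch of how a `never|auto|always` setting, modelled on `cp --reflink`, could drive the choice between linking and copying. The function name `copy_or_hardlink` and its `mode` parameter are assumptions made for illustration and are not part of beets or the plugin.

```python
import os
import shutil


def copy_or_hardlink(src, dest, mode="never"):
    """Place src at dest according to a cp-style link mode.

    never:  always copy.
    auto:   try to hard link, fall back to copying on failure.
    always: hard link or raise the underlying OSError.
    """
    if mode not in ("never", "auto", "always"):
        raise ValueError(f"unknown link mode: {mode!r}")

    if mode == "never":
        shutil.copy2(src, dest)
        return "copy"

    try:
        os.link(src, dest)
        return "link"
    except OSError:
        if mode == "always":
            raise  # the caller asked for a link and nothing else
        shutil.copy2(src, dest)
        return "copy"
```

In `auto` mode a failed `os.link()` costs one system call before the copy starts, which is the overhead discussed above.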

@geigerzaehler
Owner

> In addition, hardlinking by default changes the behaviour when the files in the alternative collection are modified (e.g. by a player writing rating tags): Currently, this will not affect files in the main beets library.

Excellent point! This is indeed a good reason not to use hardlinking by default. In general it makes me wonder whether hardlinking is a good idea. Reflinking is a lot better but also less widely supported. (It’s only available on some file systems and not in the Python stdlib yet although there is a package for it.)
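
The difference in behaviour can be reproduced in a few lines. This is a self-contained sketch that uses a throwaway temporary directory and made-up file names, and it assumes the temporary directory is on a filesystem that supports hard links.

```python
import os
import shutil
import tempfile


def demo_shared_data():
    """Write through a hard link and report what the original and a copy contain."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "library.mp3")
        linked = os.path.join(tmp, "alt_linked.mp3")
        copied = os.path.join(tmp, "alt_copied.mp3")

        with open(original, "w") as f:
            f.write("rating=0")

        os.link(original, linked)       # second name for the same data
        shutil.copy2(original, copied)  # independent data

        # Simulate a player rewriting tags in place in the alternative.
        with open(linked, "w") as f:
            f.write("rating=5")

        with open(original) as f:
            original_content = f.read()
        with open(copied) as f:
            copied_content = f.read()
        return original_content, copied_content
```

Here the in-place write through the link is visible in the library file, while the copy keeps the old content. A program that saves by writing a new file and renaming it over the old one would break the link instead of modifying the shared data.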

An alternative solution for your use case, @nichobi, would be to enable symlinks alongside transcoding. Basically we would add a flag that would symlink files instead of copying them if they don’t need to be transcoded. This is different from format: link where all files are symlinked by default. Would this be acceptable @nichobi?

@nichobi
Author

nichobi commented Oct 21, 2020

Symlinks could be a solution, but might be unstable if the main library is modified. Symlinks would break if a file is moved to a different path or deleted, requiring an alt update to repair. Hardlinks would still point to the same data, even if the main library is changed. For my use case, hardlinks would work better, but I can see why it might be troublesome.
What seems best to me would be to make it an option, so the user could pick from copy, link/symlink, hardlink or reflink. Keeping copy as the default seems the most sane, while still giving users the ability to pick whatever option works best for them.
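
The stability difference between the two link types can be checked with a small experiment. This sketch uses a temporary directory and made-up file names, and assumes a platform and filesystem where both `os.link()` and `os.symlink()` are available.

```python
import os
import tempfile


def demo_move_original():
    """Move the original file and report whether each link still resolves."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "song.mp3")
        hard = os.path.join(tmp, "hard.mp3")
        sym = os.path.join(tmp, "sym.mp3")

        with open(original, "w") as f:
            f.write("audio")

        os.link(original, hard)
        os.symlink(original, sym)

        # Simulate the library file being moved to a different path.
        os.rename(original, os.path.join(tmp, "moved.mp3"))

        # os.path.exists() follows symlinks, so it is False for a dangling one.
        return os.path.exists(hard), os.path.exists(sym)
```

After the rename, the hard link still resolves to the data while the symlink dangles until it is recreated.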

@wisp3rwind
Collaborator

I'd say, all of symlink, hardlink, reflink could be implemented (or one for now, adding others as requested), everyone could then choose his or her preferred method. For example, we could add the options

alt:
    phone:
        # ...
        link: [never|auto|always]
        linktype: [hardlink|symlink|reflink]

If I understand @nichobi correctly, symlinks might be somewhat inconvenient, because they'd require special care when copying to the other devices, e.g. cp --dereference or rsync --copy-links since a simple cp or rsync would copy the link.

> Symlinks would break if a file is moved to a different path or deleted, requiring an alt update to repair.

On the other hand, the old, hardlinked/copied files might contain stale (meta)data. I don't think the validity of an alternative collection after changes to the beets database and before the next alt update is something we should care too much about.

@geigerzaehler
Owner

> I'd say, all of symlink, hardlink, reflink could be implemented (or one for now, adding others as requested), everyone could then choose his or her preferred method.

This seems to be the right approach given that every option has its own benefits and drawbacks and the user probably knows best what they want.

For configuring this I’d condense your approach @wisp3rwind: We would just provide one link option per alternative with values false, hardlink, symlink, and reflink. I don’t see why we need an auto option.

@wisp3rwind
Collaborator

> For configuring this I’d condense your approach @wisp3rwind: We would just provide one link option per alternative with values false, hardlink, symlink, and reflink. I don’t see why we need an auto option.

In that case, I think alt update should abort if a hardlink/symlink/reflink fails. Otherwise, with a silent fallback to copying, it's somewhat hard to verify that you've configured the alternative in a way that is supported on your filesystem.
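
A minimal sketch of the single `link` option with no silent fallback. The function name `place_untranscoded` is hypothetical, and reflinking is left unimplemented here because it is not part of the Python standard library and would need a third-party package.

```python
import os
import shutil


def place_untranscoded(src, dest, link=False):
    """Place src at dest according to a per-alternative `link` setting.

    Any OSError from linking propagates to the caller, so an update
    would abort instead of silently copying.
    """
    if link is False:
        shutil.copy2(src, dest)
    elif link == "hardlink":
        os.link(src, dest)
    elif link == "symlink":
        os.symlink(src, dest)
    elif link == "reflink":
        # Assumption: a real implementation would call a reflink helper here.
        raise NotImplementedError("reflink is not sketched in this example")
    else:
        raise ValueError(f"unknown link setting: {link!r}")
```

Because errors are not caught, a misconfigured alternative fails on the first file, which makes it easy to verify that the chosen link type is supported on the filesystem.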
