Per-directory sync exclusions #1558

Open
eigengrau opened this issue Mar 12, 2014 · 32 comments

@eigengrau

eigengrau commented Mar 12, 2014

It would be a useful enhancement if sync exclusions could not only be specified globally, but also per directory. E.g., I often would like to ignore the intermediate output of my LaTeX-projects (*.out, *.run.xml, etc.). I cannot ignore these glob patterns globally, since they are also used by files which I would like to sync.

A possible solution would be to allow local .sync-exclude.lst files within folders. These exclusion lists would apply only to paths within and below the directory within which they are placed.
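
A minimal sketch of how such local lists could be evaluated, assuming one glob pattern per line, `#` for comment lines, and Python's `fnmatch` semantics (where `*` also matches `/`). None of this reflects the actual client; the function names are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import Path

EXCLUDE_NAME = ".sync-exclude.lst"  # file name taken from the proposal above


def read_patterns(directory: Path) -> list:
    """Return the patterns of a directory's local exclude list, if it has one."""
    lst = directory / EXCLUDE_NAME
    if not lst.is_file():
        return []
    patterns = []
    for line in lst.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Assumption: blank lines and lines starting with '#' carry no pattern.
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_excluded(sync_root: Path, relative: str) -> bool:
    """Check a path (relative to the sync root) against every ancestor's list."""
    parts = Path(relative).parts
    name = parts[-1]
    for depth in range(len(parts)):
        directory = sync_root.joinpath(*parts[:depth])
        below = "/".join(parts[depth:])  # the path as seen from `directory`
        for pattern in read_patterns(directory):
            if fnmatch(below, pattern) or fnmatch(name, pattern):
                return True
    return False
```

With a `latex/.sync-exclude.lst` containing `*.out`, `is_excluded(root, "latex/thesis/main.out")` is true while `is_excluded(root, "notes/data.out")` stays false, because the list only applies within and below `latex/`.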

An alternative approach would keep the global exclusion list, but allow ** in glob patterns to match across subdirs recursively (same as the Bash “globstar”). E.g., this would allow specifying a pattern latex/**/*.out to match .out files only if they are somewhere beneath latex/.
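
To illustrate the intended difference between `*` and `**`, here is a small sketch that translates such a pattern into a regular expression. The semantics are an assumption modelled on Bash globstar (`**/` matches zero or more whole directories, `*` and `?` never cross a `/`), not a description of how the ownCloud client matches patterns; the function names are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern":
    """Translate a glob with '**/' support into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:[^/]+/)*")  # zero or more directory levels
            i += 3
        elif pattern[i] == "*":
            out.append(r"[^/]*")        # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append(r"[^/]")         # one character, but not a separator
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def matches(path: str, pattern: str) -> bool:
    """True if the whole path matches the pattern."""
    return glob_to_regex(pattern).match(path) is not None
```

Under these assumptions, `latex/**/*.out` matches `.out` files at any depth below `latex/`, whereas a single-star pattern such as `latex/*/*.out` matches only files exactly one directory below it.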

@oruchreis

+1 for this enhancement, but allowing local exclude files within folders would be more useful, as I said in my duplicate suggestion #4068.

@arifemre

arifemre commented Nov 4, 2015

I'm waiting for this feature, too.

@Awpteamoose

Pretty please. I thought the current ignore settings already worked like that, and got burned after I configured the server and everything.

@dragotin dragotin added this to the backlog milestone Jan 10, 2016
dragotin pushed a commit that referenced this issue Jan 10, 2016
@dragotin
Contributor

Have you checked exclude patterns like

latex/*/*.out

That works. See d89edc3

@danimo danimo removed this from the backlog milestone Feb 22, 2016
@qw3ry

qw3ry commented May 4, 2016

I really like this idea and was just about to submit the same ticket when I luckily found this one.

I'd like to point out that this is not only about LaTeX output. For me, this feature would mean:

  • no more separate handling of ignore lists for each client (that is on each machine)
  • a central option to exclude files from sync for shared projects (like repositories used by multiple OC users)

Furthermore, I would recommend using the syntax of .gitignore files (or an extension of it), as it is easy and both the files and their syntax are quite common. In addition, one ignore file could be symlinked to another instead of duplicating it. (Of course, the client could also let the user use the .gitignore files themselves.)

@guruz guruz added the sev4-low label Jul 26, 2016
@jvcdk

jvcdk commented Aug 8, 2016

+1 from me as well.

Any news from the ownCloud team? Is this something anyone is working on? Would a patch be accepted if I implemented it?

~Jørn

@hodyroff

Please comment on #1558 (comment)
Generally yes, this could be interesting on a per-folder basis. The challenge is how to create a user experience that works and can be understood by, or stays out of the way of, the not-so-technical "normal" office user of ownCloud.

@qw3ry

qw3ry commented Mar 16, 2017

@hodyroff Re user experience:

  • for "staying away from the average user": using a hidden file to configure it (like .gitignore) will keep this setting away from non-technical users.
  • for "nice UX": One could edit the global ignores with a syntax using wildcards, e.g. /some/path/to/a/folder/**/*.html. OTOH I'd prefer something like .gitignore because you can easily copy'n'paste these between different folders.

I notice this is basically what @eigengrau suggested. Are there any problems with this approach?

@cpesoft

cpesoft commented Aug 4, 2017

+1 for this feature

@sryze

sryze commented Aug 5, 2019

This would be useful for me as well, e.g. for excluding node_modules and build outputs (something like .owncloudignore)

@ashthespy

I've been scouring the docs, but haven't found an elegant solution yet. I just realised I've been syncing a couple of GB of Rust ./target/release/.fingerprint files.

A lot of my other repos also sync all their build artefacts. I'd like to avoid hardcoding paths on each of my machines and instead check a .owncloudignore or something similar into each repo to avoid this in the future.

@maxfrei750

@TheOneRing Thanks for pointing me to this issue.

This seems to be a highly requested feature, which unfortunately has not been implemented in the past 7 years. Maybe it would be helpful, if one of the core developers could shed some light onto why this feature is not being addressed and what the community could do to help with the implementation. Thanks.

@maxfrei750

@TheOneRing Would you mind helping with this issue? Either directly or by pinging the most suitable developer(s)?

@maxfrei750

@ogoffart @dragotin @michaelstingl Since @TheOneRing is apparently unavailable for this issue, maybe one of you can help?

Is this feature wanted by the ownCloud team? If not, then why not? If yes, then what could the community do to speed up the implementation, and what would reasonable first steps look like? Thanks!

@HeyHouBenjo

Is there any chance this will be implemented some day? I have about 40 git repos, spread across different places, that I don't want to be synced, and I have to add them manually to the ignore list via the client editor on all my devices. Exhausting.

@TheOneRing
Contributor

> Is there any chance this will be implemented some day? I have about 40 git repos, spread across different places, that I don't want to be synced, and I have to add them manually to the ignore list via the client editor on all my devices. Exhausting.

Pro tip: Don't add stuff you don't want to be synced to your sync folder.

@maxfrei750

@TheOneRing I must say that your "pro tip" came across as a little snippy. After all, you don't know the use case of @HeyHouBenjo, and I think there are very legitimate reasons to have git repos that you don't want to be synced within a larger body of data that you do want to be synced.

And I'm fully aware that this is FOSS, so there is nothing the users can demand, but it would have been much more helpful, if you had directed us to a starting point to implement the solution ourselves.

@HeyHouBenjo

Unfortunately, I am one of those "Need to sync my complete Documents folder" guys; otherwise, I wouldn't need to ignore 40 git repos, but would instead have to create like 30 synced folders /:

@TheOneRing
Contributor

I'm aware that my comment was not very productive. My suggestion, however, is that you sync your whole Documents folder, create an additional directory Src in your home folder, and place the git repos there.
Most IDEs will put the build folder into the git folder or next to it. If that folder is not ignored as well, there will be hundreds of changed files within seconds, and all of them need to be synced.


Additionally, placing files you don't want to sync into a different folder will save you from surprises like unsynced files due to a tiny error in an ignore pattern.

The feature you request sounds simple; however, parsing the ignore list for each folder in a huge file tree would probably perform badly.

I saw the .gitignore files mentioned as an example. However, while git will usually visit a folder once, we regularly rescan the folder. We can either keep a rather complex tree of excludes in RAM or compute the recursive tree over and over again.
Both options are not simple, and the changes to the current code would probably be huge.
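
The first of these two options (keeping parsed excludes in memory) could look roughly like the sketch below. It assumes per-directory list files named `.sync-exclude.lst` with one pattern per line and `#` comments, and it re-reads a list only when its modification time changes. The class, its methods, and the `parse_count` counter are hypothetical and exist only to make the cost visible; this is not the client's code.

```python
from pathlib import Path


class ExcludeCache:
    """Keeps parsed per-directory exclude lists in memory between scans."""

    def __init__(self, root, list_name=".sync-exclude.lst"):
        self.root = Path(root)
        self.list_name = list_name
        self._parsed = {}      # directory -> (mtime_ns, tuple of patterns)
        self.parse_count = 0   # how often a list file was actually read

    def local_patterns(self, directory: Path) -> tuple:
        """Patterns defined directly in `directory`, re-parsed only on change."""
        lst = directory / self.list_name
        try:
            mtime = lst.stat().st_mtime_ns
        except FileNotFoundError:
            self._parsed.pop(directory, None)
            return ()
        cached = self._parsed.get(directory)
        if cached is not None and cached[0] == mtime:
            return cached[1]
        self.parse_count += 1
        lines = (ln.strip() for ln in lst.read_text(encoding="utf-8").splitlines())
        patterns = tuple(ln for ln in lines if ln and not ln.startswith("#"))
        self._parsed[directory] = (mtime, patterns)
        return patterns

    def effective_patterns(self, directory) -> list:
        """(base directory, pattern) pairs inherited from the root down."""
        directory = Path(directory)
        if directory != self.root and self.root not in directory.parents:
            raise ValueError("directory is not inside the sync root")
        chain = [directory]
        while chain[-1] != self.root:
            chain.append(chain[-1].parent)
        result = []
        for base in reversed(chain):
            result.extend((base, p) for p in self.local_patterns(base))
        return result
```

Even with the cache, every lookup still costs one `stat` call per ancestor directory, and the cached state has to stay consistent with the file system, which hints at the kind of added complexity described above.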

On the UX side I see a different hurdle.
Git is rather complex; some people kept praising SVN for years because it's "so simple".
We would need to indicate why a specific file was not handled, in the UI, in the file browser or the forum, and our enterprise support will drown in "why was my file not synced" requests.

The amount of QA needed would be huge.

You can of course say it's a pro feature, but that won't keep non-pro users from playing with it.

TL;DR: This is not a tiny improvement, this is a complex change for a niche audience.

@HeyHouBenjo

HeyHouBenjo commented Nov 29, 2021

Thank you for the explanation, I will look into exporting all git repos from my Documents folder into /home/src.

However, I saw this thread
=> Folder X with subfolder Y and maybe many different other files/dirs should be ignored if there is a pattern specified like: "/**/Y"
This way, I could simply manually add "/**/.git" to the ignore list in the client.
Wouldn't this be much easier to implement? If not, please correct me.

@TheOneRing
Contributor

> Thank you for the explanation, I will look into exporting all git repos from my Documents folder into /home/src.
>
> However, I saw this thread => Folder X with subfolder Y and maybe many different other files/dirs should be ignored if there is a pattern specified like: "/**/Y" This way, I could simply manually add "/**/.git" to the ignore list in the client. Wouldn't this be much easier to implement? If not, please correct me.

This would only ignore the .git directory, which is ignored by default (hidden/dot files).
Without that directory the essence of git gets lost and you rely on ownCloud for versioning.
ownCloud versioning is meant for single files and not for projects; git is much more suited for development than ownCloud.

@HeyHouBenjo

>> Thank you for the explanation, I will look into exporting all git repos from my Documents folder into /home/src.
>> However, I saw this thread => Folder X with subfolder Y and maybe many different other files/dirs should be ignored if there is a pattern specified like: "/**/Y" This way, I could simply manually add "/**/.git" to the ignore list in the client. Wouldn't this be much easier to implement? If not, please correct me.
>
> This would only ignore the .git directory, which is ignored by default (hidden/dot files). Without that directory the essence of git gets lost and you rely on ownCloud for versioning. ownCloud versioning is meant for single files and not for projects; git is much more suited for development than ownCloud.

I think you're getting me wrong here. I meant excluding a complete folder X only if there is a file or folder Y in it and the user has set the pattern /**/Y/ in the ignore list. Maybe ** should be replaced with something different that's not in use yet, possibly (). This way, the whole git repo could be ignored if the user adds /()/.git to the ignore list.

@TheOneRing
Contributor

Yes, I got you wrong, sorry.
As a workaround you could add *_git to your exclude patterns and ensure that your git folders always end in _git.

@ashthespy

@TheOneRing would there be a large performance hit if I were to manually add a bunch of folders to the exclude list?
I could then offload the tree walking and path collection to a script that could be run from time to time.

@TheOneRing
Contributor

One central exclude list is no problem; it's exactly what we have right now, so that sounds possible.
For a per directory solution there would just be so many edge cases.

@maxfrei750

@TheOneRing Thanks for the helpful input. The user's exclude list is located in ~/.config/ownCloud/sync-exclude.lst (at least for me). Would it be OK to change this text file with an external program (e.g. a Python script), or will it cause problems with OC?

@TheOneRing
Contributor

The supported way is to use the settings UI, but as long as you stick to the format it should work.
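
A sketch of such an external edit, assuming the format is one pattern per line and that lines starting with `#` are treated as comments (worth verifying against an existing sync-exclude.lst before relying on it). The marker lines and the function name are hypothetical. Entries written by the script live in a delimited block, so re-running it replaces that block and leaves hand-written patterns untouched.

```python
import os
from pathlib import Path

# Hypothetical marker lines; they rely on '#' lines being ignored by the client.
BEGIN = "# BEGIN generated entries"
END = "# END generated entries"


def update_exclude_list(list_path: Path, entries: list) -> None:
    """Replace the generated block in the exclude list, keeping all other lines."""
    old = []
    if list_path.exists():
        old = list_path.read_text(encoding="utf-8").splitlines()

    kept, inside = [], False
    for line in old:
        if line == BEGIN:
            inside = True
        elif line == END:
            inside = False
        elif not inside:
            kept.append(line)

    new_lines = kept + [BEGIN, *entries, END]
    # Write to a temporary file first and swap it in, so a reader never
    # sees a half-written list.
    tmp = list_path.with_name(list_path.name + ".tmp")
    tmp.write_text("\n".join(new_lines) + "\n", encoding="utf-8")
    os.replace(tmp, list_path)
```

With the location mentioned above, a call could look like `update_exclude_list(Path.home() / ".config/ownCloud/sync-exclude.lst", ["projects/build"])`, where the entry is only a placeholder.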

@maxfrei750

@TheOneRing Out of curiosity, I just recursively iterated over all files in my ownCloud directory (~1.2 million files, 250 GB, please don't judge me 😅) and it took ~3 seconds. For the sake of a better understanding, what would cause the large performance hit, if not the file iteration itself?
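
A measurement like this can be reproduced with a short script. The sketch below times a plain directory walk only; it does not read or evaluate any exclude lists, so it says nothing about the cost of pattern handling. The function names are hypothetical, and results depend heavily on the file system cache.

```python
import os
import time


def count_files(root: str) -> int:
    """Count regular files below `root`, using an explicit stack instead of recursion."""
    total = 0
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        total += 1
        except OSError:
            continue  # unreadable directory: skip it rather than abort
    return total


def timed_count(root: str) -> tuple:
    """Return (number of files, elapsed seconds) for one walk of `root`."""
    start = time.perf_counter()
    files = count_files(root)
    return files, time.perf_counter() - start
```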

@TheOneRing
Contributor

We have users with insane folder structures, and the performance hit I'd expect would come from parsing an exclude file for, in theory, every single folder. (Yes, that is not very realistic.)
Every exclude file would extend its parent's excludes, etc.
The result would be a huge code change in the core sync code.

So my estimate is lots and lots of effort, lots and lots of QA, added complexity, potentially bad UX, and a tiny gain.

@maxfrei750

I see, thanks for elaborating. Is there a way to hook into the syncing process (e.g. to specify a script that is executed before the sync)? Then users could just write a script to automatically parse their .owncloudignore files and add them to sync-exclude.lst before each sync. I imagine that such a hook could also be useful for many other edge case applications.

@ashthespy

ashthespy commented Nov 30, 2021

My plan was to keep it simple: if a folder contains a .syncignore file, just ignore that whole folder. That should solve my particular issues, but also probably 90% of what people are seeking here! :-)

I'll go with the external script route adding fully defined paths to the global exclude list.
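
A minimal sketch of the tree-walking part of such a script, assuming an empty `.syncignore` marker file is enough to flag a folder. The function name is hypothetical. How the client interprets the resulting entries (for example, whether a leading or trailing slash is needed) is not covered here and should be checked before writing them to the exclude list.

```python
import os
from pathlib import Path


def marked_folders(sync_root: Path, marker: str = ".syncignore") -> list:
    """Return folders (relative to the sync root) that contain the marker file."""
    found = []
    for current, dirnames, filenames in os.walk(sync_root):
        if marker in filenames:
            relative = Path(current).relative_to(sync_root).as_posix()
            if relative != ".":      # never report the sync root itself
                found.append(relative)
            dirnames[:] = []         # the whole folder is ignored: do not descend
    return sorted(found)
```

Because the walk stops descending once a marker is found, nested markers inside an already ignored folder do not produce extra entries.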

@LecrisUT

LecrisUT commented Dec 5, 2022

In the meantime, Nextcloud has already implemented a .sync-exclude.lst local to each folder and fixed a bug in its parsing. I do not see any performance issues, as it is a simple if statement, and in general a local .sync-exclude.lst is much smaller and simpler than a global wildcard list. Please prioritize this issue a bit, even if it does not have a UI yet (Nextcloud hasn't implemented one yet either).
