use nbdime as a clever git filter #478
Comments
Hi! If you wanted to make a git filter based on nbdime, I think the simplest logic would be to:
This could possibly be added to nbdime as another CLI entry point, but it might be better to play around with the idea as a separate script first (simply importing the methods from nbdime). If you get something working, we can look at helping to get it integrated with nbdime via a PR.
Hi!
@qmarcou did you get anywhere with this? FYI,
Hi @kynan
I'm really interested in this feature.
Yes, that's the idea; sadly I never had the time to dig deeper into it...
Hi,
First of all, thank you so much for the work on nbdime, it really makes integrating Jupyter notebooks into a version control scheme much easier!
Still, I'm struggling to get some kind of "optimal" git tracking of my notebooks, one that prevents metadata and output from changing at every commit. I have checked (hopefully thoroughly) the different issues (e.g. #423 and #410) and the pieces of documentation related to this.
From what I gathered, here is what I got (please correct me if I'm wrong):
git add, even trying to exploit the latter's --patch option, in order to selectively add cell input/output/metadata.
Basically this only leaves 2 solutions: either track every single change in metadata and output, or never have them in the git history.
I think it would be good to have an intermediate option that allows tracking (chosen) metadata and output, and adding changes in metadata/output to commits only when desired. It would be quite helpful, when you have a notebook full of plots, some of them potentially slow to generate, to be able to keep a png of each inside the notebook (though I agree that if the plots take time to generate one should probably find a workaround by saving the processed data and/or the figure in a convenient format).
I was thinking along these lines, trying to find a solution, and I think I found a lead:
The idea would be to use nbdime as a smarter filter than nbstripout. Since nbdime is able to nicely compute diffs, one could exploit this ability to revert all changes in input/metadata/output to match the last commit (the idea would be to have something similar to git checkout myfile.ipynb that would only revert pieces of the file).
For example, if one only wants to commit changes in input cells, nbdime could compute a diff on everything but input cells (usually we would have used nbdime the other way around), and then revert all differences found in that diff to the last commit (this should be doable since the diff gives a line-by-line mapping, such that line-by-line substitutions/insertions/deletions can be performed). This would be executed as a special git input filter, for instance (people would be able to create git aliases for different git add strategies). I think this approach would be a good compromise for the problem described above.
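To make the idea concrete, here is a minimal sketch of the "commit only input changes" strategy, working directly on the notebook JSON with the standard library. It does not use nbdime at all: the function name keep_only_source_changes and the REVERTED_KEYS tuple are hypothetical, and cells are paired by their "id" field (present in nbformat 4.5 and later), which is a much cruder alignment than what nbdime's diff would provide.

```python
import copy

# Hypothetical helper, not part of nbdime. Assumption: cells carry an "id"
# field (nbformat >= 4.5); nbdime's diff would align cells more robustly.
REVERTED_KEYS = ("outputs", "execution_count", "metadata")


def keep_only_source_changes(committed, working):
    """Return a copy of `working` whose outputs and metadata match `committed`.

    Both arguments are notebooks already parsed from JSON into dicts.
    """
    committed_by_id = {
        cell["id"]: cell for cell in committed.get("cells", []) if "id" in cell
    }
    result = copy.deepcopy(working)

    # Notebook-level metadata is reverted to the committed state as well.
    if "metadata" in committed:
        result["metadata"] = copy.deepcopy(committed["metadata"])

    for cell in result.get("cells", []):
        old = committed_by_id.get(cell.get("id"))
        if old is None or old.get("cell_type") != cell.get("cell_type"):
            # New or retyped cell: nothing to revert to, keep it as it is.
            continue
        for key in REVERTED_KEYS:
            if key in old:
                cell[key] = copy.deepcopy(old[key])
            else:
                cell.pop(key, None)
    return result
```

In a git clean filter, the working copy would presumably arrive on standard input and the committed version would have to be fetched separately (for example with git show); that wiring is left out here. Cells that do not exist in the last commit keep their outputs in this sketch, so a stricter strategy might strip them instead.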
Maybe I'm missing some details that make this approach intractable, but given how nicely nbdime works I feel it could be implemented. What do you think?
Sorry for the very long message, I've been trying to make myself as clear as possible. Again, thanks for the good work!
Best