Forward args to _get_remote_config() and honour core/no_scm if present #10719
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@ Coverage Diff @@
## main #10719 +/- ##
==========================================
+ Coverage 90.68% 91.08% +0.39%
==========================================
Files 504 504
Lines 39795 39970 +175
Branches 3141 3158 +17
==========================================
+ Hits 36087 36405 +318
+ Misses 3042 2938 -104
+ Partials 666 627 -39
dvc/repo/open_repo.py (outdated)

    # It seems some tests might be passing a 'config' key that is not a dict
    if not isinstance(user_config, dict):
        user_config = {}

Looking into this, some tests pass `kwargs = {'config': None, ...}`; this safeguard protects against that.
dvc/repo/open_repo.py (outdated)

    if no_scm_flag is not None:
        # Honour specific SCM treatment if requested in the call
        repo = Repo(url, config={"core": {"no_scm": no_scm_flag}})

AFAIR, `Repo(config=...)` should just work. Suggested change:

    - repo = Repo(url, config={"core": {"no_scm": no_scm_flag}})
    + repo = Repo(url, config=kwargs.get("config"))

I don't want to specialize `core.no_scm` in any way or handle it.
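The difference between the two approaches in this thread can be shown without DVC at all. In the sketch below, `FakeRepo` is a hypothetical stand-in for `dvc.repo.Repo` that only records the config it receives, and `open_special_cased` / `open_forwarding` are illustrative names, not DVC functions. The `cache.type` option is just an arbitrary example of "some other setting".

```python
from typing import Any, Optional


class FakeRepo:
    """Hypothetical stand-in for dvc.repo.Repo: it only records its config."""

    def __init__(self, url: str, config: Optional[dict[str, Any]] = None):
        self.url = url
        self.config = config or {}


def open_special_cased(url: str, **kwargs: Any) -> FakeRepo:
    # Original PR approach: pick out core.no_scm and drop everything else.
    user_config = kwargs.get("config") or {}
    no_scm_flag = user_config.get("core", {}).get("no_scm")
    if no_scm_flag is not None:
        return FakeRepo(url, config={"core": {"no_scm": no_scm_flag}})
    return FakeRepo(url)


def open_forwarding(url: str, **kwargs: Any) -> FakeRepo:
    # Reviewer's suggestion: forward the caller's config unchanged.
    return FakeRepo(url, config=kwargs.get("config"))


user_config = {"core": {"no_scm": True}, "cache": {"type": "copy"}}
print(open_special_cased("repo", config=user_config).config)
# → {'core': {'no_scm': True}}
print(open_forwarding("repo", config=user_config).config)
# → {'core': {'no_scm': True}, 'cache': {'type': 'copy'}}
```

Forwarding avoids special-casing any single option, but it also passes along every other option, including ones the inner call was not written to expect.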
I agree that it doesn't feel ideal to handle `core.no_scm` itself. Your solution makes sense, and it works for my specific use case, but it triggers other errors in the DVC test suite, which makes me think that other, non-`core.no_scm` configuration options are being used that `_get_remote_config()` doesn't like. (I went with the `core.no_scm`-specific approach to highlight the need.)
These are the errors I get when using `repo = Repo(url, config=kwargs.get("config"))`:
FAILED tests/func/test_import.py::test_import_no_hash[files1-expected_info_calls1] - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_repo_index.py::test_data_index - dvc_data.index.index.DataIndexDirError: failed to load directory ('edir',)
FAILED tests/func/repro/test_repro_pull.py::test_repro_pulls_missing_import - dvc.exceptions.ReproductionError: failed to reproduce 'foo.dvc'
FAILED tests/func/test_data_cloud.py::test_pull_external_dvc_imports - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/test_data_cloud.py::test_pull_external_dvc_imports_mixed - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/repro/test_repro.py::test_repro_pulls_missing_import - dvc.exceptions.ReproductionError: failed to reproduce 'foo.dvc'
FAILED tests/func/test_import.py::test_import_dir - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_import_file_from_dir - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_import_file_from_dir_to_dir - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_import_rev - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/test_import.py::test_pull_imported_stage - dvc.exceptions.CheckoutError: Checkout failed for following targets:
FAILED tests/func/test_import.py::test_pull_import_no_download - FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/_3/h8jc4f6d5gg6dwk7t6464_z80000gp/T/pytest-of-rodrigo.goya/pytest-17/popen-gw13/test_pull_import_no_download0/.dvc/cache/fs/local/e3501e821bcee8f40107794afbe767d1/.F0VDC8H_fGFnvnEcrt...
FAILED tests/func/test_import.py::test_pull_import_no_download_rev_lock - dvc.exceptions.DownloadError: 1 files failed to download
FAILED tests/func/test_import.py::test_pull_imported_directory_stage[dir] - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_pull_imported_directory_stage[dir/] - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_pull_wildcard_imported_directory_stage - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir123',)
FAILED tests/func/test_update.py::test_update_import[True] - FileNotFoundError: [Errno 2] No storage files available: 'version'
FAILED tests/func/test_import.py::test_pull_non_workspace - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/test_update.py::test_update_import_after_remote_updates_to_dvc - FileNotFoundError: [Errno 2] No storage files available: 'version'
FAILED tests/func/test_import.py::test_import_with_jobs - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir1',)
The branch was updated from 91cf291 to f598ecd, and then from f598ecd to 652560a.
@skshetry, OK, I finally got some time to track down the errors. I took your suggestion of using something closer to […] I tracked the errors that were triggered by that call to the way that […] Since the function of […] I modified the code to […] to the above, and added some comments to the code for future reference. I also added a related test to the test suite. Please let me know what you think.

Note: FYI, I've been seeing some flakiness in the test suite from what seems to be a race condition in the logs, where a message of […]
Great research, @rgoya. I have taken a quick look at the PR and it looks good. Please give me some time (maybe this week) to look at this more closely.
Thank you for contributing! The way `RepoDependency` sets the cache looks fragile, and we should figure out a better solution in the future. Your current solution is practical and works well within the constraints.

I also appreciate the effort you've put into adding detailed comments and a very good test.

That said, I want to note that `core.no_scm` may not always work reliably in all scenarios. Configuration options like this are contextual, so they might not work in every case.

I'm happy to approve this. Thank you for your time and effort in improving the project! 🙂
Thank you, @skshetry. It's been quite illustrative peeking into the […]
@skshetry, incidentally, do you have a defined release schedule? (Just wondering when this change would make it into my conda environment.)
Hopefully, by Wednesday. If not, then I'll release by the end of next week (I am out until early next week).
Alright! I'll keep an eye out for it. Thanks!
This `_get_remote_config()` should use `uninitialized=True` so that it supports broader kinds of broken or partially initialized repositories, including some that are missing the `.dvc` directory or the `.git` directory. This partially reverts #10719. #10608 is also fixed, and no longer requires `core.no_scm` to be passed. This was already supported by `dvc.api.get_url`, as it uses `uninitialized=True`, but it was not respected in `_get_remote_config()`, which would fail before reaching that point. That said, this whole `open_repo`/`_get_remote_config` is terribly broken. For one, it opens a local repository and forces its remote config onto a repository opened with `Repo(rev=...)`, where the config may be different.
This is a proposed fix for #10608; the code here makes steps 9 and 10 described in the issue work.
Summary:
This change allows a user to access the dvc information in an environment that is disconnected from the original Git backend (e.g. in a deployed container, see #10608), by using something like:
Description:
Mainly, a call to `dvc/repo/open_repo.py:open_repo(url, *args, **kwargs)` may contain a parameter `config` in `**kwargs`. With this `config`, a user might indicate they do not want to access the repo with Git support, by using `config={"core": {"no_scm": True}}`.

During the execution of `dvc/repo/open_repo.py:open_repo()`, there is a call to a function `dvc/repo/open_repo.py:_get_remote_config()` that returns the remote configuration (`{"core": {"remote"}}`). This is then merged into the user-provided `config` parameter before calling `Repo(url, *args, **kwargs)`.

`dvc/repo/open_repo.py:_get_remote_config()`, in turn, does a quick `Repo()` call to get the remote configuration. However, it does not use any of the parameters requested via `dvc/repo/open_repo.py:open_repo()` and thus relies entirely on the contents of `.dvc/config`. This means that even if the user requested no SCM support, it will try to look for a Git repo if `.dvc/config` says so, and fail if it does not find it.

This PR modifies `dvc/repo/open_repo.py:_get_remote_config()` to receive `*args, **kwargs` and honour the request to use or ignore Git support when accessing the DVC repo.

❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
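The merge step described in the PR description (remote configuration from `_get_remote_config()` combined with the user-provided `config`) can be illustrated with a small, self-contained sketch. `merge_config` is a hypothetical helper and the values are made up; the sketch assumes the user's values take precedence and that nested sections such as `core` are merged key by key. DVC's actual merging logic in `open_repo()` may differ.

```python
from copy import deepcopy
from typing import Any


def merge_config(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge `override` into a copy of `base`; `override` wins."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides have a section here: merge it key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical values: a remote section as _get_remote_config() might report it,
# and a user asking for Git-less access.
remote_config = {"core": {"remote": "storage"}}
user_config = {"core": {"no_scm": True}}

print(merge_config(remote_config, user_config))
# → {'core': {'remote': 'storage', 'no_scm': True}}

# For contrast, a shallow merge replaces the whole "core" section.
shallow = {**remote_config, **user_config}
print(shallow)
# → {'core': {'no_scm': True}}
```

Note that even a correct merge does not help if the inner `Repo()` call inside `_get_remote_config()` never sees the user's `config`, which is the gap this PR addresses by forwarding `*args, **kwargs`.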
Thank you for the contribution - we'll try to review it as soon as possible. 🙏