Set default filename for RemoteData if no filenames passed
#109
base: fix/105/handle-remote-data-argument-placeholders
…'t have `filenames` passed
|
The problem here is not really one with the implementation in Just like |
|
Hi @sphuber, thanks for chiming in! Indeed, I do agree that the root cause is the way We considered both options that are currently available when registering the parent directory of our file of interest, however, we were not happy with either:
We also thought about extending |
|
Just another update here: What we are trying to achieve actually does work when we do specify the appropriate Though, if one doesn't want to force users to provide the |
|
Thanks for the updates @GeigerJ2 . Could you please sketch the exact example that you are trying to run? That would make it easier to understand what the real problem is and if there really is a problem in |
|
Hi @sphuber, thanks for your feedback, and sorry for the delay in my response here. Based on the

    def test_nodes_remote_files_filename(generate_calc_job, generate_code, tmp_path, aiida_localhost):
        """Test the ``nodes`` and ``filenames`` inputs with ``RemoteData`` nodes."""
        remote_path_a = tmp_path / 'remote_a' / 'file_a.txt'
        remote_path_b = tmp_path / 'remote_b' / 'file_b.txt'
        remote_path_a.parent.mkdir()
        remote_path_b.parent.mkdir()
        remote_path_a.write_text('content a')
        remote_path_b.write_text('content b')
        remote_data_a = RemoteData(remote_path=str(remote_path_a.absolute()), computer=aiida_localhost)
        remote_data_b = RemoteData(remote_path=str(remote_path_b.absolute()), computer=aiida_localhost)
        inputs = {
            'code': generate_code(),
            'arguments': ['{remote_a}'],
            'nodes': {
                'remote_a': remote_data_a,
                'remote_b': remote_data_b,
            },
            'filenames': {'remote_a': 'target_remote'},
        }
        dirpath, calc_info = generate_calc_job('core.shell', inputs)
        code_info = calc_info.codes_info[0]
        assert code_info.cmdline_params == ['target_remote']
        assert calc_info.remote_symlink_list == []
        assert sorted(calc_info.remote_copy_list) == [
            (aiida_localhost.uuid, str(remote_path_a), 'target_remote'),
            (aiida_localhost.uuid, str(remote_path_b / '*'), '.'),
        ]

Here, the last

    [('aa2a9b45-00b4-4018-933b-ae9ca1f8cede',
      '/tmp/pytest-of-geiger_j/pytest-28/test_nodes_remote_data_files_f0/remote_a/file_a.txt',
      'target_remote'),
     ('aa2a9b45-00b4-4018-933b-ae9ca1f8cede',
      '/tmp/pytest-of-geiger_j/pytest-28/test_nodes_remote_data_files_f0/remote_b/file_b.txt/*',
      '.')]

The first instruction in the Now, if one argues that the I extended the

    def test_nodes_remote_data_filename(generate_calc_job, generate_code, tmp_path, aiida_localhost):
        """Test the ``nodes`` and ``filenames`` inputs with ``RemoteData`` nodes."""
        remote_path_a = tmp_path / 'remote_a'
        remote_path_b = tmp_path / 'remote_b'
        remote_path_a.mkdir()
        remote_path_b.mkdir()
        (remote_path_a / 'file_a.txt').write_text('content a')
        (remote_path_b / 'file_b.txt').write_text('content b')
        remote_path_c = tmp_path / 'remote_c' / 'file_c.txt'
        remote_path_d = tmp_path / 'remote_d' / 'file_d.txt'
        remote_path_c.parent.mkdir()
        remote_path_d.parent.mkdir()
        remote_path_c.write_text('content c')
        remote_path_d.write_text('content d')
        remote_data_a = RemoteData(remote_path=str(remote_path_a.absolute()), computer=aiida_localhost)
        remote_data_b = RemoteData(remote_path=str(remote_path_b.absolute()), computer=aiida_localhost)
        remote_data_c = RemoteData(remote_path=str(remote_path_c.absolute()), computer=aiida_localhost)
        remote_data_d = RemoteData(remote_path=str(remote_path_d.absolute()), computer=aiida_localhost)
        inputs = {
            'code': generate_code(),
            'arguments': ['{remote_a}'],
            'nodes': {
                'remote_a': remote_data_a,
                'remote_b': remote_data_b,
                'remote_c': remote_data_c,
                'remote_d': remote_data_d,
            },
            'filenames': {
                'remote_a': 'target_remote',
                'remote_c': 'target_remote_file',
            },
        }
        dirpath, calc_info = generate_calc_job('core.shell', inputs)
        code_info = calc_info.codes_info[0]
        assert code_info.cmdline_params == ['target_remote']
        assert calc_info.remote_symlink_list == []
        assert sorted(calc_info.remote_copy_list) == [
            (aiida_localhost.uuid, str(remote_path_a), 'target_remote'),
            (aiida_localhost.uuid, str(remote_path_b / '*'), '.'),
            (aiida_localhost.uuid, str(remote_path_c), 'target_remote_file'),
            (aiida_localhost.uuid, str(remote_path_d), 'file_d.txt'),
        ]

At this point, I'd say it's up to you if it's a use case you would like to include, or you deem it irrelevant because |
|
Thanks for the detailed example. The problem indeed seems to boil down to the fact that This assumption breaks down if That is why I really think this should first be fixed on the side of Finally, instead of this workaround here, is there no way in your use case to just use the directory of the target file as the path of your |
Working off of #108, and pair-coded with @agoscinski 🫶
Given that one intends to pass a `RemoteData` object that references a file (not a directory) on a remote machine, e.g., in the following minimal working example:

This would fail silently, with the reason for it being the instructions that are generated here:
aiida-shell/src/aiida_shell/calculations/shell.py
Lines 348 to 352 in f2acef2
As one can see, if `filenames` is not given, the `remote_path` of the `RemoteData` node is appended with `/*`, leading to an `instruction` such as:

hence, not linking/copying anything.
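To illustrate why a trailing `/*` matches nothing when the path is a file, here is a small local sketch using Python's `glob`. It only mimics wildcard expansion on the local filesystem, on the assumption that the remote wildcard behaves similarly. It is not the transport code used by `aiida-shell`, and `wildcard_matches` is a hypothetical helper introduced for this illustration.

```python
import glob
import tempfile
from pathlib import Path


def wildcard_matches(remote_path: Path) -> list[str]:
    """Return what ``<remote_path>/*`` expands to on the local filesystem."""
    return sorted(glob.glob(str(remote_path / '*')))


with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp) / 'remote_a'
    directory.mkdir()
    file_path = directory / 'file_a.txt'
    file_path.write_text('content a')

    # The path is a directory: the wildcard picks up its contents.
    matches_for_directory = wildcard_matches(directory)
    # The path is a file: ``file_a.txt/*`` matches nothing, so nothing would be copied.
    matches_for_file = wildcard_matches(file_path)

print(matches_for_directory)
print(matches_for_file)
```

The second list is empty, which is consistent with the silent failure described above: the instruction is well-formed, but its source pattern selects no files.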
With this PR, if the `RemoteData`'s path points to a file, and a corresponding target `filename` in the working directory is not given, the actual filename is set as the default in `handle_remote_data_nodes`, as well as `process_arguments_and_nodes` (to resolve the `arguments` placeholder in the final CLI command to the actual filename).

While debugging, we were also struggling with some errors during symlinking originating from SFTP in paramiko called via the `SshTransport`, but I currently fail to reproduce those in a clean environment with an SSH-configured `Computer`.
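The default-filename behaviour proposed in this PR can be sketched roughly as follows. This is a simplified, standalone illustration and not the actual code of `handle_remote_data_nodes` or `process_arguments_and_nodes`: the helpers `copy_instruction` and `resolve_arguments` and the `points_to_file` flag are hypothetical names, and how one determines that a remote path is a file is deliberately left out.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def copy_instruction(
    computer_uuid: str,
    remote_path: str,
    filename: str | None = None,
    *,
    points_to_file: bool = False,  # hypothetical flag, see lead-in
) -> tuple[str, str, str]:
    """Build a ``(computer_uuid, source, target)`` copy instruction."""
    path = PurePosixPath(remote_path)
    if filename is not None:
        # An explicit target name always wins.
        return (computer_uuid, str(path), filename)
    if points_to_file:
        # Proposed default: reuse the actual filename as the target.
        return (computer_uuid, str(path), path.name)
    # Directory without a target name: copy its contents into the working directory.
    return (computer_uuid, str(path / '*'), '.')


def resolve_arguments(arguments: list[str], filenames: dict[str, str]) -> list[str]:
    """Replace ``{key}`` placeholders in the arguments by the resolved filenames."""
    return [argument.format(**filenames) for argument in arguments]


instruction = copy_instruction('uuid', '/remote_d/file_d.txt', points_to_file=True)
print(instruction)
print(resolve_arguments(['{remote_d}'], {'remote_d': instruction[2]}))
```

The three branches correspond to the tuples asserted in the extended test in the conversation: an explicit target name, a directory copied via `/*` into `.`, and a file whose own name becomes the target.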