A sibling command to rule them all #685

Open
mih opened this issue May 15, 2024 · 3 comments

Comments

@mih
Member

mih commented May 15, 2024

This extends the ideas from #684

With sibling operations factored out into (standalone) implementations that are driven through a standard protocol, we are in a position to have a single sibling command for any and all sibling types (same concept as initremote and enableremote for git-annex).

It would unify the various create-sibling-... implementations with the common operations provided by the age-old siblings command, and also add new ones like:

Whether this is a single command (sibling) with some subcommands, or a set of commands (like git-annex has them), is a matter of taste. The important part is that a large amount of boilerplate code from all the individual, non-standardized implementations goes away.

We would need:

  • the sibling operations protocol
  • a generic sibling command (set)
  • a range of sibling handler implementations

The implementation and a migration to this new approach would be easy. Support for individual sibling types could be added one-by-one. In most cases, it should be easy to preserve a substitute for each create-sibling- command that simply maps its API onto the new sibling create call signature.

It might be worth considering the API of the new command(s) to be inspired by (and a superset of) git annex (init|enable)remote. It may be a sensible way to achieve a uniform configurability of any kind of sibling (git, git-annex, something-datalad).

@christian-monch
Contributor

christian-monch commented Jul 15, 2024

General description

This outlines the ideas for a protocol that supports communication between datalad and "sibling handler implementations". It is modelled after the git-annex protocol (https://git-annex.branchable.com/design/external_special_remote_protocol/) for communication between git-annex and external special remote implementations.

The concept divides the work into generic tasks that are implemented in a sibling-agnostic way in datalad, and sibling-specific tasks that are implemented in the handlers.

Generic Tasks (implemented in datalad's sibling command)

  • check for existing names
  • query siblings
  • remove siblings (this might have to be sibling-specific?)
  • manage publish dependencies

Sibling-specific tasks (implemented in handlers)

  • create siblings
  • delete siblings
  • enable siblings (added because datalad code contains specific handling of WEBDAV)
  • configure siblings

Protocol example

The following would be a typical communication between datalad and a sibling-handler for ORA remotes during creation of an ORA-remote:

DATALAD -- RIA/ORA handler

<-- VERSION 1
--> CREATESIBLING ria-store-1
<-- GETGITDIR
--> VALUE /home/datalad/test-1/.git
<-- GETPARAMETER url
--> VALUE ria+ssh://localhost/tmp/ria-test-store-1
<-- GETPARAMETER storage-only
--> VALUE no
<-- GETPARAMETER new-store-ok
--> VALUE yes

[... handler might ask for more parameters here] 

<-- SIBLINGS
--> VALUE
-->
<-- CREATESIBLING-SUCCESS ria-store-1 ria-store-1-storage
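The exchange above can be replayed with a small handler-side sketch. This is only an illustration under assumptions: the function names (`ask`, `ask_list`, `run_create`) are made up here, the handler is assumed to talk over line-based text streams as git-annex external special remotes do, and the actual store creation is left out.

```python
import io


def ask(inp, out, request):
    """Send one request line and return the payload of a single-line VALUE reply."""
    out.write(request + "\n")
    reply = inp.readline().rstrip("\n")
    if reply != "VALUE" and not reply.startswith("VALUE "):
        raise ValueError(f"unexpected reply: {reply!r}")
    return reply[len("VALUE "):]


def ask_list(inp, out, request):
    """Send a request and read a list reply: a bare VALUE line, items, an empty line."""
    out.write(request + "\n")
    if inp.readline().rstrip("\n") != "VALUE":
        raise ValueError("list reply must start with a bare VALUE line")
    items = []
    while line := inp.readline().rstrip("\n"):
        items.append(line)
    return items


def run_create(inp, out):
    """Handler side of the CREATESIBLING conversation shown in the example."""
    out.write("VERSION 1\n")
    command, _, name = inp.readline().rstrip("\n").partition(" ")
    if command != "CREATESIBLING":
        out.write(f"ERROR unexpected command {command}\n")
        return
    # Collect what the handler needs; the parameter names are taken from the example.
    settings = {"gitdir": ask(inp, out, "GETGITDIR")}
    for parameter in ("url", "storage-only", "new-store-ok"):
        settings[parameter] = ask(inp, out, f"GETPARAMETER {parameter}")
    if name in ask_list(inp, out, "SIBLINGS"):
        out.write(f"CREATESIBLING-FAILURE sibling {name} already exists\n")
        return
    # A real handler would create the store at settings["url"] here (omitted).
    out.write(f"CREATESIBLING-SUCCESS {name} {name}-storage\n")


# Scripted datalad side: the lines datalad sends in the example, in order.
datalad_side = io.StringIO(
    "CREATESIBLING ria-store-1\n"
    "VALUE /home/datalad/test-1/.git\n"
    "VALUE ria+ssh://localhost/tmp/ria-test-store-1\n"
    "VALUE no\n"
    "VALUE yes\n"
    "VALUE\n"
    "\n"
)
handler_side = io.StringIO()
run_create(datalad_side, handler_side)
print(handler_side.getvalue(), end="")
```

The printed lines are the ones marked `<--` in the example; the scripted input holds the ones marked `-->`.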

Protocol definition:

Each sibling handler has to support the following commands

  • CREATESIBLING name
    requests the creation of a sibling with the given name. If creation was successful, the handler answers with:

    • CREATESIBLING-SUCCESS name name*
      where name are the names of the siblings (remotes) that were created.
      If the sibling(s) could not be created, the handler answers with:
    • CREATESIBLING-FAILURE error-message
      The handler may also send ERROR error-message at any time.
  • DELETESIBLING name
    requests the deletion of a sibling with the given name. If deletion was successful, the handler answers with:

    • DELETESIBLING-SUCCESS
      If the sibling could not be deleted, the handler answers with:
    • DELETESIBLING-FAILURE error-message
      The handler may also send ERROR error-message at any time.
  • ENABLESIBLING name
    requests enabling of a sibling with the given name. The handler would usually use GETCONFIGLIST name to read all configurations for the remote. If the operation was successful, the handler answers with:

    • ENABLESIBLING-SUCCESS
      If the sibling could not be enabled, the handler answers with:
    • ENABLESIBLING-FAILURE error-message
      The handler may also send ERROR error-message at any time.
  • CONFIGURESIBLING name
    requests configuration of a sibling with the given name. The handler would usually use GETCONFIGLIST name to read all configurations for the remote. If the operation was successful, the handler answers with:

    • CONFIGURESIBLING-SUCCESS
      If the sibling could not be configured, the handler answers with:
    • CONFIGURESIBLING-FAILURE error-message
      The handler may also send ERROR error-message at any time.
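The four handler commands share one reply pattern, which could be sketched as a small dispatch function. The names `handle_request` and `operations` are hypothetical, and answering an unknown command with ERROR is an assumption, since the proposal does not say what should happen in that case.

```python
def handle_request(line, operations):
    """Run one request line from datalad and return the handler's reply line.

    `operations` maps a command name (e.g. "DELETESIBLING") to a callable
    that takes the sibling name and raises an exception on failure.
    """
    command, _, name = line.strip().partition(" ")
    operation = operations.get(command)
    if operation is None:
        # Assumption: the proposal does not define a reply for unknown commands.
        return f"ERROR unsupported command {command}"
    try:
        result = operation(name)
    except Exception as exc:
        # Replies are single lines, so newlines in the message are flattened.
        message = str(exc).replace("\n", " ")
        return f"{command}-FAILURE {message}"
    if command == "CREATESIBLING":
        # CREATESIBLING-SUCCESS carries the names of all created siblings.
        return " ".join([f"{command}-SUCCESS", *result])
    return f"{command}-SUCCESS"


def create(name):
    """Stand-in for real creation: report a sibling and its storage sibling."""
    return [name, f"{name}-storage"]


def delete(name):
    """Stand-in for a handler that cannot delete siblings."""
    raise RuntimeError("store is read-only")


operations = {"CREATESIBLING": create, "DELETESIBLING": delete}
print(handle_request("CREATESIBLING ria-store-1", operations))
print(handle_request("DELETESIBLING ria-store-1", operations))
```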

datalad should support the following commands:

  • GETGITDIR
    reply with VALUE and the directory of the git-repository to which the sibling belongs.

  • GETCONFIG name config-key
    reply with VALUE and the content of the configuration key config-key (encoded to escape newlines) of the sibling with the name name.

  • GETCONFIGLIST name
    reply with a list of all configuration keys and values of the sibling with the name name. A list consists of a line VALUE and arbitrarily many non-empty lines with list-values, followed by an empty line.

  • SETCONFIG name config-key value
    set the content of the configuration key config-key of the sibling with the name name to the value value (value should be encoded to escape newlines).

  • GETPARAMETER name
    reply with the value of the parameter name, i.e. of key-value parameters that were given to datalad siblings .... If a parameter is not set, an empty value will be returned.

  • PROGRESS int int?
    show progress to the user. The first integer is the number of elements that are processed. The optional second parameter is the total number of elements that should be processed. Can be sent during the execution of CREATESIBLING, DELETESIBLING, ENABLESIBLING, or CONFIGURESIBLING.

  • ERROR error-message
    show an error message to the user. Can be sent any time.

  • DEBUG debug-message
    show a debug message to the user, if debug is enabled in datalad. Can be sent any time.

  • INFO info-message
    show an info message to the user. Can be sent any time.

  • SIBLINGS
    reply with a list of names of all currently existing siblings.
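A datalad-side responder for these requests might look roughly like the sketch below. Several details are assumptions that the proposal leaves open: the `state` dictionary layout, sending a bare VALUE line for an empty value, formatting GETCONFIGLIST items as "key value", and sending no reply to SETCONFIG and to the message commands.

```python
def value_line(value):
    """Format a single-line reply; an empty value becomes a bare VALUE line."""
    return f"VALUE {value}" if value else "VALUE"


def answer(request, state):
    """Return the reply lines for one handler request (possibly none)."""
    command, _, rest = request.partition(" ")
    if command == "GETGITDIR":
        return [value_line(state["gitdir"])]
    if command == "GETPARAMETER":
        return [value_line(state["parameters"].get(rest, ""))]
    if command == "GETCONFIG":
        name, key = rest.split(" ", 1)
        return [value_line(state["config"].get(name, {}).get(key, ""))]
    if command == "GETCONFIGLIST":
        items = state["config"].get(rest, {})
        # List reply: VALUE, one line per item, then an empty line.
        return ["VALUE", *(f"{key} {value}" for key, value in items.items()), ""]
    if command == "SETCONFIG":
        name, key, value = rest.split(" ", 2)
        state["config"].setdefault(name, {})[key] = value
        return []
    if command == "SIBLINGS":
        return ["VALUE", *state["config"], ""]
    if command in ("ERROR", "DEBUG", "INFO"):
        # Messages are only recorded here; datalad would show them to the user.
        state["messages"].append((command, rest))
        return []
    if command == "PROGRESS":
        done, _, total = rest.partition(" ")
        state["progress"] = (int(done), int(total) if total else None)
        return []
    raise ValueError(f"unknown request: {request!r}")


state = {
    "gitdir": "/home/datalad/test-1/.git",
    "parameters": {"url": "ria+ssh://localhost/tmp/ria-test-store-1"},
    "config": {"ria-store-1": {"url": "ria+ssh://localhost/tmp/ria-test-store-1"}},
    "messages": [],
}
print(answer("GETPARAMETER url", state))
print(answer("SIBLINGS", state))
```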

Open questions

  • Should data be exchanged via JSON strings, e.g. CREATESIBLING ria-store-1 {"name": "ria-store-1"}?
  • Should the protocol provide special support for credential handling, or would it be enough for a handler to read generic configuration values?
  • Is GETCONFIGLIST necessary, or should the handlers read sibling configuration information from the git-repository?
  • Is SIBLINGS necessary, or should the handlers read sibling names from the git-repository?
  • Do we need/want external programs, or could we also support a Python interface (analogous to _GitHubLike in datalad/distributed/create_sibling_ghlike.py) that would allow Python plugins as sibling handlers? There could be a special "external" sibling handler that translates between the Python interface and the protocol described above.

@mih
Member Author

mih commented Jul 16, 2024

In addition to the open questions, I think the protocol should support capability reporting. For example, a sibling handler for something read-only could not delete a sibling. Rather than trying and failing, it should be able to say that this is not supported (or rather say what operations are supported). This would also streamline future extensions, for example a conversion ability, where a handler for a future sibling type can convert an existing ria sibling setup to its own type.
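To make the idea concrete, here is a rough sketch of how the engine could use such a report. Nothing in it is part of the proposal: the `CAPABILITIES` reply keyword and both function names are hypothetical and only show the "ask first, then decide" flow.

```python
def parse_capabilities(reply):
    """Parse a hypothetical 'CAPABILITIES <operation> ...' reply into a set."""
    keyword, *operations = reply.split()
    if keyword != "CAPABILITIES":
        raise ValueError(f"unexpected reply: {reply!r}")
    return frozenset(operations)


def refusal_message(operation, capabilities):
    """Return None if the handler supports the operation, else a user-facing message."""
    if operation in capabilities:
        return None
    supported = ", ".join(sorted(capabilities)) or "nothing"
    return f"{operation} is not supported by this handler (supported: {supported})"


# A read-only handler would simply not list DELETESIBLING.
capabilities = parse_capabilities("CAPABILITIES ENABLESIBLING CONFIGURESIBLING")
print(refusal_message("DELETESIBLING", capabilities))
```

With this shape, a future operation such as a conversion would only need a new capability name, and older handlers would keep working because they never report it.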

Re data encoding: I have a preference for keeping it very simple. The git-annex protocol has worked very well with this simplicity. I do not see a need to go beyond that, personally.

Re query approach: I think the engine should be able to provide all essential bits of information. If that is not sufficient for a particular handler, the git repo is there to help out, but that blows up implementation complexity and is a problem for future-proofing implementations.

Re credentials: tricky one. It would be absolutely instrumental if this feature were available. I would not be able to come up with an approach myself where I would have the confidence to say "this will work".

Re handler implementations as external programs: yes we want to support external programs. Having a(nother) handler that can take a Python class and do the right thing is no contradiction from my POV.

@christian-monch
Contributor

christian-monch commented Jul 16, 2024

Thank you for the comments. Below are notes from a conversation about the individual points that were raised:

In addition to the open questions, I think the protocol should support capability reporting. For example, a sibling handler for something read-only could not delete a sibling. Rather than trying and failing, it should be able to say that this is not supported (or rather say what operations are supported). This would also streamline future extensions, for example a conversion ability, where a handler for a future sibling type can convert an existing ria sibling setup to its own type.

Good idea, I will add commands for capability reporting.

Re data encoding: I have a preference for keeping it very simple. The git-annex protocol has worked very well with this simplicity. I do not see a need to go beyond that, personally.

Let's keep it to simple strings without newlines then.
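Values that may contain newlines (e.g. configuration values) then need some escaping before they go onto a single protocol line. The scheme below is an assumption for illustration only, since no encoding has been agreed on here: a plain backslash escape for newline, carriage return, and the backslash itself, with made-up function names.

```python
def encode_value(value):
    """Escape backslashes and line breaks so the value fits on one line."""
    return (
        value.replace("\\", "\\\\")  # must come first to keep the mapping reversible
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def decode_value(encoded):
    """Reverse encode_value()."""
    decoded = []
    chars = iter(encoded)
    for char in chars:
        if char != "\\":
            decoded.append(char)
            continue
        # A backslash starts an escape; a dangling one is kept as a backslash.
        escaped = next(chars, "\\")
        decoded.append({"n": "\n", "r": "\r"}.get(escaped, escaped))
    return "".join(decoded)


original = "first line\nsecond line with a \\ backslash"
encoded = encode_value(original)
print(encoded)
print(decode_value(encoded) == original)
```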

Re query approach: I think the engine should be able to provide all essential bits of information. If that is not sufficient for a particular handler, the git repo is there to help out, but that blows up implementation complexity and is a problem for future-proofing implementations.

Ok, query capabilities will be provided by the engine.

Re credentials: tricky one. It would be absolutely instrumental if this feature were available. I would not be able to come up with an approach myself where I would have the confidence to say "this will work".

We keep this open for now.

Re handler implementations as external programs: yes we want to support external programs. Having a(nother) handler that can take a Python class and do the right thing is no contradiction from my POV.

The priority will be on communication with external programs. Python-based handlers can be implemented based on an adapted annexremote package.
