add rsync flag option to copy files using rsync #3143

Open
olamilekan000 wants to merge 1 commit into master from add-rsycn-as-an-option-for-copying-files-btw-host-and-guest

olamilekan000
Contributor

change adds rsync as an option to the copy command

issue #2198

@olamilekan000 olamilekan000 force-pushed the add-rsycn-as-an-option-for-copying-files-btw-host-and-guest branch 6 times, most recently from a508328 to 4b57bc6 Compare January 23, 2025 17:09
Member

@nirs nirs left a comment

Do we have rsync in the guest in all cases? Minimal vm images do not have rsync.

Also adding options is bad both for users and for testing. If it is better to copy with rsync, we should detect if rsync is available in the guest and host, and use it without adding new option. Otherwise fall back to scp.

If we always have rsync, better to use it instead of scp and minimize the testing matrix.
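
A minimal sketch of the detect-and-fall-back idea described above, in Go. The names `guestProbe`, `hostHas`, and `chooseCopyTool` are hypothetical and not part of limactl; the guest probe is injected as a function so the selection logic can be tested without a running VM.

```go
package main

import (
	"fmt"
	"os/exec"
)

// guestProbe reports whether a tool is available inside the guest.
// It is a hypothetical hook; a real probe would run a command over ssh.
type guestProbe func(tool string) bool

// hostHas reports whether a tool can be found on the host PATH.
func hostHas(tool string) bool {
	_, err := exec.LookPath(tool)
	return err == nil
}

// chooseCopyTool returns "rsync" only when both host and guest have it,
// and falls back to "scp" otherwise.
func chooseCopyTool(hostHasRsync bool, guestHas guestProbe) string {
	if hostHasRsync && guestHas("rsync") {
		return "rsync"
	}
	return "scp"
}

func main() {
	// Pretend the guest is a minimal image without rsync.
	minimalGuest := func(string) bool { return false }
	fmt.Println(chooseCopyTool(hostHas("rsync"), minimalGuest))
}
```

With this shape no new user-facing option is needed, and tests only have to cover the two outcomes of `chooseCopyTool`.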

@@ -34,6 +34,7 @@ func newCopyCommand() *cobra.Command {

 copyCommand.Flags().BoolP("recursive", "r", false, "copy directories recursively")
 copyCommand.Flags().BoolP("verbose", "v", false, "enable verbose output")
+copyCommand.Flags().BoolP("rsync", "", false, "use rsync for copying instead of scp")
Member

This is not flexible enough. What if we want to add another tool later? If we add an option, a "tool" or "backend" option with possible values "scp" and "rsync" would be better.
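
A rough sketch of what such a restricted option could look like in Go. The type `copyBackend` and the flag name `backend` are hypothetical. The demo uses the standard `flag` package to stay self-contained; limactl uses cobra, whose pflag `Value` interface additionally requires the `Type` method shown here.

```go
package main

import (
	"flag"
	"fmt"
)

// copyBackend is a hypothetical flag value restricted to known copy tools.
type copyBackend string

// String returns the current value of the flag.
func (b *copyBackend) String() string { return string(*b) }

// Set accepts only the known backends and leaves the value unchanged on error.
func (b *copyBackend) Set(v string) error {
	switch v {
	case "scp", "rsync":
		*b = copyBackend(v)
		return nil
	default:
		return fmt.Errorf("unknown backend %q (want \"scp\" or \"rsync\")", v)
	}
}

// Type is not used by the standard flag package; pflag's Value interface requires it.
func (b *copyBackend) Type() string { return "backend" }

func main() {
	backend := copyBackend("rsync") // assumed default, as discussed in this thread
	fs := flag.NewFlagSet("copy", flag.ContinueOnError)
	fs.Var(&backend, "backend", "copy backend: scp or rsync")
	if err := fs.Parse([]string{"--backend", "scp"}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("backend:", backend)
}
```

Adding a third tool later would only mean extending the `switch` in `Set`.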

Contributor Author

the tool option will default to rsync right?

Member

If we add an option, rsync sounds like a better default. But this is trivial to change if needed.

Member

Let's just remove the flag and only support rsync, unless there is a use case where scp is preferable over rsync

Contributor Author

rsync isn't available by default on Alpine, Arch Linux, and Debian (see here for pipeline logs). A potential fix for this is to ssh into the guest vm and check if rsync is installed. @jandubois suggested running a command like this within the guest VM:
rsync --rsh="ssh" remote_user@remote_host rsync --version .

However, implementing this approach introduces a limitation: rsync won’t work in scenarios where the instance name is unavailable, as the instance name is needed to fetch the hostname using store.Inspect(instName).

For example, this command would fail since it doesn’t provide the instance name:

limactl copy -r /etc/config /tmp

On the other hand, this would work because it includes the instance name:

limactl copy -r default:/etc/config /tmp

@nirs @AkihiroSuda @jandubois @afbjorklund I’d like to hear your thoughts on this approach and any potential alternatives. Would love to know if there's a better way to handle this while ensuring rsync works reliably.

Member

rsync isn't available by default on Alpine, Arch Linux, and Debian (see here for pipeline logs).

So to support these distros with required rsync, the templates will have to install rsync. Can you check what is the cost in time and space?

A potential fix for this

This is not a fix, but the way to deal with missing rsync.

is to ssh into the guest vm and check if rsync is installed. @jandubois suggested running a command like this within the guest VM: rsync --rsh="ssh" remote_user@remote_host rsync --version .

We can also use ssh user@vm-ip command -v rsync. We already use ssh to perform the initial connection, so it should be easier to use it to detect guest capabilities.
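
As a sketch only, the `command -v rsync` probe could be wrapped in Go like this. The function names, the `-F` ssh config argument, and the host alias are assumptions for illustration and are not taken from limactl. Note that any ssh failure (for example, an unreachable guest) is also reported as "not installed" by this simple version.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// guestProbeArgs builds the ssh argument list for probing the guest.
// sshConfig and host are placeholders supplied by the caller.
func guestProbeArgs(sshConfig, host, tool string) []string {
	return []string{"-F", sshConfig, host, "command", "-v", tool}
}

// guestHasTool runs the probe over ssh. A non-zero exit status, including
// a failed connection, is treated as "tool not available".
func guestHasTool(sshConfig, host, tool string) bool {
	cmd := exec.Command("ssh", guestProbeArgs(sshConfig, host, tool)...)
	return cmd.Run() == nil
}

func main() {
	// Print the probe command instead of running it, since no guest is assumed here.
	args := guestProbeArgs("/path/to/ssh.config", "lima-default", "rsync")
	fmt.Println("ssh", strings.Join(args, " "))
}
```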

However, implementing this approach introduces a limitation: rsync won’t work in scenarios where the instance name is unavailable, as the instance name is needed to fetch the hostname using store.Inspect(instName).

For example, this command would fail since it doesn’t provide the instance name:

limactl copy -r /etc/config /tmp

The online help instructs users to prefix the file name with the instance name. The command cannot work without it, regardless of the tool used to copy.

So the issue about missing instance name does not exist.

On the other hand, this would work because it includes the instance name:

limactl copy -r default:/etc/config /tmp

Contributor Author

We can also use ssh user@vm-ip command -v rsync. We already use ssh to perform the initial connection, so it should be easier to use it to detect guest capabilities.

@nirs correct me if I'm wrong, is this where you're referring to?

Member

I don't think this is the code checking for the initial ssh connection. If you run limactl --debug start ... you will see the ssh commands used to detect if the guest is accessible.

@nirs
Member

nirs commented Jan 23, 2025

Also, do we have tests for the code modified in this PR? If not, we should not change it before adding tests. The new rsync flow should also have tests.

@jandubois
Member

@olamilekan000 I've asked before to please test your changes locally instead of force-pushing an update every 10 minutes for 2 hours, triggering potentially costly CI runs all the time.

@olamilekan000
Contributor Author

@olamilekan000 I've asked before to please test your changes locally instead of force-pushing an update every 10 minutes for 2 hours, triggering potentially costly CI runs all the time.

I'd appreciate it if you could help with the steps to run this locally. I did try, but it didn't work as expected.

@AkihiroSuda
Member

If we always have rsync, better to use it instead of scp and minimize the testing matrix.

Yes, rsync is always available on macOS and most Linux distros.
At least it is easily installable.

@nirs
Member

nirs commented Jan 24, 2025

@olamilekan000 I've asked before to please test your changes locally instead of force-pushing an update every 10 minutes for 2 hours, triggering potentially costly CI runs all the time.

I'd appreciate it if you could help with the steps to run this locally. I did try, but it didn't work as expected.

@olamilekan000, can you describe how you tried to test and why it did not work well?

You can also join the #lima slack channel to get quicker help from other developers.

@nirs nirs requested a review from afbjorklund January 24, 2025 15:22
@olamilekan000 olamilekan000 force-pushed the add-rsycn-as-an-option-for-copying-files-btw-host-and-guest branch from 4b57bc6 to aa5cf22 Compare January 24, 2025 20:45
@olamilekan000
Contributor Author

olamilekan000 commented Jan 24, 2025

@olamilekan000 I've asked before to please test your changes locally instead of force-pushing an update every 10 minutes for 2 hours, triggering potentially costly CI runs all the time.

I'd appreciate it if you could help with the steps to run this locally. I did try, but it didn't work as expected.

@olamilekan000, can you describe how you tried to test and why it did not work well?

You can also join the #lima slack channel to get quicker help from other developers.

@nirs I have joined and shared my error logs on the channel.

@olamilekan000 olamilekan000 force-pushed the add-rsycn-as-an-option-for-copying-files-btw-host-and-guest branch from aa5cf22 to 4c95f10 Compare January 25, 2025 00:27
 sshArgs := sshutil.SSHArgsFromOpts(sshOpts)

-sshCmd := exec.Command(arg0, append(sshArgs, scpArgs...)...)
+sshCmd := exec.Command(arg0, createArgs(sshArgs, copyToolArgs, defaultTool)...)
 sshCmd.Stdin = cmd.InOrStdin()
 sshCmd.Stdout = cmd.OutOrStdout()
 sshCmd.Stderr = cmd.ErrOrStderr()
Member

We run rsync via ssh? It does not make sense.

Can you share the complete scp and rsync commands that you want to run?

The commands can be tested from the shell without lima, using the instance IP. limactl just makes this easier for the user by building the right command and running it transparently.

Contributor Author

@olamilekan000 olamilekan000 Jan 26, 2025

for scp

scp -F ~/.lima/deb/ssh.config "myfile.txt" olalekanodukoya@lima-deb:/dump
myfile.txt                                            100%    7     2.1KB/s   00:00

for rsync

rsync -a -e "ssh -i  ~/.lima/deb/ssh.config -p 50337" myfile.txt [email protected]:/dump/
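
For comparison, a sketch of how the two command lines might be built in Go from the same inputs. The helper names are hypothetical. The rsync variant here passes the ssh config with `-F` through rsync's `-e` option, which is an assumption of this sketch and differs from the `-i`/`-p` form quoted above. In both cases the copy tool itself is the program being executed, and ssh only appears as rsync's remote shell.

```go
package main

import (
	"fmt"
	"strings"
)

// scpArgs builds the scp argument list; sshConfig, src and dst are
// placeholders supplied by the caller.
func scpArgs(sshConfig, src, dst string) []string {
	return []string{"-F", sshConfig, src, dst}
}

// rsyncArgs builds the rsync argument list. The remote shell is passed
// through -e as a single string, so sshConfig must not contain spaces.
func rsyncArgs(sshConfig, src, dst string) []string {
	return []string{"-a", "-e", "ssh -F " + sshConfig, src, dst}
}

func main() {
	cfg := "/path/to/ssh.config" // hypothetical path
	fmt.Println("scp", strings.Join(scpArgs(cfg, "myfile.txt", "lima-deb:/dump"), " "))
	fmt.Println("rsync", strings.Join(rsyncArgs(cfg, "myfile.txt", "lima-deb:/dump/"), " "))
}
```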

@olamilekan000 olamilekan000 force-pushed the add-rsycn-as-an-option-for-copying-files-btw-host-and-guest branch 5 times, most recently from 73fcc08 to a9bade5 Compare January 26, 2025 15:22
path := strings.Split(arg, ":")
switch len(path) {
case 1:
inst, ok := instances[instName]
Member

How will this work? instances is an empty map created in this function.

I think we have only one case - instance:path, and it is common to both scp and rsync, so this check should be done first before we consider the copy tool. Input validation must always be done first.

}
sshStr = fmt.Sprintf("ssh -p %s -i %s", fmt.Sprintf("%d", inst.SSHLocalPort), "~/.lima/_config/user")
rsyncArgs = append(rsyncArgs, "-avz", "-e", sshStr, path[1])
instances[instName] = inst
Member

I see, you add the instance to the map so the next iteration will find it. This is a very complicated and hard-to-follow way to handle the arguments.

We should instead process the arguments first and convert them to an internal structure that will be used to construct the command. This should be common to scp and rsync, so it should be done before we consider the tool.
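
One possible shape for that tool-independent parsing step, as a sketch. The type `copyPath` and the functions `parseCopyArg` and `parseCopyArgs` are hypothetical, and the rule that at least one argument must name an instance is an assumption drawn from this thread rather than from the limactl source.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// copyPath is a hypothetical internal form of one copy argument.
// Instance is empty for a path on the host.
type copyPath struct {
	Instance string
	Path     string
}

// parseCopyArg splits "INSTANCE:PATH"; an argument without a colon is a host path.
func parseCopyArg(arg string) (copyPath, error) {
	inst, path, found := strings.Cut(arg, ":")
	if !found {
		return copyPath{Path: arg}, nil
	}
	if inst == "" || path == "" {
		return copyPath{}, fmt.Errorf("invalid argument %q: want INSTANCE:PATH", arg)
	}
	return copyPath{Instance: inst, Path: path}, nil
}

// parseCopyArgs validates all arguments before any copy tool is chosen.
func parseCopyArgs(args []string) ([]copyPath, error) {
	if len(args) < 2 {
		return nil, errors.New("need at least a source and a target")
	}
	paths := make([]copyPath, 0, len(args))
	hasGuest := false
	for _, arg := range args {
		p, err := parseCopyArg(arg)
		if err != nil {
			return nil, err
		}
		hasGuest = hasGuest || p.Instance != ""
		paths = append(paths, p)
	}
	if !hasGuest {
		return nil, errors.New("no instance specified: use INSTANCE:PATH for guest files")
	}
	return paths, nil
}

func main() {
	paths, err := parseCopyArgs([]string{"default:/etc/config", "/tmp"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range paths {
		fmt.Printf("instance=%q path=%q\n", p.Instance, p.Path)
	}
}
```

The scp and rsync command builders could then both consume `[]copyPath`, so the map-based lookup across loop iterations is no longer needed.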

Contributor Author

We should instead process the arguments first and convert them to an internal structure that will be used to construct the command. This should be common to scp and rsync, so it should be done before we consider the tool.

The argument processing for scp and rsync is quite different, so combining them into a single function could make it complex and harder to maintain. I think it's better to keep them separate and focus on consistent output structures instead.

@@ -424,6 +424,41 @@ func (a *HostAgent) Info(_ context.Context) (*hostagentapi.Info, error) {
return info, nil
}

func (a *HostAgent) installPackage() error {
Member

I would not install the package here. If we depend on rsync in the guest, we can install the package using cloud-init.

In our user-data (part of cidata.iso), we can add:

packages:
- rsync

This will install rsync using cloud-init when the instance is provisioned. For distros that do not support this, the provision part of the yaml can install rsync manually.

sudo pacman -S --noconfirm rsync
else
echo "Unsupported Linux distribution. Please install rsync manually."
fi
Member

Looks like you are reinventing cloud-init :-)

@nirs
Member

nirs commented Jan 26, 2025

I think we can do this:

  • Use cloud-init to install rsync
    • If cloud-init packages does not work for a distro, it must be done in the lima.yaml provision scripts
  • Replace scp with rsync

@AkihiroSuda what do you think?

@olamilekan000 olamilekan000 force-pushed the add-rsycn-as-an-option-for-copying-files-btw-host-and-guest branch 4 times, most recently from 1d17b78 to 7c55a22 Compare January 26, 2025 22:20
@@ -192,7 +192,7 @@ if [ "$got" != "$expected" ]; then
 fi

 INFO "Testing limactl copy command"
-tmpfile="$HOME/lima-hostname"
+tmpfile="/var/tmp/lima-hostname"
Member

Not using $HOME/lima-hostname is a very good change - running tests locally should not create files in your home directory.

But this name may clash with another test, or a file created by someone else.

Best to use a temporary directory that can never clash with anything:

% mktemp -d /var/tmp/lima-test-templates.XXXXXX
/var/tmp/lima-test-templates.vuRmiE

Contributor Author

#3163
I raised a pr for the change @nirs

Member

Thanks! But it seems that this is the only test for limactl copy.

I opened #3170 for improving limactl copy tests. I hope we can improve test coverage before we change the code to avoid regressions.

Contributor Author

@olamilekan000 olamilekan000 Jan 29, 2025

Okay. That can be done after #3163 gets merged.

@AkihiroSuda
Member

Use cloud-init to install rsync

SGTM

@olamilekan000 olamilekan000 force-pushed the add-rsycn-as-an-option-for-copying-files-btw-host-and-guest branch from 7c55a22 to 54e7507 Compare January 29, 2025 02:16
4 participants