-
Notifications
You must be signed in to change notification settings - Fork 29
Fixes #38332 - Remove host key from known_hosts before first Ansible job #99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
adamruzicka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be willing to let the parameter override slide, but we need to handle all the hosts in the inventory.
29718ad to
0e53e6e
Compare
adamruzicka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't tested yet, but it looks reasonable
| @passphrase = action_input['secrets']['key_passphrase'] | ||
| @execution_timeout_interval = action_input[:execution_timeout_interval] | ||
| @cleanup_working_dirs = action_input.fetch(:cleanup_working_dirs, true) | ||
| prune_known_hosts_on_first_execution(input.values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to use the reassembled ansible inventory as stored in @inventory instead of pulling things out of the raw input? Not a hard requirement though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose to use input.values instead of @inventory because the latter doesn't include the first_execution parameter.
While extracting this value from input.values for each host adds a bit of overhead, it seems necessary in order to access first_execution.
Do you think there’s a cleaner or more efficient way to handle this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that other parameters used in prune_known_hosts_on_first_execution are available through @inventory? If so, and just one first_execution param is missing, wouldn't it be logical to include that parameter there instead of rebuilding kind of the same information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the other parameters used in prune_known_hosts_on_first_execution are available through @inventory. However, I’m not sure I fully follow, I’m not really rebuilding anything. I’m simply using the original input.values because @inventory doesn’t include the first_execution field for each host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the following method to resolve this:
https://github.com/theforeman/smart_proxy_ansible/pull/99/files#diff-65b556454b9717ae33c08487e6ace03409dadb3c66b54f921b1dee1caccfb142R334-R348
ekohl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It really helps if the commit message itself and the PR description explain why something is needed. Today we use Redmine, but I think it's a good practice to maintain a good git log.
Security wise, doesn't this mean that on every single job you wipe the known SSH keys? In other words, would this effectively mean we run without any known host entries at all? If so, then why not bypass known hosts entirely and use /dev/null?
Updated.
We only remove the host key from known_hosts on the first execution for a given host and proxy combination. The goal is to prevent connection failures caused by stale SSH host keys, which can occur when a host is reprovisioned and its SSH keys are regenerated. As for using |
Is there currently a job or mechanism that removes the known hosts key? For example, after a host is posting the "finish" step you can remove it. Another mechanism you can consider is to optionally enhance the finish script to upload the known host keys and actively update the entries. The last option is to implement |
Yes, the mechanism I’m using in this PR is here. It follows the same approach used in the This cleanup runs just before establishing the SSH connection, and only on first execution for a given host/proxy pair, to avoid stale key failures after reprovisioning.
Regarding the additional suggestions:
I definitely agree it’s worth exploring long-term improvements to this, but I’d prefer to keep this PR scoped to aligning Ansible-based jobs with the behavior we already have for script-based execution. |
stejskalleos
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nofaralfasi, can you please respond to the comments from @ekohl
It looks like those are the last two things to settle before the merge, unless @ofedoren and @adamruzicka have other comments to add.
|
QE: @shubhamsg199 |
2ac2809 to
e7e4051
Compare
ekohl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inline code suggestions are all very minor. However, theforeman/smart_proxy_remote_execution_ssh@cdc55cd introduced prune_known_hosts! (landed in 0.5.0) so you need to update the dependency in the gemspec.
Other than that I think it looks sensible.
ekohl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still needs to be modified to ~> 0.5.
| gem.add_runtime_dependency('smart_proxy_remote_execution_ssh', '~> 0.4') |
The `prune_known_hosts!` method used in this plugin was introduced in version 0.5.0, therefore the dependency needs to be updated accordingly.
Just to be extra clear, the "all other cases" really means just when using the script. Pull doesn't use ssh keys at all and ansible ignores known hosts altogether by default due to https://github.com/theforeman/puppet-foreman_proxy/blob/master/manifests/plugin/ansible/params.pp#L10 |
adamruzicka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general looks good.
I'm a bit unhappy about this doing more than what we do in sp-rex-ssh, but in here we have much more information about the host.z
|
Tested the changes and is working as expected. A comment is added to the SAT-27377 |
Background & Motivation:
Foreman tracks which Smart Proxy executes remote jobs for each host. On first execution, it attempts to remove any existing SSH host key for that host from
/usr/share/foreman-proxy/.ssh/known_hoststo avoid key mismatch issues.However, the cleanup mechanism only runs when the first job uses the script provider.
If the first job uses Ansible, the cleanup is skipped, but the job is still recorded as the host's first execution.
As a result, if a script job is run later against that same host, the SSH connection may fail due to a stale known_hosts entry - since the expected cleanup was already bypassed during the initial Ansible run.
This PR ensures that:
Ansible-based executions properly participate in the host key cleanup mechanism during first execution, aligning the behavior with script-based jobs and preventing subsequent SSH key mismatch errors.