Skip to content

Conversation

@nofaralfasi
Copy link
Contributor

@nofaralfasi nofaralfasi commented Mar 31, 2025

Background & Motivation:
Foreman tracks which Smart Proxy executes remote jobs for each host. On first execution, it attempts to remove any existing SSH host key for that host from /usr/share/foreman-proxy/.ssh/known_hosts to avoid key mismatch issues.

However, the cleanup mechanism only runs when the first job uses the script provider.
If the first job uses Ansible, the cleanup is skipped, but the job is still recorded as the host's first execution.

As a result, if a script job is run later against that same host, the SSH connection may fail due to a stale known_hosts entry - since the expected cleanup was already bypassed during the initial Ansible run.

This PR ensures that:
Ansible-based executions properly participate in the host key cleanup mechanism during first execution, aligning the behavior with script-based jobs and preventing subsequent SSH key mismatch errors.

Copy link
Contributor

@adamruzicka adamruzicka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be willing to let the parameter override slide, but we need to handle all the hosts in the inventory.

Copy link
Contributor

@adamruzicka adamruzicka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't tested yet, but it looks reasonable

@passphrase = action_input['secrets']['key_passphrase']
@execution_timeout_interval = action_input[:execution_timeout_interval]
@cleanup_working_dirs = action_input.fetch(:cleanup_working_dirs, true)
prune_known_hosts_on_first_execution(input.values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use the reassembled ansible inventory as stored in @inventory instead of pulling things out of the raw input? Not a hard requirement though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to use input.values instead of @inventory because the latter doesn't include the first_execution parameter.
While extracting this value from input.values for each host adds a bit of overhead, it seems necessary in order to access first_execution.
Do you think there’s a cleaner or more efficient way to handle this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that other parameters used in prune_known_hosts_on_first_execution are available through @inventory? If so, and just one first_execution param is missing, wouldn't it be logical to include that parameter there instead of rebuilding kind of the same information?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the other parameters used in prune_known_hosts_on_first_execution are available through @inventory. However, I’m not sure I fully follow, I’m not really rebuilding anything. I’m simply using the original input.values because @inventory doesn’t include the first_execution field for each host.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It really helps if the commit message itself and the PR description explain why something is needed. Today we use Redmine, but I think it's a good practice to maintain a good git log.

Security wise, doesn't this mean that on every single job you wipe the known SSH keys? In other words, would this effectively mean we run without any known host entries at all? If so, then why not bypass known hosts entirely and use /dev/null?

@nofaralfasi
Copy link
Contributor Author

It really helps if the commit message itself and the PR description explain why something is needed. Today we use Redmine, but I think it's a good practice to maintain a good git log.

Updated.

Security wise, doesn't this mean that on every single job you wipe the known SSH keys? In other words, would this effectively mean we run without any known host entries at all? If so, then why not bypass known hosts entirely and use /dev/null?

We only remove the host key from known_hosts on the first execution for a given host and proxy combination. The goal is to prevent connection failures caused by stale SSH host keys, which can occur when a host is reprovisioned and its SSH keys are regenerated.

As for using /dev/null for known_hosts - that's effectively disabling SSH host key verification entirely, which is a much larger security tradeoff. This approach aims to balance safety and usability by avoiding false failures due to expected host key changes, while still maintaining SSH verification for all other cases.

@ekohl
Copy link
Member

ekohl commented May 15, 2025

We only remove the host key from known_hosts on the first execution for a given host and proxy combination. The goal is to prevent connection failures caused by stale SSH host keys, which can occur when a host is reprovisioned and its SSH keys are regenerated.

Is there currently a job or mechanism that removes the known hosts key? For example, after a host is posting the "finish" step you can remove it.

Another mechanism you can consider is to optionally enhance the finish script to upload the known host keys and actively update the entries.

The last option is to implement KnownHostsCommand and have a command that actively queries Foreman. Scalability may be a concern as well as when Foreman is unavailable. Common solutions to that is a cache or actively pushing updates when Foreman changes so the Smart Proxy has a local database. This may conflict with FreeIPA/IdM which already maintains a database of host SSH keys.

@nofaralfasi
Copy link
Contributor Author

Is there currently a job or mechanism that removes the known hosts key? For example, after a host is posting the "finish" step you can remove it.

Yes, the mechanism I’m using in this PR is here.

It follows the same approach used in the foreman_remote_execution plugin for SSH-based REX jobs — specifically:

This cleanup runs just before establishing the SSH connection, and only on first execution for a given host/proxy pair, to avoid stale key failures after reprovisioning.

Another mechanism you can consider is to optionally enhance the finish script to upload the known host keys and actively update the entries.

The last option is to implement KnownHostsCommand and have a command that actively queries Foreman. Scalability may be a concern as well as when Foreman is unavailable. Common solutions to that is a cache or actively pushing updates when Foreman changes so the Smart Proxy has a local database. This may conflict with FreeIPA/IdM which already maintains a database of host SSH keys.

Regarding the additional suggestions:

  • A mechanism tied to the “finish” template could work, but it may not cover all use cases (e.g. reprovisioning outside of Foreman).
  • The KnownHostsCommand approach would be more robust, but it adds complexity, especially with regards to availability, caching, and integration with other tools like FreeIPA/IdM, as you mentioned.

I definitely agree it’s worth exploring long-term improvements to this, but I’d prefer to keep this PR scoped to aligning Ansible-based jobs with the behavior we already have for script-based execution.

Copy link
Contributor

@stejskalleos stejskalleos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nofaralfasi, can you please respond to the comments from @ekohl

It looks like those are the last two things to settle before the merge, unless @ofedoren and @adamruzicka have other comments to add.

@stejskalleos
Copy link
Contributor

QE: @shubhamsg199

@nofaralfasi nofaralfasi marked this pull request as draft June 19, 2025 09:28
@nofaralfasi nofaralfasi force-pushed the 38332 branch 2 times, most recently from 2ac2809 to e7e4051 Compare June 19, 2025 10:45
@nofaralfasi nofaralfasi marked this pull request as ready for review June 19, 2025 10:47
@nofaralfasi nofaralfasi requested review from adamruzicka and ekohl June 19, 2025 10:49
Copy link
Member

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline code suggestions are all very minor. However, theforeman/smart_proxy_remote_execution_ssh@cdc55cd introduced prune_known_hosts! (landed in 0.5.0) so you need to update the dependency in the gemspec.

Other than that I think it looks sensible.

Copy link
Member

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to be modified to ~> 0.5.

gem.add_runtime_dependency('smart_proxy_remote_execution_ssh', '~> 0.4')

The `prune_known_hosts!` method used in this plugin was introduced in version 0.5.0,
therefore the dependency needs to be updated accordingly.
@adamruzicka
Copy link
Contributor

This approach aims to balance safety and usability by avoiding false failures due to expected host key changes, while still maintaining SSH verification for all other cases.

Just to be extra clear, the "all other cases" really means just when using the script. Pull doesn't use ssh keys at all and ansible ignores known hosts altogether by default due to https://github.com/theforeman/puppet-foreman_proxy/blob/master/manifests/plugin/ansible/params.pp#L10

Copy link
Contributor

@adamruzicka adamruzicka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general looks good.

I'm a bit unhappy about this doing more than what we do in sp-rex-ssh, but in here we have much more information about the host.z

@shubhamsg199
Copy link

shubhamsg199 commented Jul 15, 2025

Tested the changes and is working as expected. A comment is added to the SAT-27377

@nofaralfasi nofaralfasi merged commit d26952b into theforeman:master Jul 15, 2025
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants