Skip to content

Conversation

@ShimShtein
Copy link
Member

@ShimShtein ShimShtein commented Dec 3, 2025

Make sure the fact parser searches by all unique constraint combinations before creating a new object.

Steps to reproduce:
his is from memory, exact steps may be a bit different.

  1. Create an OS with:

    name: RedHat
    major: 9 (or whatever matches your host OS)
    minor: 7 (or whatever matches your host OS)
    description: RHEL 9.7

  2. Ensure no OS exists with name RHEL & major 9 & minor 7

  3. Run the installer on a RHEL 9.7 Satellite server

Copy link
Member

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean to do a review, but this caught my eye. So take my comments here as suggestions.

Looking at the failed tests I think we're using assert_equal in the wrong order, leading to confusing errors. Not a blocker.

But the failing tests do look related so please have a look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have FacterDB as a dependency which contains these kinds of fact sets. Is it possible to reuse that? I see others use static files so don't consider this blocking.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see there is a warning under the hood of FacterDB:

./home/vagrant/foreman/.vendor/ruby/3.0.0/gems/jgrep-1.5.4/lib/jgrep.rb:27: warning: detected duplicate key "osfamily" in JSON object. This will raise an error in json 3.0 unless enabled via `allow_duplicate_key: true` at line 86377 column 2
/home/vagrant/foreman/.vendor/ruby/3.0.0/gems/jgrep-1.5.4/lib/jgrep.rb:27: warning: detected duplicate key "fqdn" in JSON object. This will raise an error in json 3.0 unless enabled via `allow_duplicate_key: true` at line 197772 column 2
/home/vagrant/foreman/.vendor/ruby/3.0.0/gems/jgrep-1.5.4/lib/jgrep.rb:27: warning: detected duplicate key "hostname" in JSON object. This will raise an error in json 3.0 unless enabled via `allow_duplicate_key: true` at line 402388 column 2

It is not problematic yet, but I preferred not to mess with external dependency.

@ShimShtein ShimShtein force-pushed the fix_operatingsystem_deduplication branch from 5a2c434 to 65dacac Compare December 3, 2025 16:52
@github-actions github-actions bot added the UI label Dec 3, 2025
@ShimShtein ShimShtein force-pushed the fix_operatingsystem_deduplication branch from 65dacac to 8fa0cfa Compare December 3, 2025 17:14
@ShimShtein
Copy link
Member Author

Fixed the original errors that @ekohl was referring to. The new errors seem trickier, since they do not appear in my local environment.

@ShimShtein
Copy link
Member Author

[test]

@ShimShtein ShimShtein force-pushed the fix_operatingsystem_deduplication branch 3 times, most recently from bf74f09 to 1f1517b Compare December 7, 2025 14:45
@ShimShtein
Copy link
Member Author

Found out that the errors came from newer gem version in the CI. Updated the gem versions locally and the errors appeared for me too.
The errors were related to problems parsing fact files in the facterdb gem. Some of those files contain invalid json.

I have fixed the issue by forcing facterdb to load only the files that are relevant to the checks we perform. Luckily those files were OK, and the tests could run successfully.

@ShimShtein ShimShtein force-pushed the fix_operatingsystem_deduplication branch from 1f1517b to 99ebc5c Compare December 7, 2025 15:05
@ekohl
Copy link
Member

ekohl commented Dec 7, 2025

Did you file an issue for the broken files?

@ShimShtein
Copy link
Member Author

Did you file an issue for the broken files?

We're using an antique version of facterdb gem. In the newer versions of the gem we don't have older facter versions of the data, so I didn't test our code against the newer facterdb. If we want to file a ticket, it would not make sense to file it against v1.27 when v4.0 is already out.

@stejskalleos stejskalleos self-requested a review December 9, 2025 12:43
@stejskalleos stejskalleos self-assigned this Dec 9, 2025
@ShimShtein ShimShtein force-pushed the fix_operatingsystem_deduplication branch from 99ebc5c to c4e65cb Compare December 9, 2025 13:54
Copy link
Contributor

@stejskalleos stejskalleos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Should the foreman_ansible fact parser be updated as well?
  • I've never had a problem with facts & duplicate OS; Could you please share the steps on how to reproduce the issue?

@ShimShtein
Copy link
Member Author

Added ansible fact management and added the reproducer steps to the description.

Basically you want to create an operating system with the same description as the one that will be generated by the fact parser, and then run the parser.

For puppet parser it's crucial, since it is used during the upgrade procedure.

@ekohl
Copy link
Member

ekohl commented Dec 11, 2025

We're using an antique version of facterdb gem. In the newer versions of the gem we don't have older facter versions of the data, so I didn't test our code against the newer facterdb. If we want to file a ticket, it would not make sense to file it against v1.27 when v4.0 is already out.

That's fair. We should look at updating, but that's for another PR.

default_scope -> { order(:title) }

# Find all operatingsystems that are duplicates of the given operating system according to all unique constraints
def self.duplicate_of(os)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a hard time interpreting what duplicate_of meant when reading it out of context. Isn't this effectively something like find_existing_matching_entry?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also find the naming not as clear, when reading it in context and would appreciate a different naming for the method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads a lot like Rails' find_or_(create|initialize)_by, especially as in most cases you pass in a newly created object just to verify it's a dupe or not.

So couldn't we re-use that in a slightly different way (pseudocode, not actually working)?

def self.find_or_create_by(attributes)
  self.find_by(attributes.slice(:name, :major, :minor)) || self.find_by(attributes.slice(:description)) || self.find_by(title: generate_title) || self.create(attributes)
end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would find_by_attributes sound better? Or maybe something like find_by_unique?

As for find_or_create - I was thinking about it, but the code does not always creates the record immediately after the find, so I didn't want to change the behavior too much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find_unique_operatingsystem? find_operatingsystem_by_attributes? that way the name implies this is a special thing for OSes

def self.duplicate_of(os)
scope = where(name: os.name, major: os.major, minor: os.minor)
scope = scope.or(where(description: os.description)) if os.description.present?
scope.or(where(title: os.title || os.generate_title))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like in all examples this returns an array. I would then expect the method name to reflect this returns a plural (e.g .duplicates_of).


def operatingsystem
os = Operatingsystem.where(os_hash).first_or_initialize
os = Operatingsystem.new(os_hash)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There appears to be a pattern of calling new and then checking if a duplicate exists. I can see this being overlooked as a step when re-factoring or writing new code. Would it then make sense to include the duplicate_of into the initialize method for Operatingsystem so that it's done by default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should go the other way around - instead of passing a new operatingsystem object to the method, just pass a set of attributes. It makes more sense in the usage: you gather a set of attributes, and then search for an existing object that meets the defined criteria.
Maybe find_existing with a concrete set of attributes would be a better signature for this method.

Comment on lines +250 to +252
expected = FactoryBot.create(:operatingsystem, name: 'RHEL', major: '9', minor: '6', description: 'RHEL 9.6')
# now we update the description in the DB to make sure we're testing the title comparison
Operatingsystem.where(id: expected.id).update_all(description: 'ZZZ 9.6')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When trying this out manually, I ended up using an OS without a description at all, should we excplicitly test that case too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the description has a unique constraint in the DB, there would be at max one instance of such record. I think it's the same test as for any other value (that I already test).

def self.duplicate_of(os)
scope = where(name: os.name, major: os.major, minor: os.minor)
scope = scope.or(where(description: os.description)) if os.description.present?
scope.or(where(title: os.title || os.generate_title))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're passing os.title || os.generate_title because os might be unsaved yet and thus set_title didn't run yet?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's why I extracted the generate_title out of the set_title hook.

@ShimShtein ShimShtein force-pushed the fix_operatingsystem_deduplication branch from 8d671e0 to 1336914 Compare December 17, 2025 15:00
@ShimShtein
Copy link
Member Author

Updated the PR. Renamed the new search method and fixed all discrepancies associated with it.

@evgeni evgeni marked this pull request as draft December 17, 2025 15:12
@evgeni evgeni marked this pull request as ready for review December 17, 2025 15:12
Comment on lines 6 to 7
os = Operatingsystem.new(os_hash)
os = Operatingsystem.find_by_attributes(**os_hash).first || os
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to avoid instantiating objects that may end up unused

Suggested change
os = Operatingsystem.new(os_hash)
os = Operatingsystem.find_by_attributes(**os_hash).first || os
os = Operatingsystem.find_by_attributes(**os_hash).first || Operatingsystem.new(os_hash)

@@ -1,5 +1,5 @@
import React from 'react';
import { act } from '@testing-library/react';
import { act } from 'react-dom/test-utils';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it needed to rewrite the test to another library for this bugfix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was added by mistake. Removed from the PR.

end

# Make sure the operating system object that we want to create is not a duplicate of an existing one.
os = Operatingsystem.find_by_attributes(name: os.name, major: os.major, minor: os.minor, description: os.description).first || os
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is duplicating logic we already have and firing off more database queries than needed. I'd prefer to replace the calls of find_or_initialize_by with find_by_attributes and dealing with the fact it can return no instances.

args[:release_name] = os_release_name if os_name == 'Debian' || os_name == 'Ubuntu'
# for Ansible, the CentOS Stream can be identified by missing minor version only
args[:name] = "CentOS_Stream" if os_name == 'CentOS' && os_minor.blank?
args[:description] = os_description
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is subtly introducing a new feature. Is that intended?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the original code, it was added during the generation of the new operatingsystem object. Now I have extracted it earlier, since we want to make sure no OS exists with the same description.

@evgeni evgeni marked this pull request as draft December 17, 2025 15:19
@evgeni evgeni marked this pull request as ready for review December 17, 2025 15:19
@ShimShtein ShimShtein force-pushed the fix_operatingsystem_deduplication branch from 1336914 to 93d71a5 Compare December 17, 2025 15:45
@ShimShtein ShimShtein force-pushed the fix_operatingsystem_deduplication branch 2 times, most recently from de7c5e1 to 58579ac Compare December 18, 2025 10:47
Make sure the fact parser searches by all unique constraint combinations before creating a new object.
@ShimShtein ShimShtein force-pushed the fix_operatingsystem_deduplication branch from 58579ac to 50e0ef0 Compare December 18, 2025 11:08
@ShimShtein
Copy link
Member Author

@ekohl The requirement to have only a single request in puppet fact parser caused a bit of refactoring. Hopefully I didn't change the logic too much.
Ready for another round though!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants