@mrizzi mrizzi commented Oct 21, 2025

Setting LC_COLLATE="C" in the embedded test database ensures the database sorting matches Rust's sorting behavior because both use byte-order comparison, making them compatible for test validation.
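As a small illustration of the byte-order claim (a sketch, not code from this PR): Rust's default string ordering compares UTF-8 bytes, so every uppercase ASCII letter sorts before every lowercase one, which is also how a PostgreSQL database with the "C" collation orders text. The sample values and the `sort_byte_order` helper are hypothetical.

```rust
/// Sort strings with Rust's default `Ord` for `String`, which compares UTF-8 bytes.
/// Hypothetical helper, for illustration only.
fn sort_byte_order(mut items: Vec<String>) -> Vec<String> {
    items.sort();
    items
}

fn main() {
    let names: Vec<String> = ["banana", "Cherry", "apple", "_tmp"]
        .into_iter()
        .map(String::from)
        .collect();
    // 'C' (0x43) < '_' (0x5F) < 'a' (0x61) < 'b' (0x62)
    let sorted = sort_byte_order(names);
    println!("{:?}", sorted); // prints ["Cherry", "_tmp", "apple", "banana"]
}
```

A database initialized with a linguistic locale such as en_US.UTF-8 would typically order these values differently, which is the mismatch the tests ran into.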

I had to set the env var because:

  • LC_COLLATE is set during database cluster initialization (initdb)
  • PostgreSQL reads it from the environment when initdb runs
  • postgresql_embedded (v0.20.0) doesn't expose initdb parameter customization
  • LC_COLLATE cannot be changed after database creation

@jcrossley3 let me know if this works locally for you.

Summary by Sourcery

Tests:

  • Configure embedded test database with LC_COLLATE="C" to ensure deterministic byte-order sorting matching Rust's string comparisons

@mrizzi mrizzi requested a review from jcrossley3 October 21, 2025 09:48

sourcery-ai bot commented Oct 21, 2025

Reviewer's Guide

Injects setting of LC_COLLATE="C" into the embedded test database initialization by wrapping env::set_var in an unsafe block before calling create_for, ensuring deterministic byte-order sorting alignment between PostgreSQL and Rust in tests.

Sequence diagram for embedded test database initialization with LC_COLLATE

sequenceDiagram
    participant T as Test Runner
    participant E as embedded.rs
    participant Env as Environment
    participant PG as PostgreSQL Embedded
    T->>E: Call create()
    E->>Env: Set LC_COLLATE="C" (unsafe)
    E->>PG: create_for(default_settings())
    PG-->>E: Embedded DB instance
    E-->>T: Return (Database, PostgreSQL)

File-Level Changes

Change: Force LC_COLLATE="C" for embedded test database initialization
Details:
  • Insert unsafe block to set LC_COLLATE to "C"
  • Add explanatory comments on deterministic byte-order sorting and locale scope
Files: common/db/src/embedded.rs

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `common/db/src/embedded.rs:28-30` </location>
<code_context>
+    // LC_COLLATE="C" ensures deterministic, byte-order sorting that matches Rust's string comparison,
+    // making tests portable across different system locales.
+    // This affects only the embedded test database, not production databases.
+    unsafe {
+        env::set_var("LC_COLLATE", "C");
+    }
     create_for(default_settings()?).await
</code_context>

<issue_to_address>
**suggestion:** Consider whether the use of 'unsafe' is necessary for setting an environment variable.

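'env::set_var' is a safe function only in Rust editions before 2024; in the 2024 edition it is an unsafe fn, so the block is required there. If this crate targets an earlier edition, removing the 'unsafe' block will make the code clearer and avoid unnecessary use of unsafe.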

```suggestion
    env::set_var("LC_COLLATE", "C");
```
</issue_to_address>

codecov bot commented Oct 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.46%. Comparing base (5bd7c2c) to head (70e39bd).
⚠️ Report is 20 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2051   +/-   ##
=======================================
  Coverage   68.46%   68.46%           
=======================================
  Files         366      366           
  Lines       20608    20608           
  Branches    20608    20608           
=======================================
  Hits        14109    14109           
+ Misses       5675     5671    -4     
- Partials      824      828    +4     

ctron commented Oct 21, 2025

I think this is OK as a workaround. However, I'd also say it's a valid use case in general and should be supported by postgresql-embedded. Which would mean:

  • Opening an issue
  • Maybe even providing a PR

mrizzi commented Oct 21, 2025

> I think this is OK as a workaround. However, I'd also say it's a valid use case in general and should be supported by postgresql-embedded. Which would mean:
>
> * Opening an issue
> * Maybe even providing a PR

TBH I thought about this and was initially on the same page as you. But then I figured that postgresql-embedded works fine with its approach of not customizing anything during DB init (as this PR proves) and instead letting the user set environment variables to influence the underlying PostgreSQL behavior, which is the default way PostgreSQL is meant to be configured.
Hence I was more inclined to do nothing towards postgresql-embedded.

ctron commented Oct 21, 2025

I think the user (of the postgresql-embedded API) should have the option to set it, or otherwise get the default.

ctron commented Oct 21, 2025

Digging into this a bit, InitDbBuilder from postgresql_embedded actually has support for this. It's just not patched through, while other options are. I think this is just a lack of a PR.

ctron commented Oct 21, 2025

Just digging in that code anyway. What we use for tests is:

        db.execute(Statement::from_string(
            db.get_database_backend(),
            format!("CREATE DATABASE \"{}\";", database.name),
        ))
        .await?;
        db.close().await?;

And to my understanding you can set those parameters with the CREATE DATABASE statement: https://www.postgresql.org/docs/current/sql-createdatabase.html
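A minimal sketch of what that could look like, assuming a hypothetical helper `create_database_sql` (not part of Trustify or this PR) that builds the statement text. PostgreSQL only accepts an `LC_COLLATE` that differs from the template database's when the new database is cloned from `template0`, so the sketch names that template explicitly.

```rust
/// Build a `CREATE DATABASE` statement that pins the collation to "C".
/// Hypothetical helper, for illustration only.
fn create_database_sql(name: &str) -> String {
    // Double any embedded double quote so the name stays one quoted identifier.
    let quoted = name.replace('"', "\"\"");
    // `TEMPLATE template0` is needed when the collation differs from the default template's.
    format!("CREATE DATABASE \"{quoted}\" WITH TEMPLATE template0 LC_COLLATE 'C';")
}

fn main() {
    println!("{}", create_database_sql("test_db"));
}
```

The resulting string could then be handed to `Statement::from_string` in place of the existing `format!` call.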

ctron commented Oct 21, 2025

        db.execute(Statement::from_string(
            db.get_database_backend(),
            format!("CREATE DATABASE \"{}\";", database.name),
        ))
        .await?;
        db.close().await?;

@jcrossley3

> @jcrossley3 let me know if this works locally for you.

It does, thanks!

mrizzi commented Oct 21, 2025

> Just digging in that code anyway. What we use for tests is:
>
>         db.execute(Statement::from_string(
>             db.get_database_backend(),
>             format!("CREATE DATABASE \"{}\";", database.name),
>         ))
>         .await?;
>         db.close().await?;
>
> And to my understanding you can set those parameters with the CREATE DATABASE statement: https://www.postgresql.org/docs/current/sql-createdatabase.html

>         db.execute(Statement::from_string(
>             db.get_database_backend(),
>             format!("CREATE DATABASE \"{}\";", database.name),
>         ))
>         .await?;
>         db.close().await?;

Awesome, I didn't notice this bootstrap part... let me give it a try, because it looks like a cleaner solution to me.

mrizzi commented Oct 21, 2025

> Just digging in that code anyway. What we use for tests is:
>
>         db.execute(Statement::from_string(
>             db.get_database_backend(),
>             format!("CREATE DATABASE \"{}\";", database.name),
>         ))
>         .await?;
>         db.close().await?;
>
> And to my understanding you can set those parameters with the CREATE DATABASE statement: https://www.postgresql.org/docs/current/sql-createdatabase.html

@ctron Sorry, I'm getting old: I noticed this yesterday, stopped myself from touching it, and then forgot about it.
The bootstrap function is also used in

match trustify_db::Database::bootstrap(&self.database).await {

and Trustify cannot force the LC_COLLATE that a user can legitimately define for the DB they provide for running Trustify.
The alternative was to make LC_COLLATE conditional in bootstrap, but I judged that too risky for something that is meant to never be applied in prod and to be applied exclusively to tests, so I preferred env::set_var just for the embedded DB.

@jcrossley3 jcrossley3 left a comment

I'll approve and defer to you and @ctron to agree to merge.

jcrossley3 commented Oct 21, 2025

Just an FYI, but in addition to setting collation during db create, we could also do it at table creation or even at query time, e.g.

    -- Create a table with a column-level collation
    CREATE TABLE products (
        product_id INT PRIMARY KEY,
        product_name VARCHAR(100) COLLATE "fr_FR.utf8"
    );

    -- Use an expression-level collation in a query
    SELECT name FROM employees ORDER BY name COLLATE "C";

I'm not sure which is correct, tbh. But if we want to leave it up to the user, I'd say our test is wrong. I'd vote to use test fixtures that are predictable regardless of collation config, i.e. use only numbers or letters.
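To illustrate the fixture idea (a sketch under simplifying assumptions, not code from this PR): the hypothetical `sort_case_insensitive` below stands in for a linguistic collation by ignoring ASCII case, which is only a rough approximation of what locales such as en_US.UTF-8 do. Mixed-case fixtures sort differently under the two orderings, while lowercase-only ones agree.

```rust
/// Byte-order sort, as Rust's `Ord` for `&str` and PostgreSQL's "C" collation do.
fn sort_bytes(mut v: Vec<&str>) -> Vec<&str> {
    v.sort();
    v
}

/// Rough stand-in for a linguistic collation (an assumption for this sketch):
/// compare ignoring ASCII case, then break ties by byte order.
fn sort_case_insensitive(mut v: Vec<&str>) -> Vec<&str> {
    v.sort_by(|a, b| {
        a.to_ascii_lowercase()
            .cmp(&b.to_ascii_lowercase())
            .then_with(|| a.cmp(b))
    });
    v
}

fn main() {
    let mixed = vec!["beta", "Alpha", "alpha", "Beta"];
    let lower = vec!["beta", "alpha", "gamma"];
    // Mixed-case fixtures depend on which ordering is in use...
    assert_ne!(sort_bytes(mixed.clone()), sort_case_insensitive(mixed));
    // ...while lowercase-only fixtures sort the same either way.
    assert_eq!(sort_bytes(lower.clone()), sort_case_insensitive(lower));
}
```

Real locales have further rules (punctuation weighting, digraphs in some languages), so fixtures restricted to a single case and plain letters or digits are the conservative choice.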

ctron commented Oct 23, 2025

That code (bootstrap) should never be used in production. A production DB user should not have permission to drop a database.

If you're concerned about it, you could add an argument and pass it along to enable this behavior.

I'd find this safer than setting a (process-)global env var.

mrizzi commented Oct 24, 2025

> That code (bootstrap) should never be used in production. A production DB user should not have permission to drop a database.
>
> If you're concerned about it, you could add an argument and pass it along to enable this behavior.
>
> I'd find this safer than setting a (process-)global env var.

"Never used" sounds great to me 🤩
I've switched the approach to the CREATE DATABASE statement.

@jcrossley3

I confirmed the fix locally, but can you squash the 3 redundantly-named commits before merging, please?

@mrizzi mrizzi requested a review from ctron October 24, 2025 16:30
@jcrossley3 jcrossley3 added this pull request to the merge queue Oct 30, 2025
Merged via the queue into guacsec:main with commit 390833b Oct 30, 2025
7 checks passed
@mrizzi mrizzi deleted the test-fix-locale branch October 30, 2025 19:12