alphaprinz (Contributor) commented Oct 7, 2025

Describe the Problem

Use the new 'read only' pool (connected to the CNPG replicated read-only cluster) to take load off the main, writable PostgreSQL cluster.

Explain the Changes

  1. Move some background queries to the read-only pool:
    - db cleaner
    - object reclaimer's first find
    - scrubber's first find

Issues: Fixed #xxx / Gap #xxx

Testing Instructions:

  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • New Features

    • Extended executeSQL method to accept an optional pool preference parameter, enabling flexible routing of queries to specific database pools for improved resource optimization.
  • Improvements

    • Enhanced routing of read operations to utilize read-only pool replicas, improving query distribution and database performance efficiency.

coderabbitai bot commented Oct 7, 2025

Walkthrough

The changes introduce preferred pool selection for SQL query execution, allowing read operations to route to read-only replicas. Three files were modified: the TypeScript interface adds a preferred_pool option to executeSQL, the MDStore class applies read-only pool hints to read queries, and PostgresTable implements pool routing logic with fallback handling.

Changes

| Cohort / File(s) | Summary |
|---|---|
| TypeScript Interface Update (src/sdk/nb.d.ts) | Added optional preferred_pool?: string field to the options object parameter in the DBCollection.executeSQL method signature. |
| MDStore Read Operations (src/server/object_services/md_store.js) | Applied preferred_pool: 'read_only' to multiple read query paths including find_unreclaimed_objects, iterate_all_chunks, has_any_blocks_for_chunk, has_any_parts_for_chunk, find_deleted_chunks, and similar read-query surfaces. |
| PostgresTable Pool Routing (src/util/postgres_client.js) | Implemented per-query pool selection via options.preferred_pool in executeSQL; added fallback logic in get_pool to retry with the default pool if the requested pool is unavailable; updated find and findOne to use pool selection; added JSDoc documentation for the preferred_pool parameter; added debug logging in _do_query. |

Sequence Diagram

sequenceDiagram
    participant Caller
    participant MDStore
    participant PostgresTable
    participant PoolMgr as Pool Manager

    Caller->>MDStore: Read operation (e.g., find_unreclaimed_objects)
    MDStore->>PostgresTable: executeSQL(query, params, {preferred_pool: 'read_only'})
    PostgresTable->>PoolMgr: get_pool('read_only')
    
    alt Pool found
        PoolMgr-->>PostgresTable: read_only pool
    else Pool not found
        PoolMgr->>PoolMgr: Fallback to default pool
        PoolMgr-->>PostgresTable: default pool
    end
    
    PostgresTable->>PostgresTable: _do_query(pool, query, params)
    PostgresTable-->>MDStore: query result
    MDStore-->>Caller: operation result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

The changes involve consistent pattern application across multiple files with new fallback logic in pool selection, but follow a straightforward parameter-passing mechanism without complex interdependencies.
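
The parameter-passing and fallback pattern described above can be illustrated with a minimal sketch. It is not the code in postgres_client.js: FakeClient, TableSketch, and route are hypothetical names introduced here, and the pools are plain objects rather than real connection pools. Only the shape of get_pool follows the snippet quoted in this review.

```javascript
// Hypothetical stand-in for the client that owns the named pools.
class FakeClient {
    constructor(pools) {
        this.pools = pools; // e.g. { default: {...}, read_only: {...} }
    }
    get_pool(key) {
        return this.pools[key];
    }
}

// Simplified table wrapper: only the pool-selection part is modeled.
class TableSketch {
    constructor(client, pool_key = 'default') {
        this.client = client;
        this.pool_key = pool_key;
    }

    get_pool(key = this.pool_key) {
        const pool = this.client.get_pool(key);
        if (!pool) {
            // A non-default pool was requested but is missing: retry with the default.
            if (key && key !== this.pool_key) {
                return this.get_pool();
            }
            throw new Error(`The postgres clients pool ${key} disconnected`);
        }
        return pool;
    }

    // Returns the name of the pool a query would run on, instead of running it.
    route(options = {}) {
        return this.get_pool(options.preferred_pool).name;
    }
}

const with_replica = new TableSketch(new FakeClient({
    default: { name: 'default' },
    read_only: { name: 'read_only' },
}));
const without_replica = new TableSketch(new FakeClient({
    default: { name: 'default' },
}));

console.log(with_replica.route({ preferred_pool: 'read_only' }));    // read_only
console.log(without_replica.route({ preferred_pool: 'read_only' })); // default (fallback)
console.log(with_replica.route());                                   // default (no hint)
```

Callers that pass no preferred_pool are unaffected, which is what keeps the change a plain parameter pass-through for existing call sites.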

Possibly related PRs

  • noobaa/noobaa-core#9197: Directly related modification to src/util/postgres_client.js and SQL call sites to implement read-only pool and per-query pool selection mechanism.

Suggested reviewers

  • dannyzaken

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Title Check | ❓ Inconclusive | The title "Read only uses" is related to the changeset's core concept of implementing read-only pool support, as evidenced by the public API changes to executeSQL and the routing of background queries to a read-only pool. However, the title is vague and uses non-descriptive phrasing that obscures the main change. The term "uses" is ambiguous and doesn't clearly communicate whether the PR is about adding pool selection capability, implementing pool routing, or something else. A teammate scanning the commit history would have difficulty understanding the specific nature of this implementation from the title alone. | Consider revising the title to be more descriptive and concrete. Examples of clearer titles would be "Add preferred_pool option to SQL execution" or "Route background queries to read-only pool" or "Implement read-only pool support for queries", which would immediately communicate the primary change to reviewers and future developers examining the repository history. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

alphaprinz marked this pull request as ready for review on October 17, 2025 22:47

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/util/postgres_client.js (1)

632-642: Review the fallback behavior in get_pool.

The fallback logic on lines 636-637 silently falls back to the default pool when a requested pool key is invalid:

if (key && key !== this.pool_key) {
    return this.get_pool();
}

Concerns:

  1. Silent fallback masks configuration issues: If code requests a non-existent pool (e.g., typo like 'read_onyl'), it silently uses the default pool instead of failing fast.
  2. Misleading error message: If the fallback also fails, the error thrown on line 639 comes from the recursive call and therefore names the default pool (this.pool_key), not the pool key that was originally requested.

Recommendation:
Consider removing the fallback or adding a warning log when falling back, so configuration issues are visible:

 get_pool(key = this.pool_key) {
     const pool = this.client.get_pool(key);
     if (!pool) {
-        //if original get_pool was no for the default this.pool_key, try also this.pool_key
+        // If original get_pool was not for the default this.pool_key, try also this.pool_key
         if (key && key !== this.pool_key) {
+            dbg.warn(`Pool ${key} not found, falling back to default pool ${this.pool_key}`);
             return this.get_pool();
         }
         throw new Error(`The postgres clients pool ${key} disconnected`);
     }
     return pool;
 }

Alternatively, remove the fallback entirely to fail fast on invalid pool names.
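
To compare the two options side by side, here is a small sketch under stated assumptions: the pools Map, the warn helper (standing in for dbg.warn), and both function names are hypothetical and exist only for illustration. The error text mirrors the message in the diff above.

```javascript
// Hypothetical registry of pools; the names are illustrative only.
const pools = new Map([
    ['default', { name: 'default' }],
    ['read_only', { name: 'read_only' }],
]);

const warnings = [];
// Stand-in for dbg.warn so the sketch runs without the project's debug module.
function warn(msg) {
    warnings.push(msg);
}

// Option A: keep the fallback, but make it visible in the logs.
function get_pool_with_warning(key, default_key = 'default') {
    const pool = pools.get(key);
    if (pool) return pool;
    if (key !== default_key) {
        warn(`Pool ${key} not found, falling back to default pool ${default_key}`);
        return get_pool_with_warning(default_key, default_key);
    }
    throw new Error(`The postgres clients pool ${key} disconnected`);
}

// Option B: no fallback, so a misspelled pool name fails immediately.
function get_pool_fail_fast(key) {
    const pool = pools.get(key);
    if (!pool) throw new Error(`The postgres clients pool ${key} disconnected`);
    return pool;
}

// A typo in the pool name behaves differently under the two options.
console.log(get_pool_with_warning('read_onyl').name); // default, with one warning recorded
try {
    get_pool_fail_fast('read_onyl');
} catch (err) {
    console.log(err.message);
}
```

Option A keeps background jobs running when the replica pool is absent; Option B surfaces a misconfiguration at the first query. Which is preferable depends on whether the read-only pool is optional in a given deployment.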

src/server/object_services/md_store.js (1)

814-835: Route mapReduce operations to read-only pool for consistent replica distribution.

The list_objects and list_object_versions methods use mapReduce for queries with delimiters (lines 814-824 and 864-874), but these calls don't pass preferred_pool to route to the read-only pool. While the find() operations support routing via preferred_pool: 'read_only' (as seen elsewhere in md_store.js), the mapReduce calls should be consistent.

To fix:

  1. Update mapReduceListObjects() and mapReduceAggregate() in postgres_client.js to extract options.preferred_pool and pass it to this.single_query() (currently they call this.single_query(mr_q) without the pool parameter)
  2. Pass preferred_pool: 'read_only' in the options object when calling this._objects.mapReduce() at lines 814-824 and 864-874 in md_store.js
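
The first step might look roughly like the sketch below. It is an assumption-heavy illustration: MapReduceSketch is a hypothetical class, the second argument of single_query is assumed to be the pool key, and the SQL text is a placeholder rather than the real map-reduce query.

```javascript
// Simplified stand-in for the table class; only the pool hand-off is modeled.
class MapReduceSketch {
    constructor() {
        this.calls = []; // records { text, pool_key } for inspection
    }

    // Assumed shape: the second argument selects the pool, undefined means default.
    async single_query(text, pool_key) {
        this.calls.push({ text, pool_key });
        return { rows: [] };
    }

    // Forwards the caller's pool hint instead of dropping it.
    async mapReduceListObjects(options = {}) {
        const mr_q = 'SELECT 1'; // placeholder for the real map-reduce SQL
        const res = await this.single_query(mr_q, options.preferred_pool);
        return res.rows;
    }
}

const table = new MapReduceSketch();
table.mapReduceListObjects({ preferred_pool: 'read_only' })
    .then(() => table.mapReduceListObjects())
    .then(() => console.log(table.calls.map(c => c.pool_key)));
```

Call sites that omit the option pass undefined through, so they would keep using whatever pool single_query selects by default.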
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2367740 and d699655.

📒 Files selected for processing (4)
  • src/sdk/nb.d.ts (1 hunks)
  • src/server/bg_services/db_cleaner.js (1 hunks)
  • src/server/object_services/md_store.js (5 hunks)
  • src/util/postgres_client.js (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-08T13:12:46.728Z
Learnt from: naveenpaul1
PR: noobaa/noobaa-core#9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.

Applied to files:

  • src/server/object_services/md_store.js
🧬 Code graph analysis (1)
src/server/bg_services/db_cleaner.js (1)
src/server/object_services/md_store.js (3)
  • dbg (14-14)
  • P (13-13)
  • config (28-28)
🔇 Additional comments (10)
src/server/object_services/md_store.js (6)

779-779: LGTM! Read-only pool routing for unreclaimed objects.

Correctly routes the find operation to the read-only pool, which aligns with the PR objective of offloading background read queries from the main cluster.


1127-1127: LGTM! Read-only pool routing for deleted objects query.

The SQL query correctly uses the read-only pool via the options parameter, consistent with the pattern established in this PR.


1603-1603: LGTM! Read-only pool routing for chunk iteration.

Appropriately routes the chunk iteration query to the read-only pool for the scrubber's background operations.


1778-1779: LGTM! Read-only pool routing for deleted chunks.

Correctly adds preferred_pool: 'read_only' to the find options for deleted chunks cleanup query.


1787-1788: LGTM! Read-only pool routing for chunk existence checks.

Both has_any_blocks_for_chunk and has_any_parts_for_chunk correctly route to the read-only pool. These existence checks are read-only operations suitable for read replicas.

Also applies to: 1796-1797


2039-2040: LGTM! Read-only pool routing for deleted blocks.

Correctly routes the deleted blocks query to the read-only pool for background cleanup operations.

src/util/postgres_client.js (3)

252-252: Helpful debug logging for pool routing.

The added debug log that includes pg_client.options?.host will be useful for verifying which pool (and host) is being used for each query, aiding in troubleshooting read-only pool routing.


726-733: LGTM! Clean implementation of preferred_pool in executeSQL.

The JSDoc and implementation correctly support routing queries to a preferred pool. The pattern of passing options.preferred_pool to get_pool() is clean and extensible.


937-937: LGTM! Consistent preferred_pool support in find and findOne.

Both methods correctly pass options.preferred_pool to get_pool(), enabling read-only pool routing for these common query methods. The implementation is consistent with the executeSQL pattern.

Also applies to: 954-954

src/sdk/nb.d.ts (1)

769-769: LGTM! TypeScript interface correctly updated.

The addition of preferred_pool?: string to the executeSQL options parameter aligns with the implementation in postgres_client.js and properly documents the new functionality for TypeScript consumers.

Signed-off-by: Amit Prinz Setter <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/util/postgres_client.js (1)

635-638: Add warning log when falling back to default pool.

The silent fallback could mask misconfigurations. If the read_only pool is expected but unavailable, queries will silently route to the default pool, defeating the load distribution purpose.

Apply this diff to add visibility:

         //if original get_pool was not for the default this.pool_key, try also this.pool_key
         if (key && key !== this.pool_key) {
+            dbg.warn(`Pool ${key} not available for table ${this.name}, falling back to default pool ${this.pool_key}`);
             return this.get_pool();
         }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d699655 and af555fb.

📒 Files selected for processing (1)
  • src/util/postgres_client.js (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (4)
src/util/postgres_client.js (4)

251-252: LGTM! Helpful debug logging for pool routing verification.

The host logging will be useful for confirming that queries are being routed to the correct pool (read-only vs. default).


954-954: LGTM! Consistent with find implementation.

The preferred_pool support in findOne is consistent with the find implementation. Same eventual consistency considerations apply.


937-937: LGTM. Routing find queries to read replicas is appropriate for background maintenance operations.

The callers using preferred_pool: 'read_only' are background operations (find_unreclaimed_objects, find_deleted_objects, chunk/block existence checks) where eventual consistency is acceptable. These align with the stated use cases (object reclaimer, scrubber) and do not impact critical read paths.


726-733: Review comment is incorrect—executeSQL is designed for standalone queries and not used within transactions.

The code shows that executeSQL and PgTransaction have separate execution paths:

  • PgTransaction (lines 335-378) uses its own query() method with a dedicated pg_client connection for transactional queries
  • executeSQL (line 731) gets its own pool connection via pool.connect() for standalone queries
  • All executeSQL call sites in md_store.js are standalone queries, never invoked within transaction contexts
  • The preferred_pool option is appropriate for directing standalone queries to specific pools (e.g., read-only)

There is no mixing of transaction and standalone query execution paths, and no risk of routing queries to incorrect pools during transactions.

Likely an incorrect or invalid review comment.
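
The separation of the two execution paths can be pictured with a toy model. Everything in it is hypothetical: FakePool hands out clients synchronously (a real pool's connect is asynchronous), TransactionSketch is assumed to draw its client from the default pool, and this executeSQL only reports where a query would run.

```javascript
// Hypothetical pool stand-in; connect() hands out numbered clients synchronously.
class FakePool {
    constructor(name) {
        this.name = name;
        this.count = 0;
    }
    connect() {
        this.count += 1;
        return { pool: this.name, id: this.count, release() {} };
    }
}

const pools = {
    default: new FakePool('default'),
    read_only: new FakePool('read_only'),
};

// Transaction path: one dedicated client (assumed here to come from the
// default pool) serves every query until the transaction ends.
class TransactionSketch {
    constructor() {
        this.client = pools.default.connect();
    }
    query(text) {
        return { text, ran_on: this.client.pool, client_id: this.client.id };
    }
    end() {
        this.client.release();
    }
}

// Standalone path: each call picks a pool, connects, runs, and releases.
function executeSQL(text, options = {}) {
    const pool = pools[options.preferred_pool] || pools.default;
    const client = pool.connect();
    try {
        return { text, ran_on: client.pool, client_id: client.id };
    } finally {
        client.release();
    }
}

const tx = new TransactionSketch();
const a = tx.query('UPDATE ...');
const b = tx.query('SELECT ...');
tx.end();
const c = executeSQL('SELECT ...', { preferred_pool: 'read_only' });
console.log(a.ran_on, b.ran_on, a.client_id === b.client_id, c.ran_on);
```

In this model the pool hint never reaches the transaction's client, which is the property the verification above relies on.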
