Conversation

@dieriba
Contributor

@dieriba dieriba commented Nov 12, 2025

No description provided.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Nov 12, 2025

Deploying windmill with Cloudflare Pages

Latest commit: e74e04b
Status: ✅  Deploy successful!
Preview URL: https://6884fea8.windmill.pages.dev
Branch Preview URL: https://dieri-triggers-queue-mode.windmill.pages.dev

@dieriba dieriba marked this pull request as ready for review November 14, 2025 23:27
@dieriba dieriba requested a review from rubenfiszel as a code owner November 14, 2025 23:27
@claude
Contributor

claude bot commented Nov 14, 2025

Claude finished @dieriba's task


Code Review Complete

Reviewing PR #7116: "feat: triggers queue mode"

Tasks

  • Review database migration files
  • Review backend code changes in jobs.rs
  • Review trigger handler changes
  • Check SQLx query changes for best practices
  • Review OpenAPI spec updates
  • Check for security implications
  • Assess performance considerations
  • Provide inline comments for specific issues
  • Write comprehensive summary
  • Add testing instructions

Summary

This PR implements a "queue mode" feature for triggers, allowing trigger events to be queued for manual processing rather than executing immediately. The implementation adds an active_mode boolean field to all trigger tables.

Architecture Overview

Core Mechanism:

  • When active_mode = true (default): Triggers execute immediately (existing behavior)
  • When active_mode = false: Jobs are scheduled for 9999-12-31 23:59:59, effectively suspending them until manually resumed
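As a rough illustration of the mechanism described above, the choice of scheduling time can be sketched as follows. The function name `scheduled_for` and the use of Unix seconds are assumptions made for this example, not the PR's actual code; 253402300799 is the Unix timestamp for 9999-12-31 23:59:59 UTC.

```rust
/// Unix timestamp (seconds, UTC) for 9999-12-31 23:59:59, the sentinel
/// date the review says suspended jobs are scheduled for.
const FAR_FUTURE_SECS: i64 = 253_402_300_799;

/// Hypothetical helper: pick the scheduling time for a new trigger job.
/// `active_mode` mirrors the trigger column; `now_secs` is the current time.
fn scheduled_for(active_mode: bool, now_secs: i64) -> i64 {
    if active_mode {
        now_secs // run immediately (existing behavior)
    } else {
        FAR_FUTURE_SECS // park the job until it is resumed manually
    }
}

fn main() {
    let now = 1_763_000_000; // an arbitrary "current" time
    println!("active mode: {}", scheduled_for(true, now));
    println!("queue mode:  {}", scheduled_for(false, now));
}
```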

Key Components:

  1. Database migration adding active_mode column to all trigger tables
  2. New API endpoints for bulk resume/cancel of suspended trigger jobs
  3. Integration into trigger execution flow via trigger_runnable_inner()
  4. Frontend support across all trigger types

Code Quality Assessment

✅ Strengths

  1. Backward Compatibility: Migration uses DEFAULT TRUE, preserving existing behavior
  2. Comprehensive Coverage: Feature implemented across all trigger types (HTTP, WebSocket, Kafka, NATS, MQTT, Postgres, SQS, GCP, Email)
  3. Security: Bulk operations properly require admin privileges
  4. Consistent Implementation: Uses the trait-based trigger system effectively

⚠️ Issues & Concerns

1. Dead Code - process_queued_jobs field (Minor)

  • Defined in BaseTriggerData but completely unused
  • Should either be implemented, removed, or documented as reserved for future use
  • See inline comment on backend/windmill-api/src/triggers/mod.rs:129

2. Magic Date Anti-Pattern (Moderate)

  • Using 9999-12-31 23:59:59 as a "suspended" marker is problematic:
    • Not self-documenting - requires understanding the magic constant
    • Could cause issues with date arithmetic, analytics, or database constraints
    • Makes queries less intuitive (need to check against magic date)
  • Better approach: Add explicit suspended boolean to v2_job_queue table
  • See inline comment on backend/windmill-api/src/triggers/trigger_helpers.rs:535
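To make the suggested alternative concrete, here is a minimal sketch of modelling suspension explicitly instead of through a sentinel date. The `JobSchedule` type and its methods are hypothetical and exist only for this example; the PR itself stores a timestamp in `scheduled_for`.

```rust
/// Hypothetical alternative to the sentinel date: model suspension explicitly.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JobSchedule {
    /// Run at or after this Unix timestamp (seconds).
    At(i64),
    /// Held until someone resumes it; carries no fake timestamp.
    Suspended,
}

impl JobSchedule {
    /// A job is due only if it has a real timestamp that has passed.
    fn is_due(self, now_secs: i64) -> bool {
        match self {
            JobSchedule::At(ts) => ts <= now_secs,
            JobSchedule::Suspended => false,
        }
    }

    /// Resuming turns a suspended job into one scheduled for `now_secs`;
    /// jobs that already have a timestamp are left unchanged.
    fn resume(self, now_secs: i64) -> JobSchedule {
        match self {
            JobSchedule::Suspended => JobSchedule::At(now_secs),
            other => other,
        }
    }
}

fn main() {
    let job = JobSchedule::Suspended;
    println!("due before resume: {}", job.is_due(1_000));
    println!("after resume: {:?}", job.resume(1_000));
}
```

With a shape like this, "is this job suspended?" becomes a direct check rather than a comparison against a magic constant, and date arithmetic never sees a year-9999 value.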

3. Index Efficiency (Minor)

  • The new composite indexes on (workspace_id, path, active_mode) may be unnecessary
  • If queries rarely filter by active_mode, these duplicate existing unique constraints
  • Consider (workspace_id, active_mode) if the goal is finding all inactive triggers
  • See inline comment on migration file

4. Confusing Naming (Minor)

  • Checking !active_mode negates a positively named flag, which hurts readability
  • Consider queue_mode or is_queued for clearer intent
  • OpenAPI description has typo: "incomming" → "incoming"
  • See inline comment on backend/windmill-api/src/triggers/http/handler.rs:1044

5. Missing Documentation (Minor)

  • No docstrings explaining the queue mode concept
  • Missing explanation of the relationship between active_mode and scheduled_for
  • No comments on when to use bulk operations vs individual job management

6. Potential Race Condition (Minor)

  • Bulk resume/cancel operations aren't transactional
  • Job state could change between query and update
  • Low risk but worth considering for production robustness
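One way to narrow that window is to check and update each job in a single step, so that only jobs still suspended at update time are touched. The sketch below uses an in-memory map as a stand-in for the job queue; the function name and data layout are assumptions for illustration, and in SQL the equivalent would be a single conditional UPDATE or a transaction.

```rust
use std::collections::HashMap;

/// Sentinel timestamp for 9999-12-31 23:59:59 UTC (Unix seconds).
const FAR_FUTURE_SECS: i64 = 253_402_300_799;

/// Hypothetical in-memory stand-in for the job queue: job id -> scheduled_for.
/// Resumes every job that is still suspended at the moment of the update and
/// returns the ids it actually changed, sorted ascending. The exclusive
/// borrow guarantees nothing else modifies the queue between check and write.
fn resume_suspended(queue: &mut HashMap<u64, i64>, now_secs: i64) -> Vec<u64> {
    let mut resumed = Vec::new();
    for (id, scheduled_for) in queue.iter_mut() {
        if *scheduled_for == FAR_FUTURE_SECS {
            *scheduled_for = now_secs;
            resumed.push(*id);
        }
    }
    resumed.sort_unstable();
    resumed
}

fn main() {
    let mut queue = HashMap::from([(1, FAR_FUTURE_SECS), (2, 1_000), (3, FAR_FUTURE_SECS)]);
    let resumed = resume_suspended(&mut queue, 2_000);
    println!("resumed jobs: {:?}", resumed);
}
```

Returning the ids that were actually changed also gives the caller an accurate count to report, rather than the count from an earlier query.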

Security Assessment

✅ Security Positives

  • Admin-only access for bulk operations (require_admin)
  • No SQL injection risks (properly parameterized queries)
  • Maintains existing authorization patterns

⚠️ Security Considerations

  • No audit logging for bulk operations (resume/cancel)
  • Consider adding audit trail for compliance/debugging

Performance Considerations

Potential Issues

  1. Index Bloat: 9 new composite indexes across trigger tables may impact:

    • Write performance during trigger create/update
    • Storage overhead
    • Vacuum/maintenance operations
  2. Query Performance: Jobs with far-future dates might:

    • Skew query optimizer statistics
    • Complicate index range scans
    • Affect queue polling queries
  3. Bulk Operations: No pagination on resume/cancel operations

    • Could timeout with thousands of queued jobs
    • Consider adding batching or limits
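A minimal sketch of the batching idea, assuming the job ids to resume or cancel have already been collected. `process_in_batches` is a hypothetical helper, not part of the PR; the closure stands in for whatever bounded operation runs per batch.

```rust
/// Hypothetical helper: apply `op` to job ids in batches of at most
/// `batch_size`, returning how many batches were processed.
fn process_in_batches<F>(job_ids: &[u64], batch_size: usize, mut op: F) -> usize
where
    F: FnMut(&[u64]),
{
    // `chunks` panics on a zero size, so fail early with a clearer message.
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut batches = 0;
    for batch in job_ids.chunks(batch_size) {
        op(batch); // e.g. one bounded update per batch
        batches += 1;
    }
    batches
}

fn main() {
    let ids: Vec<u64> = (1..=2_500).collect();
    let n = process_in_batches(&ids, 1_000, |batch| {
        println!("resuming {} jobs", batch.len());
    });
    println!("{} batches", n);
}
```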

Recommendations

  • Monitor index usage post-deployment
  • Add query performance tests for queue polling with suspended jobs
  • Consider pagination for bulk operations

SQLx Query Review

✅ Good Practices Observed

  • Explicit column selection (avoiding SELECT *)
  • Proper parameterization
  • Type-safe queries via sqlx macros

Observations

  • Most query changes are just adding active_mode to existing queries (good)
  • No performance-critical queries appear to be negatively impacted

Missing Elements

  1. Tests: No test files included in the PR

    • Need unit tests for queue mode logic
    • Integration tests for bulk operations
    • Edge case tests (switching between modes, race conditions)
  2. Documentation:

    • No README or doc updates explaining the feature
    • No migration guide for users
    • No API documentation for new endpoints
  3. Observability:

    • No metrics for suspended job counts
    • No logging of mode transitions
    • Consider adding metrics for monitoring

Recommendations

High Priority

  1. Remove or implement process_queued_jobs - Dead code
  2. Add tests - Critical for reliability
  3. Add audit logging - For bulk operations
  4. Fix typo - "incomming" in OpenAPI spec

Medium Priority

  1. Consider replacing magic date - Use explicit suspended field
  2. Add documentation - Feature explanation and API docs
  3. Review index strategy - Ensure they provide value
  4. Add pagination - For bulk operations

Low Priority

  1. Improve naming - queue_mode vs !active_mode
  2. Add docstrings - Explain the feature
  3. Add observability - Metrics and logging

Testing Instructions

To verify the queue mode feature through the app interface:

  1. Navigate to any trigger configuration page (HTTP Routes, WebSocket, Kafka, NATS, MQTT, Postgres, SQS, GCP Pub/Sub, or Email triggers).
  2. When creating or editing a trigger, locate the "Active Mode" toggle or checkbox.
  3. With Active Mode disabled (set to false), create the trigger and send test events to it. Instead of executing immediately, these events should appear as suspended jobs in the queue with a scheduled date far in the future (9999-12-31).
  4. Open the runs or queue view and confirm that the suspended jobs are listed.
  5. As an admin user, use the bulk operations feature to either resume all suspended jobs for that trigger (which should execute them immediately) or cancel them in bulk.
  6. Verify that when Active Mode is enabled (true), events execute immediately as normal.
  7. Switch a trigger between active and queue modes to ensure the behavior changes correctly.
  8. Check that non-admin users cannot access the bulk resume/cancel operations.


Overall Assessment

Verdict: ✅ Approve with recommendations

The implementation is solid and follows established patterns. The main concerns are:

  • Unused process_queued_jobs field (cleanup needed)
  • Magic date pattern (architectural concern, not blocking)
  • Missing tests (should be addressed before merge)
  • Lack of documentation (important for users)

The feature provides useful functionality for managing trigger load and debugging. With the addition of tests and documentation, this would be production-ready. The magic date approach works but consider refactoring to a more explicit design in a future iteration.


pub is_flow: bool,
pub enabled: Option<bool>,
pub active_mode: Option<bool>,
pub process_queued_jobs: Option<bool>,
Contributor

Unused field: The process_queued_jobs field is defined in BaseTriggerData but appears to be unused throughout the codebase. This could be:

  1. Dead code that should be removed
  2. An incomplete feature implementation
  3. Reserved for future use

Recommendation: Either implement the functionality for this field or remove it to avoid confusion. If it's intended for future use, add a comment explaining its purpose.

ALTER TABLE email_trigger
ADD COLUMN active_mode BOOLEAN NOT NULL DEFAULT TRUE;

CREATE INDEX idx_gcp_trigger_active_mode ON gcp_trigger(workspace_id,path,active_mode);
Contributor

Migration Safety - Good Practice: The migration uses NOT NULL DEFAULT TRUE which is excellent for backward compatibility. Existing triggers will automatically be in active mode, preserving current behavior.

Index Design Consideration: The composite indexes include workspace_id, path, active_mode. Consider whether queries will filter by active_mode frequently enough to justify these indexes. If most queries are just by (workspace_id, path), these indexes might be redundant with the existing primary key/unique constraints.

Suggestion: Review query patterns to ensure these indexes provide value. If active_mode filtering is rare, simpler indexes on just (workspace_id, active_mode) might be more appropriate for finding all inactive triggers.

};
use windmill_common::{error, jobs::JobTriggerKind, utils::require_admin};

pub async fn resume_suspended_trigger_jobs(
Contributor

Security & Authorization - Good: The function correctly requires admin privileges before allowing bulk operations on suspended trigger jobs.

Potential Race Condition: Between checking for suspended jobs and updating them, the job state could change. Consider using a transaction or adding additional safeguards.

Documentation Needed: Add docstring comments explaining:

  • What "suspended trigger jobs" means in this context
  • The relationship between active_mode = false and scheduled_for = 9999-12-31
  • When this endpoint should be used vs individual job management

});

let user_db = user_db.unwrap_or_else(|| UserDB::new(db.clone()));
let scheduled_for = active_mode
Contributor

Critical Logic: This is the core implementation of queue mode. When active_mode is false, jobs are scheduled for the far future date (9999-12-31 23:59:59), effectively "suspending" them until manually processed.

Potential Issues:

  1. Year 9999 Problem: While unlikely to cause issues in practice, this magic date could cause problems with:

    • Time arithmetic/duration calculations
    • Reporting and analytics queries
    • Database date range constraints
  2. Better Alternative: Consider using a dedicated suspended boolean field on v2_job_queue instead of overloading scheduled_for. This would be more explicit and avoid potential edge cases.

  3. Missing Null Check: The code uses active_mode.filter(|active_mode| !*active_mode) which only sets scheduled_for when active_mode is Some(false). Ensure this aligns with the expected behavior when active_mode is None.
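The three cases can be checked with a small standalone sketch. Only the `filter` call is taken from the expression quoted above; the `.map(...)` step, the function name, and the string constant are assumptions made so the example runs on its own.

```rust
/// Sentinel date from the review, kept as a string for this sketch.
const FAR_FUTURE: &str = "9999-12-31 23:59:59";

/// Only `Some(false)` survives the filter, so only that case yields an
/// override; `None` and `Some(true)` both fall through to `None`.
fn scheduled_for_override(active_mode: Option<bool>) -> Option<&'static str> {
    active_mode
        .filter(|active_mode| !*active_mode)
        .map(|_| FAR_FUTURE) // assumed follow-up step, not shown in the diff
}

fn main() {
    for mode in [None, Some(true), Some(false)] {
        println!("{:?} -> {:?}", mode, scheduled_for_override(mode));
    }
}
```

Under these assumptions a missing `active_mode` behaves like `true`, which matches the column default described in the migration.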

.map_err(|e| e.into_response())?;

let trigger_info = TriggerMetadata::new(Some(trigger.path.clone()), JobTriggerKind::Http);
if !trigger.active_mode {
Contributor

Confusing negation: At first glance this condition looks inverted, but it is consistent with the documented behavior. The OpenAPI description says "If set to false, each incomming event will be suspend job until ran manually", and the code enters this branch on !trigger.active_mode, that is, when active_mode is false.

Clarification Needed:

  • Is the negated check intentional?
  • Should the variable be named differently to avoid confusion (e.g., queue_mode instead of !active_mode)?
  • The OpenAPI description at line ~240 of openapi.yaml aligns with this logic, but the wording is confusing.

Recommendation: Consider renaming to make the intent clearer, or add a comment explaining why we check for !active_mode.
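If renaming the column is too invasive, a small accessor can give call sites a positive name. The sketch below uses a hypothetical, trimmed-down `Trigger` type; only the `active_mode` field comes from the PR, and `is_queue_mode` is an invented name.

```rust
/// Hypothetical, trimmed-down trigger type; only `active_mode` is from the PR.
struct Trigger {
    active_mode: bool,
}

impl Trigger {
    /// Reads positively at call sites instead of `!trigger.active_mode`.
    fn is_queue_mode(&self) -> bool {
        !self.active_mode
    }
}

fn main() {
    let trigger = Trigger { active_mode: false };
    if trigger.is_queue_mode() {
        println!("event will be stored as a suspended job");
    }
}
```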

active_mode:
type: boolean
default: true
description: If set to false, each incomming event will be suspend job until ran manually or set it to true
Contributor

Typographical error found: incomming should be incoming. Additionally, consider rephrasing will be suspend job until ran manually or set it to true to improve clarity (e.g., will suspend the job until run manually or reset to true).

Suggested change
- description: If set to false, each incomming event will be suspend job until ran manually or set it to true
+ description: If set to false, each incoming event will suspend the job until run manually or reset to true

2 participants