chore: simplify alert creation and evaluation #1388
- Removed stream name, condition config, and aggregate config from the alert request
- Added query as a parameter
- Renamed `aggregateConfig` to `thresholdConfig`

New payload:

```json
{
    "severity": "high",
    "title": "error count",
    "alertType": "threshold",
    "query": "SELECT COUNT(`body`) AS `COUNT_body` FROM `test1` WHERE `severity_text` = 'ERROR' LIMIT 500",
    "thresholdConfig": {
        "operator": ">",
        "value": 1000
    },
    "evalConfig": {
        "rollingWindow": {
            "evalStart": "5h",
            "evalEnd": "now",
            "evalFrequency": 1
        }
    },
    "targets": [
        "01K0XDNX76DXGHQ1T32XG63K27"
    ]
}
```
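To make the payload shape concrete, here is a std-only Rust sketch mirroring the JSON fields above. The names are illustrative only — the server's real types live in `src/alerts/mod.rs` and derive serde — and the `validate` helper here is a hypothetical shape check, not the server's validation.

```rust
// Hypothetical mirror of the v2 alert payload; field names follow the JSON above.
#[derive(Debug)]
struct ThresholdConfig {
    operator: String, // one of >, <, =, !=, >=, <=
    value: f64,
}

#[derive(Debug)]
struct AlertRequest {
    severity: String,
    title: String,
    alert_type: String, // "threshold"
    query: String,
    threshold_config: ThresholdConfig,
    targets: Vec<String>,
}

// Minimal shape check: operator must be a known comparator and at least one target set.
fn validate(req: &AlertRequest) -> Result<(), String> {
    const OPS: [&str; 6] = [">", "<", "=", "!=", ">=", "<="];
    if !OPS.contains(&req.threshold_config.operator.as_str()) {
        return Err(format!("unknown operator: {}", req.threshold_config.operator));
    }
    if req.targets.is_empty() {
        return Err("at least one target is required".to_string());
    }
    Ok(())
}

fn main() {
    let req = AlertRequest {
        severity: "high".to_string(),
        title: "error count".to_string(),
        alert_type: "threshold".to_string(),
        query: "SELECT COUNT(`body`) AS `COUNT_body` FROM `test1` WHERE `severity_text` = 'ERROR' LIMIT 500".to_string(),
        threshold_config: ThresholdConfig { operator: ">".to_string(), value: 1000.0 },
        targets: vec!["01K0XDNX76DXGHQ1T32XG63K27".to_string()],
    };
    println!("valid: {}", validate(&req).is_ok());
}
```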
**Walkthrough**

This update refactors the alert system from an aggregate-based model to a query-and-threshold-based model, introduces migration from v1 to v2 alert configurations, and streamlines alert evaluation. It also adds robust querier selection and dispatch in the cluster handler, updates alert creation scripts, and adjusts validation and authorization logic.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    participant User
    participant HTTPHandler
    participant Alerts
    participant Storage
    participant Querier
    User->>HTTPHandler: Create/Update Alert (v1 or v2)
    HTTPHandler->>Alerts: Validate and Load Alert
    Alerts->>Storage: Read alert JSON
    alt v1 alert
        Alerts->>Alerts: Migrate v1 to v2 (build query, threshold)
        Alerts->>Storage: Save migrated v2 alert
    end
    Alerts-->>HTTPHandler: AlertConfig (v2)
    HTTPHandler-->>User: Response
    User->>HTTPHandler: Trigger Alert Evaluation
    HTTPHandler->>Alerts: evaluate_alert(alert)
    Alerts->>Querier: Execute alert.query (local/remote)
    Querier-->>Alerts: Numeric result
    Alerts->>Alerts: Compare result to threshold
    Alerts->>Storage: Update alert state
    Alerts-->>HTTPHandler: Evaluation outcome
    HTTPHandler-->>User: Response
```
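The "alt v1 alert" branch above hinges on detecting a v1 alert at load time. A later comment in this review notes that v1 alerts are identified by the presence of a `stream` field or the absence of a `query` field. A std-only sketch of that decision rule (the function name and signature are hypothetical — the real logic inspects raw JSON in `src/alerts/mod.rs`):

```rust
// Illustrative version-detection rule: explicit tags win, otherwise fall back
// to structural detection on already-extracted facts about the stored JSON.
fn needs_v1_to_v2_migration(version: Option<&str>, has_stream: bool, has_query: bool) -> bool {
    match version {
        Some("v2") => false, // already current
        Some("v1") => true,  // explicitly tagged v1
        // No (or unknown) version field: a `stream` field or a missing
        // `query` field marks a legacy alert.
        _ => has_stream || !has_query,
    }
}

fn main() {
    // A stored v1 alert (stream-based, no query) should be migrated on load.
    println!("{}", needs_v1_to_v2_migration(None, true, false)); // true
    // A freshly created v2 alert passes through unchanged.
    println!("{}", needs_v1_to_v2_migration(Some("v2"), false, true)); // false
}
```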
**Estimated code review effort:** 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 4
🔭 Outside diff range comments (1)
src/alerts/alerts_utils.rs (1)
`206-234`: **Consider returning `Result` instead of defaulting to 0.0.**

The function silently returns 0.0 when it can't extract a value, which could mask errors in alert evaluation. It also only handles `Float64Array` and `Int64Array`.
```diff
-fn get_final_value(records: Vec<RecordBatch>) -> f64 {
+fn get_final_value(records: Vec<RecordBatch>) -> Result<f64, AlertError> {
     trace!("records-\n{records:?}");
-    if let Some(f) = records
+    // Check if we have any records
+    let first_batch = records
         .first()
-        .and_then(|batch| {
-            trace!("batch.column(0)-\n{:?}", batch.column(0));
-            batch.column(0).as_any().downcast_ref::<Float64Array>()
-        })
-        .map(|array| {
-            trace!("array-\n{array:?}");
-            array.value(0)
-        })
-    {
-        f
-    } else {
-        records
-            .first()
-            .and_then(|batch| {
-                trace!("batch.column(0)-\n{:?}", batch.column(0));
-                batch.column(0).as_any().downcast_ref::<Int64Array>()
-            })
-            .map(|array| {
-                trace!("array-\n{array:?}");
-                array.value(0)
-            })
-            .unwrap_or_default() as f64
-    }
+        .ok_or_else(|| AlertError::CustomError("No records returned from query".to_string()))?;
+
+    if first_batch.num_rows() == 0 {
+        return Err(AlertError::CustomError("Query returned empty batch".to_string()));
+    }
+
+    let column = first_batch.column(0);
+    trace!("batch.column(0)-\n{:?}", column);
+
+    // Try different numeric array types
+    if let Some(array) = column.as_any().downcast_ref::<Float64Array>() {
+        trace!("array-\n{array:?}");
+        Ok(array.value(0))
+    } else if let Some(array) = column.as_any().downcast_ref::<Int64Array>() {
+        trace!("array-\n{array:?}");
+        Ok(array.value(0) as f64)
+    } else {
+        Err(AlertError::CustomError(format!(
+            "Unsupported array type for alert value: {:?}",
+            column.data_type()
+        )))
+    }
 }
```

Also update the calling code:

```diff
-    Ok(get_final_value(records))
+    get_final_value(records)
```
🧹 Nitpick comments (6)
src/handlers/http/cluster/mod.rs (1)
`70-74`: **Consider memory management for the querier map.**

The global `QUERIER_MAP` could grow unbounded if queriers are frequently added and removed from the cluster. Consider implementing a cleanup mechanism or size limit to prevent potential memory issues in long-running deployments.

Also applies to: 1116-1122
src/alerts/alerts_utils.rs (3)
`68-77`: **Consider simplifying the pattern matching.**

Since `EvalConfig` currently only has one variant, the pattern matching could be simplified:

```diff
-fn extract_time_range(eval_config: &super::EvalConfig) -> Result<TimeRange, AlertError> {
-    let (start_time, end_time) = match eval_config {
-        super::EvalConfig::RollingWindow(rolling_window) => (&rolling_window.eval_start, "now"),
-    };
-
-    TimeRange::parse_human_time(start_time, end_time)
-        .map_err(|err| AlertError::CustomError(err.to_string()))
-}
+fn extract_time_range(eval_config: &super::EvalConfig) -> Result<TimeRange, AlertError> {
+    match eval_config {
+        super::EvalConfig::RollingWindow(rolling_window) => {
+            TimeRange::parse_human_time(&rolling_window.eval_start, "now")
+                .map_err(|err| AlertError::CustomError(err.to_string()))
+        }
+    }
+}
```
`93-127`: **Well-structured local query execution with proper error handling.**

The function correctly handles stream creation, query planning, and execution. Good practice adding context to error messages.

Consider adding a comment explaining why `Either::Right` represents an error case:

```diff
 let records = match records {
     Either::Left(rbs) => rbs,
     Either::Right(_) => {
+        // Right variant indicates a streaming response, which is not expected for aggregate queries
         return Err(AlertError::CustomError(
             "Query returned no results".to_string(),
         ));
     }
```
`151-164`: **Consider improving the error message for better debugging.**

The function handles all numeric JSON types correctly.

```diff
 fn convert_result_to_f64(result_value: serde_json::Value) -> Result<f64, AlertError> {
     if let Some(value) = result_value.as_f64() {
         Ok(value)
     } else if let Some(value) = result_value.as_i64() {
         Ok(value as f64)
     } else if let Some(value) = result_value.as_u64() {
         Ok(value as f64)
     } else {
         Err(AlertError::CustomError(
-            "Query result is not a number".to_string(),
+            format!("Query result is not a number, got: {:?}", result_value),
         ))
     }
 }
```

src/alerts/mod.rs (2)
`676-699`: **Well-implemented SQL query construction.**

Good handling of aggregate functions and proper column quoting.

Consider validating the stream name to prevent SQL injection:

```diff
 fn build_base_query(
     aggregate_function: &AggregateFunction,
     aggregate_config: &JsonValue,
     stream: &str,
 ) -> Result<String, AlertError> {
+    // Validate stream name contains only valid characters
+    if !stream.chars().all(|c| c.is_alphanumeric() || c == '_' || c == '-') {
+        return Err(AlertError::CustomError(format!("Invalid stream name: {}", stream)));
+    }
+
     let column = aggregate_config["column"].as_str().unwrap_or("*");
```
`925-961`: **Smart recursive analysis of query structure.**

The implementation correctly identifies aggregate queries that return single values.

Consider adding support for queries with `LIMIT 1` that might return a single value:

```diff
 fn is_logical_plan_aggregate(plan: &LogicalPlan) -> bool {
     match plan {
         // Direct aggregate: SELECT COUNT(*), AVG(col), etc.
         LogicalPlan::Aggregate(_) => true,

         // Projection over aggregate: SELECT COUNT(*) as total, SELECT AVG(col) as average
         LogicalPlan::Projection(Projection { input, expr, .. }) => {
             // Check if input contains an aggregate and we have exactly one expression
             let is_aggregate_input = Self::is_logical_plan_aggregate(input);
             let single_expr = expr.len() == 1;
             is_aggregate_input && single_expr
         }

+        // Limit over aggregate: SELECT COUNT(*) FROM table LIMIT 1
+        LogicalPlan::Limit(limit) => {
+            limit.fetch.map_or(false, |f| f == 1) && Self::is_logical_plan_aggregate(&limit.input)
+        }
+
         // Recursively check wrapped plans (Filter, Limit, Sort, etc.)
         _ => {
             // Use inputs() method to get all input plans
             plan.inputs()
                 .iter()
                 .any(|input| Self::is_logical_plan_aggregate(input))
         }
     }
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)

- `Cargo.lock` is excluded by `!**/*.lock`

📒 Files selected for processing (7)

- `Cargo.toml` (1 hunks)
- `resources/ingest_demo_data.sh` (3 hunks)
- `src/alerts/alerts_utils.rs` (5 hunks)
- `src/alerts/mod.rs` (15 hunks)
- `src/handlers/http/alerts.rs` (4 hunks)
- `src/handlers/http/cluster/mod.rs` (4 hunks)
- `src/handlers/http/query.rs` (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
src/handlers/http/query.rs (2)
Learnt from: parmesant
PR: #1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.
Learnt from: nikhilsinhaparseable
PR: #1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.
src/handlers/http/alerts.rs (3)
Learnt from: nikhilsinhaparseable
PR: #1305
File: src/handlers/http/users/dashboards.rs:0-0
Timestamp: 2025-05-01T10:27:56.858Z
Learning: The `add_tile()` function in `src/handlers/http/users/dashboards.rs` should use `get_dashboard_by_user(dashboard_id, &user_id)` instead of `get_dashboard(dashboard_id)` to ensure proper authorization checks when modifying a dashboard.
Learnt from: nikhilsinhaparseable
PR: #1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.
Learnt from: de-sh
PR: #1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern !PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)
ensures proper error handling in both modes.
src/handlers/http/cluster/mod.rs (4)
Learnt from: nikhilsinhaparseable
PR: #1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.
Learnt from: nikhilsinhaparseable
PR: #1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
Learnt from: de-sh
PR: #1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern !PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)
ensures proper error handling in both modes.
Learnt from: parmesant
PR: #1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.
src/alerts/mod.rs (1)
Learnt from: de-sh
PR: #1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern !PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)
ensures proper error handling in both modes.
🧬 Code Graph Analysis (1)
src/handlers/http/alerts.rs (1)
- `src/utils/mod.rs` (1): `user_auth_for_query` (79-88)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: coverage
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
🔇 Additional comments (17)
src/handlers/http/query.rs (1)
`59-59`: **LGTM!**

Making `TIME_ELAPSED_HEADER` public is appropriate as it's now used by the cluster handler module for query response timing extraction.

src/handlers/http/alerts.rs (2)
`57-57`: **LGTM!**

The streamlined validation approach using `alert.validate(session_key)` properly encapsulates the validation logic within the alert structure.

`82-82`: **Consistent authorization approach.**

The direct use of `user_auth_for_query` with `alert.query` across GET, DELETE, and PUT handlers provides a clean and consistent authorization pattern that aligns well with the query-based alert model.

Also applies to: 96-96, 129-129
src/handlers/http/cluster/mod.rs (1)
`1123-1208`: **Well-designed querier selection with proper concurrency control.**

The implementation excellently handles:
- Concurrent liveness checks with semaphore-based rate limiting
- Efficient map updates for live/dead queriers
- Smart fallback from round-robin to LRU strategy
This design should scale well for typical cluster sizes.
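The fallback from round-robin to LRU described above can be sketched in std-only Rust. This is an illustrative reading, not the PR's actual code (which keys a global map by querier id in `src/handlers/http/cluster/mod.rs`): the cursor-based round-robin is tried first, and if the next querier in rotation is busy, selection falls back to the least-recently-used available querier. All names here are hypothetical.

```rust
#[derive(Debug, Clone)]
struct QuerierStatus {
    id: String,
    available: bool,
    last_used: Option<u64>, // logical timestamp; None = never used
}

// Pick the next querier in rotation, but only if it is available.
fn round_robin(queriers: &[QuerierStatus], cursor: &mut usize) -> Option<usize> {
    if queriers.is_empty() {
        return None;
    }
    let idx = *cursor % queriers.len(); // wrap-around
    *cursor = (idx + 1) % queriers.len();
    queriers[idx].available.then_some(idx)
}

// Among available queriers, prefer never-used ones, then the oldest `last_used`.
// `None < Some(_)` under Option's Ord impl, so fresh queriers win the min.
fn least_recently_used(queriers: &[QuerierStatus]) -> Option<usize> {
    queriers
        .iter()
        .enumerate()
        .filter(|(_, q)| q.available)
        .min_by_key(|(_, q)| q.last_used)
        .map(|(i, _)| i)
}

fn select(queriers: &[QuerierStatus], cursor: &mut usize) -> Option<usize> {
    round_robin(queriers, cursor).or_else(|| least_recently_used(queriers))
}

fn main() {
    let qs = vec![
        QuerierStatus { id: "a".to_string(), available: false, last_used: Some(5) },
        QuerierStatus { id: "b".to_string(), available: true, last_used: None },
        QuerierStatus { id: "c".to_string(), available: true, last_used: Some(1) },
    ];
    let mut cursor = 0;
    // "a" is next in rotation but busy, so LRU picks the never-used "b".
    println!("selected index: {:?}", select(&qs, &mut cursor));
}
```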
src/alerts/alerts_utils.rs (8)
`21-39`: **LGTM! Clean imports aligned with the new query-based alert system.**

The imports are well-organized and properly support the refactored alert evaluation logic, including the new cluster query dispatch functionality.
`54-66`: **LGTM! Clean refactor to query-based evaluation.**

The simplified evaluation flow clearly separates concerns: time range extraction, query execution, threshold evaluation, and state update. Good use of the `?` operator for error propagation.
`129-150`: **LGTM! Clean implementation of remote query execution.**

The Query request is properly configured for alert evaluation (non-streaming, aggregate results). Good separation of concerns with the JSON-to-f64 conversion delegated to a helper function.
`166-175`: **LGTM! Complete and correct threshold evaluation.**

All comparison operators are properly implemented.
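For reference, the comparison step amounts to a small match over the six comparators. This is a minimal sketch — the actual operator enum and function names live in `src/alerts/` and may differ:

```rust
// Hypothetical mirror of the alert comparison operators (>, <, =, !=, >=, <=).
#[derive(Debug, Clone, Copy)]
enum AlertOperator { Gt, Lt, Eq, Neq, Ge, Le }

// Returns true when the queried value breaches the configured threshold.
fn threshold_breached(op: AlertOperator, actual: f64, threshold: f64) -> bool {
    match op {
        AlertOperator::Gt => actual > threshold,
        AlertOperator::Lt => actual < threshold,
        AlertOperator::Eq => actual == threshold,
        AlertOperator::Neq => actual != threshold,
        AlertOperator::Ge => actual >= threshold,
        AlertOperator::Le => actual <= threshold,
    }
}

fn main() {
    // With the PR-description payload (operator ">", value 1000), a count of 1500 fires.
    println!("{}", threshold_breached(AlertOperator::Gt, 1500.0, 1000.0)); // true
}
```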
`177-204`: **Excellent state management with informative alert messages.**

The function correctly handles state transitions and generates detailed messages that include all relevant context (query, threshold, actual value, evaluation window). The optimization to avoid unnecessary state updates is a nice touch.
`236-328`: **Well-implemented filter string generation with proper SQL escaping.**

The function correctly handles all WHERE operators with appropriate escaping for LIKE patterns. Good validation for NULL operators and clear error messages.
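The escaping praised here typically means two things: doubling single quotes for SQL string literals, and escaping `%`/`_` wildcards inside LIKE patterns. A hedged sketch — the exact helper names and escape character used in `src/alerts/alerts_utils.rs` may differ:

```rust
// Double single quotes so the value is safe inside a '...' SQL literal.
fn escape_sql_string(value: &str) -> String {
    value.replace('\'', "''")
}

// Additionally neutralize LIKE wildcards; backslash is escaped first so it
// can serve as the ESCAPE character.
fn escape_like_pattern(value: &str) -> String {
    escape_sql_string(value)
        .replace('\\', "\\\\")
        .replace('%', "\\%")
        .replace('_', "\\_")
}

// Hypothetical "contains" filter builder using the escaped pattern.
fn contains_filter(column: &str, value: &str) -> String {
    format!("\"{}\" LIKE '%{}%' ESCAPE '\\'", column, escape_like_pattern(value))
}

fn main() {
    println!("{}", contains_filter("severity_text", "ERROR"));
}
```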
`330-365`: **LGTM! Clean value type handling.**

The `ValueType` enum and its trait implementations provide good type safety for filter values.
`78-92`: **Incomplete mode coverage in `execute_alert_query`.**

The `execute_alert_query` function currently handles only `Mode::All`, `Mode::Query`, and `Mode::Prism`, but the `Mode` enum (`src/option.rs`) also defines two additional variants:

- `Mode::Ingest`
- `Mode::Index`

Please confirm whether alerts should also be evaluated under these modes (for example, by routing them through `execute_local_query` or another handler) or if returning a `CustomError` is the intended behavior for them.

src/alerts/mod.rs (5)
`19-64`: **LGTM! Appropriate imports and version update.**

The imports properly support the new query-based system and migration logic. Boxing `AlertConfig` in `AlertTask::Create` is a good choice for handling async migration.
`572-603`: **Excellent migration implementation from v1 to v2.**

The migration logic is comprehensive and handles all aspects of the v1 alert structure. Good practice to save the migrated alert immediately.
`870-923`: **Robust query validation with proper authorization checks.**

Excellent implementation that validates query syntax, user permissions, and ensures the query is suitable for alerting (returns a single aggregate value).
`1079-1170`: **Excellent backward compatibility handling in alert loading.**

The function robustly handles version detection and migration with good error recovery. The logic to detect v1 alerts by checking for the presence of the `stream` field or the absence of the `query` field is thorough.
`1044-1068`: **LGTM! Appropriate error handling for invalid queries.**

The new `InvalidAlertQuery` error variant is properly integrated with the correct HTTP status code.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/alerts/mod.rs (1)
`96-104`: **Consider explicit error handling for unknown versions.**

This matches a previous review comment. Defaulting unknown versions to V2 might hide corruption or format issues. Consider adding logging for unknown versions or returning a `Result` type.
🧹 Nitpick comments (1)
src/alerts/mod.rs (1)
`1118-1208`: **Robust alert loading with automatic migration.**

The loading logic properly handles the transition from v1 to v2 alerts:
- Safe version detection using raw JSON parsing
- Automatic migration of v1 alerts during load
- Proper fallback for alerts without version fields
- Good error handling with continuation on individual failures
The retry mechanism for failed AlertTask creation is a nice touch for resilience.
Consider implementing atomic migration to avoid potential inconsistencies if the process fails partway through:
```rust
// Before processing alerts, create a migration transaction
let migration_batch = Vec::new();
// Collect all migrations first, then apply atomically
for migration in migration_batch {
    store.put_alert(migration.id, &migration).await?;
}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `resources/ingest_demo_data.sh` (3 hunks)
- `src/alerts/mod.rs` (15 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- resources/ingest_demo_data.sh
🧰 Additional context used
🧠 Learnings (1)
src/alerts/mod.rs (1)
Learnt from: de-sh
PR: #1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern !PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)
ensures proper error handling in both modes.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
🔇 Additional comments (6)
src/alerts/mod.rs (6)
`53-63`: **LGTM! Helper structures are well-designed.**

The `BasicAlertFields` struct provides a clean way to extract common fields during migration, and updating the version constant to "v2" aligns with the migration objectives.
`396-516`: **Well-structured simplification of configuration.**

The new `ThresholdConfig` and `LogicalOperator` structures are cleaner and more focused than the previous aggregate-based approach. The implementations are standard and follow Rust conventions properly.
`520-569`: **Structural changes align with the v2 alert model.**

The replacement of `stream` with `query` and `aggregates` with `threshold_config` correctly implements the shift from aggregate-based to query-based alerts. The conversion logic is sound.
`909-999`: **Excellent validation improvements using DataFusion.**

The new validation approach is much more robust:

- Proper query syntax validation using DataFusion's parser
- Authorization checks integrated with the existing RBAC system
- Intelligent aggregate query detection through logical plan analysis
- Comprehensive error handling for various failure modes

The `is_logical_plan_aggregate` method correctly identifies aggregate queries by checking for `Aggregate` nodes and `Projection` over aggregates.
`1082-1083`: **Appropriate error handling for query validation.**

Adding the `InvalidAlertQuery` error variant provides clear error reporting for the new query-based validation logic.
`573-891`: **Ensure proper validation and escaping of stream and column identifiers.**

While the `value` strings in WHERE clauses are correctly escaped, the `stream` and `column` inputs are interpolated directly into the SQL without any sanitization. This can allow injection if those fields contain unexpected characters. Please update the code to enforce safe identifier handling:

- In `build_query_from_v1` / `build_base_query` (src/alerts/mod.rs):
  - Validate that `stream` matches a strict pattern (e.g. `^[A-Za-z0-9_]+$`) before quoting.
  - Similarly, validate `column`, or fall back to `*` if it doesn't meet your naming rules.
  - Escape any internal double quotes in identifiers, or better yet, leverage a SQL-builder or identifier-quoting utility instead of `format!`.
  - Add unit tests to cover edge cases where `stream` or `column` contains malicious payloads.

⛔ Skipped due to learnings
Learnt from: de-sh PR: parseablehq/parseable#1236 File: src/prism/logstream/mod.rs:332-332 Timestamp: 2025-03-13T11:39:52.587Z Learning: SQL injection concerns can be ignored in this codebase as all SQL queries are run against immutable data streams, limiting the potential impact of any injection.
Learnt from: de-sh PR: parseablehq/parseable#1185 File: src/handlers/http/logstream.rs:255-261 Timestamp: 2025-02-14T09:49:25.818Z Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/handlers/http/cluster/mod.rs (1)
`1123-1208`: **Robust querier selection with good concurrency control.**

The implementation properly handles:
- Concurrency limits for liveness checks to prevent resource exhaustion
- Dynamic querier map updates based on liveness
- Fallback selection strategies (round-robin → LRU)
- Proper cleanup of dead queriers
Consider caching liveness results with a TTL for frequently called operations to reduce the overhead of checking every querier on each request.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- `Cargo.toml` (1 hunks)
- `resources/ingest_demo_data.sh` (3 hunks)
- `src/handlers/http/cluster/mod.rs` (4 hunks)
✅ Files skipped from review due to trivial changes (1)
- Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (1)
- resources/ingest_demo_data.sh
🧰 Additional context used
🧠 Learnings (1)
src/handlers/http/cluster/mod.rs (5)
Learnt from: nikhilsinhaparseable
PR: #1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.
Learnt from: nikhilsinhaparseable
PR: #1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
Learnt from: de-sh
PR: #1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern !PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)
ensures proper error handling in both modes.
Learnt from: nikhilsinhaparseable
PR: #1346
File: src/handlers/http/health_check.rs:81-90
Timestamp: 2025-06-16T02:04:58.990Z
Learning: In the shutdown function in src/handlers/http/health_check.rs, the design approach is to log errors from sync operations rather than propagate them. This is intentional because the shutdown function is called on SIGTERM/SIGINT signals, and the goal is to perform best-effort cleanup (syncing pending files to object storage) while allowing the shutdown to proceed regardless of sync failures. Logging provides debugging information without blocking the shutdown process.
Learnt from: parmesant
PR: #1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
🔇 Additional comments (5)
src/handlers/http/cluster/mod.rs (5)
`21-74`: **LGTM! Clean import additions and proper global state management.**

The new imports are well-organized and appropriate for the querier selection functionality. The `lazy_static` usage with `Arc<RwLock<>>` provides proper thread-safe access to the global querier state.
`1116-1121`: **Well-designed querier status tracking structure.**

The struct appropriately captures the necessary state for querier management: metadata, availability, and usage tracking for LRU selection.
`1210-1298`: **Well-implemented selection algorithms with proper edge-case handling.**

Both selection strategies are correctly implemented:
- Round-robin properly cycles through available queriers with wrap-around logic
- LRU correctly prioritizes never-used queriers and selects based on oldest usage time
- Edge cases (empty maps, missing entries) are handled appropriately
`1300-1307`: **Simple and correct querier availability management.**

The function properly marks queriers as available while preserving usage tracking for LRU selection.
`1309-1372`: **Excellent error handling that addresses previous review concerns.**

The function properly ensures queriers are marked available on all code paths:
- Serialization errors (line 1325)
- HTTP request errors (line 1340)
- Successful requests (line 1346)
This implementation correctly resolves the potential querier exhaustion issue identified in previous reviews.
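The guarantee described here — availability restored on every path — can also be enforced structurally with a drop guard. This is a sketch of the pattern only, not the PR's approach (which calls the mark-available helper explicitly on each error and success path); in the real async code the flag would be an `Arc<AtomicBool>` or an entry in the shared querier map rather than `Rc<RefCell<bool>>`.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Restores the querier's availability when dropped, on every exit path.
struct AvailabilityGuard {
    available: Rc<RefCell<bool>>,
}

impl Drop for AvailabilityGuard {
    fn drop(&mut self) {
        // Runs on success, early error returns, and panics alike.
        *self.available.borrow_mut() = true;
    }
}

// Hypothetical dispatch: reserve the querier, do fallible work, and rely on
// the guard so no path can leave the querier marked busy.
fn dispatch_query(available: Rc<RefCell<bool>>, fail: bool) -> Result<(), String> {
    *available.borrow_mut() = false; // reserve
    let _guard = AvailabilityGuard { available: available.clone() };
    if fail {
        return Err("serialization error".to_string()); // guard still restores availability
    }
    Ok(())
}

fn main() {
    let avail = Rc::new(RefCell::new(true));
    let _ = dispatch_query(avail.clone(), true);
    println!("available after error: {}", *avail.borrow()); // true
}
```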
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/alerts/mod.rs (2)
`621-627`: **Consider more robust enum parsing with explicit error handling.**

The migration functions use string matching with default fallbacks for enums (severity, alert operators, alert state). While functional, this could mask data corruption issues during migration.
Consider adding warning logs for unknown enum values:
```diff
 let severity = match severity_str.to_lowercase().as_str() {
     "critical" => Severity::Critical,
     "high" => Severity::High,
     "medium" => Severity::Medium,
     "low" => Severity::Low,
-    _ => Severity::Medium, // default
+    _ => {
+        warn!("Unknown severity '{}' in v1 alert migration, defaulting to Medium", severity_str);
+        Severity::Medium
+    }
 };
```

Similar patterns should be applied to operator and state parsing.
Also applies to: 824-832, 885-890
1133-1141: Consider batching storage operations during migration. The current migration approach saves each migrated alert individually to storage. For systems with many v1 alerts, this could result in numerous storage operations during startup.
Consider collecting migrated alerts and performing batch storage operations:
let mut migrated_alerts = Vec::new(); // Collect migrations first for raw_bytes in raw_objects { // ... migration logic ... if needs_migration { let migrated = AlertConfig::migrate_from_v1(&json_value, store.as_ref()).await?; migrated_alerts.push(migrated); } } // Batch save migrated alerts if !migrated_alerts.is_empty() { // Implement batch storage operation store.put_alerts_batch(&migrated_alerts).await?; }Also applies to: 1150-1156, 1170-1176
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/alerts/mod.rs (15 hunks)
🧰 Additional context used
🧠 Learnings (1)
src/alerts/mod.rs (2)
Learnt from: nikhilsinhaparseable
PR: #1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.746Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.
Learnt from: de-sh
PR: #1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (14)
src/alerts/mod.rs (14)
63-63: Version migration to v2 looks appropriate. The update to `CURRENT_ALERTS_VERSION = "v2"` and defaulting `AlertVerison` to `V2` aligns well with the alert system migration. Based on the retrieved learning, the server-controlled versioning approach with fallback to v2 is correctly implemented.
Also applies to: 92-93, 100-101
54-58: Helper struct is well-designed for migration. The `BasicAlertFields` struct appropriately captures common fields needed during v1 to v2 migration, keeping the migration code clean and organized.
396-399: New data structures align well with the simplified alert model. The `ThresholdConfig` struct and `LogicalOperator` enum are well-defined and follow Rust conventions. The transition from aggregate-based to threshold-based configuration simplifies the alert system appropriately.
Also applies to: 502-516
524-524: Query-based alert model is consistently implemented. The replacement of the `stream` field with `query`, and of `aggregates`/`aggregate_config` with `threshold_config`, is consistently applied across the `AlertRequest` and `AlertConfig` structs.
Also applies to: 526-526, 542-542, 544-544, 561-561, 563-563
573-602: Migration function structure is well-organized. The `migrate_from_v1` function follows a clear pattern of extracting components, building the migrated structure, and persisting it. The error handling and async design are appropriate.
637-650: SQL query building logic handles aggregate functions correctly. The query building logic properly handles different aggregate functions, including the special cases for `COUNT DISTINCT` and `COUNT` with specific columns. The SQL syntax generation follows standard conventions.
Also applies to: 676-710
773-809: SQL injection protection is properly implemented. The WHERE clause formatting includes proper SQL escaping using `value.replace('\'', "''")` for all string values, and uses double-quoted identifiers for column names, which provides good protection against SQL injection attacks.
940-961: Comprehensive query validation with proper security checks. The validation logic is well-structured with appropriate checks for:
- Non-empty queries
- Stream name resolution and distributed stream creation
- User authorization
- Aggregate query verification
The integration with existing authorization and stream management functions maintains consistency with the broader system.
964-999: Sophisticated aggregate query detection using DataFusion. The `is_query_aggregate` and `is_logical_plan_aggregate` functions provide robust validation by analyzing the logical plan structure. The recursive analysis correctly handles:
- Direct aggregates (`LogicalPlan::Aggregate`)
- Projections over aggregates with single expression validation
- Wrapped plans through recursive input checking
This approach is much more reliable than string-based query analysis.
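The recursive shape of that check can be mimicked with a toy plan enum. DataFusion's real `LogicalPlan` is far richer; this sketch only illustrates the recursion pattern the review describes, with invented variant names.

```rust
// Toy logical plan: just enough structure to show the recursive check.
enum Plan {
    Aggregate(Box<Plan>),
    Projection { exprs: usize, input: Box<Plan> },
    Filter(Box<Plan>),
    TableScan,
}

// A plan "is aggregate" if it is an Aggregate node, a single-expression
// projection over an aggregate, or wraps an aggregate deeper down.
fn is_aggregate(plan: &Plan) -> bool {
    match plan {
        Plan::Aggregate(_) => true,
        Plan::Projection { exprs, input } => *exprs == 1 && is_aggregate(input),
        Plan::Filter(input) => is_aggregate(input),
        Plan::TableScan => false,
    }
}
```

A two-expression projection over an aggregate fails the check, which is what enforces the "single value" requirement for alert queries.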
1144-1177: Robust migration detection and handling during alert loading. The loading logic effectively handles the v1 to v2 migration with multiple detection strategies:
- Explicit version checking
- Null query field detection
- Presence of legacy stream field
The fallback to migration for alerts without version fields ensures backward compatibility. Error handling with `continue` statements prevents single alert failures from blocking the entire loading process.
1180-1201: Alert task creation includes appropriate retry logic. The retry mechanism for failed alert task creation provides resilience against transient failures. Boxing the alert configs aligns with the updated `AlertTask::Create` variant signature.
22-22: Import updates support new functionality appropriately. The added imports for DataFusion logical plan analysis, JSON handling, and query functions align well with the new validation and migration capabilities.
Also applies to: 29-30, 44-44, 46-46, 49-49
84-84: Method updates consistently apply the new data model. The updates to use `Box<AlertConfig>`, the `query` field, and the `threshold_config` field are consistently applied across all relevant methods, maintaining API coherence.
Also applies to: 899-899, 901-901, 1217-1217, 1304-1304
573-1207: Excellent refactoring that successfully modernizes the alert system. This comprehensive migration from aggregate-based to query-based alerts demonstrates:
✅ Robust migration strategy with backward compatibility
✅ Enhanced security through proper SQL escaping and authorization checks
✅ Improved validation using DataFusion's logical plan analysis
✅ Consistent error handling throughout the migration and validation logic
✅ Clean code structure with well-organized helper functions
The refactoring significantly simplifies the alert model while maintaining full functionality and providing a smooth upgrade path from v1 to v2 alerts.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/alerts/alerts_utils.rs (1)
207-222: Fix Clippy format string warnings. The number formatting logic is excellent, but there are Clippy warnings that should be addressed.
Apply this diff to fix the format string warnings:
```diff
- format!("{:.0}", value)
+ format!("{value:.0}")
- format!("{:.4}", value)
+ format!("{value:.4}")
- format!("{:.2}", value)
+ format!("{value:.2}")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/alerts/alerts_utils.rs (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
src/alerts/alerts_utils.rs (1)
Learnt from: nikhilsinhaparseable
PR: #1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.746Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.
🪛 GitHub Actions: Lint, Test and Coverage Report
src/alerts/alerts_utils.rs
[error] 211-211: Clippy lint 'uninlined_format_args': variables can be used directly in the format! string. Change `format!("{:.0}", value)` to `format!("{value:.0}")`.
[error] 214-214: Clippy lint 'uninlined_format_args': variables can be used directly in the format! string. Change `format!("{:.4}", value)` to `format!("{value:.4}")`.
[error] 220-220: Clippy lint 'uninlined_format_args': variables can be used directly in the format! string. Change `format!("{:.2}", value)` to `format!("{value:.2}")`.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (11)
src/alerts/alerts_utils.rs (11)
21-41: LGTM! Import changes support the new query-based alert system. The new imports appropriately support the refactored alert evaluation logic, including remote query execution via cluster handlers and local query processing with DataFusion.
54-66: LGTM! Excellent simplification of alert evaluation logic. The refactored `evaluate_alert` function properly implements the new query-and-threshold-based model, replacing the complex aggregate-based evaluation with a much cleaner flow.
68-76: LGTM! Clean time range extraction implementation. The function correctly extracts time ranges from the evaluation configuration and handles errors appropriately.
78-91: LGTM! Proper mode-based query execution dispatch. The function correctly routes alert query execution based on the server mode, supporting both local and remote execution patterns.
93-127: LGTM! Solid local query execution implementation. The function properly integrates with the existing query infrastructure and handles all execution steps with appropriate error handling.
129-149: LGTM! Proper remote query execution implementation. The function correctly constructs query requests and integrates with the cluster query infrastructure for remote execution.
151-164: LGTM! Robust JSON to numeric conversion. The function properly handles various JSON numeric types and provides clear error messages for invalid data.
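The conversion being praised boils down to mapping each numeric JSON representation to `f64`. A sketch using a stand-in enum rather than `serde_json::Value` (names are illustrative):

```rust
// Stand-in for the JSON number variants the real code receives.
enum JsonNum {
    F64(f64),
    I64(i64),
    U64(u64),
    Other,
}

// Convert any numeric variant to f64, erroring on non-numeric results.
fn to_f64(v: JsonNum) -> Result<f64, String> {
    match v {
        JsonNum::F64(f) => Ok(f),
        JsonNum::I64(i) => Ok(i as f64),
        JsonNum::U64(u) => Ok(u as f64),
        JsonNum::Other => Err("query result is not numeric".to_string()),
    }
}
```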
177-205: LGTM! Alert state update properly adapted to new model. The function correctly uses the new threshold configuration and provides comprehensive alert messages including the SQL query for better troubleshooting.
224-252: LGTM! Robust value extraction from query results. The function properly handles different numeric array types and provides a safe fallback mechanism with good debugging support.
254-346: LGTM! Simplified filter string generation with proper SQL safety. The function correctly implements the simplified AND-only logic while maintaining comprehensive operator support and SQL injection protection through proper escaping.
348-383: LGTM! Well-designed ValueType enum with proper trait implementations. The enum correctly handles different value types with appropriate parsing, formatting, and DataFusion integration.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
src/alerts/alerts_utils.rs (1)
207-235: Add bounds checking for safer array access. The function accesses array elements without verifying bounds, which could cause panics:
- Missing array length check: `array.value(0)` assumes the array has at least one element.
- Limited type support: Only handles Float64Array and Int64Array, but queries might return other numeric types.
```diff
 fn get_final_value(records: Vec<RecordBatch>) -> f64 {
     trace!("records-\n{records:?}");
+    if records.is_empty() {
+        return 0.0;
+    }
+
+    let batch = &records[0];
+    if batch.num_columns() == 0 || batch.num_rows() == 0 {
+        return 0.0;
+    }
+
     if let Some(f) = records
         .first()
         .and_then(|batch| {
             trace!("batch.column(0)-\n{:?}", batch.column(0));
             batch.column(0).as_any().downcast_ref::<Float64Array>()
         })
+        .filter(|array| array.len() > 0)
         .map(|array| {
             trace!("array-\n{array:?}");
             array.value(0)
         })
     {
         f
     } else {
         records
             .first()
             .and_then(|batch| {
                 trace!("batch.column(0)-\n{:?}", batch.column(0));
                 batch.column(0).as_any().downcast_ref::<Int64Array>()
             })
+            .filter(|array| array.len() > 0)
             .map(|array| {
                 trace!("array-\n{array:?}");
                 array.value(0)
             })
             .unwrap_or_default() as f64
     }
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/alerts/alerts_utils.rs (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
🔇 Additional comments (9)
src/alerts/alerts_utils.rs (9)
21-41: LGTM! Import changes align with the new alert architecture. The added imports support the new query-based alert evaluation system with proper separation between local and remote execution modes.
54-66: Excellent simplification of the alert evaluation logic. The new 3-step process (extract time range → execute query → evaluate condition) is much cleaner than the previous aggregate-based approach and aligns perfectly with the PR objectives to simplify alert creation and evaluation.
68-77: LGTM! Time range extraction is correctly implemented. The hardcoded "now" for end_time is appropriate for rolling window evaluations. The error handling properly converts time parsing errors to AlertError.
78-91: Excellent dispatch pattern for query execution modes. The clean separation between local (All/Query) and remote (Prism) execution with proper error handling for unsupported modes demonstrates good architectural design.
129-149: LGTM! Remote query execution properly structured. The function correctly constructs the query request with appropriate parameters and handles errors properly. Good separation of concerns, with the result conversion delegated to a separate function.
151-164: LGTM! Comprehensive numeric type conversion. The function properly handles all common numeric JSON types (f64, i64, u64) with appropriate error handling for non-numeric results.
166-175: LGTM! Condition evaluation logic is correct. All comparison operators are properly implemented with appropriate floating-point comparisons.
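The comparison step reduces to a small match on the threshold operator. A sketch with hypothetical operator names (the PR's payload only shows the `">"` form; the enum below is an assumption, not the actual type):

```rust
// Threshold operators as they might appear in thresholdConfig.
enum Operator {
    Gt,
    Lt,
    Ge,
    Le,
    Eq,
    Ne,
}

// Return true when the measured value breaches the configured threshold,
// using an epsilon comparison for floating-point (in)equality.
fn breaches(op: &Operator, actual: f64, threshold: f64) -> bool {
    match op {
        Operator::Gt => actual > threshold,
        Operator::Lt => actual < threshold,
        Operator::Ge => actual >= threshold,
        Operator::Le => actual <= threshold,
        Operator::Eq => (actual - threshold).abs() < f64::EPSILON,
        Operator::Ne => (actual - threshold).abs() >= f64::EPSILON,
    }
}
```

With the PR's example threshold (`">"`, 1000), a query result of 1500 would trigger the alert and 500 would not.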
177-205: LGTM! Alert state update properly reflects the new model. The updated message format with threshold configuration and query information is more informative than the previous aggregate-based messaging. State transition logic is correctly preserved.
237-366: LGTM! Filter string generation correctly handles supported operations. The function properly:
- Restricts support to AND operator only (as intended)
- Escapes special characters in LIKE operations to prevent SQL injection
- Handles multiple value types with appropriate formatting
- Provides clear error messages for unsupported operations
The ValueType enum implementation is comprehensive and well-structured.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/alerts/mod.rs (1)
94-103: Version enum properly configured for v2 migration. The default to V2 and updated From implementation correctly support the migration from v1 to v2 alerts. As previously discussed, server-controlled versioning makes the fallback approach appropriate.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/alerts/mod.rs (15 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
🔇 Additional comments (18)
src/alerts/mod.rs (18)
20-61: LGTM! Good import organization and helper struct design. The new imports support the query-based alert system and migration functionality. The `BasicAlertFields` helper struct provides a clean way to extract common fields during v1 to v2 migration.
65-65: Alert version successfully updated to v2. The version constant change aligns with the migration strategy to make v2 the current version for new alerts.
86-86: Good optimization with boxed AlertConfig. Boxing the AlertConfig reduces stack allocation overhead for the enum variant, which is beneficial given the complex nested structure of alert configurations.
398-398: Clear and descriptive struct name. Renaming to `ThresholdConfig` better describes the purpose compared to the previous aggregate-focused naming.
526-531: Alert request structure successfully simplified. The new structure with `query` and `threshold_config` is much cleaner than the previous aggregate-based approach, making alert creation more intuitive.
563-565: Alert configuration aligned with new query-based model. The configuration fields properly reflect the transition to SQL query-based alerts with threshold evaluation.
574-606: Comprehensive migration orchestration. The migration function properly orchestrates the conversion from v1 to v2 format, with good error handling and automatic persistence of migrated alerts.
727-777: Robust WHERE clause generation with proper type handling. The WHERE clause building logic properly handles schema-based type conversion and SQL injection prevention through proper escaping.
854-929: Comprehensive data type conversion with good error handling. The type conversion logic properly handles various Arrow data types with appropriate SQL formatting. The error messages provide good context for debugging migration issues.
1039-1046: Alert modification updated for new structure. The modification function properly updates the new query-based fields while maintaining validation of target IDs.
1080-1101: Comprehensive query validation logic. The validation ensures:
- Query is not empty
- Stream names can be resolved
- User has proper authorization
- Query is an aggregate that returns a single value
This provides robust validation for the new query-based model.
1104-1139: Smart aggregate query detection using logical plan analysis. Using DataFusion's logical plan analysis to determine if a query is aggregate is an excellent approach. The recursive checking handles complex query structures properly.
1284-1317: Intelligent version detection and migration. The logic properly detects v1 alerts through multiple indicators:
- Explicit version field
- Missing query field
- Presence of stream field
This covers various scenarios where v1 alerts might exist.
1320-1343: Robust alert task creation with retry logic. The boxed AlertConfig usage and retry mechanism ensure alerts are properly loaded even if there are temporary issues with the task channel.
1222-1223: Appropriate error variant for query validation. The `InvalidAlertQuery` error provides clear feedback for query validation failures in the new alert system.
504-518: Clean logical operator enum for SQL generation. The enum properly supports AND/OR operations needed for WHERE clause construction during migration, with appropriate Display formatting.
1357-1357: Authorization properly updated for query-based alerts. The user authorization now correctly uses the alert's query field, which is cleaner than the previous aggregate-based approach.
1444-1444: Consistent boxing in task creation. The boxed AlertConfig usage is consistent with the AlertTask enum change and provides memory optimization.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/alerts/mod.rs (1)
574-1031: Well-implemented migration logic with comprehensive error handling. The v1 to v2 migration logic is thorough and well-structured:
: Well-implemented migration logic with comprehensive error handling.The v1 to v2 migration logic is thorough and well-structured:
- Proper separation of concerns with helper functions
- Comprehensive error handling with informative messages
- Safe SQL query building with proper column quoting and string escaping
- Robust data type conversion based on schema information
Consider caching schema information during migration to avoid repeated fetches for the same stream, which could improve performance during startup with many alerts.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/alerts/mod.rs (15 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
🔇 Additional comments (6)
src/alerts/mod.rs (6)
65-65: LGTM: Version updates are consistent. The version constant update to "v2", the default enum variant change to V2, and support for parsing "v2" strings are all correctly implemented and aligned with the migration strategy.
Also applies to: 94-95, 102-103
526-526: LGTM: Structural changes align with query-based model. The transition from the `stream`/aggregates fields to `query`/`threshold_config` is consistent across the AlertRequest and AlertConfig structures, properly supporting the new query-based alert model.
Also applies to: 528-528, 544-544, 546-546, 563-563, 565-565
1049-1139: Comprehensive query validation with proper security checks. The validation logic effectively covers all necessary aspects:
- Query non-emptiness and stream resolution
- Distributed execution compatibility
- User authorization verification
- Aggregate query validation using DataFusion's logical plan analysis
The recursive logical plan analysis in `is_logical_plan_aggregate` correctly identifies various aggregate query patterns.
1256-1347: Seamless migration during alert loading. The loading logic effectively handles the transition from v1 to v2:
- Smart version detection using JSON parsing before deserialization
- Transparent migration of v1 alerts with proper error handling
- Graceful fallback behavior for alerts without version fields
- Comprehensive logging for debugging migration issues
The approach ensures backward compatibility while transitioning to the new format.
86-86: Good optimization: Boxing AlertConfig reduces memory overhead. Boxing AlertConfig in AlertTask::Create is a sound optimization that:
- Reduces stack allocation for the large struct
- Minimizes copy overhead when passing through channels
- Is consistently applied throughout the codebase
Also applies to: 1322-1322, 1331-1331, 1444-1444
1222-1223: Clean addition of specific error type for query validation. The new `InvalidAlertQuery` error variant provides clear, specific error handling for query validation failures, improving the user experience and debugging capabilities.
Also applies to: 1245-1245
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Chores