@@ -260,36 +260,40 @@ async function generateSqlQuery(apiKey: string, schemaInfo: string, question: st

  7. Multi-level Aggregations:
  - Always use hierarchical aggregation for nested data:
- * First aggregate at the detail level to parent level
- * Then aggregate parents to segments
+ * First CTE: Aggregate details to parent level
+ * Second CTE: Segment parents based on characteristics
+ * Final CTE: Calculate overall totals if needed
  - For entity-level averages:
- * Calculate totals per parent entity first
- * Then average the parent totals
+ * WRONG: AVG(detail.value)
+ * RIGHT: AVG(parent_totals.total_value)
  - For entity grouping:
- * Determine entity characteristics at parent level using MAX/MIN
- * Group by these parent-level attributes
+ * WRONG: GROUP BY detail.attribute > 0
+ * RIGHT: GROUP BY parent_level.has_attribute
  - For MECE (Mutually Exclusive, Collectively Exhaustive) results:
- * Ensure segments don't overlap by using parent-level flags
- * Verify segment totals match overall totals
- * Use parent-level COUNT instead of detail-level COUNT
+ * WRONG: COUNT(DISTINCT parent_id) directly from details
+ * RIGHT: COUNT(*) from parent-level CTE
  Example pattern:
  WITH detail_totals AS (
+ -- First aggregate all details to parent level
  SELECT
  parent_id,
- SUM(quantity) as items_total,
+ SUM(quantity) as total_quantity,
  SUM(amount) as total_amount,
- MAX(CASE WHEN attribute > 0 THEN 1 ELSE 0 END) as has_attribute
+ MAX(CASE WHEN attribute > 0 THEN 1 ELSE 0 END) as has_attribute,
+ SUM(amount * attribute) as attribute_amount
  FROM detail_table
  GROUP BY parent_id
  ),
  parent_segments AS (
+ -- Then segment based on parent-level characteristics
  SELECT
  CASE WHEN has_attribute = 1 THEN 'With Attribute'
  ELSE 'Without Attribute' END as segment,
  COUNT(*) as total_parents,
- AVG(items_total) as avg_items,
- SUM(total_amount) as total_value,
- AVG(total_amount) as avg_value
+ ROUND(CAST(AVG(total_quantity) AS NUMERIC), 2) as avg_quantity,
+ ROUND(CAST(AVG(total_amount) AS NUMERIC), 2) as avg_amount,
+ ROUND(CAST(AVG(attribute_amount) AS NUMERIC), 2) as avg_attr_amount,
+ ROUND(CAST(SUM(attribute_amount) * 100.0 / NULLIF(SUM(total_amount), 0) AS NUMERIC), 2) as attr_percentage
  FROM detail_totals
  GROUP BY has_attribute
  )
@@ -438,36 +442,41 @@ function formatQueryResponse(sqlQuery: string): string {
  * - Never use window function results directly in GROUP BY
  *
  * 7. "Non-MECE Results in Multi-Level Aggregations"
- * Problem: Mixing detail-level and parent-level calculations
- * Solution:
- * - Always use two-step aggregation for hierarchical data:
- * 1. First aggregate details to parent level
- * 2. Then aggregate parents to segments
- * - For parent entity characteristics:
- * * Use MAX() or similar to get single value per parent
- * - For averages:
- * * Calculate totals per parent first
- * * Then average the parent totals
- * Example fix:
- * Instead of:
- * SELECT AVG(quantity * amount)
- * FROM detail_table
- * GROUP BY has_attribute
- * Use:
- * WITH parent_totals AS (
- * SELECT parent_id,
- * SUM(quantity * amount) as parent_value
- * FROM detail_table
- * GROUP BY parent_id
- * )
- * SELECT AVG(parent_value)
- * FROM parent_totals
- * Testing:
- * - Compare total parent count with distinct parent_ids
- * - Verify sum of segments equals total
- * - Check if parent-level metrics match when calculated different ways
- * - Test with parents having multiple detail records
- * - Test with parents having mixed attribute values in details
+ * Problem: Incorrect aggregation levels leading to wrong averages and counts
+ * Solution:
+ * - Always use three-step aggregation for hierarchical data:
+ * 1. Aggregate details to parent level (all metrics per parent)
+ * 2. Segment parents based on characteristics
+ * 3. Calculate overall totals if needed
+ * - Common mistakes and fixes:
+ * Instead of:
+ * SELECT
+ * has_attribute,
+ * COUNT(DISTINCT parent_id) as total,
+ * AVG(amount) as avg_amount
+ * FROM details
+ * GROUP BY has_attribute
+ * Use:
+ * WITH parent_totals AS (
+ * SELECT
+ * parent_id,
+ * MAX(CASE WHEN attribute > 0 THEN 1 ELSE 0 END) as has_attribute,
+ * SUM(amount) as total_amount
+ * FROM details
+ * GROUP BY parent_id
+ * )
+ * SELECT
+ * has_attribute,
+ * COUNT(*) as total,
+ * AVG(total_amount) as avg_amount
+ * FROM parent_totals
+ * GROUP BY has_attribute
+ * Testing:
+ * - Compare results with manual calculations for a small dataset
+ * - Verify parent counts match between segments and totals
+ * - Check that averages are calculated at the correct level
+ * - Test with parents having varying numbers of detail records
+ * - Test with mixed attribute values within the same parent
  *
  * 8. "Overloaded Error"
  * Problem: Query too complex or taking too long
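The WRONG/RIGHT pairs in the revised prompt can be checked by hand on a tiny dataset, which is also what the new "Testing" notes suggest. The following is a minimal TypeScript sketch of that check, not code from this commit: the `Detail` and `ParentTotal` types, the sample rows, and the helper names are all hypothetical, and the in-memory loops only mimic what the SQL in the prompt would compute.

```typescript
// Hypothetical detail rows; the fields mirror the columns used in the prompt's SQL.
interface Detail {
  parentId: number;
  amount: number;
  attribute: number;
}

interface ParentTotal {
  parentId: number;
  totalAmount: number;
  hasAttribute: boolean;
}

// Parent 1 has three detail rows with mixed attribute values; parent 2 has one row.
const details: Detail[] = [
  { parentId: 1, amount: 10, attribute: 1 },
  { parentId: 1, amount: 20, attribute: 0 },
  { parentId: 1, amount: 30, attribute: 0 },
  { parentId: 2, amount: 100, attribute: 0 },
];

// Step 1 (first CTE): aggregate details to one row per parent.
function toParentTotals(rows: Detail[]): ParentTotal[] {
  const byParent = new Map<number, ParentTotal>();
  for (const row of rows) {
    const current = byParent.get(row.parentId) ?? {
      parentId: row.parentId,
      totalAmount: 0,
      hasAttribute: false,
    };
    current.totalAmount += row.amount;
    // Same idea as MAX(CASE WHEN attribute > 0 THEN 1 ELSE 0 END).
    current.hasAttribute = current.hasAttribute || row.attribute > 0;
    byParent.set(row.parentId, current);
  }
  return Array.from(byParent.values());
}

function average(values: number[]): number {
  return values.length === 0 ? 0 : values.reduce((a, b) => a + b, 0) / values.length;
}

// WRONG: AVG(amount) over detail rows weights each parent by its row count.
const detailLevelAvg = average(details.map((d) => d.amount));

// RIGHT: AVG(total_amount) over the parent-level rows.
const parentTotals = toParentTotals(details);
const parentLevelAvg = average(parentTotals.map((p) => p.totalAmount));

// WRONG: grouping detail rows by `attribute > 0` puts parent 1 in both segments.
const detailSegments = new Map<string, Set<number>>();
for (const d of details) {
  const segment = d.attribute > 0 ? "With Attribute" : "Without Attribute";
  const ids = detailSegments.get(segment) ?? new Set<number>();
  ids.add(d.parentId);
  detailSegments.set(segment, ids);
}
const detailSegmentSum = Array.from(detailSegments.values()).reduce(
  (n, ids) => n + ids.size,
  0,
);

// RIGHT (second CTE): segment on the parent-level flag, then COUNT(*).
const parentSegments = new Map<string, number>();
for (const p of parentTotals) {
  const segment = p.hasAttribute ? "With Attribute" : "Without Attribute";
  parentSegments.set(segment, (parentSegments.get(segment) ?? 0) + 1);
}
const parentSegmentSum = Array.from(parentSegments.values()).reduce((a, b) => a + b, 0);
```

On this sample data the detail-level average is 40 while the parent-level average is 80, and the detail-level grouping counts three parents across the two segments although only two exist. The parent-level segment counts add up to `parentTotals.length`, which is the MECE property the prompt is trying to enforce.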