This repository was archived by the owner on Aug 22, 2025. It is now read-only.

Commit c035134

Author: Michael Liebmann
refactor: improve multi-level aggregation guidance with explicit wrong/right examples
1 parent 364394d commit c035134

1 file changed: +53 -44 lines changed


src/actions/chatWithYourDb.ts

Lines changed: 53 additions & 44 deletions
@@ -260,36 +260,40 @@ async function generateSqlQuery(apiKey: string, schemaInfo: string, question: st
 
 7. Multi-level Aggregations:
    - Always use hierarchical aggregation for nested data:
-     * First aggregate at the detail level to parent level
-     * Then aggregate parents to segments
+     * First CTE: Aggregate details to parent level
+     * Second CTE: Segment parents based on characteristics
+     * Final CTE: Calculate overall totals if needed
    - For entity-level averages:
-     * Calculate totals per parent entity first
-     * Then average the parent totals
+     * WRONG: AVG(detail.value)
+     * RIGHT: AVG(parent_totals.total_value)
    - For entity grouping:
-     * Determine entity characteristics at parent level using MAX/MIN
-     * Group by these parent-level attributes
+     * WRONG: GROUP BY detail.attribute > 0
+     * RIGHT: GROUP BY parent_level.has_attribute
    - For MECE (Mutually Exclusive, Collectively Exhaustive) results:
-     * Ensure segments don't overlap by using parent-level flags
-     * Verify segment totals match overall totals
-     * Use parent-level COUNT instead of detail-level COUNT
+     * WRONG: COUNT(DISTINCT parent_id) directly from details
+     * RIGHT: COUNT(*) from parent-level CTE
    Example pattern:
    WITH detail_totals AS (
+     -- First aggregate all details to parent level
      SELECT
        parent_id,
-       SUM(quantity) as items_total,
+       SUM(quantity) as total_quantity,
        SUM(amount) as total_amount,
-       MAX(CASE WHEN attribute > 0 THEN 1 ELSE 0 END) as has_attribute
+       MAX(CASE WHEN attribute > 0 THEN 1 ELSE 0 END) as has_attribute,
+       SUM(amount * attribute) as attribute_amount
      FROM detail_table
      GROUP BY parent_id
    ),
    parent_segments AS (
+     -- Then segment based on parent-level characteristics
      SELECT
        CASE WHEN has_attribute = 1 THEN 'With Attribute'
        ELSE 'Without Attribute' END as segment,
        COUNT(*) as total_parents,
-       AVG(items_total) as avg_items,
-       SUM(total_amount) as total_value,
-       AVG(total_amount) as avg_value
+       ROUND(CAST(AVG(total_quantity) AS NUMERIC), 2) as avg_quantity,
+       ROUND(CAST(AVG(total_amount) AS NUMERIC), 2) as avg_amount,
+       ROUND(CAST(AVG(attribute_amount) AS NUMERIC), 2) as avg_attr_amount,
+       ROUND(CAST(SUM(attribute_amount) * 100.0 / NULLIF(SUM(total_amount), 0) AS NUMERIC), 2) as attr_percentage
      FROM detail_totals
      GROUP BY has_attribute
    )
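
Note on the hunk above: the WRONG/RIGHT pair for entity-level averages can be illustrated outside SQL. The following is a minimal TypeScript sketch, not code from this commit. The `DetailRow` shape, the sample rows, and both function names are hypothetical and chosen only to mirror the `parent_id` / `amount` columns of the example pattern.

```typescript
// Hypothetical detail rows; field names mirror the SQL example pattern.
interface DetailRow {
  parentId: number;
  amount: number;
}

const details: DetailRow[] = [
  { parentId: 1, amount: 10 },
  { parentId: 1, amount: 20 },
  { parentId: 1, amount: 30 },
  { parentId: 2, amount: 40 },
];

// Detail level: the equivalent of AVG(detail.amount), one term per detail row.
function detailLevelAverage(rows: DetailRow[]): number {
  const sum = rows.reduce((acc, r) => acc + r.amount, 0);
  return sum / rows.length;
}

// Parent level: SUM(amount) GROUP BY parent_id first, then AVG over those totals.
function parentLevelAverage(rows: DetailRow[]): number {
  const totals = new Map<number, number>();
  for (const r of rows) {
    totals.set(r.parentId, (totals.get(r.parentId) ?? 0) + r.amount);
  }
  let sum = 0;
  totals.forEach((t) => {
    sum += t;
  });
  return sum / totals.size;
}

console.log(detailLevelAverage(details)); // 25: 100 spread over 4 detail rows
console.log(parentLevelAverage(details)); // 50: (60 + 40) over 2 parents
```

With this sample data the two averages differ by a factor of two because parent 1 contributes three detail rows, which is the kind of skew the two-stage CTE is meant to avoid.
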
@@ -438,36 +442,41 @@ function formatQueryResponse(sqlQuery: string): string {
  *    - Never use window function results directly in GROUP BY
  *
  * 7. "Non-MECE Results in Multi-Level Aggregations"
- *    Problem: Mixing detail-level and parent-level calculations
- *    Solution:
- *    - Always use two-step aggregation for hierarchical data:
- *      1. First aggregate details to parent level
- *      2. Then aggregate parents to segments
- *    - For parent entity characteristics:
- *      * Use MAX() or similar to get single value per parent
- *    - For averages:
- *      * Calculate totals per parent first
- *      * Then average the parent totals
- *    Example fix:
- *    Instead of:
- *      SELECT AVG(quantity * amount)
- *      FROM detail_table
- *      GROUP BY has_attribute
- *    Use:
- *      WITH parent_totals AS (
- *        SELECT parent_id,
- *          SUM(quantity * amount) as parent_value
- *        FROM detail_table
- *        GROUP BY parent_id
- *      )
- *      SELECT AVG(parent_value)
- *      FROM parent_totals
- *    Testing:
- *    - Compare total parent count with distinct parent_ids
- *    - Verify sum of segments equals total
- *    - Check if parent-level metrics match when calculated different ways
- *    - Test with parents having multiple detail records
- *    - Test with parents having mixed attribute values in details
+ *    Problem: Incorrect aggregation levels leading to wrong averages and counts
+ *    Solution:
+ *    - Always use three-step aggregation for hierarchical data:
+ *      1. Aggregate details to parent level (all metrics per parent)
+ *      2. Segment parents based on characteristics
+ *      3. Calculate overall totals if needed
+ *    - Common mistakes and fixes:
+ *      Instead of:
+ *        SELECT
+ *          has_attribute,
+ *          COUNT(DISTINCT parent_id) as total,
+ *          AVG(amount) as avg_amount
+ *        FROM details
+ *        GROUP BY has_attribute
+ *      Use:
+ *        WITH parent_totals AS (
+ *          SELECT
+ *            parent_id,
+ *            MAX(CASE WHEN attribute > 0 THEN 1 ELSE 0 END) as has_attribute,
+ *            SUM(amount) as total_amount
+ *          FROM details
+ *          GROUP BY parent_id
+ *        )
+ *        SELECT
+ *          has_attribute,
+ *          COUNT(*) as total,
+ *          AVG(total_amount) as avg_amount
+ *        FROM parent_totals
+ *        GROUP BY has_attribute
+ *    Testing:
+ *    - Compare results with manual calculations for a small dataset
+ *    - Verify parent counts match between segments and totals
+ *    - Check that averages are calculated at the correct level
+ *    - Test with parents having varying numbers of detail records
+ *    - Test with mixed attribute values within the same parent
  *
  * 8. "Overloaded Error"
  *    Problem: Query too complex or taking too long
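
Note on the "Non-MECE Results" entry: two items from its testing checklist (parent counts matching between segments and totals, and mixed attribute values within the same parent) can be checked on a small in-memory dataset. The sketch below is an illustration under stated assumptions, not code from this commit. `AttributeRow`, the sample rows, and the function names are hypothetical. The detail-level function assumes the flag is evaluated per detail row as `attribute > 0`, and the parent-level function mirrors `MAX(CASE WHEN attribute > 0 THEN 1 ELSE 0 END)`.

```typescript
// Hypothetical detail rows; parent 1 deliberately has mixed attribute values.
interface AttributeRow {
  parentId: number;
  attribute: number;
}

interface SegmentCounts {
  withAttribute: number;
  withoutAttribute: number;
}

const rows: AttributeRow[] = [
  { parentId: 1, attribute: 0 },
  { parentId: 1, attribute: 5 },
  { parentId: 2, attribute: 0 },
  { parentId: 3, attribute: 2 },
];

// Detail level: group rows by (attribute > 0), then COUNT(DISTINCT parent_id).
function detailLevelSegmentCounts(data: AttributeRow[]): SegmentCounts {
  const withSet = new Set<number>();
  const withoutSet = new Set<number>();
  for (const r of data) {
    if (r.attribute > 0) {
      withSet.add(r.parentId);
    } else {
      withoutSet.add(r.parentId);
    }
  }
  return { withAttribute: withSet.size, withoutAttribute: withoutSet.size };
}

// Parent level: one flag per parent (true if any detail row has attribute > 0),
// then count parents per flag value, the equivalent of COUNT(*) on the CTE.
function parentLevelSegmentCounts(data: AttributeRow[]): SegmentCounts {
  const hasAttribute = new Map<number, boolean>();
  for (const r of data) {
    const previous = hasAttribute.get(r.parentId) ?? false;
    hasAttribute.set(r.parentId, previous || r.attribute > 0);
  }
  const counts: SegmentCounts = { withAttribute: 0, withoutAttribute: 0 };
  hasAttribute.forEach((flag) => {
    if (flag) {
      counts.withAttribute += 1;
    } else {
      counts.withoutAttribute += 1;
    }
  });
  return counts;
}

const distinctParents = new Set(rows.map((r) => r.parentId)).size;
const detailCounts = detailLevelSegmentCounts(rows);
const parentCounts = parentLevelSegmentCounts(rows);

console.log(distinctParents); // 3
console.log(detailCounts); // parent 1 lands in both segments: 2 + 2 = 4
console.log(parentCounts); // segments are disjoint: 2 + 1 = 3
```

Comparing the sum of the segment counts against `distinctParents` is one way to detect overlapping segments: the detail-level grouping over-counts here, while the parent-level grouping matches the total.
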
