diff --git a/02_activities/assignments/Cohort_8/Assignment2.md b/02_activities/assignments/Cohort_8/Assignment2.md
index 47118b2ba..a0c981f2b 100644
--- a/02_activities/assignments/Cohort_8/Assignment2.md
+++ b/02_activities/assignments/Cohort_8/Assignment2.md
@@ -45,16 +45,36 @@ There are several tools online you can use, I'd recommend [Draw.io](https://www.
 **HINT:** You do not need to create any data for this prompt. This is a logical model (ERD) only.
+
+![alt text](image-prompt1&2-1.png)
+![alt text](image-prompt1&2-1.png)
+
+
 #### Prompt 2
 We want to create employee shifts, splitting up the day into morning and evening. Add this to the ERD.
+
+
+See above
+
+
+
+
 #### Prompt 3
 The store wants to keep customer addresses. Propose two architectures for the CUSTOMER_ADDRESS table, one that will retain changes, and another that will overwrite. Which is type 1, which is type 2?
 **HINT:** search type 1 vs type 2 slowly changing dimensions.
 ```
-Your answer...
+![alt text](image-prompt3.png)
+
+image-prompt3.png
+
+
+
+
+
+
 ```
 ***
@@ -88,6 +108,15 @@ Find the NULLs and then using COALESCE, replace the NULL with a blank for the fi
-
+Answer
+
+SELECT
+product_name || ', ' || COALESCE(product_size, '') || ' (' || COALESCE(product_qty_type, 'unit') || ')'
+FROM product;
+
+
+
+
 #### Windowed Functions
 1. Write a query that selects from the customer_purchases table and numbers each customer’s visits to the farmer’s market (labeling each market date with a different number). Each customer’s first visit is labeled 1, second visit is labeled 2, etc.
@@ -95,12 +124,56 @@ You can either display all rows in the customer_purchases table, with the counte
 **HINT**: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK().
+
+answer
+
+SELECT
+    customer_id,
+    market_date,
+    DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit_number
+FROM customer_purchases;
+
+
+
 2. Reverse the numbering of the query from a part so each customer’s most recent visit is labeled 1, then write another query that uses this one as a subquery (or temp table) and filters the results to only the customer’s most recent visit.
+answer
+
+SELECT
+    customer_id,
+    market_date,
+    DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS visit_number
+FROM customer_purchases;
+
+
+query2
+SELECT *
+FROM (
+    SELECT
+        customer_id,
+        market_date,
+        DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS visit_number
+    FROM customer_purchases
+) AS ranked_visits
+WHERE visit_number = 1;
+
+
+
+
 3. Using a COUNT() window function, include a value along with each row of the customer_purchases table that indicates how many different times that customer has purchased that product_id.
-
+
+answer
+
+SELECT
+    customer_id,
+    product_id,
+    market_date,
+    COUNT(*) OVER (PARTITION BY customer_id, product_id) AS product_purchase_count
+FROM customer_purchases;
+
 #### String manipulations
 1. Some product names in the product table have descriptions like "Jar" or "Organic". These are separated from the product name with a hyphen. Create a column using SUBSTR (and a couple of other commands) that captures these, but is otherwise NULL. Remove any trailing or leading whitespaces. Don't just use a case statement for each product!
@@ -110,10 +183,28 @@ You can either display all rows in the customer_purchases table, with the counte
 **HINT**: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column.
+
+answer
+
+SELECT
+    product_name,
+    CASE
+        WHEN INSTR(product_name, '-') > 0 THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
+        ELSE NULL
+    END AS description
+FROM product;
+
 2. Filter the query to show any product_size value that contain a number with REGEXP.
-
+answer
+
+SELECT *
+FROM product
+WHERE product_size REGEXP '[0-9]';
+
+
 #### UNION
 1. Using a UNION, write a query that displays the market dates with the highest and lowest total sales.
@@ -121,6 +212,35 @@ You can either display all rows in the customer_purchases table, with the counte
 ***
+
+answer
+
+
+WITH date_sales AS (
+    SELECT
+        market_date,
+        SUM(quantity * cost_to_customer_per_qty) AS total_sales
+    FROM customer_purchases
+    GROUP BY market_date
+),
+ranked_dates AS (
+    SELECT
+        market_date,
+        total_sales,
+        RANK() OVER (ORDER BY total_sales DESC) AS best_rank,
+        RANK() OVER (ORDER BY total_sales ASC) AS worst_rank
+    FROM date_sales
+)
+SELECT market_date, total_sales, 'Best Day' AS category
+FROM ranked_dates
+WHERE best_rank = 1
+
+UNION
+
+SELECT market_date, total_sales, 'Worst Day' AS category
+FROM ranked_dates
+WHERE worst_rank = 1;
+
 ## Section 3:
 You can start this section following *session 5*.
@@ -139,11 +259,71 @@ Steps to complete this part of the assignment:
-
+
+answer
+
+
+WITH vendor_products AS (
+    SELECT DISTINCT
+        vi.vendor_id,
+        vi.product_id,
+        vi.original_price,
+        v.vendor_name,
+        p.product_name
+    FROM vendor_inventory vi
+    JOIN vendor v ON vi.vendor_id = v.vendor_id
+    JOIN product p ON vi.product_id = p.product_id
+),
+all_customers AS (
+    SELECT customer_id FROM customer
+)
+SELECT
+    vp.vendor_name,
+    vp.product_name,
+    SUM(5 * vp.original_price) AS total_revenue
+FROM vendor_products vp
+CROSS JOIN all_customers ac
+GROUP BY vp.vendor_name, vp.product_name
+ORDER BY vp.vendor_name, vp.product_name;
+
 #### INSERT
 1. Create a new table "product_units". This table will contain only products where the `product_qty_type = 'unit'`. It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`. Name the timestamp column `snapshot_timestamp`.
+answer
+
+
+
+CREATE TABLE product_units AS
+SELECT
+    p.*,
+    CURRENT_TIMESTAMP AS snapshot_timestamp
+FROM product p
+WHERE product_qty_type = 'unit';
+
+
 2. Using `INSERT`, add a new row to the product_unit table (with an updated timestamp). This can be any product you desire (e.g. add another record for Apple Pie).
+
+answer
+
+
+INSERT INTO product_units (
+    product_id,
+    product_name,
+    product_size,
+    product_category_id,
+    product_qty_type,
+    snapshot_timestamp
+)
+VALUES (
+    999,
+    'Apple Pie',
+    '10 inch',
+    7,
+    'unit',
+    CURRENT_TIMESTAMP
+);
+
-
 #### DELETE
@@ -152,6 +332,10 @@ Steps to complete this part of the assignment:
 **HINT**: If you don't specify a WHERE clause, [you are going to have a bad time](https://imgflip.com/i/8iq872).
-
+answer
+
+DELETE FROM product_units
+WHERE product_id = 999;
 #### UPDATE
 1. We want to add the current_quantity to the product_units table. First, add a new column, `current_quantity` to the table using the following syntax.
@@ -163,3 +347,29 @@ ADD current_quantity INT;
 Then, using `UPDATE`, change the current_quantity equal to the **last** `quantity` value from the vendor_inventory details.
 **HINT**: This one is pretty hard. First, determine how to get the "last" quantity per product. Second, coalesce null values to 0 (if you don't have null values, figure out how to rearrange your query so you do.) Third, `SET current_quantity = (...your select statement...)`, remembering that WHERE can only accommodate one column. Finally, make sure you have a WHERE statement to update the right row, you'll need to use `product_units.product_id` to refer to the correct row within the product_units table. When you have all of these components, you can run the update statement.
+
+
+
+ALTER TABLE product_units
+ADD current_quantity INT;
+
+SELECT product_id,
+       COALESCE(quantity, 0) AS last_quantity
+FROM (
+    SELECT product_id,
+           quantity,
+           ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY market_date DESC) AS rn
+    FROM vendor_inventory
+) t
+WHERE rn = 1;
+
+UPDATE product_units
+SET current_quantity = COALESCE((
+    SELECT vi.quantity
+    FROM vendor_inventory vi
+    WHERE vi.product_id = product_units.product_id
+    ORDER BY vi.market_date DESC
+    LIMIT 1
+), 0);
+
+
diff --git a/02_activities/assignments/Cohort_8/Baruni Prabaharan - Assignment 1 Logical Diagram.xlsx b/02_activities/assignments/Cohort_8/Baruni Prabaharan - Assignment 1 Logical Diagram.xlsx
new file mode 100644
index 000000000..c19ec190c
Binary files /dev/null and b/02_activities/assignments/Cohort_8/Baruni Prabaharan - Assignment 1 Logical Diagram.xlsx differ
diff --git a/02_activities/assignments/Cohort_8/Baruni Prabaharan Assignment 1.db b/02_activities/assignments/Cohort_8/Baruni Prabaharan Assignment 1.db
new file mode 100644
index 000000000..bb787a662
Binary files /dev/null and b/02_activities/assignments/Cohort_8/Baruni Prabaharan Assignment 1.db differ
diff --git a/02_activities/assignments/Cohort_8/Baruni Prabaharan Assignment 1.sqbpro b/02_activities/assignments/Cohort_8/Baruni Prabaharan Assignment 1.sqbpro
new file mode 100644
index 000000000..976ebc152
--- /dev/null
+++ b/02_activities/assignments/Cohort_8/Baruni Prabaharan Assignment 1.sqbpro
@@ -0,0 +1,4 @@
+SELECT *
+FROM customer_purchases
+WHERE product_id IN (4, 9);
+
diff --git a/02_activities/assignments/Cohort_8/assignment1.sql b/02_activities/assignments/Cohort_8/assignment1.sql
index c992e3205..7c65a7792 100644
--- a/02_activities/assignments/Cohort_8/assignment1.sql
+++ b/02_activities/assignments/Cohort_8/assignment1.sql
@@ -4,17 +4,22 @@
 --SELECT
 /* 1. Write a query that returns everything in the customer table. */
-
+SELECT * FROM customer;
 /* 2. Write a query that displays all of the columns and 10 rows from the cus- tomer table, sorted by customer_last_name, then customer_first_ name. */
+SELECT * FROM customer
+ORDER BY customer_last_name, customer_first_name
+limit 10;
 --WHERE
 /* 1. Write a query that returns all customer purchases of product IDs 4 and 9. */
-
+SELECT *
+FROM customer_purchases
+WHERE product_id IN (4, 9);
 /*2. Write a query that returns all customer purchases and a new calculated column 'price' (quantity * cost_to_customer_per_qty),
@@ -23,29 +28,65 @@ filtered by customer IDs between 8 and 10 (inclusive) using either:
 2. one condition using BETWEEN
 */
 -- option 1
+SELECT *, (QUANTITY*cost_to_customer_per_qty) AS PRICE
+FROM customer_purchases
+WHERE customer_id >= 8 AND customer_id <= 10;
--- option 2
+-- option 2
+SELECT *, (QUANTITY*cost_to_customer_per_qty) AS PRICE
+FROM customer_purchases
+WHERE customer_id BETWEEN 8 AND 10;
 --CASE
 /* 1. Products can be sold by the individual unit or by bulk measures like lbs. or oz. Using the product table, write a query that outputs the product_id and product_name columns and add a column called prod_qty_type_condensed that displays the word “unit” if the product_qty_type is “unit,” and otherwise displays the word “bulk.” */
-
+SELECT
+product_id,
+product_name,
+CASE
+WHEN product_qty_type = 'unit' THEN 'unit'
+ELSE 'bulk'
+END AS prod_qty_type_condensed
+FROM product;
 /* 2. We want to flag all of the different types of pepper products that are sold at the market.
 add a column to the previous query called pepper_flag that outputs a 1 if the product_name contains the word “pepper” (regardless of capitalization), and otherwise outputs 0. */
+SELECT
+product_id,
+product_name,
+CASE
+WHEN product_qty_type = 'unit' THEN 'unit'
+ELSE 'bulk'
+END AS prod_qty_type_condensed
+,CASE
+WHEN LOWER(product_name) LIKE '%pepper%' THEN 1
+ELSE 0
+END AS pepper_flag
+FROM product;
+
 --JOIN
 /* 1. Write a query that INNER JOINs the vendor table to the vendor_booth_assignments table on the
 vendor_id field they both have in common, and sorts the result by vendor_name, then market_date. */
+SELECT
+v.vendor_id,
+v.vendor_name,
+vba.market_date,
+vba.booth_number
+FROM vendor AS v
+INNER JOIN vendor_booth_assignments AS vba
+ON v.vendor_id = vba.vendor_id
+ORDER BY v.vendor_name, vba.market_date;
+
@@ -55,6 +96,13 @@ vendor_id field they both have in common, and sorts the result by vendor_name, t
 -- AGGREGATE
 /* 1. Write a query that determines how many times each vendor has rented a booth at the farmer’s market by counting the vendor booth assignments per vendor_id. */
+SELECT
+vendor_id,
+COUNT(*) AS booth_rental_count
+FROM vendor_booth_assignments
+GROUP BY vendor_id
+ORDER BY booth_rental_count DESC;
+
@@ -63,7 +111,17 @@ sticker to everyone who has ever spent more than $2000 at the market. Write a qu
 of customers for them to give stickers to, sorted by last name, then first name.
 HINT: This query requires you to join two tables, use an aggregate function, and use the HAVING keyword.
 */
-
+SELECT
+c.customer_id,
+c.customer_first_name,
+c.customer_last_name,
+SUM(cp.quantity * cp.cost_to_customer_per_qty) AS total_spent
+FROM customer_purchases AS cp
+INNER JOIN customer AS c
+ON cp.customer_id = c.customer_id
+GROUP BY c.customer_id, c.customer_first_name, c.customer_last_name
+HAVING SUM(cp.quantity * cp.cost_to_customer_per_qty) > 2000
+ORDER BY c.customer_last_name, c.customer_first_name;
 --Temp Table
@@ -77,6 +135,13 @@ When inserting the new vendor, you need to appropriately align the columns to be
 -> To insert the new row use VALUES, specifying the value you want for each column:
 VALUES(col1,col2,col3,col4,col5)
 */
+CREATE TABLE temp.new_vendor AS
+SELECT *
+FROM vendor;
+INSERT INTO temp.new_vendor (vendor_id, vendor_name, vendor_type, vendor_owner_first_name, vendor_owner_last_name)
+VALUES (10, 'Thomass Superfood Store', 'Fresh Focused', 'Thomas', 'Rosenthal');
+
+
@@ -85,7 +150,11 @@ VALUES(col1,col2,col3,col4,col5)
 HINT: you might need to search for strfrtime modifers sqlite on the web to know what the modifers for month and year are! */
-
+SELECT
+customer_id,
+strftime('%m', market_date) AS month,
+strftime('%Y', market_date) AS year
+FROM customer_purchases;
 /* 2. Using the previous query as a base, determine how much money each customer spent in April 2022.
@@ -94,3 +163,11 @@ Remember that money spent is quantity*cost_to_customer_per_qty.
 HINTS: you will need to AGGREGATE, GROUP BY, and filter... but remember, STRFTIME returns a STRING for your WHERE statement!!
 */
+SELECT
+customer_id,
+SUM(quantity * cost_to_customer_per_qty) AS total_spent
+FROM customer_purchases
+WHERE strftime('%m', market_date) = '04'
+AND strftime('%Y', market_date) = '2022'
+GROUP BY customer_id;
+
diff --git a/02_activities/assignments/Cohort_8/assignment2.sql b/02_activities/assignments/Cohort_8/assignment2.sql
index c2743d3b7..0594253b8 100644
--- a/02_activities/assignments/Cohort_8/assignment2.sql
+++ b/02_activities/assignments/Cohort_8/assignment2.sql
@@ -21,6 +21,11 @@ The `||` values concatenate the columns into strings.
 Edit the appropriate columns -- you're making two edits -- and the NULL rows will be fixed.
 All the other rows will remain the same. */
+-- Answer
+
+SELECT
+product_name || ', ' || COALESCE(product_size, '') || ' (' || COALESCE(product_qty_type, 'unit') || ')'
+FROM product;
@@ -35,17 +40,46 @@ each new market date for each customer, or select only the unique market dates p
 HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK(). */
+SELECT
+    customer_id,
+    market_date,
+    DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit_number
+FROM customer_purchases;
 /* 2. Reverse the numbering of the query from a part so each customer’s most recent visit is labeled 1, then write another query that uses this one as a subquery (or temp table) and filters the results to only the customer’s most recent visit. */
+SELECT
+    customer_id,
+    market_date,
+    DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS visit_number
+FROM customer_purchases;
+
+
+-- query2
+SELECT *
+FROM (
+    SELECT
+        customer_id,
+        market_date,
+        DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS visit_number
+    FROM customer_purchases
+) AS ranked_visits
+WHERE visit_number = 1;
+
+
 /* 3.
 Using a COUNT() window function, include a value along with each row of the customer_purchases table that indicates how many different times that customer has purchased that product_id. */
-
+SELECT
+    customer_id,
+    product_id,
+    market_date,
+    COUNT(*) OVER (PARTITION BY customer_id, product_id) AS product_purchase_count
+FROM customer_purchases;
 -- String manipulations
 /* 1. Some product names in the product table have descriptions like "Jar" or "Organic".
@@ -60,9 +94,21 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for
 Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */
+SELECT
+    product_name,
+    CASE
+        WHEN INSTR(product_name, '-') > 0 THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
+        ELSE NULL
+    END AS description
+FROM product;
+
+
 /* 2. Filter the query to show any product_size value that contain a number with REGEXP. */
+SELECT *
+FROM product
+WHERE product_size REGEXP '[0-9]';
 -- UNION
@@ -77,6 +123,31 @@ with a UNION binding them. */
+WITH date_sales AS (
+    SELECT
+        market_date,
+        SUM(quantity * cost_to_customer_per_qty) AS total_sales
+    FROM customer_purchases
+    GROUP BY market_date
+),
+ranked_dates AS (
+    SELECT
+        market_date,
+        total_sales,
+        RANK() OVER (ORDER BY total_sales DESC) AS best_rank,
+        RANK() OVER (ORDER BY total_sales ASC) AS worst_rank
+    FROM date_sales
+)
+SELECT market_date, total_sales, 'Best Day' AS category
+FROM ranked_dates
+WHERE best_rank = 1
+
+UNION
+
+SELECT market_date, total_sales, 'Worst Day' AS category
+FROM ranked_dates
+WHERE worst_rank = 1;
+
 /* SECTION 3 */
@@ -91,7 +162,28 @@ Think a bit about the row counts: how many distinct vendors, product names are t
 How many customers are there (y).
 Before your final group by you should have the product of those two queries (x*y).
 */
-
+WITH vendor_products AS (
+    SELECT DISTINCT
+        vi.vendor_id,
+        vi.product_id,
+        vi.original_price,
+        v.vendor_name,
+        p.product_name
+    FROM vendor_inventory vi
+    JOIN vendor v ON vi.vendor_id = v.vendor_id
+    JOIN product p ON vi.product_id = p.product_id
+),
+all_customers AS (
+    SELECT customer_id FROM customer
+)
+SELECT
+    vp.vendor_name,
+    vp.product_name,
+    SUM(5 * vp.original_price) AS total_revenue
+FROM vendor_products vp
+CROSS JOIN all_customers ac
+GROUP BY vp.vendor_name, vp.product_name
+ORDER BY vp.vendor_name, vp.product_name;
 -- INSERT
 /*1. Create a new table "product_units".
@@ -99,17 +191,40 @@ This table will contain only products where the `product_qty_type = 'unit'`.
 It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`.
 Name the timestamp column `snapshot_timestamp`. */
-
+CREATE TABLE product_units AS
+SELECT
+    p.*,
+    CURRENT_TIMESTAMP AS snapshot_timestamp
+FROM product p
+WHERE product_qty_type = 'unit';
 /*2. Using `INSERT`, add a new row to the product_units table (with an updated timestamp).
 This can be any product you desire (e.g. add another record for Apple Pie). */
+INSERT INTO product_units (
+    product_id,
+    product_name,
+    product_size,
+    product_category_id,
+    product_qty_type,
+    snapshot_timestamp
+)
+VALUES (
+    999,
+    'Apple Pie',
+    '10 inch',
+    7,
+    'unit',
+    CURRENT_TIMESTAMP
+);
 -- DELETE
 /* 1. Delete the older record for the whatever product you added.
 HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/
+DELETE FROM product_units
+WHERE product_id = 999;
@@ -131,5 +246,28 @@ Finally, make sure you have a WHERE statement to update the right row,
 When you have all of these components, you can run the update statement.
 */
+ALTER TABLE product_units
+ADD current_quantity INT;
+
+SELECT product_id,
+       COALESCE(quantity, 0) AS last_quantity
+FROM (
+    SELECT product_id,
+           quantity,
+           ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY market_date DESC) AS rn
+    FROM vendor_inventory
+) t
+WHERE rn = 1;
+
+UPDATE product_units
+SET current_quantity = COALESCE((
+    SELECT vi.quantity
+    FROM vendor_inventory vi
+    WHERE vi.product_id = product_units.product_id
+    ORDER BY vi.market_date DESC
+    LIMIT 1
+), 0);
+
+
diff --git a/02_activities/assignments/Cohort_8/image-prompt1-2.png b/02_activities/assignments/Cohort_8/image-prompt1-2.png
new file mode 100644
index 000000000..80d5fb7a6
Binary files /dev/null and b/02_activities/assignments/Cohort_8/image-prompt1-2.png differ
diff --git a/02_activities/assignments/Cohort_8/image-prompt3.png b/02_activities/assignments/Cohort_8/image-prompt3.png
new file mode 100644
index 000000000..6e2bf26dd
Binary files /dev/null and b/02_activities/assignments/Cohort_8/image-prompt3.png differ
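
Review note (not part of the patch): the UPDATE hint in this assignment asks for three things at once: pick the last `quantity` per product, turn missing values into 0, and tie the subquery to the row being updated through `product_units.product_id`. The sketch below is one possible way to check that pattern in isolation. It assumes Python's built-in `sqlite3` module as a harness, and the two tables are cut down to the columns the statement touches; the sample rows and product names are invented for illustration and do not come from the course database.

```python
import sqlite3

# Hypothetical, cut-down tables; all rows below are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_units (product_id INT, product_name TEXT, current_quantity INT);
    CREATE TABLE vendor_inventory (market_date TEXT, product_id INT, quantity INT);

    INSERT INTO product_units (product_id, product_name) VALUES
        (1, 'Apple Pie'),
        (2, 'Cherry Pie'),
        (3, 'Whole Wheat Bread');

    INSERT INTO vendor_inventory (market_date, product_id, quantity) VALUES
        ('2022-04-02', 1, 30),
        ('2022-04-09', 1, 12),
        ('2022-04-02', 2, 8),
        ('2022-04-09', 2, NULL);
""")

# COALESCE wraps the whole scalar subquery, so both "latest quantity is NULL"
# (product 2) and "no inventory rows at all" (product 3) fall back to 0.
# The subquery is correlated on product_units.product_id, so each row of
# product_units only sees its own inventory history.
conn.execute("""
    UPDATE product_units
    SET current_quantity = COALESCE((
        SELECT vi.quantity
        FROM vendor_inventory AS vi
        WHERE vi.product_id = product_units.product_id
        ORDER BY vi.market_date DESC
        LIMIT 1
    ), 0)
""")

result = conn.execute(
    "SELECT product_id, current_quantity FROM product_units ORDER BY product_id"
).fetchall()
print(result)  # [(1, 12), (2, 0), (3, 0)]
```

Placing COALESCE outside the subquery is a design choice worth noting: a scalar subquery that matches no rows evaluates to NULL, so a COALESCE written inside the SELECT list never gets a chance to run for products without inventory.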