2 changes: 2 additions & 0 deletions 02_activities/assignments/DC_Cohort/.Rhistory
@@ -0,0 +1,2 @@
![Farmers Market Logical Model](<./images/Farmers Market Logical Model.PNG>)
- These are the tables that are connected
40 changes: 38 additions & 2 deletions 02_activities/assignments/DC_Cohort/Assignment2.md
@@ -50,13 +50,22 @@ There are several tools online you can use, I'd recommend [Draw.io](https://www.
#### Prompt 2
We want to create employee shifts, splitting up the day into morning and evening. Add this to the ERD.

![Bookstore Logical Model](./images/Bookstore_Logical_Model.png)

#### Prompt 3
The store wants to keep customer addresses. Propose two architectures for the CUSTOMER_ADDRESS table, one that will retain changes, and another that will overwrite. Which is type 1, which is type 2?

**HINT:** search type 1 vs type 2 slowly changing dimensions.

```
Type 1 = overwrite; no history is kept. Type 2 = add a new row; history is kept.

Type 1 overwrites the existing address when it changes, so there is only one
row per customer, holding the current address.

Type 2 adds a new row each time the address changes. Address history is
retained using start_date and end_date columns, or an is_current flag,
to track which address is the current one.
```
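
A minimal sketch of the two architectures (table and column names are illustrative, not taken from the assignment schema):

```
-- Type 1: one row per customer; an address change simply overwrites the row.
CREATE TABLE customer_address_type1 (
    customer_id  INTEGER PRIMARY KEY,
    address      TEXT,
    city         TEXT,
    updated_at   TIMESTAMP    -- when the row was last overwritten
);

-- Type 2: one row per address "version"; a change closes the old row
-- (end_date / is_current) and inserts a new one, preserving history.
CREATE TABLE customer_address_type2 (
    address_id   INTEGER PRIMARY KEY,
    customer_id  INTEGER,
    address      TEXT,
    city         TEXT,
    start_date   DATE,
    end_date     DATE,        -- NULL while the row is current
    is_current   INTEGER      -- 1 = current address, 0 = historical
);
```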

***
@@ -191,5 +200,32 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c


```
Boykis's article contends that machine learning systems are not autonomous, as they rely on human labour at every stage. This labour is often hidden, poorly compensated, and mentally taxing.

This story raises a few key ethical issues.

The first factor is labour. Data labellers and content moderators perform the essential work that makes AI function. They often encounter disturbing content, receive minimal support, and are paid very little. While the technology appears sleek and automated, it relies heavily on human labour that remains unseen.

The second issue is bias. Since training data is labelled by people, it mirrors their judgments and blind spots. If labellers work quickly or lack diversity, these limitations become ingrained in the model. As a result, the model replicates this bias at scale, often appearing objective.

The third factor is scale. As large language models gain wider adoption, the need for labelling and moderation grows, requiring more workers, often in lower-income countries with weaker legal protections. The problem expands, yet the workers involved remain largely invisible.

The fourth aspect is content moderation. Automated systems struggle to accurately identify harmful content, so human moderators are needed to fill that gap. However, this work can cause psychological harm, raising ethical concerns about creating systems that depend on people absorbing such harm to operate.

Thus, AI ethics concerns who performs the work, the conditions under which they work, and who bears the costs.

People who label training data and moderate content rarely benefit financially from the systems they help create. They often work under poor conditions, are exposed to harmful content, and are paid much less than the value of their work. Meanwhile, the corporations that deploy these systems profit, while the workers remain unseen.

This is significant because the costs are tangible. Content moderators suffer psychological harm from repeatedly viewing disturbing material. Data labellers work quickly and under pressure, which can introduce bias into the machine learning models they help develop. These biased systems then operate at scale, replicating their biases across millions of interactions, often appearing objective.

As AI systems are used more, these issues worsen. More data must be labelled, and more content must be moderated. The demand for this work grows while recognition for the workers diminishes. Much of this labour is outsourced to lower-income countries with weaker legal protections, exacerbating inequality.

Solving these problems requires questioning who benefits from these systems, who is harmed in the process, and whether the costs are fairly shared.
```







141 changes: 94 additions & 47 deletions 02_activities/assignments/DC_Cohort/assignment2.sql
@@ -22,10 +22,9 @@ The `||` values concatenate the columns into strings.
Edit the appropriate columns -- you're making two edits -- and the NULL rows will be fixed.
All the other rows will remain the same. */
--QUERY 1




SELECT
product_name || ', ' || COALESCE(product_size, '') || ' (' || COALESCE(product_qty_type, 'unit') || ')' AS product_description
FROM product;
--END QUERY


@@ -40,10 +39,13 @@ each new market date for each customer, or select only the unique market dates p
HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK().
Filter the visits to dates before April 29, 2022. */
--QUERY 2




SELECT
customer_id,
market_date,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit_number
FROM customer_purchases
WHERE market_date < '2022-04-29';
/* April 29th is my birthday! IDK why I find this so exciting... LOL */
--END QUERY
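-- Side note (a sketch of the hint's other approach, not part of the required answer):
-- ROW_NUMBER() numbers every purchase row, so it only works here if duplicate
-- customer/date rows are removed first; DENSE_RANK() above handles ties directly.
-- SELECT customer_id, market_date,
--     ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit_number
-- FROM (SELECT DISTINCT customer_id, market_date FROM customer_purchases)
-- WHERE market_date < '2022-04-29';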


@@ -52,10 +54,16 @@ then write another query that uses this one as a subquery (or temp table) and fi
only the customer’s most recent visit.
HINT: Do not use the previous visit dates filter. */
--QUERY 3




WITH ranked AS (
SELECT
customer_id,
market_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS visit_number
FROM customer_purchases
)
SELECT *
FROM ranked
WHERE visit_number = 1;
--END QUERY


@@ -65,10 +73,13 @@ customer_purchases table that indicates how many different times that customer h
You can make this a running count by including an ORDER BY within the PARTITION BY if desired.
Filter the visits to dates before April 29, 2022. */
--QUERY 4




SELECT
customer_id,
product_id,
market_date,
COUNT(*) OVER (PARTITION BY customer_id, product_id ORDER BY market_date) AS purchase_count
FROM customer_purchases
WHERE market_date < '2022-04-29';
--END QUERY


@@ -84,19 +95,28 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for

Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */
--QUERY 5




SELECT
product_name,
CASE
WHEN INSTR(product_name, '-') > 0
THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
ELSE NULL
END AS description
FROM product;
--END QUERY


/* 2. Filter the query to show any product_size value that contain a number with REGEXP. */
--QUERY 6




SELECT
product_name,
CASE
WHEN INSTR(product_name, '-') > 0
THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
ELSE NULL
END AS description
FROM product
WHERE product_size REGEXP '[0-9]';
--END QUERY


@@ -110,10 +130,24 @@ HINT: There are possibly a few ways to do this query, but if you're struggling
3) Query the second temp table twice, once for the best day, once for the worst day,
with a UNION binding them. */
--QUERY 7




WITH daily_sales AS (
SELECT
market_date,
SUM(quantity * cost_to_customer_per_qty) AS total_sales
FROM customer_purchases
GROUP BY market_date
),
ranked AS (
SELECT
market_date,
total_sales,
RANK() OVER (ORDER BY total_sales DESC) AS best_rank,
RANK() OVER (ORDER BY total_sales ASC) AS worst_rank
FROM daily_sales
)
SELECT market_date, total_sales, 'Best Day' AS label FROM ranked WHERE best_rank = 1
UNION
SELECT market_date, total_sales, 'Worst Day' AS label FROM ranked WHERE worst_rank = 1;
--END QUERY


@@ -131,10 +165,15 @@ Think a bit about the row counts: how many distinct vendors, product names are t
How many customers are there (y).
Before your final group by you should have the product of those two queries (x*y). */
--QUERY 8




SELECT
v.vendor_name,
p.product_name,
5 * vi.original_price * COUNT(DISTINCT c.customer_id) AS total_revenue
FROM (SELECT DISTINCT vendor_id, product_id, original_price FROM vendor_inventory) vi
CROSS JOIN (SELECT customer_id FROM customer) c
JOIN vendor v ON vi.vendor_id = v.vendor_id
JOIN product p ON vi.product_id = p.product_id
GROUP BY v.vendor_name, p.product_name;
--END QUERY
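-- Sanity check (optional sketch): before the GROUP BY, the CROSS JOIN should
-- produce x*y rows -- distinct vendor/product pairs times the number of customers:
-- SELECT (SELECT COUNT(*) FROM (SELECT DISTINCT vendor_id, product_id FROM vendor_inventory))
--      * (SELECT COUNT(*) FROM customer) AS expected_row_count;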


@@ -144,20 +183,18 @@ This table will contain only products where the `product_qty_type = 'unit'`.
It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`.
Name the timestamp column `snapshot_timestamp`. */
--QUERY 9




CREATE TABLE product_units AS
SELECT *, CURRENT_TIMESTAMP AS snapshot_timestamp
FROM product
WHERE product_qty_type = 'unit';
--END QUERY


/*2. Using `INSERT`, add a new row to the product_units table (with an updated timestamp).
This can be any product you desire (e.g. add another record for Apple Pie). */
--QUERY 10




INSERT INTO product_units
SELECT *, CURRENT_TIMESTAMP FROM product WHERE product_name = 'Apple Pie';
--END QUERY


@@ -166,10 +203,13 @@

HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/
--QUERY 11




DELETE FROM product_units
WHERE product_name = 'Apple Pie'
AND snapshot_timestamp = (
SELECT MIN(snapshot_timestamp)
FROM product_units
WHERE product_name = 'Apple Pie'
);
--END QUERY


@@ -190,10 +230,17 @@ Finally, make sure you have a WHERE statement to update the right row,
you'll need to use product_units.product_id to refer to the correct row within the product_units table.
When you have all of these components, you can run the update statement. */
--QUERY 12
ALTER TABLE product_units
ADD current_quantity INT;




UPDATE product_units
SET current_quantity = COALESCE(
    (
        SELECT vi.quantity
        FROM vendor_inventory vi
        WHERE vi.product_id = product_units.product_id
        ORDER BY vi.market_date DESC
        LIMIT 1
    ),
    0  -- COALESCE must wrap the subquery: a product with no vendor_inventory
       -- rows yields NULL from the subquery, which should become 0
);
--END QUERY

