diff --git a/02_activities/assignments/DC_Cohort/A2_Prompt1_Yazhu Lin.png b/02_activities/assignments/DC_Cohort/A2_Prompt1_Yazhu Lin.png new file mode 100644 index 000000000..df7c9e1bb Binary files /dev/null and b/02_activities/assignments/DC_Cohort/A2_Prompt1_Yazhu Lin.png differ diff --git a/02_activities/assignments/DC_Cohort/A2_Prompt2_Yazhu Lin.png b/02_activities/assignments/DC_Cohort/A2_Prompt2_Yazhu Lin.png new file mode 100644 index 000000000..c2b7dc60a Binary files /dev/null and b/02_activities/assignments/DC_Cohort/A2_Prompt2_Yazhu Lin.png differ diff --git a/02_activities/assignments/DC_Cohort/Assignment1.md b/02_activities/assignments/DC_Cohort/Assignment1.md index f650c9752..093fdb253 100644 --- a/02_activities/assignments/DC_Cohort/Assignment1.md +++ b/02_activities/assignments/DC_Cohort/Assignment1.md @@ -209,5 +209,5 @@ Consider, for example, concepts of fairness, inequality, social structures, marg ``` -Your thoughts... +Databases might seem like just technical tools for storing information, but after reading this article, I started to realize they actually reflect certain values and assumptions. For example, the system described in the article assumes that everyone fits into a traditional family structure, which is not true for many people. Because of this, some individuals were excluded or faced difficulties when the database could not recognize their situation. This made me think that databases can create unfairness, even if that is not the intention. In my everyday life, I also see similar situations, like when filling out forms that only allow limited options for gender or family background. These systems simplify people’s identities, but in doing so, they may ignore important differences and experiences. As someone new to this topic, I am beginning to understand that technology is not neutral, and the way databases are designed can shape how people are treated in society. 
``` diff --git a/02_activities/assignments/DC_Cohort/Assignment2.md b/02_activities/assignments/DC_Cohort/Assignment2.md index 01f991d02..489618a38 100644 --- a/02_activities/assignments/DC_Cohort/Assignment2.md +++ b/02_activities/assignments/DC_Cohort/Assignment2.md @@ -56,8 +56,15 @@ The store wants to keep customer addresses. Propose two architectures for the CU **HINT:** search type 1 vs type 2 slowly changing dimensions. ``` -Your answer... -``` +Your answer +*** +I think there are two main ways the bookstore could design a CUSTOMER_ADDRESS table, depending on whether it wants to keep only the latest address or keep a history of address changes. + +The first option is the simpler one: one row per customer, where the address just gets overwritten whenever the customer moves. In that version, the table could have fields like customer_id, street_address, city, province, postal_code, country, and maybe last_updated_date. If the customer changes address, the old address is replaced in that same row. This approach is easy to manage and works well if the store only cares about the customer’s current mailing address. But the downside is that once the address is updated, the old one is gone. This would be a Type 1 slowly changing dimension, because the new data overwrites the old data. + +The second option is to keep address history. In that version, the table would allow multiple address records for the same customer over time. So instead of using customer_id as the primary key, the table could have something like customer_address_id as the primary key, plus customer_id as a foreign key. Then it could also include effective_start_date, effective_end_date, and maybe an is_current flag. Each time the customer moves, a new row would be inserted instead of updating the old one. That means the bookstore can still see previous addresses if it ever needs them for reporting, auditing, or historical analysis. 
This would be a Type 2 slowly changing dimension, because changes are handled by adding a new record and preserving the old one. + +So overall, Type 1 means overwrite the address and keep only the latest version, while Type 2 means insert a new row and retain the address history. *** @@ -193,3 +200,8 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c ``` Your thoughts... ``` +Reading *Neural nets are just people all the way down* really shifted how I think about AI. Before this, I often understood AI as something highly technical and distant from ordinary human work, almost as if it were operating on its own. But this article made me realize that behind these systems are many layers of human labour that are often hidden. What looks like “intelligent” output is actually built on people labeling data, filtering harmful content, making judgment calls, and shaping what the model learns. That was one of the biggest ethical issues for me: the invisibility of labour. The article shows that AI is not just a machine process. It depends on human workers, and often those workers do difficult, repetitive, and emotionally harmful tasks without much recognition. + +Another issue that stood out to me is bias. If neural nets are trained through human decisions, then they also absorb human assumptions, categories, and inequalities. That means AI does not simply reflect the world neutrally. It can reproduce the social biases already built into society, including racial, cultural, and linguistic hierarchies. As a PhD student in Higher Education at OISE, I found myself thinking about how technologies that seem efficient or innovative can still reinforce exclusion if we do not ask who is designing them, who is labeling the data, and whose standards are treated as normal. + +I was also struck by the ethical problem of content moderation. The article reminded me that keeping AI systems “clean” or “safe” often requires humans to be exposed to disturbing material. 
So the convenience users experience may come at a psychological cost to invisible workers elsewhere. More broadly, this connects AI to society in a very direct way. LLMs are not separate from social systems. They are built through labour, shaped by power, and deployed into unequal worlds. For me, the article was a strong reminder that ethical questions about AI are always also questions about people. \ No newline at end of file diff --git a/02_activities/assignments/DC_Cohort/assignment1.sql b/02_activities/assignments/DC_Cohort/assignment1.sql index a6070558c..f483c41ba 100644 --- a/02_activities/assignments/DC_Cohort/assignment1.sql +++ b/02_activities/assignments/DC_Cohort/assignment1.sql @@ -6,28 +6,28 @@ --SELECT /* 1. Write a query that returns everything in the customer table. */ --QUERY 1 - - - - +SELECT * +FROM customer; --END QUERY /* 2. Write a query that displays all of the columns and 10 rows from the customer table, sorted by customer_last_name, then customer_first_ name. */ --QUERY 2 - - - - +SELECT * +FROM customer +ORDER BY customer_last_name, customer_first_name +LIMIT 10; --END QUERY --WHERE /* 1. Write a query that returns all customer purchases of product IDs 4 and 9. Limit to 25 rows of output. */ - - +SELECT * +FROM customer_purchases +WHERE product_id IN (4,9) +LIMIT 25; /*2. Write a query that returns all customer purchases and a new calculated column 'price' (quantity * cost_to_customer_per_qty), filtered by customer IDs between 8 and 10 (inclusive) using either: @@ -36,10 +36,11 @@ filtered by customer IDs between 8 and 10 (inclusive) using either: Limit to 25 rows of output. 
*/ --QUERY 3 - - - - +SELECT * +,quantity * cost_to_customer_per_qty AS price +FROM customer_purchases +WHERE customer_id BETWEEN 8 AND 10 +LIMIT 25; --END QUERY @@ -49,10 +50,14 @@ Using the product table, write a query that outputs the product_id and product_n columns and add a column called prod_qty_type_condensed that displays the word “unit” if the product_qty_type is “unit,” and otherwise displays the word “bulk.” */ --QUERY 4 - - - - +SELECT + product_id, + product_name, + CASE + WHEN product_qty_type = 'unit' THEN 'unit' + ELSE 'bulk' + END AS prod_qty_type_condensed +FROM product; --END QUERY @@ -60,36 +65,45 @@ if the product_qty_type is “unit,” and otherwise displays the word “bulk. add a column to the previous query called pepper_flag that outputs a 1 if the product_name contains the word “pepper” (regardless of capitalization), and otherwise outputs 0. */ --QUERY 5 - - - - +SELECT + product_id, + product_name, + CASE + WHEN product_qty_type = 'unit' THEN 'unit' + ELSE 'bulk' + END AS prod_qty_type_condensed, + CASE + WHEN LOWER(product_name) LIKE '%pepper%' THEN 1 + ELSE 0 + END AS pepper_flag +FROM product; --END QUERY - --JOIN /* 1. Write a query that INNER JOINs the vendor table to the vendor_booth_assignments table on the vendor_id field they both have in common, and sorts the result by market_date, then vendor_name. Limit to 24 rows of output. */ --QUERY 6 - - - - +SELECT * +FROM vendor v +INNER JOIN vendor_booth_assignments vba + ON v.vendor_id = vba.vendor_id +ORDER BY vba.market_date, v.vendor_name +LIMIT 24; --END QUERY - /* SECTION 3 */ -- AGGREGATE /* 1. Write a query that determines how many times each vendor has rented a booth at the farmer’s market by counting the vendor booth assignments per vendor_id. */ --QUERY 7 - - - - +SELECT + vendor_id, + COUNT(*) AS booth_count +FROM vendor_booth_assignments +GROUP BY vendor_id; --END QUERY @@ -99,13 +113,24 @@ of customers for them to give stickers to, sorted by last name, then first name. 
HINT: This query requires you to join two tables, use an aggregate function, and use the HAVING keyword. */ --QUERY 8 - - - - +SELECT + c.customer_id, + c.customer_first_name, + c.customer_last_name, + SUM(cp.quantity * cp.cost_to_customer_per_qty) AS total_spent +FROM customer c +INNER JOIN customer_purchases cp + ON c.customer_id = cp.customer_id +GROUP BY + c.customer_id, + c.customer_first_name, + c.customer_last_name +HAVING SUM(cp.quantity * cp.cost_to_customer_per_qty) > 2000 +ORDER BY c.customer_last_name, c.customer_first_name; --END QUERY + --Temp Table /* 1. Insert the original vendor table into a temp.new_vendor and then add a 10th vendor: Thomass Superfood Store, a Fresh Focused store, owned by Thomas Rosenthal @@ -118,10 +143,24 @@ When inserting the new vendor, you need to appropriately align the columns to be VALUES(col1,col2,col3,col4,col5) */ --QUERY 9 - - - - +CREATE TEMP TABLE new_vendor AS +SELECT * +FROM vendor; + +INSERT INTO new_vendor ( + vendor_id, + vendor_name, + vendor_type, + vendor_owner_first_name, + vendor_owner_last_name +) +VALUES ( + 10, + 'Thomass Superfood Store', + 'Fresh Focused', + 'Thomas', + 'Rosenthal' +); --END QUERY @@ -132,10 +171,12 @@ HINT: you might need to search for strfrtime modifers sqlite on the web to know and year are! Limit to 25 rows of output. */ --QUERY 10 - - - - +SELECT + customer_id, + STRFTIME('%m', market_date) AS month, + STRFTIME('%Y', market_date) AS year +FROM customer_purchases +LIMIT 25; --END QUERY @@ -146,8 +187,13 @@ HINTS: you will need to AGGREGATE, GROUP BY, and filter... but remember, STRFTIME returns a STRING for your WHERE statement... AND be sure you remove the LIMIT from the previous query before aggregating!! 
*/ --QUERY 11 +SELECT + customer_id, + SUM(quantity * cost_to_customer_per_qty) AS total_spent +FROM customer_purchases +WHERE STRFTIME('%m', market_date) = '04' + AND STRFTIME('%Y', market_date) = '2022' +GROUP BY customer_id; +--END QUERY - - ---END QUERY diff --git a/02_activities/assignments/DC_Cohort/assignment2.sql b/02_activities/assignments/DC_Cohort/assignment2.sql index f7515f625..16369c47b 100644 --- a/02_activities/assignments/DC_Cohort/assignment2.sql +++ b/02_activities/assignments/DC_Cohort/assignment2.sql @@ -22,9 +22,9 @@ The `||` values concatenate the columns into strings. Edit the appropriate columns -- you're making two edits -- and the NULL rows will be fixed. All the other rows will remain the same. */ --QUERY 1 - - - +SELECT + product_name || ', ' || COALESCE(product_size, '') || ' (' || COALESCE(product_qty_type, 'unit') || ')' +FROM product; --END QUERY @@ -40,9 +40,16 @@ each new market date for each customer, or select only the unique market dates p HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK(). Filter the visits to dates before April 29, 2022. */ --QUERY 2 - - - +SELECT + customer_id, + market_date, + DENSE_RANK() OVER ( + PARTITION BY customer_id + ORDER BY market_date + ) AS visit_number +FROM customer_purchases +WHERE market_date < '2022-04-29' +ORDER BY customer_id, market_date; --END QUERY @@ -52,9 +59,22 @@ then write another query that uses this one as a subquery (or temp table) and fi only the customer’s most recent visit. HINT: Do not use the previous visit dates filter. 
*/ --QUERY 3 - - - +SELECT + customer_id, + market_date, + visit_number +FROM ( + SELECT + customer_id, + market_date, + DENSE_RANK() OVER ( + PARTITION BY customer_id + ORDER BY market_date DESC + ) AS visit_number + FROM customer_purchases +) t +WHERE visit_number = 1 +ORDER BY customer_id, market_date; --END QUERY @@ -65,9 +85,16 @@ customer_purchases table that indicates how many different times that customer h You can make this a running count by including an ORDER BY within the PARTITION BY if desired. Filter the visits to dates before April 29, 2022. */ --QUERY 4 - - - +SELECT + customer_id, + market_date, + product_id, + COUNT(*) OVER ( + PARTITION BY customer_id, product_id + ) AS product_purchase_count +FROM customer_purchases +WHERE market_date < '2022-04-29' +ORDER BY customer_id, product_id, market_date; --END QUERY @@ -84,18 +111,24 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */ --QUERY 5 - - - +SELECT + product_name, + CASE + WHEN INSTR(product_name, '-') > 0 + THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1)) + ELSE NULL + END AS description +FROM product; --END QUERY /* 2. Filter the query to show any product_size value that contain a number with REGEXP. */ --QUERY 6 - - - +SELECT + * +FROM product +WHERE product_size REGEXP '[0-9]'; --END QUERY @@ -110,9 +143,36 @@ HINT: There are a possibly a few ways to do this query, but if you're struggling 3) Query the second temp table twice, once for the best day, once for the worst day, with a UNION binding them. 
*/ --QUERY 7 - - - +WITH daily_sales AS ( + SELECT + market_date, + SUM(quantity * cost_to_customer_per_qty) AS total_sales + FROM customer_purchases + GROUP BY market_date +), +ranked_sales AS ( + SELECT + market_date, + total_sales, + RANK() OVER (ORDER BY total_sales DESC) AS highest_rank, + RANK() OVER (ORDER BY total_sales ASC) AS lowest_rank + FROM daily_sales +) +SELECT + market_date, + total_sales, + 'highest total sales' AS sales_day_type +FROM ranked_sales +WHERE highest_rank = 1 + +UNION + +SELECT + market_date, + total_sales, + 'lowest total sales' AS sales_day_type +FROM ranked_sales +WHERE lowest_rank = 1; --END QUERY @@ -131,9 +191,23 @@ Think a bit about the row counts: how many distinct vendors, product names are t How many customers are there (y). Before your final group by you should have the product of those two queries (x*y). */ --QUERY 8 - - - +SELECT + v.vendor_name, + p.product_name, + COUNT(c.customer_id) * 5 * vi.original_price AS total_revenue +FROM vendor_inventory vi +JOIN vendor v + ON vi.vendor_id = v.vendor_id +JOIN product p + ON vi.product_id = p.product_id +CROSS JOIN customer c +GROUP BY + v.vendor_name, + p.product_name, + vi.original_price +ORDER BY + v.vendor_name, + p.product_name; --END QUERY @@ -144,9 +218,12 @@ This table will contain only products where the `product_qty_type = 'unit'`. It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`. Name the timestamp column `snapshot_timestamp`. */ --QUERY 9 - - - +CREATE TABLE product_units AS +SELECT + *, + CURRENT_TIMESTAMP AS snapshot_timestamp +FROM product +WHERE product_qty_type = 'unit'; --END QUERY @@ -154,9 +231,21 @@ Name the timestamp column `snapshot_timestamp`. */ /*2. Using `INSERT`, add a new row to the product_units table (with an updated timestamp). This can be any product you desire (e.g. add another record for Apple Pie). 
*/ --QUERY 10 - - - +INSERT INTO product_units ( + product_id, + product_name, + product_size, + product_qty_type, + snapshot_timestamp +) +SELECT + product_id, + product_name, + product_size, + product_qty_type, + CURRENT_TIMESTAMP +FROM product +WHERE product_name = 'Apple Pie'; --END QUERY @@ -166,9 +255,13 @@ This can be any product you desire (e.g. add another record for Apple Pie). */ HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/ --QUERY 11 - - - +DELETE FROM product_units +WHERE product_name = 'Apple Pie' + AND snapshot_timestamp = ( + SELECT MIN(snapshot_timestamp) + FROM product_units + WHERE product_name = 'Apple Pie' + ); --END QUERY @@ -190,9 +283,21 @@ Finally, make sure you have a WHERE statement to update the right row, you'll need to use product_units.product_id to refer to the correct row within the product_units table. When you have all of these components, you can run the update statement. */ --QUERY 12 +ALTER TABLE product_units +ADD current_quantity INT; - - +UPDATE product_units +SET current_quantity = COALESCE( + ( + SELECT vi.quantity + FROM vendor_inventory vi + WHERE vi.product_id = product_units.product_id + ORDER BY vi.market_date DESC + LIMIT 1 + ), + 0 +) +WHERE product_id IS NOT NULL; --END QUERY diff --git a/04_this_cohort/live_code/module_2/module_2.sqbpro b/04_this_cohort/live_code/module_2/module_2.sqbpro deleted file mode 100644 index 06850fe25..000000000 --- a/04_this_cohort/live_code/module_2/module_2.sqbpro +++ /dev/null @@ -1,179 +0,0 @@ -
/* MODULE 2 */ -/* SELECT */ - - -/* 1. Select everything in the customer table */ -SELECT - -/* 2. Use sql as a calculator */ - - - -/* 3. Add order by and limit clauses */ - - - -/* 4. Select multiple specific columns */ - - - -/* 5. Add a static value in a column */ - - --------------------------------------------------------------------------------------------------------------------------------------------- -/* MODULE 2 */ -/* WHERE */ - -/* 1. Select only customer 1 from the customer table */ -SELECT * -FROM customer -WHERE - - -/* 2. Differentiate between AND and OR */ - - - -/* 3. IN */ - - - -/* 4. LIKE */ - - - -/* 5. Nulls and Blanks*/ - - - -/* 6. BETWEEN x AND y */ - - --------------------------------------------------------------------------------------------------------------------------------------------- -/* MODULE 2 */ -/* CASE */ - - -SELECT * -/* 1. Add a CASE statement declaring which days vendors should come */ - - -/* 2. Add another CASE statement for Pie Day */ - - - -/* 3. Add another CASE statement with an ELSE clause to handle rows evaluating to False */ - - - -/* 4. Experiment with selecting a different column instead of just a string value */ - - -FROM vendor - - --------------------------------------------------------------------------------------------------------------------------------------------- -/* MODULE 2 */ -/* DISTINCT */ - - -/* 1. Compare how many customer_ids are the customer_purchases table, one select with distinct, one without */ - --- 4221 rows -SELECT customer_id FROM customer_purchases - - - -/* 2. Compare the difference between selecting market_day in market_date_info, with and without distinct: - what do these difference mean?*/ - - - -/* 3. Which vendor has sold products to a customer */ - - - -/* 4. Which vendor has sold products to a customer ... and which product was it */ - - - -/* 5. Which vendor has sold products to a customer -... and which product was it? -... 
AND to whom was it sold*/ - - --------------------------------------------------------------------------------------------------------------------------------------------- -/* MODULE 2 */ -/* INNER JOIN */ - - -/* 1. Get product names (from product table) alongside customer_purchases - ... use an INNER JOIN to see only products that have been purchased */ - --- without table aliases - - - - -/* 2. Using the Query #4 from DISTINCT earlier - (Which vendor has sold products to a customer AND which product was it AND to whom was it sold) - - Add customers' first and last names with an INNER JOIN */ - --- using table aliases - - - --------------------------------------------------------------------------------------------------------------------------------------------- -/* MODULE 2 */ -/* LEFT JOIN */ - - -/* 1. There are products that have been bought -... but are there products that have not been bought? -Use a LEFT JOIN to find out*/ - - -/* 2. Directions of LEFT JOINs matter ...*/ - - - - -/* 3. As do which values you filter on ... */ - - - - -/* 4. Without using a RIGHT JOIN, make this query return the RIGHT JOIN result set -...**Hint, flip the order of the joins** ... - -SELECT * - -FROM product_category AS pc -LEFT JOIN product AS p - ON pc.product_category_id = p.product_category_id - ORDER by pc.product_category_id - -...Note how the row count changed from 24 to 23 -*/ - - - --------------------------------------------------------------------------------------------------------------------------------------------- -/* MODULE 2 */ -/* Multiple Table JOINs */ - - -/* 1. Using the Query #4 from DISTINCT earlier - (Which vendor has sold products to a customer AND which product was it AND to whom was it sold) - - Replace all the IDs (customer, vendor, and product) with the names instead*/ - - - -/* 2. Select product_category_name, everything from the product table, and then LEFT JOIN the customer_purchases table -... 
how does this LEFT JOIN affect the number of rows? - -Why do we have more rows now?*/ - -
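
As a side note on the CUSTOMER_ADDRESS discussion in Assignment2.md above: the Type 1 vs Type 2 slowly-changing-dimension behaviours described there can be sketched end to end with Python's built-in `sqlite3` module. This is a minimal illustration only; the table names (`customer_address_t1`, `customer_address_t2`), the specific columns, and the sample dates are assumptions made for the demo, following the column names proposed in the answer rather than any table in the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Type 1 SCD: one row per customer; a move simply overwrites the old address.
cur.execute("""
    CREATE TABLE customer_address_t1 (
        customer_id       INTEGER PRIMARY KEY,
        street_address    TEXT,
        last_updated_date TEXT
    )
""")
cur.execute("INSERT INTO customer_address_t1 VALUES (1, '12 Elm St', '2022-01-01')")
# Customer 1 moves: the previous address is lost after this UPDATE.
cur.execute("""
    UPDATE customer_address_t1
    SET street_address = '99 Oak Ave', last_updated_date = '2022-06-01'
    WHERE customer_id = 1
""")

# Type 2 SCD: a surrogate key plus effective dates preserve address history.
cur.execute("""
    CREATE TABLE customer_address_t2 (
        customer_address_id  INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id          INTEGER,
        street_address       TEXT,
        effective_start_date TEXT,
        effective_end_date   TEXT,
        is_current           INTEGER
    )
""")
cur.execute("""
    INSERT INTO customer_address_t2
        (customer_id, street_address, effective_start_date, effective_end_date, is_current)
    VALUES (1, '12 Elm St', '2022-01-01', NULL, 1)
""")
# Customer 1 moves: expire the current row, then insert the new address.
cur.execute("""
    UPDATE customer_address_t2
    SET effective_end_date = '2022-06-01', is_current = 0
    WHERE customer_id = 1 AND is_current = 1
""")
cur.execute("""
    INSERT INTO customer_address_t2
        (customer_id, street_address, effective_start_date, effective_end_date, is_current)
    VALUES (1, '99 Oak Ave', '2022-06-01', NULL, 1)
""")

t1_rows = cur.execute("SELECT COUNT(*) FROM customer_address_t1").fetchone()[0]
t2_rows = cur.execute("SELECT COUNT(*) FROM customer_address_t2").fetchone()[0]
print(t1_rows)  # Type 1 keeps a single row per customer
print(t2_rows)  # Type 2 keeps every historical address as its own row
```

After one move, the Type 1 table still holds one row (current address only), while the Type 2 table holds two rows, with `is_current` and the effective dates telling the old and new addresses apart, which matches the trade-off described in the answer.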