2 changes: 2 additions & 0 deletions 02_activities/assignments/DC_Cohort/.Rhistory
@@ -0,0 +1,2 @@
![Farmers Market Logical Model](<./images/Farmers Market Logical Model.PNG>)
- These are the tables that are connected
40 changes: 38 additions & 2 deletions 02_activities/assignments/DC_Cohort/Assignment2.md
@@ -50,13 +50,22 @@ There are several tools online you can use, I'd recommend [Draw.io](https://www.
#### Prompt 2
We want to create employee shifts, splitting up the day into morning and evening. Add this to the ERD.

![Bookstore Logical Model](./images/Bookstore_Logical_Model.png)

#### Prompt 3
The store wants to keep customer addresses. Propose two architectures for the CUSTOMER_ADDRESS table, one that will retain changes, and another that will overwrite. Which is type 1, which is type 2?

**HINT:** search type 1 vs type 2 slowly changing dimensions.

```
Type 1 = overwrite; no history is kept. Type 2 = add a new row; history is kept.

Type 1 overwrites the existing address when it changes, so there is only one
row per customer, holding the current address.

Type 2 adds a new row each time the address changes. Address history is
retained using start_date and end_date columns, or an is_current flag,
to track which address is the current one.
```
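
A minimal sketch of the two architectures (table and column names are illustrative, not taken from the assignment schema):

```
-- Type 1: one row per customer; an address change simply overwrites the row.
CREATE TABLE customer_address_type1 (
    customer_id  INTEGER PRIMARY KEY,
    address      TEXT,
    city         TEXT,
    updated_at   TIMESTAMP    -- when the row was last overwritten
);

-- Type 2: one row per address "version"; a change closes the old row
-- (end_date / is_current) and inserts a new one, preserving history.
CREATE TABLE customer_address_type2 (
    address_id   INTEGER PRIMARY KEY,
    customer_id  INTEGER,
    address      TEXT,
    city         TEXT,
    start_date   DATE,
    end_date     DATE,        -- NULL while the row is current
    is_current   INTEGER      -- 1 = current address, 0 = historical
);
```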

***
@@ -191,5 +200,32 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c


```
Boykis's article contends that machine learning systems are not autonomous, as they rely on human labour at every stage. This labour is often hidden, poorly compensated, and mentally taxing.

This story raises a few key ethical issues.

The first factor is labour. Data labellers and content moderators perform the essential work that makes AI function. They often encounter disturbing content, receive minimal support, and are paid very little. While the technology appears sleek and automated, it relies heavily on human labour that remains unseen.

The second issue is bias. Since training data is labelled by people, it mirrors their judgments and blind spots. If labellers work quickly or lack diversity, these limitations become ingrained in the model. As a result, the model replicates this bias at scale, often appearing objective.

The third factor is scale. As large language models gain wider adoption, the need for labelling and moderation grows, requiring more workers, often in lower-income countries with weaker legal protections. The problem expands, yet the workers involved remain largely invisible.

The fourth aspect is content moderation. Automated systems struggle to accurately identify harmful content, so human moderators are needed to fill that gap. However, this work can cause psychological harm, raising ethical concerns about creating systems that depend on people absorbing such harm to operate.

Thus, AI ethics concerns who performs the work, the conditions under which they work, and who bears the costs.

People who label training data and moderate content rarely benefit financially from the systems they help create. They often work under poor conditions, are exposed to harmful content, and are paid much less than the value of their work. Meanwhile, the corporations that deploy these systems profit, while the workers remain unseen.

This is significant because the costs are tangible. Content moderators suffer psychological harm from repeatedly viewing disturbing material. Data labellers work quickly and under pressure, which can introduce bias into the machine learning models they help develop. These biased systems then operate at scale, replicating their biases across millions of interactions, often appearing objective.

As AI systems are used more, these issues worsen. More data must be labelled, and more content must be moderated. The demand for this work grows while recognition for the workers diminishes. Much of this labour is outsourced to lower-income countries with weaker legal protections, exacerbating inequality.

Solving these problems requires questioning who benefits from these systems, who is harmed in the process, and whether the costs are fairly shared.
```







141 changes: 94 additions & 47 deletions 02_activities/assignments/DC_Cohort/assignment2.sql
@@ -22,10 +22,9 @@ The `||` values concatenate the columns into strings.
Edit the appropriate columns -- you're making two edits -- and the NULL rows will be fixed.
All the other rows will remain the same. */
--QUERY 1




SELECT
product_name || ', ' || COALESCE(product_size, '') || ' (' || COALESCE(product_qty_type, 'unit') || ')' AS product_description
FROM product;
--END QUERY


@@ -40,10 +39,13 @@ each new market date for each customer, or select only the unique market dates p
HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK().
Filter the visits to dates before April 29, 2022. */
--QUERY 2




SELECT
customer_id,
market_date,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit_number
FROM customer_purchases
WHERE market_date < '2022-04-29';
/* April 29th is my birthday! IDK why I find this so exciting... LOL */
--END QUERY
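-- Side note (a sketch of the hint's other approach, not part of the required answer):
-- ROW_NUMBER() numbers every purchase row, so it only works here if duplicate
-- customer/date rows are removed first; DENSE_RANK() above handles ties directly.
-- SELECT customer_id, market_date,
--     ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit_number
-- FROM (SELECT DISTINCT customer_id, market_date FROM customer_purchases)
-- WHERE market_date < '2022-04-29';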


@@ -52,10 +54,16 @@ then write another query that uses this one as a subquery (or temp table) and fi
only the customer’s most recent visit.
HINT: Do not use the previous visit dates filter. */
--QUERY 3




WITH ranked AS (
SELECT
customer_id,
market_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS visit_number
FROM customer_purchases
)
SELECT *
FROM ranked
WHERE visit_number = 1;
--END QUERY


@@ -65,10 +73,13 @@ customer_purchases table that indicates how many different times that customer h
You can make this a running count by including an ORDER BY within the PARTITION BY if desired.
Filter the visits to dates before April 29, 2022. */
--QUERY 4




SELECT
customer_id,
product_id,
market_date,
COUNT(*) OVER (PARTITION BY customer_id, product_id ORDER BY market_date) AS purchase_count
FROM customer_purchases
WHERE market_date < '2022-04-29';
--END QUERY


@@ -84,19 +95,28 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for

Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */
--QUERY 5




SELECT
product_name,
CASE
WHEN INSTR(product_name, '-') > 0
THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
ELSE NULL
END AS description
FROM product;
--END QUERY


/* 2. Filter the query to show any product_size value that contain a number with REGEXP. */
--QUERY 6




SELECT
product_name,
CASE
WHEN INSTR(product_name, '-') > 0
THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
ELSE NULL
END AS description
FROM product
WHERE product_size REGEXP '[0-9]';
--END QUERY


@@ -110,10 +130,24 @@ HINT: There are possibly a few ways to do this query, but if you're struggling
3) Query the second temp table twice, once for the best day, once for the worst day,
with a UNION binding them. */
--QUERY 7




WITH daily_sales AS (
SELECT
market_date,
SUM(quantity * cost_to_customer_per_qty) AS total_sales
FROM customer_purchases
GROUP BY market_date
),
ranked AS (
SELECT
market_date,
total_sales,
RANK() OVER (ORDER BY total_sales DESC) AS best_rank,
RANK() OVER (ORDER BY total_sales ASC) AS worst_rank
FROM daily_sales
)
SELECT market_date, total_sales, 'Best Day' AS label FROM ranked WHERE best_rank = 1
UNION
SELECT market_date, total_sales, 'Worst Day' AS label FROM ranked WHERE worst_rank = 1;
--END QUERY


@@ -131,10 +165,15 @@ Think a bit about the row counts: how many distinct vendors, product names are t
How many customers are there (y).
Before your final group by you should have the product of those two queries (x*y). */
--QUERY 8




SELECT
v.vendor_name,
p.product_name,
5 * vi.original_price * COUNT(DISTINCT c.customer_id) AS total_revenue
FROM (SELECT DISTINCT vendor_id, product_id, original_price FROM vendor_inventory) vi
CROSS JOIN (SELECT customer_id FROM customer) c
JOIN vendor v ON vi.vendor_id = v.vendor_id
JOIN product p ON vi.product_id = p.product_id
GROUP BY v.vendor_name, p.product_name;
--END QUERY
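-- Sanity check (optional sketch): before the GROUP BY, the CROSS JOIN should
-- produce x*y rows -- distinct vendor/product pairs times the number of customers:
-- SELECT (SELECT COUNT(*) FROM (SELECT DISTINCT vendor_id, product_id FROM vendor_inventory))
--      * (SELECT COUNT(*) FROM customer) AS expected_row_count;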


@@ -144,20 +183,18 @@ This table will contain only products where the `product_qty_type = 'unit'`.
It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`.
Name the timestamp column `snapshot_timestamp`. */
--QUERY 9




CREATE TABLE product_units AS
SELECT *, CURRENT_TIMESTAMP AS snapshot_timestamp
FROM product
WHERE product_qty_type = 'unit';
--END QUERY


/*2. Using `INSERT`, add a new row to the product_units table (with an updated timestamp).
This can be any product you desire (e.g. add another record for Apple Pie). */
--QUERY 10




INSERT INTO product_units
SELECT *, CURRENT_TIMESTAMP FROM product WHERE product_name = 'Apple Pie';
--END QUERY


@@ -166,10 +203,13 @@

HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/
--QUERY 11




DELETE FROM product_units
WHERE product_name = 'Apple Pie'
AND snapshot_timestamp = (
SELECT MIN(snapshot_timestamp)
FROM product_units
WHERE product_name = 'Apple Pie'
);
--END QUERY


@@ -190,10 +230,17 @@ Finally, make sure you have a WHERE statement to update the right row,
you'll need to use product_units.product_id to refer to the correct row within the product_units table.
When you have all of these components, you can run the update statement. */
--QUERY 12
ALTER TABLE product_units
ADD current_quantity INT;




UPDATE product_units
SET current_quantity = COALESCE(
    (
        SELECT vi.quantity
        FROM vendor_inventory vi
        WHERE vi.product_id = product_units.product_id
        ORDER BY vi.market_date DESC
        LIMIT 1
    ),
    0  -- COALESCE must wrap the subquery: a product with no vendor_inventory
       -- rows yields NULL from the subquery, which should become 0
);
--END QUERY

