Commit c88acea: Create Handle_Duplicates_Additional_Examples.md

Here are **additional examples** for handling **duplicates** in SQL, with more **use cases** and explanations.

---
## **1. Detecting Exact Duplicates (All Columns Match)**
Find rows where all column values are identical.

```sql
SELECT name, age, department, salary, COUNT(*) AS count
FROM employees
GROUP BY name, age, department, salary
HAVING COUNT(*) > 1;
```
**Explanation:**
- Groups rows by every column except the unique **id** (grouping by a unique key would make each group a single row, so no duplicates could ever surface).
- Filters groups with a **count > 1**, indicating duplicates.

---
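The detection pattern above can be sanity-checked with Python's built-in `sqlite3` module; the table and sample rows below are illustrative, not from any real schema.

```python
import sqlite3

# In-memory database with a made-up employees table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Alice", "HR", 50000),
     (2, "Bob", "IT", 60000),
     (3, "Alice", "HR", 50000),   # duplicate of row 1 (apart from id)
     (4, "Bob", "IT", 65000)],    # same name/department, different salary
)

# Group by every column except the unique id, then keep groups of size > 1.
rows = conn.execute("""
    SELECT name, department, salary, COUNT(*) AS count
    FROM employees
    GROUP BY name, department, salary
    HAVING COUNT(*) > 1
""").fetchall()
print(rows)   # [('Alice', 'HR', 50000, 2)]
```

Only the `(Alice, HR, 50000)` group surfaces, because Bob's two rows differ in salary.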
## **2. Detecting Partial Duplicates (Based on Specific Columns)**
Find duplicates based only on specific columns (e.g., **name** and **department**).

```sql
SELECT name, department, COUNT(*) AS count
FROM employees
GROUP BY name, department
HAVING COUNT(*) > 1;
```
**Use Case:** Check if employees have been assigned to the **same department** multiple times.

---
## **3. Deleting Exact Duplicates**
Keep only **one occurrence** of exact duplicates and delete others.

```sql
DELETE FROM employees
WHERE id NOT IN (
    SELECT MIN(id)
    FROM employees
    GROUP BY name, department, salary
);
```
**Explanation:**
- Groups duplicates and keeps the row with the **minimum ID**.
- Deletes all others.
- Note: MySQL does not allow selecting from the table being deleted; there, wrap the subquery in an extra derived table.

---
## **4. Delete Partial Duplicates**
Remove duplicates based on **name** and **department**, keeping the row with the **lowest ID** in each group.

```sql
DELETE FROM employees
WHERE id NOT IN (
    SELECT MIN(id)
    FROM employees
    GROUP BY name, department
);
```

---
## **5. Handling Duplicates with ROW_NUMBER()**

### **a. Identify Duplicates Using ROW_NUMBER()**
```sql
SELECT *,
       ROW_NUMBER() OVER(PARTITION BY name, department ORDER BY id) AS row_num
FROM employees;
```
**Explanation:**
- Assigns a **row number** within each duplicate group.
- Rows with **row_num > 1** are duplicates.

### **b. Remove Duplicates with ROW_NUMBER()**
```sql
DELETE FROM employees
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER(PARTITION BY name, department ORDER BY id) AS row_num
        FROM employees
    ) subquery
    WHERE row_num > 1
);
```
**Use Case:** Deletes all duplicates while keeping only the **first occurrence** based on the **ID**.

---
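The `ROW_NUMBER()` clean-up can be exercised end to end with `sqlite3` (SQLite 3.25+ supports window functions); the data below is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, "Alice", "HR"),
                  (2, "Alice", "HR"),   # duplicate of row 1
                  (3, "Bob", "IT")])

# Delete every row that is not the first in its (name, department) group.
conn.execute("""
    DELETE FROM employees
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY name, department ORDER BY id) AS row_num
            FROM employees
        )
        WHERE row_num > 1
    )
""")
remaining = conn.execute("SELECT id, name FROM employees ORDER BY id").fetchall()
print(remaining)  # [(1, 'Alice'), (3, 'Bob')]
```

Only the first occurrence per group survives, matching the "keep lowest ID" behavior described above.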
## **6. Marking Duplicates Instead of Deleting (Soft Delete)**
Useful when you want to **review duplicates later** instead of deleting them immediately.

```sql
ALTER TABLE employees ADD COLUMN is_duplicate BOOLEAN DEFAULT FALSE;

UPDATE employees
SET is_duplicate = TRUE
WHERE id IN (
    SELECT id
    FROM (
        SELECT id, ROW_NUMBER() OVER(PARTITION BY name, department ORDER BY id) AS row_num
        FROM employees
    ) subquery
    WHERE row_num > 1
);
```

---
## **7. Deduplicating with CREATE TABLE AS (Copy Unique Rows)**
Create a **new table** with unique records.

```sql
CREATE TABLE unique_employees AS
SELECT DISTINCT *
FROM employees;
```
**Use Case:** Preserves the **original table** while creating a clean version with **unique rows**. (SQL Server uses `SELECT ... INTO unique_employees FROM employees` for the same purpose.)

---
## **8. Preventing Future Duplicates (Constraints)**

### **a. Add a UNIQUE Constraint**
```sql
ALTER TABLE employees
ADD CONSTRAINT unique_employee UNIQUE(name, department);
```

### **b. Add a UNIQUE Index**
```sql
CREATE UNIQUE INDEX idx_unique_employee
ON employees(name, department);
```
**Purpose:**
- Ensures no future duplicates based on **name** and **department**.

---
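A minimal `sqlite3` sketch of how a UNIQUE constraint rejects a duplicate insert; the schema is illustrative (SQLite declares the constraint inline rather than via `ALTER TABLE ... ADD CONSTRAINT`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        department TEXT,
        UNIQUE (name, department)
    )
""")
conn.execute("INSERT INTO employees (name, department) VALUES ('Alice', 'HR')")

# A second row with the same (name, department) violates the constraint.
try:
    conn.execute("INSERT INTO employees (name, department) VALUES ('Alice', 'HR')")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True

print(blocked)  # True
```

Applications typically catch this error (or use an upsert such as `INSERT ... ON CONFLICT`) rather than letting it propagate.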
## **9. Deduplicating Joins**

### **a. Find Duplicate Rows in Joins**
```sql
SELECT e1.*
FROM employees e1
JOIN employees e2
  ON e1.name = e2.name AND e1.department = e2.department
WHERE e1.id > e2.id;
```

### **b. Delete Duplicate Rows from Joins**
```sql
DELETE e1
FROM employees e1
JOIN employees e2
  ON e1.name = e2.name AND e1.department = e2.department
WHERE e1.id > e2.id;
```
**Explanation:**
- Deletes rows with **higher IDs**, retaining only **one copy**.
- The multi-table `DELETE e1 FROM ... JOIN ...` form is MySQL syntax; other databases use a correlated subquery or, in PostgreSQL, `DELETE ... USING`.

---
## **10. Finding Duplicate Records Based on Dates**

### **a. Detect Duplicate Orders by Date**
```sql
SELECT customer_id, order_date, COUNT(*) AS count
FROM orders
GROUP BY customer_id, order_date
HAVING COUNT(*) > 1;
```
(`order_id` is omitted from the select list because it is not part of the `GROUP BY`.)

### **b. Keep Only the Latest Order for Each Customer**
```sql
SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY order_date DESC) AS row_num
    FROM orders
) subquery
WHERE row_num = 1;
```
**Explanation:**
- Uses **ROW_NUMBER()** to keep the most recent order for each customer.

---
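The "keep only the latest order" pattern can be checked quickly with `sqlite3`; the orders data below is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 100, "2024-01-05"),
                  (2, 100, "2024-02-10"),   # later order for customer 100
                  (3, 200, "2024-01-20")])

# row_num = 1 selects the newest order within each customer partition.
latest = conn.execute("""
    SELECT order_id, customer_id, order_date
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS row_num
        FROM orders
    )
    WHERE row_num = 1
    ORDER BY customer_id
""").fetchall()
print(latest)  # [(2, 100, '2024-02-10'), (3, 200, '2024-01-20')]
```

Customer 100's older January order is filtered out; customer 200 has only one order, which is kept.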
## **11. Handling Time-Based Duplicates (Retention Analysis)**

### **a. Find Customers with Multiple Purchases**
```sql
SELECT customer_id, COUNT(*) AS purchase_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 1;
```

### **b. Identify Consecutive Purchases (Churn Analysis)**
```sql
SELECT customer_id, order_date,
       LEAD(order_date) OVER(PARTITION BY customer_id ORDER BY order_date) AS next_order_date,
       DATEDIFF(LEAD(order_date) OVER(PARTITION BY customer_id ORDER BY order_date), order_date) AS days_between
FROM orders;
```
**Use Case:**
- Tracks the number of days between purchases to identify **customer churn patterns**. (`DATEDIFF(a, b)` here is the MySQL form, returning `a - b` in days.)
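SQLite has no `DATEDIFF`, so this `sqlite3` sketch computes the gap with `julianday()` instead; the data is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(100, "2024-01-01"),
                  (100, "2024-01-15"),   # 14 days after the first purchase
                  (200, "2024-03-01")])

# LEAD() pairs each order with the next one per customer; julianday()
# subtraction stands in for MySQL's DATEDIFF.
gaps = conn.execute("""
    SELECT customer_id, order_date,
           LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS next_order_date,
           CAST(julianday(LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date))
                - julianday(order_date) AS INTEGER) AS days_between
    FROM orders
    ORDER BY customer_id, order_date
""").fetchall()
print(gaps[0])  # (100, '2024-01-01', '2024-01-15', 14)
```

The last order in each partition has no successor, so its `next_order_date` and `days_between` are NULL.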
## **12. Exporting Unique Rows (Backup)**

### **Export Unique Data (MySQL):**
```sql
SELECT DISTINCT *
INTO OUTFILE '/tmp/unique_employees.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM employees;
```
`INTO OUTFILE` is MySQL-specific and writes the file on the database server, so the connecting user needs the `FILE` privilege.

---
## **Best Practices for Duplicate Handling**

1. **Analyze Before Deleting:** Always inspect duplicates with `SELECT` before running `DELETE`.
2. **Backup Tables:** Use `CREATE TABLE ... AS SELECT` (or `SELECT INTO`) or snapshots before modifying data.
3. **Use Temporary Tables:** Create temp tables for intermediate results when testing queries.
4. **Monitor Logs:** Review logs to prevent accidental duplicate insertion.
5. **Constraints:** Use **PRIMARY KEYS**, **UNIQUE constraints**, and **INDEXES** to avoid future duplicates.
