Here are **additional examples** for handling **duplicates** in SQL, with more **use cases** and explanations.
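
All of the examples below run against a small **employees** table (and later an **orders** table). The queries don't spell out a schema, so here is a minimal sketch with assumed column names and types, plus a few deliberately duplicated rows for testing:

```sql
-- Hypothetical schema inferred from the queries below (MySQL syntax; column types are assumptions).
CREATE TABLE employees (
    id         INT PRIMARY KEY AUTO_INCREMENT,
    name       VARCHAR(100),
    age        INT,
    department VARCHAR(100),
    salary     DECIMAL(10, 2)
);

-- Sample rows containing both exact and partial duplicates.
INSERT INTO employees (name, age, department, salary) VALUES
('Alice', 30, 'Sales',     50000),
('Alice', 30, 'Sales',     50000),   -- exact duplicate (all data columns match)
('Bob',   45, 'Marketing', 60000),
('Bob',   45, 'Marketing', 65000);   -- partial duplicate (same name and department)
```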

---

## **1. Detecting Exact Duplicates (All Data Columns Match)**
Find rows where all column values are identical.

```sql
SELECT name, age, department, salary, COUNT(*) AS count
FROM employees
GROUP BY name, age, department, salary
HAVING COUNT(*) > 1;
```
**Explanation:**
- Groups rows by every data column (the unique **id** is left out, since including it would prevent any group from ever repeating).
- Filters groups with a **count > 1**, indicating duplicates.

---

## **2. Detecting Partial Duplicates (Based on Specific Columns)**
Find duplicates based only on specific columns (e.g., **name** and **department**).

```sql
SELECT name, department, COUNT(*) AS count
FROM employees
GROUP BY name, department
HAVING COUNT(*) > 1;
```
**Use Case:** Check if employees have been assigned to the **same department** multiple times.
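
The grouped query returns only the duplicated key values and their counts. To see the full rows behind each duplicate pair, one option is to join the grouped result back to the table (a sketch; the alias names are arbitrary):

```sql
-- List every column of the rows that share a name/department pair.
SELECT e.*
FROM employees e
JOIN (
    SELECT name, department
    FROM employees
    GROUP BY name, department
    HAVING COUNT(*) > 1
) dup
  ON e.name = dup.name
 AND e.department = dup.department
ORDER BY e.name, e.department, e.id;
```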

---

## **3. Deleting Exact Duplicates**
Keep only **one occurrence** of exact duplicates and delete others.

```sql
DELETE FROM employees
WHERE id NOT IN (
    SELECT MIN(id)
    FROM employees
    GROUP BY name, department, salary
);
```
**Explanation:**
- Groups duplicates and keeps the row with the **minimum ID**.
- Deletes all others.
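
One caveat: MySQL rejects a **DELETE** whose subquery reads from the same table being deleted (error 1093). Wrapping the subquery in a derived table is the usual workaround; a sketch:

```sql
-- Materializing the subquery as a derived table avoids MySQL error 1093.
DELETE FROM employees
WHERE id NOT IN (
    SELECT keep_id
    FROM (
        SELECT MIN(id) AS keep_id
        FROM employees
        GROUP BY name, department, salary
    ) keepers
);
```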

---

## **4. Delete Partial Duplicates**
Remove duplicates based on **name** and **department**, keeping the row with the **lowest id** in each group.

```sql
DELETE FROM employees
WHERE id NOT IN (
    SELECT MIN(id)
    FROM employees
    GROUP BY name, department
);
```
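
To keep the row with the **lowest salary** in each name/department group instead of the lowest id, ordering a window by salary works; a sketch assuming a database with window functions (MySQL 8+, PostgreSQL):

```sql
-- Keep the cheapest row per name/department; delete the rest.
DELETE FROM employees
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER(PARTITION BY name, department ORDER BY salary ASC, id ASC) AS row_num
        FROM employees
    ) ranked
    WHERE row_num > 1
);
```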

---

## **5. Handling Duplicates with ROW_NUMBER()**

### **a. Identify Duplicates Using ROW_NUMBER()**
```sql
SELECT *,
       ROW_NUMBER() OVER(PARTITION BY name, department ORDER BY id) AS row_num
FROM employees;
```
**Explanation:**
- Assigns a **row number** within each duplicate group.
- Rows with **row_num > 1** are duplicates.
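
Because **row_num** is computed in the SELECT list, it cannot be referenced directly in a WHERE clause. To list only the duplicate rows, wrap the query in a derived table:

```sql
-- Show only the extra copies (every row after the first in each group).
SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER(PARTITION BY name, department ORDER BY id) AS row_num
    FROM employees
) ranked
WHERE row_num > 1;
```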

### **b. Remove Duplicates with ROW_NUMBER()**
```sql
DELETE FROM employees
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER(PARTITION BY name, department ORDER BY id) AS row_num
        FROM employees
    ) subquery
    WHERE row_num > 1
);
```
**Use Case:** Deletes all duplicates while keeping only the **first occurrence** based on the **ID**.

---

## **6. Marking Duplicates Instead of Deleting (Soft Delete)**
Useful when you want to **review duplicates later** instead of deleting them immediately.

```sql
ALTER TABLE employees ADD COLUMN is_duplicate BOOLEAN DEFAULT FALSE;

UPDATE employees
SET is_duplicate = TRUE
WHERE id IN (
    SELECT id
    FROM (
        SELECT id, ROW_NUMBER() OVER(PARTITION BY name, department ORDER BY id) AS row_num
        FROM employees
    ) subquery
    WHERE row_num > 1
);
```
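
Once the flag is set, the marked rows can be reviewed at leisure and then purged or un-flagged; a sketch (the ids in the last statement are placeholders):

```sql
-- Review the flagged rows before deciding what to do with them.
SELECT *
FROM employees
WHERE is_duplicate = TRUE;

-- After review: either remove them for good...
DELETE FROM employees
WHERE is_duplicate = TRUE;

-- ...or clear the flag on rows that turned out to be legitimate.
UPDATE employees
SET is_duplicate = FALSE
WHERE id IN (101, 102);   -- hypothetical ids chosen during review
```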

---

## **7. Deduplicating with CREATE TABLE ... AS (Copy Unique Rows)**
Create a **new table** with unique records.

```sql
CREATE TABLE unique_employees AS
SELECT DISTINCT *
FROM employees;
```
**Use Case:** Preserves the **original table** while creating a clean version with **unique rows**. Note that **SELECT DISTINCT** only removes rows that are identical in **every** column, so rows that differ only in **id** are all kept.
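
If the goal is to replace the original table with the deduplicated copy, a common follow-up (MySQL syntax shown; back up first) is an atomic rename:

```sql
-- Swap the clean copy into place, keeping the original under a new name.
RENAME TABLE employees        TO employees_with_dups,
             unique_employees TO employees;
```
Keep in mind that **CREATE TABLE ... AS** does not copy indexes, constraints, or AUTO_INCREMENT settings, so re-create those on the new table before swapping it in.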

---

## **8. Preventing Future Duplicates (Constraints)**

### **a. Add a UNIQUE Constraint**
```sql
ALTER TABLE employees
ADD CONSTRAINT unique_employee UNIQUE(name, department);
```

### **b. Add a UNIQUE Index**
```sql
CREATE UNIQUE INDEX idx_unique_employee
ON employees(name, department);
```
**Purpose:**
- Ensures no future duplicates based on **name** and **department**.
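
With the constraint or unique index in place, violating inserts fail with an error. MySQL offers two ways to handle that gracefully at insert time (a sketch; other databases use INSERT ... ON CONFLICT or MERGE instead):

```sql
-- Silently skip rows that would violate the unique key.
INSERT IGNORE INTO employees (name, age, department, salary)
VALUES ('Alice', 30, 'Sales', 50000);

-- Or update the existing row instead of inserting a duplicate.
INSERT INTO employees (name, age, department, salary)
VALUES ('Alice', 30, 'Sales', 52000)
ON DUPLICATE KEY UPDATE salary = VALUES(salary);
```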

---

## **9. Deduplicating with Self-Joins**

### **a. Find Duplicate Rows with a Self-Join**
```sql
SELECT e1.*
FROM employees e1
JOIN employees e2
ON e1.name = e2.name AND e1.department = e2.department
WHERE e1.id > e2.id;
```

### **b. Delete Duplicate Rows with a Self-Join**
```sql
DELETE e1
FROM employees e1
JOIN employees e2
ON e1.name = e2.name AND e1.department = e2.department
WHERE e1.id > e2.id;
```
**Explanation:**
- Deletes rows with **higher IDs**, retaining only **one copy** per **name**/**department** pair.
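
The multi-table **DELETE ... JOIN** form above is MySQL syntax. A more portable variant (this sketch works in PostgreSQL, for example) expresses the same rule with a correlated EXISTS:

```sql
-- Delete any row for which an older row with the same name/department exists.
DELETE FROM employees e1
WHERE EXISTS (
    SELECT 1
    FROM employees e2
    WHERE e2.name = e1.name
      AND e2.department = e1.department
      AND e2.id < e1.id
);
```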

---

## **10. Finding Duplicate Records Based on Dates**

### **a. Detect Duplicate Orders by Date**
```sql
SELECT customer_id, order_date, COUNT(*) AS count
FROM orders
GROUP BY customer_id, order_date
HAVING COUNT(*) > 1;
```

### **b. Keep Only the Latest Order for Each Customer**
```sql
SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY order_date DESC) AS row_num
    FROM orders
) subquery
WHERE row_num = 1;
```
**Explanation:**
- Uses **ROW_NUMBER()** to keep the most recent order for each customer.
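
The same pattern can physically remove everything but the latest order per customer, assuming **order_id** uniquely identifies an order:

```sql
-- Keep only each customer's most recent order; delete the older ones.
DELETE FROM orders
WHERE order_id IN (
    SELECT order_id
    FROM (
        SELECT order_id,
               ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY order_date DESC) AS row_num
        FROM orders
    ) ranked
    WHERE row_num > 1
);
```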

---

## **11. Handling Time-Based Duplicates (Retention Analysis)**

### **a. Find Customers with Multiple Purchases**
```sql
SELECT customer_id, COUNT(*) AS purchase_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 1;
```

### **b. Identify Consecutive Purchases (Churn Analysis)**
```sql
SELECT customer_id, order_date,
       LEAD(order_date) OVER(PARTITION BY customer_id ORDER BY order_date) AS next_order_date,
       DATEDIFF(LEAD(order_date) OVER(PARTITION BY customer_id ORDER BY order_date), order_date) AS days_between
FROM orders;
```
**Use Case:**
- Tracks the number of days between purchases to identify **customer churn patterns**.
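
To turn this into an actionable list, filter on the gap length; the 90-day threshold below is just an example value:

```sql
-- Customers with more than 90 days between consecutive purchases (possible churn risk).
SELECT customer_id, order_date, next_order_date, days_between
FROM (
    SELECT customer_id, order_date,
           LEAD(order_date) OVER(PARTITION BY customer_id ORDER BY order_date) AS next_order_date,
           DATEDIFF(LEAD(order_date) OVER(PARTITION BY customer_id ORDER BY order_date), order_date) AS days_between
    FROM orders
) gaps
WHERE days_between > 90;
```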

---

## **12. Exporting Unique Rows (Backup)**

### **Export Unique Data:**
```sql
SELECT DISTINCT *
INTO OUTFILE '/tmp/unique_employees.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM employees;
```
**Note:** **INTO OUTFILE** is MySQL-specific. The file is written on the **database server**, and the target directory must be allowed by the server's **secure_file_priv** setting.

---

## **Best Practices for Duplicate Handling**

1. **Analyze Before Deleting:** Always inspect duplicates with a SELECT before running the corresponding DELETE.
2. **Backup Tables:** Copy the table (for example with CREATE TABLE ... AS SELECT) or take a snapshot before modifying data; see the sketch after this list.
3. **Use Temporary Tables:** Create temp tables for intermediate results when testing queries.
4. **Monitor Logs:** Review application and database logs to catch accidental duplicate insertion early.
5. **Constraints:** Use **PRIMARY KEYS**, **UNIQUE constraints**, and **INDEXES** to avoid future duplicates.
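
A quick way to take such a backup immediately before a destructive cleanup (the backup table name is arbitrary):

```sql
-- One-off backup copy taken right before running DELETE/UPDATE statements.
CREATE TABLE employees_backup AS
SELECT *
FROM employees;
```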

---