Here are additional examples for handling duplicates in SQL with more use cases and explanations.
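Find rows that are exact duplicates: identical in every column except the unique id.
SELECT name, age, department, salary, COUNT(*) AS count
FROM employees
GROUP BY name, age, department, salary
HAVING COUNT(*) > 1;
Explanation:
- Groups rows by every non-key column. The id is left out on purpose. Each id is unique, so grouping by it would put every row in its own group and no duplicate would ever be reported.
- Filters groups with a count > 1, indicating duplicates.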
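A minimal sketch of this check, run against an in-memory SQLite database through Python's standard sqlite3 module. SQLite stands in for MySQL here, and the table and rows are hypothetical. The second query shows that adding id to the GROUP BY makes every group a single row, so nothing is reported:

```python
import sqlite3

# Hypothetical sample data: rows 1 and 3 match in every column except id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT, age INTEGER, department TEXT, salary INTEGER
    );
    INSERT INTO employees VALUES
        (1, 'Ann', 30, 'Sales', 50000),
        (2, 'Bob', 41, 'IT',    60000),
        (3, 'Ann', 30, 'Sales', 50000),
        (4, 'Ann', 30, 'HR',    50000);
""")

# Group on the non-key columns only.
exact_dupes = conn.execute("""
    SELECT name, age, department, salary, COUNT(*) AS count
    FROM employees
    GROUP BY name, age, department, salary
    HAVING COUNT(*) > 1
""").fetchall()

# Including the unique id puts each row in its own group of size 1.
with_id = conn.execute("""
    SELECT id, COUNT(*) FROM employees
    GROUP BY id, name, age, department, salary
    HAVING COUNT(*) > 1
""").fetchall()

print(exact_dupes)  # [('Ann', 30, 'Sales', 50000, 2)]
print(with_id)      # []
```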
Find duplicates based only on specific columns (e.g., name and department).
SELECT name, department, COUNT(*) AS count
FROM employees
GROUP BY name, department
HAVING COUNT(*) > 1;
Use Case: Check if employees have been assigned to the same department multiple times.
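The GROUP BY query returns only the group keys. When you also need the individual rows in each duplicate group, a window COUNT(*) is one option. This sketch assumes SQLite 3.25 or newer (typically what recent Python releases bundle) for window-function support, and the data is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ann', 'Sales', 50000),
        (2, 'Ann', 'Sales', 52000),
        (3, 'Bob', 'IT',    60000),
        (4, 'Ann', 'HR',    50000);
""")

# COUNT(*) OVER (PARTITION BY ...) attaches the group size to every row,
# so the full duplicate rows (not just the group key) can be returned.
rows = conn.execute("""
    SELECT id, name, department, salary
    FROM (
        SELECT *, COUNT(*) OVER (PARTITION BY name, department) AS group_size
        FROM employees
    ) AS sized
    WHERE group_size > 1
    ORDER BY id
""").fetchall()

print(rows)  # [(1, 'Ann', 'Sales', 50000), (2, 'Ann', 'Sales', 52000)]
```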
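Keep only one occurrence of each set of exact duplicates and delete the others.
DELETE FROM employees
WHERE id NOT IN (
SELECT keep_id
FROM (
SELECT MIN(id) AS keep_id
FROM employees
GROUP BY name, age, department, salary
) AS keepers
);
Explanation:
- Groups duplicates on every non-key column and keeps the row with the minimum ID.
- Deletes all others.
- The extra derived table (keepers) is there for MySQL, which rejects a DELETE whose subquery reads directly from the table being deleted from (error 1093).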
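A quick way to sanity-check this delete is to run it on a throwaway database first. The sketch below does that in SQLite with hypothetical rows, where ids 1, 2 and 4 are exact duplicates. It keeps the derived-table wrapper that MySQL needs, which SQLite simply accepts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
                            department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ann', 30, 'Sales', 50000),
        (2, 'Ann', 30, 'Sales', 50000),
        (3, 'Bob', 41, 'IT',    60000),
        (4, 'Ann', 30, 'Sales', 50000);
""")

# Keep the lowest id in each group of identical non-key values.
conn.execute("""
    DELETE FROM employees
    WHERE id NOT IN (
        SELECT keep_id FROM (
            SELECT MIN(id) AS keep_id
            FROM employees
            GROUP BY name, age, department, salary
        ) AS keepers
    )
""")

remaining = [r[0] for r in conn.execute("SELECT id FROM employees ORDER BY id")]
print(remaining)  # [1, 3]
```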
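Remove duplicates based on name and department, keeping the row with the lowest salary.
DELETE FROM employees
WHERE id IN (
SELECT id
FROM (
SELECT id,
ROW_NUMBER() OVER(PARTITION BY name, department ORDER BY salary, id) AS row_num
FROM employees
) ranked
WHERE row_num > 1
);
Explanation:
- Ranks each name/department group by salary, using id to break ties.
- Deletes every row except the first-ranked one, so the lowest salary survives. Keeping MIN(id) instead would keep the lowest-ID row, whatever its salary.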
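Keeping the lowest salary requires ranking the rows by salary rather than by id. Here is a sketch of that with ROW_NUMBER() in SQLite, assuming version 3.25 or newer and using hypothetical data. With this data, keeping MIN(id) would instead have kept the 52000 row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ann', 'Sales', 52000),
        (2, 'Ann', 'Sales', 50000),
        (3, 'Ann', 'Sales', 50000),
        (4, 'Bob', 'IT',    60000);
""")

# Rank each name/department group by salary (id breaks ties) and delete
# every row ranked after the first.
conn.execute("""
    DELETE FROM employees
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY name, department
                                      ORDER BY salary, id) AS row_num
            FROM employees
        ) AS ranked
        WHERE row_num > 1
    )
""")

kept = conn.execute("SELECT id, salary FROM employees ORDER BY id").fetchall()
print(kept)  # [(2, 50000), (4, 60000)]
```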
Identify duplicates with ROW_NUMBER(). Window functions need MySQL 8.0 or later.
SELECT *,
ROW_NUMBER() OVER(PARTITION BY name, department ORDER BY id) AS row_num
FROM employees;
Explanation:
- Assigns a row number within each duplicate group.
- Rows with row_num > 1 are duplicates.
Delete the duplicates that ROW_NUMBER() identifies.
DELETE FROM employees
WHERE id IN (
SELECT id
FROM (
SELECT id,
ROW_NUMBER() OVER(PARTITION BY name, department ORDER BY id) AS row_num
FROM employees
) subquery
WHERE row_num > 1
);
Use Case: Deletes all duplicates while keeping only the first occurrence based on the ID.
Flag duplicates instead of deleting them. This is useful when you want to review duplicates later rather than delete them immediately.
ALTER TABLE employees ADD COLUMN is_duplicate BOOLEAN DEFAULT FALSE;
UPDATE employees
SET is_duplicate = TRUE
WHERE id IN (
SELECT id
FROM (
SELECT id, ROW_NUMBER() OVER(PARTITION BY name, department ORDER BY id) AS row_num
FROM employees
) subquery
WHERE row_num > 1
);
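The same flagging flow in SQLite can be sketched with hypothetical rows. This version stores 0/1 in an INTEGER column as a stand-in for BOOLEAN FALSE/TRUE. The window function requires SQLite 3.25 or newer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT);
    INSERT INTO employees VALUES
        (1, 'Ann', 'Sales'),
        (2, 'Bob', 'IT'),
        (3, 'Ann', 'Sales'),
        (4, 'Ann', 'Sales');
""")

# 0 = not flagged, 1 = flagged (stands in for BOOLEAN FALSE/TRUE).
conn.execute("ALTER TABLE employees ADD COLUMN is_duplicate INTEGER DEFAULT 0")

# Flag every row after the first (lowest id) in each name/department group.
conn.execute("""
    UPDATE employees
    SET is_duplicate = 1
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY name, department ORDER BY id) AS row_num
            FROM employees
        ) AS numbered
        WHERE row_num > 1
    )
""")

# Review step: flagged rows stay in the table until someone decides to delete them.
flagged = conn.execute(
    "SELECT id, name, department FROM employees WHERE is_duplicate = 1 ORDER BY id"
).fetchall()
print(flagged)  # [(3, 'Ann', 'Sales'), (4, 'Ann', 'Sales')]
```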
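Create a new table with unique records.
CREATE TABLE unique_employees AS
SELECT MIN(id) AS id, name, age, department, salary
FROM employees
GROUP BY name, age, department, salary;
Use Case: Preserves the original table while creating a clean version with unique rows. A plain SELECT DISTINCT * would copy every row unchanged, because the unique id makes each row distinct. Grouping on the non-key columns and keeping MIN(id) collapses the duplicates instead.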
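This sketch compares SELECT DISTINCT * with grouping on the non-key columns, using SQLite and hypothetical rows. Because id is unique, DISTINCT * keeps all three rows. Grouping and keeping MIN(id) collapses the duplicate pair and leaves the source table untouched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
                            department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ann', 30, 'Sales', 50000),
        (2, 'Ann', 30, 'Sales', 50000),
        (3, 'Bob', 41, 'IT',    60000);
""")

# DISTINCT * compares id too, so no row is collapsed.
distinct_all = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM employees)"
).fetchone()[0]

# Collapse on the non-key columns and keep the lowest id per group.
conn.execute("""
    CREATE TABLE unique_employees AS
    SELECT MIN(id) AS id, name, age, department, salary
    FROM employees
    GROUP BY name, age, department, salary
""")
unique_rows = conn.execute("SELECT * FROM unique_employees ORDER BY id").fetchall()

print(distinct_all)  # 3
print(unique_rows)   # [(1, 'Ann', 30, 'Sales', 50000), (3, 'Bob', 41, 'IT', 60000)]
```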
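Prevent future duplicates with a UNIQUE constraint.
ALTER TABLE employees
ADD CONSTRAINT unique_employee UNIQUE(name, department);
Or create a unique index instead. Use one or the other: in MySQL a UNIQUE constraint is itself backed by a unique index, so adding both is redundant.
CREATE UNIQUE INDEX idx_unique_employee
ON employees(name, department);
Purpose:
- Ensures no future duplicates based on name and department.
- Remove existing duplicates first: adding the constraint or index fails while duplicate name/department pairs remain.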
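Here is a sketch of the unique index in action, using SQLite through Python's sqlite3 (the table and values are hypothetical). A second identical insert raises sqlite3.IntegrityError. Creating the index on a table that already holds duplicates fails the same way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_unique_employee ON employees(name, department)")
conn.execute("INSERT INTO employees (name, department) VALUES ('Ann', 'Sales')")

try:
    conn.execute("INSERT INTO employees (name, department) VALUES ('Ann', 'Sales')")
    rejected = False
except sqlite3.IntegrityError:
    # The unique index refuses a second identical (name, department) pair.
    rejected = True

# A different department is a different key, so this insert succeeds.
conn.execute("INSERT INTO employees (name, department) VALUES ('Ann', 'HR')")
count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

# A table that already holds duplicates cannot get the index.
dirty = sqlite3.connect(":memory:")
dirty.executescript("""
    CREATE TABLE t (name TEXT, department TEXT);
    INSERT INTO t VALUES ('Ann', 'Sales'), ('Ann', 'Sales');
""")
try:
    dirty.execute("CREATE UNIQUE INDEX ux_t ON t(name, department)")
    index_created = True
except sqlite3.IntegrityError:
    index_created = False

print(rejected, count, index_created)  # True 2 False
```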
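Find duplicates with a self-join. Each returned row has a twin with a lower ID. DISTINCT keeps a row from being listed once per twin when a group has three or more copies.
SELECT DISTINCT e1.*
FROM employees e1
JOIN employees e2
ON e1.name = e2.name AND e1.department = e2.department
WHERE e1.id > e2.id;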
Delete them with a multi-table DELETE (MySQL syntax).
DELETE e1
FROM employees e1
JOIN employees e2
ON e1.name = e2.name AND e1.department = e2.department
WHERE e1.id > e2.id;
Explanation:
- Deletes rows with higher IDs, retaining only one copy.
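SQLite has no multi-table DELETE ... JOIN, so this sketch swaps in a correlated EXISTS that expresses the same rule: delete a row when another row with the same name and department has a smaller id. It first previews the rows with the self-join, and the data is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT);
    INSERT INTO employees VALUES
        (1, 'Ann', 'Sales'), (2, 'Bob', 'IT'), (5, 'Ann', 'Sales'),
        (7, 'Bob', 'IT'), (8, 'Cy', 'HR');
""")

# Preview with the self-join: rows that have a lower-id twin.
to_remove = [r[0] for r in conn.execute("""
    SELECT DISTINCT e1.id
    FROM employees e1
    JOIN employees e2
      ON e1.name = e2.name AND e1.department = e2.department
    WHERE e1.id > e2.id
    ORDER BY e1.id
""")]

# Same rule as the MySQL DELETE e1 ... JOIN, written as a correlated subquery.
conn.execute("""
    DELETE FROM employees
    WHERE EXISTS (
        SELECT 1 FROM employees AS e2
        WHERE e2.name = employees.name
          AND e2.department = employees.department
          AND e2.id < employees.id
    )
""")
ids = [r[0] for r in conn.execute("SELECT id FROM employees ORDER BY id")]

print(to_remove)  # [5, 7]
print(ids)        # [1, 2, 8]
```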
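Find customers who placed more than one order on the same date.
SELECT customer_id, order_date, COUNT(*) AS count
FROM orders
GROUP BY customer_id, order_date
HAVING COUNT(*) > 1;
Explanation:
- order_id is left out of the select list. It is not in the GROUP BY, so standard SQL and MySQL's default ONLY_FULL_GROUP_BY mode reject it.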
Keep only the most recent order for each customer.
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY order_date DESC) AS row_num
FROM orders
) subquery
WHERE row_num = 1;
Explanation:
- Uses ROW_NUMBER() to keep the most recent order for each customer.
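Both orders queries can be tried on a small hypothetical orders table in SQLite (3.25 or newer for ROW_NUMBER()). The order_id tie-breaker is an addition made for this sketch. Without it, when a customer has two orders on their latest date, which one is returned is not guaranteed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    INSERT INTO orders VALUES
        (1, 10, '2024-01-05'),
        (2, 10, '2024-03-01'),
        (3, 20, '2024-02-10'),
        (4, 10, '2024-02-14'),
        (5, 20, '2024-02-10');
""")

# Same-day duplicates: order_id stays out of the SELECT because it is not grouped.
same_day = conn.execute("""
    SELECT customer_id, order_date, COUNT(*) AS count
    FROM orders
    GROUP BY customer_id, order_date
    HAVING COUNT(*) > 1
""").fetchall()

# Latest order per customer; order_id DESC breaks ties on the same date.
latest = conn.execute("""
    SELECT order_id, customer_id, order_date
    FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY order_date DESC, order_id DESC
        ) AS row_num
        FROM orders
    ) AS ranked
    WHERE row_num = 1
    ORDER BY customer_id
""").fetchall()

print(same_day)  # [(20, '2024-02-10', 2)]
print(latest)    # [(2, 10, '2024-03-01'), (5, 20, '2024-02-10')]
```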
Find repeat customers (more than one order).
SELECT customer_id, COUNT(*) AS purchase_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 1;
Measure the days between a customer's consecutive orders (DATEDIFF is MySQL syntax).
SELECT customer_id, order_date,
LEAD(order_date) OVER(PARTITION BY customer_id ORDER BY order_date) AS next_order_date,
DATEDIFF(LEAD(order_date) OVER(PARTITION BY customer_id ORDER BY order_date), order_date) AS days_between
FROM orders;
Use Case:
- Tracks the number of days between purchases to identify customer churn patterns.
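SQLite has no DATEDIFF, so this sketch swaps in the difference of julianday() values to get whole days between ISO-formatted dates. The sample orders are hypothetical. Note that 2024 is a leap year, so February has 29 days:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    INSERT INTO orders VALUES
        (1, 10, '2024-01-05'),
        (2, 10, '2024-03-01'),
        (3, 10, '2024-02-14'),
        (4, 20, '2024-02-10');
""")

# LEAD() fetches the customer's next order date; julianday() differences give days.
# The last order per customer has no successor, so both derived columns are NULL (None).
gaps = conn.execute("""
    SELECT customer_id, order_date,
           LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS next_order_date,
           CAST(julianday(LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date))
                - julianday(order_date) AS INTEGER) AS days_between
    FROM orders
    ORDER BY customer_id, order_date
""").fetchall()

for row in gaps:
    print(row)
# (10, '2024-01-05', '2024-02-14', 40)
# (10, '2024-02-14', '2024-03-01', 16)
# (10, '2024-03-01', None, None)
# (20, '2024-02-10', None, None)
```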
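Export unique records to a CSV file (MySQL). The file is written on the database server, not the client. This needs the FILE privilege, and the target directory must be allowed by the secure_file_priv setting.
SELECT DISTINCT name, age, department, salary
INTO OUTFILE '/tmp/unique_employees.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM employees;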
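When the file is needed on the client rather than the server, one option is Python's csv module. This sketch uses SQLite as the source and hypothetical data. It produces the same comma-separated, double-quoted layout, plus a header row, which INTO OUTFILE does not write:

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
                            department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ann', 30, 'Sales', 50000),
        (2, 'Ann', 30, 'Sales', 50000),
        (3, 'Bob', 41, 'IT',    60000);
""")

# id is excluded so that DISTINCT can collapse rows that differ only by key.
cur = conn.execute("""
    SELECT DISTINCT name, age, department, salary
    FROM employees
    ORDER BY name, department
""")

# StringIO keeps the sketch self-contained; to write a real file, use
# open("unique_employees.csv", "w", newline="") instead.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow([col[0] for col in cur.description])  # header row
writer.writerows(cur)

print(buf.getvalue(), end="")
# "name","age","department","salary"
# "Ann","30","Sales","50000"
# "Bob","41","IT","60000"
```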
- Analyze Before Deleting: Always inspect duplicates with SELECT before DELETE.
- Backup Tables: Use CREATE TABLE ... AS SELECT (SELECT INTO in SQL Server) or create snapshots before modifying data.
- Use Temporary Tables: Create temp tables for intermediate results when testing queries.
- Monitor Logs: Review logs to catch accidental duplicate insertions early.
- Constraints: Use PRIMARY KEYs, UNIQUE constraints, and unique indexes to prevent future duplicates.
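The first two practices can be combined, as in this SQLite sketch with hypothetical data. It snapshots the table, previews the ids the DELETE would remove, and then runs the DELETE in a transaction. That transaction rolls back if the affected row count does not match the preview. The names employees_backup and DUPLICATE_IDS are illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT);
    INSERT INTO employees VALUES (1, 'Ann', 'Sales'), (2, 'Ann', 'Sales'), (3, 'Bob', 'IT');
""")

# Hypothetical helper: ids of every duplicate after the first in each group.
DUPLICATE_IDS = """
    SELECT id FROM (
        SELECT id, ROW_NUMBER() OVER (PARTITION BY name, department ORDER BY id) AS row_num
        FROM employees
    ) AS numbered
    WHERE row_num > 1
"""

# 1. Backup: snapshot the table before touching it.
conn.execute("CREATE TABLE employees_backup AS SELECT * FROM employees")

# 2. Analyze: see exactly which rows the DELETE would remove.
to_delete = [r[0] for r in conn.execute(DUPLICATE_IDS)]

# 3. Delete inside a transaction; `with conn` commits on success and
#    rolls back if an exception is raised inside the block.
with conn:
    cur = conn.execute(f"DELETE FROM employees WHERE id IN ({DUPLICATE_IDS})")
    if cur.rowcount != len(to_delete):
        raise RuntimeError("deleted row count differs from the preview")

remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
backup = conn.execute("SELECT COUNT(*) FROM employees_backup").fetchone()[0]
print(to_delete, remaining, backup)  # [2] 2 3
```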
CompiledByUdithaWICK