Using ROW_NUMBER() and PARTITION BY (Preferred Approach)
The ROW_NUMBER() function assigns a unique row number to each record within a partition (group). We can use this to identify and delete duplicates while keeping only the required data.
Query Syntax
WITH CTE AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS row_num
FROM table_name
WHERE condition -- Apply filtering condition here
)
DELETE FROM CTE
WHERE row_num > 1;
Example
Consider a Customer table with duplicate entries based on Email.
Removing Duplicates While Keeping the First Entry.
;WITH CTE AS (
SELECT ID
FROM (
SELECT ID,
ROW_NUMBER() OVER (PARTITION BY NAME, Email, City ORDER BY ID) AS RN
FROM Customers
WHERE City = 'NY' -- Only NY state filtering condition
) AS sub
WHERE RN > 1
)
DELETE FROM Customers
WHERE ID IN (SELECT ID FROM CTE);
Explanation of the Query
- Identifies Duplicates
- ROW_NUMBER() OVER (PARTITION BY Name, Email, City ORDER BY ID)
- Assign a row number (RN) for each duplicate group, keeping the first record (RN = 1).
- Filters Out Duplicates (RN > 1): Only marks duplicate records where City = 'NY'.
- Deletes Duplicate Records: Deletes only IDs from the CTE that have RN > 1
- This syntax will be useful when we are joining more tables and deleting duplicates from one specific table.
Please comment on your ideas, and follow for more.