Efficient SQL Query to Remove Duplicates with ROW_NUMBER

Using ROW_NUMBER() and PARTITION BY (Preferred Approach)

The ROW_NUMBER() function assigns a unique row number to each record within a partition (group). We can use this to identify and delete duplicates while keeping only the required data.

Query Syntax

WITH CTE AS (
    SELECT 
        *, 
        ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS row_num
    FROM table_name
    WHERE condition  -- Apply filtering condition here
) 
DELETE FROM CTE 
WHERE row_num > 1;

Example

Consider a Customer table with duplicate entries based on Email.

ID Name Email City
1 John [email protected] NY
2 Jane [email protected] LA
3 John [email protected] NY
4 Sam [email protected] TX

Removing Duplicates While Keeping the First Entry.

;WITH CTE AS (
    SELECT ID 
    FROM (
        SELECT ID, 
               ROW_NUMBER() OVER (PARTITION BY NAME, Email, City ORDER BY ID) AS RN  
        FROM Customers 
        WHERE City = 'NY'  -- Only NY state filtering condition
    ) AS sub
    WHERE RN > 1
)
DELETE FROM Customers 
WHERE ID IN (SELECT ID FROM CTE);

Explanation of the Query

  • Identifies Duplicates
    • ROW_NUMBER() OVER (PARTITION BY Name, Email, City ORDER BY ID)
    • Assign a row number (RN) for each duplicate group, keeping the first record (RN = 1).
  • Filters Out Duplicates (RN > 1): Only marks duplicate records where City = 'NY'.
  • Deletes Duplicate Records: Deletes only IDs from the CTE that have RN > 1
  • This syntax will be useful when we are joining more tables and deleting duplicates from one specific table.

Please comment on your ideas, and follow for more.

Ebook Download
View all
Learn
View all