In this article, we will learn How to Find and Remove Duplicate records in a table in SQL. We will explore some of the most effective method for finding and removing duplicate records in SQL, including using GROUP BY Clause, INNER JOIN, EXIST CLAUSE and ROW_NUMBER Function.
We need to create sample table
CREATE TABLE [dbo].[Customers]( [id] [int] NOT NULL, [firstName] [nvarchar](max) NOT NULL, [lastName] [nvarchar](max) NOT NULL, [job] [nvarchar](max) NOT NULL, [amount] [real] NOT NULL, [tdate] [datetime2](7) NOT NULL, [email] [varchar](25) NULL )
Then Insert the Sample record
INSERT [dbo].[Customers] ([id], [firstName], [lastName], [job], [amount], [tdate], [email]) VALUES (1, N'Astle', N'Vicky', N'Software Engineer', 50000, CAST(N'2021-01-29T00:00:00.0000000' AS DateTime2), N'astle@example.com') INSERT [dbo].[Customers] ([id], [firstName], [lastName], [job], [amount], [tdate], [email]) VALUES (2, N'John', N'Vicky', N'software engineer', 55000, CAST(N'2022-12-03T00:00:00.0000000' AS DateTime2), N'John@example.com') INSERT [dbo].[Customers] ([id], [firstName], [lastName], [job], [amount], [tdate], [email]) VALUES (3, N'Fleming', N'Stuart', N'software engineer', 55000, CAST(N'2022-12-29T00:00:00.0000000' AS DateTime2), N'astle@example.com') INSERT [dbo].[Customers] ([id], [firstName], [lastName], [job], [amount], [tdate], [email]) VALUES (4, N'Philip', N'John', N'Manager', 650000, CAST(N'2022-12-29T00:00:00.0000000' AS DateTime2), N'ricky@example.com') INSERT [dbo].[Customers] ([id], [firstName], [lastName], [job], [amount], [tdate], [email]) VALUES (5, N'Ricky', N'Fleming', N'DBA Administrator', 78000, CAST(N'2022-12-29T00:00:00.0000000' AS DateTime2), N'ricky@example.com') INSERT [dbo].[Customers] ([id], [firstName], [lastName], [job], [amount], [tdate], [email]) VALUES (6, N'mack', N'Mack', N'Manager', 56000, CAST(N'2022-12-20T00:00:00.0000000' AS DateTime2), N'nack@exampl.com') INSERT [dbo].[Customers] ([id], [firstName], [lastName], [job], [amount], [tdate], [email]) VALUES (7, N'steva', N'margh', N'Team Lead', 456000, CAST(N'2022-12-23T00:00:00.0000000' AS DateTime2), N'steva@example.com')
Here are few SQL methods for finding and removing duplicate records in a table using SQL
GROUP BY and HAVING CLAUSE
Suppose we have a table customer with columns id, firstname, email etc.
Suppose we have a table customer with columns id, firstname, email etc. To find and remove duplicate records based on the “email” column, you can use the following SQL query
SELECT EMAIL,COUNT(*) FROM Customers GROUP BY EMAIL HAVING COUNT(*)>1
This query will group the rows in the “Customers” table by the “email” column, and the return the email and count of each group that has more than on row. To delete the duplicate records , you can use the following SQL query
DELETE FROM Customers WHERE id NOT IN(SELECT MAX(id) FROM Customers GROUP BY EMAIL)
This query will delete all records in the customers table that have a duplicate email, except for the one with highest id value
EXISTS CLAUSE
Suppose we have a table customer with columns id, firstname, email etc To find and remove duplicate records based on the “firstname” column, you can use the following SQL query
SELECT c.id,c.firstName FROM Customers c WHERE EXISTS ( SELECT * FROM Customers cust WHERE c.firstName=cust.firstName AND c.id > cust.id )
This query will select all rows in the customers table where there exists another row with the same firstname and a lower id value. To delete the duplicate records , you can use the following SQL query
DELETE c FROM Customers c,Customers cust WHERE c.firstName=cust.firstName AND c.id > cust.id
This query will delete all records in the customers table that have a duplicate firstname, except for the one with lowest id value
More SQL Server Posts
How to Find and remove Numbers from string in a column using SQL Server How to count male and female without case statement in SQL How to Search Text using Contains Command in SQL ServerROW_NUMBER Function
In this query , I’m using the same example as in the above one, but this time I’m using the ROW_NUMBER Function to find the duplicate data
SELECT id, email , ROW_NUMBER() OVER ( PARTITION BY email ORDER BY id ) AS row_dup FROM Customers
The query return the id, email and row number for each row from the email column. To remove duplicate rows, modify the query as follows
WITH CT AS( SELECT id, email , ROW_NUMBER() OVER ( PARTITION BY email ORDER BY id ) AS row_dup FROM Customers ) DELETE FROM CT WHERE row_dup>1
This query will use a common table expression (CTE) to assign row numbers to each records in the customers table based on the email column, then delete any rows with a row number greater than 1