The Ultimate Guide to Handle Duplicates in Excel

This post will show you how to handle duplicate entries or, in other words, repeated data in your data set in Microsoft Excel. There are many ways to handle duplicates in Excel. Sometimes you may have to remove duplicates. Sometimes you may have to work with the dataset while keeping the duplicates as it is. It totally depends on the requirement and the type of dataset. Let’s discuss how to handle duplicates in Excel.

How to visualize or highlight duplicates in Excel

Duplicates are not always bad. Duplicates can be occurred due to many reasons; it can be either human errors or sometimes it is due to the method of data collection. Sometimes duplicates have a useful purpose.

Before anything else, you should identify the duplicates. Then, you can visualize the duplicates by highlighting the rows with multiple occurrences. That way, you can identify the density or some pattern in duplicated data.

Visualizing duplicates will help you identify how much duplicated data you have and decide how to remove or eliminate duplicates, for example.

  • There may be only a few duplicates; you can remove them manually
  • There may be a large group of duplicated rows. Such things can happen as a human error when copying and pasting twice mistakenly.
  • If there is no pattern, you can decide on other methods of removing duplicates.
Note: Please note that I am creating this post with Microsoft Office 365. The options in earlier versions may be slightly different. 

We can easily recognize colors than text or numbers. So, conditional formatting is the best way to visualize duplicates in the dataset. With conditional formatting, you can highlight the cells or rows with duplicates with colors.

First, you need to decide based on which columns to find duplicates. It can be either based on single columns or multiple columns.

Highlight duplicate entries in a single column

This method highlights the cells with repeated values, including the first occurrence.

  1. Select the entire column or the range of data in the column
  2. Then go to Home > Conditional Formatting > Highlight Cells Rules > Duplicate Values…
  3. Next, in the Duplicate Values dialog box, select the preferred color and click Ok.

Now you can see the duplicate values are highlighted with the selected color.

The following video demonstrates how to highlight duplicates in a single column.

Highlight duplicate rows in Excel (multiple columns)

In most cases, we call duplicated rows when all the values in one row are identical to all the values in another row.

You can also compare rows considering only a few columns. In such a case, we can say there are partial duplicates if all the values in such rows (in selected columns only) are identical to all the values in another such row.

You can highlight both types of duplicate rows with Excel conditional formatting.

In Excel, you can create new formatting rules using formulas to determine whether the dataset has duplicate rows. Using this method, you can highlight duplicate rows and partial duplicate rows.

To highlight duplicate rows in Excel,

  1. Select the range of data you want to check for duplicates.
  2. Go to Home > Conditional Formatting > New Rule…
  3. In the New Formatting Rule popup box, select the Use a formula to determine which cells to format option.
  4. Then, you can build your custom formula in the formula box, under title Format values where this formula is true.
    • In this case, we need to include multiple criteria inside this formula. So, we use the COUNTIFS function here.
    • The COUNTIFS function counts the number of times all criteria are met.
    • To return a Boolean value (TRUE or FALSE), we check the count is greater than one (>1).
    • If there are multiple identical rows, the count is greater than one; the formula returns TRUE. Then it applies formatting to the identical rows, including the first occurrence.
    • For example, let’s say your data set has three columns, in columns A, B, and C, and the first row is the row headers. Then, you can write the following formula to format duplicate rows.
    • When writing this equation,
      • The first and last row of the selected range in step 1 and the criteria ranges must be the same.
      • Make sure to select the same number of rows to each additional column you apply the formatting.
      • The second argument of each criterion must be the topmost row of your range of data. (Ex. $A2, $B2, $C2 in the below formula)
=COUNTIFS($A$2:$A$16,$A2,$B$2:$B$16,$B2,$C$2:$C$16,$C2)>1
  1. Then, in the New Formatting Rule popup box, click the Format button. Then select the cell formatting and click OK.
  2. In the New Formatting Rule popup box, click OK to apply your formatting to the selected cell range.

If you want to highlight duplicate rows based on two columns, say E and F, you can modify the equation as below.

=COUNTIFS($E$2:$E$16,$E2,$F$2:$F$16,$F2)>1

How does the above formula work?

In the above formula, you can see there are two arguments to each criterion supplied to the COUNTIFS function. The first is the “Criteria range”, and the second is the “Criteria. For this specific case, the “Criteria range” is one column in the dataset, and “Criteria” is the topmost cell in that column.

COUNTIFS(criteria_range1, criteria1, [criteria_range2, criteria2]…)

In this case, the first criteria (first two arguments) $A$2:$A$16,$A2 initially find the identical values to cell B2. Similarly, other criteria find identical values to B2 and C2, respectively row by row.

If it found rows with identical values to A2, B2, and C2, the value returned by COUNTIFS is greater than 1. Then =COUNTIFS()>1 return TRUE. Then it applies conditional formatting to rows with identical values, including row 2. Likewise, the formula runs for each row in the data range.

Identify duplicates in Excel using formulas

This is another way to identify duplicates in Excel. You can write the above formulas in a helper column next to your dataset to identify the duplicates.

You can manage and analyze a group of related data easily by turning them into an Excel table. 

You may read 10 Benefits of Converting Data into Excel Tables.

Excel formulas to identify duplicates in one column

You can use the COUNTIF function to count duplicates in a single column. The result can be shown in the helper column in front of the relevant row.

For example.

You can use the following formula to find how many times a value appears on the list. Enter the formula in the first row and drag it down until the last row. Remember to fix rows and columns exactly as I have done below.

=COUNTIF($C$2:$C$16,$C2)

The equation returns the number of times that the values appear in the range.

You can also modify this equation as given below to get results according to your requirement.

The following formula returns TRUE if duplicates exist, and FALSE otherwise.

=COUNTIF($C$2:$C$16,$C2)>1

The following formula returns the “Unique” if it is a unique value.

=IF(COUNTIF($C$2:$C$16,$C2)>1,"","Unique")

Formulas to identify duplicate rows in Excel (Considering multiple columns)

You can use the same formula used above in a helper column to check for duplicates.

The following formula returns how many times each row is repeated in the selected range. In the formula, we consider all the columns to check duplicates. But you can use it to check partial duplicates also.

=COUNTIFS($A$2:$A$16,$A2,$B$2:$B$16,$B2,$C$2:$C$16,$C2)

You can also modify the above equation as below to return “TRUE” of “FALSE” based on whether the row has a duplicate or not.

=COUNTIFS($A$2:$A$16,$A2,$B$2:$B$16,$B2,$C$2:$C$16,$C2)>1

If you want to mark only the unique values, you can modify the equation as below.

=IF(COUNTIFS($A$2:$A$16,$A2,$B$2:$B$16,$B2,$C$2:$C$16,$C2)>1,"","Unique")

How to remove duplicates in Excel

In the previous sections, we discussed several ways to visualize duplicates in Excel. By doing that, you may find some of the duplicates are not useful to your analysis and need to be removed. So, you can use Excel “Remove Duplicates” option. But this method removes your duplicate data permanently.

Therefore, it is always a good practice to keep a backup of your original dataset before using the “Remove Duplicates” option.

This method can be applied to both single and multiple columns.

Removing duplicates in a single column in Excel

This is straightforward, to remove duplicates in Excel columns,

  1. Select the column you want to remove duplicates
  2. Go to Data > Remove Duplicates (in the “Data Tools” group)

Removing duplicate rows in Excel (considering multiple columns)

You can also remove duplicate rows in a data set. You can check for duplicates by considering all the columns or a few of them.

To delete duplicate rows,

  1. Select the entire dataset
  2. Go to Data > Remove Duplicates (in the “Data Tools” group)
  3. In the Remove Duplicates dialog box, select only the column you want to consider for checking the duplicates. Then Excel disregards the columns not selected for evaluation.
  4. Click OK.

If your dataset has duplicates, only one of them will remain, and others will be deleted.

Extracting unique records in Excel

This is a better way to handle data with duplicates in Excel. You can use the UNIQUE function to extract only the unique data ta to a different location. So, you don’t need to delete anything in the original dataset.

The UNIQUE function is currently available to Microsoft 365 subscribers only.

This function works across both rows and columns.

Extracting unique records from a single column

You can use the UNIQUE function in several ways. Once you enter the equation in one cell; then, it automatically spilled to neighboring cells.

To extract only the unique records in one column, use the following equation:

=UNIQUE(A5:A19)

If you want to extract records that appear in the range exactly once, use the following equation. This will not return any of the records which have multiple occurrences.

=UNIQUE(A5:A19,,TRUE)

Watch the following video that demonstrates the above equations.

Extract unique rows in Excel

Similarly, you can use the UNIQUE function to extract unique rows too. It identifies unique rows considering all the columns. If you have one column with unique data, this method will return all the rows as unique rows.

To extract only the unique rows, you can use the following equation. The following video demonstrates how you can use the below formula to extract unique orders from the list of orders with duplicates.

=UNIQUE(A5:C18)

If you want to extract the first and last names of the new customers, you can use the UNIQUE function as below. This formula returns the first and last names of the customers who have placed orders exactly once.

=UNIQUE(A5:B18,,TRUE)

Watch the video below that demonstrates how to extract unique records in Excel.

Filter unique records using Excel’s advanced filter

You can filter unique records from a range of cells using the Excel advanced filter. You can even use the above visualizing methods also to filter unique data. But with the advanced filter option, you can filter unique data directly.

Filter unique records, in-place

Follow the steps below to filter only the unique records without deleting others or copying them to another location. You can use this method to hide duplicates in Excel. It is useful when you want to present only a unique list of data by hiding others.

  1. Select the dataset
  2. Go to, Data > Advanced (in Sort & Filter group)
  3. In the “Advanced Filter” dialog box, under “Action”, select Filter the list, in-place
  4. Check Unique records only option
  5. Click OK.

Copy unique records to another location using Excel advanced filter option

You can also copy the unique items to another location. This method does not alter the original dataset. However, your copied data is not linked to the data source like in the UNIQUE function.

To copy unique data to another location,

  1. Select the range of data; this can be the complete data set or a few adjacent columns.
  2. Go to, Data > Advanced (in Sort & Filter group)
  3. In the “Advanced Filter” dialog box, under “Action”, select Copy to another location.
  4. For the Copy to field, select the first cell where you want to place the copied dataset.
  5. Check Unique records only option
  6. Click OK.

Dealing with partial duplicates in Excel

Partial duplicate is another common but harmful type of duplicate data. Partial duplicates occur when part of your record is identical to another record or set of records.

Partial duplicates can occur within a column or across multiple columns.

Example 01.

Let’s say you have a list of vehicle registration numbers recorded at two locations A and B, in a street section. When matching these numbers, you may find some of the numbers in location A do not match Location B. This can happen due to several reasons. One probable reason is human errors happen during reading and writing the vehicle numbers.

You can filter the numbers that do not match (unique numbers), as explained above. Some of the numbers in location A can be partially matched the numbers in location B. So, you can match part of the registration numbers as explained in the below table and formulas.

Find partial duplicates in

Match 6 characters from the Left.

=IF(SUMPRODUCT(--(LEFT($B$2:$B$15,6)=LEFT($B2,6)))>1,"Matched","")

Match the 5 characters from Left

=IF(SUMPRODUCT(--(LEFT($B$2:$B$15,5)=LEFT($B2,5)))>1,"Matched","")

Match 6 characters from the Right

=IF(SUMPRODUCT(--(RIGHT($B$2:$B$15,6)=RIGHT($B2,6)))>1,"Matched","")

Match 5 characters from the Right

=IF(SUMPRODUCT(--(RIGHT($B$2:$B$15,5)=RIGHT($B2,5)))>1,"Matched","")

Match without 3rd character

=IF(SUMPRODUCT(--(LEFT($B$2:$B$15,3)&RIGHT($B$2:$B$15,3)=LEFT($B2,3)&RIGHT($B2,3)))>1,"Matched","")

Example 02

You can also have partial duplicates across multiple columns. For example, these records contain the same name, phone number, and email as another record but include other non-matching data.

For this type of duplicates, first, you can use the highlighting method or formula method on columns that contain duplicates to mark the rows with duplicates. Then, you can use the method explained in above Example 01 for non-matching data.

Find duplicates in Excel using wildcard characters

Sometimes, you may need to find like matches. You can do it using the partial duplicate method explained above for this. But if you want to find such using the Excel Find option quickly, you can use wildcard characters.

In Excel, you can use a question mark (?), and asterisk (*).

  • With the question mark (?), you can find single characters. For example,
    • “s?t” find “sat” or “set”.
    • “f??t” find “feet” or “foot”.
    • “QS54?” find “QS541”, “QS542”, and “QS543” etc
  • With an asterisk (*), you can find any number of characters, for example:
    • “*east” find “Southeast” or “Northeast”
    • “s*t” find “sit” or “Southeast”

The following video demonstrates how you can use the wildcard characters in Excel to find partial duplicates.

Use the Excel pivot table to find duplicates

You can use the Pivot table to count how many times that value is repeated in the list. Then you can identify which value has duplicates. This method can directly apply to a single column as well as multiple columns by using a helper column.

Find duplicates in a single column using the Excel Pivot table

  1. Select the range of data you want to check for duplicates (the column).
  2. Then, go to Insert > Pivot Table.
  3. Select Existing Worksheet in the Create Pivot Table popup box.
  4. Select a cell in the worksheet for the Location field to place the Pivot table.
  5. Click Ok.
  6. Drag and drop the column name to the Rows box as well as to the Values box.
  7. For the Values box, by default, it should have selected the Count as the calculation method. If not, click the small down arrow and select “Value Field Settings…” then select the “Count” for the type of calculation.
  8. Then, the first column of the Pivot table contains the unique list of values in the dataset, and the second column shows how many times that value appears in the list.

Find duplicates in a single column using the Excel Pivot table

  1. Create a helper column and join all the columns into one string using the CONCATENATE function.
  2. Then follow the same procedure explained above to find duplicates (watch the video below).

Wrapping Up

Duplicates are not always a bad thing. In some cases, we intentionally collect duplicates. However, we should be able to identify duplicates in either case. So, in this post, I showed you several methods to deal with duplicates in Excel.

You should start the process by identifying the duplicates. That will make it easy to decide the ways you should treat duplicates. Excel Conditional formatting is the best way to highlight duplicates. Then you can use the methods I have shown above to handle your duplicate data. Of course, it depends on the nature of your dataset.

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Share via
Copy link