Jump to content

Excel - remove duplicates

Recommended Posts

I have a spreadsheet which consists only of a column A with people's names in it. There are 18,355 rows.

 

I have used Excel's remove 'duplicates' tool and get a message that 15,639 duplicates have been found and removed, and 2991 remain.

 

But 15,639 + 2991 does not equal 18,355, the number I started with.

 

Any idea why this could be?

 

Thanks

Share this post


Link to post
Share on other sites

Some duplicates could be there 3 or more times.

Share this post


Link to post
Share on other sites
Some duplicates could be there 3 or more times.

 

How would that make the total removed plus the total remaining greater than the original line count? Or am I missing something simple?

Share this post


Link to post
Share on other sites

Is it possible that you had a number of 'blank' cells (i.e. just spaces in the cell) after your 18,355 rows?

 

If you then use 'remove duplicates' on the whole column, the removal of all but one of these 'blank' cells will be included in the 'removed' total.

 

*edit : the cells don't have to have a blank character in, just pressing 'return' to go down to the next cell makes Excel think there's something there...

Edited by nickycheese

Share this post


Link to post
Share on other sites
Is it possible that you had a number of 'blank' cells (i.e. just spaces in the cell) after your 18,355 rows?

 

If you then use 'remove duplicates' on the whole column, the removal of all but one of these 'blank' cells will be included in the 'removed' total.

 

*edit : the cells don't have to have a blank character in the, just pressing 'return' to go down to the next cell makes Excel think there's something there...

 

If Excel thought there was something there then wouldn't these "extra" rows be included in the original count? Genuine question...

Share this post


Link to post
Share on other sites

It might just be that you had more than two duplicates of one kind of data.

Share this post


Link to post
Share on other sites
How would that make the total removed plus the total remaining greater than the original line count? Or am I missing something simple?

 

If number of duplicates is a count of values repeated at least once then it wouldn't be the same as the number of rows removed.

Easy to test though, just set up a simple

1

2

2

2

3

 

run the tool and see what the count is for number of duplicates.

 

Edit - actually my explanation only makes sense for the combined number to be less than the total, not more. So, probably not the answer.

Share this post


Link to post
Share on other sites
If number of duplicates is a count of values repeated at least once then it wouldn't be the same as the number of rows removed.

Easy to test though, just set up a simple

1

2

2

2

3

 

run the tool and see what the count is for number of duplicates.

 

Edit - actually my explanation only makes sense for the combined number to be less than the total, not more. So, probably not the answer.

 

:) Your edit is what I was trying to get at... I can't see how the number of removed rows plus the number of remaining ones is greater then the original total..the OP han't come back to reply so maybe we'll never know :)

Share this post


Link to post
Share on other sites
If Excel thought there was something there then wouldn't these "extra" rows be included in the original count? Genuine question...

 

I'm guessing the value of his original "18,355 rows." was obtained by manually looking for the last value, rather than calculated by Excel.

 

Weird stuff - try this in a new workbook (Excel 2013)

type in:

a <return>

a <return>

b <return>

b <return>

c <return>

c <return>

<return>

<return>

<return>

<return>

 

This puts a,a,b,b,c,c,'blank','blank','blank','blank' in the first 10 cells of column A.

 

Select column A and do 'Remove Duplicates' - leaves a,b,c and says "3 duplicate values found; 3 unique values remain" as expected.

 

Select column A again and do 'Remove Duplicates' again - looks the same, but says "2 duplicate values found; 4 unique values remain".

 

No matter how many times you repeat 'Remove Duplicates', it always says "2 duplicate values found; 4 unique values remain"!?!?!?!?!?

 

Weird :confused:

Share this post


Link to post
Share on other sites

OP here!

Thanks for your help - I'll have a good look at this over the weekend!

Share this post


Link to post
Share on other sites
I'm guessing the value of his original "18,355 rows." was obtained by manually looking for the last value, rather than calculated by Excel.

 

Weird stuff - try this in a new workbook (Excel 2013)

type in:

a <return>

a <return>

b <return>

b <return>

c <return>

c <return>

<return>

<return>

<return>

<return>

 

This puts a,a,b,b,c,c,'blank','blank','blank','blank' in the first 10 cells of column A.

 

Select column A and do 'Remove Duplicates' - leaves a,b,c and says "3 duplicate values found; 3 unique values remain" as expected.

 

Select column A again and do 'Remove Duplicates' again - looks the same, but says "2 duplicate values found; 4 unique values remain".

 

No matter how many times you repeat 'Remove Duplicates', it always says "2 duplicate values found; 4 unique values remain"!?!?!?!?!?

 

Weird :confused:

 

Just done that and got "3 duplicate values removed;3 unique values remain ":confused: So it appears to have worked correctly..? Using Excel 2016

Edited by truman

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.