Grahamfff   10 #1 Posted October 23, 2018 I have a spreadsheet which consists only of a column A with people's names in it. There are 18,355 rows.  I have used Excel's remove 'duplicates' tool and get a message that 15,639 duplicates have been found and removed, and 2991 remain.  But 15,639 + 2991 does not equal 18,355, the number I started with.  Any idea why this could be?  Thanks Share this post Link to post Share on other sites Share this content via...
Cyclone   10 #2 Posted October 24, 2018 Some duplicates could be there 3 or more times. Share this post Link to post Share on other sites Share this content via...
truman   10 #3 Posted October 24, 2018 Some duplicates could be there 3 or more times.  How would that make the total removed plus the total remaining greater than the original line count? Or am I missing something simple? Share this post Link to post Share on other sites Share this content via...
nickycheese   10 #4 Posted October 24, 2018 (edited) Is it possible that you had a number of 'blank' cells (i.e. just spaces in the cell) after your 18,355 rows?  If you then use 'remove duplicates' on the whole column, the removal of all but one of these 'blank' cells will be included in the 'removed' total.  *edit : the cells don't have to have a blank character in, just pressing 'return' to go down to the next cell makes Excel think there's something there... Edited October 24, 2018 by nickycheese Share this post Link to post Share on other sites Share this content via...
truman   10 #5 Posted October 24, 2018 Is it possible that you had a number of 'blank' cells (i.e. just spaces in the cell) after your 18,355 rows? If you then use 'remove duplicates' on the whole column, the removal of all but one of these 'blank' cells will be included in the 'removed' total.  *edit : the cells don't have to have a blank character in the, just pressing 'return' to go down to the next cell makes Excel think there's something there...  If Excel thought there was something there then wouldn't these "extra" rows be included in the original count? Genuine question... Share this post Link to post Share on other sites Share this content via...
mabel22 Â Â 10 #6 Posted October 26, 2018 It might just be that you had more than two duplicates of one kind of data. Share this post Link to post Share on other sites Share this content via...
Cyclone   10 #7 Posted October 26, 2018 How would that make the total removed plus the total remaining greater than the original line count? Or am I missing something simple?  If number of duplicates is a count of values repeated at least once then it wouldn't be the same as the number of rows removed. Easy to test though, just set up a simple 1 2 2 2 3  run the tool and see what the count is for number of duplicates.  Edit - actually my explanation only makes sense for the combined number to be less than the total, not more. So, probably not the answer. Share this post Link to post Share on other sites Share this content via...
truman   10 #8 Posted October 26, 2018 If number of duplicates is a count of values repeated at least once then it wouldn't be the same as the number of rows removed. Easy to test though, just set up a simple 1 2 2 2 3  run the tool and see what the count is for number of duplicates.  Edit - actually my explanation only makes sense for the combined number to be less than the total, not more. So, probably not the answer.  Your edit is what I was trying to get at... I can't see how the number of removed rows plus the number of remaining ones is greater then the original total..the OP han't come back to reply so maybe we'll never know Share this post Link to post Share on other sites Share this content via...
nickycheese   10 #9 Posted October 26, 2018 If Excel thought there was something there then wouldn't these "extra" rows be included in the original count? Genuine question...  I'm guessing the value of his original "18,355 rows." was obtained by manually looking for the last value, rather than calculated by Excel.  Weird stuff - try this in a new workbook (Excel 2013) type in: a <return> a <return> b <return> b <return> c <return> c <return> <return> <return> <return> <return>  This puts a,a,b,b,c,c,'blank','blank','blank','blank' in the first 10 cells of column A.  Select column A and do 'Remove Duplicates' - leaves a,b,c and says "3 duplicate values found; 3 unique values remain" as expected.  Select column A again and do 'Remove Duplicates' again - looks the same, but says "2 duplicate values found; 4 unique values remain".  No matter how many times you repeat 'Remove Duplicates', it always says "2 duplicate values found; 4 unique values remain"!?!?!?!?!?  Weird Share this post Link to post Share on other sites Share this content via...
Grahamfff   10 #10 Posted October 26, 2018 OP here! Thanks for your help - I'll have a good look at this over the weekend! Share this post Link to post Share on other sites Share this content via...
truman   10 #11 Posted October 27, 2018 (edited) I'm guessing the value of his original "18,355 rows." was obtained by manually looking for the last value, rather than calculated by Excel. Weird stuff - try this in a new workbook (Excel 2013) type in: a <return> a <return> b <return> b <return> c <return> c <return> <return> <return> <return> <return>  This puts a,a,b,b,c,c,'blank','blank','blank','blank' in the first 10 cells of column A.  Select column A and do 'Remove Duplicates' - leaves a,b,c and says "3 duplicate values found; 3 unique values remain" as expected.  Select column A again and do 'Remove Duplicates' again - looks the same, but says "2 duplicate values found; 4 unique values remain".  No matter how many times you repeat 'Remove Duplicates', it always says "2 duplicate values found; 4 unique values remain"!?!?!?!?!?  Weird  Just done that and got "3 duplicate values removed;3 unique values remain ":confused: So it appears to have worked correctly..? Using Excel 2016 Edited October 27, 2018 by truman Share this post Link to post Share on other sites Share this content via...