Working with duplicate data in Excel

Source: Internet
Author: User

When working with data, duplicate entries often cause a lot of trouble in analysis, so the first important step in tidying data is deduplication. In Excel 2007 and later, the built-in Remove Duplicates command is convenient, but clicking through it every time quickly becomes tedious. Below we use formulas to handle some common duplicate-data tasks. Formulas shown in braces {} are array formulas: type them without the braces and confirm with Ctrl+Shift+Enter.
I. Extracting unique values from a single column
First, define a name for the data column; here it is called Name.
{=INDEX(Name,MATCH(,COUNTIF(E$1:E1,Name),))}  (enter in E2, then fill down)
The idea: E$1:E1 is an expanding range that becomes E$1:E2, E$1:E3, and so on as the formula is filled down. For each entry in Name, COUNTIF counts how many times that entry already appears in the expanding range, producing an in-memory array; MATCH then finds the position of the first 0 in that array, i.e. the first value not yet extracted, and INDEX returns the value at that position. The formula is neither long nor difficult, but its use of MATCH deserves attention. The first argument is omitted even though lookup_value is a required argument, which looks like a rule violation. In fact Excel treats an empty argument (a blank slot between commas) as 0, so it can take part in the calculation. This is not the same as "": Excel treats "" as a literal text value, which would never match the numeric counts. The empty third argument likewise counts as 0 (exact match); dropping that comma altogether would make match_type default to 1 instead. Therefore MATCH(,COUNTIF(E$1:E1,Name),) is equivalent to MATCH(0,COUNTIF(E$1:E1,Name),0).
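To make the fill-down mechanics concrete, here is a small Python simulation of what the formula computes in each successive row of column E. It is a sketch under stated assumptions, not Excel's own evaluation: the list `names` stands in for the named range Name, `column_e` stands in for E$1:E(n-1) with a header in E1, the helper names are hypothetical, and COUNTIF is modelled as exact equality (the real function is case-insensitive and accepts wildcards).

```python
def countif_array(criteria_range, values):
    """Model of COUNTIF(criteria_range, values): one count per element of values."""
    # Exact equality only; real COUNTIF ignores case and supports wildcards.
    return [sum(1 for c in criteria_range if c == v) for v in values]


def match_zero(array):
    """Model of MATCH(0, array, 0): 1-based position of the first 0, or None for #N/A."""
    for position, item in enumerate(array, start=1):
        if item == 0:
            return position
    return None


def extract_unique(names, header="Unique"):
    """Simulate filling {=INDEX(Name,MATCH(,COUNTIF(E$1:E1,Name),))} down from E2."""
    column_e = [header]  # E1 holds a header; the formula starts in E2
    while True:
        counts = countif_array(column_e, names)  # COUNTIF(E$1:E<n-1>, Name)
        position = match_zero(counts)            # MATCH(0, ..., 0)
        if position is None:                     # Excel would show #N/A from here on
            break
        column_e.append(names[position - 1])     # INDEX(Name, position)
    return column_e[1:]


# Hypothetical sample data for the range Name.
names = ["Ann", "Bob", "Ann", "Cy", "Bob"]
print(extract_unique(names))  # ['Ann', 'Bob', 'Cy']
```

Each pass recomputes the whole COUNTIF array against everything extracted so far, which is exactly why the formula needs the expanding E$1:E1 reference rather than a fixed range.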
By the way, COUNTIF is also often used to generate an array that turns the data into 1s and 0s. For example, to count the duplicates in a single column you can use {=sum (name, name)}
II. Extracting unique values from multiple columns
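{=INDIRECT(TEXT(RIGHT(MIN(IF(COUNTIF($D$1:D1,$A$2:$B$8),4^8,ROW($2:$8)*100+COLUMN(A:B)*10001)),4),"R0C00"),)&""}  (enter in D2, then fill down)
This formula is long and complex. The data sits in A2:B8 and the unique values are extracted into column D. The main idea is to use COUNTIF to build an in-memory array and then transform that array. For each cell in A2:B8, IF returns 4^8 (65536) when the cell's value already appears in $D$1:D1, and ROW*100+COLUMN*10001 otherwise; the column term dominates this code, so MIN picks the first unextracted cell, working through column A before column B. RIGHT(...,4) keeps the last four digits, which hold the row number followed by a two-digit column number. Note the format "R0C00": TEXT uses it to turn those digits into an R1C1-style reference such as R2C01, and INDIRECT, whose empty second argument counts as 0 (FALSE), resolves that text as an R1C1 reference. Once every value has been extracted, MIN returns 65536, which becomes R55C36, an empty cell, so &"" displays a blank.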
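The encoding trick is easier to follow when each step is spelled out. Below is a minimal Python sketch of the column-D formula, assuming the sheet is modelled as a dict keyed by (row, column number), the data block is fully filled, and R55C36 is empty; the function names and sample values are hypothetical, and COUNTIF is again modelled as exact equality.

```python
import re

SENTINEL = 4 ** 8  # 65536: the code given to cells whose value is already in column D


def encode(row, col):
    """ROW*100 + COLUMN*10001: the column term dominates, so MIN scans column A before B."""
    return row * 100 + col * 10001


def to_r1c1(code):
    """Model of TEXT(RIGHT(code,4),"R0C00"): the last four digits read as RRCC."""
    last4 = int(str(code)[-4:])
    return f"R{last4 // 100}C{last4 % 100:02d}"


def indirect_r1c1(sheet, ref):
    """Model of INDIRECT(ref,) & "": resolve an R1C1 reference; empty cells give ""."""
    found = re.fullmatch(r"R(\d+)C(\d+)", ref)
    return sheet.get((int(found.group(1)), int(found.group(2))), "")


def extract_unique_2d(sheet, rows, cols, header="Unique"):
    """Simulate filling the column-D formula down from D2 (D1 holds a header)."""
    column_d = [header]
    while True:
        codes = [
            # IF(COUNTIF($D$1:D<n-1>, cell), 4^8, ROW*100+COLUMN*10001)
            SENTINEL if column_d.count(sheet[(r, c)]) else encode(r, c)
            for r in rows
            for c in cols
        ]
        smallest = min(codes)  # MIN(...)
        if smallest == SENTINEL:
            # Excel would resolve R55C36, assumed empty, and &"" shows a blank.
            break
        column_d.append(indirect_r1c1(sheet, to_r1c1(smallest)))
    return column_d[1:]


# Hypothetical sample data in A2:B4, keyed by (row, column number).
sheet = {
    (2, 1): "pen", (3, 1): "ink", (4, 1): "pen",
    (2, 2): "cap", (3, 2): "pen", (4, 2): "ink",
}
print(extract_unique_2d(sheet, range(2, 5), range(1, 3)))  # ['pen', 'ink', 'cap']
```

Because RIGHT(...,4) keeps only four digits, the row and column numbers must each stay at 99 or below for the code to decode correctly; for the A2:B8 example in the text that is not a concern.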
III. Finding the information that corresponds to a repeated value

To find all the order numbers for one product in the Product column (products in column A, order numbers in column B): product and order number have a one-to-many relationship, since the same product has several order numbers. This can be handled with the following formula:
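{=INDEX(B:B,SMALL(IF($A$1:$A$130=$C$1,ROW($A$1:$A$130),65536),ROW(1:1)))&""}  (fill down)
Given the product name entered in C1, this extracts all of its order numbers. The rough idea: IF builds an array with one entry per row of A1:A130, holding the row number where the product matches C1 and 65536 everywhere else, so 65536 is the largest value in the array. SMALL then returns the k-th smallest row number, where ROW(1:1) evaluates to 1 and becomes 2, 3, and so on as the formula is filled down, and INDEX returns the order number in that row of column B. Once the matches run out, SMALL returns 65536, INDEX points to the empty cell B65536, and &"" shows it as a blank instead of 0.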
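The same logic can be sketched in Python to show how the 65536 placeholder and the growing k work together. This assumes columns A and B are given as lists starting at row 1 (header included) and that row 65536 is empty; the function names and the order numbers are hypothetical sample data, not part of the original example.

```python
SENTINEL_ROW = 65536  # row number the formula uses for "no match"


def small(array, k):
    """Model of SMALL(array, k): the k-th smallest value, k being 1-based."""
    return sorted(array)[k - 1]


def orders_for_product(column_a, column_b, product, first_row=1):
    """Simulate {=INDEX(B:B,SMALL(IF(A=C1,ROW(A),65536),ROW(1:1)))&""} filled down."""
    rows = [
        row if value == product else SENTINEL_ROW  # IF($A$1:$A$n=$C$1, ROW(...), 65536)
        for row, value in enumerate(column_a, start=first_row)
    ]
    results = []
    for k in range(1, len(rows) + 1):  # ROW(1:1) yields 1, 2, 3, ... as it is filled down
        row = small(rows, k)
        if row == SENTINEL_ROW:        # INDEX(B:B,65536) is empty; &"" shows a blank
            break
        results.append(column_b[row - first_row])  # INDEX(B:B, row)
    return results


# Hypothetical sample sheet: A1:A6 and B1:B6, each with a header in row 1.
column_a = ["Product", "pen", "ink", "pen", "cap", "pen"]
column_b = ["Order", "SO-01", "SO-02", "SO-03", "SO-04", "SO-05"]
print(orders_for_product(column_a, column_b, "pen"))  # ['SO-01', 'SO-03', 'SO-05']
```

Using a very large row number rather than an error value for non-matches keeps SMALL working on a purely numeric array, which is why the matching rows always come out first and in sheet order.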
