Ideas for finding duplicates

Source: Internet
Author: User

Let's take a look at two common questions about finding duplicate numbers.

1. There are 101 numbers, each in the range [1, 100]. One of them is repeated. How can we find the duplicate? What are the time complexity and space complexity?

2. The numbers range from 1 to n (n is at most 32000, and its exact value is unknown), and only 4 KB of memory is available. Find the duplicates.

There are basically four ideas for finding duplicates:

  1. The first approach is to create a Boolean array in which each index represents a value and each flag records whether that value has already been seen. This approach suits cases where the largest value in the array is small. If the maximum value is very large, the Boolean array becomes too large and may exceed the available memory. The time complexity is O(n); the space complexity is O(m), where m is the largest value, so it depends on the data rather than on n.
  2. The second approach is to hash each number (for example, with a remainder operation) and use collisions to find equal numbers. The idea is simple and fast, but it is harder to implement because collisions must be detected and resolved. The time complexity is O(n) on average, and the space complexity is O(n).
  3. The third approach is to split the numbers recursively into two (or four) groups by their high or low bits until the duplicate is isolated. This method is applicable when the data volume is too large to hold in memory. The time complexity is O(n log n), and the extra space is O(1) when the groups are counted rather than stored.
  4. The fourth approach is to sort the array first, with heap sort or quicksort, and then traverse the sorted array looking for equal neighbors. The time complexity is O(n log n). The space complexity is O(n) if the sort works on a copy; an in-place heap sort needs only O(1) extra space.
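
As a rough illustration of the first two ideas, here is a minimal Python sketch. The function names `find_dup_flags` and `find_dup_hash` are hypothetical, the input is assumed to be a list of integers in [1, max_value], and Python's built-in `set` stands in for a hand-written hash table with its own collision handling.

```python
def find_dup_flags(nums, max_value):
    """Idea 1: one Boolean flag per possible value (index = value)."""
    # Index 0 is unused so that values 1..max_value can index directly.
    seen = [False] * (max_value + 1)
    for x in nums:
        if seen[x]:
            return x  # flag already set: x was seen before
        seen[x] = True
    return None  # no duplicate found


def find_dup_hash(nums):
    """Idea 2: hash-based lookup; the built-in set resolves collisions."""
    seen = set()
    for x in nums:
        if x in seen:
            return x
        seen.add(x)
    return None


print(find_dup_flags([62, 53, 88, 62], 100))  # 62
print(find_dup_hash([62, 53, 88, 62]))        # 62
```

The flag array costs memory proportional to `max_value`, while the set costs memory proportional to the number of distinct values seen, which is the trade-off described in the list above.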

First, let's take a look at the first question.
  1. To use the first method, we create a Boolean array B of size 101 (indices 0 to 100, so that each value can be used directly as an index) and initialize it to false. Then we traverse the input array. If the first four elements of the array are {62, 53, 88, 62}, we set B[62] = true, B[53] = true, and B[88] = true. When we reach the fourth element, B[62] is already true, so 62 is the duplicate we are looking for.
  2. In the second way, assume the array A to be processed has length N. We create an array B of N buckets, take the remainder of each element of A modulo N to get an index i, and store the element in B[i]. Linked lists are used to resolve collisions: when a collision occurs, check whether the element already exists in the linked list at that position. If it does, it is the duplicate.
  3. The third approach is to divide the values into two groups according to the highest bit of each value (0 or 1), keep the group that holds more numbers than it has distinct possible values, and repeat the split on the next lower bit. Once the lowest bit has been processed, the numbers left in the group are equal, and that value is the duplicate.
  4. The fourth idea is to sort the 101 numbers with heap sort or quicksort, and then traverse the sorted array; two adjacent equal elements give the duplicate.
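
The third and fourth ideas for the first question might be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, `find_dup_by_halving` halves the value range and counts instead of testing individual bits (a variant of the bit split that reaches the same result), and `find_dup_by_sorting` uses Python's built-in `sorted` (Timsort) in place of heap sort or quicksort.

```python
def find_dup_by_halving(nums, lo=1, hi=100):
    """Idea 3 (variant): halve the value range, keep the crowded half.

    Assumes every value lies in [lo, hi] and len(nums) > hi - lo + 1,
    so by the pigeonhole principle some value must repeat.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        # Count how many numbers fall in the lower half [lo, mid].
        count = sum(1 for x in nums if lo <= x <= mid)
        if count > mid - lo + 1:
            hi = mid        # more numbers than distinct values: look here
        else:
            lo = mid + 1    # otherwise the upper half is the crowded one
    return lo


def find_dup_by_sorting(nums):
    """Idea 4: sort a copy, then compare each element with its neighbor."""
    ordered = sorted(nums)  # Timsort on a copy, so O(n) extra space
    for a, b in zip(ordered, ordered[1:]):
        if a == b:
            return a
    return None


numbers = list(range(1, 101)) + [62]  # 101 numbers in [1, 100], 62 repeated
print(find_dup_by_halving(numbers))   # 62
print(find_dup_by_sorting(numbers))   # 62
```

The halving version makes about log2(100) passes over the data and stores only a few counters, which is why it remains usable when the data cannot be held in memory.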

Next, let's look at the second question. Because memory is limited, the solution is a little more complicated.

  1. First idea: create a bit array in which each element occupies 1 bit. This needs (1 × 32000) / (8 × 1024) ≈ 3.9 KB, which is just within 4 KB, so the first approach works.
  2. Second idea: because the number of elements in the array is unknown, feasibility cannot be determined (with 4-byte integers, 4 KB holds at most 1024 elements, so a larger input cannot be processed).
  3. Third idea: assume that all the data is stored on disk and is read one or a few elements at a time (staying within 4 KB of memory). Use the highest bit, or the highest two bits, to split the data into two or more groups. Then recurse, splitting each resulting group into two or more parts again, until the duplicate is found.
  4. Fourth idea: sorting needs the array in memory. Feasibility cannot be determined because the array size is unknown (more than 1024 elements cannot be processed).
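
The first idea for the second question can be sketched as a bit array packed into bytes. This is a minimal sketch, assuming the values arrive as an iterable of integers in [1, 32000]; the name `find_dups_bitset` is hypothetical, and the 4000-byte figure covers only the bit payload, not the Python interpreter's own overhead.

```python
def find_dups_bitset(stream, max_value=32000):
    """Yield a value each time it reappears after its first occurrence."""
    # One bit per possible value: 32000 bits = 4000 bytes, under 4 KB.
    bits = bytearray((max_value + 7) // 8)
    for x in stream:
        idx = x - 1                 # map values 1..max_value to bits 0..max_value-1
        byte = idx >> 3             # which byte holds this bit
        mask = 1 << (idx & 7)       # which bit inside that byte
        if bits[byte] & mask:
            yield x                 # bit already set: x is a duplicate
        else:
            bits[byte] |= mask      # first sighting: mark it


print(list(find_dups_bitset([5, 32000, 7, 5, 1, 32000])))  # [5, 32000]
```

Because the function consumes the input one value at a time, the input itself never has to fit in memory; only the bit array does.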

Each of these four ideas has its own advantages, disadvantages, and use cases, so the choice should be made flexibly according to the actual situation.
