Counting the non-repeating elements among hundreds of millions of integers

Source: Internet
Author: User

Question:

A file contains 250 million unsigned integers. Find how many of them are non-repeated, that is, how many numbers appear exactly once. Available memory is limited to 600 MB, so the algorithm must be efficient.

 

Ideas:

There are too many numbers to load into memory at once (at 4 bytes each, 250 million integers take about 1 GB), so some processing is required. Imagine a flag array of true/false values that records whether a number is being read for the first time. The most convenient approach is to use the number itself as the array subscript: on reading 234432, check whether flag[234432] is true or false. (Isn't this similar to the idea of hashing?)
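The flag-array idea can be sketched on a tiny value range. This is a minimal illustration, not the original author's code: the function name `scan_with_flags` and the `universe_size` parameter are hypothetical, and a plain Python list of booleans stands in for the flag array.

```python
def scan_with_flags(values, universe_size):
    """Return (first_time_reads, repeat_reads) for values in [0, universe_size)."""
    seen = [False] * universe_size  # one flag per possible value
    first_time = 0
    repeats = 0
    for v in values:
        if seen[v]:
            # The flag is already set, so this value was read before.
            repeats += 1
        else:
            # First time this value is read: set its flag.
            seen[v] = True
            first_time += 1
    return first_time, repeats


print(scan_with_flags([3, 7, 3, 1, 7, 7], universe_size=8))  # (3, 3)
```

Each lookup is a direct array access, so no searching or comparison against earlier values is needed.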

The main problem now is how to size the flag array. An unsigned int occupies 4 bytes and ranges from 0 to 2^32 - 1, so the array must be large enough to be indexed by the subscript 2^32 - 1. Each flag is only true or false, so one bit per number is enough. How big is the flag array?

2^32 bits = 2^29 bytes = 2^19 KB = 2^9 MB = 512 MB, which fits within the 600 MB memory limit.
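The size calculation can be checked with a few lines of arithmetic. The helper name `bitmap_size_mb` is hypothetical. The second call, with two bits per value, is an added illustration rather than something the original states: it shows what a second state per number (for example "seen more than once") would cost if it were kept for the whole range at once.

```python
def bitmap_size_mb(num_values, bits_per_value=1):
    """Size in MB of a bitmap holding bits_per_value bits for each value."""
    total_bits = num_values * bits_per_value
    total_bytes = total_bits // 8
    return total_bytes / (1024 * 1024)


print(bitmap_size_mb(2 ** 32))     # 512.0, within the 600 MB limit
print(bitmap_size_mb(2 ** 32, 2))  # 1024.0, over the limit
```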

 

Summary:

Use one bit per number to indicate whether that number has already appeared.

Use the number itself as the subscript into the bit array.
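A bit array indexed by the number itself can be sketched as follows. One caveat: a single bit only records whether a value has been seen, so on its own it counts distinct values. To count values that appear exactly once, the sketch below assumes a second bit ("seen more than once") and splits the value range into two halves, rereading the file once per half, so that each pass needs 2 x 2^31 bits = 512 MB. This is one possible way to stay under the limit, not the original author's stated method. The names `BitArray`, `count_once_in_range`, `count_once`, and `read_values` are hypothetical.

```python
class BitArray:
    """Fixed-size array of bits packed eight per byte."""

    def __init__(self, num_bits):
        self.bits = bytearray((num_bits + 7) // 8)

    def get(self, i):
        return (self.bits[i >> 3] >> (i & 7)) & 1

    def set(self, i):
        self.bits[i >> 3] |= 1 << (i & 7)


def count_once_in_range(values, lo, hi):
    """Count values in [lo, hi) that occur exactly once in `values`."""
    seen = BitArray(hi - lo)      # bit set: value has appeared
    repeated = BitArray(hi - lo)  # bit set: value has appeared more than once
    once = 0
    for v in values:
        if not lo <= v < hi:
            continue  # this value belongs to another pass
        i = v - lo
        if not seen.get(i):
            seen.set(i)
            once += 1
        elif not repeated.get(i):
            # Second occurrence: the value no longer counts as non-repeated.
            repeated.set(i)
            once -= 1
    return once


def count_once(read_values, universe_size, num_passes=2):
    """read_values() must return a fresh iterator over the data on each call."""
    step = (universe_size + num_passes - 1) // num_passes
    return sum(
        count_once_in_range(read_values(), lo, min(lo + step, universe_size))
        for lo in range(0, universe_size, step)
    )


data = [5, 9, 5, 0, 12, 9, 9, 15]
print(count_once(lambda: iter(data), universe_size=16))  # 3 (0, 12 and 15)
```

For the real problem, `universe_size` would be 2^32 and `read_values` would stream the integers from the file. Pure Python is used here only to show the logic; a production version would likely use a compiled language or vectorized operations.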
