Usage of Bitmap Data Structure

Source: Internet
Author: User

Massive Data Sorting Problem: A file contains 10 million phone number records (10^7). Each record is a 7-digit integer, and no integer is repeated. Sort the file. Only about 1 MB of memory is available, but there is ample disk space. Design an efficient algorithm.
(1) Use a multi-pass algorithm: if each number is stored as a 32-bit integer, about 250000 numbers fit in 1 MB of memory. A program can therefore sort the file by making 40 passes over the input. The first pass reads into memory every integer between 0 and 249999, sorts them, and writes them to the output file. The second pass handles the integers between 250000 and 499999, and so on, until the 40th pass handles the integers between 9750000 and 9999999.
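
The multi-pass idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the arrays `in` and `out` stand in for the input and output files, `buf` is the small in-memory work area, and the names `multipass_sort` and `cmp_int` are hypothetical, not from the original article.

```c
#include <stdlib.h>

/* Comparison callback for qsort: ascending order, no overflow. */
static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/*
 * Multi-pass sort sketch (illustrative names throughout).
 * `in` stands in for the input file and `out` for the output file;
 * `buf` is the in-memory work area holding at most `cap` integers.
 * Assumes the values are distinct and lie in [0, range), so at most
 * `cap` of them fall into any window of width `cap`.
 * Returns the number of integers written to `out`.
 */
size_t multipass_sort(const int *in, size_t n, int range,
                      int *buf, int cap, int *out)
{
    size_t written = 0;
    for (int lo = 0; lo < range; lo += cap) {      /* one pass per window */
        int hi = (range - lo < cap) ? range : lo + cap;
        size_t m = 0;
        for (size_t k = 0; k < n; k++)             /* re-read the whole input */
            if (in[k] >= lo && in[k] < hi)
                buf[m++] = in[k];
        qsort(buf, m, sizeof buf[0], cmp_int);     /* sort this window only */
        for (size_t k = 0; k < m; k++)
            out[written++] = buf[k];
    }
    return written;
}
```

With `range` set to 10000000 and `cap` set to 250000, the outer loop runs 40 times, one pass per window of 250000 values.
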
(2) Use a bitmap data structure: represent the file as a string of 10 million bits (about 1.25 MB), in which bit i is 1 if and only if integer i is in the file. First, set all bits to 0 so that the set starts out empty. Then read each integer in the file and set its bit to 1. Finally, scan the bits in order and output i for every bit that is 1, which produces the sorted output file. Note that 1.25 MB is slightly over the 1 MB budget; staying strictly within it would take two passes, each covering half of the range.

    #include <stdio.h>

    #define BITSPERWORD 32
    #define SHIFT 5
    #define MASK 0x1F
    #define N 10000000

    unsigned int a[1 + N / BITSPERWORD];  /* bit vector for the 10 million integers */

    /*
     * Each element of a holds 32 bits, so the bit for integer i lives in
     * element a[i / 32] (that is, a[i >> SHIFT]), at position i % 32
     * (that is, i & MASK, the low five bits of i).
     */

    /* set the bit for integer i */
    void set(int i)
    {
        a[i >> SHIFT] |= (1u << (i & MASK));
    }

    /* clear the bit for integer i */
    void clr(int i)
    {
        a[i >> SHIFT] &= ~(1u << (i & MASK));
    }

    /* test whether the bit vector contains integer i */
    int test(int i)
    {
        return (a[i >> SHIFT] & (1u << (i & MASK))) != 0;
    }

    int main(void)
    {
        int i;

        for (i = 0; i < N; ++i)
            clr(i);
        while (scanf("%d", &i) == 1)  /* read the integers to be sorted */
            set(i);
        for (i = 0; i < N; ++i)
            if (test(i))
                printf("%d\n", i);
        return 0;
    }

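(3) Sorting algorithm: use the C standard library function qsort to sort the data. This version holds all the integers in memory at once (about 40 MB as 32-bit integers), so it does not respect the 1 MB limit.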

    #include <stdio.h>
    #include <stdlib.h>

    /* qsort comparison callback: ascending order.
       The subtraction cannot overflow for non-negative 7-digit values. */
    int intcomp(const void *x, const void *y)
    {
        return *(const int *)x - *(const int *)y;
    }

    int a[10000000];

    int main(void)
    {
        int i, n = 0;

        /* read the integers to be sorted */
        while (n < 10000000 && scanf("%d", &a[n]) == 1)
            n++;
        qsort(a, n, sizeof(int), intcomp);
        for (i = 0; i < n; ++i)
            printf("%d\n", a[i]);
        return 0;
    }

(4) Set-based sorting algorithm: use the set container from the C++ STL, which keeps its elements in sorted order.

    #include <cstdio>
    #include <iostream>
    #include <set>
    using namespace std;

    int main(void)
    {
        set<int> sint;
        int i;
        set<int>::iterator j;

        while (cin >> i)
            sint.insert(i);  // the set stores the values in sorted order
        for (j = sint.begin(); j != sint.end(); ++j)
            cout << *j << "\n";
        return 0;
    }

Massive Data Search: A sequential file contains 4 billion 32-bit integers in random order. Find a 32-bit integer that is not in the file. (There are 2^32 = 4294967296 possible 32-bit values, so at least one must be missing.)
(1) Bitmap technique: if enough memory is available, use a bitmap of 536870912 8-bit bytes (2^32 bits, or 512 MB), one bit for each possible 32-bit integer. Read the file and set the bit for every integer seen, then scan the bitmap for a bit that is still 0. The index of that bit is a missing integer.
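
A minimal sketch of this bitmap search, assuming the input is already in an array rather than a file. The function name `find_missing_bitmap` and the `bits` parameter are hypothetical; `bits` exists only so the idea can be tried on a small range, since the full 32-bit case needs the 512 MB bitmap.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Find a value in [0, 2^bits) that does not occur in data[0..n).
 * Assumes 1 <= bits <= 32 and that every data[k] is below 2^bits.
 * Returns 1 and stores the value in *missing, or returns 0 if every
 * value is present or the bitmap cannot be allocated.
 */
int find_missing_bitmap(const uint32_t *data, size_t n, unsigned bits,
                        uint32_t *missing)
{
    uint64_t universe = (uint64_t)1 << bits;       /* number of candidates */
    size_t nbytes = (size_t)((universe + 7) / 8);
    unsigned char *map = calloc(nbytes, 1);        /* all bits start at 0 */
    if (map == NULL)
        return 0;

    for (size_t k = 0; k < n; k++)                 /* mark every value seen */
        map[data[k] >> 3] |= (unsigned char)(1u << (data[k] & 7));

    int found = 0;
    for (uint64_t v = 0; v < universe && !found; v++) {
        if (!(map[v >> 3] & (1u << (v & 7)))) {    /* first bit still clear */
            *missing = (uint32_t)v;
            found = 1;
        }
    }
    free(map);
    return found;
}
```

Calling it with `bits` set to 32 gives the full 32-bit search, at the cost of the 512 MB allocation.
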
(2) Sorting technique: sort the file, then scan it for a gap between consecutive values to find a missing integer. The total running time is proportional to n log n.
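
As a rough in-memory illustration of sort-then-scan (the original works on a file, and the names `find_missing_sorted` and `cmp_u32` are made up for this sketch):

```c
#include <stdint.h>
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of unsigned values. */
static int cmp_u32(const void *x, const void *y)
{
    uint32_t a = *(const uint32_t *)x, b = *(const uint32_t *)y;
    return (a > b) - (a < b);
}

/*
 * Sort data in place, then scan for the first gap.
 * Returns 1 and stores the smallest absent value in *missing,
 * or returns 0 if all 2^32 values are present.
 */
int find_missing_sorted(uint32_t *data, size_t n, uint32_t *missing)
{
    qsort(data, n, sizeof data[0], cmp_u32);

    uint64_t expect = 0;               /* smallest value not yet seen */
    for (size_t k = 0; k < n; k++) {
        if (data[k] > expect)          /* gap: `expect` is absent */
            break;
        if (data[k] == expect)         /* duplicates fall through unchanged */
            expect++;
    }
    if (expect > UINT32_MAX)
        return 0;
    *missing = (uint32_t)expect;
    return 1;
}
```

The scan itself is linear, so the sort dominates the running time.
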
(3) Binary search technique: what if only a few hundred bytes of memory and several temporary sequential files are available? Combine binary search with the multi-pass approach. Read the 4 billion input integers, writing those whose leading bit is 0 to one sequential file and those whose leading bit is 1 to another. One of the two files contains at most 2 billion integers. Use that file as the current input and repeat the probe, this time on the second bit. If the original input file contains n elements, the first pass reads n integers, the second at most n/2, the third at most n/4, and so on, until a missing integer is found. The total running time is therefore proportional to n.
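
The bit-by-bit probing can be sketched as below. This is a simplified in-memory version under stated assumptions: the array is partitioned in place, and the two halves of the array play the role of the two temporary files. The function name `find_missing_bisect` and the `bits` parameter are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Find a value in [0, 2^bits) that is absent from data[0..n).
 * Assumes 1 <= bits <= 32, every data[k] is below 2^bits, and
 * n < 2^bits, so at least one value must be missing.
 * Rearranges data in place.
 */
uint32_t find_missing_bisect(uint32_t *data, size_t n, unsigned bits)
{
    uint32_t prefix = 0;                    /* bits of the answer fixed so far */
    for (unsigned b = bits; b-- > 0; ) {    /* leading bit first */
        uint32_t bit = (uint32_t)1 << b;

        /* Move the values whose current bit is 0 to the front. */
        size_t zeros = 0;
        for (size_t k = 0; k < n; k++) {
            if (!(data[k] & bit)) {
                uint32_t t = data[zeros];
                data[zeros] = data[k];
                data[k] = t;
                zeros++;
            }
        }
        size_t ones = n - zeros;

        /* Keep the smaller half: it holds fewer values than its range
           has candidates, so it must be missing at least one of them. */
        if (zeros <= ones) {
            n = zeros;
        } else {
            prefix |= bit;
            data += zeros;
            n = ones;
        }
    }
    return prefix;
}
```

Each round examines at most half as many values as the previous one, which is where the running time proportional to n comes from.
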
Key algorithm design ideas: bitmap data structure, set data structure, multi-pass strategy, sorting strategy, and binary search strategy.
