Sorting a Duplicate-Free Data Set with a Bitmap: Example Code (C++ Edition)

Chapter 1 of "Programming Pearls" describes how to use a bitmap to sort a data set that contains no duplicates. The idea is very concise, so today I tried it out.

1. The main idea

The idea of bitmap sorting is to allocate a contiguous region of memory to use as a bitmap, with every bit initially set to 0. Next, read each integer from the file to be sorted and set the bit at that integer's position to 1. Finally, scan the bitmap: a bit that is 1 means its number is present, so write that number to the sorted file. For example, for the data to be sorted S = {3, 0, 4, 1, 7, 2, 5}, max(S) = 7, so we can use an eight-bit bitmap B whose bits all start at 0, i.e., B = [0,0,0,0,0,0,0,0]. For each integer d in S, set B[d] = 1, which gives B = [1,1,1,1,1,1,0,1]. Finally, scan the bitmap: for each position i, if B[i] == 1, output i to the sorted file. The sorted result is S = {0, 1, 2, 3, 4, 5, 7}.
The whole process makes a single pass over the file to be sorted and a single pass over the bitmap, so the running time is linear, O(n + max(S)): O(n) to read the n input integers and O(max(S)) to scan the bitmap. The auxiliary space is about max(S)/8 bytes. Although this algorithm only works on sets of integers without duplicates, it is very efficient for some tasks, such as sorting mobile phone numbers. A mobile phone number has 11 digits and the first digit is always 1, so in theory there can be 10^10 numbers. However, some numbers have not been issued, that is, they do not exist in the system. Assuming 50% of the numbers in the system are valid and each is stored as a 4-byte long int, all of them need 50% × 10^10 × 4 B = 20 GB, which cannot be loaded into memory for a quicksort. An alternative is a multi-pass merge sort, but that takes a long time.

Instead, we allocate a bitmap of 10^10 bits, which needs 10^10/8 B = 1.25 GB of memory and can run on a contemporary PC. When scanning the bitmap, if bit i is 1, put a 1 in front of i before writing it to the output file; for example, i = 3885201314 is output as 13885201314.

2. Implementing the algorithm

In C, you have to wrap the bitmap operations in functions yourself. Three operations are needed: setting every bit of the bitmap to 0 (setAllZero), setting a specified bit to 1 (setOne), and checking whether a specified bit is 1 (find). The code is as follows:


#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NUM 16777216           // number of bits; values must lie in [0, MAX_NUM)
#define BYTE_NUM (1 + MAX_NUM / 8) // number of bytes in the bitmap
#define MASK 0x07                  // low 3 bits: position of a bit inside its byte

void setAllZero(unsigned char *bitmap, long size);
void setOne(unsigned char *bitmap, long loc);
int find(const unsigned char *bitmap, long loc);
bool getSorted(const unsigned char *bitmap, const char *filename);
bool setBitmap(unsigned char *bitmap, const char *filename);
int bitmapSort(void);

int main(void) {
    return bitmapSort();
}

int bitmapSort(void) {
    unsigned char *bitmap; // bitmap pointer
    bitmap = (unsigned char *)malloc(BYTE_NUM * sizeof(unsigned char));
    if (bitmap == NULL) {
        printf("malloc failed\n");
        return -1;
    }
    setAllZero(bitmap, BYTE_NUM);                // set all bits of the bitmap to 0
    if (!setBitmap(bitmap, "phoneNumber.txt")) { // scan the file, set the corresponding bits to 1
        printf("cannot read phoneNumber.txt\n");
        free(bitmap);
        return -1;
    }
    if (!getSorted(bitmap, "BitmapSort.txt")) {  // scan the bitmap, write every number whose bit is 1
        printf("cannot write BitmapSort.txt\n");
        free(bitmap);
        return -1;
    }
    free(bitmap); // release the bitmap
    return 0;
}

/*********** set the bitmap from the data to be sorted **************/
bool setBitmap(unsigned char *bitmap, const char *filename) {
    FILE *readFp;
    printf("Setting bitmap...\n");
    readFp = fopen(filename, "r");
    if (readFp == NULL)
        return false;
    long phoneNum = 0;
    while (fscanf(readFp, "%ld", &phoneNum) == 1) { // stop at EOF or at unparsable input
        if (phoneNum >= 0 && phoneNum < MAX_NUM)    // skip values the bitmap cannot hold
            setOne(bitmap, phoneNum);               // set bit phoneNum to 1
    }
    fclose(readFp);
    return true;
}

/***** traverse the bitmap in order and output the records, which yields the sorted result ****************/
bool getSorted(const unsigned char *bitmap, const char *filename) {
    printf("Searching bitmap...\n");
    FILE *writeFp;
    writeFp = fopen(filename, "w");
    if (writeFp == NULL)
        return false;
    for (long phoneNum = 0; phoneNum < MAX_NUM; phoneNum++) {
        if (find(bitmap, phoneNum)) {
            fprintf(writeFp, "%ld\n", phoneNum);
        }
    }
    fclose(writeFp);
    return true;
}

/****** set every bit of the bitmap to 0 ********/
void setAllZero(unsigned char *bitmap, long size) {
    memset(bitmap, 0, (size_t)size);
}

/*************************************************
Set the specified bit to 1.
(loc >> 3) is equivalent to dividing by 2^3 = 8, which locates the byte;
MASK = 0x07, so loc & MASK is equivalent to loc % 8, the bit inside that byte.
***************************************************/
void setOne(unsigned char *bitmap, long loc) {
    bitmap[loc >> 3] |= (unsigned char)(1u << (loc & MASK));
}

/****** check whether the specified bit is 1 ********/
int find(const unsigned char *bitmap, long loc) {
    return (bitmap[loc >> 3] & (1u << (loc & MASK))) != 0;
}

The C++ standard library provides the bitset data structure, which makes bitmap operations very convenient.

The code is as follows:

#include <bitset>
#include <cstdio>

#define MAX_NUM 4000000 // the largest number, i.e., the number of bits required
using namespace std;

// Kept in static storage instead of on the stack: 4,000,000 bits is about 500 KB.
static bitset<MAX_NUM> bitmap;

int main() {
    FILE *readFp, *writeFp;
    readFp = fopen("PhoneNumber1.txt", "r");
    writeFp = fopen("BitsetSorted.txt", "w");
    if (readFp == NULL || writeFp == NULL) {
        printf("cannot open input or output file\n");
        if (readFp) fclose(readFp);
        if (writeFp) fclose(writeFp);
        return 1;
    }
    bitmap.reset(); // set every bit to 0 (a bitset also starts out all zeros)
    printf("Begin set bitmap...\n");
    long number = 0;
    while (fscanf(readFp, "%ld", &number) == 1) {
        if (number >= 0 && number < MAX_NUM) // set() throws out_of_range otherwise
            bitmap.set(number);              // set bit `number` to 1
    }
    printf("Begin search bitmap...\n");
    for (long i = 0; i < MAX_NUM; i++) {
        if (bitmap.test(i))                  // output every position whose bit is 1
            fprintf(writeFp, "%ld\n", i);
    }
    fclose(writeFp);
    fclose(readFp);
    return 0;
}

The sorting algorithm was quick to write, so next I started generating test data. Generating a data set over the range 0 to 2^31 turned out to be far from easy: first, it must contain no duplicates; second, 40% of the numbers must be dropped (to simulate invalid mobile phone numbers); third, the order should be as random as possible. After a lot of tinkering I finally found a way and generated a 12 GB data set. Suggestions on how to generate such a data set are welcome, and I will summarize my approach in the next article.
The complete code is available on GitHub.
