Sorting 10 million integers --- Programming Pearls, Second Edition, Chapter 1

Source: Internet
Author: User

The first chapter of the book poses a seemingly simple problem: a file on disk holds up to 10 million distinct integers. How can it be sorted as quickly as possible using only about 1 MB of memory?

Each integer can be stored in 4 bytes, so 1 MB holds roughly 250,000 integers. The obvious approach is to sort 250,000 integers at a time and write them to the output file, repeating this 40 times (10,000,000 / 250,000 = 40).

So how are the 250,000 integers for each round chosen? There are two strategies:

1. Traverse the 10 million integers 40 times. On the i-th traversal (i = 0, 1, ..., 39), check whether each number falls in the range [i*250000, i*250000+249999]; if it does, read it into memory. When the traversal finishes, memory holds at most 250,000 integers, all larger than those selected by the earlier traversals and smaller than those selected by the later ones. Sort the integers selected on the i-th traversal and append them to the output file written by the previous traversals.

Characteristics: simple and crude, but the number of traversals is high and the passes can only run serially; the source file is read 40 times. On the test machine this took 2 minutes 17 seconds 214 milliseconds.
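Strategy 1 can be sketched in C roughly as follows. This is a minimal sketch, not the code that was timed above. It assumes the input is a text file with one non-negative integer per line, that all values are distinct and smaller than max_value, and that the caller passes open FILE handles. The names multipass_sort, cmp_int, max_value and chunk are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* qsort comparator for ascending ints (avoids overflow of a - b). */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/*
 * Multi-pass sort: each pass rereads the whole input and keeps only the
 * values in [lo, lo + chunk). Assumes distinct values in [0, max_value).
 * Returns the number of integers written, or -1 if allocation fails.
 */
long multipass_sort(FILE *in, FILE *out, long max_value, long chunk)
{
    int *buf;
    long total = 0;

    assert(in != NULL && out != NULL && chunk > 0);
    buf = malloc((size_t)chunk * sizeof *buf);
    if (buf == NULL)
        return -1;

    for (long lo = 0; lo < max_value; lo += chunk) {
        long hi = lo + chunk;   /* exclusive upper bound of this pass */
        long n = 0;
        int v;

        rewind(in);             /* every pass starts at the top of the file */
        while (fscanf(in, "%d", &v) == 1)
            if (v >= lo && v < hi && n < chunk)
                buf[n++] = v;

        /* Sort this pass's values and append them to the output. */
        qsort(buf, (size_t)n, sizeof *buf, cmp_int);
        for (long k = 0; k < n; k++)
            fprintf(out, "%d\n", buf[k]);
        total += n;
    }
    free(buf);
    return total;
}
```

For the problem in the text, max_value would be 10000000 and chunk 250000, which gives the 40 passes described above.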

2. Traverse the 10 million integers once. Read the i-th group of 250,000 integers into memory, sort it, and write it to temporary file i. After the single traversal there are 40 temporary files, each internally sorted. Merging these files produces the final result.

Characteristics: the source file is read only once and the chunk sorts can run in multiple threads, but the amount of I/O is still high. Run serially, this took 21 seconds 221 milliseconds on the test machine.
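Strategy 2 is an external merge sort, and a serial version might look like the sketch below. It is an illustration under assumptions, not the program that was timed: the input is taken to be text with one integer per line, the runs are kept in files created with tmpfile(), and the merge simply scans the head of every run for the smallest value. The names external_sort, write_run and MAX_RUNS are hypothetical, and the platform is assumed to allow that many open files at once.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_RUNS 64   /* assumed cap; the case in the text needs 40 runs */

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sort buf[0..n) and write it to a new temporary file, returned rewound. */
static FILE *write_run(int *buf, size_t n)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return NULL;
    qsort(buf, n, sizeof *buf, cmp_int);
    for (size_t k = 0; k < n; k++)
        fprintf(f, "%d\n", buf[k]);
    rewind(f);
    return f;
}

/* Returns the number of integers written to out, or -1 on failure. */
long external_sort(FILE *in, FILE *out, size_t chunk)
{
    FILE *runs[MAX_RUNS];
    int head[MAX_RUNS], alive[MAX_RUNS];
    int nruns = 0, v, ok = 1;
    long total = 0;
    size_t n = 0;
    int *buf;

    assert(in != NULL && out != NULL && chunk > 0);
    buf = malloc(chunk * sizeof *buf);
    if (buf == NULL)
        return -1;

    /* Phase 1: one pass over the input, one sorted run per chunk. */
    for (;;) {
        int got = fscanf(in, "%d", &v) == 1;
        if (got)
            buf[n++] = v;
        if (n == chunk || (!got && n > 0)) {
            if (nruns == MAX_RUNS ||
                (runs[nruns] = write_run(buf, n)) == NULL) {
                ok = 0;
                break;
            }
            nruns++;
            n = 0;
        }
        if (!got)
            break;
    }
    free(buf);

    /* Phase 2: merge by repeatedly emitting the smallest run head. */
    for (int r = 0; r < nruns; r++)
        alive[r] = fscanf(runs[r], "%d", &head[r]) == 1;
    while (ok) {
        int best = -1;
        for (int r = 0; r < nruns; r++)
            if (alive[r] && (best < 0 || head[r] < head[best]))
                best = r;
        if (best < 0)
            break;              /* every run is exhausted */
        fprintf(out, "%d\n", head[best]);
        total++;
        alive[best] = fscanf(runs[best], "%d", &head[best]) == 1;
    }
    for (int r = 0; r < nruns; r++)
        fclose(runs[r]);
    return ok ? total : -1;
}
```

With only 40 runs a linear scan over the run heads is cheap; a heap would be the usual choice if the number of runs were much larger.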

So how can the running time be brought under 10 seconds?

Look again at the characteristics of the problem:

a. No value exceeds a known maximum.

b. No value is duplicated.

c. Each record consists of just one number.

So a bit vector 10 million bits long (about 1.25 MB) can be allocated, in which bit i being 1 means that the number i is present.

3. This leads to a bitmap sort for the problem.

Allocate a bit vector bit[10000000] of 10 million bits with every bit set to 0. Read the file to be sorted sequentially and, for each number i read, set bit[i] to 1. Once all the data has been read, traverse bit from beginning to end and output i to the file whenever bit[i] = 1. When the traversal completes, the output file is sorted. On the test machine this took 9 seconds 49 milliseconds.
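The bitmap sort just described might be written in C along these lines. This is a sketch under assumptions rather than the timed program: the input is taken to be text with one non-negative integer per line, values at or above limit are silently skipped, and the bit vector is stored in an array of 32-bit words. The name bitmap_sort and the parameter limit are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Sort distinct integers in [0, limit) using one bit per possible value.
 * Returns the number of integers written, or -1 if allocation fails.
 */
long bitmap_sort(FILE *in, FILE *out, unsigned long limit)
{
    uint32_t *bits;
    unsigned long v;
    long total = 0;

    assert(in != NULL && out != NULL && limit > 0);
    /* calloc zero-fills, so every bit starts at 0. */
    bits = calloc(limit / 32 + 1, sizeof *bits);
    if (bits == NULL)
        return -1;

    /* Phase 1: mark each value that occurs in the input. */
    while (fscanf(in, "%lu", &v) == 1)
        if (v < limit)
            bits[v >> 5] |= (uint32_t)1 << (v & 31);

    /* Phase 2: scanning the bits in index order yields sorted output. */
    for (unsigned long i = 0; i < limit; i++) {
        if (bits[i >> 5] & ((uint32_t)1 << (i & 31))) {
            fprintf(out, "%lu\n", i);
            total++;
        }
    }
    free(bits);
    return total;
}
```

For the problem in the text, limit would be 10000000. The input is read once and the output written once, with no sorting step in between, which is consistent with this being the fastest of the three approaches.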

Note: in a programming language that has no bit type, the bit operations have to be implemented by hand.

Take a 32-bit int as an example. To hold n bits, allocate an int array a[n/32+1] (some bits in the last int may be wasted).

Bit i can be set to 1 as follows:

Bit i lives in element a[i/32] of the int array, at offset i%32. Setting it means OR-ing a[i/32] with a number b whose bit i%32 is 1 and whose other bits are all 0, which gives the following statement:

a[i/32] |= (1 << (i%32));

For the fastest possible execution, the statement above can be rewritten as:

a[i >> 5] |= (1 << (i & 0x1F));
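The same shift-and-mask pattern extends to clearing and testing a bit. The helpers below are a small sketch, assuming 32-bit words (uint32_t is used instead of int so the width is guaranteed) and a fixed capacity N of 10 million bits. The names set_bit, clear_bit, test_bit, SHIFT and MASK are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define N     10000000  /* number of bits in the vector */
#define SHIFT 5         /* i >> 5 is the same as i / 32 */
#define MASK  0x1F      /* i & 0x1F is the same as i % 32 */

static uint32_t a[N / 32 + 1];   /* static storage starts out all zero */

/* Set bit i to 1. */
void set_bit(uint32_t i)
{
    assert(i < N);
    a[i >> SHIFT] |= (uint32_t)1 << (i & MASK);
}

/* Clear bit i back to 0. */
void clear_bit(uint32_t i)
{
    assert(i < N);
    a[i >> SHIFT] &= ~((uint32_t)1 << (i & MASK));
}

/* Return 1 if bit i is set, otherwise 0. */
int test_bit(uint32_t i)
{
    assert(i < N);
    return (a[i >> SHIFT] & ((uint32_t)1 << (i & MASK))) != 0;
}
```

For example, set_bit(33) touches a[1] (33 / 32 = 1) and sets bit 1 within it (33 % 32 = 1), so a[1] becomes 2 if it was 0 before.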
