Two external sorting ideas: sorting by merging & sorting by distribution


Today, while thinking about a problem, I also worked through external sorting. Here are my notes:

Suppose there are 16 numbers on disk and memory can hold only four of them at a time for sorting. How does sorting by merging (merge sort) compare with sorting by distribution (bucket sort)? Each has its own merits. Merge sort is undoubtedly the most commonly used and is simple in concept, but it is hard to exploit multiple cores in its final merge step. Bucket sort makes full use of multiple cores; the difficulty lies in distributing the keys so that each bucket receives roughly the same number of them. For details, refer to the book I recommended previously.

To make this concrete, here is an example:


Input: 1 4 2 8 12 11 3 6 9 40 28 27 21 7 6 3

Output: 1 2 3 3 4 6 6 7 8 9 11 12 21 27 28 40

Sorting by merging

(1) Segmentation → (2) In-memory sort → (3) Multiway merge (loser tree)

Run 1: 1 4 2 8 → 1 2 4 8
Run 2: 12 11 3 6 → 3 6 11 12
Run 3: 9 40 28 27 → 9 27 28 40
Run 4: 21 7 6 3 → 3 6 7 21

Merging the four sorted runs: 1 2 3 3 4 6 6 7 8 9 11 12 21 27 28 40
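The three stages can be simulated in a few lines of Python. This is a minimal in-memory sketch, not a real disk-based implementation: list slices stand in for runs on disk, `memory_size` is a hypothetical parameter for how many numbers fit in memory, and the final step uses the standard library's `heapq.merge`, which is built on a binary heap rather than the loser tree named above.

```python
import heapq

def external_merge_sort(data, memory_size):
    """Simulate external merge sort when memory holds `memory_size` numbers."""
    # Stage 1: segmentation. Cut the input into runs that fit in memory.
    runs = [data[i:i + memory_size] for i in range(0, len(data), memory_size)]
    # Stage 2: sort each run in memory (each run could go to a separate CPU).
    sorted_runs = [sorted(run) for run in runs]
    # Stage 3: k-way merge of all runs. heapq.merge uses a binary heap,
    # standing in here for the loser tree described in the text.
    merged = list(heapq.merge(*sorted_runs))
    return sorted_runs, merged

data = [1, 4, 2, 8, 12, 11, 3, 6, 9, 40, 28, 27, 21, 7, 6, 3]
runs, result = external_merge_sort(data, 4)
print(runs)    # [[1, 2, 4, 8], [3, 6, 11, 12], [9, 27, 28, 40], [3, 6, 7, 21]]
print(result)  # [1, 2, 3, 3, 4, 6, 6, 7, 8, 9, 11, 12, 21, 27, 28, 40]
```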

(1) Segmentation: nothing actually happens; the input is only divided logically into four runs of four numbers each.

Disk I/O: 0

Input: 1 4 2 8 12 11 3 6 9 40 28 27 21 7 6 3

Output: 1 4 2 8 12 11 3 6 9 40 28 27 21 7 6 3 (unchanged)

(2) In-memory sort

Disk I/O: 16 × 2 = 32 bytes (read 16 bytes, write 16 bytes)

Input: 1 4 2 8 12 11 3 6 9 40 28 27 21 7 6 3

Output: 1 2 4 8 3 6 11 12 9 27 28 40 3 6 7 21

(3) Multiway merge (loser tree)

Disk I/O: 16 × 2 = 32 bytes (read 16 bytes, write 16 bytes)

Input: 1 2 4 8 3 6 11 12 9 27 28 40 3 6 7 21

Output: 1 2 3 3 4 6 6 7 8 9 11 12 21 27 28 40
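Stage (3) names a loser tree. The sketch below is one common array-based formulation, written for illustration (the function name `loser_tree_merge` is hypothetical): each internal node remembers the loser of the match played there, so after the overall winner is output only the matches on its leaf-to-root path are replayed, about log2(k) comparisons per key for k runs. Exhausted runs are represented by positive infinity, and a negative-infinity sentinel is used only while the tree is being built; the keys themselves are assumed to be finite.

```python
import math

def loser_tree_merge(runs):
    """k-way merge of sorted lists using a loser tree (tournament tree)."""
    k = len(runs)
    if k == 0:
        return []
    pos = [0] * k  # read cursor per run (stands in for a per-run input buffer)

    def key(i):
        if i == k:                      # index k is the build-time sentinel
            return -math.inf
        run = runs[i]
        # Current head of run i, or +inf once the run is exhausted.
        return run[pos[i]] if pos[i] < len(run) else math.inf

    # tree[1..k-1] hold the loser of each internal match; tree[0] holds the winner.
    tree = [k] * k

    def adjust(s):
        """Replay the matches on the path from leaf s up to the root."""
        t = (s + k) // 2                # leaf s sits at position s + k
        while t > 0:
            if key(tree[t]) < key(s):   # stored player wins: s stays here as loser
                s, tree[t] = tree[t], s
            t //= 2
        tree[0] = s

    for i in range(k - 1, -1, -1):      # build the tree one leaf at a time
        adjust(i)

    out = []
    while key(tree[0]) != math.inf:     # stop once every run is exhausted
        w = tree[0]
        out.append(runs[w][pos[w]])
        pos[w] += 1                     # advance the winning run
        adjust(w)                       # only the winner's path is replayed
    return out

runs = [[1, 2, 4, 8], [3, 6, 11, 12], [9, 27, 28, 40], [3, 6, 7, 21]]
print(loser_tree_merge(runs))
# [1, 2, 3, 3, 4, 6, 6, 7, 8, 9, 11, 12, 21, 27, 28, 40]
```

Compared with a binary heap, a loser tree needs only one comparison per level when the winner is replaced, because each node already stores the opponent the new key has to beat.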


With four CPUs, all four can compute in parallel during the first and second stages. In the third stage at most two CPUs can take part: one merges from the smallest keys and writes the output from the beginning, the other merges from the largest keys and writes the output from the end, and the two meet in the middle once the output is full. The output file can be written from both ends at once by memory-mapping it.
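The two-ended merge can be sketched as follows, assuming the sorted runs are already available as in-memory lists. The function name is hypothetical, and the Python list `out` stands in for the memory-mapped output file. One thread merges in ascending order and fills the first half; the other merges the reversed runs in descending order (`heapq.merge(..., reverse=True)`) and fills the second half from the end. In CPython the global interpreter lock keeps the two threads from truly running in parallel, so this shows how the work is split rather than an actual speedup.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def two_ended_merge(runs):
    """Merge sorted runs with two workers: one from the smallest end, one from the largest."""
    n = sum(len(r) for r in runs)
    front_count = (n + 1) // 2          # keys produced by the ascending worker
    back_count = n - front_count        # keys produced by the descending worker
    out = [None] * n                    # stands in for a memory-mapped output file

    def merge_front():
        # Ascending merge: the smallest keys, written from the start of the output.
        for i, x in enumerate(islice(heapq.merge(*runs), front_count)):
            out[i] = x

    def merge_back():
        # Descending merge over the reversed runs: the largest keys, written from the end.
        desc = heapq.merge(*(reversed(r) for r in runs), reverse=True)
        for i, x in enumerate(islice(desc, back_count)):
            out[n - 1 - i] = x

    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(merge_front), pool.submit(merge_back)]
        for f in futures:
            f.result()                  # re-raise any exception from a worker
    return out

runs = [[1, 2, 4, 8], [3, 6, 11, 12], [9, 27, 28, 40], [3, 6, 7, 21]]
print(two_ended_merge(runs))
# [1, 2, 3, 3, 4, 6, 6, 7, 8, 9, 11, 12, 21, 27, 28, 40]
```

Because each worker stops after its fixed share of keys, the two never write the same slot and need no locking.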


Sorting by distribution

(1) Distribute into buckets → (2) Sort within each bucket (once each bucket is sorted, the data is already in order by bucket)

Bucket 1: 1 2 3 3 → 1 2 3 3
Bucket 2: 4 6 7 6 → 4 6 6 7
Bucket 3: 8 9 12 11 → 8 9 11 12
Bucket 4: 40 28 27 21 → 21 27 28 40
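Both stages of the distribution approach can be sketched like this. The splitters `[3, 7, 12]` (the largest key allowed in each of the first three buckets) are inferred from the example above and are an assumption; the text does not say how the bucket boundaries were chosen. A binary search (`bisect_left`) maps each key to its bucket, and the function name `distribution_sort` is hypothetical.

```python
from bisect import bisect_left

def distribution_sort(data, splitters):
    """Scatter keys into buckets by value range, then sort each bucket."""
    # Stage 1: one pass over the input; a key <= splitters[i] lands in bucket i.
    buckets = [[] for _ in range(len(splitters) + 1)]
    for x in data:
        buckets[bisect_left(splitters, x)].append(x)
    # Stage 2: sort each bucket independently (one bucket per CPU). The buckets
    # already cover increasing key ranges, so concatenating them is sorted output.
    result = [x for b in buckets for x in sorted(b)]
    return buckets, result

data = [1, 4, 2, 8, 12, 11, 3, 6, 9, 40, 28, 27, 21, 7, 6, 3]
buckets, result = distribution_sort(data, [3, 7, 12])
print(buckets)  # [[1, 2, 3, 3], [4, 6, 7, 6], [8, 12, 11, 9], [40, 28, 27, 21]]
print(result)   # [1, 2, 3, 3, 4, 6, 6, 7, 8, 9, 11, 12, 21, 27, 28, 40]
```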


(1) Distribution into buckets

Disk I/O: 16 × 2 = 32 bytes (read 16 bytes, write 16 bytes)

Input: 1 4 2 8 12 11 3 6 9 40 28 27 21 7 6 3

Output: 1 2 3 3 4 6 7 6 8 9 12 11 40 28 27 21

(2) Sorting within each bucket

Disk I/O: 16 × 2 = 32 bytes (read 16 bytes, write 16 bytes)

Input: 1 2 3 3 4 6 7 6 8 9 12 11 40 28 27 21

Output: 1 2 3 3 4 6 6 7 8 9 11 12 21 27 28 40
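Adding up the stages gives the total I/O for each method. A tiny sketch of that arithmetic, assuming (as the text does) one byte per number and one full read plus one full write per pass; the helper `io_bytes` is hypothetical:

```python
def io_bytes(num_keys, passes, bytes_per_key=1):
    """I/O volume when each pass reads every key once and writes it once."""
    return passes * 2 * num_keys * bytes_per_key

n = 16
# Merging: stage (1) is free; stages (2) and (3) each make one full pass.
merge_total = io_bytes(n, passes=2)
# Distribution: stages (1) and (2) each make one full pass.
distribution_total = io_bytes(n, passes=2)
print(merge_total, distribution_total)  # 64 64
```

Both come to 64 bytes here, so in this example the two methods differ in how well they parallelize rather than in I/O volume.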


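With four CPUs, all four can compute in parallel in both stages; in the second stage, for example, each CPU sorts one bucket. The difficulty is keeping the buckets the same size: if one bucket receives far more keys than the others, it may no longer fit in memory, and the CPU sorting it becomes the bottleneck.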
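One common way to keep buckets even, borrowed from sample sort rather than taken from the text, is to choose the bucket boundaries from a random sample of the keys: sort the sample and take evenly spaced keys from it as splitters. A minimal sketch follows; `choose_splitters` and `bucket_sizes` are hypothetical helpers. Sampling all 16 keys of the example recovers exactly the bucket boundaries used above. On real data the sample is a small fraction of the input, so the buckets come out only approximately even, and heavily repeated keys can still overload a single bucket.

```python
import random
from bisect import bisect_left

def choose_splitters(data, num_buckets, sample_size, seed=0):
    """Pick num_buckets - 1 bucket upper bounds from a random sample of the input."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(data, min(sample_size, len(data))))
    if len(sample) < num_buckets:
        raise ValueError("sample must contain at least num_buckets keys")
    # Splitter i is the last key of the i-th (roughly) equal-sized slice of the sample.
    return [sample[i * len(sample) // num_buckets - 1] for i in range(1, num_buckets)]

def bucket_sizes(data, splitters):
    """Count the keys per bucket; a key <= splitters[i] goes to bucket i."""
    sizes = [0] * (len(splitters) + 1)
    for x in data:
        sizes[bisect_left(splitters, x)] += 1
    return sizes

data = [1, 4, 2, 8, 12, 11, 3, 6, 9, 40, 28, 27, 21, 7, 6, 3]
splitters = choose_splitters(data, num_buckets=4, sample_size=len(data))
print(splitters)                      # [3, 7, 12]
print(bucket_sizes(data, splitters))  # [4, 4, 4, 4]
```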
