External sorting

Source: Internet
Author: User

Sometimes the file to be sorted is so large that the computer's memory cannot hold all of it, so an internal sort cannot be applied to the file directly. (Strictly speaking, every sort does its work in memory. An internal sort is one in which all of the content to be sorted fits in memory at once; an external sort is one in which it does not, so data has to be exchanged between memory and external storage while sorting.) The method used for external sorting is a merge sort, which consists of two distinct phases:

1. Use a suitable internal sorting method to sort each fragment of the input file, and write each sorted fragment (called a merge segment) to external memory (usually an available disk acting as a temporary buffer), so that the content of every merge segment in the temporary buffer is ordered.

2. Use a merge algorithm to merge the segments produced in the first phase, until only one merge segment is left.

For example, suppose 4,500 records in external memory are to be sorted, and memory can hold only 750 records. In the first phase we read 750 records at a time and sort them. This takes six rounds of reading and sorting (4,500 / 750 = 6) and yields six sorted merge segments, segment_1 through segment_6.

Each merge segment holds 750 records. Remember that these merge segments have all been written to the temporary buffer (a role played by an available disk); this is the result of the first phase.
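The first phase can be sketched as follows. This is only a minimal illustration, assuming records are integers stored one per line; the function names `create_initial_runs` and `write_run` and the file naming scheme are hypothetical and not taken from the original article.

```python
import os

MEMORY_CAPACITY = 750  # records that fit in memory at once (from the example)


def write_run(chunk, run_dir, index):
    """Sort one in-memory chunk and write it to disk as a merge segment."""
    chunk.sort()  # the internal sort: the whole chunk fits in memory
    path = os.path.join(run_dir, f"segment_{index}.txt")  # hypothetical naming
    with open(path, "w") as f:
        f.writelines(f"{value}\n" for value in chunk)
    return path


def create_initial_runs(records, capacity, run_dir):
    """Read `capacity` records at a time, sort each chunk, write it to `run_dir`.

    `records` is any iterable of integers; returns the list of segment paths.
    """
    run_paths = []
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == capacity:
            run_paths.append(write_run(chunk, run_dir, len(run_paths) + 1))
            chunk = []
    if chunk:  # a final, shorter chunk if the input is not a multiple of capacity
        run_paths.append(write_run(chunk, run_dir, len(run_paths) + 1))
    return run_paths
```

Called with 4,500 records and a capacity of 750, this sketch writes six segment files, each holding 750 sorted records.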

How is the second phase carried out? This is where the merge algorithm comes in. It works as follows:

1. Divide the memory space into three parts of 250 records each: two serve as input buffers and the third as the output buffer. First merge segment_1 and segment_2: read 250 records from each segment into the input buffers and merge them into the output buffer. Whenever the output buffer is full, write it to the temporary buffer; whenever an input buffer is empty, read the next 250 records from the corresponding segment and continue merging. Repeat until segment_1 and segment_2 have been completely merged into one ordered segment of 1,500 records. Then do the same for segment_3 and segment_4, and for segment_5 and segment_6.

2. Apply the same operation as in step 1 to the resulting 1,500-record segments (for example, merge two of them into a 3,000-record segment, then merge that with the remaining one), and keep merging until a single merge segment of 4,500 records is formed, at which point the sort is complete.
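The two steps above can be sketched as follows. This is a simplified illustration rather than a definitive implementation: Python lists stand in for the segments on disk, `read_block` simulates one disk read, and all function names are hypothetical. Carrying an unpaired segment over to the next pass is one possible choice that the original does not spell out.

```python
BUFFER_SIZE = 250  # two input buffers plus one output buffer, 250 records each


def read_block(segment, position, size):
    """Simulate one disk read: up to `size` records starting at `position`."""
    return segment[position:position + size]


def merge_two_segments(seg_a, seg_b, buffer_size=BUFFER_SIZE):
    """Merge two sorted segments using two input buffers and one output buffer."""
    merged = []   # stands in for the new segment in the temporary buffer
    out_buf = []  # output buffer
    buf_a = read_block(seg_a, 0, buffer_size)
    buf_b = read_block(seg_b, 0, buffer_size)
    pos_a, pos_b = len(buf_a), len(buf_b)  # next unread position per segment
    i = j = 0                              # cursors inside the input buffers
    while i < len(buf_a) or j < len(buf_b):
        take_a = j >= len(buf_b) or (i < len(buf_a) and buf_a[i] <= buf_b[j])
        if take_a:
            out_buf.append(buf_a[i])
            i += 1
        else:
            out_buf.append(buf_b[j])
            j += 1
        if len(out_buf) == buffer_size:    # output buffer full: write it out
            merged.extend(out_buf)
            out_buf = []
        if i == len(buf_a):                # input buffer A empty: refill it
            buf_a = read_block(seg_a, pos_a, buffer_size)
            pos_a += len(buf_a)
            i = 0
        if j == len(buf_b):                # input buffer B empty: refill it
            buf_b = read_block(seg_b, pos_b, buffer_size)
            pos_b += len(buf_b)
            j = 0
    merged.extend(out_buf)                 # flush the partially filled buffer
    return merged


def merge_pass(segments):
    """Merge neighbouring segments pairwise; an unpaired last one is carried over."""
    result = [merge_two_segments(segments[k], segments[k + 1])
              for k in range(0, len(segments) - 1, 2)]
    if len(segments) % 2 == 1:
        result.append(segments[-1])
    return result


def external_merge(segments):
    """Repeat merge passes until a single segment remains."""
    while len(segments) > 1:
        segments = merge_pass(segments)
    return segments[0] if segments else []
```

With the six 750-record segments from the example, the first pass of this sketch produces three segments of 1,500 records, the second produces segments of 3,000 and 1,500 records, and the third produces the final 4,500-record segment.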

[Figure: diagram of the merging process of the above algorithm]

The above is a brief summary of how external sorting uses the merge algorithm. To improve the efficiency of external sorting, the following issues need to be considered:

1. How to reduce the number of merge passes required for sorting.

2. How to use the program buffers efficiently, so that input, output, and CPU operations overlap as much as possible.

3. How to generate the initial merge segments and how to merge them.
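As a rough illustration of the first issue, one common way to reduce the number of passes is to merge more than two segments at a time (a k-way merge). This is an assumption added for illustration; the original does not say which technique it has in mind. The sketch below counts the passes for a given fan-in and uses the standard library's `heapq.merge` for the k-way merge itself; the function names are hypothetical.

```python
import heapq


def merge_passes(num_segments, fan_in):
    """Count the passes needed when each merge combines up to `fan_in` segments."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    passes = 0
    while num_segments > 1:
        num_segments = -(-num_segments // fan_in)  # ceiling division
        passes += 1
    return passes


def k_way_merge(segments):
    """Merge any number of sorted segments in a single pass."""
    return list(heapq.merge(*segments))
```

For the six segments in the example, two-way merging needs three passes, three-way merging needs two, and a six-way merge needs only one. The trade-off is that a larger fan-in leaves less memory for each input buffer.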
