19. Toad's Data Structures, Advanced Part 19: Concepts Related to External Sorting

Source: Internet
Author: User

Quote for this article: "What a person should fear most is dishonesty, and what is most valuable in young people is an honest style. 'Honesty' means not deceiving yourself or others; not deceiving others is easy, not deceiving yourself is the hardest. An 'honest style' means being down-to-earth and not looking for an easy advantage. Nothing in the world comes cheap, and whoever tries to take advantage will end up losing." -- an educator

The sorting methods we covered earlier were all internal sorts; now we turn to external sorting.

Reprints are welcome; please cite the source: http://blog.csdn.net/notbaron/article/details/47844037

1. External sorting

External sorting is the process of sorting the records of a large file (a file held on external storage). The records to be sorted reside on external storage, and during sorting, data is exchanged between internal memory and external storage many times.

External sorting consists of two relatively independent stages.

(1) First, according to the available memory size, the file of n records on external storage is divided into sub-files (segments) of length l. Each one is read into memory in turn, sorted with an efficient internal sorting method, and written back to external storage as an ordered sub-file. These ordered sub-files are usually called merge segments or runs.

(2) Then the merge segments are merged pass by pass, so that the merge segments (ordered sub-files) grow from small to large until the whole file is ordered.
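The two stages above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the author's implementation: the file is assumed to be plain text with one integer per line, the names `external_sort` and `run_length` are hypothetical, and stage 2 merges all segments in a single multiway pass with `heapq.merge` rather than in several passes.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(input_path, output_path, run_length):
    """Sort a text file of one integer per line; return the number of runs."""
    run_paths = []

    # Stage 1: read run_length records at a time, sort them in memory,
    # and write each ordered sub-file (merge segment) back to disk.
    with open(input_path) as src:
        while True:
            chunk = [int(line) for line in islice(src, run_length)]
            if not chunk:
                break
            chunk.sort()
            fd, path = tempfile.mkstemp(suffix=".run", text=True)
            with os.fdopen(fd, "w") as run_file:
                run_file.writelines(f"{value}\n" for value in chunk)
            run_paths.append(path)

    # Stage 2: merge the ordered segments into one ordered file.
    # heapq.merge consumes its inputs lazily, so the merge itself keeps
    # only one pending record per segment in memory.
    run_files = [open(path) for path in run_paths]
    try:
        streams = [(int(line) for line in f) for f in run_files]
        with open(output_path, "w") as dst:
            dst.writelines(f"{value}\n" for value in heapq.merge(*streams))
    finally:
        for f in run_files:
            f.close()
        for path in run_paths:
            os.remove(path)
    return len(run_paths)
```

Calling `external_sort("in.txt", "out.txt", 1000)` on a 10,000-line file would produce 10 temporary segments of 1,000 records each before merging them.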


1.1 Illustrative Examples

Suppose a file holds 10,000 records. Ten rounds of internal sorting first produce 10 initial merge segments R1~R10, each containing 1,000 records. They are then merged two at a time, pass by pass (10 → 5 → 3 → 2 → 1 segments), until one ordered file is obtained.


As can be seen, going from 10 initial merge segments to one ordered file takes four merge passes, and each pass turns m merge segments into ⌈m/2⌉ merge segments. This method is called 2-way balanced merging.

Suppose that, in the example above, each physical block holds 200 records, so the file occupies 50 blocks. Each merge pass then needs 50 reads and 50 writes, and internal sorting needs the same. The four merge passes plus internal sorting therefore require 500 reads/writes in total.
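The arithmetic of this example can be checked with a short Python sketch. The function name `two_way_merge_passes` is hypothetical, and the sketch assumes that every pass, as well as the internal sorting stage, reads and writes each block exactly once.

```python
def two_way_merge_passes(segments):
    """Return the segment count after each 2-way balanced merge pass."""
    counts = []
    while segments > 1:
        segments = (segments + 1) // 2  # ceil(segments / 2)
        counts.append(segments)
    return counts


records = 10_000
records_per_block = 200
initial_segments = 10

blocks = records // records_per_block             # 50 blocks in the file
passes = two_way_merge_passes(initial_segments)   # [5, 3, 2, 1]

# Internal sorting and every merge pass read and write each block once.
ios_per_pass = 2 * blocks                         # 100
total_ios = ios_per_pass * (len(passes) + 1)      # 500
print(passes, total_ios)  # prints: [5, 3, 2, 1] 500
```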

In general, the total time required for external sorting is

$$T = m \cdot t_{IS} + d \cdot t_{IO} + s \cdot u \cdot t_{mg}$$

The three terms are the time for internal sorting (generating the initial merge segments), the time for reading and writing external memory, and the time for internal merging, where

$m$ is the number of initial merge segments produced by internal sorting;

$t_{IS}$ is the average time to produce one initial merge segment by internal sorting;

$d$ is the total number of reads/writes;

$t_{IO}$ is the average time of one external-memory read/write;

$s$ is the number of merge passes;

$u \cdot t_{mg}$ is the time required to internally merge $u$ records.

Thus, for the 10,000-record example above, the total time of external sorting with 2-way merging is: $10\,t_{IS} + 500\,t_{IO} + 4 \times 10000\,t_{mg}$

Here $t_{IO}$ depends on the external storage device used, and it is clearly much larger than $t_{mg}$ (the time to merge one record internally). Therefore, improving the efficiency of external sorting mainly means reducing the number of external-memory reads/writes $d$.
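To make the weight of each term concrete, the cost formula can be evaluated in Python. The three timing constants below are hypothetical values chosen only for illustration (they are not measurements and do not come from the article); the counts 10, 500, 4 and 10,000 are those of the example.

```python
def external_sort_time(m, t_is, d, t_io, s, u, t_mg):
    """Return the three cost terms: internal sorting, external I/O, internal merging."""
    return m * t_is, d * t_io, s * u * t_mg


# Hypothetical timings in seconds (assumptions, for illustration only).
T_IS = 0.05   # sort one initial merge segment in memory
T_IO = 0.01   # one block read or write on external storage
T_MG = 1e-7   # merge one record in memory

sort_t, io_t, merge_t = external_sort_time(
    m=10, t_is=T_IS, d=500, t_io=T_IO, s=4, u=10_000, t_mg=T_MG
)
print(f"internal sorting: {sort_t:.3f} s")
print(f"external I/O:     {io_t:.3f} s")
print(f"internal merging: {merge_t:.3f} s")
```

Under these assumed timings the I/O term is by far the largest of the three, which is why reducing d matters most.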

1.2 Relationship between external-memory reads/writes and the merging process

If the 10 initial merge segments from the example above are merged with 5-way balanced merging (that is, each pass merges up to 5 ordered sub-files into one ordered sub-file), only 2 merge passes are needed. The total number of reads/writes during external sorting drops to 2 × 100 + 100 = 300, which is 200 fewer than with 2-way merging.


It can be seen that, for the same file, the number of external-memory reads/writes required by external sorting is proportional to the number of merge passes. In general, k-way balanced merging of m initial merge segments takes

$s = \lceil \log_k m \rceil$

merge passes.

It can be seen that increasing k or reducing m will reduce s.
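The effect of k on the number of passes and on the read/write count can be sketched as follows. The function names are hypothetical, and the sketch assumes k ≥ 2 and that each pass, plus the internal sorting stage, reads and writes every block once. It counts passes with integer arithmetic, which gives the same result as ⌈log_k m⌉ without floating-point rounding.

```python
def merge_passes(m, k):
    """Number of k-way balanced merge passes for m initial segments."""
    passes = 0
    while m > 1:
        m = -(-m // k)  # ceil(m / k) segments remain after one pass
        passes += 1
    return passes


def total_block_ios(blocks, m, k):
    """Block reads plus writes: one read and one write of every block
    per merge pass, plus the same for the internal sorting stage."""
    return 2 * blocks * (merge_passes(m, k) + 1)


for k in (2, 5, 10):
    print(k, merge_passes(10, k), total_block_ios(50, 10, k))
# prints:
# 2 4 500
# 5 2 300
# 10 1 200
```

For the 50-block example with 10 initial segments, this reproduces the 500 reads/writes of 2-way merging and the 300 of 5-way merging.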

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
