The first column of Programming Pearls presents a clever solution to an external sorting problem. The solution is ingenious, but the merge-sort approach to external sorting mentioned at the start of that column deserves a closer look; after all, undergraduate courses do not cover it in much depth.
First, let's look at the simplest case: the two-way merge sort algorithm used for internal sorting.
The core operation of the algorithm merges two adjacent sorted runs in a one-dimensional array into a single sorted run. Given the boundaries i, m, and n in the array (the runs occupy positions i..m and m+1..n), two index variables scan the runs starting from i and j = m + 1. At each step the two current elements are compared, the smaller one is written to position k of the result sequence, and both that run's index and k are incremented. This continues until one run is exhausted, after which the remaining elements of the other run are copied to the result sequence.
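The merge step described above can be sketched in C as follows. This is a minimal illustration, not the author's code: the function name merge_runs and the parameter names src and dst are assumptions, and the bounds i, m, n are inclusive as in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Merge the sorted runs src[i..m] and src[m+1..n] (inclusive bounds)
   into dst[i..n]. dst must not overlap src. */
void merge_runs(const int *src, int *dst, size_t i, size_t m, size_t n)
{
    assert(i <= m && m < n);   /* both runs must be non-empty */
    size_t j = m + 1;          /* cursor in the right run */
    size_t k = i;              /* write position in the result */

    /* Compare the two current elements and emit the smaller one.
       On ties the left run wins, which keeps the merge stable. */
    while (i <= m && j <= n)
        dst[k++] = (src[j] < src[i]) ? src[j++] : src[i++];

    /* One run is exhausted: copy whatever remains of the other. */
    while (i <= m) dst[k++] = src[i++];
    while (j <= n) dst[k++] = src[j++];
}
```

For example, merging the runs {1, 4, 7} and {2, 3, 9} stored back to back in one array with i = 0, m = 2, n = 5 fills the result with 1 2 3 4 7 9.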
The algorithm can be implemented recursively or iteratively. Starting from adjacent single elements, the core operation above is applied repeatedly to build longer and longer sorted runs until the whole sequence is sorted.
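As an illustration of the iterative form, here is a minimal bottom-up sketch in C. The names merge_halves and merge_sort_bottom_up are assumptions made for this example, and it uses half-open ranges [lo, hi) rather than the inclusive bounds used in the text. It returns the number of passes so the pass count can be observed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Merge sorted a[lo..mid-1] and a[mid..hi-1] into out[lo..hi-1]. */
static void merge_halves(const int *a, int *out, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        out[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid) out[k++] = a[i++];
    while (j < hi)  out[k++] = a[j++];
}

/* Sort a[0..n-1] bottom-up. Returns the number of merge passes made,
   or -1 if the temporary buffer cannot be allocated. */
int merge_sort_bottom_up(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2) return 0;

    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) return -1;

    int passes = 0;
    /* Run length doubles on every pass: 1, 2, 4, ... */
    for (size_t width = 1; width < n; width *= 2) {
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = (lo + width < n) ? lo + width : n;
            size_t hi  = (lo + 2 * width < n) ? lo + 2 * width : n;
            merge_halves(a, tmp, lo, mid, hi);
        }
        memcpy(a, tmp, n * sizeof *a);  /* result becomes the next pass's input */
        passes++;
    }
    free(tmp);
    return passes;
}
```

Sorting the 7 elements 5 2 9 1 7 3 8 this way takes 3 passes (run lengths 1, 2, 4).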
Each merge pass produces a new sequence that is complete but only locally ordered. Sorting n elements takes $\lceil \log_2 n \rceil$ passes. Each pass performs at most n comparisons (each comparison yields one element of the output) and writes the new sequence to the result space. Before the next pass, the result sequence is copied to a temporary space, and the next pass merges from that temporary space. The time complexity is therefore $O(n \log_2 n)$. As for space, besides the n slots of the original sequence and the n slots of the result sequence, an auxiliary temporary space of n slots is needed.
Next, external sorting. External sorting means sorting large files: the records to be sorted reside in external storage, the file cannot be loaded into memory all at once, and data must be exchanged between memory and external storage many times to sort the whole file. The most common external sorting algorithm is multiway merge sort: split the original file into parts that each fit in memory, load each part into memory and sort it, and then merge the sorted sub-files.
Multiway merge sort is covered in standard data structures courses. Moving from 2-way to k-way merging, a larger k reduces the time spent reading and writing external storage. However, selecting the record with the minimum key among k merge segments takes k-1 comparisons, so producing a merged segment of u records takes $(u-1)(k-1)$ comparisons. If sorting a file of n records takes s merge passes, the total number of comparisons during internal merging is $s(n-1)(k-1) = \lceil \log_k m \rceil (k-1)(n-1) = \lceil \log_2 m / \log_2 k \rceil (k-1)(n-1)$, where m is the number of initial merge segments. Since $(k-1)/\log_2 k$ grows with k, the internal merging time grows with k and offsets the time saved on external reads and writes. Simply increasing k therefore does not pay off, which leads to the "loser tree".
Using the loser tree: during internal merging, a loser tree reduces the number of comparisons needed to select the minimum record among k merge segments to $\lceil \log_2 k \rceil$, so the total number of comparisons becomes $\lceil \log_k m \rceil \lceil \log_2 k \rceil (n-1) \approx \lceil \log_2 m \rceil (n-1)$, which no longer depends on k.
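To make the two comparison counts concrete, the following C sketch evaluates both formulas with integer arithmetic. The function names ceil_log, cmp_linear and cmp_loser_tree are assumptions made for this example; n is the number of records, m the number of initial merge segments, and k the merge order.

```c
#include <assert.h>

/* Smallest e with base^e >= x, i.e. ceil(log_base(x)), for base >= 2, x >= 1. */
unsigned ceil_log(unsigned long long base, unsigned long long x)
{
    assert(base >= 2 && x >= 1);
    unsigned e = 0;
    unsigned long long p = 1;
    while (p < x) { p *= base; e++; }
    return e;
}

/* Linear-scan selection: ceil(log_k m) * (k-1) * (n-1) comparisons. */
unsigned long long cmp_linear(unsigned long long n, unsigned long long m,
                              unsigned long long k)
{
    return (unsigned long long)ceil_log(k, m) * (k - 1) * (n - 1);
}

/* Loser-tree selection: ceil(log_k m) * ceil(log2 k) * (n-1) comparisons. */
unsigned long long cmp_loser_tree(unsigned long long n, unsigned long long m,
                                  unsigned long long k)
{
    return (unsigned long long)ceil_log(k, m) * ceil_log(2, k) * (n - 1);
}
```

With n = 1001 records and m = 64 initial segments, cmp_linear gives 6000, 9000, 14000 and 63000 for k = 2, 4, 8 and 64, while cmp_loser_tree gives 6000 in all four cases. These values of k are powers of two that divide evenly into the pass count; for other values the ceilings make the loser-tree count only approximately constant.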
The loser tree is a complete binary tree, so it can be stored in a one-dimensional array. It has k leaf nodes, k-1 comparison nodes, and one champion node, 2k nodes in total. ls[0] is the champion node, ls[1] to ls[k-1] are the comparison nodes, and ls[k] to ls[2k-1] are the leaf nodes (which are also addressed through a separate array as b[0] to b[k-1]). In addition, b[k] is an extra auxiliary slot that is not part of the loser tree; it holds the MINKEY value during initialization.
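The index arithmetic implied by this layout can be sketched as follows: leaf b[s] sits at tree position s + k, so its parent is ls[(s + k) / 2], and each further parent is found by halving the index. The function name path_to_root is an assumption made for this illustration.

```c
#include <assert.h>

/* Record the comparison-node indices t (as in ls[t]) visited on the way
   from leaf b[s] up to the root ls[1]. Returns how many were written.
   For int-sized k, a path buffer of 32 entries is always enough. */
int path_to_root(int s, int k, int *path)
{
    assert(k >= 1 && s >= 0 && s < k);
    int len = 0;
    for (int t = (s + k) / 2; t > 0; t /= 2)
        path[len++] = t;
    return len;
}
```

For k = 5, leaf b[3] passes through ls[4], ls[2] and ls[1], while leaf b[0] passes through only ls[2] and ls[1]. The path length is at most about log2 of 2k, which is where the logarithmic cost per selection comes from.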
The multiway merge proceeds roughly as follows. First, the key of the first element of each of the k merge segments is stored in the leaf nodes b[0] to b[k-1], and createlosertree is called to build the loser tree (the pseudocode does this inline in k_merge). Once the tree is built, the index of the smallest key (that is, the number of its merge segment) is held in ls[0]. Then repeat: take the merge segment number q stored in ls[0], output the current first element of that segment to the merged output, put the key of the segment's next element into leaf b[q], where the previous element was, and call adjust to fix up the loser tree along the path from leaf b[q] to the root. This selects the new smallest key, whose index again ends up in ls[0]. The loop continues until every element has been written to the sorted output segment.
The pseudocode is as follows:
#include <stdio.h>

/* K (the number of merge segments), MINKEY, MAXKEY and get_next() are assumed to be defined elsewhere. */
int b[K + 1];   /* b[0] ~ b[K-1] hold the current record of each merge segment; b[K] holds MINKEY */

void adjust(int ls[], int s)
/* Adjust the loser tree along the path from leaf node b[s] up to ls[0], the parent of the root */
{
    int t, temp;
    t = (s + K) / 2;            /* t is the index of the parent of b[s] in the loser tree */
    while (t > 0)               /* continue until the root has been processed */
    {
        if (b[s] > b[ls[t]])    /* compare with the record that the parent node points to */
        {   /* ls[t] records the segment number of the loser; s becomes the new winner and moves up to the next comparison */
            temp = s;
            s = ls[t];
            ls[t] = temp;
        }
        t = t / 2;              /* move up to the parent node, toward the root */
    }
    ls[0] = s;                  /* ls[0] records the segment number of the minimum key of this round */
}

void k_merge(int ls[K])
/* ls[0] ~ ls[K-1] are the internal comparison nodes of the loser tree. b[0] ~ b[K-1] store the current records of the K initial merge segments */
/* get_next(i) reads from merge segment i and returns its current record (MAXKEY once the segment is exhausted) */
{
    int i, q;
    for (i = 0; i < K; i++)
        b[i] = get_next(i);     /* read the first key of each of the K merge segments */
    b[K] = MINKEY;              /* sentinel used while creating the loser tree */
    for (i = 0; i < K; i++)     /* set the initial values of the losers in ls */
        ls[i] = K;
    for (i = K - 1; i >= 0; i--)    /* adjust the losers starting from b[K-1] down to b[0] */
        adjust(ls, i);
    /* the loser tree is now created, and the segment number of the minimum key is saved in ls[0] */
    while (b[ls[0]] != MAXKEY)
    {
        q = ls[0];              /* q is the merge segment holding the current minimum key */
        printf("%d ", b[q]);
        b[q] = get_next(q);
        adjust(ls, q);          /* readjust the loser tree from leaf q to select the new minimum key */
    }
}
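The pseudocode leaves the number of segments, MINKEY, MAXKEY and get_next undefined. A self-contained variant that can be compiled and tried out might look like the sketch below. It is written under stated assumptions, not as the author's implementation: the Run struct is a hypothetical in-memory stand-in for a sorted segment on disk, the merge order k is passed as a parameter, output goes to an array instead of printf, and INT_MIN and INT_MAX are reserved as the sentinels, so they must not occur as real keys.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define MINKEY INT_MIN   /* sentinel smaller than every real key (assumption) */
#define MAXKEY INT_MAX   /* sentinel marking an exhausted run (assumption) */

/* Hypothetical in-memory stand-in for one sorted merge segment. */
typedef struct {
    const int *data;
    size_t len;
    size_t pos;
} Run;

/* Return the next key of the run, or MAXKEY when it is exhausted. */
static int get_next(Run *r)
{
    return (r->pos < r->len) ? r->data[r->pos++] : MAXKEY;
}

/* Replay the matches on the path from leaf b[s] to the root. */
static void adjust(int *ls, const int *b, int k, int s)
{
    for (int t = (s + k) / 2; t > 0; t /= 2) {
        if (b[s] > b[ls[t]]) {      /* s loses: stay here, old loser moves up */
            int tmp = s;
            s = ls[t];
            ls[t] = tmp;
        }
    }
    ls[0] = s;                      /* overall winner */
}

/* Merge k sorted runs into out, which must have room for all keys.
   Returns the number of keys written, or (size_t)-1 if allocation fails. */
size_t k_merge(Run *runs, int k, int *out)
{
    assert(k >= 1);
    int *ls = malloc((size_t)k * sizeof *ls);
    int *b  = malloc(((size_t)k + 1) * sizeof *b);
    size_t n = 0;
    if (ls == NULL || b == NULL) { free(ls); free(b); return (size_t)-1; }

    for (int i = 0; i < k; i++) b[i] = get_next(&runs[i]);
    b[k] = MINKEY;                          /* beats every real key during setup */
    for (int i = 0; i < k; i++) ls[i] = k;  /* every node starts at the sentinel */
    for (int i = k - 1; i >= 0; i--) adjust(ls, b, k, i);

    while (b[ls[0]] != MAXKEY) {
        int q = ls[0];                      /* run holding the current minimum */
        out[n++] = b[q];
        b[q] = get_next(&runs[q]);
        adjust(ls, b, k, q);
    }
    free(ls);
    free(b);
    return n;
}
```

Merging the three runs {1, 4}, {2, 5} and {3} with k = 3 writes 1 2 3 4 5 to out and returns 5. Passing k at run time instead of a compile-time constant is a design choice of this sketch; it makes it easy to try different merge orders.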
Finally, a rough outline of external sorting with multiway merge sort. Based on the limited memory available, split the large file into L segments, read the segments into memory one at a time, and sort each with an efficient internal sorting algorithm. Each sorted result is an initial sorted merge segment and is written directly to an external storage file. Choose a suitable internal sorting algorithm, and weigh the auxiliary space it needs against the limited memory when deciding how many segments to split the large file into. Next, choose a suitable merge order k and merge the segments k ways at a time. Each merge pass turns every k merge segments into one larger merge segment and writes it to a file; after several passes the entire file is sorted. During the multiway merge, memory only needs to maintain a 2k loser