[Z] External sorting for sorting large files in Disk

Source: Internet
Author: User

Most theoretical books explain external sorting in terms of tape. That theory serves as the foundation, although sorting on disk may differ in practice. For more information, see books such as TAOCP. :-)

http://exceptional-code.blogspot.com/2011/07/external-sorting-for-sorting-large.html

External sorting for sorting large files in Disk

Sorting is a fundamental programming task. Given the abundance of built-in libraries that perform tasks like sorting and binary search, we often become forgetful of exactly how these tasks are accomplished.

When the data is so large that it cannot be processed in memory at one time, we need to resort to the file system to store part or all of the data during the sorting process. We then need to perform another layer of disk operations on top of regular sorting algorithms to manage the data as it gets sorted.

External sorting is precisely the technique we described in the previous paragraph.

Let us describe in some detail how external sorting can be done in Java:

First the algorithm:

Say we have one file on disk containing N numbers (there can be more than one file, but having just one simplifies the explanation). Suppose the memory in our computer can hold M numbers at a time.
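As a quick piece of arithmetic on these two quantities: reading M numbers at a time from a file of N numbers yields ceil(N / M) sorted chunks, and so that many temp files. A minimal sketch, where the class name `RunCount` and the method `runCount` are made up for illustration:

```java
class RunCount {

    // Number of sorted temp files produced when n numbers are read m at a time.
    // Hypothetical helper for illustration; it is not part of any library.
    static long runCount(long n, long m) {
        if (m <= 0) {
            throw new IllegalArgumentException("m must be positive");
        }
        return (n + m - 1) / m; // ceiling of n / m using integer arithmetic
    }
}
```

For example, `RunCount.runCount(10, 4)` is 3: two full chunks of 4 numbers and a final chunk of 2. The same count is the number of temp files that the merge phase later has to keep open at once.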

1. Start reading the input file from the beginning.
2. Read M numbers (or fewer, if fewer than M entries remain in the file) from the file and store them in a temp buffer.
3. Sort the numbers in the buffer from step 2 (using any good sorting method; quicksort, for example).
4. Create a temp file on disk and write out the sorted numbers from step 3 to this temp file. Save the name of the temp file.
5. Repeat steps 2 to 4 until all numbers from the input file have been read, sorted, and written out to temp files.
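Steps 1 to 5 could be sketched in Java roughly as follows. This is only an illustration under some assumptions the text does not state: the input holds one integer per line, the values fit in a `long`, and `m` is positive. The names `SortedRunWriter`, `writeSortedRuns`, and `flush` are made up for this example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedRunWriter {

    // Steps 1-5: split the input into sorted temp files of at most m numbers each.
    // Assumption: one integer per line; blank lines are skipped.
    static List<Path> writeSortedRuns(Path input, int m) throws IOException {
        List<Path> runs = new ArrayList<>();      // saved temp file names (step 4)
        long[] buffer = new long[m];              // the in-memory temp buffer
        try (BufferedReader in = Files.newBufferedReader(input)) { // step 1
            String line;
            int count = 0;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                buffer[count++] = Long.parseLong(line); // step 2
                if (count == m) {                       // buffer full: sort and spill
                    runs.add(flush(buffer, count));
                    count = 0;
                }
            }
            if (count > 0) {                            // final chunk with fewer than m numbers
                runs.add(flush(buffer, count));
            }
        }
        return runs;
    }

    // Sorts the first count entries of buffer and writes them to a new temp file.
    static Path flush(long[] buffer, int count) throws IOException {
        Arrays.sort(buffer, 0, count);                    // step 3
        Path run = Files.createTempFile("run-", ".txt");  // step 4
        try (BufferedWriter out = Files.newBufferedWriter(run)) {
            for (int i = 0; i < count; i++) {
                out.write(Long.toString(buffer[i]));
                out.newLine();
            }
        }
        return run;
    }
}
```

Calling `SortedRunWriter.writeSortedRuns(input, m)` returns the list of temp files in the order they were written. The sketch leaves cleanup to the caller, so the temp files stay on disk until they are deleted.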

At this point, we have chunks of at most M numbers each, sorted and stored in temp files on disk. We need to merge all these sorted files into one single sorted file. We will apply the merging algorithm from merge sort to join the numbers from these sorted files together.

6. Open all the temp files (and set the read pointer to the beginning of each file).
7. Find the minimum number from the set of numbers currently pointed to by the file read pointers.
8. Write the number to disk. (To increase efficiency you could write the number to a buffer first and then flush the buffer out to disk when the buffer is full. But modern I/O libraries should be doing this for you anyway.)
9. Read another number from the file that contained the minimum number at Step 7.
10. Repeat Step 7 to 9 until all numbers from all the temp files have been processed, merged, and written out to disk.
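The merge in steps 6 to 10 might look like the sketch below. One deliberate difference from the description: instead of scanning every file's current number to find the minimum in step 7, it keeps those numbers in a `java.util.PriorityQueue` (a min-heap), which gives the same result with less work per number when there are many temp files. It assumes each temp file holds one integer per line, already sorted, with no blank lines. The names `RunMerger`, `Head`, `mergeRuns`, and `advance` are made up for this example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class RunMerger {

    // The number currently under one file's read pointer, plus that file's reader.
    static final class Head {
        final long value;
        final BufferedReader reader;

        Head(long value, BufferedReader reader) {
            this.value = value;
            this.reader = reader;
        }
    }

    // Steps 6-10: merge sorted temp files into a single sorted output file.
    static void mergeRuns(List<Path> runs, Path output) throws IOException {
        PriorityQueue<Head> heads =
                new PriorityQueue<>((a, b) -> Long.compare(a.value, b.value));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            for (Path run : runs) {                  // step 6: open every temp file
                BufferedReader reader = Files.newBufferedReader(run);
                readers.add(reader);
                advance(reader, heads);              // load its first number
            }
            while (!heads.isEmpty()) {               // step 10: repeat until all are drained
                Head min = heads.poll();             // step 7: smallest current number
                out.write(Long.toString(min.value)); // step 8: BufferedWriter does the buffering
                out.newLine();
                advance(min.reader, heads);          // step 9: read next from the same file
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }

    // Reads the next number from reader, if any, and adds it to the heap.
    static void advance(BufferedReader reader, PriorityQueue<Head> heads) throws IOException {
        String line = reader.readLine();
        if (line != null) {
            heads.add(new Head(Long.parseLong(line.trim()), reader));
        }
    }
}
```

A file drops out of the merge once its reader returns no more lines, so the loop ends exactly when every temp file has been fully consumed. Note that all temp files are open at the same time, which can run into the operating system's limit on open files when N is very large relative to M.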

The new file on disk now contains a sorted list of the numbers supplied in the initial input file.

Happy external sorting!
