A method of sorting large files

Source: Internet
Author: User
Tags: assert

Requirement: A file contains a number of words, one per line. Sort the words in the file in dictionary order.

Analysis: Because the file may be larger than available memory, reading the whole file into memory and sorting it there is not realistic. One standard answer is a merge sort: split the large file into smaller files that each fit in memory, sort each small file, and then merge the sorted files. This article takes a different approach: trade memory space for disk space and bubble sort the words directly inside the file.
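For comparison, the merge step of the split-and-merge approach can be sketched as follows. This is a minimal illustration, not code from the original article: the function name merge_sorted_files and the fixed line limit LINE_MAX_LEN are assumptions, lines are compared with strcmp (plain byte order), and every word is assumed to end with a newline.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 256  /* assumed upper bound on a line, including '\n' */

/* Merge two sorted, newline-terminated word files into `out`.
   Hypothetical helper: name and line limit are illustrative assumptions. */
void merge_sorted_files(FILE *a, FILE *b, FILE *out)
{
    char lineA[LINE_MAX_LEN], lineB[LINE_MAX_LEN];
    char *hasA, *hasB;

    assert(a != NULL && b != NULL && out != NULL);
    hasA = fgets(lineA, sizeof lineA, a);
    hasB = fgets(lineB, sizeof lineB, b);
    while (hasA != NULL && hasB != NULL) {
        /* Emit the smaller head line, then refill from the file it came from. */
        if (strcmp(lineA, lineB) <= 0) {
            fputs(lineA, out);
            hasA = fgets(lineA, sizeof lineA, a);
        } else {
            fputs(lineB, out);
            hasB = fgets(lineB, sizeof lineB, b);
        }
    }
    /* Copy whatever remains in the file that is not yet exhausted. */
    while (hasA != NULL) {
        fputs(lineA, out);
        hasA = fgets(lineA, sizeof lineA, a);
    }
    while (hasB != NULL) {
        fputs(lineB, out);
        hasB = fgets(lineB, sizeof lineB, b);
    }
}
```

Only one line from each input is held in memory at a time, which is why the merge phase works on files of any size.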

Algorithm: Read the 1st and 2nd words in the file. If the 1st is larger than the 2nd, swap their positions by writing the two words straight back to the file in the opposite order. Then read the 3rd word and compare it with the word now in 2nd position, and so on. After the 1st pass the largest word has moved to the end of the file, and after n-1 passes (n being the number of words) the entire file is sorted. The implementation code is as follows:

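 #include <assert.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>

 /* StringCompare is the article's comparison routine (not shown here);
    it is assumed to behave like strcmp. */
 int StringCompare(const char *a, const char *b);

 void SortFile(FILE *fp, unsigned int wordNum, unsigned int maxWordLen)
 {
     long curWordPos, nextWordPos;
     /* Room for the word, a line ending of up to two bytes ("\r\n"), and the NUL. */
     size_t bufSize = (size_t)maxWordLen + 3;
     char *curWord = (char *)malloc(bufSize);
     char *nextWord = (char *)malloc(bufSize);
     int err;

     assert(curWord != NULL && nextWord != NULL);
     for (unsigned int i = 0; i + 1 < wordNum; i++) {
         /* Each pass starts over from the first word in the file. */
         curWordPos = 0;
         err = fseek(fp, curWordPos, SEEK_SET);
         assert(!err);
         fgets(curWord, (int)bufSize, fp);
         /* Pass i makes wordNum-1-i comparisons; the tail is already sorted. */
         for (unsigned int j = i; j + 1 < wordNum; j++) {
             nextWordPos = curWordPos + (long)strlen(curWord);
             err = fseek(fp, nextWordPos, SEEK_SET);
             assert(!err);
             fgets(nextWord, (int)bufSize, fp);
             if (StringCompare(curWord, nextWord) > 0) {
                 /* Out of order: write the pair back swapped. */
                 err = fseek(fp, curWordPos, SEEK_SET);
                 assert(!err);
                 curWordPos += (long)strlen(nextWord);
                 fputs(nextWord, fp);
                 fputs(curWord, fp);
             } else {
                 /* In order: the next word becomes the current word. */
                 curWordPos += (long)strlen(curWord);
                 strcpy(curWord, nextWord);
             }
         }
     }
     free(curWord);
     free(nextWord);
 }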

Before calling this function, you need the number of words in the file (wordNum) and the length of the longest word in the file (maxWordLen); both are easy to obtain with a single pass over the file. The fseek function is what makes it possible to read and write at a specific position in the file. The file must also be opened for both reading and writing (pass "r+" as the mode argument to fopen). Finally, before opening the file, set the default file mode to binary (call _set_fmode(_O_BINARY)) so that the C runtime library does not translate "\r\n" line endings; otherwise the offsets computed with strlen would not match the actual positions in the file. Opening the file with mode "r+b" has the same effect for that one file.
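The two inputs can be gathered in one pass, for example as sketched below. The helper name count_words is hypothetical and not part of the original article; the sketch assumes every word, including the last one, ends with "\n" (optionally preceded by "\r"), and it reports the longest word length without the line ending.

```c
#include <assert.h>
#include <stdio.h>

/* Scan the file once and report the number of words (lines) and the length
   of the longest word, not counting the line ending.
   Hypothetical helper; a final line without '\n' is not counted. */
void count_words(FILE *fp, unsigned int *wordNum, unsigned int *maxWordLen)
{
    unsigned int count = 0, longest = 0, len = 0;
    int c;

    assert(fp != NULL && wordNum != NULL && maxWordLen != NULL);
    rewind(fp);
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            /* End of a word: update the totals and start the next one. */
            count++;
            if (len > longest)
                longest = len;
            len = 0;
        } else if (c != '\r') {
            len++;
        }
    }
    *wordNum = count;
    *maxWordLen = longest;
    rewind(fp);  /* leave the stream positioned at the start for the caller */
}
```

A caller would open the file with fopen in "r+b" mode, call count_words, and pass the two results to the sorting function above.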

Since this method trades memory space for disk space, will efficiency suffer badly? The answer is no. Once you understand how the file system works, it becomes clear that fputs does not write straight to the disk: it writes into buffers held in system memory, so most of the reading and writing still happens in memory. It is therefore more accurate to say that this method trades the user's memory space for system memory space. In addition, the method keeps only two words in user memory, so its memory footprint is smaller than that of the merge algorithm. Keep in mind, though, that bubble sort still needs on the order of n² comparisons, so for a large number of words the merge algorithm will generally finish sooner.
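Because the sort leans on these buffers, one optional tweak is to ask the C library for a larger stream buffer with the standard setvbuf function. This is a sketch only: the helper name use_large_buffer and the 64 KiB size are assumptions chosen for illustration, and an implementation may treat the requested size as a hint.

```c
#include <assert.h>
#include <stdio.h>

/* Request a larger, fully buffered stdio buffer for the stream.
   Call it right after fopen, before any other I/O on the stream.
   Hypothetical helper; the 64 KiB size is an arbitrary illustration. */
int use_large_buffer(FILE *fp)
{
    assert(fp != NULL);
    /* NULL lets the library allocate the buffer itself; returns 0 on success. */
    return setvbuf(fp, NULL, _IOFBF, 64 * 1024);
}
```

Whether this helps in practice depends on the platform and the file size, so it is worth measuring before relying on it.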
