Reading random lines from a file

Source: Internet
Author: User

1. How do you read a file with an unknown number of lines and output one line chosen uniformly at random?

When we read line i (i > 0), we select it with probability 1/i, replacing the previously selected line.
That is, line 1 is always selected, line 2 replaces it with probability 1/2, line 3 replaces the current selection with probability 1/3, and so on.
When the end of the file is reached, every line has had an equal probability of being selected.

 

 
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAX_LINE_LEN 4096

int main(void)
{
    srand((unsigned)time(NULL));

    const char *filename = "input.txt";
    FILE *file = fopen(filename, "r");
    if (file == NULL) {
        perror(filename);
        return 1;
    }

    char line_buffer[MAX_LINE_LEN];
    char selection[MAX_LINE_LEN] = "";  /* stays empty if the file has no lines */
    int i = 1;                          /* number of the line being read */

    while (fgets(line_buffer, MAX_LINE_LEN, file)) {
        /* select line i with probability 1/i, replacing the previous choice */
        if (rand() % i == 0)
            strcpy(selection, line_buffer);
        ++i;
    }

    puts(selection);
    fclose(file);
    return 0;
}

The proof is by induction on the number of lines read.

At line 1 the claim is trivial: the only line is selected with probability 1.

At line 2, line 2 is selected with probability 1/2, so line 1 remains selected with probability 1/2 as well.

At line 3, line 3 is selected with probability 1/3, and lines 1 and 2 each remain selected with probability (1/2) * (2/3) = 1/3.

In general, after reading line i, each of lines 1 to i is selected with probability 1/i, and this holds all the way to the last line of the file.
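The induction can also be checked numerically. The following is a minimal sketch, not part of the original article: the function name survival_probability is hypothetical, and it simply multiplies out the probability that line j is picked when read and then survives every later replacement.

```c
#include <assert.h>

/* Probability that line j (1-based) is the selected line after n lines
 * have been read: it is picked with probability 1/j when read, and each
 * later line m replaces it with probability 1/m. */
double survival_probability(int j, int n)
{
    assert(j >= 1 && j <= n);
    double p = 1.0 / j;
    for (int m = j + 1; m <= n; ++m)
        p *= 1.0 - 1.0 / m;   /* line m does not replace the selection */
    return p;
}
```

For any j between 1 and n the product telescopes to 1/n, up to floating-point rounding, which matches the argument above.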

 

2. How do you read a file with an unknown number of lines and output k lines chosen at random (assuming k is smaller than the total number of lines in the file)?

Read lines 1 to k and store them. Then, for each later line i (i > k), with probability k/i use it to replace one of the k stored lines, chosen at random.

 

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAX_LINE_LEN 4096

int main(void)
{
    int k = 5;
    srand((unsigned)time(NULL));

    char line_buffer[MAX_LINE_LEN];
    char **selections = (char **)malloc(k * sizeof(char *));
    for (int i = 0; i ...

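The listing above breaks off partway through, so the author's full program cannot be recovered. As a stand-in, here is a minimal sketch of the k-line procedure described in this section. The names reservoir_offer and sample_k_lines are hypothetical, the reservoir is a fixed-size array rather than the malloc'd pointers of the original, and the caller is assumed to have seeded the generator with srand.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE_LEN 4096

/* Offer line number i (1-based) to a reservoir of k slots.
 * The first k lines fill the slots in order. A later line i is kept
 * with probability k/i and overwrites a uniformly chosen slot.
 * Caveat: rand() % i is slightly biased and is only meaningful
 * while i <= RAND_MAX. */
void reservoir_offer(char reservoir[][MAX_LINE_LEN], int k, long i,
                     const char *line)
{
    assert(k > 0 && i >= 1);
    assert(strlen(line) < MAX_LINE_LEN);
    if (i <= k) {
        strcpy(reservoir[i - 1], line);
        return;
    }
    long r = rand() % i;      /* r is in [0, i-1] */
    if (r < k)                /* happens with probability k/i */
        strcpy(reservoir[r], line);
}

/* Read every line of `in` and return how many lines were seen.
 * If fewer than k lines exist, only that many slots are filled.
 * Lines longer than MAX_LINE_LEN - 1 characters are split by fgets
 * and counted as several lines. */
long sample_k_lines(FILE *in, char reservoir[][MAX_LINE_LEN], int k)
{
    assert(in != NULL);
    char line[MAX_LINE_LEN];
    long i = 0;
    while (fgets(line, sizeof line, in))
        reservoir_offer(reservoir, k, ++i, line);
    return i;
}
```

Drawing one number r in [0, i-1] and testing r < k combines the two random choices in the description: the test succeeds with probability k/i, and when it does, r is uniform over the k slots. A caller would declare something like char picks[5][MAX_LINE_LEN], call srand once, and pass an open FILE pointer to sample_k_lines.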
Suppose that after reading i lines (i >= k), each of lines 1 to i is in the stored set with probability k/i. When we read line i + 1, we keep it with probability k/(i + 1) and use it to replace one of the stored lines at random (each stored line is replaced with probability 1/k). Line i + 1 is therefore selected with probability k/(i + 1), and every earlier line remains selected with probability

(k/i) * (1 - k/(i + 1)) + (k/i) * (k/(i + 1)) * (1 - 1/k)

The first term covers the case where line i + 1 is not kept; the second covers the case where line i + 1 is kept but replaces a different stored line. The sum simplifies to (k/i) * (1 - 1/(i + 1)) = k/(i + 1). So after line i + 1, every line read so far is still selected with the same probability, and by induction this holds through the end of the file.
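As a quick sanity check of the induction step, the two-term expression can be evaluated directly and compared with k/(i + 1). This is only an illustrative sketch; retained_probability is a hypothetical name, not something from the original article.

```c
#include <assert.h>

/* Probability that a line held with probability k/i after i lines
 * is still held after line i + 1 has been processed. */
double retained_probability(int k, int i)
{
    assert(k >= 1 && i >= k);
    double held = (double)k / i;                 /* in the reservoir so far */
    double newcomer_kept = (double)k / (i + 1);  /* line i + 1 is kept */
    /* case 1: line i + 1 is not kept */
    double not_kept = held * (1.0 - newcomer_kept);
    /* case 2: line i + 1 is kept but overwrites another slot */
    double other_slot = held * newcomer_kept * (1.0 - 1.0 / k);
    return not_kept + other_slot;
}
```

For example, with k = 5 and i = 9 the two cases contribute 5/18 and 2/9, which add up to 1/2 = 5/10.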

 

 
