1 Topic Requirements:
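The text file contains review records like the one shown below. Only 5 of the attributes in each record are needed: product/productId, product/price, review/helpfulness, review/score and review/time.
The file holds up to 750,000 records of the following form: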
product/productId: B0000UIXZ4
product/title: Timex Link USB Watch
product/price: unknown
review/userId: A14MVG2I9PS6NZ
review/profileName: B. Kuiper "Wah"
review/helpfulness: 0/0
review/score: 5.0
review/time: 1275091200
review/summary: Best Geek Weapon ever...but no longer made?
review/text: This watch serves as my brain and now, my brain is no lo
2 Coarse filtering with Python
The code is as follows. It does not process the output; it is just a simple filter that keeps the needed lines.
A point to note about fo.write() when writing to a file: Python 3.x differs from 2.x in the type it accepts. A file opened in binary mode ("wb") takes bytes, not str.
Write error: TypeError: a bytes-like object is required, not 'str'
--------------------------------------------------------------
btest.decode('utf-8')    # bytes to str; result is a str such as 'abcde'
strtest.encode('utf-8')  # str to bytes; result is bytes such as b'abc'
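The difference between str and bytes can be reproduced in a few lines. The following is a minimal sketch for Python 3; the scratch file path and the variable names are made up for the demonstration and are not part of the original program.

```python
import os
import tempfile

text = "review/score: 5.0\n"        # str: Unicode text
data = text.encode("utf-8")         # bytes: what a "wb" file expects
round_trip = data.decode("utf-8")   # decoding reverses the encoding

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

error_message = None
with open(path, "wb") as fo:
    try:
        fo.write(text)              # passing str to a binary-mode file fails
    except TypeError as exc:
        error_message = str(exc)
    fo.write(data)                  # passing bytes works

with open(path, "rb") as fi:
    stored = fi.read()              # the bytes that were actually written
```

An alternative is to open the output file in text mode ("w", with an explicit encoding) and write str directly, which avoids the encode() call altogether.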
# Prefixes of the 5 attribute lines to keep
need = ['product/productId: ', 'product/price: ', 'review/helpfulness: ', 'review/score: ', 'review/time: ']
fo = open("C:\\users\\five\\desktop\\new Folder\\python2.txt", "wb")
for line in open("c:\\users\\five\\desktop\\new Folder\\Watches.txt"):
    flag = 0
    for i in range(0, 5):
        # find() returns 0 when the line starts with the prefix
        if line.find(need[i]) == 0:
            flag = 1
            break
    if flag == 1:
        # the file is opened in binary mode, so encode str to bytes first
        fo.write(line.encode('utf-8'))
fo.close()
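The same filter can be written more compactly with str.startswith(), which accepts a tuple of prefixes. This is only a sketch: the function name keep_needed is hypothetical, the sample lines are illustrative, and the exact spelling of the attribute keys (for example product/productId) is an assumption about the data file.

```python
# Assumed spelling of the five attribute keys in the data file.
NEEDED_PREFIXES = (
    "product/productId: ",
    "product/price: ",
    "review/helpfulness: ",
    "review/score: ",
    "review/time: ",
)

def keep_needed(lines, prefixes=NEEDED_PREFIXES):
    """Yield only the lines that start with one of the given prefixes."""
    for line in lines:
        if line.startswith(prefixes):
            yield line

# Illustrative input; in practice `lines` would be an open file object.
sample = [
    "product/productId: B0000UIXZ4\n",
    "product/title: Timex Link USB Watch\n",
    "product/price: unknown\n",
    "review/score: 5.0\n",
]
kept = list(keep_needed(sample))
```

Because keep_needed is a generator, it can be fed a file object directly and never holds more than one line in memory, which matters for a file with hundreds of thousands of records.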
A file can be read in the following ways:
f = open("foo.txt")          # returns a file object
line = f.readline()          # call the file's readline() method
while line:
    ...                      # process the line
    line = f.readline()
----------------------------------------------------
for line in open("foo.txt"):
    ...                      # process the line
----------------------------------------------------
f = open("C:\\1.txt", "r")
lines = f.readlines()        # read the entire contents into a list
for line in lines:
    print(line)
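For a large file, iterating over the file object is usually the safer choice, since readlines() loads every line into memory at once. A minimal sketch using a with block, which also closes the file automatically; the helper name count_lines and the scratch file are hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file standing in for a real data file.
path = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("line 1\nline 2\nline 3\n")

def count_lines(filename):
    """Count lines by streaming the file one line at a time."""
    count = 0
    with open(filename, encoding="utf-8") as f:  # closed automatically on exit
        for _ in f:
            count += 1
    return count

total = count_lines(path)
```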
3 Fine-grained reading with C
The data is read and processed as follows.
Background: ways of reading a file
fp = fopen("Python.txt", "r");
fscanf(fp, "%s", s);          // reads one token; tokens are separated by whitespace
printf("%s\n", s);
--------------------------------------
The following reads line by line:
fgets(s, 1028*8, fp);         // the length read = actual line length + 1 (the line break is included)
printf("%s", s);
--------------------------------------
fscanf(fp, "%[^\n]", s);      // reads up to, but not including, the line break
The file open modes in detail:
The mode string is built from six characters: r, w, a, t, b and +. Their meanings are:
r (read): read
w (write): write
a (append): append
t (text): text file; may be omitted
b (binary): binary file
+: read and write

"rt"   open a text file read-only; only reading is allowed
"wt"   open or create a text file write-only; only writing is allowed
"at"   open a text file for appending; data is written at the end of the file
"rb"   open a binary file read-only; only reading is allowed
"wb"   open or create a binary file write-only; only writing is allowed
"ab"   open a binary file for appending; data is written at the end of the file
"rt+"  open a text file for reading and writing
"wt+"  open or create a text file for reading and writing
"at+"  open a text file for reading, or for appending data at the end of the file
"rb+"  open a binary file for reading and writing
"wb+"  open or create a binary file for reading and writing
"ab+"  open a binary file for reading, or for appending data at the end of the file
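Python's built-in open() accepts comparable mode strings, so the effect of write, append and read/write modes can be tried quickly. The sketch below uses Python rather than C, and the scratch file path is hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "wt") as f:     # create (or truncate) a text file for writing
    f.write("first\n")

with open(path, "at") as f:     # append: new data goes to the end of the file
    f.write("second\n")

with open(path, "rt") as f:     # read-only text mode
    after_append = f.read()

with open(path, "r+") as f:     # read and write an existing file, no truncation
    f.write("FIRST")            # overwrites the first five characters in place

with open(path, "rt") as f:
    after_update = f.read()

with open(path, "rb") as f:     # binary mode returns bytes instead of str
    raw = f.read()
```

Reopening the file with "wt" at this point would discard both lines, which is the main practical difference between the w and a families of modes.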
Processing results (an unknown value, as in product/price: unknown, is replaced with 0):
B000NLZ4A2 0 0/0 4.0 1260230400
B000NLZ4A2 0 0/0 4.0 1216339200
B000NLZ4A2 0 5.0 1245024000
B000AIO6RA 0 3/3 5.0 1122422400
B000AIO6RA 0 0/0 4.0 1207958400
B000NLZ4AM 0 2/2 4.0 1250208000
B000NLZ4AM 0 2/2 5.0 1244764800
B000NLZ4AM 0 2/2 5.0 1243296000
B000NLZ4AM 0 1/1 4.0 1235952000
B000NLZ4AM 0 0/0 5.0 1236816000
B000F70V0M 0 1/1 5.0 1189468800
B000F70V0M 0 0/0 4.0 1244678400
B000F70V0M 0 0/0 5.0 1204502400
B000F70V0M 0 0/0 5.0 1201478400
....
These are just a subset of the data.
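The transformation that produces rows like these can be sketched in a few lines of Python. This is an illustration, not the program used for the results above: the function name records_to_rows is hypothetical, the key spellings are assumed, and records are assumed to be separated by a blank line.

```python
# Assumed key spellings for the five required attributes, in output order.
FIELDS = (
    "product/productId",
    "product/price",
    "review/helpfulness",
    "review/score",
    "review/time",
)

def records_to_rows(lines):
    """Turn 'key: value' lines into one space-separated row per record."""
    current = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:                      # a blank line ends the current record
            if current:
                yield " ".join(current.get(k, "") for k in FIELDS)
                current = {}
            continue
        key, sep, value = line.partition(": ")
        if sep and key in FIELDS:
            # replace the placeholder "unknown" with 0
            current[key] = "0" if value == "unknown" else value
    if current:                           # last record may lack a trailing blank line
        yield " ".join(current.get(k, "") for k in FIELDS)

sample = [
    "product/productId: B0000UIXZ4\n",
    "product/title: Timex Link USB Watch\n",
    "product/price: unknown\n",
    "review/helpfulness: 0/0\n",
    "review/score: 5.0\n",
    "review/time: 1275091200\n",
    "\n",
]
rows = list(records_to_rows(sample))
```

Keying on the attribute name rather than on the line's position within the record means a record with a missing or extra line does not shift the remaining fields.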
The detailed code is as follows:
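#include <stdio.h>
#include <string.h>

#define VALUE_LEN 20   // size of the buffers that hold one value

// Copy the text after the last space of line s (without the trailing
// line break) into temp.
void getValue(char s[], char temp[]) {
    int end = strlen(s);
    int i = 0, j = -1;
    char c;
    // Walk backwards from the last character before the line break,
    // stopping at the space, the start of the line, or a full buffer.
    for (i = end - 2; i >= 0 && s[i] != ' ' && j < VALUE_LEN - 2; i--) {
        temp[++j] = s[i];
    }
    temp[j + 1] = '\0';
    // The characters were collected in reverse order; swap them back.
    for (i = 0; i <= j;) {
        c = temp[i];
        temp[i] = temp[j];
        temp[j] = c;
        i++; j--;
    }
}

int main() {
    FILE *fr, *fw;
    int count;
    long int sum = 0;
    char s[100000];           // holds one line of data
    char temp[VALUE_LEN];     // holds the value after the last space
    char s1[VALUE_LEN], s2[VALUE_LEN], s3[VALUE_LEN], s4[VALUE_LEN], s5[VALUE_LEN]; // values of the 5 required attributes
    char unknow[] = "unknown";
    char zero[] = "0";
    fr = fopen("Watches.txt", "r");
    fw = fopen("P.txt", "wt");
    if (fr == NULL || fw == NULL) {
        printf("Cannot open file\n");
        return 1;
    }
    count = 1;
    while (fgets(s, 1028 * 80, fr) != NULL) {
        // printf("%s", s);
        // Each record has 10 attribute lines followed by a blank line (line 11).
        if (count != 11) getValue(s, temp);
        if (count == 1) strcpy(s1, temp);          // product/productId
        else if (count == 3) {                     // product/price
            strcpy(s2, temp);
            if (strcmp(s2, unknow) == 0) strcpy(s2, zero);
        }
        else if (count == 6) strcpy(s3, temp);     // review/helpfulness
        else if (count == 7) strcpy(s4, temp);     // review/score
        else if (count == 8) strcpy(s5, temp);     // review/time
        if (count == 11) {
            fprintf(fw, "%s %s %s %s %s\n", s1, s2, s3, s4, s5);
            count = 0;
        }
        sum++;
        count++;
        fflush(fw);
        printf("%ld\n", sum);
    }
    printf("%ld", sum);
    fclose(fr);
    fclose(fw);
    printf("Press any key to end!\n");
    getchar();
    return 0;
}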