Problem: a file of several GB with tens of millions of lines needs to be pre-processed.
Function: split the records in the file into different files according to a rule (in this example, by date).
Advantage: shell scripts are very convenient for text processing; doing the same job in a general-purpose high-level language could take much longer to write.
Disadvantage: splitting text in a single process with a single thread may take a long time on input of this size.
In this example, the records to be processed look like this:
fore@forest:~/Work/ftr_m2_work/search/periphery/fullindex/script$ more ./testawk.txt
fileID1 userID storageID 2010-10-10 09:00 filename1 filesize fsha
fileID2 userID storageID 2010-10-13 09:00 filename2 filesize fsha
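The fields appear to be separated by single spaces. That is an assumption, but it is consistent with the 66-byte output files listed at the end: one record of this shape is exactly 66 bytes including its newline. Under that assumption the date is the fourth field, and cut can pull it out as this small sketch shows:

```shell
#!/bin/bash
# One record in the assumed layout: eight fields separated by single spaces.
record='fileID1 userID storageID 2010-10-10 09:00 filename1 filesize fsha'

# With a space as the delimiter, the date is field 4 and the time is field 5.
day=$(printf '%s\n' "$record" | cut -d' ' -f4)
echo "$day"                          # prints: 2010-10-10

# The record plus its newline is 66 bytes, the size of each file in the output listing.
printf '%s\n' "$record" | wc -c
```

Note that a tab or other separator would make field 4 contain "2010-10-10 09:00" instead, and the per-day file names would then include the time.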
Code:
#!/bin/bash
# Author: Fore
# Date: 2010.10.13
# Description: divide the records of a file into different files by date.
# set -x
if [ $# -ne 1 ]; then
    echo "Usage: ./devidebyday.sh /datafilepath/"
    exit 1
fi
if [ -f "$1" ]; then
    middir="./Mid/"
    rm -rf "$middir"
    mkdir -p "$middir"

    logdir="/data/logs/FTR/fullindex/"
    mkdir -p "$logdir"
    logpath="${logdir}divide.log"

    i=0
    # Read line by line; -r keeps backslashes, IFS= keeps leading whitespace.
    while IFS= read -r line
    do
        # Field 4 (space-separated) is the date, e.g. 2010-10-10.
        dateline=$(printf '%s\n' "$line" | cut -d' ' -f4)
        # Append (>>) so earlier records for the same day are not overwritten.
        printf '%s\n' "$line" >> "$middir$dateline"

        # Log progress every 100 records (the record counter, not i % 100, which is always 0 here).
        if [ $((i % 100)) -eq 0 ]; then
            echo "$(date) $i" >> "$logpath"
        fi
        i=$((i + 1))
    done < "$1"
else
    echo "$1 does not exist!"
    exit 1
fi
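The script above starts at least one external process (cut) for every record, which is a likely reason the single-process approach is slow on tens of millions of lines. A minimal sketch of a single-pass alternative with awk, assuming the same space-separated layout, is shown below; split_by_day and the temporary demo directory are hypothetical names, not part of the original script:

```shell
#!/bin/bash
# Single-pass alternative (a sketch, not the original author's script):
# awk reads the input once and appends every record to a file named after
# its 4th whitespace-separated field, so no process is forked per line.
set -eu

# split_by_day INPUT OUTDIR  (hypothetical helper name)
split_by_day() {
    local input=$1 outdir=$2
    rm -rf "$outdir"
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        { print >> (dir "/" $4) }                                   # $4 is the date field
        NR % 1000000 == 0 { print NR " records" > "/dev/stderr" }   # progress
    ' "$input"
}

# Demo on the two sample records, in a throwaway directory.
demo=$(mktemp -d)
printf '%s\n' \
    'fileID1 userID storageID 2010-10-10 09:00 filename1 filesize fsha' \
    'fileID2 userID storageID 2010-10-13 09:00 filename2 filesize fsha' \
    > "$demo/testawk.txt"
split_by_day "$demo/testawk.txt" "$demo/Mid"
ls "$demo/Mid"
```

awk keeps each output file open after its first write, so records for the same day are appended in input order. With a very large number of distinct dates, some awk implementations limit how many output files can be open at once; calling close() on a file that is no longer needed avoids that, at the cost of reopening it later.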
Output:
fore@forest:~/Work/ftr_m2_work/search/periphery/fullindex/S
-rw-r--r-- 1 fore 66 ./Mid/2010-10-10
-rw-r--r-- 1 fore 66 2010-10-13 ./Mid/2010-10-13
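Before deleting or moving the original multi-GB file, it may be worth checking the split. The sketch below uses a hypothetical helper, check_split: it confirms that each per-day file only holds records whose fourth field matches the file name, and that the per-day files together have as many lines as the input. The demo rebuilds the two-record example and its expected split by hand in a temporary directory.

```shell
#!/bin/bash
# Post-split sanity check (a sketch; check_split is a hypothetical helper).
set -eu

# check_split INPUT OUTDIR
# Returns non-zero if a record sits in the wrong per-day file or if the
# per-day files do not hold exactly as many lines as the input.
check_split() {
    local input=$1 outdir=$2 in_lines out_lines f status=0
    for f in "$outdir"/*; do
        # Every record in OUTDIR/<day> must carry <day> as its 4th field.
        if ! awk -v d="$(basename "$f")" '$4 != d { bad = 1 } END { exit bad }' "$f"; then
            echo "misplaced record(s) in $f" >&2
            status=1
        fi
    done
    in_lines=$(wc -l < "$input" | tr -d ' ')
    out_lines=$(cat "$outdir"/* | wc -l | tr -d ' ')
    echo "input: $in_lines, split: $out_lines"
    [ "$in_lines" -eq "$out_lines" ] || status=1
    return "$status"
}

# Demo: rebuild the two-record example and its expected split by hand.
demo=$(mktemp -d)
printf '%s\n' \
    'fileID1 userID storageID 2010-10-10 09:00 filename1 filesize fsha' \
    'fileID2 userID storageID 2010-10-13 09:00 filename2 filesize fsha' \
    > "$demo/testawk.txt"
mkdir "$demo/Mid"
head -n 1 "$demo/testawk.txt" > "$demo/Mid/2010-10-10"
tail -n 1 "$demo/testawk.txt" > "$demo/Mid/2010-10-13"
check_split "$demo/testawk.txt" "$demo/Mid"   # prints: input: 2, split: 2
```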