The join command on Linux
Recently, for a new online algorithm, I set out to analyze fluctuations in the click-through rate (CTR) of books. The existing pipeline already produces a file of per-book CTR data every day. I used to write code to merge the files for different days, but later found that the join command available directly on Linux can do the same thing, and that it is both powerful and fast.
join [-i][-a<1 or 2>][-e<string>][-o<format>][-t<character>][-v<1 or 2>][-1<field>][-2<field>][--help][--version][file 1][file 2]
Common parameters:
-a<1 or 2> In addition to the normal output, also prints the lines of the specified file that have no matching join field in the other file.
-e<string> If a requested field is missing from [file 1] or [file 2], fills it in the output with the given string.
-i or --ignore-case Ignores differences in case when comparing the join fields.
-o<format> Prints the result in the specified format.
-t<character> Uses the given character as the field separator.
-v<1 or 2> Like -a, but prints only the lines of the specified file that have no matching join field; matched lines are not shown.
-1<field> Joins on the specified field of [file 1].
-2<field> Joins on the specified field of [file 2].
--help Displays help.
--version Displays version information.
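As a small illustration of how these options combine, the following sketch runs join on two tiny comma-separated files. The file names and contents are made up for the example, and the commented output assumes GNU join.

```shell
# Hypothetical sample files (names and contents are assumptions for illustration).
export LC_ALL=C                                # keep the collation order predictable
tmp=$(mktemp -d)
printf 'a,1\nb,2\nc,3\n' > "$tmp/left.csv"     # sorted on field 1
printf 'x,b\ny,c\nz,d\n' > "$tmp/right.csv"    # sorted on field 2

# -t sets the separator; -1 and -2 pick the join field in each file.
matched=$(join -t , -1 1 -2 2 "$tmp/left.csv" "$tmp/right.csv")
echo "$matched"
# b,2,x
# c,3,y

# -v 1 prints only the lines of file 1 that found no partner.
unmatched=$(join -t , -1 1 -2 2 -v 1 "$tmp/left.csv" "$tmp/right.csv")
echo "$unmatched"
# a,1

# -a 1 keeps the unmatched lines of file 1, -o chooses the output columns,
# and -e supplies the filler for columns that are missing.
filled=$(join -t , -1 1 -2 2 -a 1 -e NA -o 1.1,1.2,2.1 "$tmp/left.csv" "$tmp/right.csv")
echo "$filled"
# a,1,NA
# b,2,x
# c,3,y
```

Without -o, join prints the join field first, followed by the remaining fields of file 1 and then those of file 2, which is why the first result reads b,2,x rather than b,2,x,b.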
Notes:
1. Both files must be sorted on the join key. (join presumably works as a merge over the sorted inputs, which would explain why it is so efficient.)
2. The various join types correspond to the following commands:
Inner join format: join <file 1> <file 2>
Left join (left outer join) format: join -a1 <file 1> <file 2>
Right join (right outer join) format: join -a2 <file 1> <file 2>
Full join (full outer join) format: join -a1 -a2 <file 1> <file 2>
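The four join types can be tried out on two small files. This is a minimal sketch: the file names f1 and f2 and their contents are invented for illustration, and both inputs are first passed through sort so that they satisfy the sorting requirement from note 1.

```shell
# Hypothetical sample data; the keys are deliberately given out of order.
export LC_ALL=C                                # same collation for sort and join
tmp=$(mktemp -d)
printf '3 c\n1 a\n2 b\n' | sort -k 1,1 > "$tmp/f1"   # keys 1 2 3
printf '4 Z\n2 X\n3 Y\n' | sort -k 1,1 > "$tmp/f2"   # keys 2 3 4

inner=$(join "$tmp/f1" "$tmp/f2")              # keys present in both files
left=$(join -a1 "$tmp/f1" "$tmp/f2")           # plus unmatched lines of f1
right=$(join -a2 "$tmp/f1" "$tmp/f2")          # plus unmatched lines of f2
full=$(join -a1 -a2 "$tmp/f1" "$tmp/f2")       # plus unmatched lines of both

echo "$inner"
# 2 b X
# 3 c Y
echo "$full"
# 1 a
# 2 b X
# 3 c Y
# 4 Z
```

Note that the inputs are sorted as text (no -n), since join compares keys as strings rather than as numbers.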
The daily CTR files I already have use this format:
Bookid PV CLICK CTR
Take merging the CTR files for December 20 and December 19 as an example, with the result sorted by how far each book's CTR fell from the 19th to the 20th, largest drop first. The command is as follows:
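join -t " " -1 1 -2 1 -a 1 -a 2 -o 1.1 -o 2.1 -o 1.4 -o 2.4 -e "0" ./ctr_1220 ./ctr1219 | awk -F " " '{print $0" "($4-$3)}' | sort -rn -k 5
Here the fields are assumed to be separated by a single space. The output columns are the Bookid from each file, the CTR on the 20th, the CTR on the 19th, and the difference (19th minus 20th) appended by awk.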
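To see what this pipeline produces, it can be run on two small made-up files. The file names day20 and day19 and every number in them are invented for this sketch; the files are assumed to be space-separated, without a header line, and already sorted on Bookid.

```shell
# Hypothetical sample data in the Bookid PV CLICK CTR layout.
export LC_ALL=C
tmp=$(mktemp -d)
printf '1001 200 20 0.10\n1002 100 30 0.30\n1003 400 32 0.08\n' > "$tmp/day20"
printf '1001 100 30 0.30\n1002 100 25 0.25\n1004 100 15 0.15\n' > "$tmp/day19"

# Full outer join on Bookid, keeping both Bookids and both CTRs,
# with 0 filled in where a book is missing on one of the days.
# awk appends (CTR on the 19th - CTR on the 20th); sort puts the largest drop first.
result=$(join -t ' ' -1 1 -2 1 -a 1 -a 2 -e 0 -o 1.1,2.1,1.4,2.4 \
             "$tmp/day20" "$tmp/day19" |
         awk '{ print $0 " " ($4 - $3) }' |
         sort -rn -k 5)
echo "$result"
# 1001 1001 0.10 0.30 0.2
# 0 1004 0 0.15 0.15
# 1002 1002 0.30 0.25 -0.05
# 1003 0 0.08 0 -0.08
```

A book that appears on only one day shows up with 0 in the Bookid and CTR columns of the other day, so a book that is new on the 20th looks like a rise and a book that vanished looks like a drop. Whether that is the desired reading depends on the analysis.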