The join command on Linux

Source: Internet
Author: User


A new algorithm recently went online, and I wanted to start analyzing fluctuations in the click-through rate (CTR) of books. The existing pipeline already produces one file per day containing each book's CTR. I used to write code to merge the files for different days, but later found that the join command on Linux does much the same thing directly, and it is both powerful and fast.

join [-i] [-a <1 or 2>] [-e <string>] [-o <format>] [-t <character>] [-v <1 or 2>] [-1 <field>] [-2 <field>] [--help] [--version] [file 1] [file 2]
Common options:
-a <1 or 2>: In addition to the normal output, also print the lines of the given file (1 or 2) that have no matching line in the other file.
-e <string>: If a requested field is missing from [file 1] or [file 2], fill it in the output with the given string. This applies to fields selected with -o.
-i or --ignore-case: Ignore differences in case when comparing the join fields.
-o <format>: Print the output in the specified format, a list of FILENUM.FIELD entries such as 1.1 or 2.4.
-t <character>: Use the given character as the field separator.
-v <1 or 2>: Like -a, but print only the lines of the given file that have no match, without the joined lines.
-1 <field>: Join on this field of [file 1].
-2 <field>: Join on this field of [file 2].
--help: Display help.
--version: Display version information.
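
As a minimal sketch of how -t, -1, -2 and -o fit together, the snippet below joins two small comma-separated files. The file names (readers.csv, books.csv) and their contents are made up for illustration and are not part of the original workflow.

```shell
# Hypothetical sample data, created in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'alice,b1' 'bob,b2' > "$tmp/readers.csv"   # name,bookid (sorted by bookid)
printf '%s\n' 'b1,Dune' 'b2,Emma' > "$tmp/books.csv"     # bookid,title (sorted by bookid)

# -t ,        fields are separated by commas
# -1 2        the join key is field 2 of file 1
# -2 1        the join key is field 1 of file 2
# -o 1.1,2.2  print only the name (file 1, field 1) and the title (file 2, field 2)
opts_out=$(join -t , -1 2 -2 1 -o 1.1,2.2 "$tmp/readers.csv" "$tmp/books.csv")
printf '%s\n' "$opts_out"

rm -r "$tmp"
```

With this sample data the output is one line per matched book: alice,Dune and bob,Emma.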

Note:

1. Both files must be sorted on the join key. (join presumably works by merging the two sorted inputs, which would explain why it is efficient.)

2. The join types correspond to the following commands:

Inner join: join <file 1> <file 2>

Left join (left outer join): join -a 1 <file 1> <file 2>

Right join (right outer join): join -a 2 <file 1> <file 2>

Full join (full outer join): join -a 1 -a 2 <file 1> <file 2>
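
Both notes can be tried out on two tiny files. This is only a sketch: the file names and contents are invented, and LC_ALL=C is set as an assumption so that sort and join agree on the ordering.

```shell
export LC_ALL=C                 # use the same collation for sort and join
tmp=$(mktemp -d)

# Hypothetical unsorted inputs in "key value" form.
printf '%s\n' '2 a2' '1 a1' > "$tmp/a.raw"
printf '%s\n' '3 b3' '2 b2' > "$tmp/b.raw"

# Note 1: sort both files on the join field before joining.
sort -k 1,1 "$tmp/a.raw" > "$tmp/a"
sort -k 1,1 "$tmp/b.raw" > "$tmp/b"

# Note 2: the four join types.
inner=$(join "$tmp/a" "$tmp/b")             # only key 2
left=$(join -a 1 "$tmp/a" "$tmp/b")         # keys 1 and 2
right=$(join -a 2 "$tmp/a" "$tmp/b")        # keys 2 and 3
full=$(join -a 1 -a 2 "$tmp/a" "$tmp/b")    # keys 1, 2 and 3

printf 'inner:\n%s\nleft:\n%s\nright:\n%s\nfull:\n%s\n' \
    "$inner" "$left" "$right" "$full"

rm -r "$tmp"
```

Here key 1 exists only in the first file and key 3 only in the second, so each variant adds or drops exactly those unmatched lines.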

My daily CTR files have the following format:

Bookid PV CLICK CTR

As an example, the command below merges the CTR data for December 20 and December 19 and sorts the result by how far the CTR on the 20th fell relative to the 19th, largest drop first:

join -t " " -1 1 -2 1 -a 1 -a 2 -o 1.1 -o 2.1 -o 1.4 -o 2.4 -e "0" ./ctr_1220 ./ctr1219 | awk -F " " '{print $0" "$4-$3}' | sort -rn -k 5
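
To see what this pipeline produces, here is a small self-contained sketch. The two input files and every value in them are made up, and a single space is assumed as the field separator. The four output fields are written as one comma-separated -o list.

```shell
export LC_ALL=C                 # consistent ordering and "." as the decimal point
tmp=$(mktemp -d)

# Hypothetical daily files in "Bookid PV CLICK CTR" format, sorted by Bookid.
printf '%s\n' 'b1 8 2 0.25' 'b2 8 4 0.5'  'b3 8 1 0.125' > "$tmp/ctr_day20"
printf '%s\n' 'b1 8 6 0.75' 'b2 8 6 0.75' 'b4 8 5 0.625' > "$tmp/ctr_day19"

# Full outer join on Bookid, keeping both ids and both CTR columns.
# Fields missing on one side are filled with 0 by -e.
# awk appends (CTR on the 19th) - (CTR on the 20th), i.e. the size of the drop;
# sort then orders the lines by that fifth column, largest drop first.
ranked=$(join -t ' ' -1 1 -2 1 -a 1 -a 2 -o 1.1,2.1,1.4,2.4 -e 0 \
             "$tmp/ctr_day20" "$tmp/ctr_day19" |
         awk '{ print $0 " " ($4 - $3) }' |
         sort -rn -k 5)
printf '%s\n' "$ranked"

rm -r "$tmp"
```

Books that appear on only one day get 0 for the missing side, so a book present only on the 19th looks like a large drop and a book present only on the 20th looks like a rise. Whether that is the desired treatment depends on the analysis.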
