Linux text processing tools: join and paste

Source: Internet
Author: User

Outline

1. What is join? What's the effect?

2. Join syntax format

3. Hands-on practice

4. Introduction to the paste command



1. What is join? What's the effect?

On Linux, data files are most often plain text, with a separator such as a colon (:), a tab, or a space between fields. The familiar /etc/passwd and /etc/group files, for example, both use ":" as the separator. A file in this format can be treated as a simple text database: it is easy for people to read and easy for programs to process, and one column usually serves as a key, much like a key column in a database table.

The join command merges data files on a key (in the words of its manual, it will "join lines of two files on a common field"), much as a relational database joins tables in a query.

If you have worked with a relational database, you know that a query over several tables normally has to state how the tables relate to each other; otherwise the result is a Cartesian product. The following example joins the Categories and Products tables on the CategoryID field:

    SELECT CategoryName, ProductName
    FROM Categories INNER JOIN Products
    ON Categories.CategoryID = Products.CategoryID;


Once you understand join queries in a relational database, it is easy to understand how the join command works.
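To make the analogy concrete, the SQL query above can be mimicked with two small text files. This is only a sketch: the file names categories.txt and products.txt and their contents are invented for illustration, and both files are assumed to be sorted on the category ID in the first column.

```shell
# Hypothetical sample data; column 1 is the category ID (the join field).
export LC_ALL=C                     # keep sort/join ordering predictable
workdir=$(mktemp -d)

printf '%s\n' '1 Beverages' '2 Condiments' '3 Confections' > "$workdir/categories.txt"
printf '%s\n' '1 Chai' '1 Chang' '2 Aniseed_Syrup' '4 Tofu' > "$workdir/products.txt"

# Inner join on the first field of both files (the default).
# Category 3 has no product and product category 4 has no category row,
# so neither shows up in the result.
joined=$(join "$workdir/categories.txt" "$workdir/products.txt")
printf '%s\n' "$joined"
# 1 Beverages Chai
# 1 Beverages Chang
# 2 Condiments Aniseed_Syrup

rm -rf "$workdir"
```

As in SQL, a key that occurs once in one file and twice in the other produces two output lines.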


2. Join syntax format

    Usage: join [OPTION]... FILE1 FILE2
    For each pair of input lines with identical join fields, write a line to
    standard output.  The default join field is the first, delimited by whitespace.
    When FILE1 or FILE2 (not both) is -, read standard input.

      -a FILENUM         also print unpairable lines from file FILENUM, where
                           FILENUM is 1 or 2, corresponding to FILE1 or FILE2
      -e EMPTY           replace missing input fields with EMPTY
      -i, --ignore-case  ignore differences in case when comparing fields
      -j FIELD           equivalent to '-1 FIELD -2 FIELD'
      -o FORMAT          obey FORMAT while constructing output line
      -t CHAR            use CHAR as input and output field separator
      -v FILENUM         like -a FILENUM, but suppress joined output lines
      -1 FIELD           join on this FIELD of file 1
      -2 FIELD           join on this FIELD of file 2

The help text says:

"For each pair of input lines with identical join fields, write a line to standard output. The default join field is the first, delimited by whitespace. When FILE1 or FILE2 (not both) is -, read standard input."

join merges the lines of two files on a common field (the key) and writes the result to standard output. By default, fields are separated by whitespace and the join field is the first field of each file.
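The "-" form mentioned in the help text is handy when one input has to be prepared on the fly. The following sketch uses two made-up files, names.txt and abbr_unsorted.txt, and pipes the sorted second file into join as FILE2:

```shell
export LC_ALL=C                     # keep sort/join ordering predictable
workdir=$(mktemp -d)

# Hypothetical sample data: names.txt is already sorted, the other file is not.
printf '%s\n' '1 January' '2 February' '3 March' > "$workdir/names.txt"
printf '%s\n' '3 Mar' '1 Jan' '2 Feb' > "$workdir/abbr_unsorted.txt"

# FILE2 is "-", so join reads its second input from the pipe.
from_stdin=$(sort -k 1b,1 "$workdir/abbr_unsorted.txt" | join "$workdir/names.txt" -)
printf '%s\n' "$from_stdin"
# 1 January Jan
# 2 February Feb
# 3 March Mar

rm -rf "$workdir"
```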

The following options are commonly used:

-t CHAR

Specify the field delimiter. For example, -t ':' uses a colon as the delimiter. The default delimiter is whitespace (spaces and tabs).

-1 FIELD

Specify which field of the first file to join on.

-2 FIELD

Specify which field of the second file to join on.

-a FILENUM

Also print the unpairable lines from file FILENUM (1 or 2). This is the equivalent of an outer join in SQL.

-o <FILENO.FIELDNO> ...

Specify the output fields. FILENO is 1 for the first file or 2 for the second file, and FIELDNO is the field number, counting from 1. Without -o, join outputs all fields, and the join field appears only once.

For example, -o 1.1 1.2 2.2 outputs the first and second fields of the first file, followed by the second field of the second file.
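The options above are often combined. Here is a minimal sketch, assuming two invented colon-separated files: users.txt (login:gid, sorted on the gid in field 2) and groups.txt (gid:group name, sorted on field 1). The field list for -o is written in its comma-separated form.

```shell
export LC_ALL=C                     # keep sort/join ordering predictable
workdir=$(mktemp -d)

# Hypothetical sample data, already sorted on the gid column.
printf '%s\n' 'alice:100' 'bob:100' 'carol:200' > "$workdir/users.txt"
printf '%s\n' '100:staff' '200:admin' '300:guest' > "$workdir/groups.txt"

# Join field 2 of users.txt with field 1 of groups.txt, using ":" as delimiter.
# The join field is printed first, then the remaining fields of each file.
default_out=$(join -t: -1 2 -2 1 "$workdir/users.txt" "$workdir/groups.txt")
printf '%s\n' "$default_out"
# 100:alice:staff
# 100:bob:staff
# 200:carol:admin

# Same join, but keep only the login (1.1) and the group name (2.2).
picked_out=$(join -t: -1 2 -2 1 -o 1.1,2.2 "$workdir/users.txt" "$workdir/groups.txt")
printf '%s\n' "$picked_out"
# alice:staff
# bob:staff
# carol:admin

rm -rf "$workdir"
```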


Analogy between join and SQL joins:

Inner join: join <FILE1> <FILE2>
Left join (left outer join): join -a1 <FILE1> <FILE2>
Right join (right outer join): join -a2 <FILE1> <FILE2>
Full join (full outer join): join -a1 -a2 <FILE1> <FILE2>
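The four forms can be compared side by side on a tiny data set. This is a sketch with invented files left.txt and right.txt; the last command uses -v from the help text, which prints only the unpairable lines and so behaves like an anti-join.

```shell
export LC_ALL=C                     # keep sort/join ordering predictable
workdir=$(mktemp -d)

# Hypothetical sample data: keys 2 and 3 exist in both files.
printf '%s\n' '1 a' '2 b' '3 c' > "$workdir/left.txt"
printf '%s\n' '2 x' '3 y' '4 z' > "$workdir/right.txt"
cd "$workdir" || exit 1

inner=$(join left.txt right.txt)          # 2 b x / 3 c y
left=$(join -a1 left.txt right.txt)       # adds "1 a"
right=$(join -a2 left.txt right.txt)      # adds "4 z"
full=$(join -a1 -a2 left.txt right.txt)   # adds both
only_left=$(join -v1 left.txt right.txt)  # just "1 a"

printf '%s\n' "--- full outer join ---" "$full"
# 1 a
# 2 b x
# 3 c y
# 4 z

cd / && rm -rf "$workdir"
```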


3. Hands-on practice

Important: FILE1 and FILE2 must be sorted on the join fields.
In other words, sort both files on the join field, for example with sort, before you run join.
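join compares keys as character strings, not as numbers, so the inputs need the order that sort produces without -n. The sketch below uses two made-up files, a.txt and b.txt; the `sort -k 1b,1` form is the one the join manual page suggests when join is run without options.

```shell
export LC_ALL=C                     # sort and join must agree on the collation
workdir=$(mktemp -d)

# Hypothetical, unsorted sample data.
printf '%s\n' '2 February' '10 October' '1 January' > "$workdir/a.txt"
printf '%s\n' '10 Oct' '1 Jan' '2 Feb' > "$workdir/b.txt"

# Sort both inputs on the first field as text (note: no -n).
sort -k 1b,1 "$workdir/a.txt" > "$workdir/a.sorted"
sort -k 1b,1 "$workdir/b.txt" > "$workdir/b.sorted"

sorted_join=$(join "$workdir/a.sorted" "$workdir/b.sorted")
printf '%s\n' "$sorted_join"
# 1 January Jan
# 10 October Oct
# 2 February Feb

rm -rf "$workdir"
```

The key 10 comes before 2 because the comparison is textual. If the result should be in numeric order, it can be re-sorted afterwards with sort -n.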


Prepare the test data:

    # head month_en.txt month_zh.txt
    ==> month_en.txt <==
    1 January
    2 February
    3 March
    14 "Unknown"

    ==> month_zh.txt <==
    1 January
    2 February
    3 March
    13 "January or March, intentional"

a. Inner join

If join is given no options, it uses whitespace as the delimiter and the first field as the join key, and it skips lines whose key has no match in the other file.

    # join month_en.txt month_zh.txt
    1 January January
    2 February February
    3 March March


b. Left outer join

    # join -a1 month_en.txt month_zh.txt
    1 January January
    2 February February
    3 March March
    14 "Unknown"

This time the "Unknown" line is also shown; that is, the lines in month_en.txt that have no match are displayed too.


c. Right outer join

    # join -a2 month_en.txt month_zh.txt
    1 January January
    2 February February
    3 March March
    13 "January or March, intentional"

This time the "January or March, intentional" line is shown; that is, the lines in month_zh.txt that have no match are displayed too.


d. Full outer join

    # join -a1 -a2 month_en.txt month_zh.txt
    1 January January
    2 February February
    3 March March
    13 "January or March, intentional"
    14 "Unknown"

As you can see, the unmatched lines from both files now appear in the output.
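In an outer join the unmatched lines have fewer fields than the matched ones, which can make the output awkward to parse. One possible remedy, sketched here with invented files full.txt and abbr.txt, is to combine -o with the -e option from the help text so that missing fields get a placeholder (the string NULL is an arbitrary choice). In the -o list, 0 stands for the join field.

```shell
export LC_ALL=C                     # keep sort/join ordering predictable
workdir=$(mktemp -d)

# Hypothetical sample data: key 2 is only in full.txt, key 3 only in abbr.txt.
printf '%s\n' '1 January' '2 February' '4 April' > "$workdir/full.txt"
printf '%s\n' '1 Jan' '3 Mar' '4 Apr' > "$workdir/abbr.txt"

# Full outer join with a fixed three-column layout:
# join field, field 2 of file 1, field 2 of file 2.
filled=$(join -a1 -a2 -e NULL -o 0,1.2,2.2 "$workdir/full.txt" "$workdir/abbr.txt")
printf '%s\n' "$filled"
# 1 January Jan
# 2 February NULL
# 3 NULL Mar
# 4 April Apr

rm -rf "$workdir"
```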


e. Specifying a delimiter

Let's join /etc/passwd and /etc/shadow.

    # sudo head -n3 /etc/passwd /etc/shadow
    [sudo] password for yy:
    ==> /etc/passwd <==
    root:x:0:0:root:/root:/bin/bash
    daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
    bin:x:2:2:bin:/bin:/usr/sbin/nologin

    ==> /etc/shadow <==
    root:!:16212:0:99999:7:::
    daemon:*:16177:0:99999:7:::
    bin:*:16177:0:99999:7:::

Both files use ":" as the delimiter, and we use the first field as the join key:

    # join -t: /etc/passwd /etc/shadow
    root:x:0:0:root:/root:/bin/bash:!:16212:0:99999:7:::
    daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin:*:16177:0:99999:7:::
    bin:x:2:2:bin:/bin:/usr/sbin/nologin:*:16177:0:99999:7:::

As you can see, this is very convenient. The idea is the same as a join query in SQL.
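The lines in /etc/passwd and /etc/shadow are usually not sorted by login name, so a more careful version sorts both inputs on the first field before joining. The sketch below does not touch the real system files (reading /etc/shadow needs root); it copies the three sample lines shown above into temporary files whose names are made up, and it also uses -o to keep just the login name, the shell (passwd field 7) and the last password change day (shadow field 3).

```shell
export LC_ALL=C                     # sort and join must agree on the collation
workdir=$(mktemp -d)

# Sample lines copied from the listing above, in their original (unsorted) order.
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'bin:x:2:2:bin:/bin:/usr/sbin/nologin' > "$workdir/passwd.sample"
printf '%s\n' \
  'root:!:16212:0:99999:7:::' \
  'daemon:*:16177:0:99999:7:::' \
  'bin:*:16177:0:99999:7:::' > "$workdir/shadow.sample"

# Sort both files on the login name, using ":" as the field separator.
sort -t: -k1,1 "$workdir/passwd.sample" > "$workdir/passwd.sorted"
sort -t: -k1,1 "$workdir/shadow.sample" > "$workdir/shadow.sorted"

accounts=$(join -t: -o 1.1,1.7,2.3 "$workdir/passwd.sorted" "$workdir/shadow.sorted")
printf '%s\n' "$accounts"
# bin:/usr/sbin/nologin:16177
# daemon:/usr/sbin/nologin:16177
# root:/bin/bash:16212

rm -rf "$workdir"
```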


4. Introduction to the paste command

What does paste do? It simply sticks the corresponding lines of two files together, separating the fields of each new line with a tab by default. It is much simpler than join: join merges lines only where the key fields of the two files match, whereas paste ignores the content and merges the lines directly, in order.

    # paste month_en.txt month_zh.txt
    1 January	1 January
    2 February	2 February
    3 March	3 March
    14 "Unknown"	13 "January or March, intentional"

See the difference between paste and join? You can also use the -d option to specify the delimiter placed between the pasted fields.
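A short sketch of the -d option, using two invented files, names.txt and ages.txt. The second file is deliberately one line shorter to show that paste pairs lines purely by position and leaves the missing field empty.

```shell
workdir=$(mktemp -d)

# Hypothetical sample data; ages.txt has one line fewer than names.txt.
printf '%s\n' 'alice' 'bob' 'carol' > "$workdir/names.txt"
printf '%s\n' '30' '25' > "$workdir/ages.txt"

# Default: fields are separated by a tab.
tabbed=$(paste "$workdir/names.txt" "$workdir/ages.txt")

# With -d, a colon is used instead of the tab.
coloned=$(paste -d: "$workdir/names.txt" "$workdir/ages.txt")
printf '%s\n' "$coloned"
# alice:30
# bob:25
# carol:

rm -rf "$workdir"
```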


This article is from the "Share Your Knowledge" blog; please keep this source attribution: http://skypegnu1.blog.51cto.com/8991766/1427158
