Text processing commands--cut, sort, join

Source: Internet
Author: User
Tags: pear

Disclaimer: The following covers only the most common options of each command. For full details, refer to the man pages or other documentation.

First, cut

cut is a selection command: it reads data and extracts the parts we want. It works line by line, taking the requested pieces out of each line rather than analyzing the input as a whole.

-d: sets a custom delimiter; the default delimiter is a tab.

-f: specifies which fields to display.

Example:

# echo "a b c" | cut -d ' ' -f 1
a
# echo "a b c" | cut -d ' ' -f 1,2
a b
# echo "a b c" | cut -d ' ' -f 1-3
a b c
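As a further illustration, here is a minimal sketch of cut with a delimiter other than a space, and with the default tab delimiter. The sample record and the variable names are made up for this example.

```shell
# Hypothetical colon-separated record; the values are invented for illustration.
line='alice:x:1000:/home/alice'

# -d sets the delimiter, -f 1,4 picks the first and fourth fields.
picked=$(printf '%s\n' "$line" | cut -d ':' -f 1,4)
echo "$picked"

# Without -d, cut splits on tab characters.
tabbed=$(printf 'a\tb\tc\n' | cut -f 2)
echo "$tabbed"
```

Note that the selected fields are printed joined by the same delimiter that was used to split the input.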

Second, sort (reprint, original address: http://www.cnblogs.com/51linux/archive/2012/05/23/2515299.html)

The sort command sorts lines of text.

2.1 How sort works

sort treats each line of a file as a unit and compares lines character by character, starting from the first character, by ASCII value (this holds in the C locale; other locales may apply their own collation rules). It then outputs the lines in ascending order.

Example:

$ sort seq.txt
apple
banana
orange
pear
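The character-by-character comparison can be seen in a small sketch. It assumes the C locale, forced here with LC_ALL=C so that plain byte values are compared; the input words are made up for the example.

```shell
# LC_ALL=C forces byte-value comparison; other locales may order these differently.
sorted=$(printf '%s\n' banana Apple apple 10 | LC_ALL=C sort)
echo "$sorted"
```

In this locale digits sort before uppercase letters, and uppercase letters before lowercase ones, so the lines come out as 10, Apple, apple, banana.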

2.2 The -u option of sort

-u removes duplicate lines from the output.

Example:

$ sort -u seq.txt
apple
banana
orange
pear

seq.txt contained pear twice; the duplicate line was removed by the -u option.
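A minimal sketch of the same idea on inline data, assuming the C locale: for whole-line comparison, sort -u gives the same result as piping sort into uniq. The variable names are hypothetical.

```shell
# Input with one repeated line.
unique=$(printf '%s\n' pear apple pear banana | LC_ALL=C sort -u)

# Equivalent two-step form: sort first, then drop adjacent duplicates.
piped=$(printf '%s\n' pear apple pear banana | LC_ALL=C sort | uniq)

echo "$unique"
```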

2.3 The -r option of sort

sort uses ascending order by default; to sort in descending order, use -r.

$ cat number.txt
1
3
5
2
4
$ sort number.txt
1
2
3
4
5
$ sort -r number.txt
5
4
3
2
1

2.4 The -o option of sort

sort writes its result to standard output by default, so a redirect is needed to write the result to a file, for example sort filename > newfile.

However, redirection does not work if you want to write the sorted result back to the original file.

$ sort -r number.txt > number.txt
$ cat number.txt
$

Look, number.txt was emptied: the shell truncates the redirect target before sort gets to read it.

This is where the -o option comes in. It solves the problem and lets you safely write the result to the original file. This may also be the only advantage of -o over redirection.

$ cat number.txt
1
3
5
2
4
$ sort -r number.txt -o number.txt
$ cat number.txt
5
4
3
2
1
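The difference between the two approaches can be reproduced safely in a scratch directory. This is only a sketch: the directory comes from mktemp, and the file name clobbered.txt is made up for the demonstration.

```shell
workdir=$(mktemp -d)
printf '%s\n' 1 3 5 2 4 > "$workdir/number.txt"

# Redirecting onto the input: the shell truncates the file before sort reads it.
cp "$workdir/number.txt" "$workdir/clobbered.txt"
sort -r "$workdir/clobbered.txt" > "$workdir/clobbered.txt"
clobbered_size=$(wc -c < "$workdir/clobbered.txt" | tr -d ' ')

# With -o, sort reads all of its input before writing the result back.
sort -r -o "$workdir/number.txt" "$workdir/number.txt"
in_place=$(cat "$workdir/number.txt")

echo "size after redirect: $clobbered_size"
echo "$in_place"
rm -rf "$workdir"
```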

2.5 The -n option of sort

Have you ever seen 10 sorted before 2? I have. This happens because sort compares numbers character by character: it compares 1 with 2, 1 is obviously smaller, so 10 is placed before 2. This is simply how sort behaves by default.

To change this, use the -n option to tell sort to sort by numeric value.

$ cat number.txt
1
10
19
11
2
5
$ sort number.txt
1
10
11
19
2
5
$ sort -n number.txt
1
2
5
10
11
19

2.6 The -t and -k options of sort

Suppose a file has the following contents:

$ cat facebook.txt
banana:30:5.5
apple:10:2.5
pear:90:2.3
orange:20:3.4

This file has three columns separated by colons: the first column is the fruit type, the second is the quantity, and the third is the price.
To sort by quantity, that is, by the second column, how do we do it with sort?
Fortunately, sort provides the -t option, which sets the field delimiter.
Once a delimiter is specified, -k selects the column to sort on.

$ sort -n -k 2 -t : facebook.txt
apple:10:2.5
orange:20:3.4
banana:30:5.5
pear:90:2.3

We use the colon as the delimiter and sort numerically on the second column in ascending order, and the result is exactly what we wanted.

2.7 The -f option of sort

-f converts lowercase letters to uppercase for comparison, that is, it ignores case.
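A small sketch of the effect, assuming the C locale so that the plain comparison is by byte value; the input words are made up for the example.

```shell
# Plain byte order: the uppercase "B" sorts before every lowercase letter.
plain=$(printf '%s\n' cherry Banana apple | LC_ALL=C sort)

# With -f, letters are folded to uppercase before comparing.
folded=$(printf '%s\n' cherry Banana apple | LC_ALL=C sort -f)

echo "$plain"
echo "$folded"
```

Without -f the result starts with Banana; with -f the order is apple, Banana, cherry.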

2.8 More on the -t and -k options of sort

The source file is:

$ cat facebook.txt
google 110 5000
baidu 100 5000
guge 50 3000
sohu 100 4500

The first field is the company name, the second field is the number of employees, and the third field is the average employee wage.

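I want this file sorted alphabetically by company name, that is, by the first field (facebook.txt has three fields):

$ sort -t ' ' -k 1 facebook.txt
baidu 100 5000
google 110 5000
guge 50 3000
sohu 100 4500

I want facebook.txt sorted by number of employees:

$ sort -n -t ' ' -k 2 facebook.txt
guge 50 3000
baidu 100 5000
sohu 100 4500
google 110 5000

I want facebook.txt sorted by number of employees and, where the number of employees is the same, by average wage in ascending order:

$ sort -n -t ' ' -k 2 -k 3 facebook.txt
guge 50 3000
sohu 100 4500
baidu 100 5000
google 110 5000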

Look, adding -k 2 -k 3 solved the problem. sort supports this kind of prioritized field sorting: it sorts by the second field first and, where those are equal, by the third field. (You can keep adding keys this way to set as many sort priorities as you like.)

I want facebook.txt sorted by average wage in descending order and, where wages are the same, by number of employees in ascending order (this one is a bit harder):

$ sort -n -t ' ' -k 3r -k 2 facebook.txt
baidu 100 5000
google 110 5000
sohu 100 4500
guge 50 3000

The trick here is the small letter r quietly added after -k 3. Recalling the -r option from earlier, can you guess what it does? The answer: the r modifier does the same thing as the -r option, namely reverse the order. Because sort uses ascending order by default, r is needed here so that the third field (average wage) is sorted in descending order. You can also add n, which sorts that field by numeric value, for example:

$ sort -t ' ' -k 3nr -k 2n facebook.txt
baidu 100 5000
google 110 5000
sohu 100 4500
guge 50 3000

Look, we removed the leading -n option and added n to each -k option instead.
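One detail worth knowing when combining keys: -k 2 on its own defines a key that runs from the second field to the end of the line, while -k 2,2 restricts the key to the second field only. The sketch below uses made-up data and assumes the C locale to show the difference.

```shell
# Two hypothetical lines whose second fields are equal.
data=$(printf '%s\n' 'b 1 a' 'a 1 z')

# Key "field 2 to end of line": compares "1 a" with "1 z".
to_end=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k 2)

# Key "field 2 only": the keys tie, so sort falls back to comparing whole lines.
only_2=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k 2,2)

echo "$to_end"
echo "$only_2"
```

With numeric keys the difference is usually invisible, because the numeric comparison stops reading at the first character that is not part of the number.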

Third, join

Function: join finds the lines in two files that have the same content in a specified column, merges them, and writes the result to standard output. join is powerful and resembles a join in SQL. How join works: take two files, file1 and file2, both already sorted. Each file has elements that relate to the other file. Because of this relationship, join combines the two files, somewhat like building a master file that contains the elements common to both.

Note: both files must already be sorted on the join column, by the same rules.

Test file preparation:

file1:

# cat file1
AA 1 2
BB 2 3
CC 4 6
DD 3 3

file2:

# cat file2
AA 2 1
BB 8 2
FF 2 4
CC 4 4
DD 5 5

3.1 The most basic usage

# join file1 file2
AA 1 2 2 1
BB 2 3 8 2
join: file 2 is not in sorted order

# join file2 file1
AA 2 1 1 2
BB 8 2 2 3
join: file 1 is not in sorted order

By default, join matches on the first field of both files and uses blanks (any number of them) as the delimiter.
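The "not in sorted order" error comes from file2, where FF appears before CC. A minimal sketch of the fix is to sort file2 on its first field before joining. The scratch directory and the name file2.sorted are assumptions made for this example, and LC_ALL=C is used so that sort and join agree on the ordering.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'AA 1 2' 'BB 2 3' 'CC 4 6' 'DD 3 3' > "$workdir/file1"
printf '%s\n' 'AA 2 1' 'BB 8 2' 'FF 2 4' 'CC 4 4' 'DD 5 5' > "$workdir/file2"

# Sort file2 on its first field; file1 is already in order on that field.
LC_ALL=C sort -k 1,1 "$workdir/file2" > "$workdir/file2.sorted"

joined=$(LC_ALL=C join "$workdir/file1" "$workdir/file2.sorted")
echo "$joined"
rm -rf "$workdir"
```

The FF line has no partner in file1, so it does not appear in the result.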

3.2 The -j option

The -j option specifies the field on which the two files are joined.

# join -j 1 file1 file2
AA 1 2 2 1
BB 2 3 8 2
join: file 2 is not in sorted order
# join -j 2 file1 file2
2 BB 3 AA 1
join: file 1 is not in sorted order
join: file 2 is not in sorted order

3.3 -1 FIELD -2 FIELD

Join on field FIELD of file1 and field FIELD of file2.

# join -1 2 -2 3 file1 file2
1 AA 2 AA 2
2 BB 3 BB 8
4 CC 6 FF 2
4 CC 6 CC 4
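To make the output layout easier to read, here is a sketch with two small hypothetical files, orders and users, whose names and contents are invented for the example. The join field is printed first, followed by the remaining fields of the first file and then those of the second.

```shell
workdir=$(mktemp -d)
# orders: order id, user id (already sorted on the user id in field 2).
printf '%s\n' 'o2 u1' 'o1 u2' > "$workdir/orders"
# users: user id, name (already sorted on the user id in field 1).
printf '%s\n' 'u1 alice' 'u2 bob' > "$workdir/users"

# Match field 2 of orders against field 1 of users.
matched=$(LC_ALL=C join -1 2 -2 1 "$workdir/orders" "$workdir/users")
echo "$matched"
rm -rf "$workdir"
```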

3.4 The -o, -e, and -a options

-o: specifies the output format; 1.2 means the second field of the first file.

-e: the string used to fill in empty (missing) fields.

-a: followed by 1 or 2; also prints the lines of file 1 or file 2 that have no match.

# join -o 1.1 -o 1.2 -o 1.3 -o 2.1 -o 2.2 -o 2.3 -e 'empty' -a 1 file1 file2
AA 1 2 AA 2 1
BB 2 3 BB 8 2
CC 4 6 empty empty empty
DD 3 3 empty empty empty
join: file 2 is not in sorted order
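The three options combine into something like a full outer join. The following is a sketch on two small, properly sorted files; the file names f1 and f2 and their contents are made up. In the -o list, 0 stands for the join field itself.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'AA 1' 'BB 2' 'CC 3' > "$workdir/f1"
printf '%s\n' 'AA x' 'CC y' 'DD z' > "$workdir/f2"

# -a 1 -a 2 keeps unmatched lines from both files,
# -e fills the fields that are missing, -o fixes the output columns.
outer=$(LC_ALL=C join -a 1 -a 2 -e empty -o 0,1.2,2.2 "$workdir/f1" "$workdir/f2")
echo "$outer"
rm -rf "$workdir"
```

Each output line has the key, the value from f1, and the value from f2, with the word empty standing in where one side has no matching line.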

3.5 The -v option

-v is similar to -a, but only the unmatched lines are displayed.

# join -v 1 file1 file2
CC 4 6
DD 3 3
join: file 2 is not in sorted order
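A sketch of -v on two small sorted files (hypothetical names and contents), showing that -v 1 and -v 2 report the leftovers of each side separately:

```shell
workdir=$(mktemp -d)
printf '%s\n' 'AA 1' 'BB 2' 'CC 3' > "$workdir/f1"
printf '%s\n' 'AA x' 'CC y' 'DD z' > "$workdir/f2"

# Lines of f1 with no partner in f2.
only_first=$(LC_ALL=C join -v 1 "$workdir/f1" "$workdir/f2")
# Lines of f2 with no partner in f1.
only_second=$(LC_ALL=C join -v 2 "$workdir/f1" "$workdir/f2")

echo "$only_first"
echo "$only_second"
rm -rf "$workdir"
```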

3.6 join and standard input

Sometimes we need to join several files of the same format, but join accepts only two files at a time, so we can use a pipe together with the "-" character:
join file1 file2 | join - file3 | join - file4
This joins four files together.
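A runnable sketch of the chaining idea with three small files; the file names a, b, c and their contents are invented for the example. Each stage reads the previous result from standard input through "-".

```shell
workdir=$(mktemp -d)
printf '%s\n' 'k1 a1' 'k2 a2' > "$workdir/a"
printf '%s\n' 'k1 b1' 'k2 b2' > "$workdir/b"
printf '%s\n' 'k1 c1' 'k2 c2' > "$workdir/c"

# The output of the first join stays sorted on the key, so it can feed the next one.
chained=$(LC_ALL=C join "$workdir/a" "$workdir/b" | LC_ALL=C join - "$workdir/c")
echo "$chained"
rm -rf "$workdir"
```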
