A Roundup of the Most Commonly Used Linux Shell Text-Processing Tools
This article covers the most common tools for processing text in the Linux shell: find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk. The examples and parameters given are the most common and practical ones. The principle I follow for shell scripts is to write one-liners and try not to exceed two lines; if the task is more complex, consider Python instead.
find: file search
Find .txt and .pdf files:
find . \( -name "*.txt" -o -name "*.pdf" \) -print
Find .txt and .pdf files using a regular expression:
find . -iregex ".*\(\.txt\|\.pdf\)$"    # -iregex: ignore case in the regular expression
Negate a condition: find all files that are not .txt:
find . ! -name "*.txt" -print
Specify the search depth
Print the files in the current directory only (depth 1):
find . -maxdepth 1 -type f
Custom searches
Search by type:
-type f (regular file), -type l (symbolic link), -type d (directory)
List all directories only:
find . -type d -print
Search by time:
-atime  access time (in days; the minute-based equivalent is -amin, and likewise for the options below)
-mtime  modification time (contents changed)
-ctime  change time (metadata or permissions changed)
All files accessed within the last 7 days:
find . -atime -7 -type f -print
Search by size:
Units: w (words), k, M, G. Find files larger than 2k:
find . -type f -size +2k
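The same option also works in the other direction, for example:
find . -type f -size -2k    # files smaller than 2k
find . -type f -size 2k     # files of exactly 2k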
Search by permissions:
Find all files that have execute permission:
find . -type f -perm /111 -print    # any execute bit set
Search by user:
Find the files owned by user weber:
find . -type f -user weber -print
Acting on the results
Delete all .swp files in the current directory:
find . -type f -name "*.swp" -delete
Execute a command on each result (the powerful -exec)
Change the ownership of every file in the current directory to weber:
find . -type f -exec chown weber {} \;
Note: {} is a special string; for each matching file, {} is replaced with the corresponding file name.
Eg: copy all the found files to another directory:
find . -type f -mtime +10 -name "*.txt" -exec cp {} old \;
Combine multiple commands
Tip: if you need to execute several commands on each result, write them into a script and then call that script from -exec:
-exec ./commands.sh {} \;
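For illustration, a minimal sketch of such a script (the name commands.sh and the actions inside it are only assumptions for this example):
#!/bin/bash
# commands.sh: invoked by find as  find . -name "*.log" -exec ./commands.sh {} \;
file="$1"                  # find passes the matched file name as the first argument
echo "processing $file"    # first command: report which file we are handling
gzip -k "$file"            # second command: compress it, keeping the original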
-print delimiters
-print appends a newline ('\n') after each file name. -print0 instead uses the null character ('\0') as the delimiter, which makes it possible to handle file names that contain spaces.
Sort the files in the current directory (including hidden files) from largest to smallest, excluding the directory "." itself:
find . -maxdepth 1 ! -name "." -print0 | xargs -0 du -b | sort -nr | head -10 | nl
grep: text search
grep match_pattern file    # by default, prints the matching lines
Common parameters:
-o  print only the matched parts of the text
-v  print only the lines that do not match
-c  count the number of matching lines in a file:
grep -c "text" filename
-n  print the line number of each match
-i  ignore case when searching
-l  print only the names of matching files
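For example (the file names here are just placeholders), these options are commonly combined:
grep -in "main" hello.c    # case-insensitive search, with line numbers
grep -l "TODO" *.c         # list only the files that contain a match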
Recursively search for text in a directory tree (a favourite of programmers searching code):
grep "class" . -r -n
Match multiple patterns:
grep -e "class" -e "virtual" file
Make grep terminate each output file name with the null character (-Z, used together with -l):
grep "test" file* -lZ | xargs -0 rm
xargs: command-line argument conversion
xargs converts its input data into command-line arguments for another command, so it can be combined with many commands, such as grep and find.
Convert multi-line output into a single line:
cat file.txt | xargs
Convert a single line into multiple lines of output ('\n' is the delimiter between the resulting lines):
-n  specify the number of arguments to print per line:
cat file.txt | xargs -n 3
xargs parameter description
-d  define the delimiter (for multi-line input the default delimiter is '\n')
-n  specify how many arguments go on each output line
-I {}  specify a replacement string; {} is replaced by each input item, which is needed when the executed command takes its arguments in particular positions, eg:
cat file.txt | xargs -I {} ./command.sh -p {} -1
-0  use the null character as the input delimiter, eg: count the lines of source code:
find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l
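A small illustration of -d (the input string is made up for this example):
echo -n "aXbXc" | xargs -d X    # splits on X and prints: a b c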
sort: sorting
Options:
-n  sort numerically
-d  sort in dictionary order
-r  reverse the order
-k N  sort by the Nth column
-b  ignore leading whitespace characters such as spaces
eg:
sort -nrk 1 file.txt
sort -bd file.txt
uniq: removing duplicate lines
sort unsort.txt | uniq
Count how many times each line appears in the file:
sort unsort.txt | uniq -c
Find the duplicated lines:
sort unsort.txt | uniq -d
You can specify which part of each line is compared when detecting duplicates: -s skips the first N characters, -w compares at most N characters.
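For example (the offsets are arbitrary):
sort unsort.txt | uniq -s 2 -w 3    # skip the first 2 characters, then compare only the next 3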
tr: character translation
General usage
tr maps characters from one set to the corresponding characters of another set, eg:
echo 12345 | tr '0-9' '9876543210'    # a simple encode/decode conversion: replace each character with its counterpart
cat file | tr '\t' ' '                # convert tabs to spaces
Deleting characters with tr
cat file | tr -d '0-9'    # delete all digits
-c  use the complement of the character set
cat file | tr -cd '0-9\n'    # delete everything that is not a digit (or a newline), i.e. keep only the numbers in the file
Squeezing characters with tr
tr -s squeezes repeated characters in the text; it is most commonly used to squeeze extra spaces:
cat file | tr -s ' '
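The same idea works for other characters; for example, squeezing consecutive newlines removes blank lines:
cat file | tr -s '\n'    # squeeze repeated newlines, which deletes blank lines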
Character classes
tr provides various character classes:
alnum: letters and digits
alpha: letters
digit: digits
space: whitespace characters
lower: lowercase letters
upper: uppercase letters
cntrl: control (non-printable) characters
print: printable characters
Usage: tr [:class:] [:class:]
eg: tr '[:lower:]' '[:upper:]'
cut: splitting text by column
Extract the 2nd and 4th columns of a file:
cut -f2,4 filename
Print all columns except the 3rd column of a file:
cut -f3 --complement filename
-d  specify the delimiter:
cut -f2 -d ";" filename
Ranges with cut
N-   from the Nth field to the end
-M   from the 1st to the Mth field
N-M  from the Nth to the Mth field
Units with cut
-b  in bytes
-c  in characters
-f  in fields (using the delimiter)
cut -c1-5 file    # print the 1st to 5th characters
cut -c-2 file     # print the first 2 characters
paste: joining text by column
Join two texts together column by column:
cat file1
1
2
cat file2
colin
book
paste file1 file2
1 colin
2 book
The default delimiter is a tab character; it can be changed with -d:
paste file1 file2 -d ","
1,colin
2,book
wc: counting lines, words, and characters
wc -l file    # count lines
wc -w file    # count words
wc -c file    # count characters
sed: text substitution
Replace the first occurrence
Replace the first matching text on each line:
sed 's/text/replace_text/' file
Global substitution:
sed 's/text/replace_text/g' file
By default sed prints the substituted content; to modify the original file in place, use -i:
sed -i 's/text/replace_text/g' file
Remove blank lines:
sed '/^$/d' file
Referencing the matched string
The matched string can be referenced in the replacement with the marker &:
echo this is an example | sed 's/\w\+/[&]/g'
# output: [this] [is] [an] [example]
Substring match markers
The content of the first matched pair of parentheses can be referenced with the marker \1:
sed 's/hello\([0-9]\)/\1/'
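To make the effect concrete (the input string is just an assumption):
echo "hello7 world" | sed 's/hello\([0-9]\)/\1/'    # prints: 7 world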
Evaluation with double quotes
sed expressions are usually written in single quotes; double quotes can be used as well, in which case the shell evaluates the expression before sed sees it:
sed 's/$var/hello/'    # with single quotes, $var is taken literally
When double quotes are used, we can refer to shell variables in the sed pattern and in the replacement string:
p=pattern
r=replaced
echo "line con a pattern" | sed "s/$p/$r/g"
# output: line con a replaced
Other examples
Inserting a character into a string: convert each line of the text (e.g. peksha) into pek/sha:
sed 's/^.\{3\}/&\//g' file
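A quick check of the command above:
echo "peksha" | sed 's/^.\{3\}/&\//g'    # prints: pek/sha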
awk: data stream processing
awk script structure
awk 'BEGIN{ statements } { statements2 } END{ statements }'
How it works
1. Execute the statement block in BEGIN;
2. Read a line from the file or stdin and execute statements2; repeat this step until the input is fully read;
3. Execute the END statement block.
print prints the current line when called without arguments:
echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print} END{print "END"}'
When the arguments to print are separated by commas, they are printed separated by spaces:
echo | awk '{var1 = "v1"; var2 = "v2"; var3 = "v3"; print var1, var2, var3;}'
# output: v1 v2 v3
Use "" as the concatenation operator to join values directly:
echo | awk '{var1 = "v1"; var2 = "v2"; var3 = "v3"; print var1"-"var2"-"var3;}'
# output: v1-v2-v3
Special variables: NR, NF, $0, $1, $2
NR: the current record (line) number, which counts up as awk reads the input;
NF: the number of fields in the current record;
$0: the text of the current line;
$1: the text of the first field;
$2: the text of the second field;
echo -e "line1 f2 f3\nline2 f4 f5\nline3 f6 f7" | awk '{print NR":"$0"-"$1"-"$2}'
Print the second and third fields of every line:
awk '{print $2, $3}' file
Count the number of lines in a file:
awk 'END {print NR}' file
Accumulate the first field of every line:
echo -e "1\n 2\n 3\n 4\n" | awk 'BEGIN{sum = 0; print "begin";} {sum += $1;} END {print "=="; print sum}'
Passing external variables
var=1000
echo | awk '{print vara}' vara=$var    # input from stdin
awk '{print vara}' vara=$var file      # input from a file
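Another standard way to pass a shell variable into awk is the -v option (not shown above, but part of POSIX awk):
awk -v vara=$var 'BEGIN{print vara}'    # -v makes the variable available even inside the BEGIN block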
Filtering the lines awk processes with patterns
awk 'NR < 5'                       # lines whose line number is less than 5
awk 'NR==1,NR==4 {print}' file     # print lines 1 through 4
awk '/linux/'                      # lines containing the text "linux" (regular expressions can be used; very powerful)
awk '!/linux/'                     # lines not containing the text "linux"
Setting the field delimiter
Use -F to set the delimiter (the default is space):
awk -F: '{print $NF}' /etc/passwd
Reading command output
Use getline to read the output of an external shell command into a variable:
echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout}'
Using loops in awk
for (i=0; i<10; i++) {print $i;}
for (i in array) {print array[i];}
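For instance, a numeric for loop can sum the fields of a line (the input is made up for this example):
echo "1 2 3 4" | awk '{s = 0; for (i = 1; i <= NF; i++) s += $i; print s}'    # prints: 10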
Print lines in reverse order (an awk implementation of the tac command):
seq 9 | awk '{lifo[NR] = $0; lno = NR} END{for (; lno > 0; lno--) {print lifo[lno];}}'
awk implementations of the head and tail commands
head: awk 'NR <= 10 {print}' filename
tail: awk '{buffer[NR % 10] = $0;} END{for (i = NR - 9; i <= NR; i++) if (i > 0) print buffer[i % 10]}' filename
Print a specified column
awk implementation:
ls -lrt | awk '{print $6}'
cut implementation:
ls -lrt | cut -f6
Print a specified text region
By line number:
seq 100 | awk 'NR==4,NR==6 {print}'
By text:
Print the text that lies between start_pattern and end_pattern:
awk '/start_pattern/,/end_pattern/' filename
eg:
seq 100 | awk '/13/,/15/'
cat /etc/passwd | awk '/mai.*mail/,/news.*news/'
Common awk built-in functions
index(string, search_string): returns the position at which search_string appears in string
sub(regex, replacement_str, string): replaces the first match of the regex in string with replacement_str
match(regex, string): checks whether the regular expression matches the string
length(string): returns the length of the string
echo | awk '{"grep root /etc/passwd" | getline cmdout; print length(cmdout)}'
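A small sketch exercising these functions (the string is made up for this example):
awk 'BEGIN{ s = "hello123"; print index(s, "lo"); if (sub(/[0-9]+/, "#", s)) print s; print match(s, /hel/) }'
# output: 4, then hello#, then 1 (one value per line)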
printf formats output, similarly to C:
seq 10 | awk '{printf "->%4s\n", $1}'
Iterating over the lines, words, and characters of a file
1. Iterate over each line of a file
while loop method:
while read line; do
  echo $line;
done < file.txt
Or run it in a subshell:
cat file.txt | (while read line; do echo $line; done)
awk method:
cat file.txt | awk '{print}'
2. Iterate over each word in a line
for word in $line; do
  echo $word;
done
3. Iterate over each character in a word
${string:start_pos:num_of_chars}: extract a character from a string (bash text slicing)
${#word}: the length of the variable word
for ((i = 0; i < ${#word}; i++)); do
  echo ${word:i:1};
done