Linux Shell text-processing tools: highlights

Source: Internet
Author: User
Unlike Windows, where most tasks are done through graphical windows, Linux work is largely done with commands, and there are many of them. This article describes the tools most commonly used to process text in the Linux shell: find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk.

The examples and parameters given here are the most common and practical ones.
My rule for shell scripts is to write each command as a one-liner and to try not to exceed two lines.
For more complex tasks, consider Python.

find: file search

• Find .txt and .pdf files

find . \( -name "*.txt" -o -name "*.pdf" \) -print
• Regular-expression search for .txt and .pdf files (GNU find's default regex syntax needs \| for alternation)

find . -regex ".*\(\.txt\|\.pdf\)$"

-iregex works like -regex but ignores case.
• Negate a test with !
Find everything whose name does not end in .txt:

find . ! -name "*.txt" -print
• Limit the search depth
Print the files in the current directory only (depth 1):

find . -maxdepth 1 -type f
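To try the name-based tests above, here is a minimal sketch that builds a throwaway directory with hypothetical file names and runs each command on it. It assumes bash and GNU find (for -regex with its default Emacs-style syntax, and for -maxdepth).

```shell
# Create a scratch tree with hypothetical file names and print its path.
make_tree() {
    local dir
    dir=$(mktemp -d)
    mkdir -p "$dir/sub"
    touch "$dir/a.txt" "$dir/b.pdf" "$dir/c.log" "$dir/sub/d.txt"
    printf '%s\n' "$dir"
}

# Each helper runs find inside the given directory in a subshell, so the
# caller's working directory is untouched, and sorts the output byte-wise.
by_name()   { (cd "$1" && find . \( -name "*.txt" -o -name "*.pdf" \) -print | LC_ALL=C sort); }
by_regex()  { (cd "$1" && find . -regex ".*\(\.txt\|\.pdf\)$" | LC_ALL=C sort); }
not_txt()   { (cd "$1" && find . -type f ! -name "*.txt" | LC_ALL=C sort); }
top_level() { (cd "$1" && find . -maxdepth 1 -type f | LC_ALL=C sort); }

dir=$(make_tree)
by_name "$dir"      # ./a.txt, ./b.pdf, ./sub/d.txt
not_txt "$dir"      # ./b.pdf, ./c.log
top_level "$dir"    # ./a.txt, ./b.pdf, ./c.log
```

The -o version and the -regex version select the same files; which one reads better is mostly a matter of taste.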
Custom Search
• Search by type:

find . -type d -print    # list only directories

-type f matches regular files; -type l matches symbolic links.
• Search by time (in days; -amin, -mmin, and -cmin take minutes instead):
-atime: access time
-mtime: modification time (the content changed)
-ctime: change time (the metadata, such as permissions, changed)
A number N means N days ago (rounded down), -N means less than N days ago, and +N means more than N days ago.

All files accessed in the last seven days:

find . -atime -7 -type f -print

• Search by size (units: c bytes, w two-byte words, k kilobytes of 1024 bytes, M megabytes, G gigabytes):
Find files larger than 2 KB:

find . -type f -size +2k

• Search by permission:

find . -type f -perm 644 -print    # files whose permission bits are exactly 644 (rw-r--r--)

• Search by user:

find . -type f -user weber -print    # files owned by the user weber
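A minimal sketch exercising the type, time, size, permission, and user tests on a scratch tree. The file names are hypothetical; it assumes GNU find and GNU touch (for touch -d "10 days ago" and -perm /111), and it uses -mtime rather than -atime because mount options such as relatime can keep access times from updating.

```shell
# Scratch tree with hypothetical names; touch -d is GNU coreutils syntax.
dir=$(mktemp -d)
mkdir "$dir/docs"
printf 'small\n' > "$dir/small.txt"
head -c 3000 /dev/zero > "$dir/big.bin"    # 3000 bytes, rounded up to 3k
touch -d "10 days ago" "$dir/old.txt"      # empty file, modified 10 days ago
chmod 644 "$dir/small.txt" "$dir/old.txt"
chmod 755 "$dir/big.bin"

# Run find with the given tests inside the scratch tree; sort for stable output.
list() { (cd "$dir" && find . "$@" | LC_ALL=C sort); }

list -type d                     # . and ./docs
list -type f -mtime +7           # modified more than 7 days ago: ./old.txt
list -type f -mtime -7           # modified less than 7 days ago: ./big.bin, ./small.txt
list -type f -size +2k           # larger than 2 KiB: ./big.bin
list -type f -perm 644           # permission bits exactly rw-r--r--
list -type f -perm /111          # any execute bit set (GNU find): ./big.bin
list -type f -user "$(id -un)"   # owned by the current user: all three files
```

Note the contrast between -perm 644 (an exact match) and -perm /111, which is the usual way to ask for "executable" files.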

Acting on the results

• Delete:

Delete all .swp files under the current directory:

find . -type f -name "*.swp" -delete

• Run a command on each match (the powerful -exec)

find . -type f -user root -exec chown weber {} \;    # give root's files under the current directory to weber

Note: {} is a special string; for each matching file, it is replaced with that file's name, and \; marks the end of the command.
For example, copy the .txt files modified more than 10 days ago into the OLD directory:

find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;
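A small sketch of the -exec cp example, assuming GNU touch to backdate one hypothetical file; the OLD directory is created just for the demo.

```shell
# Scratch tree; the OLD directory and file names are hypothetical.
dir=$(mktemp -d)
mkdir "$dir/OLD"
touch -d "20 days ago" "$dir/stale.txt"    # GNU touch date syntax
touch "$dir/fresh.txt"

# {} is replaced by each matching path; \; ends the -exec command.
# cp gives the copy a fresh mtime, so OLD/stale.txt is not matched again.
(cd "$dir" && find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;)

ls "$dir/OLD"    # stale.txt
```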

• Combine multiple commands
Tip: -exec runs a single command, so to run several commands on each file, put them in one script and call that script from -exec:

-exec ./commands.sh {} \;

• Delimiters of -print
-print separates file names with '\n' by default;
-print0 separates them with '\0' instead, so names containing spaces are handled safely (for example, when piped to xargs -0).
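The difference between -print and -print0 shows up as soon as a name contains a space. This sketch (hypothetical file names, GNU find and xargs assumed) feeds the same results to xargs both ways:

```shell
# Hypothetical names; one contains a space.
dir=$(mktemp -d)
touch "$dir/my notes.txt" "$dir/plain.txt"

# -print ends names with '\n', and plain xargs also splits on blanks,
# so "my notes.txt" arrives as two arguments.
split_print()  { (cd "$dir" && find . -name "*.txt" -print  | xargs printf '[%s]\n' | LC_ALL=C sort); }

# -print0 ends names with '\0', and xargs -0 splits only on '\0'.
split_print0() { (cd "$dir" && find . -name "*.txt" -print0 | xargs -0 printf '[%s]\n' | LC_ALL=C sort); }

split_print     # [./my] [./plain.txt] [notes.txt]
split_print0    # [./my notes.txt] [./plain.txt]
```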

grep: text search
grep match_pattern file    # by default, prints the matching lines

• Common parameters
-o prints only the matched text, vs. -v, which prints only the lines that do not match
-c counts the lines that contain the text:

grep -c "text" filename

-n prints the line number of each match
-i ignores case while searching
-l prints only the names of the files that contain a match
• Recursive search through nested directories (a programmer's favorite way to search code):

grep -R -n "class" .
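A sketch that runs the common grep options on a tiny hypothetical source tree, assuming GNU grep:

```shell
# Hypothetical source tree.
dir=$(mktemp -d)
mkdir "$dir/src"
printf 'class A\nint x;\nclass B\n' > "$dir/src/a.cpp"
printf 'no match here\n' > "$dir/src/b.cpp"
f="$dir/src/a.cpp"

grep -c "class" "$f"        # 2 (matching lines, not occurrences)
grep -n "class" "$f"        # 1:class A and 3:class B
grep -o "cl[a-z]*" "$f"     # only the matched text: class, class
grep -v "class" "$f"        # int x;
grep -i "CLASS" "$f"        # matches despite the case difference
(cd "$dir" && grep -R -n "class" .)    # ./src/a.cpp:1:class A and ./src/a.cpp:3:class B
(cd "$dir" && grep -R -l "class" .)    # ./src/a.cpp
```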

• Match multiple patterns

grep -e "class" -e "virtual" file

• Print the names of matching files terminated by '\0' (-Z), for example to delete them safely:

grep "test" file* -lZ | xargs -0 rm
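A minimal sketch of the -lZ | xargs -0 rm pipeline on hypothetical files, one of which has a space in its name (GNU grep and xargs assumed):

```shell
# Hypothetical files; "file one" contains a space.
dir=$(mktemp -d)
printf 'test\n'  > "$dir/file one"
printf 'test\n'  > "$dir/file2"
printf 'other\n' > "$dir/file3"

# -l lists the names of matching files and -Z ends each with '\0', so
# xargs -0 hands "file one" to rm as a single argument.
(cd "$dir" && grep "test" file* -lZ | xargs -0 rm)

ls "$dir"    # file3
```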

xargs: converting input into command-line arguments

xargs converts input data into command-line arguments for a given command, which lets it be combined with many other commands, such as grep and find.

• Convert multi-line output into a single line
cat file.txt | xargs
The newlines ('\n') in the input act as delimiters, so the lines are joined with spaces.
• Convert a single line into multiple lines of output
cat single.txt | xargs -n 3
-n sets the number of fields printed on each line.
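The conversions above can be checked with inline input instead of files; -d in the last line is a GNU xargs option.

```shell
# Several lines -> one line: xargs splits on newlines and blanks and passes
# everything to its default command, echo.
printf 'a\nb\nc\nd\ne\nf\ng\n' | xargs        # a b c d e f g

# One line -> rows of at most 3 arguments each.
printf 'a b c d e f g\n' | xargs -n 3          # a b c / d e f / g

# -d sets a custom delimiter (GNU xargs); here, split on commas.
printf 'x,y,z' | xargs -d , -n 1               # x / y / z
```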
xargs parameters
-d sets the input delimiter (by default, items are separated by blanks and newlines)
-n sets the maximum number of arguments per command line, which splits the output into multiple lines
-I {} sets a replacement string: each {} in the command is replaced by one input line. This helps when the command to run takes several arguments and the input has to go in a particular position.
For example:


cat file.txt | xargs -I {} ./command.sh -p {} -1

-0 uses '\0' as the input delimiter.
For example, count the lines of C++ source code:

find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l
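A sketch of -I and -0, assuming GNU xargs. printf stands in for the hypothetical ./command.sh so the output is visible, and the .cpp files are made up for the demo.

```shell
# -I {}: run the command once per input line, substituting {} where it appears.
printf 'a\nb\n' | xargs -I {} printf 'command.sh -p %s -1\n' {}
# command.sh -p a -1
# command.sh -p b -1

# -0 with find -print0: count lines in .cpp files even if names contain spaces.
dir=$(mktemp -d)
printf 'int main() {\n  return 0;\n}\n' > "$dir/main file.cpp"    # 3 lines
printf '// empty\n' > "$dir/util.cpp"                                # 1 line
find "$dir" -type f -name "*.cpp" -print0 | xargs -0 wc -l | tail -n 1    # 4 total
```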

sort: sorting

Options:
-n sorts numerically, vs. -d, which sorts in dictionary order (only blanks and alphanumeric characters are considered)
-r reverses the order
-k N sorts by column N
For example:

sort -nrk 1 data.txt    # numeric, reverse order, by column 1
sort -bd data           # -b ignores leading blanks such as spaces
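A quick sketch contrasting the sort orders on inline data. LC_ALL=C forces byte-wise comparison so the lexicographic result does not depend on the locale, and -k2,2 (an assumption beyond the text's -k N) restricts the key to column 2 only.

```shell
# Byte-wise (lexicographic) order; LC_ALL=C makes the result locale-independent.
printf '10\n9\n100\n' | LC_ALL=C sort    # 10, 100, 9
printf '10\n9\n100\n' | sort -n          # 9, 10, 100
printf '10\n9\n100\n' | sort -nr         # 100, 10, 9

# Sort by the second column only (-k2,2), numerically.
printf 'b 2\na 10\nc 1\n' | sort -k2,2n  # c 1, b 2, a 10
```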

uniq: removing duplicate lines
• Remove duplicate lines (uniq only compares adjacent lines, so sort first)

sort unsort.txt | uniq

• Count how many times each line appears in a file

sort unsort.txt | uniq -c

• Print only the duplicated lines

sort unsort.txt | uniq -d

You can restrict which part of each line is compared: -s N skips the first N characters, and -w N compares at most N characters.
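A sketch of uniq on hypothetical sample data; -w is a GNU uniq option.

```shell
# Hypothetical sample data with repeated lines.
sample() { printf '%s\n' b a b c a b; }

sample | uniq | wc -l               # 6: without sort, no two equal lines are adjacent
sample | sort | uniq                # a, b, c
sample | sort | uniq -c             # 2 a, 3 b, 1 c (counts are space-padded)
sample | sort | uniq -d             # a, b: lines that occur more than once

# -s N skips the first N characters; -w N compares at most N characters (GNU uniq).
printf '%s\n' x-apple y-apple | uniq -s 2    # x-apple
printf '%s\n' abc1 abc2 | uniq -w 3          # abc1
```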
tr: translating characters

• Common usage


echo 12345 | tr '0-9' '9876543210'    # encryption/decryption: replace each character with its counterpart
cat text | tr '\t' ' '                # convert tabs to spaces

• Delete characters with tr -d

cat file | tr -d '0-9'    # delete all digits

• -c: complement the set

cat file | tr -c '0-9\n' ' '    # replace every character except digits and newlines with a space
cat file | tr -d -c '0-9\n'     # delete everything except digits and newlines

• Squeeze repeated characters with tr -s
tr -s collapses each run of a repeated character into one; it is most often used to squeeze out redundant spaces:

cat file | tr -s ' '

• Character classes

tr supports these character classes:
alnum: letters and digits
alpha: letters
digit: digits
space: whitespace
lower: lowercase letters
upper: uppercase letters
cntrl: control (non-printable) characters
print: printable characters
Usage: tr '[:class:]' '[:class:]'
For example:

tr '[:lower:]' '[:upper:]'
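A sketch that runs each tr form above on inline input. The '9876543210' table is one possible substitution mapping, chosen here for illustration because applying it twice restores the original.

```shell
# Substitution table: 0->9, 1->8, ..., 9->0; applying it twice undoes it.
echo 12345 | tr '0-9' '9876543210'                              # 87654
echo 12345 | tr '0-9' '9876543210' | tr '0-9' '9876543210'      # 12345

printf 'a\tb\n' | tr '\t' ' '                     # a b (tab -> space)
echo 'a1b2c3' | tr -d '0-9'                       # abc
echo 'a1b2c3' | tr -cd '0-9\n'                    # 123
echo 'a   b    c' | tr -s ' '                     # a b c
echo 'Hello World' | tr '[:lower:]' '[:upper:]'   # HELLO WORLD
```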

cut: splitting text by column
• Extract the 2nd and 4th columns of a file:

cut -f2,4 filename
• Print all columns except the 3rd:

cut -f3 --complement filename
• -d specifies the delimiter:

cut -f2 -d ";" filename
• Ranges
N-: from the Nth field to the end
-M: from the 1st field to the Mth
N-M: from the Nth field to the Mth
• Units
-b bytes
-c characters
-f fields (split on the delimiter)

• For example:

cut -c1-5 file    # print the 1st to 5th characters
cut -c-2 file     # print the first 2 characters
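A sketch of the cut forms above on a tab-separated line (GNU cut assumed for --complement):

```shell
# A tab-separated line (tab is cut's default delimiter).
line=$(printf 'c1\tc2\tc3\tc4')

printf '%s\n' "$line" | cut -f2,4              # c2<TAB>c4
printf '%s\n' "$line" | cut -f3 --complement   # c1<TAB>c2<TAB>c4 (GNU cut)
printf '%s\n' "$line" | cut -f2-               # c2<TAB>c3<TAB>c4
echo 'x;y;z' | cut -d ';' -f2                  # y
echo 'abcdefg' | cut -c1-5                     # abcde
echo 'abcdefg' | cut -c-2                      # ab
```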

paste: joining text by column
Join two files column by column:

cat file1
1
2

cat file2
colin
book

paste file1 file2
1	colin
2	book

The default delimiter is a tab; use -d to specify another one:
paste -d "," file1 file2
1,colin
2,book
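A sketch that recreates the paste example with two hypothetical files:

```shell
# Hypothetical input files.
dir=$(mktemp -d)
printf '1\n2\n' > "$dir/file1"
printf 'colin\nbook\n' > "$dir/file2"

paste "$dir/file1" "$dir/file2"         # 1<TAB>colin, 2<TAB>book
paste -d , "$dir/file1" "$dir/file2"    # 1,colin, 2,book
```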

wc: counting lines, words, and bytes
wc -l file    # count lines
wc -w file    # count words
wc -c file    # count bytes (use -m to count characters)
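A sketch of the three wc counts on a small inline sample; the expected numbers are worked out in the comment ("one two\n" is 8 bytes, "three\n" is 6).

```shell
# Hypothetical sample: 2 lines, 3 words, 14 bytes.
sample() { printf 'one two\nthree\n'; }

# tr -d ' ' strips the padding some wc implementations add.
sample | wc -l | tr -d ' '    # 2
sample | wc -w | tr -d ' '    # 3
sample | wc -c | tr -d ' '    # 14
```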

sed: text replacement
• Replace the first match

sed 's/text/replace_text/' file    # replace the first match on each line

• Global replacement


sed 's/text/replace_text/g' file

By default, sed prints the result to standard output. To edit the original file in place, use -i:

sed -i 's/text/replace_text/g' file
• Remove blank lines:

sed '/^$/d' file
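A sketch of the basic substitutions and in-place editing on a scratch file, assuming GNU sed:

```shell
printf 'aaa\n' | sed 's/a/b/'     # baa: only the first match on the line
printf 'aaa\n' | sed 's/a/b/g'    # bbb: every match on the line

# Hypothetical scratch file, edited in place. This -i form is GNU sed;
# BSD/macOS sed expects an explicit suffix argument such as -i ''.
f=$(mktemp)
printf 'old text\n\nold again\n' > "$f"
sed -i 's/old/new/g' "$f"
sed -i '/^$/d' "$f"               # drop the empty line
cat "$f"                          # new text / new again
```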
• The matched string
& refers to the string matched by the pattern:

echo this is an example | sed 's/\w\+/[&]/g'
$> [this] [is] [an] [example]

• Substring back-references
\1 refers to the text matched by the first \(...\) group:

sed 's/hello\([0-9]\)/\1/'
• Double quotes
sed scripts are usually wrapped in single quotes, but double quotes also work; inside double quotes, the shell expands variables and other expressions before sed sees them:

sed "s/$var/HELLO/"

With double quotes, shell variables can be used in both the sed pattern and the replacement string.
For example:

p=pattern
r=replaced
echo "line contains a pattern" | sed "s/$p/$r/g"
$> line contains a replaced

• Other examples
Insert a character into a string: convert each line such as PEKSHA into PEK/SHA

sed 's/^.\{3\}/&\//g' file
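A sketch of &, back-references, double quotes, and the PEKSHA example, assuming GNU sed (\w and \+ are GNU extensions to basic regular expressions):

```shell
# & is the whole match.
echo 'this is an example' | sed 's/\w\+/[&]/g'      # [this] [is] [an] [example]

# \1 is the text captured by the first \( \) group.
echo 'hello7' | sed 's/hello\([0-9]\)/\1/'          # 7

# In double quotes the shell expands $p and $r before sed runs.
p=pattern
r=replaced
echo 'line contains a pattern' | sed "s/$p/$r/g"    # line contains a replaced

# & followed by an escaped slash inserts "/" after the first three characters.
echo 'PEKSHA' | sed 's/^.\{3\}/&\//g'               # PEK/SHA
```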

awk: data-stream processing

• Structure of an awk script

awk 'BEGIN { statements } { statements2 } END { statements }'

• How it works

1. Execute the BEGIN block.
2. Read a line from a file or stdin and execute statements2; repeat until all input has been read.
3. Execute the END block.

Printing the current line

• print with no arguments prints the current line:

echo -e "line1\nline2" | awk 'BEGIN { print "start" } { print } END { print "end" }'
• When print's arguments are separated by commas, they are printed separated by spaces:

echo | awk '{ var1 = "v1"; var2 = "v2"; var3 = "v3"; \
print var1, var2, var3 }'
$> v1 v2 v3

• Strings written next to each other are concatenated, so a literal such as "-" can be placed between them:

echo | awk '{ var1 = "v1"; var2 = "v2"; var3 = "v3"; \
print var1 "-" var2 "-" var3 }'
$> v1-v2-v3

Special variables: NR, NF, $0, $1, $2

NR: the number of records read so far, i.e. the current line number during execution;
NF: the number of fields in the current line;
$0: the text of the current line;
$1: the text of the first field;
$2: the text of the second field.


echo -e "line1 f2 f3\nline2\nline 3" | awk '{ print NR ": " $0 "-" $1 "-" $2 }'

• Print the second and third fields of each line:

awk '{ print $2, $3 }' file
• Count the lines in a file:

awk 'END { print NR }' file
• Sum the first field of every line:

echo -e "1\n2\n3\n4" | awk 'BEGIN { sum = 0;
print "begin" } { sum += $1 } END { print "="; print sum }'
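A sketch that checks the BEGIN/main/END order and the special variables on inline input:

```shell
# BEGIN runs once before input, the main block once per line, END once after.
printf 'line1\nline2\n' | awk 'BEGIN { print "start" } { print } END { print "end" }'

# NR is the line number, NF the field count, $1 the first field.
printf 'a b c\nd e\n' | awk '{ print NR ": NF=" NF ", $1=" $1 }'
# 1: NF=3, $1=a
# 2: NF=2, $1=d

# Count lines (like wc -l) and sum the first column.
printf '1\n2\n3\n4\n' | awk 'END { print NR }'                   # 4
printf '1\n2\n3\n4\n' | awk '{ sum += $1 } END { print sum }'    # 10
```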

Passing external variables


var=1000
echo | awk '{ print vara }' vara=$var    # input from stdin
awk '{ print vara }' vara=$var file      # input from file

Filtering the lines awk processes with patterns

awk 'NR < 5'                        # lines whose line number is less than 5
awk 'NR==1,NR==4 { print }' file    # lines 1 through 4
awk '/linux/'                       # lines containing "linux" (any regular expression can be used; very powerful)
awk '!/linux/'                      # lines that do not contain "linux"

Setting the delimiter
Use -F to set the field delimiter (whitespace by default):
awk -F: '{ print $NF }' /etc/passwd
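A sketch of passing variables, pattern filters, and -F on inline input; the passwd-style line is made up for the demo.

```shell
var=1000
echo | awk '{ print vara }' vara="$var"                          # 1000
seq 10 | awk 'NR < 5'                                            # 1 2 3 4
seq 10 | awk 'NR==1,NR==4'                                       # 1 2 3 4 (a range)
printf '%s\n' linux windows 'linux rocks' | awk '/linux/'        # linux, linux rocks
printf '%s\n' linux windows 'linux rocks' | awk '!/linux/'       # windows

# A hypothetical /etc/passwd-style line; $NF is the last field.
printf 'root:x:0:0:root:/root:/bin/bash\n' | awk -F: '{ print $NF }'    # /bin/bash
```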

Reading command output
Use getline to read the output of an external shell command into the variable cmdout:

echo | awk '{ "grep root /etc/passwd" | getline cmdout; print cmdout }'

Using loops in awk
for (i = 0; i < 10; i++) { print $i }
for (i in array) { print array[i] }

For example, print lines in reverse order (an implementation of the tac command):

seq 9 | \
awk '{ lifo[NR] = $0; lno = NR } \
END { for (; lno > 0; lno--) { print lifo[lno] }
}'

Implementing head and tail with awk
• head:

awk 'NR <= 10 { print }' filename

• tail:

awk '{ buffer[NR % 10] = $0 } END { start = NR > 10 ? NR - 9 : 1; \
for (i = start; i <= NR; i++) print buffer[i % 10] }' filename
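A sketch of the tac, head, and tail implementations. The tail version keeps a ring buffer of 10 lines and computes the first of the last 10 line numbers, so the output stays in order for any input length; awk_tail is a hypothetical helper name.

```shell
# tac: remember every line, then print from the last to the first.
seq 5 | awk '{ lifo[NR] = $0 } END { for (i = NR; i > 0; i--) print lifo[i] }'

# head: the first 10 lines.
seq 25 | awk 'NR <= 10'

# tail: a ring buffer holding the last 10 lines; start is the first of
# them (or line 1 when the input is shorter than 10 lines).
awk_tail() {
    awk '{ buffer[NR % 10] = $0 }
         END { start = NR > 10 ? NR - 9 : 1
               for (i = start; i <= NR; i++) print buffer[i % 10] }'
}
seq 25 | awk_tail    # 16 .. 25
seq 3 | awk_tail     # 1 2 3
```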

Printing a specific column
• awk implementation:

ls -lrt | awk '{ print $6 }'
• cut implementation (cut's default delimiter is a tab, so squeeze the runs of spaces in the ls output first):

ls -lrt | tr -s ' ' | cut -d ' ' -f6
Printing a range of text
• By line number:

seq 100 | awk 'NR==4,NR==6 { print }'
• By pattern
Print the text from a line matching start_pattern through a line matching end_pattern:

awk '/start_pattern/,/end_pattern/' filename

For example:

seq 100 | awk '/13/,/15/'
cat /etc/passwd | awk '/mai.*mail/,/news.*news/'
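A sketch of both range styles; the marker text in the last command is hypothetical.

```shell
seq 100 | awk 'NR==4,NR==6'    # 4 5 6
seq 100 | awk '/13/,/15/'      # 13 14 15

# Hypothetical marked-up text: print from the BEGIN marker through the END marker.
printf '%s\n' header BEGIN 'body 1' 'body 2' END footer | awk '/^BEGIN$/,/^END$/'
```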

Common awk built-in functions

index(string, search_string): returns the position where search_string first appears in string;
sub(regex, replacement_str, string): replaces the first match of regex in string with replacement_str;
match(string, regex): checks whether the regular expression matches the string, returning the start position of the match (0 if none);
length(string): returns the length of the string.


echo | awk '{ "grep root /etc/passwd" | getline cmdout; print length(cmdout) }'

printf works like printf in C.
For example:

seq 10 | awk '{ printf "->%4s\n", $1 }'
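A sketch calling each built-in function above on literal strings. It reads RSTART and RLENGTH, the variables POSIX awk's match() sets, in a separate statement after the call:

```shell
echo | awk '{ print index("hello world", "world") }'                 # 7
echo | awk '{ s = "aaa"; sub(/a/, "b", s); print s }'                 # baa (first match only)
echo | awk '{ match("foo123", /[0-9]+/); print RSTART, RLENGTH }'     # 4 3
echo | awk '{ print length("hello") }'                                # 5
seq 3 | awk '{ printf "->%4s\n", $1 }'                                # "->   1" and so on
```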

Iterating over the lines, words, and characters of a file

1. Iterate over each line of a file

• While loop method

while read -r line;
do
echo "$line";
done < file.txt
Or, run the loop in a subshell:
cat file.txt | (while read -r line; do echo "$line"; done)

• awk method:
cat file.txt | awk '{ print }'

2. Iterate over each word in a line


for word in $line;
do
echo $word;
done

3. Iterate over each character
${string:start_pos:num_of_chars}: extracts a substring from the string (bash substring expansion);
${#word}: returns the length of the variable word.


for ((i = 0; i < ${#word}; i++))
do
echo ${word:i:1};
done
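A sketch putting the three loops together as bash functions (each_line, each_word, and each_char are hypothetical names); the character loop needs bash for ${w:i:1} and (( )).

```shell
# Hypothetical input file.
dir=$(mktemp -d)
printf 'alpha beta\ngamma\n' > "$dir/file.txt"

# Lines: IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
each_line() {
    local line
    while IFS= read -r line; do
        printf 'line: %s\n' "$line"
    done < "$1"
}

# Words: $1 is deliberately unquoted so the shell splits it on whitespace.
each_word() {
    local word
    for word in $1; do
        printf 'word: %s\n' "$word"
    done
}

# Characters (bash only): ${#w} is the length, ${w:i:1} one character.
each_char() {
    local w=$1 i
    for ((i = 0; i < ${#w}; i++)); do
        printf 'char: %s\n' "${w:i:1}"
    done
}

each_line "$dir/file.txt"    # line: alpha beta / line: gamma
each_word 'alpha beta'       # word: alpha / word: beta
each_char abc                # char: a / char: b / char: c
```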
