Examples of shell text processing commands under Linux



This article describes the most commonly used shell tools for processing text under Linux:
find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, awk.
The examples and parameters given are the most common and practical ones.
The principle I follow for shell scripting is to write it as a command line and try not to exceed 2 lines;
if the task is more complex, consider Python instead.


find: file search

Find .txt and .pdf files:
find . \( -name "*.txt" -o -name "*.pdf" \) -print
Find .txt and .pdf files with a regular expression:
find . -regex ".*\(\.txt\|\.pdf\)$"
-iregex: case-insensitive regular expression matching
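For example, the same search as above but ignoring case (assuming GNU find, where -iregex takes the same emacs-style regex as -regex):
find . -iregex ".*\(\.txt\|\.pdf\)$"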

Negation
Find all files that are not .txt:
find . ! -name "*.txt" -print
Specify search depth
Print the files in the current directory only (depth 1):
find . -maxdepth 1 -type f
Custom searches

Search by type:
find . -type d -print    # list directories only
-type f: regular file; -type l: symbolic link
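For example, to list only the symbolic links under the current directory:
find . -type l -print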

Search by time:
-atime: access time (in days; for minutes use -amin, and similarly for the options below)
-mtime: modification time (file content modified)
-ctime: change time (metadata or permissions changed)
All files accessed within the last 7 days:
find . -atime -7 -type f -print
Search by size:
size units: w (two-byte words), k, M, G
Find files larger than 2k:
find . -type f -size +2k
Find by permission:

find . -type f -perm 644 -print    # find all files with permission 644
Find by user:

find . -type f -user weber -print    # find files owned by the user weber
Actions after finding

Delete:
Delete all .swp files in the current directory:
find . -type f -name "*.swp" -delete
Execute an action (the mighty -exec):
find . -type f -user root -exec chown weber {} \;    # change ownership of files under the current directory to weber
Note: {} is a special string; for each matching file, {} is replaced with the corresponding file name.
Eg: copy all found files to another directory:

find . -type f -mtime +10 -name "*.txt" -exec cp {} old \;
Combine multiple commands
Tip: if you need to run several commands on each result, write them into a script, then call that script with -exec, as sketched below:
-exec ./commands.sh {} \;
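A minimal sketch of such a script; the name commands.sh, the backup directory, and the actions are purely illustrative:

#!/bin/bash
# commands.sh: run several commands on the single file name passed in by find
file="$1"
mkdir -p /tmp/backup                          # hypothetical backup directory
cp "$file" /tmp/backup/
gzip -f "/tmp/backup/$(basename "$file")"     # compress the copy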
-print delimiters

By default '\n' is used as the delimiter between file names;
-print0 uses the NUL character '\0' as the delimiter instead, which makes it possible to handle file names that contain spaces;
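For example, a safe way to delete files whose names may contain spaces (the "*.bak" pattern is only illustrative):
find . -type f -name "*.bak" -print0 | xargs -0 rm -f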

grep: text search

grep match_pattern file    # by default, prints the matching lines

Common parameters
-o: output only the matched parts of the text vs -v: output only the lines that do not match
-c: count the number of lines that contain the text
grep -c "text" filename
-n: print the line numbers of matches
-i: ignore case when searching
-l: print only the names of matching files
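A quick illustration of -o and -n (source.c is only a placeholder file name):
echo "this is a test" | grep -o "test"    # prints just "test"
grep -n "main" source.c                   # prints matching lines prefixed with their line numbers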

Recursively search for text in a directory tree (a programmer's favourite for searching code):
grep "class" . -r -n
Match multiple patterns:
grep -e "class" -e "virtual" file
Make grep output file names terminated with the NUL character (-Z), for use with xargs -0:
grep "test" file* -lZ | xargs -0 rm
xargs: command-line argument conversion

xargs converts its input data into command-line arguments for another command; it can be combined with many commands, such as grep and find.

Convert multi-line output into a single line:
cat file.txt | xargs
'\n' is the delimiter between the lines of text
Convert a single line into multi-line output:
cat single.txt | xargs -n 3
-n: the number of fields per output line
xargs parameter description

-d: define the delimiter (for multi-line input the default delimiter is '\n')
-n: split the output into multiple lines, n arguments per line
-I {}: specify a replacement string that is substituted when xargs runs the command; useful when the command needs its arguments in particular positions
eg

cat file.txt | xargs -I {} ./command.sh -p {} -1
-0: use the NUL character '\0' as the input delimiter
Eg: count the number of lines of source code:

find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l
sort: sorting

Common options:
-n: sort numerically vs -d: sort in dictionary order
-r: reverse order
-k N: sort by the Nth column
eg

sort -nrk 1 data.txt
sort -bd data    # ignore leading whitespace characters such as spaces
uniq: eliminate duplicate lines

Eliminate duplicate lines:
sort unsort.txt | uniq
Count how many times each line appears in the file:
sort unsort.txt | uniq -c
Show only the duplicated lines:
sort unsort.txt | uniq -d
You can specify which part of each line to compare for duplicates: -s skips a number of leading characters, -w limits the number of characters compared
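A small illustration (unsort.txt is only a placeholder): skip the first 2 characters of each line and compare at most the next 4 when deciding which lines are duplicates:
sort unsort.txt | uniq -s 2 -w 4 -c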

tr: character translation

Common usage:
echo 12345 | tr '0-9' '9876543210'    # "decrypt" by replacing each digit with its counterpart
cat text | tr '\t' ' '                # convert tabs to spaces
tr delete characters:
cat file | tr -d '0-9'    # delete all digits
-c: use the complement of the set

cat file | tr -d -c '0-9\n'    # delete everything that is not a digit or a newline (i.e. keep only the digits)
tr squeeze characters:
tr -s squeezes runs of repeated characters in the text; most commonly used to squeeze extra spaces:
cat file | tr -s ' '
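For example, squeezing repeated spaces:
echo "GNU    is    not    UNIX" | tr -s ' '
$> GNU is not UNIX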
Character classes
Various character classes are available in tr:
alnum: letters and digits
alpha: letters
digit: digits
space: whitespace characters
lower: lowercase letters
upper: uppercase letters
cntrl: control (non-printable) characters
print: printable characters
Usage: tr [:class:] [:class:]
eg: tr '[:lower:]' '[:upper:]'
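For example:
echo "hello world" | tr '[:lower:]' '[:upper:]'
$> HELLO WORLD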
cut: split text by column

Extract the 2nd and 4th columns of a file:
cut -f2,4 filename
Print all columns of the file except the 3rd:
cut -f3 --complement filename
-d specifies the delimiter:
cut -f2 -d ";" filename
cut ranges
N-: from the Nth field to the end
-M: from the 1st field to the Mth
N-M: from the Nth to the Mth field
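A small illustration of field ranges (data.csv is only a placeholder):
cut -f2- -d "," data.csv    # print from the 2nd field to the end of each line
cut -f1-3 -d "," data.csv   # print fields 1 through 3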
cut units
-b: bytes
-c: characters
-f: fields (using the delimiter)
eg
cut -c1-5 file    # print the 1st through 5th characters
cut -c-2 file     # print the first 2 characters
paste: concatenate text by column

Join two files together column by column;

cat file1
1
2

cat file2
colin
book

paste file1 file2
1 colin
2 book
The default delimiter is a tab character; use -d to specify a different delimiter:
paste file1 file2 -d ","
1,colin
2,book

wc: count lines, words, and characters

wc -l file    # count lines
wc -w file    # count words
wc -c file    # count characters

sed: the text replacement weapon

Replace the first match:
sed 's/text/replace_text/' file    # replaces the first matching text on each line
Global substitution:
sed 's/text/replace_text/g' file
By default sed prints the content after the substitution; to modify the original file in place, use -i:

sed -i 's/text/replace_text/g' file
Remove blank lines:
sed '/^$/d' file
Referencing the match
The matched string can be referenced with the special character &:
echo this is en example | sed 's/\w\+/[&]/g'
$> [this] [is] [en] [example]
Substring match tags
The content of the first matching parenthesized group is referenced with \1:
sed 's/hello\([0-9]\)/\1/'
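A quick illustration with an input string (the string itself is only an example):
echo "hello7 world" | sed 's/hello\([0-9]\)/\1/'
$> 7 world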
Double-quote evaluation
sed expressions are usually written in single quotes; double quotes can also be used, in which case the shell evaluates expressions inside them:
sed 's/$var/hello/'    # with single quotes, $var is taken literally
When double quotes are used, we can reference shell variables both in the sed pattern and in the replacement string;

eg
p=patten
r=replaced
echo "line con a patten" | sed "s/$p/$r/g"
$> line con a replaced
Other examples
Insert a character into a string: convert each line of text (e.g. PEKSHA) to PEK/SHA:
sed 's/^.\{3\}/&\//g' file
awk: data-stream processing tool

awk script structure
awk 'BEGIN{ statements } { statements2 } END{ statements }'
How it works:
1. Execute the BEGIN statement block;
2. Read a line from the file or stdin and execute statements2; repeat this step until the entire input has been read;
3. Execute the END statement block;
print: print the current line

When print is used with no arguments, the current line is printed:
echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print} END{print "End"}'
When the arguments of print are separated by commas, they are joined with a space in the output:
echo | awk '{var1 = "v1"; var2 = "v2"; var3 = "v3"; \
print var1, var2, var3; }'
$> v1 v2 v3
Using "-" as a joining string (in awk, adjacent strings in print are simply concatenated):
echo | awk '{var1 = "v1"; var2 = "v2"; var3 = "v3"; \
print var1 "-" var2 "-" var3; }'
$> v1-v2-v3
Special variables: NR, NF, $0, $1, $2

NR: the record number; during execution it corresponds to the current line number;
NF: the number of fields; during execution it corresponds to the number of fields on the current line;
$0: this variable holds the text of the entire current line;
$1: the text of the first field;
$2: the text of the second field;

Echo-e "line1 f2 f3\n line2 \ n Line 3" | awk ' {print NR ': ' $ '-' $ '-' $} '
Print the second and third fields for each row:
awk ' {print $, $} ' file
Number of rows in the statistics file:
awk ' End {print NR} ' file
Add the first field of each row:
Echo-e "1\n 2\n 3\n 4\n" | awk ' begin{num = 0;
print "Begin"; {sum = $;} End {print = =; Print Sum} '
Passing external variables

var=1000
echo | awk '{print vara}' vara=$var    # input from stdin
awk '{print vara}' vara=$var file      # input from a file
Filtering the lines that awk processes with patterns

awk 'NR < 5'                      # lines whose line number is less than 5
awk 'NR==1,NR==4 {print}' file    # print lines 1 through 4
awk '/linux/'                     # lines containing the text "linux" (regular expressions can be used; very powerful)
awk '!/linux/'                    # lines not containing the text "linux"

Set delimiter

Use -F to set the field delimiter (the default is whitespace):
awk -F: '{print $NF}' /etc/passwd

Read command output

Using getline, the output of an external shell command can be read into the variable cmdout;

echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout}'
Using loops in awk

for(i=0; i<10; i++) {print $i;}
for(i in array) {print array[i];}

eg
Print lines in reverse order (an implementation of the tac command):

seq 9 | \
awk '{lifo[NR] = $0; lno = NR} \
END{ for(; lno > -1; lno--){print lifo[lno];} \
}'
awk implementations of head and tail

head:
awk 'NR<=10{print}' filename
tail:
awk '{buffer[NR%10] = $0;} END{for(i=NR+1; i<=NR+10; i++){ \
print buffer[i%10]} }' filename
Print the specified column

awk implementation:
ls -lrt | awk '{print $6}'
cut implementation:
ls -lrt | cut -f6
Print a specified region of the text

By line number:
seq 100 | awk 'NR==4,NR==6{print}'
By text pattern:
Print the text between start_pattern and end_pattern:
awk '/start_pattern/,/end_pattern/' filename
eg

seq 100 | awk '/13/,/15/'
cat /etc/passwd | awk '/mai.*mail/,/news.*news/'
awk commonly used built-in functions

index(string, search_string): returns the position at which search_string occurs in string
sub(regex, replacement_str, string): replaces the first regex match in string with replacement_str;
match(regex, string): checks whether the regular expression matches the string;
length(string): returns the length of the string
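A small sketch showing these functions on an example string (the string "hello linux" is only an illustration):
echo | awk '{
  str = "hello linux";
  print index(str, "linux");    # 7: position where "linux" starts
  print match(str, "lin");      # 7: position of the regex match (also sets RSTART/RLENGTH)
  sub("linux", "unix", str);    # replace the first match inside str
  print str;                    # hello unix
  print length(str);            # 10
}'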

echo | awk '{"grep root /etc/passwd" | getline cmdout; print length(cmdout)}'
printf: similar to printf in C, formats the output
eg

seq 10 | awk '{printf "->%4s\n", $1}'
Iterating over the lines, words, and characters of a file

1. Iterate over each line of a file

while loop method:
while read line;
do
echo $line;
done < file.txt
Running it in a subshell instead:
cat file.txt | (while read line; do echo $line; done)
awk method:
cat file.txt | awk '{print}'
2. Iterate over each word in a line

for word in $line;
do
echo $word;
done
3. Iterate over each character

${string:start_pos:num_of_chars}: extract a substring from string (bash text slicing);
${#word}: returns the length of the variable word

for ((i=0; i<${#word}; i++))
do
echo ${word:i:1};
done
This article is a set of reading notes on a Linux shell scripting introduction; the main content and examples are drawn from that source.
