This article describes the most common tools for processing text with the shell on Linux:
find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, awk;
The examples and parameters shown are the most common and practical ones;
My principle for shell scripts is to write single-line commands and try not to exceed two lines;
For more complex tasks, consider Python;
find: file search
Find txt and PDF files
find . \( -name "*.txt" -o -name "*.pdf" \) -print
Find .txt and .pdf files with a regular expression:
find . -regex ".*\(\.txt\|\.pdf\)$"
-iregex: case-insensitive version of -regex.
Negation
Search for all files that are not .txt:
find . ! -name "*.txt" -print
Specify search depth
Print only the files in the current directory (depth 1):
find . -maxdepth 1 -type f
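A quick way to see -maxdepth in action, using a throwaway directory (the file and directory names are arbitrary):

```shell
# Build a tiny tree: one file at depth 1, one at depth 2.
dir=$(mktemp -d)
touch "$dir/top.txt"
mkdir "$dir/sub"
touch "$dir/sub/deep.txt"

# Lists only top.txt; deep.txt is below the depth limit.
find "$dir" -maxdepth 1 -type f

rm -r "$dir"
```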
Custom Search
Search by type:
find . -type d -print    # list only directories
-type f matches regular files; -type l matches symbolic links.
Search by time:
-atime: access time (in days; -amin is the same in minutes, and likewise for the options below)
-mtime: modification time (content changed)
-ctime: change time (metadata or permissions changed)
All files accessed within the last 7 days:
find . -atime -7 -type f -print
Search by size:
Units: b (512-byte blocks), c (bytes), w (2-byte words), k, M, G.
Search for files larger than 2 K
find . -type f -size +2k
Search by permission:
find . -type f -perm 644 -print    # find all files with permission 644
Search by user:
find . -type f -user weber -print    # find files owned by the user weber
Subsequent actions after finding
Delete:
Delete all swp files in the current directory:
find . -type f -name "*.swp" -delete
Execute action (powerful exec)
find . -type f -user root -exec chown weber {} \;    # change the ownership of root's files in the current directory to weber
Note: {} is a special string. For each matching file, {} is replaced with the corresponding file name;
Eg: copy all the files found to another directory:
find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;
Combining multiple commands
Tip: if you need to run several commands on each match, write them into a script and let -exec invoke the script:
-exec ./commands.sh {} \;
Delimiters for -print
-print uses '\n' as the delimiter between file names by default;
-print0 uses '\0' as the delimiter, which makes it safe to handle file names that contain spaces.
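A sketch of why -print0 matters, using a file name that contains a space (the directory and names are made up):

```shell
dir=$(mktemp -d)
touch "$dir/a file.txt" "$dir/plain.txt"

# With -print0 / xargs -0, the name "a file.txt" stays one argument
# instead of being split into "a" and "file.txt".
find "$dir" -type f -name "*.txt" -print0 | xargs -0 wc -l

rm -r "$dir"
```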
grep: text search
grep match_pattern file    # prints matching lines by default
Common Parameters
-o prints only the matched parts of the text VS -v prints only the lines that do not match
-c counts the number of matching lines:
grep -c "text" filename
-n prints the line number of each match
-i makes the search case-insensitive
-l prints only the names of files that match
Recursive text search in multi-level directories (a programmer's favorite search code ):
grep "class" . -R -n
Match multiple patterns:
grep -e "class" -e "vitural" file
Print matching file names terminated by \0 (-Z) so that xargs -0 can consume them:
grep "test" file* -lZ| xargs -0 rm
xargs: command-line argument conversion
xargs converts its input into command-line arguments for another command, which lets it combine with many commands, such as grep and find.
Convert multi-line output into a single line:
cat file.txt | xargs
'\n' is the delimiter between the lines of input.
Convert a single line into multiple lines:
cat single.txt | xargs -n 3
-n specifies the number of fields per output line.
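For example, xargs -n reflows a stream of tokens into fixed-width rows:

```shell
echo 1 2 3 4 5 6 | xargs -n 3
# 1 2 3
# 4 5 6
```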
xargs parameter notes
-d defines the input delimiter (by default whitespace; '\n' separates multiple lines)
-n splits the output into multiple lines of N arguments each
-I {} specifies a replacement string that is substituted where {} appears when the command runs; use it when the command needs its arguments in specific positions
Eg:
cat file.txt | xargs -I {} ./command.sh -p {} -1
-0 specifies '\0' as the input delimiter.
Eg: count the number of program lines
find source_dir/ -type f -name "*.cpp" -print0 |xargs -0 wc -l
sort: sorting
Options:
-n sort numerically VS -d sort lexicographically
-r reverse the sort order
-k N sort by column N
Eg:
sort -nrk 1 data.txt
sort -bd data.txt    # ignore leading blank characters such as spaces
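A small demonstration of numeric reverse sort on the first column (the sample data is made up):

```shell
printf '3 c\n1 a\n2 b\n' | sort -nrk 1
# 3 c
# 2 b
# 1 a
```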
uniq: eliminate duplicate lines
Eliminate duplicate lines:
sort unsort.txt | uniq
Count how many times each line appears in the file:
sort unsort.txt | uniq -c
Find the duplicated lines:
sort unsort.txt | uniq -d
You can restrict which part of each line is compared: -s skips the first N characters and -w limits the comparison to N characters.
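For instance, with GNU uniq, -w 3 treats lines as duplicates whenever their first three characters match (the sample data is invented):

```shell
printf 'abc1\nabc2\ndef3\n' | uniq -w 3
# abc1
# def3
```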
tr: character translation
General usage:
echo 12345 | tr '0-9' '000000'    # character-for-character substitution (simple encode/decode-style replacement)
cat text | tr '\t' ' '            # translate tabs into spaces
Deleting characters with tr:
cat file | tr -d '0-9'    # delete all digits
-c uses the complement of the given set:
cat file | tr -c '0-9' '\n'    # keep only the digits (everything else becomes a newline)
cat file | tr -d -c '0-9\n'    # delete all non-digit data
Squeezing characters with tr:
tr -s squeezes runs of repeated characters; it is most often used to squeeze redundant spaces:
cat file | tr -s ' '
Character classes
tr recognizes a number of character classes:
alnum: letters and digits
alpha: letters
digit: digits
space: whitespace
lower: lowercase letters
upper: uppercase letters
cntrl: control (non-printable) characters
print: printable characters
Usage: tr '[:class:]' '[:class:]'
eg: tr '[:lower:]' '[:upper:]'
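Two quick class-based translations on a sample string:

```shell
echo 'abc123' | tr '[:lower:]' '[:upper:]'   # ABC123
echo 'abc123' | tr -d '[:digit:]'            # abc
```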
cut: split text by column
Extract the 2nd and 4th columns of a file:
cut -f2,4 filename
Print every column of the file except the 3rd:
cut -f3 --complement filename
-d specifies the delimiter:
cut -f2 -d ";" filename
cut ranges:
N-    from the Nth field to the end of the line
-M    from the 1st field through the Mth
N-M   fields N through M
cut units:
-b bytes
-c characters
-f fields (split on the delimiter)
Eg:
cut -c1-5 file    # print characters 1 through 5 of each line
cut -c-2 file     # print the first 2 characters of each line
paste: concatenate text by column
Concatenates two files column by column:
cat file1
1
2
cat file2
colin
book
paste file1 file2
1 colin
2 book
The default delimiter is a tab; use -d to specify another one:
paste file1 file2 -d ","
1,colin
2,book
wc: counting lines, words and characters
wc -l file    # count lines
wc -w file    # count words
wc -c file    # count characters
sed: text replacement tool
Replace the first occurrence on each line:
sed 's/text/replace_text/' file    # replaces the first match in every line
Global replacement
sed 's/text/replace_text/g' file
By default, sed prints the text after replacement. To modify the original file in place, use -i:
sed -i 's/text/replace_text/g' file
Remove blank lines:
sed '/^$/d' file
The matched-string marker (&)
& refers to the matched string:
echo this is en example | sed 's/\w\+/[&]/g'
$> [this] [is] [en] [example]
Substring match marker
The content of the first matched pair of parentheses is referenced with \1:
sed 's/hello\([0-9]\)/\1/'
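Backreferences also let you reorder captured groups; a sketch that swaps two words (the input is invented):

```shell
# \1 and \2 refer back to the first and second \( \) groups.
echo 'world hello' | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/'
# hello world
```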
Double quotes
sed expressions are usually quoted with single quotes; double quotes can also be used, in which case the shell evaluates variables in the expression before sed sees it:
sed "s/$var/HELLO/"
With double quotes we can use shell variables in both the sed pattern and the replacement string;
eg:
p=patten
r=replaced
echo "line con a patten" | sed "s/$p/$r/g"
$> line con a replaced
Other examples
Inserting a character into a string: convert each line of the text (e.g. PEKSHA) into PEK/SHA:
sed 's/^.\{3\}/&\//g' file
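Running the command above on a sample line shows the inserted slash:

```shell
echo PEKSHA | sed 's/^.\{3\}/&\//g'
# PEK/SHA
```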
awk: data-stream processing tool
awk script structure:
awk 'BEGIN{ statements } { statements2 } END{ statements }'
How it works:
1. Execute the BEGIN block;
2. Read one line from the file or stdin and execute statements2; repeat until all input has been read;
3. Execute the END block.
Printing the current line
print with no arguments prints the current line:
echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print } END{ print "End" }'
When the arguments of print are separated by commas, they are joined by spaces in the output:
echo | awk '{ var1="v1"; var2="V2"; var3="v3"; print var1, var2, var3; }'
$> v1 V2 v3
Using "" as the concatenation operator:
echo | awk '{ var1="v1"; var2="V2"; var3="v3"; print var1"-"var2"-"var3; }'
$> v1-V2-v3
Special variables: NR, NF, $0, $1, $2
NR: number of records; during execution it is the current line number;
NF: number of fields; during execution it is the number of fields in the current line;
$0: the text of the current line;
$1: the text of the first field;
$2: the text of the second field;
echo -e "line1 f2 f3\n line2 \n line 3" | awk '{print NR":"$0"-"$1"-"$2}'
Print the second and third fields of each row:
awk '{print $2, $3}' file
Count the number of lines in a file:
awk ' END {print NR}' file
Sum the first field of every line:
echo -e "1\n 2\n 3\n 4\n" | awk 'BEGIN{ sum = 0; print "begin"; } { sum += $1; } END{ print "=="; print sum }'
Passing external variables
var=1000
echo | awk '{print vara}' vara=$var    # input from stdin
awk '{print vara}' vara=$var file      # input from a file
Filtering the lines awk processes with patterns
awk 'NR < 5'                      # lines with line number less than 5
awk 'NR==1,NR==4 {print}' file    # print lines 1 through 4 (a range pattern)
awk '/linux/'                     # lines containing the text linux (regular expressions allowed — very powerful)
awk '!/linux/'                    # lines that do not contain linux
Setting the field separator
Use -F to set the field separator (space by default):
awk -F: '{print $NF}' /etc/passwd
Reading command output
Use getline to read the output of an external shell command into a variable, here cmdout:
echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout }'
Using loops in awk
for(i = 0; i < 10; i++) { print $i; }
for(i in array) { print array[i]; }
Eg:
Print a file's lines in reverse order (an implementation of the tac command):
seq 9 | \
awk '{ lifo[NR] = $0; lno = NR } \
END{ for(; lno > 0; lno--) { print lifo[lno]; } }'
awk implementations of head and tail
Printing a specified column
Printing a specified text region
By line number:
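The two headings above come without commands in the source; here are minimal sketches (the line count of 10 and the choice of the 5th ls column are arbitrary, and the tail sketch assumes the input has at least 10 lines):

```shell
# head: print the first 10 lines, then stop reading.
seq 100 | awk 'NR <= 10 { print } NR > 10 { exit }'

# tail: keep a rolling buffer of the last 10 lines; print it at EOF.
seq 100 | awk '{ buf[NR % 10] = $0 }
    END { for (i = NR + 1; i <= NR + 10; i++) print buf[i % 10] }'

# print a chosen column, e.g. the 5th field of ls -l (the file size):
ls -l | awk '{ print $5 }'
```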
seq 100| awk 'NR==4,NR==6{print}'
By pattern:
Print the text between start_pattern and end_pattern:
awk '/start_pattern/, /end_pattern/' filename
Eg:
seq 100 | awk '/13/,/15/'
cat /etc/passwd | awk '/mai.*mail/,/news.*news/'
Common awk built-in functions
index(string, search_string): returns the position at which search_string occurs in string
sub(regex, replacement_str, string): replaces the first match of the regular expression in string with replacement_str
match(regex, string): tests whether the regular expression matches the string
length(string): returns the length of the string
echo | awk '{"grep root /etc/passwd" | getline cmdout; print length(cmdout) }'
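A sketch exercising index, match and sub together (the string "hello world" is invented):

```shell
echo | awk '{
    s = "hello world";
    print index(s, "world");   # 7 (positions are 1-based)
    print match(s, /wor/);     # 7 (also sets RSTART and RLENGTH)
    sub(/world/, "awk", s);    # modifies s in place
    print s;                   # hello awk
}'
```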
printf formats output, much like printf in the C language.
Eg:
seq 10 | awk '{printf "->%4s\n", $1}'
Iterating over the lines, words and characters of a file
1. Iterate over each line in a file:
while read line; do echo $line; done < file.txt
2. Iterate over each word in a line:
for word in $line; do echo $word; done
3. Iterate over each character:
${string:start_pos:num_of_chars} extracts characters from a string (bash text slicing);
${#word} returns the length of the variable word:
for((i=0;i<${#word};i++)); do echo ${word:i:1}; done
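Putting the word and character loops together (this requires bash, since `for((...))` and `${word:i:1}` are bash features; the sample line is invented):

```shell
# Print each character of each word on its own line.
line="ab cd"
for word in $line; do
    for ((i = 0; i < ${#word}; i++)); do
        echo "${word:i:1}"
    done
done
# prints a, b, c, d — one character per line
```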
This article consists of reading notes; the main content and examples come from:
Linux Shell Scripting Cookbook