Text processing in the Linux shell relies on many different commands. This article describes the tools most commonly used to process text in the shell on Linux: find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk.
The examples and options shown are the most common and practical ones.
My rule for shell scripts is to write single-line commands and avoid going beyond two lines; for anything more complex, consider Python.
find: file search
Find .txt and .pdf files:
find . \( -name "*.txt" -o -name "*.pdf" \) -print
Find .txt and .pdf files with a regular expression:
find . -regex ".*\(\.txt\|\.pdf\)$"
-iregex: like -regex, but case-insensitive.
Negation
Find all files that are not .txt:
find . ! -name "*.txt" -print
Specify search depth
Print the files in the current directory (depth 1):
find . -maxdepth 1 -type f
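As a rough sketch of the name, negation, and depth options above, the following builds a small scratch directory and runs each form against it. The directory layout and file names are invented for illustration only.

```shell
# Throwaway tree; every name below is hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
touch "$demo/a.txt" "$demo/b.pdf" "$demo/c.log" "$demo/sub/d.txt"

# .txt or .pdf files at any depth
txt_or_pdf=$(cd "$demo" && find . \( -name "*.txt" -o -name "*.pdf" \) -print | LC_ALL=C sort)

# regular files that are not .txt
not_txt=$(cd "$demo" && find . -type f ! -name "*.txt" -print | LC_ALL=C sort)

# regular files directly inside the directory, without descending
top_level=$(cd "$demo" && find . -maxdepth 1 -type f | LC_ALL=C sort)

printf '%s\n' "$txt_or_pdf" "--" "$not_txt" "--" "$top_level"
```

The output is piped through sort only because find makes no promise about the order in which it lists entries.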
Custom search
Search by type:
find . -type d -print    # list directories only
-type f matches regular files; -type l matches symbolic links
Search by time:
-atime: access time (in days; -amin is the same in minutes, and likewise for the options below)
-mtime: modification time (content changed)
-ctime: change time (metadata or permissions changed)
All files accessed within the last seven days:
find . -atime -7 -type f -print
Search by size:
Units: w (two-byte words), k, M, G
Find files larger than 2k:
find . -type f -size +2k
Search by permission:
find . -type f -perm 644 -print    # files whose permission bits are exactly 644
Search by user:
find . -type f -user weber -print    # files owned by the user weber
Actions on the files found
Delete:
Delete all .swp files under the current directory:
find . -type f -name "*.swp" -delete
Run a command (the powerful -exec)
find . -type f -user root -exec chown weber {} \;    # give the files owned by root under the current directory to weber
Note: {} is a special string; for each matching file, {} is replaced with that file's name.
E.g. copy all the files found to another directory:
find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;
Combining multiple commands
Tip: to run several commands on each file, put them in a script and run that script with -exec:
-exec ./commands.sh {} \;
Delimiter used by -print
-print uses '\n' as the delimiter between file names by default;
-print0 uses '\0' as the delimiter, which makes it safe to handle file names that contain spaces;
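To illustrate why the '\0' delimiter matters, here is a minimal sketch that pairs -print0 with xargs -0 on a file whose name contains a space. The scratch directory and file names are assumptions made for the example.

```shell
# Scratch directory; one file name deliberately contains a space (hypothetical names).
demo=$(mktemp -d)
printf 'one\ntwo\n' > "$demo/plain.txt"
printf 'three\n' > "$demo/with space.txt"

# NUL-delimited names survive the space, so cat receives exactly two file names.
total=$(find "$demo" -type f -name "*.txt" -print0 | xargs -0 cat | wc -l | tr -d ' ')
echo "$total"
```

With plain -print and xargs, the name "with space.txt" would be split into two arguments and cat would fail to open either of them.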
grep: text search
grep match_pattern file    # prints the matching lines by default
Common options
-o outputs only the matched text, whereas -v outputs only the lines that do not match
-c: count the lines in the file that contain the text
grep -c "text" filename
-n: print the line number of each match
-i: ignore case when searching
-l: print only the names of the files that match
Recursive text search through nested directories (a programmer's favorite for searching code):
grep "class" . -R -n
Match multiple patterns
grep -e "class" -e "virtual" file
Output file names terminated by \0 (-Z):
grep "test" file* -lZ | xargs -0 rm
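The grep options above can be tried side by side on a small sample. This is only a sketch: the file content is made up, and each result is captured in a variable so the differences are easy to compare.

```shell
# Sample input; the content is hypothetical.
demo=$(mktemp -d)
printf 'class Foo\nvirtual void run()\nint x\nclass Bar\n' > "$demo/code.txt"

count=$(grep -c "class" "$demo/code.txt")                    # number of matching lines
numbered=$(grep -n "class" "$demo/code.txt")                 # matches prefixed with line numbers
only_match=$(grep -o "class [A-Z][a-z]*" "$demo/code.txt")   # matched text only
inverted=$(grep -v "class" "$demo/code.txt")                 # lines that do not match
either=$(grep -c -e "class" -e "virtual" "$demo/code.txt")   # lines matching either pattern

printf '%s\n' "$count" "$numbered" "$only_match" "$inverted" "$either"
```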
xargs: converting input into command-line arguments
xargs converts input data into command-line arguments for a given command, so it combines well with many other commands, such as grep and find.
Convert multi-line input to a single line
cat file.txt | xargs
\n is the delimiter between the lines of input.
Convert a single line to multiple lines
cat single.txt | xargs -n 3
-n: the number of fields to put on each line
xargs options
-d: define the delimiter (by default, spaces and newlines)
-n: limit the number of arguments per command line, which splits the output into multiple lines
-I {}: specify a replacement string that xargs substitutes with each input item; use it when the command to run takes several parameters
E.g.:
cat file.txt | xargs -I {} ./command.sh -p {} -1
-0: use \0 as the input delimiter
E.g. count the lines of code:
find source_dir/ -type f -name "*.cpp" -print0 |xargs -0 wc -l
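A short sketch of the three most used xargs behaviours described above. It relies on xargs running echo when no command is given; the input values are arbitrary.

```shell
# Several lines in, one line out: xargs joins its input items with spaces.
one_line=$(printf '1\n2\n3\n' | xargs)

# One line in, several lines out: at most two arguments per echo call.
two_per_line=$(echo "1 2 3 4 5" | xargs -n 2)

# -I {} runs the command once per input line and puts the line where {} appears.
wrapped=$(printf 'a\nb\n' | xargs -I {} echo "item={} done")

printf '%s\n' "$one_line" "$two_per_line" "$wrapped"
```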
sort: sorting
Options:
-n sorts numerically, whereas -d sorts in dictionary order
-r: reverse order
-k N: sort by column N
E.g.:
sort -nrk 1 data.txt
sort -bd data    # ignore leading blanks such as spaces
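The difference between numeric and text ordering is easiest to see on a tiny file. The sketch below uses invented data and shows the same column sorted both ways.

```shell
# Sample data; the values are hypothetical.
demo=$(mktemp -d)
printf '10 pear\n9 apple\n100 fig\n' > "$demo/data.txt"

numeric_desc=$(sort -nrk 1 "$demo/data.txt")   # numeric, descending, key starts at column 1
by_name=$(sort -k 2 "$demo/data.txt")          # text order on column 2
text_order=$(sort -k 1,1 "$demo/data.txt")     # column 1 compared as text, not as numbers

printf '%s\n' "$numeric_desc" "--" "$by_name" "--" "$text_order"
```

Without -n, "9" sorts after "100" because the comparison goes character by character.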
uniq: eliminating duplicate lines
Eliminate duplicate lines:
sort unsort.txt | uniq
Count how many times each line appears in the file:
sort unsort.txt | uniq -c
Find the duplicated lines:
sort unsort.txt | uniq -d
You can limit which part of each line is compared: -s N skips the first N characters, and -w N compares at most N characters.
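A minimal sketch of the uniq variants above, run on made-up input. The counts are passed through awk only to strip the padding that uniq -c adds; -w is assumed to be available, which is the case for GNU uniq.

```shell
# Unsorted sample input (hypothetical content).
demo=$(mktemp -d)
printf 'b\na\nb\nc\na\nb\n' > "$demo/unsort.txt"

deduped=$(sort "$demo/unsort.txt" | uniq)
counts=$(sort "$demo/unsort.txt" | uniq -c | awk '{print $1, $2}')   # normalise padding
dupes=$(sort "$demo/unsort.txt" | uniq -d)

# -s 2 skips the first two characters before comparing.
skip_prefix=$(printf '1:x\n2:x\n3:y\n' | uniq -s 2)
# -w 2 compares only the first two characters (GNU extension).
first_two=$(printf 'ab1\nab2\ncd1\n' | uniq -w 2)

printf '%s\n' "$deduped" "$counts" "$dupes" "$skip_prefix" "$first_two"
```

uniq only collapses adjacent lines, which is why the examples sort first.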
tr: character translation
General usage
echo 12345 | tr '0-9' '9876543210'    # encryption/decryption: replace each character with its counterpart
cat text | tr '\t' ' '    # convert tabs to spaces
Deleting characters with tr
cat file | tr -d '0-9'    # delete all digits
-c: complement
cat file | tr -d -c '0-9'    # keep only the digits in the file
cat file | tr -d -c '0-9 \n'    # delete everything except digits, spaces, and newlines
Squeezing characters with tr
tr -s squeezes runs of repeated characters in the text; it is most often used to squeeze redundant spaces:
cat file | tr -s ' '
Character classes
tr provides various character classes:
alnum: letters and digits
alpha: letters
digit: digits
space: whitespace
lower: lowercase letters
upper: uppercase letters
cntrl: control (non-printable) characters
print: printable characters
Usage: tr '[:class:]' '[:class:]'
E.g.: tr '[:lower:]' '[:upper:]'
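The tr modes covered in this section (translate, delete with complement, squeeze, character classes) can be sketched together as follows. The input strings are arbitrary, and the reversed digit set is just one possible reversible mapping.

```shell
upper=$(echo "hello world" | tr '[:lower:]' '[:upper:]')   # character classes
digits_only=$(echo "a1b22c333" | tr -d -c '0-9\n')         # delete everything except digits and newline
squeezed=$(echo "a   b     c" | tr -s ' ')                 # collapse runs of spaces

# Mapping 0-9 onto the reversed set is its own inverse, so applying it twice restores the input.
swapped=$(echo 12345 | tr '0-9' '9876543210')
restored=$(echo "$swapped" | tr '0-9' '9876543210')

printf '%s\n' "$upper" "$digits_only" "$squeezed" "$swapped" "$restored"
```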
cut: splitting text by column
Extract the 2nd and 4th columns of a file:
cut -f2,4 filename
Print all columns of a file except the 3rd:
cut -f3 --complement filename
-d: specify the delimiter:
cut -f2 -d";" filename
Ranges accepted by cut
N-: from the Nth field to the end
-M: from the 1st field to the Mth
N-M: from the Nth field to the Mth
Units used by cut
-b: bytes
-c: characters
-f: fields (separated by the delimiter)
E.g.:
cut -c1-5 file    # print characters 1 through 5
cut -c-2 file    # print the first 2 characters
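The range forms listed above can be checked against a single sample line. This is a small sketch; the semicolon-separated string is invented for the purpose.

```shell
line="one;two;three;four"   # hypothetical sample record

second=$(echo "$line" | cut -d';' -f2)          # a single field
from_third=$(echo "$line" | cut -d';' -f3-)     # N-: field 3 to the end
up_to_second=$(echo "$line" | cut -d';' -f-2)   # -M: field 1 to field 2
middle=$(echo "$line" | cut -d';' -f2-3)        # N-M: fields 2 to 3
first_five=$(echo "$line" | cut -c1-5)          # characters 1 to 5

printf '%s\n' "$second" "$from_third" "$up_to_second" "$middle" "$first_five"
```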
paste: joining text by column
Join two files column by column:
cat file1
1
2
cat file2
colin
book
paste file1 file2
1	colin
2	book
The default delimiter is a tab; use -d to specify another one:
paste file1 file2 -d ","
1,colin
2,book
wc: counting lines, words, and characters
wc -l file    # count lines
wc -w file    # count words
wc -c file    # count bytes (use -m to count characters)
sed: text substitution
Replace the first match
sed 's/text/replace_text/' file    # replace the first match on each line
Global replacement
sed 's/text/replace_text/g' file
By default sed prints the result of the substitution; to modify the original file in place, use -i:
sed -i 's/text/replace_text/g' file
Remove blank lines:
sed '/^$/d' file
Referencing the matched text
The matched string is referenced with the marker &.
echo this is en example | sed 's/\w\+/[&]/g'
$>[this] [is] [en] [example]
Referencing a sub-match
The content matched by the first pair of parentheses is referenced with the marker \1.
sed 's/hello\([0-9]\)/\1/'
Double quotes
A sed expression is usually wrapped in single quotes. Double quotes also work; inside double quotes, the shell expands variables in the expression:
sed "s/$var/HLLOE/"
With double quotes, variables can be used in both the sed pattern and the replacement string;
E.g.:
p=patten
r=replaced
echo "line con a patten" | sed "s/$p/$r/g"
$>line con a replaced
Other examples
Inserting a character into a string: convert each line of the text (such as PEKSHA) to PEK/SHA
sed 's/^.\{3\}/&\//g' file
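The sed features from this section (the & marker, \1, shell variables inside double quotes, and line deletion) are sketched below on throwaway input. The bracket expression [a-z][a-z]* is used instead of \w so the example does not depend on GNU extensions; the variable names p and r mirror the example above.

```shell
# & stands for whatever the pattern matched.
bracketed=$(echo "this is an example" | sed 's/[a-z][a-z]*/[&]/g')

# \1 stands for the text captured by the first \( \) group.
digit=$(echo "hello7 world" | sed 's/hello\([0-9]\)/\1/')

# Double quotes let the shell expand $p and $r before sed sees the expression.
p=pattern
r=replaced
expanded=$(echo "line with a pattern" | sed "s/$p/$r/g")

# Insert a slash after the first three characters of the line.
split=$(echo "PEKSHA" | sed 's/^.\{3\}/&\//')

# Delete empty lines.
no_blank=$(printf 'a\n\nb\n' | sed '/^$/d')

printf '%s\n' "$bracketed" "$digit" "$expanded" "$split" "$no_blank"
```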
awk: data stream processing
awk script structure
awk 'BEGIN{ statements } statements2 END{ statements }'
How it works
1. Execute the BEGIN block;
2. Read a line from the file or stdin and execute statements2; repeat until all the input has been read;
3. Execute the END block;
Printing the current line
print without arguments prints the current line;
echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print } END{ print "End" }'
When the arguments of print are separated by commas, they are printed separated by spaces;
echo | awk '{var1 = "v1"; var2 = "V2"; var3 = "v3"; print var1, var2, var3}'
$>v1 V2 v3
Joining with "-" (awk concatenates strings that are written next to each other);
echo | awk '{var1 = "v1"; var2 = "V2"; var3 = "v3"; print var1"-"var2"-"var3}'
$>v1-V2-v3
Special variables: NR NF $0 $1 $2
NR: the number of records; during execution it equals the current line number;
NF: the number of fields; during execution it equals the number of fields in the current line;
$0: the text of the current line;
$1: the text of the first field;
$2: the text of the second field;
echo -e "line1 f2 f3\n line2 \n line 3" | awk '{print NR":"$0"-"$1"-"$2}'
Print the second and third fields of each line:
awk '{print $2, $3}' file
Count the number of lines in a file:
awk 'END {print NR}' file
Sum the first field of every line:
echo -e "1\n 2\n 3\n 4\n" | awk 'BEGIN{sum = 0; print "begin"} {sum += $1} END {print "=="; print sum}'
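As a compact illustration of NR, NF, and the field variables, the sketch below prints them for two short lines and then repeats the line count and the column sum. printf is used instead of echo -e for portability; the input is arbitrary.

```shell
# Line number, field count, first field, and last field of each line.
report=$(printf 'a b c\nd e\n' | awk '{print NR ":" NF ":" $1 "-" $NF}')

# In the END block NR still holds the number of the last line read.
lines=$(printf 'x\ny\nz\n' | awk 'END{print NR}')

# An unset awk variable starts at 0 in numeric context, so no BEGIN block is needed.
sum=$(printf '1\n2\n3\n4\n' | awk '{sum += $1} END{print sum}')

printf '%s\n' "$report" "$lines" "$sum"
```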
Passing external variables
var=1000
echo | awk '{print vara}' vara=$var    # input from stdin
awk '{print vara}' vara=$var file    # input from a file
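A short sketch of the assignment form above next to the -v option. The -v option is not mentioned in the original text; it is part of standard awk and, unlike an assignment placed after the program, it also takes effect inside a BEGIN block.

```shell
var=1000

# Assignment after the program: applied before the input is read, but after BEGIN has run.
after_program=$(echo | awk '{print vara}' vara="$var")

# -v: applied before the program starts.
with_v=$(echo | awk -v vara="$var" '{print vara}')
in_begin=$(awk -v vara="$var" 'BEGIN{print vara}')

printf '%s\n' "$after_program" "$with_v" "$in_begin"
```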
Filtering the lines awk processes with patterns
awk 'NR < 5'    # line number less than 5
awk 'NR==1,NR==4 {print}' file    # print lines 1 through 4
awk '/linux/'    # lines containing the text linux (any regular expression can be used; very powerful)
awk '!/linux/'    # lines that do not contain the text linux
Setting the delimiter
Use -F to set the field delimiter (whitespace by default):
awk -F: '{print $NF}' /etc/passwd
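The patterns and the -F option above can be combined. The sketch below runs them on a small colon-separated file shaped loosely like /etc/passwd; the file and its entries are invented so the example does not depend on the real system file.

```shell
# Hypothetical passwd-like sample.
demo=$(mktemp -d)
printf 'root:x:0:/bin/sh\nmail:x:8:/usr/sbin/nologin\nlinux:x:9:/bin/bash\n' > "$demo/passwd"

shells=$(awk -F: '{print $NF}' "$demo/passwd")                # last field of every line
early=$(awk -F: 'NR < 3 {print $1}' "$demo/passwd")           # lines 1 and 2
span=$(awk -F: 'NR==2,NR==3 {print $1}' "$demo/passwd")       # range: line 2 through line 3
with_linux=$(awk -F: '/linux/ {print $1}' "$demo/passwd")     # lines matching a regex
without_linux=$(awk -F: '!/linux/ {print $1}' "$demo/passwd") # lines not matching it

printf '%s\n' "$shells" "$early" "$span" "$with_linux" "$without_linux"
```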
Read command output
Use getline to read the output of the external shell command into the variable cmdout;
echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout }'
Loops in awk
for(i=0;i<10;i++){print $i;}
for(i in array){print array[i];}
E.g.:
Print lines in reverse order (an implementation of the tac command):
seq 9 | awk '{lifo[NR] = $0; lno = NR} END{ for(;lno>0;lno--){print lifo[lno];} }'
Implementing head and tail with awk
head:
awk 'NR<=10{print}' filename
tail:
awk '{buffer[NR%10] = $0;} END{for(i=NR+1;i<=NR+10;i++){ if(i>10) print buffer[i%10] } }' filename
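One way to sanity-check awk versions of head, tail, and tac is to compare them with the real commands. The sketch below assumes a ring buffer of ten slots indexed by NR%10 and starts printing at slot (NR+1)%10, which holds the oldest of the last ten lines; the variable names are illustrative.

```shell
demo=$(mktemp -d)
seq 1 25 > "$demo/nums.txt"

# Ring buffer of 10 lines; i-10 is the line number being printed, so i>10 skips unused slots.
tail_prog='{buf[NR%10] = $0} END{for(i=NR+1; i<=NR+10; i++) if (i>10) print buf[i%10]}'

awk_head=$(awk 'NR<=10' "$demo/nums.txt")
awk_tail=$(awk "$tail_prog" "$demo/nums.txt")
short_tail=$(printf 'x\ny\n' | awk "$tail_prog")   # fewer than 10 lines of input

# tac: store every line, then walk the array backwards from the last index.
reversed=$(printf 'a\nb\nc\n' | awk '{l[NR] = $0} END{for(i=NR; i>0; i--) print l[i]}')

real_head=$(head -n 10 "$demo/nums.txt")
real_tail=$(tail -n 10 "$demo/nums.txt")
```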
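Printing a specified column
awk implementation:
ls -lrt | awk '{print $6}'
cut implementation (squeeze the repeated spaces first, because cut splits on a single delimiter character):
ls -lrt | tr -s ' ' | cut -d' ' -f6
Printing a specified range of text
By line number: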
seq 100| awk 'NR==4,NR==6{print}'
By text pattern:
Print the text between start_pattern and end_pattern;
awk '/start_pattern/, /end_pattern/' filename
E.g.:
seq 100 | awk '/13/,/15/'
cat /etc/passwd | awk '/mai.*mail/,/news.*news/'
Common built-in awk functions
index(string, search_string): returns the position at which search_string occurs in string.
sub(regex, replacement_str, string): replaces the first match of the regular expression in string with replacement_str;
match(string, regex): checks whether the regular expression matches the string;
length(string): returns the length of the string.
echo | awk '{"grep root /etc/passwd" | getline cmdout; print length(cmdout) }'
printf works like printf in C.
E.g.:
seq 10 | awk '{printf "->%4s\n", $1}'
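The built-in functions listed above can each be tried in a BEGIN block, which needs no input. A minimal sketch with arbitrary strings:

```shell
pos=$(awk 'BEGIN{print index("hello world", "world")}')         # 1-based position of the substring
replaced=$(awk 'BEGIN{s = "aaa"; sub(/a/, "b", s); print s}')    # only the first match is replaced
found=$(awk 'BEGIN{m = match("foo123", /[0-9]+/); print m, RLENGTH}')  # start and length of the match
len=$(awk 'BEGIN{print length("hello")}')
padded=$(awk 'BEGIN{printf "->%4s\n", 7}')                       # right-aligned in a field of width 4

printf '%s\n' "$pos" "$replaced" "$found" "$len" "$padded"
```

match takes the string first and the regular expression second, and it also sets the RSTART and RLENGTH variables.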
Iterating over the lines, words, and characters of a file
1. Iterate over each line of a file
while loop method:
while read line; do echo $line; done < file.txt
Rewritten to run in a subshell:
cat file.txt | (while read line; do echo $line; done)
awk method:
cat file.txt | awk '{print}'
2. Iterate over each word in a line
for word in $line;do echo $word;done
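The line loop and the word loop can be nested. The sketch below counts lines and words in a made-up file; it uses IFS= and read -r, which are not in the original commands, so that leading spaces and backslashes in a line are preserved.

```shell
# Sample file (hypothetical content).
demo=$(mktemp -d)
printf 'first line\nsecond  line here\n' > "$demo/file.txt"

line_count=0
word_count=0
while IFS= read -r line; do
    line_count=$((line_count + 1))
    # Unquoted $line is split on whitespace, giving one word per iteration.
    for word in $line; do
        word_count=$((word_count + 1))
    done
done < "$demo/file.txt"

echo "$line_count lines, $word_count words"
```

The file is fed in by redirection rather than through cat and a pipe: in most shells a piped loop runs in a subshell, so the counters would be lost when the loop ends.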
3. Iterate over each character
${string:start_pos:num_of_chars}: extracts a substring from the string (bash substring slicing)
${#word}: returns the length of the variable word.
for((i=0;i<${#word};i++)); do echo ${word:i:1}; done