The most common tools for Linux shell text processing


Guide

This article describes the most common tools for processing text with the shell in Linux: find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk. The examples and parameters given are the most common and practical ones. My principle for shell scripts is to write single-line commands and to avoid going beyond two lines; if the task is more complex, consider Python!

Find file search

Find txt and PDF files

find . \( -name "*.txt" -o -name "*.pdf" \) -print

Find txt and pdf files with a regular expression

find . -iregex ".*\(\.txt\|\.pdf\)$"   # -iregex: case-insensitive regular expression

Negative parameter, search for all non-txt text

find . ! -name "*.txt" -print

Specify search depth
Print the file in the current directory (depth: 1)

find . -maxdepth 1 -type f

Custom Search
Search by type:
-type f: regular file / -type l: symbolic link

find . -type d -print   # list only directories

Search by Time:
-atime: access time (in days; -amin is the same in minutes, and likewise below)
-mtime: modification time (content modified)
-ctime: change time (metadata or permissions changed)

All files accessed in the last seven days:

find . -atime -7 -type f -print
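The modification-time tests work the same way; a small sketch using -mtime (the file name below is made up):

```shell
# files whose content was modified within the last day
find . -type f -mtime -1 -print
```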

Search by size:
Search for files larger than 2k:

find . -type f -size +2k

Search by permission:

find . -type f -perm 644 -print   # find all files with permission 644

Search by user:

find . -type f -user weber -print   # find files owned by the user weber

Subsequent actions after finding
Delete all swp files in the current directory:

find . -type f -name "*.swp" -delete

Execute action (powerful exec)

find . -type f -user root -exec chown weber {} \;   # change ownership of root-owned files in the current directory to weber

Note: {} is a special string; for each matched file, {} is replaced with the corresponding file name.
Eg: copy all found files to another directory:

find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;

Combined with multiple commands
Tips: if you need to run several commands on each result, write them into a script and have -exec call that script:

-exec ./commands.sh {} \;
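A minimal sketch of such a script (the name commands.sh and the actions it performs are hypothetical examples):

```shell
#!/bin/sh
# commands.sh -- run several commands on one file found by find.
# $1 is the file name that find substitutes for {}.
file="$1"
echo "processing: $file"   # first command: report the file
cp "$file" "$file.bak"     # second command: make a backup copy
```

Remember to make it executable (chmod +x commands.sh) before passing it to -exec.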

-print outputs each matched file name followed by '\n' as the delimiter by default;
-print0 appends no newline and uses '\0' as the delimiter instead, which makes it safe to handle file names containing spaces;
List the 10 largest files (including hidden files, but excluding ".") in the current directory, in descending order of size:

find . -maxdepth 1 ! -name "." -print0 | xargs -0 du -b | sort -nr | head -10 | nl
Grep text search

grep match_pattern file   # by default, prints the matching lines
Common Parameters
-o: output only the matched text (vs. -v: output only the lines that do not match)
-c: count the number of matching lines

grep -c "text" filename

-n: print the line numbers of matches
-i: ignore case when searching
-l: print only the names of files with matches
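A quick demonstration of these flags (the file name and sample text are made up):

```shell
printf 'Hello world\nbye\n' > /tmp/grep_demo.txt
grep -in "hello" /tmp/grep_demo.txt   # case-insensitive, with line number: 1:Hello world
grep -l "bye" /tmp/grep_demo.txt      # only the file name: /tmp/grep_demo.txt
```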
Recursive text search in a directory tree (a programmer's favorite way to search code):

grep "class" . -R -n

Match multiple modes

grep -e "class" -e "vitural" file

Output the names of matching files terminated by '\0' (-Z, combined with -l), so they can be piped safely to xargs -0:

grep "test" file* -lZ | xargs -0 rm
Xargs command line parameter conversion

xargs converts its input into command-line arguments for another command, so it combines well with many commands, such as grep and find.
Convert multi-row output to single-row output

cat file.txt | xargs

'\n' is the delimiter between the multiple lines of input.
Convert a single row to multiple rows

cat single.txt | xargs -n 3   # -n: number of arguments per output line

Xargs parameter description
-d: define the input delimiter (by default '\n' between lines, spaces within a line)
-n: split the output into multiple lines, with the given number of arguments on each
-I {}: specify a replacement string that is substituted during expansion; use it when the command needs its arguments in particular positions
Eg:

cat file.txt | xargs -I {} ./command.sh -p {} -1

-0: use '\0' as the input delimiter.
Eg: count the number of program lines

find source_dir/ -type f -name "*.cpp" -print0 |xargs -0 wc -l
Sort sorting

Common parameters:
-n: sort numerically (vs. -d: sort in dictionary order)
-r: reverse the sort order
-k N: sort by column N
Eg:

sort -nrk 1 data.txt
sort -bd data.txt   # -b ignores leading whitespace such as spaces

Uniq eliminates duplicate rows
Eliminate duplicate rows

sort unsort.txt | uniq

Count the number of times each row appears in a file

sort unsort.txt | uniq -c

Find duplicate rows

sort unsort.txt | uniq -d

You can restrict which part of each line is compared: -s N skips the first N characters; -w N compares at most N characters.
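For example, to treat lines as duplicates when they agree on characters 3-5 (skip 2, compare 3) -- the sample data below is made up:

```shell
# sort by the comparison region first, then collapse lines that
# match on 3 characters after skipping the first 2
printf 'aaXYZ1\nbbXYZ2\nccABC3\n' | sort -k1.3 | uniq -s 2 -w 3
# ccABC3
# aaXYZ1
```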

Conversion Using tr

General Usage

echo 12345 | tr '0-9' '9876543210'   # "encrypt" by substitution: each character in the first set is replaced by the corresponding character in the second
cat text | tr '\t' ' '               # convert tabs to spaces

Tr delete character

cat file | tr -d '0-9'   # delete all digits

-c: use the complement of the set

cat file | tr -d -c '0-9\n'   # delete everything that is not a digit or newline (i.e. keep only the digits)

Tr squeeze characters
tr -s squeezes repeated characters in the text; most commonly used to squeeze redundant spaces:

cat file | tr -s ' '

Character class
Various character classes are available in tr:
alnum: letters and digits
alpha: letters
digit: digits
space: whitespace
lower: lowercase letters
upper: uppercase letters
cntrl: control (non-printable) characters
print: printable characters

Usage: tr '[:class:]' '[:class:]'

eg: tr '[:lower:]' '[:upper:]'
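Two concrete runs (the sample strings are arbitrary):

```shell
echo "hello" | tr '[:lower:]' '[:upper:]'   # HELLO
echo "abc123" | tr -d '[:digit:]'           # abc
```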
Cut split text by Column

Extract the 2nd and 4th columns of the file:

cut -f2,4 filename

Print all columns of the file except the 3rd:

cut -f3 --complement filename

-d specifies the delimiter:

cut -f2 -d";" filename

Cut range
N-: from the Nth field to the end
-M: from the 1st field to the Mth
N-M: from the Nth to the Mth field
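Combined with -d, the three range forms look like this on made-up comma-separated data:

```shell
echo "a,b,c,d,e" | cut -d"," -f2-    # b,c,d,e  (field 2 to the end)
echo "a,b,c,d,e" | cut -d"," -f-3    # a,b,c    (first 3 fields)
echo "a,b,c,d,e" | cut -d"," -f2-4   # b,c,d    (fields 2 to 4)
```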

Unit of cut
-b: in bytes
-c: in characters
-f: in fields (using the delimiter)

cut -c1-5 file   # print characters 1 to 5 of each line
cut -c-2 file    # print the first 2 characters of each line
Paste concatenates text by Column

Concatenates two texts by column;

cat file1
1
2
cat file2
colin
book
paste file1 file2
1 colin
2 book

The default delimiter is a tab; use -d to specify another:

paste file1 file2 -d ","
1,colin
2,book

Wc line, word, and character counting tool

wc -l file   # count lines
wc -w file   # count words
wc -c file   # count characters
Sed text replacement tool

Replace the first match

sed 's/text/replace_text/' file   # replace the first matched text in each line

Global replacement

sed 's/text/replace_text/g' file

By default, sed prints the result of the replacement; to modify the original file in place, use -i:

sed -i 's/text/replace_text/g' file

Remove blank rows:

sed '/^$/d' file

Referencing the matched string
The matched string is referenced with &:

echo this is en example | sed 's/\w\+/[&]/g'
# [this] [is] [en] [example]

Substring matching tag
The content of the first matched group (parenthesized sub-expression) is referenced with \1:

sed 's/hello\([0-9]\)/\1/'
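Running it on a sample string shows that only the captured digit survives:

```shell
echo "hello7 world" | sed 's/hello\([0-9]\)/\1/'   # 7 world
```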

Double quotation marks
Sed expressions are usually written in single quotes, but double quotes also work; inside double quotes, the shell expands variables and other expressions before sed runs:

sed "s/$var/HELLO/"

So with double quotes we can use shell variables in both the sed pattern and the replacement string:

p=patten
r=replaced
echo "line con a patten" | sed "s/$p/$r/g"
# line con a replaced

Other examples
String insertion: convert each line of the text (e.g. PEKSHA) into PEK/SHA by inserting a character after the third position:

sed 's/^.\{3\}/&\//g' file
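A sample run:

```shell
echo "PEKSHA" | sed 's/^.\{3\}/&\//g'   # PEK/SHA
```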
Awk data stream processing tool

Awk script Structure

awk 'BEGIN{ statements } statements2 END{ statements }'

How it works
1. Execute the BEGIN block;
2. Read one line from the file or stdin and execute statements2; repeat until all input has been read;
3. Execute the END block;

Print current row
When print is used without arguments, the current line is printed;

echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print } END{ print "End" }'

When print's arguments are separated by commas, they are joined with spaces in the output;

echo | awk '{var1 = "v1"; var2 = "V2"; var3 = "v3"; print var1, var2, var3;}'
# v1 V2 v3

Strings can also be concatenated directly ("" is the concatenation operator; here "-" is placed between the values);

echo | awk '{var1 = "v1"; var2 = "V2"; var3 = "v3"; print var1 "-" var2 "-" var3;}'
# v1-V2-v3

Special variables: NR, NF, $0, $1, $2
NR: the record number; during execution, the current line number;
NF: the number of fields; during execution, the field count of the current line;
$0: the text of the current line;
$1: the text of the first field;
$2: the text of the second field;

echo -e "line1 f2 f3\n line2 \n line 3" | awk '{print NR":"$0"-"$1"-"$2}'

Print the second and third fields of each row:

awk '{print $2, $3}' file

Count the number of lines in a file:

awk ' END {print NR}' file

Sum the first field of each line:

echo -e "1\n 2\n 3\n 4\n" | awk 'BEGIN{sum = 0; print "begin";} {sum += $1;} END {print "=="; print sum}'

Passing external variables

var=1000
echo | awk '{print vara}' vara=$var   # input from stdin
awk '{print vara}' vara=$var file     # input from a file

Filter the lines awk processes with patterns
awk 'NR < 5'                    # lines whose number is smaller than 5
awk 'NR==1,NR==4 {print}' file  # print lines 1 through 4
awk '/linux/'                   # lines containing the text "linux" (regular expressions may be used -- very powerful)
awk '!/linux/'                  # lines not containing the text "linux"
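These filters can be tried directly on seq output (the patterns below are arbitrary):

```shell
seq 10 | awk 'NR < 5'   # 1 2 3 4, one per line
seq 10 | awk '/3/'      # lines containing "3": just 3
seq 10 | awk '!/1/'     # lines without "1": 2..9 (both 1 and 10 contain "1")
```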

Set the delimiter
Use -F to set the field separator (whitespace by default):

awk -F: '{print $NF}' /etc/passwd

Read command output
Use getline to read the output of an external shell command into a variable (here, cmdout):

echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout }'

Use loops in awk

for(i=0; i<10; i++) {print $i;}
for(i in array) {print array[i];}

Print rows in reverse order: (implementation of the tac command)

seq 9 | awk '{lifo[NR] = $0; lno = NR} END{for(; lno > 0; lno--){print lifo[lno];}}'

Awk implements head and tail commands

head: awk 'NR <= 10 {print}' filename
tail: awk '{buffer[NR%10] = $0;} END{for(i = NR+1; i <= NR+10; i++){print buffer[i%10]}}' filename   # assumes at least 10 lines

Print specified Column
Awk implementation:

ls -lrt | awk '{print $6}'

Cut implementation

ls -lrt | cut -f6   # note: cut splits on tabs by default, so this only works for tab-separated input

Print the specified text area
By line number

seq 100| awk 'NR==4,NR==6{print}'

By pattern
Print the text between start_pattern and end_pattern:

awk '/start_pattern/, /end_pattern/' filename
eg:
seq 100 | awk '/13/,/15/'
cat /etc/passwd | awk '/mai.*mail/,/news.*news/'

Common built-in functions of awk
index(string, search_string): returns the position of search_string within string;
sub(regex, replacement_str, string): replaces the first match of the regular expression in string with replacement_str;
match(regex, string): checks whether the regular expression matches the string;
length(string): returns the length of the string.
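A small self-contained run of all four (the sample string is arbitrary; note that sub() modifies the string in place and returns the number of replacements):

```shell
echo | awk '{
    s = "hello world";
    print index(s, "world");   # 7 (1-based position)
    sub(/world/, "awk", s);    # replace the first match in s
    print s;                   # hello awk
    print match(s, /awk/);     # 7 (position of the match, 0 if none)
    print length(s);           # 9
}'
```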

echo | awk '{"grep root /etc/passwd" | getline cmdout; print length(cmdout) }'

Printf is similar to printf in C language.

seq 10 | awk '{printf "->%4s\n", $1}'

Iterating over the lines, words, and characters in a file
1. Iterate over each line in a file
While loop method:

while read line; do
    echo $line;
done < file.txt

Or in a sub-shell:

cat file.txt | (while read line; do echo $line; done)

Awk method:

cat file.txt| awk '{print}'

2. Iterate over every word in a line

for word in $line;do echo $word;done

3. Iterate over every character
${string:start_pos:num_of_chars}: extract a substring from string (bash text slicing)
${#word}: returns the length of the variable word

for ((i=0; i < ${#word}; i++))
do
    echo ${word:i:1};
done

From: http://www.yunweipai.com/archives/6074.html

Address: http://www.linuxprobe.com/linux-shell-text-commonly-tools.html

