The Most Common Tools for Linux Shell Text Processing

Source: Internet
Author: User

Guide: This article describes the most common tools for processing text with the shell on Linux: find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk. The examples and options shown are the most common and practical ones. My rule for shell scripts is to write single-line commands and to avoid going beyond two lines; for anything more complex, consider Python.

find: file search

Find txt and PDF files

find . \( -name "*.txt" -o -name "*.pdf" \) -print

Find .txt and .pdf files with a regular expression:

find . -iregex '.*\(\.txt\|\.pdf\)$'   # -iregex: case-insensitive regular expression match
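A runnable sketch of both searches, assuming bash and GNU find (whose default regex syntax uses \( \) and \| for grouping and alternation). The file names are hypothetical and are created in a throwaway mktemp directory.

```shell
#!/usr/bin/env bash
# Hypothetical sample files in a throwaway directory.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.PDF" "$dir/c.pdf" "$dir/d.log"

# -name is case-sensitive, so b.PDF is not matched.
by_name=$(cd "$dir" && find . \( -name "*.txt" -o -name "*.pdf" \) -print | LC_ALL=C sort)

# -iregex matches the whole path and ignores case, so b.PDF is matched.
by_regex=$(cd "$dir" && find . -iregex '.*\(\.txt\|\.pdf\)$' | LC_ALL=C sort)

printf '%s\n---\n%s\n' "$by_name" "$by_regex"
rm -rf "$dir"
```

Note that -iregex is tested against the full path ("./b.PDF"), not just the base name, which is why the pattern starts with ".*".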

Negation: find every entry whose name does not end in .txt:

find . ! -name "*.txt" -print

Limit the search depth
List only the files in the current directory (depth 1):

find . -maxdepth 1 -type f

Custom searches
Search by type:
-type f: regular file; -type l: symbolic link; -type d: directory

find . -type d -print   # list only directories
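A small sketch combining depth limits, type tests, and negation on a hypothetical directory tree, assuming bash and GNU find. LC_ALL=C sort is only there to make the output order predictable.

```shell
#!/usr/bin/env bash
# Hypothetical tree: two files at the top level, one inside a subdirectory.
dir=$(mktemp -d)
mkdir -p "$dir/sub/deeper"
touch "$dir/top.txt" "$dir/top.log" "$dir/sub/inner.txt"

# Depth 1: only regular files directly inside the starting directory.
top_files=$(cd "$dir" && find . -maxdepth 1 -type f | LC_ALL=C sort)
# -type d: directories only (the starting point "." is included).
dirs=$(cd "$dir" && find . -type d -print | LC_ALL=C sort)
# "!" negates the next test: regular files whose name does not end in .txt.
non_txt=$(cd "$dir" && find . -type f ! -name "*.txt" -print)

printf '%s\n---\n%s\n---\n%s\n' "$top_files" "$dirs" "$non_txt"
rm -rf "$dir"
```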

Search by time:
-atime: access time (in days; -amin is the same test in minutes, and likewise -mmin and -cmin)
-mtime: modification time (the content was modified)
-ctime: change time (metadata or permissions were changed)

All files accessed within the last seven days (-7 means "less than 7 days ago"; a bare 7 would mean "exactly 7 days ago"):

find . -atime -7 -type f -print
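A sketch of how the sign on the day count changes the meaning, assuming bash, GNU touch (for -d "10 days ago") and GNU find (for -printf). The two file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical files: one touched now, one backdated by 10 days.
dir=$(mktemp -d)
touch "$dir/fresh.txt"
touch -d "10 days ago" "$dir/old.txt"   # GNU touch: sets both atime and mtime

# -atime -7: accessed less than 7 days ago.
recent=$(find "$dir" -type f -atime -7 -printf '%f\n')
# -mtime +7: modified more than 7 days ago.
stale=$(find "$dir" -type f -mtime +7 -printf '%f\n')

echo "recent: $recent"
echo "stale:  $stale"
rm -rf "$dir"
```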

Search by size:
Find files larger than 2k:

find . -type f -size +2k

Search by permission:

find . -type f -perm 644 -print   # files whose mode is exactly 644
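A sketch of the size and permission tests on two hypothetical files, assuming bash and GNU find. The -perm /111 form ("any execute bit set") is GNU syntax, added here only to show how executable files are found, since -perm 644 matches one exact mode.

```shell
#!/usr/bin/env bash
# Hypothetical files with known sizes and modes.
dir=$(mktemp -d)
head -c 3000 /dev/zero > "$dir/big.bin"    # 3000 bytes, rounds up to 3k
head -c 100 /dev/zero > "$dir/small.bin"   # 100 bytes, rounds up to 1k
chmod 644 "$dir/big.bin"
chmod 755 "$dir/small.bin"

# -size +2k: more than 2 units of 1024 bytes, after rounding up.
large=$(find "$dir" -type f -size +2k -printf '%f\n')
# -perm 644: the mode is exactly 644.
exact=$(find "$dir" -type f -perm 644 -printf '%f\n')
# -perm /111: at least one execute bit is set (GNU syntax).
execs=$(find "$dir" -type f -perm /111 -printf '%f\n')

echo "large=$large exact=$exact execs=$execs"
rm -rf "$dir"
```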

Search by user:

find . -type f -user weber -print   # files owned by the user weber

Acting on the results
Delete all .swp files under the current directory:

find . -type f -name "*.swp" -delete

Run a command on every match (the powerful -exec)

find . -type f -user root -exec chown weber {} \;   # give all root-owned files under the current directory to weber

Note: {} is a special string; for each matching file, it is replaced with that file's name.
For example, copy all .txt files modified more than 10 days ago into the directory OLD:

find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;
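A sketch of the copy example on a hypothetical layout, assuming bash, GNU touch and GNU find. The -maxdepth 1 is an addition here: it keeps find from descending into OLD itself while copying.

```shell
#!/usr/bin/env bash
# Hypothetical layout: one stale and one fresh .txt file, plus the OLD target directory.
dir=$(mktemp -d)
mkdir "$dir/OLD"
touch -d "20 days ago" "$dir/stale.txt"   # GNU touch
touch "$dir/fresh.txt"

# For each match, {} is replaced by the file name and cp runs once.
(cd "$dir" && find . -maxdepth 1 -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;)

copied=$(ls "$dir/OLD")
echo "copied: $copied"
rm -rf "$dir"
```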

Combining multiple commands
Tip: -exec runs a single command, so to run several commands on each file, put them into one script and call that script from -exec:

find . -type f -exec ./commands.sh {} \;
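A sketch of the script approach, assuming bash. The contents of commands.sh are hypothetical: it runs two commands (basename and wc) on the single file that find passes as $1.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"   # hypothetical sample files
printf 'c\n' > "$dir/two.txt"

# Hypothetical commands.sh: several commands applied to the file given as $1.
cat > "$dir/commands.sh" <<'EOF'
#!/usr/bin/env bash
name=$(basename "$1")
lines=$(wc -l < "$1" | tr -d ' ')
echo "$name: $lines lines"
EOF
chmod +x "$dir/commands.sh"

# find runs the script once per matching file.
report=$(cd "$dir" && find . -name "*.txt" -exec ./commands.sh {} \; | LC_ALL=C sort)
echo "$report"
rm -rf "$dir"
```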

-print prints each matching file name followed by a newline ('\n'), which is the default separator.
-print0 uses '\0' (NUL) as the separator instead of a newline, which makes it safe for file names containing spaces.
List the ten largest entries in the current directory (including hidden files, excluding "." itself), sorted by size in descending order and numbered:

find . -maxdepth 1 ! -name "." -print0 | xargs -0 du -b | sort -nr | head -10 | nl
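A sketch of why -print0 matters, assuming bash and GNU findutils. The file name with a space is hypothetical; printf '[%s]\n' prints one bracketed line per argument it receives, so the line count shows how many arguments xargs produced.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
touch "$dir/my file.txt"    # hypothetical name containing a space

# -print with plain xargs: the name is split on the space into two arguments.
split=$(find "$dir" -name "my file.txt" -print | xargs printf '[%s]\n' | wc -l)
# -print0 with xargs -0: the NUL separator keeps it as one argument.
whole=$(find "$dir" -name "my file.txt" -print0 | xargs -0 printf '[%s]\n' | wc -l)

echo "split into $split, kept as $whole"
rm -rf "$dir"
```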
grep: text search

grep match_pattern file   # by default, prints the matching lines
Common options
-o prints only the matched part of each line; -v prints only the lines that do not match
-c counts the lines in the file that contain the text

grep -c "text" filename

-n prints the line number of each match
-i ignores case when matching
-l prints only the names of matching files
Recursive search through a multi-level directory tree (a programmer's favorite way to search code):

grep "class" . -R -n

Match multiple patterns:

grep -e "class" -e "virtual" file
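A sketch of the common grep options on a hypothetical snippet of source text, assuming bash and GNU grep.

```shell
#!/usr/bin/env bash
f=$(mktemp)
# Hypothetical sample source text.
printf 'class A\nvirtual void f();\nint x;\nclass B\n' > "$f"

count=$(grep -c "class" "$f")                      # number of matching lines
only=$(grep -o "class [A-Z]" "$f")                  # just the matched text
nonmatching=$(grep -v "class" "$f")                 # lines without "class"
numbered=$(grep -n -e "class" -e "virtual" "$f")    # either pattern, with line numbers
nocase=$(grep -ic "CLASS" "$f")                     # -i: case-insensitive count

printf '%s\n' "$count" "$only" "$nonmatching" "$numbered" "$nocase"
rm -f "$f"
```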

Print the names of matching files, each terminated by \0 (-Z), and delete those files:

grep "test" file* -lZ| xargs -0 rm
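A sketch of the delete pipeline on hypothetical files, assuming bash and GNU grep (-Z is a GNU option). One name deliberately contains a space to show that the \0 separator keeps it intact.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
# Hypothetical files; one name contains a space.
printf 'a test line\n' > "$dir/file1"
printf 'nothing here\n' > "$dir/file2"
printf 'test\n' > "$dir/file 3"

# -l: print names of matching files; -Z: end each name with \0.
# xargs -0 splits on \0, so "file 3" reaches rm as a single argument.
(cd "$dir" && grep "test" file* -lZ | xargs -0 rm)

left=$(ls "$dir")
echo "remaining: $left"
rm -rf "$dir"
```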
xargs: converting input into command-line arguments

xargs converts input data into command-line arguments for a given command, which lets it work together with many other commands, such as grep and find.
Convert multi-line output into a single line:

cat file.txt| xargs

The newlines between the lines act as delimiters and are replaced by spaces.
Convert a single line into multiple lines:

cat single.txt | xargs -n 3   # -n: number of arguments (fields) per output line
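A sketch of both conversions using inline input instead of file.txt and single.txt, assuming bash. With no command given, xargs runs echo.

```shell
#!/usr/bin/env bash
# Many lines become one: every item is passed to the default command, echo.
one_line=$(printf '1\n2\n3\n4\n5\n6\n7\n' | xargs)
# One line becomes several: -n 3 allows at most three arguments per echo.
three_per_line=$(echo "1 2 3 4 5 6 7" | xargs -n 3)

echo "$one_line"
echo "$three_per_line"
```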

xargs options
-d sets the input delimiter (by default, items are separated by blanks and newlines)
-n sets the maximum number of arguments per command line, which splits the output into multiple lines
-I {} sets a replacement string: each occurrence of {} in the command is replaced with one input line. This is useful when the command being run needs other arguments around the inserted value.
For example:

cat file.txt | xargs -I {} ./command.sh -p {} -1
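A sketch of -I and -d, assuming bash and GNU xargs (-d is a GNU option). Here printf stands in for the hypothetical ./command.sh so the example runs anywhere.

```shell
#!/usr/bin/env bash
# -I {}: run the command once per input line, substituting the line for {}.
# printf stands in for the hypothetical ./command.sh.
runs=$(printf 'alpha\nbeta\n' | xargs -I {} printf 'command.sh -p %s -1\n' {})
# -d ,: split the input on commas instead of blanks and newlines.
items=$(printf 'x,y,z' | xargs -d , -n 1)

echo "$runs"
echo "$items"
```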

-0 uses \0 (NUL) as the input delimiter.
For example, count the lines of C++ source code:

find source_dir/ -type f -name "*.cpp" -print0 |xargs -0 wc -l
sort: sorting

Options:
-n sorts numerically; -d sorts in dictionary order
-r reverses the order
-k N sorts by column N
For example:

sort -nrk 1 data.txt   # sort numerically by column 1, in reverse order
sort -bd data          # -b ignores leading blanks such as spaces
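A sketch of numeric versus lexical sorting on hypothetical two-column data, assuming bash and GNU sort. LC_ALL=C makes the character-by-character comparison explicit.

```shell
#!/usr/bin/env bash
# Hypothetical two-column data: name and count.
data=$(printf 'apple 3\nbanana 10\ncherry 2\n')

# -n numeric, -r reverse, -k 2: the sort key starts at column 2.
by_count=$(echo "$data" | sort -nr -k 2)
# Without -n, "10" sorts before "2" because comparison is character by character.
lexical=$(echo "$data" | awk '{print $2}' | LC_ALL=C sort)

echo "$by_count"
echo "$lexical"
```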

uniq: removing duplicate lines
Remove duplicate lines (uniq only collapses adjacent duplicates, so sort the input first):

sort unsort.txt | uniq

Count how many times each line appears in a file:

sort unsort.txt | uniq -c

Print only the duplicated lines:

sort unsort.txt | uniq -d

You can limit which part of each line is compared: -s N skips the first N characters, and -w N compares no more than N characters.
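A sketch of the uniq variants on hypothetical unsorted input, assuming bash and GNU coreutils. The awk step only reformats uniq -c's padded "count line" output as "line=count".

```shell
#!/usr/bin/env bash
# Hypothetical unsorted input.
input=$(printf 'b\na\nb\nc\na\nb\n')

unique=$(echo "$input" | sort | uniq)                                # each line once
counts=$(echo "$input" | sort | uniq -c | awk '{print $2 "=" $1}')   # line=count
dups=$(echo "$input" | sort | uniq -d)                               # only repeated lines
# -s 1 skips the first character, so "x1" and "y1" compare equal.
skipped=$(printf 'x1\ny1\nz2\n' | uniq -s 1)

printf '%s\n---\n' "$unique" "$counts" "$dups" "$skipped"
```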

tr: character translation

General Usage

echo 12345 | tr '0-9' '9876543210'   # encode/decode by mapping each digit to another
cat text | tr '\t' ' '               # convert tabs to spaces

Deleting characters with tr

cat file | tr -d '0-9'   # delete all digits

-c: complement the set (act on every character not listed)

cat file | tr -cd '0-9'     # keep only the digits in the file
cat file | tr -cd '0-9\n'   # delete non-digit characters but keep newlines
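A sketch of translation, deletion and complement on inline input, assuming bash and GNU tr. The digit mapping used here (each digit d becomes 9-d) is just one reversible example: applying it twice restores the input.

```shell
#!/usr/bin/env bash
# Reversible digit mapping: each digit d becomes 9-d.
encoded=$(echo 12345 | tr '0-9' '9876543210')
decoded=$(echo "$encoded" | tr '0-9' '9876543210')
# Tabs to spaces.
spaced=$(printf 'a\tb\n' | tr '\t' ' ')
# -d deletes the listed characters; -c complements the set first.
no_digits=$(echo "abc123def45" | tr -d '0-9')
only_digits=$(echo "abc123def45" | tr -cd '0-9\n')

printf '%s\n' "$encoded" "$decoded" "$spaced" "$no_digits" "$only_digits"
```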

Squeezing characters with tr
tr -s squeezes runs of a repeated character into a single one; it is most often used to collapse redundant spaces:

cat file | tr -s ' '

Character classes
The following character classes are available in tr:
alnum: letters and digits
alpha: letters
digit: digits
space: whitespace
lower: lowercase letters
upper: uppercase letters
cntrl: control (non-printable) characters
print: printable characters

Usage: tr '[:class:]' '[:class:]'

For example: tr '[:lower:]' '[:upper:]'   # convert lowercase to uppercase
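A sketch of squeezing and character classes on inline input, assuming bash and GNU tr. The classes are quoted so the shell does not treat the brackets as a glob.

```shell
#!/usr/bin/env bash
# -s squeezes runs of the same character down to one.
squeezed=$(echo "a    b  c" | tr -s ' ')
# Character classes map whole categories of characters at once.
upper=$(echo "hello world" | tr '[:lower:]' '[:upper:]')
# Complement of alnum, deleted: keeps only letters and digits (the newline goes too).
alnum_only=$(echo "a-b_c 1!" | tr -cd '[:alnum:]')

printf '%s\n' "$squeezed" "$upper" "$alnum_only"
```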
cut: splitting text by column

Extract the 2nd and 4th columns of a file:

cut -f2,4 filename

Print every column except the 3rd:

cut -f3 --complement filename

-d specifies the delimiter (the default is a tab):

cut -f2 -d";" filename

Ranges for cut
N-: from the Nth field to the end
-M: from the 1st field to the Mth
N-M: from the Nth field to the Mth

Units for cut
-b: bytes
-c: characters
-f: fields (separated by the delimiter)

cut -c1-5 file   # print the 1st through 5th characters
cut -c-2 file    # print the first 2 characters
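A sketch of field and character selection on hypothetical semicolon-separated data, assuming bash and GNU cut (--complement is a GNU option).

```shell
#!/usr/bin/env bash
f=$(mktemp)
printf 'a;b;c;d\n1;2;3;4\n' > "$f"   # hypothetical ;-separated data

cols=$(cut -d ';' -f2,4 "$f")               # fields 2 and 4
rest=$(cut -d ';' -f3 --complement "$f")    # everything except field 3
from2=$(cut -d ';' -f2- "$f")               # field 2 to the end
first5=$(echo "abcdefgh" | cut -c1-5)       # characters 1 to 5
first2=$(echo "abcdefgh" | cut -c-2)        # characters 1 to 2

printf '%s\n---\n' "$cols" "$rest" "$from2" "$first5" "$first2"
rm -f "$f"
```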
paste: joining text by column

Join two files column by column:

$ cat file1
1
2
$ cat file2
colin
book
$ paste file1 file2
1	colin
2	book

The default delimiter is a tab; use -d to specify a different one:
paste -d "," file1 file2
1,colin
2,book

wc: counting lines, words, and bytes
wc -l file   # count lines
wc -w file   # count words
wc -c file   # count bytes (use -m to count characters)
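A sketch of paste and wc on the two sample files from the paste example, recreated in a throwaway directory, assuming bash and GNU coreutils.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
printf '1\n2\n' > "$dir/file1"          # sample files from the paste example
printf 'colin\nbook\n' > "$dir/file2"

tabbed=$(paste "$dir/file1" "$dir/file2")          # default delimiter: tab
commas=$(paste -d "," "$dir/file1" "$dir/file2")   # custom delimiter

# Reading from stdin (<) makes wc print just the number, without a file name.
lines=$(wc -l < "$dir/file2")   # 2 lines
words=$(wc -w < "$dir/file2")   # 2 words
bytes=$(wc -c < "$dir/file2")   # "colin\n" (6) + "book\n" (5) = 11 bytes

printf '%s\n' "$tabbed" "$commas" "$lines $words $bytes"
rm -rf "$dir"
```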
sed: text replacement

Replace the first match

sed 's/text/replace_text/' file   # replace the first match on each line

Global replacement

sed 's/text/replace_text/g' file

By default, sed prints the result to standard output. To modify the original file in place, use -i:

sed -i 's/text/replace_text/g' file

Delete blank lines:

sed '/^$/d' file
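A sketch of first-match, global, and in-place editing on a hypothetical file, assuming bash and GNU sed (BSD sed needs an argument after -i).

```shell
#!/usr/bin/env bash
f=$(mktemp)
printf 'text text\n\nmore text\n' > "$f"   # hypothetical input with a blank line

first=$(sed 's/text/replace_text/' "$f")    # first match on each line
all=$(sed 's/text/replace_text/g' "$f")     # every match
sed -i '/^$/d' "$f"                         # delete blank lines in place
after=$(cat "$f")

printf '%s\n---\n' "$first" "$all" "$after"
rm -f "$f"
```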

Referencing the matched string
The & token refers to the whole matched string:

echo this is an example | sed 's/\w\+/[&]/g'
# output: [this] [is] [an] [example]

Referencing a captured substring
The text matched by the first parenthesized group \( \) is referenced as \1:

sed 's/hello\([0-9]\)/\1/'

Double quotes
sed expressions are usually enclosed in single quotes, but double quotes also work. Inside double quotes, the shell expands variables and other expressions before sed sees them:

sed "s/$var/HELLO/"   # $var is expanded by the shell

With double quotes, variables can be used in both the pattern and the replacement string:

p=pattern
r=replaced
echo "line con a pattern" | sed "s/$p/$r/g"
# output: line con a replaced

Other examples
Insert a character into a string: convert each line such as PEKSHA into PEK/SHA:

sed 's/^.\{3\}/&\//g' file
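A sketch collecting the reference and quoting examples into one runnable script, assuming bash and GNU sed (\w and \+ are GNU extensions to basic regular expressions).

```shell
#!/usr/bin/env bash
# &: the whole match; \w\+ means "one or more word characters".
tagged=$(echo "this is an example" | sed 's/\w\+/[&]/g')
# \1: the text captured by the first \( \) group.
digit=$(echo "hello7" | sed 's/hello\([0-9]\)/\1/')
# Double quotes let the shell expand $p and $r before sed runs.
p=pattern
r=replaced
expanded=$(echo "line con a pattern" | sed "s/$p/$r/g")
# Insert "/" after the first three characters.
split=$(echo "PEKSHA" | sed 's/^.\{3\}/&\//g')

printf '%s\n' "$tagged" "$digit" "$expanded" "$split"
```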
awk: data stream processing

Structure of an awk script

awk 'BEGIN{ statements } statements2 END{ statements }'

How it works
1. Execute the BEGIN block.
2. Read a line from a file or stdin and execute statements2; repeat until all input has been read.
3. Execute the END block.

Print the current line
print with no arguments prints the current line:

echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print } END{ print "End" }'

When the arguments to print are separated by commas, they are printed separated by spaces:

echo | awk '{var1 = "v1"; var2 = "V2"; var3 = "v3"; print var1, var2, var3}'
# output: v1 V2 v3

To join the values with "-" instead, use string concatenation (in awk, writing strings next to each other concatenates them):

echo | awk '{var1 = "v1"; var2 = "V2"; var3 = "v3"; print var1 "-" var2 "-" var3}'
# output: v1-V2-v3

Special variables: NR, NF, $0, $1, $2
NR: the number of records, i.e. the current line number during execution;
NF: the number of fields in the current line;
$0: the full text of the current line;
$1: the text of the first field;
$2: the text of the second field.

echo -e "line1 f2 f3\n line2 \n line 3" | awk '{print NR":"$0"-"$1"-"$2}'

Print the second and third fields of each row:

awk '{print $2, $3}' file

Count the lines in a file:

awk ' END {print NR}' file

Sum the first field of every line:

echo -e "1\n2\n3\n4" | awk 'BEGIN{sum = 0; print "begin"} {sum += $1} END {print "=="; print sum}'
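A sketch of NR, NF, field variables and BEGIN/END accumulation on inline input, assuming bash and any POSIX awk. printf is used instead of echo -e because not every shell's echo understands -e.

```shell
#!/usr/bin/env bash
# NR: current line number; NF: number of fields; $1, $2: individual fields.
fields=$(printf 'a b c\nd e\n' | awk '{print NR": "NF" fields, first="$1", second="$2}')
# BEGIN runs once before the input, END once after it; sum accumulates column 1.
total=$(printf '1\n2\n3\n4\n' | awk 'BEGIN {sum = 0} {sum += $1} END {print sum}')
# In END, NR holds the total number of lines read.
line_count=$(printf '1\n2\n3\n4\n' | awk 'END {print NR}')

printf '%s\n' "$fields" "$total" "$line_count"
```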

Passing external variables

var=1000
echo | awk '{print vara}' vara=$var    # input from stdin
awk '{print vara}' vara=$var file      # input from file

Filtering the lines awk processes with patterns
awk 'NR < 5'                     # lines whose line number is less than 5
awk 'NR==1,NR==4 {print}' file   # print lines 1 through 4
awk '/linux/'                    # lines containing "linux" (any regular expression works, which is very powerful)
awk '!/linux/'                   # lines that do not contain "linux"

Setting the field separator
Use -F to set the field separator (whitespace by default):

awk -F: '{print $NF}' /etc/passwd
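A sketch of external variables, patterns and -F on inline input, assuming bash and any POSIX awk. The "linux" lines and the passwd-style record are hypothetical sample data, used instead of reading /etc/passwd.

```shell
#!/usr/bin/env bash
# An external shell variable passed as a command-line assignment.
var=1000
passed=$(echo | awk '{print vara}' vara="$var")
# Range pattern: lines 1 through 4.
first4=$(seq 10 | awk 'NR==1,NR==4 {print}')
# Regex patterns: lines with and without "linux" (hypothetical input).
text=$(printf 'linux rocks\nwindows\nlinux again\n')
with=$(echo "$text" | awk '/linux/')
without=$(echo "$text" | awk '!/linux/')
# -F: sets the field separator; $NF is the last field.
last=$(echo 'root:x:0:0:root:/root:/bin/bash' | awk -F: '{print $NF}')

printf '%s\n---\n' "$passed" "$first4" "$with" "$without" "$last"
```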

Reading command output
Use getline to read the output of an external shell command into the variable cmdout:

echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout }'

Loops in awk:

for (i = 0; i < 10; i++) {print $i}
for (i in array) {print array[i]}

Print lines in reverse order (an implementation of tac):

seq 9 | awk '{lifo[NR] = $0; lno = NR} END {for (; lno > 0; lno--) print lifo[lno]}'

Implementing head and tail with awk

head:  awk 'NR<=10 {print}' filename
tail:  awk '{buffer[NR%10] = $0} END {for (i = NR-9; i <= NR; i++) if (i > 0) print buffer[i%10]}' filename
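A sketch that checks the tac, head and tail one-liners against seq output, assuming bash and any POSIX awk. The tail version keeps a 10-slot ring buffer indexed by NR % 10 and prints the last 10 lines in their original order; the "if (i > 0)" guard handles inputs shorter than 10 lines.

```shell
#!/usr/bin/env bash
# tac: store every line, then print from the last index down to 1.
reversed=$(seq 5 | awk '{lifo[NR] = $0; lno = NR} END {for (; lno > 0; lno--) print lifo[lno]}')
# head: print while NR <= 10.
head10=$(seq 25 | awk 'NR <= 10 {print}')
# tail: ring buffer of 10 lines, printed oldest first.
tail10=$(seq 25 | awk '{buffer[NR % 10] = $0} END {for (i = NR - 9; i <= NR; i++) if (i > 0) print buffer[i % 10]}')
# Same tail program on input shorter than 10 lines.
short=$(seq 3 | awk '{buffer[NR % 10] = $0} END {for (i = NR - 9; i <= NR; i++) if (i > 0) print buffer[i % 10]}')

printf '%s\n---\n' "$reversed" "$head10" "$tail10" "$short"
```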

Print a specific column
With awk:

ls -lrt | awk '{print $6}'

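With cut (cut splits on tabs by default, while ls separates columns with runs of spaces, so squeeze the spaces first and use a space as the delimiter):

ls -lrt | tr -s ' ' | cut -d ' ' -f6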

Print a specific region of text
By line number:

seq 100| awk 'NR==4,NR==6{print}'

By pattern
Print the text from start_pattern through end_pattern:

awk '/start_pattern/, /end_pattern/' filename
For example:
seq 100 | awk '/13/,/15/'
cat /etc/passwd | awk '/mai.*mail/,/news.*news/'

Common awk built-in functions
index(string, search_string): returns the position where search_string first appears in string;
sub(regex, replacement_str, string): replaces the first match of regex in string with replacement_str;
match(string, regex): checks whether regex matches string, returning the start position of the match (0 if none) and setting RSTART and RLENGTH;
length(string): returns the length of string.

echo | awk '{"grep root /etc/passwd" | getline cmdout; print length(cmdout) }'

printf works like printf in C:

seq 10 | awk '{printf "->%4s\n", $1}'
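A sketch exercising each built-in function on literal strings, assuming bash and a POSIX awk. The match() result is stored in a variable first so that RSTART and RLENGTH are certainly set before they are printed.

```shell
#!/usr/bin/env bash
results=$(echo | awk '{
  s = "hello world"
  print index(s, "world")             # position of "world" in s
  sub(/o/, "0", s); print s           # replaces the first "o" only
  m = match("abc123", /[0-9]+/)       # sets RSTART and RLENGTH
  print m, RSTART, RLENGTH
  print length("hello")
  printf "->%4s\n", 42                # right-aligned in a 4-character field
}')
echo "$results"
```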

Iterating over the lines, words, and characters of a file
1. Iterate over each line of a file
With a while loop:

while IFS= read -r line; do echo "$line"; done < file.txt
# or, in a subshell:
cat file.txt | (while IFS= read -r line; do echo "$line"; done)

With awk:

cat file.txt| awk '{print}'

2. Iterate over each word in a line

for word in $line;do echo $word;done

3. Iterate over each character
${string:start_pos:num_of_chars}: extracts a substring from a string (bash substring expansion);
${#word}: returns the length of the variable word.

for ((i = 0; i < ${#word}; i++)); do echo "${word:i:1}"; done
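A sketch running the three iteration levels on a hypothetical two-line file, assuming bash (arrays, (( )) loops and ${var:i:1} are bash features). IFS= and -r are used so leading blanks and backslashes in each line are kept as-is.

```shell
#!/usr/bin/env bash
f=$(mktemp)
printf '  indented line\nword1 word2\n' > "$f"   # hypothetical input

# 1. Lines: IFS= keeps leading blanks, -r keeps backslashes.
lines=()
while IFS= read -r line; do lines+=("$line"); done < "$f"

# 2. Words: the unquoted expansion is split on whitespace.
words=0
for word in ${lines[1]}; do words=$((words + 1)); done

# 3. Characters: ${#word} is the length, ${word:i:1} the character at index i.
word="abc"
chars=""
for ((i = 0; i < ${#word}; i++)); do chars+="${word:i:1},"; done

printf '[%s]\n' "${lines[0]}" "$words" "$chars"
rm -f "$f"
```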

Reprinted from: http://www.linuxprobe.com

