Linux Shell Text Processing Tool Highlights

Source: Internet
Author: User

This article describes the most commonly used tools for processing text in the shell on Linux:

find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, awk.

The examples and parameters provided are the most commonly used and most practical.

My principle for shell scripts is to write them as command-line one-liners, preferably no more than 2 lines.

If you have more complex task requirements, consider Python.

find: File Lookup
    • Find txt and pdf files
        find . \( -name "*.txt" -o -name "*.pdf" \) -print
    • Find .txt and .pdf files with a regular expression
        find . -regex ".*\(\.txt\|\.pdf\)$"

      -iregex: case-insensitive regular expression

    • Negation: find all files that are not .txt
        find . ! -name "*.txt" -print
    • Limit the search depth: print only the files in the current directory (depth 1)
        find . -maxdepth 1 -type f
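The name tests above can be tried in a throwaway directory. This is a minimal sketch, assuming bash and GNU find; the directory comes from mktemp and the file names are hypothetical.

```shell
# Hypothetical sandbox: the file names are made up for illustration.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/a.txt" "$demo/b.pdf" "$demo/c.log" "$demo/sub/d.txt"

# Names ending in .txt OR .pdf, at any depth (sorted for a stable order).
docs=$(cd "$demo" && find . \( -name "*.txt" -o -name "*.pdf" \) -print | LC_ALL=C sort)

# Regular files whose name does NOT end in .txt.
non_txt=$(cd "$demo" && find . -type f ! -name "*.txt" | LC_ALL=C sort)

# Regular files directly inside the starting directory only.
top_level=$(cd "$demo" && find . -maxdepth 1 -type f | LC_ALL=C sort)

printf '%s\n' "$docs" "--" "$non_txt" "--" "$top_level"
rm -rf "$demo"
```

The parentheses are escaped so that the shell passes them to find, which uses them to group the two -name tests.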
Custom Search
    • Search by type:
        find . -type d -print  # list directories only

      -type f: regular file / -type l: symbolic link

    • Search by time: -atime access time (in days; -amin is the same in minutes), -mtime modification time (content modified), -ctime change time (metadata or permission changes)

      All files accessed within the last 7 days:

        find . -atime -7 -type f -print
    • Search by size: units include w (two-byte words), k, M, G. Find files larger than 2k:
        find . -type f -size +2k

      Search by permissions:

        find . -type f -perm 644 -print  # find all files whose permissions are exactly 644

      Search by user:

        find . -type f -user weber -print  # find the files owned by the user weber
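The size, time, and permission tests can be checked against files with known properties. The sketch below assumes GNU find and GNU touch (for `-d '20 days ago'`). It uses -mtime rather than -atime because access times depend on mount options. All file names are hypothetical.

```shell
demo=$(mktemp -d)
printf 'x' > "$demo/small.txt"                 # 1 byte
head -c 4096 /dev/zero > "$demo/big.bin"       # 4 KiB
touch -d '20 days ago' "$demo/old.log"         # modified 20 days ago (GNU touch)
chmod 644 "$demo/small.txt"
chmod 600 "$demo/big.bin" "$demo/old.log"

big=$(find "$demo" -type f -size +2k -exec basename {} \;)       # big.bin
stale=$(find "$demo" -type f -mtime +10 -exec basename {} \;)    # old.log
recent_count=$(find "$demo" -type f -mtime -7 | wc -l)           # 2
mode_644=$(find "$demo" -type f -perm 644 -exec basename {} \;)  # small.txt

echo "$big / $stale / $recent_count / $mode_644"
rm -rf "$demo"
```

A leading + means "more than", a leading - means "less than", and a bare number means "exactly", which is why -atime 7 and -atime -7 select different files.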
Actions on the Files Found
    • Delete: delete all swp files under the current directory:
        find . -type f -name "*.swp" -delete
    • Execute an action (the powerful -exec)
        find . -type f -user root -exec chown weber {} \;  # change the owner of root's files under the current directory to weber

      Note: {} is a special string; for each matching file, {} is replaced with the corresponding file name.

      E.g., copy all the found files to another directory:

        find . -type f -mtime +10 -name "*.txt" -exec cp {} old \;
    • Combining multiple commands. Tip: if you need to run several commands on each file, put them in a single script and call that script from -exec:
        -exec ./commands.sh {} \;
Delimiter of -print

-print uses '\n' as the file name delimiter by default.

-print0 uses '\0' as the delimiter, so file names that contain spaces can be handled safely.
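A short sketch of -delete, -exec, and -print0 working together, assuming GNU find and xargs. The file names (including the one with a space) and the backup directory are hypothetical.

```shell
demo=$(mktemp -d)
mkdir "$demo/backup"
printf 'one\ntwo\n' > "$demo/my notes.txt"    # name contains a space
printf 'three\n'    > "$demo/plain.txt"
touch "$demo/.draft.swp"

# Remove editor swap files.
find "$demo" -type f -name "*.swp" -delete
swp_left=$(find "$demo" -name "*.swp" | wc -l)               # 0

# Copy each top-level .txt file; {} is replaced by one file name per cp call.
find "$demo" -maxdepth 1 -type f -name "*.txt" -exec cp {} "$demo/backup/" \;
copied=$(ls "$demo/backup" | wc -l)                          # 2

# NUL-delimited names survive the space in "my notes.txt".
total_lines=$(find "$demo" -maxdepth 1 -type f -name "*.txt" -print0 | xargs -0 cat | wc -l)  # 3

echo "$swp_left $copied $total_lines"
rm -rf "$demo"
```

With plain -print and xargs without -0, "my notes.txt" would be split into two arguments, "my" and "notes.txt".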

grep: Text Search

grep match_pattern file  # prints the matching lines by default

    • Common parameters: -o outputs only the matched part of each line, vs. -v outputs only the lines that do not match; -c counts the lines in the file that contain the text
        grep -c "text" filename

      -n prints the line numbers of matches

      -i ignores case when searching

      -l prints file names only

    • Recursive text search in a multilevel directory (a favorite of programmers searching code):
        grep "class" . -r -n
    • Match multiple patterns
        grep -e "class" -e "virtual" file
    • Terminate the file names in grep output with a \0 character (-Z):
        grep "test" file* -lZ | xargs -0 rm
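The options above can be compared side by side on a small sample. This is only a sketch: the sample file and its contents are hypothetical.

```shell
demo=$(mktemp -d)
cat > "$demo/code.txt" <<'EOF'
class Foo
virtual void run
int x
class Bar
EOF

count=$(grep -c "class" "$demo/code.txt")                    # 2 lines contain "class"
either=$(grep -c -e "class" -e "virtual" "$demo/code.txt")   # 3 lines match either pattern
others=$(grep -v "class" "$demo/code.txt" | wc -l)           # 2 lines do not match
parts=$(grep -o "class" "$demo/code.txt" | wc -l)            # 2 matched fragments
numbered=$(grep -n "virtual" "$demo/code.txt")               # 2:virtual void run

echo "$count $either $others $parts $numbered"
rm -rf "$demo"
```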
xargs: Command-Line Argument Conversion

xargs converts input data into command-line arguments for a given command, so it can be combined with many other commands, such as grep and find.

    • Convert multi-line output to a single line: cat file.txt | xargs  ('\n' is the delimiter between the lines of text)
    • Convert a single line to multi-line output: cat single.txt | xargs -n 3  (-n specifies the number of fields per output line)
xargs Parameter Description

-d defines the delimiter (by default, whitespace, including the '\n' between lines)

-n specifies the maximum number of arguments per command invocation, which splits the output over multiple lines

-I {} specifies the replacement string, which is replaced with each input item when xargs runs the command; use it when the command needs the argument in a specific position or more than once

E.g.:

cat file.txt | xargs -I {} ./command.sh -p {} -1

-0: use '\0' as the input delimiter

E.g., count the lines of source code:

find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l
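A small sketch of the three most common xargs forms, relying on the fact that xargs runs echo when no command is given. The input values are arbitrary.

```shell
# Many lines in, one line out.
one_line=$(printf '1\n2\n3\n' | xargs)                       # 1 2 3

# One line in, at most three arguments per echo call.
rows=$(printf '1 2 3 4 5 6 7\n' | xargs -n 3)

# -I {} runs the command once per input line and substitutes the line for {}.
templated=$(printf 'a\nb\n' | xargs -I {} echo "item-{}-done")

printf '%s\n' "$one_line" "$rows" "$templated"
```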
sort: Sorting

Field description:

-n sort numerically vs. -d sort in dictionary order

-r reverse order

-k N sort by the Nth column

E.g.:

sort -nrk 1 data.txt
sort -bd data  # ignore leading whitespace such as spaces
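The difference between numeric and plain sorting shows up as soon as the numbers have different widths. A minimal sketch with made-up data; LC_ALL=C is set only to make the byte-order result reproducible across locales.

```shell
data=$(printf '10 pear\n9 apple\n100 fig\n')

# Numeric, descending, keyed on column 1.
numeric_desc=$(printf '%s\n' "$data" | sort -nrk 1)

# Plain byte-order sort: "10 ..." < "100 ..." < "9 ...".
plain=$(printf '%s\n' "$data" | LC_ALL=C sort)

printf '%s\n' "$numeric_desc" "--" "$plain"
```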
uniq: Eliminating Duplicate Lines
    • Eliminate duplicate lines
        sort unsort.txt | uniq
    • Count the number of times each line appears in a file
        sort unsort.txt | uniq -c
    • Find duplicate lines
        sort unsort.txt | uniq -d

      You can specify which part of each line is compared: -s start position (number of characters to skip), -w number of characters to compare
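uniq only collapses adjacent duplicates, which is why every example pipes through sort first. A sketch with arbitrary input, ending with a typical "most frequent line" pipeline:

```shell
input=$(printf 'b\na\nb\nc\na\nb\n')

unique=$(printf '%s\n' "$input" | sort | uniq)       # a, b, c
dupes=$(printf '%s\n' "$input" | sort | uniq -d)     # a, b (lines seen more than once)

# Most frequent line: count, sort by count descending, keep the first row.
top=$(printf '%s\n' "$input" | sort | uniq -c | sort -nr | head -n 1 | awk '{print $2 ":" $1}')

printf '%s\n' "$unique" "--" "$dupes" "--" "$top"
```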

tr: Character Conversion
    • General usage
        echo 12345 | tr '0-9' '9876543210'  # simple encryption/decryption: replace each character with its counterpart
        cat text | tr '\t' ' '  # convert tabs to spaces
    • Deleting characters with tr
        cat file | tr -d '0-9'  # delete all digits

      -c: use the complement set

        cat file | tr -c '0-9\n' ' '  # keep the digits in the file; every other character becomes a space
        cat file | tr -d -c '0-9 \n'  # delete everything except digits, spaces, and newlines
    • Squeezing characters: tr -s squeezes runs of repeated characters in the text; it is most commonly used to squeeze extra spaces
        cat file | tr -s ' '
    • Character classes: tr can use various character classes:

      alnum: letters and digits

      alpha: letters

      digit: digits

      space: whitespace characters

      lower: lowercase

      upper: uppercase

      cntrl: control (non-printable) characters

      print: printable characters

      Usage: tr [:class:] [:class:]

        E.g.: tr '[:lower:]' '[:upper:]'
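The four modes of tr (translate, delete, complement, squeeze) can be seen on short strings. A minimal sketch with arbitrary inputs:

```shell
swapped=$(echo 12345 | tr '0-9' '9876543210')               # 87654
upper=$(echo "hello world" | tr '[:lower:]' '[:upper:]')    # HELLO WORLD
digits=$(echo "tel: 555-0100" | tr -cd '0-9')               # 5550100
squeezed=$(echo "a    b  c" | tr -s ' ')                    # a b c

printf '%s\n' "$swapped" "$upper" "$digits" "$squeezed"
```

Because the digit mapping is its own inverse, running the first command on 87654 gives back 12345.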
cut: Split Text by Column
    • Extract the 2nd and 4th columns of the file:
        cut -f2,4 filename
    • Extract all columns of the file except the 3rd:
        cut -f3 --complement filename
    • -d specifies the delimiter:
        cut -f2 -d ";" filename
    • cut ranges: N- from the Nth field to the end; -M from the 1st field to the Mth; N-M from the Nth to the Mth field
    • cut units: -b bytes, -c characters, -f fields (using the delimiter)
    • E.g.:
        cut -c1-5 file  # print the 1st to 5th characters
        cut -c-2 file  # print the first 2 characters
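The field and character ranges can be checked on a single delimited record. The record below is hypothetical, and --complement assumes GNU cut.

```shell
line='alice;30;paris;engineer'

second=$(echo "$line" | cut -d ';' -f2)                   # 30
two_four=$(echo "$line" | cut -d ';' -f2,4)               # 30;engineer
from_third=$(echo "$line" | cut -d ';' -f3-)              # paris;engineer
not_third=$(echo "$line" | cut -d ';' -f3 --complement)   # alice;30;engineer
first_five=$(echo "$line" | cut -c1-5)                    # alice

printf '%s\n' "$second" "$two_four" "$from_third" "$not_third" "$first_five"
```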
paste: Join Text by Column

Join two texts together column by column:

cat file1
1
2

cat file2
colin
book

paste file1 file2
1    colin
2    book

The default delimiter is a tab character; another delimiter can be specified with -d:

paste file1 file2 -d ","

1,colin

2,book

wc: Count Lines, Words, and Characters

wc -l file  # count lines

wc -w file  # count words

wc -c file  # count bytes (use -m to count characters)
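A short sketch that reproduces the paste example and then counts the result with wc. The two files are created in a temporary directory; reading them through `<` keeps the file name out of the wc output.

```shell
demo=$(mktemp -d)
printf '1\n2\n'        > "$demo/file1"
printf 'colin\nbook\n' > "$demo/file2"

comma=$(paste -d "," "$demo/file1" "$demo/file2")
# With the default tab delimiter, cut -f2 recovers the second column.
second_col=$(paste "$demo/file1" "$demo/file2" | cut -f2)

lines=$(wc -l < "$demo/file2")   # 2
words=$(wc -w < "$demo/file2")   # 2
bytes=$(wc -c < "$demo/file2")   # 11 (5 + 4 letters plus 2 newlines)

printf '%s\n' "$comma" "$second_col" "$lines $words $bytes"
rm -rf "$demo"
```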

sed: Text Replacement Tool
    • Replace the first occurrence
        sed 's/text/replace_text/' file   # replace the first match on each line
    • Global substitution
         sed 's/text/replace_text/g' file

      By default sed prints the substituted content; to modify the original file in place, use -i:

        sed -i 's/text/replace_text/g' file
    • Remove blank lines:
        sed '/^$/d' file
    • Referencing the match: the matched string is referenced with the & marker.
      echo this is en example | sed 's/\w\+/[&]/g'
      $> [this] [is] [en] [example]
    • Substring match markers: the content matched by the first pair of parentheses is referenced with \1
        sed 's/hello\([0-9]\)/\1/'
    • Double quotes: a sed expression is usually quoted with single quotes, but double quotes can also be used, in which case the shell expands variables inside the expression:
        sed 's/$var/hlloe/'  # single quotes: $var is not expanded by the shell

      With double quotes, we can use shell variables in both the sed pattern and the replacement string.

      E.g.:
        p=patten
        r=replaced
        echo "line con a patten" | sed "s/$p/$r/g"
        $> line con a replaced
    • Another example, inserting a character into a string: convert each line of the text (peksha) to pek/sha
        sed 's/^.\{3\}/&\//' file
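The substitutions above can be verified on one-line inputs. This is a sketch assuming GNU sed, since `\w` and `\+` are GNU extensions to basic regular expressions. The input strings are arbitrary.

```shell
first_only=$(echo "aa bb aa" | sed 's/aa/XX/')                # XX bb aa
all=$(echo "aa bb aa" | sed 's/aa/XX/g')                      # XX bb XX
wrapped=$(echo "this is an example" | sed 's/\w\+/[&]/g')     # [this] [is] [an] [example]
digit=$(echo "hello7 world" | sed 's/hello\([0-9]\)/\1/')     # 7 world
slashed=$(echo "peksha" | sed 's/^.\{3\}/&\//')               # pek/sha
kept=$(printf 'a\n\nb\n' | sed '/^$/d' | wc -l)               # 2 non-blank lines

# Double quotes let the shell expand $p and $r before sed sees the expression.
p=patten
r=replaced
expanded=$(echo "line con a patten" | sed "s/$p/$r/g")        # line con a replaced

printf '%s\n' "$first_only" "$all" "$wrapped" "$digit" "$slashed" "$kept" "$expanded"
```

If a variable may contain a slash, a different delimiter such as `s|$p|$r|g` avoids breaking the expression.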
awk: Data Stream Processing Tool
    • awk script structure: awk 'BEGIN{ statements } statements2 END{ statements }'
    • How it works: 1. Execute the BEGIN statement block; 2. Read a line from the file or stdin and execute statements2, repeating until the input is fully read; 3. Execute the END statement block.
print: Printing the Current Line
    • When print is used without arguments, the current line is printed:
        echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print} END{print "End"}'
    • When the arguments of print are separated by commas, they are separated by spaces in the output:
      echo | awk '{var1 = "v1"; var2 = "v2"; var3 = "v3"; print var1, var2, var3;}'
      $> v1 v2 v3
    • Concatenation with "-" (adjacent strings are concatenated):
      echo | awk '{var1 = "v1"; var2 = "v2"; var3 = "v3"; print var1 "-" var2 "-" var3;}'
      $> v1-v2-v3
Special Variables: NR NF $0 $1 $2

NR: the number of records; during execution it corresponds to the current line number.

NF: the number of fields; during execution it is the total number of fields in the current line.

$0: the text content of the current line.

$1: the text content of the first field.

$2: the text content of the second field.

echo -e "line1 f2 f3\n line2 \n line 3" | awk '{print NR":"$0"-"$1"-"$2}'
    • Print the second and third fields of each line:
        awk '{print $2, $3}' file
    • Count the number of lines in a file:
        awk 'END {print NR}' file
    • Sum the first field of every line:
        echo -e "1\n 2\n 3\n 4\n" | awk 'BEGIN{sum = 0; print "begin";} {sum += $1;} END {print "=="; print sum}'
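The BEGIN / main / END flow and the special variables can be observed on a three-line input. A minimal sketch; the sample rows are made up.

```shell
input=$(printf 'a 1 x\nb 2 y\nc 3 z\n')

# BEGIN runs once before any input, the middle block once per line, END once after.
framed=$(printf 'l1\nl2\n' | awk 'BEGIN{print "start"} {print} END{print "end"}')

# NR is the current line number, NF the number of fields on that line.
numbered=$(printf '%s\n' "$input" | awk '{print NR ":" $1 "-" NF}')

total=$(printf '%s\n' "$input" | awk '{sum += $2} END {print sum}')   # 6
count=$(printf '%s\n' "$input" | awk 'END {print NR}')                # 3

printf '%s\n' "$framed" "--" "$numbered" "--" "$total $count"
```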
Passing External Variables
var=1000
echo | awk '{print vara}' vara=$var   # input from stdin
awk '{print vara}' vara=$var file     # input from a file
Filtering the Lines awk Processes with a Pattern

awk 'NR==1,NR==4 {print}' file  # print lines 1 through 4

awk '/linux/'  # lines that contain the text linux (regular expressions can be used; very powerful)

awk '!/linux/'  # lines that do not contain the text linux
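A sketch of passing a shell variable into awk and of the three pattern forms (line range, regex, negated regex). The inputs are arbitrary.

```shell
var=1000
# An assignment operand after the program sets vara before stdin is read.
passed=$(echo | awk '{print vara}' vara="$var")                     # 1000

range=$(seq 10 | awk 'NR==2,NR==4' | xargs)                         # 2 3 4
matching=$(printf 'linux\nbsd\nLinux box\n' | awk '/linux/')        # linux (case-sensitive)
rest=$(printf 'linux\nbsd\nLinux box\n' | awk '!/linux/' | wc -l)   # 2

printf '%s\n' "$passed" "$range" "$matching" "$rest"
```

A pattern with no action block prints the matching lines, so `awk '/linux/'` behaves like a simple grep.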

Setting the Delimiter

Use -F to set the field delimiter (the default is whitespace)

awk -F: '{print $NF}' /etc/passwd

Reading Command Output

Using getline, the output of an external shell command is read into the variable cmdout:

echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout}'
Using Loops in awk

for (i in array) {print array[i];}

E.g.:

Print lines in reverse order (an implementation of the tac command):

seq 9 | awk '{lifo[NR] = $0; lno = NR} END{for (; lno > 0; lno--) {print lifo[lno];}}'
Implementing head and tail with awk
    • head
         awk 'NR<=10{print}' filename
    • tail
        awk '{buffer[NR%10] = $0;} END{for (i = (NR > 10 ? NR - 9 : 1); i <= NR; i++) {print buffer[i%10]}}' filename
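The head, tail, and reverse-order one-liners can be wrapped in small functions with the line count as a parameter. The names awk_head, awk_tail, and awk_tac are hypothetical helpers for this sketch, not standard commands.

```shell
# First n lines.
awk_head() { awk -v n="$1" 'NR <= n'; }

# Last n lines: keep a ring buffer of n slots, then replay it in order.
awk_tail() {
  awk -v n="$1" '{ buf[NR % n] = $0 }
    END { start = (NR > n) ? NR - n + 1 : 1
          for (i = start; i <= NR; i++) print buf[i % n] }'
}

# All lines in reverse order (like tac); holds the whole input in memory.
awk_tac() { awk '{ lines[NR] = $0 } END { for (i = NR; i > 0; i--) print lines[i] }'; }

h=$(seq 20 | awk_head 3 | xargs)       # 1 2 3
t=$(seq 20 | awk_tail 3 | xargs)       # 18 19 20
short=$(seq 2 | awk_tail 3 | xargs)    # 1 2 (input shorter than n)
r=$(seq 4 | awk_tac | xargs)           # 4 3 2 1

printf '%s\n' "$h" "$t" "$short" "$r"
```

The ring buffer keeps memory use at n lines no matter how long the input is, which is the point of indexing with NR % n.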
Printing a Specified Column
    • With awk:
        ls -lrt | awk '{print $6}'
    • With cut (squeeze the runs of spaces first, because cut splits on a single delimiter character):
        ls -lrt | tr -s ' ' | cut -d ' ' -f6
Printing a Specified Text Region
    • By line number
        seq 100 | awk 'NR==4,NR==6{print}'
    • By pattern: print the text between start_pattern and end_pattern:
        awk '/start_pattern/,/end_pattern/' filename

      E.g.:

      seq 100 | awk '/13/,/15/'
      cat /etc/passwd | awk '/mai.*mail/,/news.*news/'
Common awk Built-in Functions

index(string, search_string): returns the position at which search_string appears in string

sub(regex, replacement_str, string): replaces the first match of the regular expression in string with replacement_str

match(string, regex): checks whether the regular expression matches the string

length(string): returns the length of the string

echo | awk '{"grep root /etc/passwd" | getline cmdout; print length(cmdout)}'

printf: formats the output, similar to printf in C

E.g.:

seq 10 | awk '{printf "->%4s\n", $1}'
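The built-in functions can be exercised on literal strings. A minimal sketch; note that match() takes the string first and also sets RSTART and RLENGTH.

```shell
pos=$(echo | awk '{print index("hello world", "world")}')            # 7
len=$(echo | awk '{print length("hello")}')                          # 5
subbed=$(echo "aaa" | awk '{sub(/a/, "b"); print}')                  # baa (first match only, on $0)
matched=$(echo | awk '{print match("foo123", /[0-9]+/), RLENGTH}')   # 4 3
padded=$(echo 7 | awk '{printf "->%4s\n", $1}')                      # ->   7

printf '%s\n' "$pos" "$len" "$subbed" "$matched" "$padded"
```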
Iterating over the Lines, Words, and Characters of a File
1. Iterate over each line in a file
    • while loop method
      while read line; do echo $line; done < file.txt
      Or in a subshell: cat file.txt | (while read line; do echo $line; done)
    • awk method: cat file.txt | awk '{print}'
2. Iterate over each word in a line
for word in $line; do echo $word; done
3. Iterate over each character

${string:start_pos:num_of_chars}: extracts characters from a string (bash text slicing)

${#word}: returns the length of the variable word

for ((i=0; i<${#word}; i++)); do echo ${word:i:1}; done
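The three loops can be nested to walk a file line by line, word by word, and character by character. This sketch requires bash, because substring expansion and the C-style for loop are not POSIX sh. The sample text is hypothetical.

```shell
#!/usr/bin/env bash
demo=$(mktemp)
printf 'hello big world\nsecond line\n' > "$demo"

line_count=0
word_count=0
# IFS= and -r keep leading spaces and backslashes in each line intact.
while IFS= read -r line; do
  line_count=$((line_count + 1))
  # $line is deliberately unquoted so the shell splits it into words.
  for word in $line; do
    word_count=$((word_count + 1))
  done
done < "$demo"

# Walk one word character by character with ${word:offset:length}.
word=shell
chars=""
for ((i = 0; i < ${#word}; i++)); do
  chars+="${word:i:1} "
done

echo "$line_count lines, $word_count words, chars: $chars"
rm -f "$demo"
```

Redirecting the file into the loop, rather than piping cat into it, keeps the counters in the current shell instead of a subshell.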
