Shell text processing tool highlights


This article describes the most common Shell tools for processing text on Linux:
find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, awk;
The examples and parameters given are the most common and practical ones;
My guiding principle for shell scripts is to write one-liners, and to avoid going beyond two lines;
For more complex tasks, consider Python;

find: file search
  • Find txt and PDF files

      find . \( -name "*.txt" -o -name "*.pdf" \) -print
  • Find txt and pdf files with a regular expression

      find . -regex ".*\(\.txt\|\.pdf\)$"

    -iregex: case-insensitive regular expression match.

  • Negated match
    Search for all files that are not .txt:

       find . ! -name "*.txt" -print
  • Specify search depth
    Print only the files in the current directory (depth 1)

      find . -maxdepth 1 -type f  
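As a quick sanity check, the -name and -maxdepth invocations above can be exercised against a throwaway directory (all file and directory names below are invented for the sketch):

```shell
# Build a scratch tree: two matching files at the top, one in a subdirectory.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.pdf" "$dir/c.log"
mkdir "$dir/sub"
touch "$dir/sub/d.txt"

# txt and pdf files at any depth -> a.txt, b.pdf, sub/d.txt
both=$(find "$dir" \( -name "*.txt" -o -name "*.pdf" \) -print | wc -l)

# regular files at depth 1 only -> a.txt, b.pdf, c.log
top=$(find "$dir" -maxdepth 1 -type f -print | wc -l)

echo "matched: $both, top-level: $top"
rm -rf "$dir"
```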
Custom search
  • Search by type:

      find . -type d -print // list all directories only

    -type f matches regular files; -type l matches symbolic links.

  • Search by time:
    -atime access time (in days; -amin is the same in minutes, and likewise for the options below)
    -mtime modification time (content modified)
    -ctime change time (metadata or permissions changed)
    All files accessed within the last seven days:

      find . -atime -7 -type f -print
  • Search by size:
    Units: b (blocks), c (bytes), w (words), k, M, G
    Search for files larger than 2k:

      find . -type f -size +2k

    Search by permission:

      find . -type f -perm 644 -print // find all files with permission 644

    Search by user:

      find . -type f -user weber -print // find files owned by the user weber
Subsequent actions after finding
  • Delete:
    Delete all swp files in the current directory:

      find . -type f -name "*.swp" -delete
  • Execute an action (the powerful -exec)

      find . -type f -user root -exec chown weber {} \; // change ownership of files under the current directory to weber

    Note: {} is a special string; for each matched file, {} is replaced with that file name;
    Eg: copy all found files to another directory:

      find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;
  • Combine with multiple commands
    Tip: to run several commands on each match, write them into a single script, then invoke that script with -exec:

      -exec ./commands.sh {} \;
  • Delimiter for -print

    '\n' is used as the delimiter between file names by default;
    -print0 uses '\0' as the delimiter instead, which handles file names containing spaces;
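The -exec pattern is easy to verify end to end in a scratch directory; the file names and the OLD target below are invented for the sketch:

```shell
dir=$(mktemp -d)
mkdir "$dir/OLD"
touch "$dir/x.txt" "$dir/y.txt" "$dir/z.log"

# copy every .txt file found at the top level into OLD
find "$dir" -maxdepth 1 -type f -name "*.txt" -exec cp {} "$dir/OLD" \;

copied=$(ls "$dir/OLD" | wc -l)
echo "copied: $copied"
rm -rf "$dir"
```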

grep: text search

grep match_pattern file // prints matching lines by default

  • Common parameters
    -o output only the matched part of the text VS -v output only the lines that do not match
    -c: count the number of matching lines

      grep -c "text" filename

    -n: print the line numbers of matches
    -i: case-insensitive search
    -l: print only the names of matching files

  • Recursive text search in nested directories (a programmer's favorite way to search code):

      grep "class" . -R -n
  • Match multiple patterns

      grep -e "class" -e "virtual" file


  • Output file names terminated by '\0' (-Z), for piping into xargs -0:

      grep "test" file* -lZ | xargs -0 rm
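One subtlety worth demonstrating: -c counts matching lines, while -o emits each individual match, so piping -o through wc -l counts occurrences. A self-contained check on made-up input:

```shell
# -c counts matching LINES, not matches: "class A class B" is one line.
lines=$(printf 'class A class B\nother\nclass C\n' | grep -c "class")

# -o prints each match on its own line, so wc -l counts every occurrence.
occurrences=$(printf 'class A class B\nother\nclass C\n' | grep -o "class" | wc -l)

echo "lines=$lines occurrences=$occurrences"
```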


xargs: command-line argument conversion

xargs converts its input into command-line arguments for another command, so it composes well with many other commands, e.g. grep and find;

  • Convert multi-line output to a single line

      cat file.txt | xargs

    '\n' is the delimiter between the input lines.

  • Convert a single line into multiple lines

      cat single.txt | xargs -n 3

    -n: the number of arguments per output line.

xargs parameter description

-d defines the input delimiter (by default whitespace, with '\n' separating multiple lines)
-n: split the input into multiple lines of n arguments each.
-I {} specifies a replacement string that is substituted during xargs expansion; use it when the target command needs its arguments in particular positions
Eg:

cat file.txt | xargs -I {} ./command.sh -p {} -1

-0: use '\0' as the input delimiter.
Eg: count the number of program lines

find source_dir/ -type f -name "*.cpp" -print0 |xargs -0 wc -l
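The single-line and multi-line conversions can be demonstrated without any files:

```shell
# multiple lines collapse into one space-separated line
oneline=$(printf '1\n2\n3\n' | xargs)

# one line split into groups of 3 arguments -> 2 output lines
rows=$(printf 'a b c d e f\n' | xargs -n 3 | wc -l)

echo "$oneline ($rows rows)"
```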
sort: sorting

Field description:
-n sort numerically VS -d sort lexicographically
-r reverse sort
-k N sort by column N
Eg:

    sort -nrk 1 data.txt
    sort -bd data.txt // -b ignores leading whitespace such as spaces
uniq: eliminate duplicate lines
  • Eliminate duplicate lines

      sort unsort.txt | uniq 


  • Count how many times each line appears in the file

      sort unsort.txt | uniq -c


  • Find duplicate lines

      sort unsort.txt | uniq -d

    You can restrict which part of each line is compared: -s N skips the first N characters, -w N compares at most N characters
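Since uniq only collapses adjacent duplicates, the input must be sorted first. A short check of -d and plain uniq on invented data:

```shell
# sorted input is: a a b b c ; the duplicated values are a and b
dups=$(printf 'b\na\nb\na\nc\n' | sort | uniq -d | xargs)

# deduplicated line count: a, b, c -> 3
distinct=$(printf 'b\na\nb\na\nc\n' | sort | uniq | wc -l)

echo "dups: $dups, distinct: $distinct"
```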


tr: character translation
  • General usage

      echo 12345 | tr '0-9' '9876543210' // encryption/decryption-style conversion, replacing each character with its counterpart
      cat text | tr '\t' ' ' // convert tabs to spaces
  • tr character deletion

      cat file | tr -d '0-9' // delete all digits

    -c: complement of the set

      cat file | tr -d -c '0-9 \n' // delete everything that is not a digit, space, or newline (i.e. keep only the numbers)
  • tr character squeezing
    tr -s squeezes repeated characters in the text; most commonly used to squeeze redundant spaces:

      cat file | tr -s ' '
  • Character classes
    tr provides various character classes:
    alnum: letters and digits
    alpha: letters
    digit: digits
    space: whitespace
    lower: lowercase letters
    upper: uppercase letters
    cntrl: control (non-printable) characters
    print: printable characters
    Usage: tr '[:class:]' '[:class:]'

      eg: tr '[:lower:]' '[:upper:]'
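A few of the tr forms above, checked on inline strings:

```shell
upper=$(echo "hello" | tr '[:lower:]' '[:upper:]')   # class-based case conversion
digits=$(echo "ab12cd3" | tr -d -c '0-9')            # delete the complement: keep only digits
squeezed=$(echo "a    b" | tr -s ' ')                # squeeze runs of spaces

echo "$upper / $digits / $squeezed"
```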
cut: split text by column
  • Extract the 2nd and 4th fields of a file:

      cut -f2,4 filename


  • Print all fields of the file except the 3rd:

      cut -f3 --complement filename


  • -d specifies the delimiter:

      cut -f2 -d ";" filename


  • cut ranges
    N- from the Nth field to the end
    -M from the 1st field to the Mth
    N-M fields N through M

  • cut units
    -b in bytes
    -c in characters
    -f in fields (using the delimiter)

  • Eg:

      cut -c1-5 file // print characters 1 through 5
      cut -c-2 file // print the first 2 characters
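Field selection, open-ended ranges, and character ranges side by side (the sample line is invented):

```shell
line='one:two:three:four'

second=$(echo "$line" | cut -d: -f2)    # a single field
rest=$(echo "$line" | cut -d: -f3-)     # from field 3 to the end
first3=$(echo "abcdef" | cut -c1-3)     # character range

echo "$second / $rest / $first3"
```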


paste: concatenate text by column

Concatenates two files column by column;

    cat file1
    1
    2
    cat file2
    colin
    book
    paste file1 file2
    1 colin
    2 book

The default delimiter is a tab; use -d to specify another delimiter:

    paste file1 file2 -d ","
    1,colin
    2,book
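The paste example can be reproduced with temporary files:

```shell
a=$(mktemp); b=$(mktemp)
printf '1\n2\n' > "$a"
printf 'colin\nbook\n' > "$b"

# join line N of the first file with line N of the second, comma-separated
joined=$(paste -d, "$a" "$b" | xargs)

rm -f "$a" "$b"
echo "$joined"
```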

wc: count lines, words, and characters

    wc -l file // count lines
    wc -w file // count words
    wc -c file // count characters

sed: text replacement
  • Replace the first occurrence

      sed 's/text/replace_text/' file // replace the first match in each line

  • Global replacement

      sed 's/text/replace_text/g' file

    By default sed prints the replaced text to stdout; to modify the original file in place, use -i:

      sed -i 's/text/replace_text/g' file
  • Remove blank lines:

      sed '/^$/d' file
  • Referencing the matched string
    The matched string is referenced with &:

      echo this is an example | sed 's/\w\+/[&]/g'
      $> [this] [is] [an] [example]
  • Substring match tags
    The content of the first matching parenthesized group is referenced with \1:

      sed 's/hello\([0-9]\)/\1/'
  • Double quotes
    sed expressions are usually single-quoted; double quotes also work, in which case the shell evaluates expressions inside them before sed runs:

      sed "s/$var/HELLO/"

    With double quotes we can use shell variables in the sed pattern and replacement string;

    eg:
      p=patten
      r=replaced
      echo "line con a patten" | sed "s/$p/$r/g"
      $> line con a replaced
  • Other examples
    Inserting a character into a string: convert each line of text (e.g. PEKSHA) into PEK/SHA

      sed 's/^.\{3\}/&\//g' file
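The & and \1 substitution references above can be checked on one-line inputs (plain POSIX basic regular expressions):

```shell
# & re-inserts the whole match
bracketed=$(echo "this is an example" | sed 's/[a-z][a-z]*/[&]/g')

# \1 and \2 reference the first and second parenthesized groups
swapped=$(echo "john smith" | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')

echo "$bracketed / $swapped"
```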
awk: data stream processing
  • awk script structure
    awk 'BEGIN{ statements } { statements2 } END{ statements }'

  • How it works
    1. Execute the BEGIN block;
    2. Read one line from the file or stdin, then execute statements2; repeat until all input has been read;
    3. Execute the END block;

Printing the current line
  • print with no arguments prints the current line;

      echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print } END{ print "End" }' 
  • When the arguments to print are separated by commas, they are output delimited by spaces;

      echo | awk '{ var1 = "v1"; var2 = "V2"; var3 = "v3"; print var1, var2, var3; }'
      $> v1 V2 v3
  • Concatenating with "-" ("" is the concatenation operator);

      echo | awk '{ var1 = "v1"; var2 = "V2"; var3 = "v3"; print var1"-"var2"-"var3; }'
      $> v1-V2-v3


Special variables: NR NF $0 $1 $2

NR: number of records; during execution it corresponds to the current line number;
NF: number of fields; during execution it corresponds to the field count of the current line;
$0: the text of the current line;
$1: the text of the first field;
$2: the text of the second field;

echo -e "line1 f2 f3\n line2 \n line 3" | awk '{print NR":"$0"-"$1"-"$2}'
  • Print the second and third fields of each line:

      awk '{print $2, $3}' file


  • Count the lines of a file:

      awk ' END {print NR}' file
  • Sum the first field of every line:

      echo -e "1\n 2\n 3\n 4\n" | awk 'BEGIN{sum = 0; print "begin";} {sum += $1;} END {print "=="; print sum }'
Passing external variables

var=1000
echo | awk '{print vara}' vara=$var # input from stdin
awk '{print vara}' vara=$var file # input from file
Filtering the lines awk processes with patterns

awk 'NR < 5' # lines whose line number is less than 5
awk 'NR==1,NR==4{print}' file # print lines 1 through 4
awk '/linux/' # lines containing the text "linux" (regular expressions allowed, super powerful)
awk '!/linux/' # lines not containing the text "linux"

Setting the delimiter

Use -F to set the field delimiter (space by default):
awk -F: '{print $NF}' /etc/passwd
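A quick check of -F together with NF and $NF on inline strings:

```shell
# $NF is the last field; with -F: the fields are colon-separated
last=$(echo "a:b:c" | awk -F: '{print $NF}')

# NF itself is the field count
count=$(echo "one two three" | awk '{print NF}')

echo "$last $count"
```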

Reading command output

Use getline to read the output of an external shell command into the variable cmdout;

echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout }' 
Using loops in awk

for(i=0; i<10; i++) {print $i;}
for(i in array) {print array[i];}

Eg:
Print lines in reverse order (an implementation of the tac command):

seq 9 | awk '{lifo[NR] = $0; lno = NR} END{ for(; lno>-1; lno--){print lifo[lno];} }'
Awk implements head and tail commands
  • Head:

      awk 'NR<=10{print}' filename
  • Tail:

      awk '{buffer[NR%10] = $0;} END{ for(i=NR+1; i<=NR+10; i++){ print buffer[i%10] } }' filename
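Both one-liners can be verified against seq output; the tail variant below keeps a 3-slot ring buffer and prints it oldest-first in the END block:

```shell
head3=$(seq 10 | awk 'NR<=3{print}' | xargs)

# ring buffer of the last 3 lines of 1..15 -> 13 14 15
tail3=$(seq 15 | awk '{buffer[NR%3] = $0;} END{ for(i=NR+1; i<=NR+3; i++){ print buffer[i%3] } }' | xargs)

echo "$head3 / $tail3"
```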
Print a specified column
  • awk implementation:

      ls -lrt | awk '{print $6}'


  • Cut implementation

      ls -lrt | cut -f6


Print a specified text region
  • Determined by line numbers

      seq 100| awk 'NR==4,NR==6{print}'


  • Determined by text patterns
    Print the text between start_pattern and end_pattern:

      awk '/start_pattern/, /end_pattern/' filename
    Eg:

      seq 100 | awk '/13/,/15/'
      cat /etc/passwd | awk '/mai.*mail/,/news.*news/'


Common awk built-in functions

index(string, search_string): returns the position at which search_string appears in string
sub(regex, replacement_str, string): replaces the first match of the regular expression in string with replacement_str;
match(string, regex): tests whether the regular expression matches the string;
length(string): returns the length of the string

echo | awk '{"grep root /etc/passwd" | getline cmdout; print length(cmdout) }' 
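The built-in functions above, exercised on literal strings:

```shell
pos=$(echo | awk '{print index("foobar", "bar")}')   # 1-based position -> 4
len=$(echo | awk '{print length("hello")}')          # -> 5
subbed=$(echo "aaa" | awk '{sub(/a/, "b"); print}')  # only the first match of $0 is replaced

echo "$pos $len $subbed"
```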

printf: similar to printf in the C language.
Eg:

seq 10 | awk '{printf "->%4s\n", $1}'
Iterating over the lines, words, and characters of a file
1. Iterate over each line in a file
  • while loop method

      while read line; do
        echo $line;
      done < file.txt

    Or in a sub-shell:

      cat file.txt | (while read line; do echo $line; done)

  • awk method:

      cat file.txt | awk '{print}'

2. Iterate over each word in a line

for word in $line; do echo $word; done
3. Iterate over each character

${string:start_pos:num_of_chars}: extract a substring from a string (bash text slicing)
${#word}: returns the length of the variable word

for ((i=0; i<${#word}; i++)); do echo ${word:i:1}; done
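Putting the slicing syntax to work; since ${word:i:1} is a bash-ism, the sketch invokes bash explicitly:

```shell
# join the characters of "hello" with dashes, one character per iteration
out=$(bash -c 'word=hello; for ((i=0; i<${#word}; i++)); do printf "%s-" "${word:i:1}"; done')
echo "$out"
```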

This article is a set of reading notes on the Linux Shell Scripting Cookbook; the main content and examples in this article come from that book.

