DAY7: Text-processing tools and regular expressions

Source: Internet
Author: User
Tags: control characters, expression engine, egrep

August 4. The main topics covered are:

I. Tools for extracting text: less, cat, head, tail, cut

II. Tools for analyzing text: wc, sort, diff, patch

III. grep and regular expressions

IV. egrep and extended regular expressions


I. Tools for extracting text

1) File viewing commands:

    • cat [OPTION]... [FILE]...

      -E: Display a $ at the end of each line

      -n: Number every output line

      -A: Show all control characters

      -b: Number only non-empty lines

      -s: Squeeze consecutive blank lines into a single blank line

    • tac

      Works like cat, but prints the lines in reverse order (last line first)
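The cat options above can be tried on a small throwaway file. This is a minimal sketch that assumes GNU coreutils (for -E, -s and tac); the file name sample.txt and its contents are made up for illustration.

```shell
# Build a small sample file in a temporary directory (hypothetical content).
tmpdir=$(mktemp -d)
printf 'alpha\n\n\nbeta\n' > "$tmpdir/sample.txt"

# -n numbers every line; -b numbers only the non-empty lines.
cat -n "$tmpdir/sample.txt"
cat -b "$tmpdir/sample.txt"

# -s squeezes the two consecutive blank lines into one,
# -E marks the end of every line with a $.
squeezed=$(cat -s -E "$tmpdir/sample.txt")
printf '%s\n' "$squeezed"

# tac prints the same lines, last line first.
reversed=$(tac "$tmpdir/sample.txt")
printf '%s\n' "$reversed"

rm -rf "$tmpdir"
```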

2) Paging tools

    • more: Page through files

      more [OPTIONS...] FILE...

      -d: Show prompts for paging and quitting

    • less: View a file or stdin output one page at a time
      Useful commands while viewing:

      /text: Search for text

      n/N: Jump to the next/previous match

      less is the pager used by the man command

3) Displaying the beginning or end of text

    • head

      head [OPTION]... [FILE]...

      -c #: Print the first # bytes

      -n #: Print the first # lines

      -#: Print the first # lines (same as -n #)

    • tail

      tail [OPTION]... [FILE]...

      -c #: Print the last # bytes

      -n #: Print the last # lines

      -#: Print the last # lines (same as -n #)

      -f: Follow the file and print new content as it is appended; commonly used for monitoring logs
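As a rough illustration of head and tail, the sketch below works on a generated file of ten numbered lines. The file name numbers.txt is hypothetical; tail -f is only mentioned in a comment because it keeps running until interrupted.

```shell
# Ten numbered lines in a temporary file (hypothetical sample data).
tmpdir=$(mktemp -d)
seq 1 10 > "$tmpdir/numbers.txt"

# First three lines and last two lines.
first_three=$(head -n 3 "$tmpdir/numbers.txt")
last_two=$(tail -n 2 "$tmpdir/numbers.txt")

# First four bytes: "1", newline, "2", newline.
first_bytes=$(head -c 4 "$tmpdir/numbers.txt")

# Combining both picks a range: lines 4 and 5.
middle=$(head -n 5 "$tmpdir/numbers.txt" | tail -n 2)

printf '%s\n' "$first_three" "$last_two" "$middle"

# For log monitoring one would run, for example: tail -f /path/to/logfile
# (not executed here because it blocks until interrupted).

rm -rf "$tmpdir"
```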

4) Extracting text by column with cut and merging files with paste

    • cut [OPTION]... [FILE]...
      -d DELIMITER: Specify the field delimiter, default is tab (the space between -d and the delimiter is optional)

      -f FIELDS:

      #: Field number #

      #,#[,#]: Several discrete fields, such as 1,3,6

      #-#: A range of consecutive fields, such as 1-6

      Mixed use: 1-3,7

      -c: Cut by character position

      --output-delimiter=STRING: Specify the output delimiter

      cut -d: -f1 /etc/passwd

      cat /etc/passwd | cut -d: -f7

      cut -c2-5 /usr/share/dict/words
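The cut invocations above can be checked against a small passwd-style file. This is a sketch only: the file passwd.sample and its two entries are invented, and --output-delimiter assumes GNU cut.

```shell
# A made-up passwd-style file with two entries.
tmpdir=$(mktemp -d)
printf 'root:x:0:0:root:/root:/bin/bash\nalice:x:1000:1000:Alice:/home/alice:/bin/sh\n' \
  > "$tmpdir/passwd.sample"

# Field 1 (the user name), using ":" as the delimiter.
names=$(cut -d: -f1 "$tmpdir/passwd.sample")

# Fields 1 and 7, printed with a space between them instead of ":".
name_shell=$(cut -d: -f1,7 --output-delimiter=' ' "$tmpdir/passwd.sample")

# Characters 2 through 5 of every line.
chars=$(cut -c2-5 "$tmpdir/passwd.sample")

printf '%s\n' "$names" "$name_shell" "$chars"

rm -rf "$tmpdir"
```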

    • paste: Merge the lines with the same line number from two files into one line
      -d DELIMITER: Specify the delimiter, default is tab

      -s: Merge all lines of each file into a single line

      paste f1 f2

      paste -s f1 f2
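The difference between the default mode and -s is easiest to see side by side. A minimal sketch, with two invented three-line files named f1 and f2:

```shell
# Two small files with three lines each (hypothetical content).
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/f1"
printf '1\n2\n3\n' > "$tmpdir/f2"

# Default mode: line N of f1 is joined with line N of f2.
side_by_side=$(paste -d: "$tmpdir/f1" "$tmpdir/f2")

# -s: each file is collapsed into one line of its own.
serial=$(paste -s -d, "$tmpdir/f1" "$tmpdir/f2")

printf '%s\n' "$side_by_side" "$serial"

rm -rf "$tmpdir"
```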


II. Text analysis tools

1) Text statistics

wc: Counts the lines, words, and bytes (and, optionally, characters) in a file or in stdin

wc Story.txt

237 1901 Story.txt

By default the output columns are the line count, word count, and byte count, followed by the file name

-l: Count lines only

-w: Count words only

-c: Count bytes only

-m: Count characters only
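A short sketch of the individual wc counters, using an invented two-line file named story.sample. Reading from stdin with < keeps the file name out of the output, so each result is just a number.

```shell
# Two lines, three words, fourteen bytes (hypothetical sample text).
tmpdir=$(mktemp -d)
printf 'one two\nthree\n' > "$tmpdir/story.sample"

lines=$(wc -l < "$tmpdir/story.sample")
words=$(wc -w < "$tmpdir/story.sample")
bytes=$(wc -c < "$tmpdir/story.sample")
chars=$(wc -m < "$tmpdir/story.sample")

echo "lines=$lines words=$words bytes=$bytes chars=$chars"

rm -rf "$tmpdir"
```

For plain ASCII text the byte and character counts are the same; they differ only when the file contains multibyte characters.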


2) Text sorting

sort: Writes the sorted text to stdout (by default in character order) without changing the original file

sort [options] file(s)

-r: Sort in reverse (descending) order

-n: Sort by numeric value

-f: Fold case, i.e. ignore the case of characters in the string

-u: Unique, i.e. remove duplicate lines from the output

-t c: Use c as the field delimiter

-k X: Sort by field X, as delimited by the -t character; can be given multiple times
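The following sketch shows why -n matters when sorting on a numeric field, and how -t and -k select that field. The file scores.txt and its name:score records are made up for illustration.

```shell
# name:score records, one of them duplicated (hypothetical data).
tmpdir=$(mktemp -d)
printf 'carol:30\nalice:4\nbob:12\nalice:4\n' > "$tmpdir/scores.txt"

# Field 2 compared as text: "12" sorts before "4".
as_text=$(sort -t: -k2 "$tmpdir/scores.txt")

# Field 2 compared as a number: 4 sorts before 12.
as_number=$(sort -t: -k2 -n "$tmpdir/scores.txt")

# Numeric and reversed: highest score first.
descending=$(sort -t: -k2 -n -r "$tmpdir/scores.txt")

# Whole-line sort with duplicates removed.
unique=$(sort -u "$tmpdir/scores.txt")

printf '%s\n' "$as_text" "$as_number" "$descending" "$unique"

rm -rf "$tmpdir"
```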


3) Removing duplicates

uniq: Removes adjacent duplicate lines from the input

uniq [OPTION]... [FILE]...

-c: Show the number of occurrences of each line

-d: Show only lines that are repeated

-u: Show only lines that are not repeated

Only consecutive, exactly identical lines count as duplicates, so uniq is commonly used together with sort: sort userlist.txt | uniq -c
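Because uniq only compares neighbouring lines, the effect of sorting first is worth seeing directly. A minimal sketch with an invented file userlist.sample; the awk step only normalizes the padded count column that uniq -c prints.

```shell
# "bob" appears three times, but not always on adjacent lines (made-up data).
tmpdir=$(mktemp -d)
printf 'bob\nalice\nbob\nbob\ncarol\n' > "$tmpdir/userlist.sample"

# Without sorting, only the adjacent pair of "bob" lines is collapsed.
unsorted=$(uniq "$tmpdir/userlist.sample")

# After sorting, all duplicates are adjacent.
repeated=$(sort "$tmpdir/userlist.sample" | uniq -d)
singles=$(sort "$tmpdir/userlist.sample" | uniq -u)

# uniq -c prefixes each line with its count; awk reshapes it to name=count.
counts=$(sort "$tmpdir/userlist.sample" | uniq -c | awk '{print $2 "=" $1}')

printf '%s\n' "$unsorted" "$repeated" "$singles" "$counts"

rm -rf "$tmpdir"
```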


4) Comparing files

diff: Compares two files line by line

diff [OPTION]... OLDFILE NEWFILE: Shows what has to change to turn OLDFILE into NEWFILE

diff foo.conf-broken foo.conf-works

5c5 (line 5 differs between the two files)

< use_widgets = no

---

> use_widgets = yes

-u: Use the unified format, which shows context around the changed lines (3 lines by default); suited to patch files

diff /path/to/oldfile /path/to/newfile > /path/to/patch_file

diff can also compare two directories, showing the differences between the files they contain
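The two output formats can be compared on a pair of tiny configuration files. This is a sketch under assumptions: the two-line file contents are invented (so the change is on line 2 here, not line 5), and `|| true` is there only because diff exits with status 1 when the files differ.

```shell
# Two versions of a made-up config file that differ on line 2.
tmpdir=$(mktemp -d)
printf 'name = demo\nuse_widgets = no\n' > "$tmpdir/foo.conf-broken"
printf 'name = demo\nuse_widgets = yes\n' > "$tmpdir/foo.conf-works"

# Normal format: "2c2" means line 2 of the old file changes into line 2 of the new one.
normal=$(diff "$tmpdir/foo.conf-broken" "$tmpdir/foo.conf-works" || true)
printf '%s\n' "$normal"

# Unified format: removed lines start with "-", added lines with "+".
unified=$(diff -u "$tmpdir/foo.conf-broken" "$tmpdir/foo.conf-works" || true)
printf '%s\n' "$unified"

rm -rf "$tmpdir"
```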


patch: Applies the changes recorded in a patch file to the original file

patch -i /path/to/patch_file /path/to/oldfile

patch /path/to/oldfile < /path/to/patch_file

-b: Automatically back up the file before changing it
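Putting diff and patch together, a round trip might look like the sketch below. It assumes GNU patch is installed, where -b keeps the original content in a file with the .orig suffix; the file names and contents are made up.

```shell
# Old and new versions of a made-up file.
tmpdir=$(mktemp -d)
printf 'name = demo\nuse_widgets = no\n' > "$tmpdir/oldfile"
printf 'name = demo\nuse_widgets = yes\n' > "$tmpdir/newfile"

# Record the differences in unified format (diff exits 1 when files differ).
diff -u "$tmpdir/oldfile" "$tmpdir/newfile" > "$tmpdir/patch_file" || true

# Apply the patch to the old file, keeping a backup of the original.
patch -b -i "$tmpdir/patch_file" "$tmpdir/oldfile"

# The old file now has the new content; the backup has the old content.
patched=$(cat "$tmpdir/oldfile")
backup=$(cat "$tmpdir/oldfile.orig")
printf '%s\n' "$patched" "$backup"

rm -rf "$tmpdir"
```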


III. grep and regular expressions

1) The "three musketeers" of text processing on Linux

grep: Text filtering tool (filters by pattern)

grep, egrep (supports extended regular expressions), fgrep (does not support regular expression search)

sed: Stream editor, a text editing tool

awk: Text report generator; implemented on Linux as gawk


2) grep: Global search REgular expression and Print out the line

Function: A text search tool that checks the target text line by line against a user-specified pattern and prints the matching lines. Pattern: the filter condition, written with regular expression metacharacters and literal text characters

grep [OPTIONS] PATTERN [FILE...]

grep root /etc/passwd

Command options:

--color=auto: Highlight the matched text in color

-v: Show the lines that do not match the pattern

-i: Ignore case

-n: Show the line number of each matching line

-c: Count the number of matching lines

-o: Show only the matched string

-q: Quiet mode, print nothing; check the result with echo $? (useful in scripts)

-A #: After, also show the # lines following each match

-B #: Before, also show the # lines preceding each match

-C #: Context, also show the # lines before and after each match

-e: Give several patterns, combined with a logical OR

grep -e 'cat' -e 'dog' file

-w: Match whole words only

-E: Use extended regular expressions (ERE)
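Most of the options above can be exercised on one small file. A minimal sketch; the file sample.txt and its five lines are invented for illustration.

```shell
# Five made-up lines to search.
tmpdir=$(mktemp -d)
printf 'root:x:0:0\nRoot user\ncat food\ndog park\ncatalog\n' > "$tmpdir/sample.txt"

# -c with -i: count lines matching "root" in any case.
count=$(grep -c -i 'root' "$tmpdir/sample.txt")

# -v inverts the match: count the lines without "root".
others=$(grep -v -i -c 'root' "$tmpdir/sample.txt")

# -n prefixes each match with its line number.
numbered=$(grep -n 'dog' "$tmpdir/sample.txt")

# Several -e patterns are ORed together.
either=$(grep -e 'cat' -e 'dog' "$tmpdir/sample.txt")

# -w matches "cat" only as a whole word, so "catalog" is skipped.
whole=$(grep -w 'cat' "$tmpdir/sample.txt")

# -o prints only the matched text, one match per line.
only=$(grep -o 'cat' "$tmpdir/sample.txt")

# -A 1 also prints the line after each match.
after=$(grep -A 1 'Root' "$tmpdir/sample.txt")

# -q prints nothing; only the exit status is used.
if grep -q 'root' "$tmpdir/sample.txt"; then found=yes; else found=no; fi

printf '%s\n' "$count" "$others" "$numbered" "$either" "$whole" "$only" "$after" "$found"

rm -rf "$tmpdir"
```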


3) Regular expressions

    • REGEXP: A pattern written with special characters and literal text characters, in which some characters (metacharacters) do not stand for themselves but act as controls or wildcards

    • Supported by: grep, vim, less, nginx, etc.

    • Two categories: basic regular expressions (BRE) and extended regular expressions (ERE)

    • Metacharacter categories: character matching, repetition counts, position anchoring, grouping


4) Basic regular expressions

  • Character matching

    .: Matches any single character

    []: Matches any single character in the specified set

    [^]: Matches any single character not in the specified set

    [:digit:], [:lower:], [:upper:], [:alpha:], [:alnum:], [:punct:], [:space:]: Character classes, used inside a bracket expression, e.g. [[:digit:]]
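A few character-matching patterns, sketched against an invented four-line file named chars.txt. Note that the class names appear inside an outer pair of brackets.

```shell
# Four made-up lines: lowercase, digits, mixed with punctuation, uppercase.
tmpdir=$(mktemp -d)
printf 'abc\n123\na1!\nXYZ\n' > "$tmpdir/chars.txt"

# "." stands for any single character between "a" and "c".
dot=$(grep '^a.c$' "$tmpdir/chars.txt")

# Lines containing at least one digit.
has_digit=$(grep '[[:digit:]]' "$tmpdir/chars.txt")

# Lines whose first character is NOT a lowercase letter.
not_lower=$(grep '^[^[:lower:]]' "$tmpdir/chars.txt")

# Lines containing a punctuation character.
has_punct=$(grep '[[:punct:]]' "$tmpdir/chars.txt")

printf '%s\n' "$dot" "$has_digit" "$not_lower" "$has_punct"

rm -rf "$tmpdir"
```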

  • Repetition counts (greedy by default: match as much as possible)

    Placed after the character to be repeated, to specify how many times that character may occur

    *: Matches the preceding character any number of times, including 0

    .*: Any characters, of any length

    \?: Matches the preceding character 0 or 1 times

    \+: Matches the preceding character at least 1 time

    \{m\}: Matches the preceding character exactly m times

    \{m,n\}: Matches the preceding character at least m and at most n times

    \{0,n\}: Matches the preceding character at most n times

    \{m,\}: Matches the preceding character at least m times
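The repetition operators can be compared on words that differ only in the number of o characters. This sketch assumes GNU grep, where \? and \+ are available in basic regular expressions; the file words.txt is made up.

```shell
# Made-up words with zero to four "o" characters between "g" and "d".
tmpdir=$(mktemp -d)
printf 'gd\ngod\ngood\ngoood\ngooood\n' > "$tmpdir/words.txt"

# o* allows zero or more, so every line matches.
star=$(grep -c 'go*d' "$tmpdir/words.txt")

# o\? allows zero or one.
optional=$(grep 'go\?d' "$tmpdir/words.txt")

# o\+ requires at least one.
plus=$(grep -c 'go\+d' "$tmpdir/words.txt")

# Exact count, bounded range, and open-ended minimum.
exact=$(grep 'go\{2\}d' "$tmpdir/words.txt")
range=$(grep 'go\{2,3\}d' "$tmpdir/words.txt")
at_least=$(grep 'go\{3,\}d' "$tmpdir/words.txt")

# Greedy matching: .* runs to the LAST ">" on the line, not the first.
greedy=$(printf 'a<b>c<d>e\n' | grep -o '<.*>')

printf '%s\n' "$star" "$optional" "$plus" "$exact" "$range" "$at_least" "$greedy"

rm -rf "$tmpdir"
```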

  • Position anchoring: constrains where a match may occur

    ^: Start-of-line anchor, placed at the leftmost side of the pattern (^root matches lines that start with root)

    $: End-of-line anchor, placed at the rightmost side of the pattern (root$ matches lines that end with root)

    ^pattern$: Matches only lines that consist entirely of pattern

    ^$: Empty line (contains no characters, not even white space)

    ^[[:space:]]*$: Blank line (empty, or containing only white space characters)

    Word: A continuous string of letters, digits, and underscores, with no other special characters

    \< or \b: Start-of-word anchor, placed at the left side of the word pattern

    \> or \b: End-of-word anchor, placed at the right side of the word pattern

    \<pattern\>: Matches a whole word
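The anchors above can be told apart with a file in which root appears at different positions. A minimal sketch, assuming GNU grep for \< and \>; the file anchors.txt and its six lines (including one whitespace-only line and one empty line) are invented.

```shell
# Line 4 contains only spaces, line 5 is completely empty (made-up data).
tmpdir=$(mktemp -d)
printf 'root:x:0\nchroot jail\nthe root\n   \n\nrooted\n' > "$tmpdir/anchors.txt"

# Lines that start with "root" and lines that end with "root".
starts=$(grep '^root' "$tmpdir/anchors.txt")
ends=$(grep 'root$' "$tmpdir/anchors.txt")

# ^$ matches only the truly empty line;
# ^[[:space:]]*$ also matches the whitespace-only line.
empty_count=$(grep -c '^$' "$tmpdir/anchors.txt")
blank_count=$(grep -c '^[[:space:]]*$' "$tmpdir/anchors.txt")

# "root" as a whole word: "chroot" and "rooted" do not qualify.
word=$(grep '\<root\>' "$tmpdir/anchors.txt")

printf '%s\n' "$starts" "$ends" "$empty_count" "$blank_count" "$word"

rm -rf "$tmpdir"
```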

  • Grouping and referencing

    Group: \(\): Binds one or more characters together so they are treated as a unit, such as \(root\)\+
    Note: The text matched by the pattern inside each group is recorded by the regular expression engine in internal variables named \1, \2, \3, ...

    \1: The text matched by the pattern between the first opening parenthesis, counting from the left, and its matching closing parenthesis

    Example: \(string1\+\(string2\)*\)

    \1: string1\+\(string2\)*

    \2: string2

    Back reference: Refers to the text matched by the pattern in an earlier group, not to the pattern itself
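The point that a back reference repeats the matched text, not the pattern, can be illustrated as follows. This is a sketch with four invented sentences in a file named lovers.txt; \+ after a group assumes GNU grep.

```shell
# Four made-up sentences; only two repeat the same l..e string.
tmpdir=$(mktemp -d)
printf 'he loves his lover\nhe likes his lover\nshe likes her liker\nshe loves her liker\n' \
  > "$tmpdir/lovers.txt"

# \1 must be the very same text the group matched (love...love or like...like).
same_text=$(grep '\(l..e\).*\1' "$tmpdir/lovers.txt")

# Repeating the pattern instead accepts any two l..e strings, so all lines match.
same_pattern=$(grep -c 'l..e.*l..e' "$tmpdir/lovers.txt")

# A quantifier after a group applies to the whole group.
repeated=$(printf 'rootroot\n' | grep -o '\(root\)\+')

printf '%s\n' "$same_text" "$same_pattern" "$repeated"

rm -rf "$tmpdir"
```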


IV. egrep and extended regular expressions

1) egrep

egrep = grep -E

egrep [OPTIONS] PATTERN [FILE...]


2) Extended regular expressions

    • Character matching (same as basic regular expressions)

    • Repetition counts

      *: Matches the preceding character any number of times

      ?: 0 or 1 times

      +: 1 or more times

      {m}: Exactly m times

      {m,n}: At least m and at most n times

      {0,n}: At most n times

      {m,}: At least m times

    • Position anchoring (same as basic regular expressions)

    • Grouping

      ()

      Back reference: \1, \2, ...

    • Or

      a|b

      C|cat: C or cat

      (C|c)at: Cat or cat
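The extended syntax drops the backslashes in front of ?, +, {}, () and adds |. A minimal sketch using grep -E (the same as egrep per the text above) on two invented files; back references in extended patterns are assumed to behave as in GNU grep.

```shell
# Made-up sample files.
tmpdir=$(mktemp -d)
printf 'gd\ngod\ngood\ngoood\n' > "$tmpdir/words.txt"
printf 'Cat\ncat\nC\nbat\n' > "$tmpdir/cats.txt"

# Repetition counts without backslashes.
optional=$(grep -E 'go?d' "$tmpdir/words.txt")
plus=$(grep -E -c 'go+d' "$tmpdir/words.txt")
range=$(grep -E 'go{2,3}d' "$tmpdir/words.txt")

# Without a group, | splits the WHOLE pattern: "C" or "cat".
alternation=$(grep -E 'C|cat' "$tmpdir/cats.txt")

# With a group, | only applies inside the parentheses: "Cat" or "cat".
grouped=$(grep -E '(C|c)at' "$tmpdir/cats.txt")

# Groups can still be back-referenced.
backref=$(printf 'abab\nabba\n' | grep -E '^(ab)\1$')

printf '%s\n' "$optional" "$plus" "$range" "$alternation" "$grouped" "$backref"

rm -rf "$tmpdir"
```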


3) fgrep

Does not support regular expression metacharacters; when the pattern does not need metacharacters, fgrep is the better choice
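The practical effect of fixed-string matching is that characters such as . and + are taken literally. The sketch below uses grep -F, which is the equivalent option form of fgrep (recent GNU grep releases print a warning when the fgrep name itself is used); the file literal.txt is made up.

```shell
# Made-up lines containing characters that are special in regular expressions.
tmpdir=$(mktemp -d)
printf 'a.c\nabc\n1+1=2\n' > "$tmpdir/literal.txt"

# As a regular expression, "." matches any character, so both a.c and abc match.
as_regex=$(grep 'a.c' "$tmpdir/literal.txt")

# As a fixed string, only the literal text "a.c" matches.
as_fixed=$(grep -F 'a.c' "$tmpdir/literal.txt")

# No escaping is needed for other metacharacters either.
plus_sign=$(grep -F '1+1' "$tmpdir/literal.txt")

printf '%s\n' "$as_regex" "$as_fixed" "$plus_sign"

rm -rf "$tmpdir"
```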


This article is from the "Laugh Monkey" blog, please be sure to keep this source http://xiaomonky.blog.51cto.com/11869371/1835347
