Linux Operations Fundamentals: Text Processing

Source: Internet
Author: User

  • 1 Tools for extracting text
    File contents: less and cat
    File excerpts: head and tail
    Extract by column: cut
    Extract by keyword: grep
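  • 1.2 Viewing files
    File viewing commands:
    cat, tac, rev
    cat [OPTION]... [FILE]...
    -E: show a $ at the end of each line
    -n: number every output line
    -A: show all control characters (equivalent to -vET)
    -b: number only non-empty lines
    -s: squeeze consecutive blank lines into one
    tac: print the lines of a file in reverse order (last line first)
    rev: reverse the characters within each line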
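A quick way to see these options in action is to feed a small sample through each command. The sample file /tmp/sample.txt and its contents are made up for illustration, and GNU coreutils is assumed.

```shell
#!/bin/sh
# Hypothetical sample: a tab, two consecutive blank lines, and a trailing space
printf 'one\ttab\n\n\ntwo \n' > /tmp/sample.txt

cat -A /tmp/sample.txt   # tabs show as ^I, every line ends with $
cat -n /tmp/sample.txt   # numbers every line, blank ones included
cat -b /tmp/sample.txt   # numbers only the two non-empty lines
cat -s /tmp/sample.txt   # the two blank lines are squeezed into one

printf 'a\nb\nc\n' | tac   # c, b, a: last line first
printf 'abc\n' | rev       # cba: characters reversed within the line
```

cat -A is handy for spotting stray tabs, trailing spaces, or DOS line endings (which show up as ^M$).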

    1.3 Viewing a file page by page
    more: page through a file
    more [OPTIONS...] FILE...
    -d: show prompts for paging and quitting
    less: page through a file or the output of stdin
    Useful commands while viewing:
    /text: search for text
    n/N: jump to the next/previous match
    less is the pager used by the man command

    1.4 Extracting text by column with cut, merging files by column with paste
    cut [OPTION]... [FILE]...
    -d DELIMITER: set the delimiter (default: tab)
    -f FIELDS:
    #: field number #
    #,#[,#]: several discrete fields, e.g. 1,3,6
    #-#: a range of consecutive fields, e.g. 1-6
    mixed: 1-3,7
    -c: cut by character
    --output-delimiter=STRING: set the output delimiter
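The field-list forms above can be tried on a couple of colon-delimited lines. This is a sketch assuming GNU cut (--output-delimiter is a GNU option); the files under /tmp are hypothetical stand-ins for /etc/passwd.

```shell
#!/bin/sh
# Two made-up /etc/passwd-style lines (colon-delimited)
printf 'root:x:0:0:root:/root:/bin/bash\nalice:x:1000:1000:Alice:/home/alice:/bin/sh\n' > /tmp/users.txt

cut -d: -f1 /tmp/users.txt                              # field 1: root, alice
cut -d: -f1,3,7 /tmp/users.txt                          # discrete fields 1, 3 and 7
cut -d: -f1-3 /tmp/users.txt                            # consecutive fields 1 to 3
cut -d: -f1-3,7 --output-delimiter=' ' /tmp/users.txt   # mixed list, space-separated output
cut -c1-4 /tmp/users.txt                                # characters 1 to 4 of each line

# paste joins files line by line, separated by a tab unless -d is given
printf '1\n2\n' > /tmp/ids.txt
cut -d: -f1 /tmp/users.txt > /tmp/names.txt
paste /tmp/ids.txt /tmp/names.txt                       # "1<TAB>root", "2<TAB>alice"
paste -d, /tmp/ids.txt /tmp/names.txt                   # "1,root", "2,alice"
```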

    1.5 Tools for analyzing text
    Text statistics: wc
    Sorting text: sort
    Comparing files: diff and patch
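A minimal sketch of wc, sort and diff on a throwaway file (the file name and contents are made up). Note the difference between the default text sort and sort -n, which compares numerically.

```shell
#!/bin/sh
# Hypothetical sample file with one repeated line
printf 'banana\napple\ncherry\napple\n' > /tmp/fruit.txt

wc -l /tmp/fruit.txt             # line count: 4
wc -w -c < /tmp/fruit.txt        # word and byte counts, read from stdin
sort /tmp/fruit.txt              # alphabetical order
sort -r /tmp/fruit.txt           # reverse order
printf '10\n9\n100\n' | sort -n  # numeric order: 9, 10, 100

# diff exits 0 when the files are identical and 1 when they differ
sort /tmp/fruit.txt > /tmp/fruit.sorted
diff -u /tmp/fruit.txt /tmp/fruit.sorted || echo "files differ"
```

The unified output of diff -u is the format that patch typically consumes.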

    1.6 uniq
    uniq: remove adjacent duplicate lines from the input
    uniq [OPTION]... [FILE]...
    -c: prefix each line with its number of occurrences
    -d: print only lines that are repeated
    -u: print only lines that are not repeated
    Note: lines count as duplicates only when they are adjacent and identical
    Commonly combined with sort:
    sort userlist.txt | uniq -c
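The "adjacent only" rule is why uniq is usually preceded by sort. In this sketch the contents of /tmp/userlist.txt are invented, and the duplicates are deliberately not adjacent.

```shell
#!/bin/sh
# Hypothetical login list: duplicates exist but never sit next to each other
printf 'bob\nalice\nbob\ncarol\nalice\nbob\n' > /tmp/userlist.txt

uniq /tmp/userlist.txt | wc -l      # 6: no adjacent repeats, so nothing is removed
sort /tmp/userlist.txt | uniq -c    # counts: 2 alice, 3 bob, 1 carol
sort /tmp/userlist.txt | uniq -d    # only repeated lines: alice, bob
sort /tmp/userlist.txt | uniq -u    # only lines that never repeat: carol

# A common idiom: most frequent entry first
sort /tmp/userlist.txt | uniq -c | sort -rn | head -n 1
```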

    2 The "Three Musketeers" of Linux text processing
    grep: a text filter (filters lines by pattern)
    grep, egrep, fgrep (fgrep does not support regular expressions)
    sed: stream editor, a text editing tool
    awk: text report generator; on Linux the implementation is gawk
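The difference between the three grep variants shows up as soon as a pattern contains a metacharacter. This sketch uses grep -E and grep -F, which do the same jobs as egrep and fgrep (recent GNU grep releases treat egrep and fgrep as deprecated aliases); the sample file is made up.

```shell
#!/bin/sh
# Hypothetical sample lines
printf 'a.c\nabc\nac\n' > /tmp/patterns.txt

grep 'a.c' /tmp/patterns.txt       # "." is a metacharacter: matches a.c and abc
grep -F 'a.c' /tmp/patterns.txt    # fixed string, like fgrep: only a.c
grep -E 'ab?c' /tmp/patterns.txt   # extended regex, like egrep: abc and ac
```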

    Regular expressions
    REGEXP: a pattern built from special characters and literal text characters, in which some characters (metacharacters) do not stand for their literal meaning but act as control characters or wildcards
    Supported by programs such as grep, sed, awk, vim, less, nginx, varnish, etc.
    Two categories:
    Basic regular expressions: BRE
    Extended regular expressions: ERE (grep -E, egrep)
    Regular expression engine:
    the software module that processes regular expressions; different engines use different algorithms
    PCRE (Perl Compatible Regular Expressions)
    Metacharacters fall into four groups: character matching, match counts, position anchoring, grouping
    man 7 regex
    Character matching:
    .  matches any single character
    []  matches any single character in the specified range
    [^]  matches any single character outside the specified range
    [:alnum:]  letters and digits
    [:alpha:]  any English letter, upper or lower case, i.e. A-Z, a-z
    [:lower:]  lowercase letters; [:upper:]  uppercase letters
    [:blank:]  blank characters (space and tab)
    [:space:]  horizontal and vertical whitespace (a wider set than [:blank:])
    [:cntrl:]  non-printable control characters (backspace, delete, bell, ...)
    [:digit:]  decimal digits; [:xdigit:]  hexadecimal digits
    [:graph:]  printable non-whitespace characters
    [:print:]  printable characters
    [:punct:]  punctuation
    Note: a class is used inside a bracket expression, e.g. [[:digit:]]
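A small sketch of the character classes with grep. Each class is written inside an outer pair of brackets, and [^...] negates the set. The sample lines are invented; line 5 contains only a tab.

```shell
#!/bin/sh
# Made-up sample lines; the fifth line is a single tab character
printf 'abc\nABC\n123\nab 12\n\t\nhello!\n' > /tmp/classes.txt

grep '[[:upper:]]' /tmp/classes.txt    # ABC
grep '[[:digit:]]' /tmp/classes.txt    # 123, ab 12
grep '[[:blank:]]' /tmp/classes.txt    # "ab 12" (space) and the tab-only line
grep '[[:punct:]]' /tmp/classes.txt    # hello!
grep '[^[:alnum:]]' /tmp/classes.txt   # any line containing a non-letter, non-digit character
```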

    Match counts: placed after the character they apply to, they specify how many times that preceding character may occur

        *  matches the preceding character any number of times, including 0
        Greedy mode: matches as much as possible
        .*  any character of any length
        \?  matches the preceding character 0 or 1 times
        \+  matches the preceding character at least once
        \{n\}  matches the preceding character exactly n times
        \{m,n\}  matches the preceding character at least m and at most n times
        \{,n\}  matches the preceding character at most n times
        \{n,\}  matches the preceding character at least n times
        Position anchoring: pins down where a match may appear
        ^  start-of-line anchor, used at the left end of a pattern
        $  end-of-line anchor, used at the right end of a pattern
        ^PATTERN$  the pattern matches the entire line
        ^$  empty line
        ^[[:space:]]*$  blank line (empty or whitespace only)
        \< or \b  word-start anchor, used at the left of a word pattern
        \> or \b  word-end anchor, used at the right of a word pattern
        \<PATTERN\>  matches a whole word
        Grouping: \(\) binds one or more characters into a single unit, e.g. \(root\)\+
        The text matched by the pattern inside each group is saved by the regex engine in internal variables named \1, \2, \3, ...
        \1 refers to the group between the first opening parenthesis from the left and its matching closing parenthesis
        Example: \(string1\+\(string2\)\)
        \1: string1\+\(string2\)
        \2: string2
        Backreference: refers to the characters actually matched by the pattern in an earlier group, not to the pattern itself
        Alternation: \|
        Example: a\|b: a or b; C\|cat: C or cat; \(C\|c\)at: Cat or cat

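The match-count metacharacters can be compared side by side on a few words. This sketch assumes GNU grep, whose basic regular expressions accept \?, \+ and \{,n\}; the sample file is made up. The last line shows greedy matching: .* extends to the last possible c, not the first.

```shell
#!/bin/sh
# Hypothetical sample: g, then zero to three o's, then d
printf 'gd\ngod\ngood\ngoood\n' > /tmp/repeat.txt

grep 'go*d' /tmp/repeat.txt         # o any number of times: all four lines
grep 'go\?d' /tmp/repeat.txt        # o zero or one time: gd, god
grep 'go\+d' /tmp/repeat.txt        # o at least once: god, good, goood
grep 'go\{2\}d' /tmp/repeat.txt     # exactly two o's: good
grep 'go\{1,2\}d' /tmp/repeat.txt   # one or two o's: god, good
grep 'go\{,1\}d' /tmp/repeat.txt    # at most one o: gd, god
grep 'go\{2,\}d' /tmp/repeat.txt    # at least two o's: good, goood
grep -E 'go{2,}d' /tmp/repeat.txt   # same as above in ERE syntax, no backslashes

echo 'abcabc' | grep -o 'a.*c'      # greedy: prints abcabc, not abc
```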

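A minimal sketch of the position anchors with GNU grep. The sample file is hypothetical: line 3 is truly empty and line 4 contains only spaces, which is exactly where ^$ and ^[[:space:]]*$ disagree.

```shell
#!/bin/sh
# Made-up sample; line 3 is empty, line 4 holds only spaces
printf 'root:x:0\nchroot here\n\n   \nthe root user\n' > /tmp/anchors.txt

grep '^root' /tmp/anchors.txt               # root:x:0 (root at the start of the line)
grep 'here$' /tmp/anchors.txt               # chroot here
grep -n '^$' /tmp/anchors.txt               # 3: (the truly empty line)
grep -n '^[[:space:]]*$' /tmp/anchors.txt   # 3: and 4: (empty or whitespace only)
grep '\<root\>' /tmp/anchors.txt            # root:x:0 and "the root user", not chroot
grep -c 'root' /tmp/anchors.txt             # 3 without word anchors
```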

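Grouping, backreferences and alternation can be checked with grep -x, which requires the pattern to match the whole line. In this sketch (GNU grep assumed, sample file made up), \([a-z]\{3\}\)\1 matches abcabc but not abcxyz, because \1 repeats the text the group matched rather than re-running the pattern. The C\|cat line also shows that \| splits the whole expression, so Cat is not matched.

```shell
#!/bin/sh
# Hypothetical sample lines
printf 'rootroot\nroot\nCat\ncat\ndog\nabcabc\nabcxyz\n' > /tmp/groups.txt

grep -x '\(root\)\+' /tmp/groups.txt           # one or more "root": rootroot, root
grep -x '\([a-z]\{3\}\)\1' /tmp/groups.txt     # backreference repeats matched text: abcabc only
grep -x '\(C\|c\)at' /tmp/groups.txt           # Cat, cat
grep -x 'C\|cat' /tmp/groups.txt               # "C" or "cat": only cat here
grep -E -x '(C|c)at' /tmp/groups.txt           # ERE form: no backslashes before ( ) |
```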

    2 vim
    vim: a modal editor
    What a keystroke does depends on vim's current mode
    Three main modes:
    Normal (command) mode: the default mode; move the cursor, cut/paste text
    Insert (edit) mode: modify text
    Extended command mode: save, quit, etc.
    ESC: leave the current mode
    ESC ESC: always return to normal (command) mode

    3 Processing text with sed
    Usage:
    sed [OPTION]... 'script' inputfile...
    Common options:
    -n: do not print the pattern space automatically
    -e: apply multiple edits (one script per -e)
    -f /path/script_file: read the editing script from the specified file
    -r: use extended regular expressions
    -i.bak: edit the file in place, keeping a backup with the .bak suffix
    script:
    'address command'
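Each option above can be exercised on a three-line file. This is a sketch assuming GNU sed (for -r and -i.bak); the files /tmp/sed.txt and /tmp/edit.sed are made up. Every script follows the 'address command' shape: 2p, 3d and /^b/p pair an address with a command, while s/// with no address applies to every line.

```shell
#!/bin/sh
# Hypothetical input file
printf 'alpha\nbeta\ngamma\n' > /tmp/sed.txt

sed -n '2p' /tmp/sed.txt                  # -n suppresses auto-print; 2p prints only beta
sed -e 's/a/A/' -e '3d' /tmp/sed.txt      # two edits: first a -> A on each line, delete line 3
sed -r 's/(al)(pha)/\2\1/' /tmp/sed.txt   # ERE groups without backslashes: alpha -> phaal
sed -n '/^b/p' /tmp/sed.txt               # regex address: lines starting with b

printf 's/beta/BETA/\n' > /tmp/edit.sed
sed -f /tmp/edit.sed /tmp/sed.txt         # script read from a file

sed -i.bak 's/gamma/GAMMA/' /tmp/sed.txt  # edit in place; original kept as /tmp/sed.txt.bak
cat /tmp/sed.txt /tmp/sed.txt.bak
```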

    4 The awk language
    Basic format: awk [options] 'program' file...
    program: pattern{action statements;..}
    pattern and action:
    The pattern determines when the action statements are triggered
    BEGIN, END
    Action statements process the data and are enclosed in {}
    print, printf
    Separators, fields, and records
    When awk runs, the fields split out by the separator are labeled $1, $2, ..., $n; these are called field identifiers. $0 stands for all fields. Note: this $ has a different meaning from the $ of shell variables
    Each line of the file is called a record
    If the action is omitted, the default is print $0
    print format: print item1, item2, ...
    Key points:
    (1) items are separated by commas
    (2) each output item can be a string or a number, a field of the current record, a variable, or an awk expression
    (3) if the items are omitted, it is equivalent to print $0
    Examples:
    awk '{print "hello,awk"}'
    awk -F: '{print}' /etc/passwd
    awk -F: '{print "wang"}' /etc/passwd
    awk -F: '{print $1}' /etc/passwd
    awk -F: '{print $0}' /etc/passwd
    awk -F: '{print $1"\t"$3}' /etc/passwd
    tail -3 /etc/fstab | awk '{print $2,$4}'
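To see the pattern/action rules and BEGIN/END together with predictable output, this sketch runs awk on a made-up two-line file instead of the real /etc/passwd. A comma in print inserts the output field separator (a space by default), while placing items side by side concatenates them.

```shell
#!/bin/sh
# Hypothetical passwd-style sample so the output is predictable
printf 'root:x:0:0:root:/root:/bin/bash\nalice:x:1000:1000:Alice:/home/alice:/bin/sh\n' > /tmp/passwd.sample

awk -F: '{print $1}' /tmp/passwd.sample         # first field: root, alice
awk -F: '{print $1, $3}' /tmp/passwd.sample     # comma -> "root 0", "alice 1000"
awk -F: '{print $1"\t"$3}' /tmp/passwd.sample   # no comma: items are concatenated
awk -F: '$3 >= 1000' /tmp/passwd.sample         # pattern only: default action is print $0

# BEGIN runs before the first record, END after the last; NR counts records
awk -F: 'BEGIN {print "user uid"} {print $1, $3} END {print NR " records"}' /tmp/passwd.sample
```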
