Linux Text Processing Tools

Source: Internet
Author: User
Tags: control characters, printable characters, expression engine

One: Linux text processing tools

Tools for extracting text
Viewing file contents: less and cat
cat
-E: Display $ at the end of each line
-n: Number every output line
-A: Show all control characters
-b: Number only non-empty lines
-s: Squeeze consecutive blank lines into one
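
The cat options above can be tried on a small file. This is a minimal sketch assuming GNU coreutils; the sample content and the variable names are made up for illustration.

```shell
# Hypothetical sample file: two text lines separated by two blank lines.
sample=$(mktemp)
printf 'first\n\n\nsecond\n' > "$sample"

# -n numbers every line; -b numbers only the non-empty ones.
numbered_all=$(cat -n "$sample")
numbered_nonblank=$(cat -b "$sample")

# -s squeezes the two consecutive blank lines into a single blank line.
squeezed=$(cat -s "$sample")

# -E marks the end of every line with a $ sign.
with_ends=$(cat -E "$sample")

printf '%s\n' "$numbered_all" "$numbered_nonblank" "$squeezed" "$with_ends"
rm -f "$sample"
```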

tac: Display the lines of a file in reverse order (last line first)
rev: Display each line with its characters reversed (right to left)
less: View a file or stdin one page at a time, paging up or down
more: A filter that shows a file one full screen at a time
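
The difference between tac and rev is easiest to see side by side. A short sketch, assuming both commands are installed (rev usually ships with util-linux); the variable names are illustrative only.

```shell
# tac reverses the order of the lines.
lines_reversed=$(printf 'abc\ndef\n' | tac)

# rev keeps the line order but reverses the characters inside each line.
chars_reversed=$(printf 'abc\ndef\n' | rev)

printf '%s\n' "$lines_reversed"   # def, then abc
printf '%s\n' "$chars_reversed"   # cba, then fed
```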

File excerpts: head and tail
head
-c #: Get the first # bytes (handy when taking a fixed amount of random data)
-n #: Get the first # lines

tail
-c #: Get the last # bytes
-n #: Get the last # lines
-f: Follow data appended to the open file descriptor; common for log monitoring; equivalent to --follow=descriptor (★)
-F: Follow the file by name; equivalent to --follow=name --retry (if the followed file is deleted, tail prints a message and keeps retrying)
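
A brief sketch of the head and tail options above, assuming GNU coreutils and a readable /dev/urandom. The log path in the comments is hypothetical, and the follow commands are left commented out because they run until interrupted.

```shell
# Hypothetical sample: the numbers 1 to 10, one per line.
nums=$(mktemp)
seq 1 10 > "$nums"

first_three=$(head -n 3 "$nums")    # lines 1, 2, 3
last_two=$(tail -n 2 "$nums")       # lines 9, 10
first_bytes=$(head -c 4 "$nums")    # bytes: 1, newline, 2, newline
last_bytes=$(tail -c 3 "$nums")     # bytes: 1, 0, newline

# head -c cuts a fixed-size chunk of random data:
# 8 random bytes rendered as 16 hexadecimal characters.
random_hex=$(head -c 8 /dev/urandom | od -An -tx1 | tr -d ' \n')

printf '%s\n' "$first_three" "$last_two" "$random_hex"
rm -f "$nums"

# Log monitoring (made-up path):
#   tail -f /var/log/app.log   # follows the open file descriptor
#   tail -F /var/log/app.log   # follows the name and retries if it vanishes
```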

Extract by column: cut
-d DELIMITER: Set the field delimiter; the default is tab (commonly used)
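
As an illustration, -d is normally paired with -f, which selects the fields to keep. The -f option is not listed above and is added here as an assumption about typical use; the record is made-up data.

```shell
# A passwd-style record with ':' as the field delimiter (made-up data).
record='alice:x:1001:1001:Alice:/home/alice:/bin/bash'

# -d sets the delimiter; -f picks fields 1 and 7 (user name and shell).
user_and_shell=$(printf '%s\n' "$record" | cut -d: -f1,7)

# Without -d, cut splits on tabs.
second_column=$(printf 'a\tb\tc\n' | cut -f2)

printf '%s\n' "$user_and_shell" "$second_column"
```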

Collect text statistics: wc
-l: Count only lines
-w: Count only words
-c: Count only bytes (often used to print the length of a word)
-m: Count only characters
-L: Display the length of the longest line in the file
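
The counts can be checked on a two-line file. A small sketch assuming GNU wc (the -L option is a GNU extension) and ASCII-only text, so bytes and characters coincide.

```shell
# Hypothetical sample: "one two" and "three" on separate lines.
text=$(mktemp)
printf 'one two\nthree\n' > "$text"

line_count=$(wc -l < "$text")   # 2
word_count=$(wc -w < "$text")   # 3
byte_count=$(wc -c < "$text")   # 14 (8 + 6, newlines included)
char_count=$(wc -m < "$text")   # 14 for ASCII text
longest=$(wc -L < "$text")      # 7, the length of "one two"

# Word length: printf '%s' adds no trailing newline, so only the word is counted.
word_length=$(printf '%s' 'musketeer' | wc -c)   # 9

rm -f "$text"
```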

Sorting text: sort
-r: Sort in reverse (descending) order (often used when ranking IP access counts)
-R: Sort in random order (can be used to pick items at random)
-n: Sort by numeric value (commonly combined with -r)
-f: Fold case, ignoring capitalization when comparing strings
-u: Unique; delete duplicate lines from the output
-t c: Use the character c as the field delimiter
-k X: Sort by column X, with columns delimited by the -t character; can be given multiple times
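
The following sketch combines several of these options, assuming GNU sort. The name:score records are invented, and the -k2,2nr form (start field, end field, then per-key flags) is one common way of writing a key.

```shell
# Made-up records in the form name:score.
scores=$(printf 'carol:9\nalice:10\nbob:9\n')

# Numeric, descending sort on field 2; ties are broken by name (field 1).
ranked=$(printf '%s\n' "$scores" | sort -t: -k2,2nr -k1,1)

# -u drops duplicate lines from the sorted output.
unique_sorted=$(printf 'b\na\nb\n' | sort -u)

# -R shuffles; the result still contains the same five lines.
shuffled=$(seq 1 5 | sort -R)

printf '%s\n' "$ranked" "$unique_sorted" "$shuffled"
```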

uniq: Remove adjacent duplicate lines from the input
-c: Show the number of occurrences of each line
-d: Show only lines that are repeated
-u: Show only lines that are not repeated
Note: lines count as duplicates only when they are adjacent and exactly identical.
For that reason uniq is commonly used together with sort:
sort userlist.txt | uniq -c
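
A common use of this pairing is ranking how often each value appears, such as client IPs in an access log. A minimal sketch with invented addresses:

```shell
# Made-up access log: one client IP per line.
ips=$(printf '10.0.0.2\n10.0.0.1\n10.0.0.2\n10.0.0.3\n10.0.0.2\n10.0.0.1\n')

# uniq merges only adjacent lines, so sort first, then rank by count.
top=$(printf '%s\n' "$ips" | sort | uniq -c | sort -nr | head -n 1)

repeated=$(printf '%s\n' "$ips" | sort | uniq -d)   # seen more than once
single=$(printf '%s\n' "$ips" | sort | uniq -u)     # seen exactly once

# Without sort, no two equal lines are adjacent here, so nothing is removed.
unsorted_count=$(printf '%s\n' "$ips" | uniq | wc -l)

printf '%s\n' "$top" "$repeated" "$single" "$unsorted_count"
```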

The three musketeers of Linux text processing
grep: Text filtering tool (filters by pattern)
Variants: grep, egrep, and fgrep (fgrep does not support regular expression search)
sed: Stream editor, a text editing tool
awk: Implemented on Linux as gawk; a text report generator

grep command options
--color=auto: Highlight the matched text in color
-v: Display lines that do not match the pattern (inverted match)
-i: Ignore case
-n: Show the line numbers of matching lines
-c: Count the matching lines
-o: Show only the matched string
-q: Quiet mode; print nothing and report the result through the exit status
-A #: After; also print the # lines that follow each match
-B #: Before; also print the # lines that precede each match
-C #: Context; also print # lines before and after each match
-e: Specify a pattern; repeating it gives a logical OR between patterns
grep -e 'cat' -e 'dog' file
-w: Match whole words only
-E: Use extended regular expressions (ERE)
-F: Equivalent to fgrep; patterns are fixed strings, not regular expressions
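
The options above can be compared on one small file. This is a sketch assuming GNU grep; the file content is invented.

```shell
# Hypothetical sample file with four lines.
pets=$(mktemp)
printf 'Cat\ndog\ncatalog\nbird\n' > "$pets"

ignore_case=$(grep -i 'cat' "$pets")        # Cat, catalog
whole_word=$(grep -iw 'cat' "$pets")        # Cat
either=$(grep -e 'Cat' -e 'dog' "$pets")    # Cat, dog
inverted=$(grep -v 'o' "$pets")             # Cat, bird
match_count=$(grep -c 'o' "$pets")          # 2
only_match=$(grep -o 'cat' "$pets")         # cat (taken from catalog)
with_numbers=$(grep -n 'bird' "$pets")      # 4:bird
after_ctx=$(grep -A 1 'dog' "$pets")        # dog, catalog

# -q prints nothing; only the exit status says whether a match exists.
if grep -q 'bird' "$pets"; then found=yes; else found=no; fi

rm -f "$pets"
```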

Basic regular expression metacharacters
Character matching:
. matches any single character
[] matches any single character within the specified set or range
[^] matches any single character outside the specified set or range
[:alnum:] letters and digits
[:alpha:] any English letter, uppercase or lowercase, i.e. A-Z, a-z
[:lower:] lowercase letters; [:upper:] uppercase letters
[:blank:] whitespace characters (space and tab)
[:space:] horizontal and vertical whitespace characters (a wider set than [:blank:])
[:cntrl:] non-printable control characters (backspace, delete, bell, ...)
[:digit:] decimal digits; [:xdigit:] hexadecimal digits
[:graph:] printable non-whitespace characters
[:print:] printable characters
[:punct:] punctuation characters
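
One detail worth showing: a class name such as [:digit:] is only valid inside a bracket expression, so in a pattern it appears as [[:digit:]]. A short sketch with invented sample lines:

```shell
# Made-up sample; the third line consists of three spaces.
sample=$(printf 'abc123\nABC\n   \nhello, world\n42\n')

has_upper=$(printf '%s\n' "$sample" | grep '[[:upper:]]')      # ABC
has_punct=$(printf '%s\n' "$sample" | grep '[[:punct:]]')      # hello, world
digit_lines=$(printf '%s\n' "$sample" | grep -c '[[:digit:]]') # 2

# Lines without any letter or digit: only the line of spaces remains.
no_alnum=$(printf '%s\n' "$sample" | grep -v '[[:alnum:]]')

# Lines that begin with a space or tab.
leading_blank=$(printf '%s\n' "$sample" | grep -c '^[[:blank:]]')  # 1
```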

Number of matches: written after a character to specify how many times that character may occur

* matches the preceding character any number of times, including 0; greedy mode: matches as much as possible
.* any characters of any length
\? matches the preceding character 0 or 1 times
\+ matches the preceding character at least 1 time
\{n\} matches the preceding character exactly n times
\{m,n\} matches the preceding character at least m times and at most n times
\{,n\} matches the preceding character at most n times
\{n,\} matches the preceding character at least n times
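
The repetition operators can be compared on words that differ only in the number of o characters. A sketch assuming GNU grep, where \?, \+ and \{,n\} are accepted in basic regular expressions as GNU extensions; with -E the same operators are written without backslashes.

```shell
# Made-up word list: g, then 0 to 3 o characters, then d.
words=$(printf 'gd\ngod\ngood\ngoood\n')

star=$(printf '%s\n' "$words" | grep -c '^go*d$')             # 4
optional=$(printf '%s\n' "$words" | grep -c '^go\?d$')        # 2: gd, god
plus=$(printf '%s\n' "$words" | grep -c '^go\+d$')            # 3
exactly_two=$(printf '%s\n' "$words" | grep '^go\{2\}d$')     # good
at_most_two=$(printf '%s\n' "$words" | grep -c '^go\{,2\}d$') # 3
two_or_more=$(printf '%s\n' "$words" | grep -c '^go\{2,\}d$') # 2
one_to_two=$(printf '%s\n' "$words" | grep -c '^go\{1,2\}d$') # 2

# ERE spelling of the same interval.
ere_range=$(printf '%s\n' "$words" | grep -Ec '^go{1,2}d$')   # 2

# Greedy matching: .* runs to the last X, not the first.
greedy=$(printf 'aXbXc\n' | grep -o 'a.*X')                   # aXbX
```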

Position anchoring: fixes where the pattern must appear
^ line-start anchor; placed at the leftmost end of the pattern
$ line-end anchor; placed at the rightmost end of the pattern
^pattern$ the pattern must match the entire line
^$ empty line
^[[:space:]]*$ blank line (empty or whitespace only)
\< or \b word-start anchor; placed at the left side of the word pattern
\> or \b word-end anchor; placed at the right side of the word pattern
\<pattern\> matches a whole word

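The anchors above can be tried on a file that mixes empty, blank, and text lines. A minimal sketch assuming GNU grep (the \< and \> word anchors are GNU extensions); the content is invented.

```shell
# Hypothetical sample: the second line is empty, the third holds two spaces.
text=$(mktemp)
printf 'root\n\n  \nroot user\nchroot\n' > "$text"

empty_lines=$(grep -c '^$' "$text")               # 1
blank_lines=$(grep -c '^[[:space:]]*$' "$text")   # 2: empty plus spaces-only
exact_line=$(grep -c '^root$' "$text")            # 1
starts_with=$(grep -c '^root' "$text")            # 2: root, root user
ends_with=$(grep -c 'root$' "$text")              # 2: root, chroot

# Whole word: chroot does not count because root is not at a word start there.
whole_word=$(grep -c '\<root\>' "$text")          # 2

rm -f "$text"
```
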
Grouping: \(\) binds one or more characters together so they are treated as a unit, e.g. \(root\)\+
The text matched by the pattern inside each group is recorded by the regular expression engine in internal variables,
which are named \1, \2, \3, ...
\1 is the text matched by the pattern between the first opening parenthesis (counting from the left) and its matching closing parenthesis.
Example: \(string1\+\(string2\)\)
\1: string1\+\(string2\)
\2: string2
Back reference: refers to the characters matched by the pattern in an earlier group, not to the pattern itself
Or: \|
Example: a\|b: a or b; C\|cat: C or cat; \(C\|c\)at: Cat or cat
Example: add the users bash, basher, and nologin (whose shell is set with -s /sbin/nologin), then find the lines in which the user name is the same as the shell name.
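
The exercise can be sketched without creating real accounts by running grep against a passwd-style sample file instead of /etc/passwd. The file content below is invented, and GNU grep is assumed (back references under -E and the \| operator are GNU extensions).

```shell
# Hypothetical stand-in for /etc/passwd.
passwd_sample=$(mktemp)
cat > "$passwd_sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bash:x:1001:1001::/home/bash:/bin/bash
basher:x:1002:1002::/home/basher:/bin/bash
nologin:x:1003:1003::/home/nologin:/sbin/nologin
EOF

# BRE: capture the user name, then require the same text (\1)
# as the last path component of the shell field.
same_bre=$(grep '^\([^:]*\):.*/\1$' "$passwd_sample")

# ERE spelling of the same idea.
same_ere=$(grep -E '^([^:]+):.*/\1$' "$passwd_sample")

# Alternation inside a group: matches Cat and cat but not bat.
alt_count=$(printf 'Cat\ncat\nbat\n' | grep -c '^\(C\|c\)at$')   # 2

printf '%s\n' "$same_bre"
rm -f "$passwd_sample"
```

Only the bash and nologin lines are printed: basher is skipped because \1 must reproduce the whole captured name, and its shell ends in bash, not basher.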
