Text-processing regular expressions and grep

Source: Internet
Author: User

Linux is a file-based system: nearly every Linux command can be seen as an operation on files (a few operate on variables). Proficiency with the Linux text-processing tools is therefore essential for anyone learning Linux, especially for those preparing to work with it professionally. You will regularly meet very large text files, and batch processing and precise searching become everyday tasks. Even after learning all of the tools, it is normal to feel stuck when facing complex text. Text-processing skill takes time and practice to build, so do not be discouraged. The commands used most often day to day are summarized below.

First, the basic text-processing commands of Linux:

1. cat: concatenate files and print them to standard output

cat [OPTION]... [FILE]...

-A: equivalent to -vET

-b: number only the non-empty lines; overrides -n

-e: equivalent to -vE

-E: display $ at the end of each line

-n: number all lines

-s: squeeze consecutive empty lines into a single empty line (a line that contains spaces before the newline is not empty and is not squeezed)

-t: equivalent to -vT

-T: display TAB characters as ^I

-v: show non-printing characters using ^ and M- notation, except for LFD (line feed) and TAB
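
The effect of the cat options above is easiest to see on a tiny input. The following is a minimal sketch that assumes GNU coreutils; the sample() helper and its text are invented for illustration.

```shell
# Hypothetical sample: "a", two empty lines, then "b<TAB>c".
sample() { printf 'a\n\n\nb\tc\n'; }

sample | cat -n    # number every line, including the empty ones
sample | cat -b    # number only the non-empty lines
sample | cat -s    # squeeze the two empty lines into one
sample | cat -A    # mark line ends with $ and show the TAB as ^I
```

With -A the last line is printed as b^Ic$, which makes stray tabs and trailing spaces visible.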

Commands similar to cat:

tac: display the lines of a text in reverse order (last line first)

rev: reverse the characters within each line

Viewing text page by page: less and more (less is the more capable of the two)

head and tail commands:

head: display the first 10 lines of a text by default

-c N: display only the first N bytes

-n N: display the first N lines

tail: display the last 10 lines of a text by default

The -c and -n options work as in head, but count from the end of the file

-f: follow the file and print new lines as they are appended
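
A short sketch of head and tail, using seq to generate stand-in content (the numbers 1 to 20, one per line) instead of a real file:

```shell
first3=$(seq 20 | head -n 3)     # lines 1 to 3
last2=$(seq 20 | tail -n 2)      # lines 19 and 20
first5b=$(seq 20 | head -c 5)    # first 5 bytes: 1, newline, 2, newline, 3

printf '%s\n' "$first3" "$last2" "$first5b"
```

tail -f is left out of the sketch because it keeps running until interrupted.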

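How do I monitor /var/log/secure and print only the lines that are added from now on?

# tailf -n 0 /var/log/secure &

(tailf is deprecated and missing from newer util-linux releases; tail -n 0 -f /var/log/secure & is the equivalent.)

cut: extract selected parts of each line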

-d: specify the field delimiter (TAB by default)

-c: select by character position

-f: select fields by number
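
A minimal sketch of the cut options; the passwd-style record below is made up for illustration.

```shell
# Hypothetical record with ":" as the field separator.
line='alice:x:1000:1000:Alice:/home/alice:/bin/bash'

name_uid=$(printf '%s\n' "$line" | cut -d: -f1,3)   # fields 1 and 3
first3=$(printf '%s\n' "$line" | cut -c1-3)         # characters 1 to 3

printf '%s\n' "$name_uid" "$first3"
```

The selected fields are printed with the same delimiter that separated them in the input.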

paste: merge the lines that share the same line number in two files, side by side

-d: specify the delimiter (TAB by default)

-s: join all lines of each file into a single line
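
The two paste modes can be sketched as follows; the file names nums and letters are placeholders created in a temporary directory.

```shell
tmp=$(mktemp -d)
printf '1\n2\n3\n' > "$tmp/nums"
printf 'a\nb\nc\n' > "$tmp/letters"

# Line N of the first file is joined with line N of the second file.
side_by_side=$(paste -d: "$tmp/nums" "$tmp/letters")
# -s turns all lines of one file into a single line.
one_line=$(paste -s -d, "$tmp/nums")

printf '%s\n' "$side_by_side" "$one_line"
rm -rf "$tmp"
```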

wc: count the lines, words, bytes and characters of a file

-l: count lines

-w: count words

-c: count bytes

-m: count characters
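
A small sketch of the wc counters on an invented two-line ASCII text. With multi-byte characters the -c and -m results would differ, depending on the locale.

```shell
text='one two
three'

lines=$(printf '%s\n' "$text" | wc -l)   # number of newline characters
words=$(printf '%s\n' "$text" | wc -w)
bytes=$(printf '%s\n' "$text" | wc -c)
chars=$(printf '%s\n' "$text" | wc -m)

echo "lines=$((lines)) words=$((words)) bytes=$((bytes)) chars=$((chars))"
```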

sort: sort the lines of a file

-b: ignore leading blanks and start comparing at the first non-blank character

-f: fold lowercase letters to uppercase before comparing, so that case is ignored

-g: general numeric sort; larger values come later

-n: compare by the numeric value of the leading string; unlike -g it does not understand notation such as 1e3

-r: reverse the order

-u: output only one of each set of duplicate lines

-t: specify the field delimiter

-k: select the field (key) to compare
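
The difference between -n and -g, and the combination of -t and -k, can be sketched like this. GNU sort is assumed, and the nums() and recs() helpers with their data are invented for the example.

```shell
nums() { printf '10\n9\n1e2\n'; }
by_n=$(nums | LC_ALL=C sort -n)   # -n reads "1e2" as 1, so it sorts first
by_g=$(nums | LC_ALL=C sort -g)   # -g reads "1e2" as 100, so it sorts last

# Sort colon-separated records by field 3, numerically, largest first.
recs() { printf 'a:x:5\nb:x:40\nc:x:7\n'; }
top=$(recs | sort -t: -k3 -nr | head -n 1)

printf '%s\n' "$by_n" "$by_g" "$top"
```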

uniq: report or omit repeated lines (only adjacent identical lines count as duplicates)

-c: prefix each line with the number of times it occurs

-d: show only the duplicated lines

-u: show only the lines that are not duplicated

In general, sort first so that duplicate lines end up next to each other, then use uniq to deduplicate and count them.
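
The sort-then-uniq idiom can be sketched as follows; the word list is made up for illustration.

```shell
fruits() { printf 'apple\npear\napple\nfig\napple\npear\n'; }

counts=$(fruits | sort | uniq -c | sort -nr)   # occurrences, most frequent first
dups=$(fruits | sort | uniq -d)                # lines that occur more than once
singles=$(fruits | sort | uniq -u)             # lines that occur exactly once

printf '%s\n' "$counts"
```

Without the first sort, uniq would treat the three scattered apple lines as three separate groups.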

diff: compare two files line by line (vimdiff gives a more intuitive view)

-u: output the differences in unified format, which the patch command can use to restore files

patch: apply a diff file to restore a file; the result is written under the base file name, so be sure to add the -b option

-b: automatically back up the base file as file.orig
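
A minimal round trip with diff -u and patch -b might look like the sketch below. It assumes GNU diff and GNU patch are installed; the file names old.txt, new.txt and change.patch are placeholders.

```shell
tmp=$(mktemp -d)
printf 'line1\nline2\n' > "$tmp/old.txt"
printf 'line1\nline2 changed\n' > "$tmp/new.txt"

# diff exits with status 1 when the files differ, so the status is ignored here.
diff -u "$tmp/old.txt" "$tmp/new.txt" > "$tmp/change.patch" || true

# Apply the patch to old.txt; -b keeps the previous content as old.txt.orig.
patch -b "$tmp/old.txt" < "$tmp/change.patch"

patched=$(cat "$tmp/old.txt")       # now the same as new.txt
backup=$(cat "$tmp/old.txt.orig")   # the content before patching
rm -rf "$tmp"
```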

1. Find all IPv4 addresses of this machine in the output of the ifconfig command:

# ifconfig | grep "netmask" | cut -d 'n' -f2 | grep -o "\b[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}\b"

10.1.70.102

127.0.0.1

192.168.122.1

2. Find the largest percentage value of partition space utilization:

# df | tr -s ' ' | cut -d ' ' -f5 | egrep -o '[0-9]{1,3}' | sort -nr | head -1

38

3. Find the user with the largest UID and show the user name, UID and shell type:

# sort -nr -t: -k3 /etc/passwd | head -1 | cut -d: -f1,3,7

nfsnobody:65534:/sbin/nologin

4. Find the permissions of /tmp and display them numerically:

# stat /tmp/ | grep "Access" | head -1 | cut -c11-13

777

5. Count the number of connections from each remote host IP currently connected to this machine, sorted from largest to smallest:

# netstat -nt | tr -s ' ' | cut -d ' ' -f5 | egrep -o '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' | sort | uniq -c | sort -nr

      2 10.1.250.91
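
The connection-counting pipeline can be tried without a live network by feeding it canned text. The sketch below uses made-up lines in the column layout of netstat -nt; the conns() helper and the addresses in it are assumptions for illustration only.

```shell
# Hypothetical netstat -nt style lines: proto, recv-q, send-q, local, foreign, state.
conns() {
  printf '%s\n' \
    'tcp   0   0 10.1.70.102:22   10.1.250.91:51234   ESTABLISHED' \
    'tcp   0   0 10.1.70.102:22   10.1.250.91:51240   ESTABLISHED' \
    'tcp   0   0 10.1.70.102:80   192.168.122.7:40000 ESTABLISHED'
}

# Squeeze spaces, take the foreign address, keep only the IP, then count per host.
per_host=$(conns | tr -s ' ' | cut -d ' ' -f5 |
  grep -Eo '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' |
  sort | uniq -c | sort -nr)

printf '%s\n' "$per_host"
```

The final sort -nr puts the host with the most connections first.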

Second, grep, one of the "Three Musketeers" of Linux text processing (grep, sed and awk)

grep (Global search REgular expression and Print out the line): a text search tool that checks the target text line by line against a user-specified pattern and prints the lines that match.

Pattern: a filter condition written with regular-expression metacharacters and literal text characters

The Unix grep family consists of grep, egrep and fgrep. egrep and fgrep differ only slightly from grep: egrep is an extension of grep that supports more regular-expression metacharacters, while fgrep (fixed grep, or fast grep) treats every character as literal text, so metacharacters lose their special meaning and stand for themselves. Linux uses the GNU version of grep, which is more powerful and provides the basic, egrep and fgrep behaviors through the -G, -E and -F command-line options.

grep has many options; for a detailed list see

http://www.lampweb.org/linux/3/27.html
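
How the same pattern is read under the different modes can be sketched with a few invented lines; GNU grep is assumed.

```shell
lines() { printf 'abc\na.c\nab+c\nabbc\n'; }

bre_dot=$(lines | grep -c 'a.c')        # "." is a metacharacter: abc and a.c
fixed_dot=$(lines | grep -F -c 'a.c')   # literal text only: a.c
bre_plus=$(lines | grep -G -c 'ab+c')   # in BRE "+" is literal: ab+c
ere_plus=$(lines | grep -E -c 'ab+c')   # in ERE "+" repeats "b": abc and abbc

echo "$bre_dot $fixed_dot $bre_plus $ere_plus"
```

The script prints 2 1 1 2, one count per mode.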

The more commonly used options are listed below:

--color=auto: highlight the matched text in color

-v: show the lines that do not match the pattern

-i: ignore case

-n: show the line number of each matching line

-c: count the matching lines

-o: show only the matched string

-q: quiet mode; print nothing, commonly used in scripts to test the exit status

-e PATTERN: specify a pattern; repeating -e gives a logical OR between several patterns

-w: match whole words only; a word consists of letters, digits and underscores

-A N: show each matching line and the N lines after it

-B N: show each matching line and the N lines before it

-C N: show each matching line and the N lines before and after it

-l: print only the names of files that contain a match

-H: prefix each matching line with its file name
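
Several of these options are sketched below on a small made-up log file (app.log is a placeholder name); GNU grep is assumed.

```shell
tmp=$(mktemp -d)
printf 'Error: disk full\nwarning: low memory\nerror: disk full again\nok\n' > "$tmp/app.log"

inverted=$(grep -v -i 'error' "$tmp/app.log")           # lines without "error"
numbered=$(grep -n -i 'error' "$tmp/app.log")           # matches with line numbers
count=$(grep -c -i 'error' "$tmp/app.log")              # number of matching lines
either=$(grep -c -e 'warning' -e 'ok' "$tmp/app.log")   # first pattern OR second
words=$(grep -o -w 'disk' "$tmp/app.log")               # only the matched word
after=$(grep -A 1 'warning' "$tmp/app.log")             # match plus 1 following line
files=$(cd "$tmp" && grep -l 'disk' app.log)            # names of matching files

# -q prints nothing; only the exit status is used.
if grep -q 'disk full' "$tmp/app.log"; then found=yes; else found=no; fi

printf '%s\n' "$numbered"
rm -rf "$tmp"
```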



Third, regular expressions

REGEXP: a pattern written with special characters and literal text characters, in which some characters (metacharacters) do not stand for themselves but express control or wildcard functions

Regular expressions are used to match text content, whereas shell wildcards are normally used to match file paths

Help is available through man 7 regex

Regular-expression metacharacters fall into four groups: character matching, repetition counts, position anchors and grouping

Regular expressions can be divided into two categories:

1. Basic regular expressions (BRE)

Character matching:

.: matches any single character

[]: matches any single character listed inside the square brackets

[^]: matches any single character not listed inside the square brackets

[[:digit:]]: matches a single digit, same as [0-9]; note the double brackets

[[:alpha:]]: matches any single letter, uppercase or lowercase

[[:lower:]]: matches any single lowercase letter

[[:upper:]]: matches any single uppercase letter

[[:alnum:]]: matches any single letter or digit

[[:punct:]]: matches any single punctuation mark

[[:space:]]: matches a single whitespace character
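
The character classes can be tried on a few invented lines, as in this small sketch:

```shell
# Sample lines: lowercase, uppercase, digits, mixed with punctuation, a single space.
sample() { printf 'abc\nABC\n123\na1!\n \n'; }

digits=$(sample | grep -c '[[:digit:]]')         # 123 and a1!
upper=$(sample | grep -c '[[:upper:]]')          # ABC
punct=$(sample | grep -c '[[:punct:]]')          # a1!
no_alpha=$(sample | grep -c '^[^[:alpha:]]*$')   # no letters at all: 123 and the space-only line

echo "$digits $upper $punct $no_alpha"
```

The last pattern shows a class used inside a negated bracket expression, [^[:alpha:]].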

Number of matches:

*: matches the preceding character any number of times, including zero (matching is greedy; the lazy form *? is not part of BRE or ERE and needs a Perl-compatible engine such as grep -P)

.*: any characters, of any length

\?: matches the preceding character 0 or 1 times

\+: matches the preceding character 1 or more times

\{m,n\}: matches the preceding character m to n times

\{0,n\}: with m set to zero, matches at most n times

\{m,\}: with n omitted, matches at least m times
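
The repetition operators can be compared on one small made-up word list. GNU grep is assumed, since \? and \+ are GNU extensions to BRE.

```shell
words() { printf 'ac\nabc\nabbc\nabbbbc\n'; }

star=$(words | grep -c 'ab*c')            # 0 or more b: all four lines
opt=$(words | grep -c 'ab\?c')            # 0 or 1 b: ac and abc
plus=$(words | grep -c 'ab\+c')           # 1 or more b: abc, abbc, abbbbc
range=$(words | grep -c 'ab\{2,3\}c')     # 2 to 3 b: abbc
at_least=$(words | grep -c 'ab\{2,\}c')   # 2 or more b: abbc and abbbbc

# Greedy matching: ".*" extends to the last "c" on the line.
greedy=$(printf 'a-c-c\n' | grep -o 'a.*c')

echo "$star $opt $plus $range $at_least $greedy"
```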

Position anchoring:

^: anchors the match at the beginning of the line

$: anchors the match at the end of the line

^$: matches empty lines, that is, lines containing only the newline

^[[:space:]]*$: matches blank lines, including lines that contain only whitespace

\< or \b: anchors at the beginning of a word

\> or \b: anchors at the end of a word

Grouping:

\(\): binds one or more characters together so that they are treated as a unit

The text matched by the pattern inside each pair of grouping parentheses is recorded by the regular expression engine in internal variables named \1, \2, \3, ...

\1: the text matched by the pattern between the first left parenthesis, counting from the left, and its matching right parenthesis

Back reference: refers to the text matched by the pattern in an earlier group, not to the pattern itself
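
The position anchors and back references described above can be tried out as follows. This is a sketch on invented sample text, assuming GNU grep for the word anchors.

```shell
# Made-up sample: a word, an empty line, a spaces-only line, two phrases.
text() { printf 'cat\n\n   \nconcat cats\nthe cat sat\n'; }

starts=$(text | grep -c '^cat')            # lines that begin with "cat"
ends=$(text | grep -c 'cat$')              # lines that end with "cat"
empty=$(text | grep -c '^$')               # empty lines only
blank=$(text | grep -c '^[[:space:]]*$')   # empty or whitespace-only lines
word=$(text | grep -c '\<cat\>')           # "cat" as a whole word

# Back reference: \1 must repeat the exact text captured by \(l..e\).
pairs() { printf 'he loves his lover\nshe likes her lover\n'; }
same=$(pairs | grep '\(l..e\).*\1')

echo "$starts $ends $empty $blank $word"
echo "$same"
```

Only "he loves his lover" is selected by the last pattern: the group captures "love", and the same four letters occur again in "lover". In the second line the captured text is "like" or "love", and neither is repeated.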


2. Extended regular expressions (ERE)

grep -E or egrep supports extended regular expressions

Number of matches:

*: matches the preceding character any number of times

?: 0 or 1 times

+: 1 or more times

{m}: exactly m times

{m,n}: at least m and at most n times

Position anchoring:

^: beginning of the line

$: end of the line

\<, \b: beginning of a word

\>, \b: end of a word

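Grouping:

(): groups are written with unescaped parentheses; in ERE, \( and \) match literal parentheses. Back references \1, \2, ... are still available in GNU grep -E

"Or" matching:

a|b: matches a or b

C|cat: matches C or cat

(C|c)at: matches Cat or cat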
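
The scope of the | operator, with and without grouping, can be sketched on a few invented words using grep -E:

```shell
lines() { printf 'C\ncat\nCat\nbat\n'; }

top_level=$(lines | grep -E -c 'C|cat')              # "C" or "cat": C, cat, Cat
grouped=$(lines | grep -E -c '(C|c)at')              # "Cat" or "cat": cat, Cat
counted=$(lines | grep -E -c '^[[:lower:]]{3}$')     # exactly three lowercase letters: cat, bat

echo "$top_level $grouped $counted"
```

Without the parentheses the alternation covers the whole pattern on each side, which is why 'C|cat' also selects the single letter C.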
