2017-12-9 Linux Basics (16): Text Processing Tools

Last Update:2017-12-11 Source: Internet

Author: User



The previous chapter covered the basics of Bash programming: the types of programming languages (procedural and object-oriented), how to write and run a first script, and the bash configuration files, namely the profile class, which is read by login shells, and the bashrc class, which is read by non-login shells. We also looked at the format of a shell script and how to edit and format its description header. In this chapter, let's talk about the Linux text processing tools.

First, the Three Musketeers of Linux text processing

The three main text processing tools in Linux, often called the Three Musketeers, are grep, sed and awk. grep is a text filtering tool that filters by pattern; sed is a stream editor, so it is also a text editing tool; and awk, implemented on Linux as gawk, is a text report generator that can format text and output it in a neatly arranged form.
All three tools use regular expressions. A regular expression is a pattern written with special characters and ordinary text characters; some of those characters do not stand for their literal meaning but express control or wildcard functions. Regular expressions fall into two categories, basic regular expressions and extended regular expressions. They differ in their metacharacters, which are the characters used for matching and control. To summarize:

    Regular expression: REGEXP
        A pattern written with special characters and text characters, some of which do not represent their literal meaning but express control or wildcard functions;
        Divided into two categories:
            Basic regular expression: BRE
            Extended regular expression: ERE
        Metacharacters: used for matching and control functions;
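To make the BRE/ERE difference concrete, here is a minimal sketch, assuming GNU grep. The three sample lines and the variable names are made up for illustration. It counts how many lines match the "one or more" quantifier when it is written in each style.

```shell
# Hypothetical sample input: three short lines.
sample='ab
aab
b'

# BRE: a bare "+" is an ordinary character, so 'a+' looks for a literal "a+".
bre_literal=$(printf '%s\n' "$sample" | grep -c 'a+' || true)

# BRE: the backslash form "\+" means "one or more" (a GNU extension).
bre_escaped=$(printf '%s\n' "$sample" | grep -c 'a\+' || true)

# ERE (-E): "+" is a metacharacter without any backslash.
ere_plain=$(printf '%s\n' "$sample" | grep -E -c 'a+' || true)

printf '%s\n' "BRE bare plus: $bre_literal" \
              "BRE escaped plus: $bre_escaped" \
              "ERE bare plus: $ere_plain"
```

The same quantifier is therefore spelled differently depending on which regular expression flavor the command is using.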

Second, the grep tool

In this chapter we focus on grep, the first of the three text processing tools. grep searches line by line: it checks each line against the pattern the user specifies, and if a line matches, that line is printed to standard output. The pattern is a filter condition written with regular expression metacharacters and text characters. For these regular expressions to be recognized, the command relies on a regular expression engine. grep, sed and awk each have their own engine, so the supported metacharacters differ depending on the engine.

OK, now let's introduce the grep command, which has the following format:

    grep [OPTIONS] PATTERN [FILE...]
    grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]

    Function: a text search tool that checks the target text line by line against the user-specified "pattern (filter condition)" and prints the matching lines;
    Pattern: the filter condition written with regular expression metacharacters and text characters;

For example, we find the line in the /etc/passwd file that belongs to the user user1:

    # grep "user1" /etc/passwd
    user1:x:1004:1004::/home/user1:/bin/bash

This is the most basic usage. The quotation marks can be omitted, since the pattern is just a plain string. If the match is shown highlighted in the output, the reason is that CentOS 7 defines an alias for grep; in other words, what runs is not the original grep command but an alias of it. Note that CentOS 6 and Debian do not define this alias.
grep supports the following common options:

    --color=auto: highlight the matched text;
    -i, --ignore-case: ignore the case of characters;
    -o, --only-matching: display only the matched string itself;
    -v, --invert-match: display the lines that the pattern does not match;
    -E, --extended-regexp: support extended regular expression metacharacters;
    -q, --quiet, --silent: silent mode, that is, do not output anything;
    -A NUM, --after-context=NUM: also display the NUM lines after each matching line;
    -B NUM, --before-context=NUM: also display the NUM lines before each matching line;
    -C NUM, -NUM, --context=NUM: also display the NUM lines before and after each matching line;
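The options above can be tried on a small file. The following is a sketch under assumptions: the three passwd-like lines and all variable names are invented for the demonstration, and GNU grep is assumed for -A.

```shell
# Hypothetical sample data standing in for a real file.
tmp=$(mktemp)
printf '%s\n' 'root:x:0:0' 'User1:x:1004:1004' 'daemon:x:2:2' > "$tmp"

# -i: ignore case, so "user1" also matches "User1".
with_i=$(grep -i 'user1' "$tmp")

# -o: print only the matched text, not the whole line.
only_match=$(grep -o -i 'user1' "$tmp")

# -v: print the lines that do NOT match.
inverted=$(grep -v -i 'user1' "$tmp")

# -q: print nothing; only the exit status says whether a match exists.
if grep -q 'root' "$tmp"; then quiet_status=found; else quiet_status=missing; fi

# -A 1: print each matching line plus the one line after it.
after_ctx=$(grep -A 1 'root' "$tmp")

printf '%s\n' "-i: $with_i" "-o: $only_match" "-q: $quiet_status"
printf '%s\n' "-v:" "$inverted" "-A 1:" "$after_ctx"
rm -f "$tmp"
```

Because -q produces no output, it is mostly useful in conditions such as the if statement shown here.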

The metacharacters of basic regular expressions fall into the following categories: character matching, number of matches, position anchoring, and grouping with references. We now list them in turn.

    Basic regular expression metacharacters:
        Character matching:
            . : matches any single character;
            []: matches any single character within the specified range;
            [^]: matches any single character outside the specified range;
                [:digit:], [:lower:], [:upper:], [:alpha:], [:alnum:], [:punct:], [:space:]
        Number of matches: placed after a character to limit how many times that character may occur; works in greedy mode by default;
            *: matches the preceding character any number of times: 0, 1 or more;
            .*: matches any characters of any length;
            \?: matches the preceding character 0 or 1 times, that is, the preceding character is optional;
            \+: matches the preceding character 1 or more times, that is, the preceding character must occur at least once;
            \{m\}: matches the preceding character exactly m times;
            \{m,n\}: matches the preceding character at least m times and at most n times;
                \{0,n\}: at most n times;
                \{m,\}: at least m times;
        Position anchoring:
            ^: start-of-line anchor, used at the leftmost side of the pattern;
            $: end-of-line anchor, used at the rightmost side of the pattern;
            ^PATTERN$: the PATTERN must match the whole line;
                ^$: an empty line;
                ^[[:space:]]*$: an empty line or a line containing only whitespace characters;
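The metacharacters listed above can be checked against a handful of lines. This is a minimal sketch, assuming GNU grep (\? and \+ are GNU extensions to basic regular expressions). The sample lines and the helper function count are hypothetical.

```shell
# Hypothetical sample lines; the fourth is empty and the fifth holds only spaces.
sample=$(mktemp)
printf '%s\n' 'ac' 'abc' 'abbbc' '' '   ' 'id 42' > "$sample"

# count PATTERN: number of sample lines matched by a basic regular expression.
count() { grep -c -- "$1" "$sample" || true; }

star=$(count 'ab*c')              # b zero or more times: ac, abc, abbbc
optional=$(count 'ab\?c')         # b zero or one time: ac, abc
plus=$(count 'ab\+c')             # b one or more times: abc, abbbc
interval=$(count 'ab\{2,3\}c')    # b two to three times: abbbc
digits=$(count '[[:digit:]]')     # lines containing a digit: id 42
empty=$(count '^$')               # truly empty lines only
blank=$(count '^[[:space:]]*$')   # empty lines and whitespace-only lines
whole=$(count '^abc$')            # the pattern must cover the entire line

echo "$star $optional $plus $interval $digits $empty $blank $whole"
rm -f "$sample"
```

Note the difference between the last three patterns: ^$ matches only the empty line, while ^[[:space:]]*$ also matches the line made of spaces.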

Anchoring is not limited to line positions; we can also anchor at word positions. What counts as a word, and which basic regular expression metacharacters match word boundaries, is introduced below:

    Word: a continuous run of non-special characters (a string) is called a word;
    \< or \b: start-of-word anchor, used at the left side of the word pattern;
    \> or \b: end-of-word anchor, used at the right side of the word pattern;
    \<PATTERN\>: matches a complete word;
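A short sketch of the word anchors, assuming GNU grep. The four sample lines and the variable names are invented so that "root" appears at the start of a word, at the end of a word, and as a whole word.

```shell
# Hypothetical sample text.
text='root user
chroot jail
rooted tree
the root'

# No anchors: "root" is found anywhere, even inside longer words.
anywhere=$(printf '%s\n' "$text" | grep -c 'root' || true)

# \< anchors the start of a word: "chroot" no longer matches.
word_start=$(printf '%s\n' "$text" | grep -c '\<root' || true)

# \> anchors the end of a word: "rooted" no longer matches.
word_end=$(printf '%s\n' "$text" | grep -c 'root\>' || true)

# Both anchors: only "root" as a complete word matches.
whole_word=$(printf '%s\n' "$text" | grep -c '\<root\>' || true)

echo "anywhere=$anywhere start=$word_start end=$word_end whole=$whole_word"
```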

Now that we understand the regular expressions above, the following exercises can serve as practice:
1. Display the lines in the /etc/passwd file that do not end in /bin/bash;

    # grep -v "/bin/bash$" /etc/passwd

2. Find the two-digit or three-digit numbers in the /etc/passwd file;

    # grep "\<[[:digit:]]\{2,3\}\>" /etc/passwd

3. In the /etc/rc.d/rc.sysinit or /etc/grub2.cfg file, find the lines that begin with at least one whitespace character followed by a non-whitespace character;
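No command is given for this exercise, so the following is only one possible answer, sketched under assumptions. Since /etc/grub2.cfg may not exist on every system, the sketch runs against a made-up stand-in file; the variable names are hypothetical, and \+ assumes GNU grep.

```shell
# Hypothetical stand-in for /etc/grub2.cfg.
cfg=$(mktemp)
printf '%s\n' 'menuentry "Linux" {' '    linux /vmlinuz' '    ' '}' > "$cfg"

# ^[[:space:]]\+ : at least one whitespace character at the start of the line
# [^[:space:]]   : followed by a character that is not whitespace
indented=$(grep '^[[:space:]]\+[^[:space:]]' "$cfg")

printf '%s\n' "$indented"
rm -f "$cfg"
```

On a real system the same pattern would be pointed at /etc/grub2.cfg instead of the temporary file. The whitespace-only line is not printed, because nothing non-blank follows its leading whitespace.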

4. In the output of the "netstat -tan" command, find the lines in which LISTEN is followed by 0, 1 or more whitespace characters up to the end of the line;

    # netstat -tan | grep "LISTEN[[:space:]]*$"

Next come grouping and references. Grouping simply binds one or more characters together so that they are treated as a whole rather than as single characters. For example, to match x and y together as a unit, enclose them in parentheses. Note that in basic regular expressions the parentheses must be escaped to take on this special grouping role: \(\), as in \(xy\).
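The effect of grouping on a quantifier can be sketched as follows. The sample lines and variable names are hypothetical. The interval \{1,\} ("at least once") is used as the quantifier.

```shell
# Hypothetical sample lines.
text='xy
xyxy
xyy
ab'

# Without grouping, the quantifier applies only to "y": x followed by one or more y.
ungrouped=$(printf '%s\n' "$text" | grep '^xy\{1,\}$')

# With grouping, the quantifier applies to "xy" as a unit: xy repeated one or more times.
grouped=$(printf '%s\n' "$text" | grep '^\(xy\)\{1,\}$')

printf '%s\n' "ungrouped:" "$ungrouped" "grouped:" "$grouped"
```

The ungrouped pattern accepts xy and xyy, while the grouped pattern accepts xy and xyxy.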
Once a group exists, it can be referenced: the content matched by the pattern inside the group parentheses is automatically recorded by the regular expression engine in internal variables. These variables are:

    \1: the characters matched by the pattern between the first opening parenthesis, counting from the left, and its matching closing parenthesis (that is, what the first group matched);
    \2: the characters matched by the pattern between the second opening parenthesis, counting from the left, and its matching closing parenthesis;
    \3: the characters matched by the pattern between the third opening parenthesis, counting from the left, and its matching closing parenthesis;
    ...

Let's take a look at an example:

    He likes his lover.
    He loves his lover.
    She likes her liker.
    She lovers her liker.
    # grep "\(l..e\).*\1" lover.txt

This is a back reference: it refers to the characters matched by the pattern in an earlier group. Note that even if the grouping itself is not otherwise needed, the parentheses must be added whenever a reference is required; otherwise there is no way to know which part of the match is being referred to. Parentheses that are never referenced serve only as grouping, and the content they save simply goes unreferenced.
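To see which lines the back reference selects, the example can be rerun in a scratch directory. This is a sketch; the temporary directory and the variable names are assumptions, while the four lines and the pattern are the ones from the example above.

```shell
# Recreate the four sample lines in a temporary lover.txt.
dir=$(mktemp -d)
printf '%s\n' 'He likes his lover.' 'He loves his lover.' \
              'She likes her liker.' 'She lovers her liker.' > "$dir/lover.txt"

# \(l..e\) captures a four-character string such as "like" or "love";
# \1 then requires that exact same string to appear again later in the line.
matches=$(grep '\(l..e\).*\1' "$dir/lover.txt")

printf '%s\n' "$matches"
rm -rf "$dir"
```

Only "He loves his lover." and "She likes her liker." are printed. In the other two lines the group captures "like" or "love", but the same four characters never occur a second time.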


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
