Linux Regular Expressions: Repeated Characters
The asterisk (*) metacharacter indicates that the regular expression before it may occur zero or more times. In other words, when it modifies a single character, that character may be present or absent, and if it is present there may be more than one of it. You can use the asterisk to match a word that may or may not appear in quotation marks (here □ stands for a single space):
□"*hypertext"*□
The word "hypertext" is matched whether or not it appears in quotation marks.
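As a minimal sketch of the pattern above, the following commands run it through grep. The three input lines are made up for illustration, and ordinary spaces are typed where the text shows □.

```shell
# Hypothetical input: the word appears quoted, unquoted, and not at all.
input='a "hypertext" link
a hypertext link
plain text'

# space, zero or more quotes, the word, zero or more quotes, space
matches=$(printf '%s\n' "$input" | grep ' "*hypertext"* ')
printf '%s\n' "$matches"
```

Only the first two input lines should be printed, since the third does not contain the word at all.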
In addition, if the character modified by the asterisk does exist, it may appear multiple times. For example, let's look at a series of numbers:
1
5
10
50
100
500
1000
5000
The regular expression [15]0* matches all lines, and the expression [15]00* matches all lines except the first two. The first 0 is a literal, but the second is modified by the asterisk, meaning it may or may not be present. A similar technique matches one or more spaces (rather than zero or more):
□□*
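The difference between the two number patterns, and the "one or more spaces" idiom, can be checked with a short sketch like the one below. The variable names and the three-line space test are assumptions added for illustration.

```shell
numbers='1
5
10
50
100
500
1000
5000'

# [15]0*  : a 1 or 5 followed by zero or more 0s, so every line matches.
all=$(printf '%s\n' "$numbers" | grep -c '[15]0*')

# [15]00* : a 1 or 5 followed by at least one 0, so "1" and "5" drop out.
tens=$(printf '%s\n' "$numbers" | grep '[15]00*')

# Two spaces before the asterisk: one required space, then any extra spaces.
spaced=$(printf '%s\n' 'ab' 'a b' 'a   b' | grep -c 'a  *b')

echo "$all"
printf '%s\n' "$tens"
echo "$spaced"
```

The first count covers all eight lines, the second pattern leaves out 1 and 5, and the space pattern rejects only the line with no space between the letters.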
When the asterisk follows the period metacharacter, the pair matches any number of any characters. This can be used to identify the span of characters between two fixed strings. If you want to match any string inside quotation marks, you can specify:
".*"
It matches the first and last quotation marks on the line and every character between them. The span matched by ".*" is always the longest possible. This may not seem important now, but it matters once you learn to replace the matched string.
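One way to see the extent of the match is to replace it with a marker. The sketch below uses sed for the substitution; the input line and the [MATCH] marker are invented for illustration.

```shell
# Hypothetical line containing two separate quoted strings.
line='He said "yes" and she said "no" today'

# Replace whatever ".*" matched so the extent of the match becomes visible.
result=$(printf '%s\n' "$line" | sed 's/".*"/[MATCH]/')
printf '%s\n' "$result"
# prints: He said [MATCH] today
```

Both quoted strings and the text between them are swallowed by a single match, because the pattern runs from the first quotation mark to the last one on the line.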
As another example, angle brackets are a common pair of symbols used to enclose formatting instructions in markup languages such as HTML. You can use the following command to print all lines that contain tags:
$ grep '<.*>' sample
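A small sketch of the same search, with made-up input piped in place of the sample file:

```shell
# Hypothetical lines standing in for the contents of "sample".
tagged=$(printf '%s\n' '<h1>Title</h1>' 'plain text' 'a <b>bold</b> word' '3 < 4' |
  grep '<.*>')
printf '%s\n' "$tagged"
```

The line "3 < 4" is not printed: it has an opening angle bracket but no closing one after it.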
When an asterisk modifies a character class, it matches any number of characters from that class. For example, consider a sample file containing the following five lines:
I can do it
I cannot do it
I can not do it
I can't do it
I cant do it
If we want to match the negative statements above but not the affirmative one, we can use the following regular expression:
can[ no']*t
The asterisk lets the characters in the class (a space, n, o, and an apostrophe) appear in any order and any number of times. The result is as follows:
$ grep "can[ no']*t" sample
I cannot do it
I can not do it
I can't do it
I cant do it
There are four hits and one miss (the affirmative statement). Note that if the regular expression instead tries to match any characters between the string "can" and "t", as in the following example:
can.*t
it will match all five lines.
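The two patterns can be compared side by side with a sketch like this one, which pipes the five sample lines into grep rather than reading them from a file. The variable names are assumptions.

```shell
sample="I can do it
I cannot do it
I can not do it
I can't do it
I cant do it"

# The class allows only a space, n, o, or an apostrophe between "can" and "t".
negatives=$(printf '%s\n' "$sample" | grep "can[ no']*t")

# .* allows anything in between, so the "t" of "it" completes the match.
anything=$(printf '%s\n' "$sample" | grep -c 'can.*t')

printf '%s\n' "$negatives"
echo "$anything"
```

The affirmative line slips through the second pattern because "can do it" contains "can", then arbitrary characters, then the final "t".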
The technical term for the ability to match "zero or more times" is "closure". The extended set of metacharacters used by egrep and awk provides several very useful variations on closure. The plus sign (+) matches one or more occurrences of the preceding regular expression. The earlier example of matching one or more spaces can be simplified to:
□+
The plus sign metacharacter can be read as "at least one" of the preceding character. In fact, it corresponds more closely to how many people expect "*" to behave.
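To check that the two spellings agree, the sketch below runs the basic form with grep and the extended form with grep -E, which accepts the same extended syntax as egrep. The three input lines are made up.

```shell
text='one two
one    two
onetwo'

# Basic syntax: one required space, then zero or more additional spaces.
star=$(printf '%s\n' "$text" | grep 'one  *two')

# Extended syntax: one or more spaces.
plus=$(printf '%s\n' "$text" | grep -E 'one +two')

printf '%s\n' "$plus"
```

Both commands select the same two lines; "onetwo" is rejected by each because at least one space is required.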
The question mark (?) matches zero or one occurrence of the preceding regular expression. For example, in an earlier example we used a regular expression to match "80286", "80386", and "80486". If you also want to match the string "8086", you can write the following regular expression for egrep or awk:
80[234]?86
It matches "80" followed by a 2, a 3, a 4, or no character at all, and then the string "86". Do not confuse the ? in a regular expression with the ? wildcard in the shell. The shell's ? stands for exactly one character, which makes it equivalent to "." in a regular expression.
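The sketch below tries the pattern on a few chip names and then contrasts it with the shell's ? wildcard in a case statement. The input list, including "80586" as a non-matching entry, is an assumption added for illustration.

```shell
chips='8086
80286
80386
80486
80586'

# Regex ?: the digit 2, 3, or 4 is optional between "80" and "86".
matched=$(printf '%s\n' "$chips" | grep -E '80[234]?86')
printf '%s\n' "$matched"

# Shell ?: exactly one character is required, so "8086" does not fit 80?86.
case 8086 in
  80?86) glob=yes ;;
  *)     glob=no ;;
esac
echo "$glob"
```

The regular expression accepts "8086" because the bracketed digit may be absent, while the shell pattern rejects it because its ? must consume one character.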
References: http://www.linuxawk.com/communication/436.html