1. Regular expressions
A regular expression is a specification: a pattern that constrains what format a string must have. For example, the requirement that a string must begin with a and end with t can be written as a regular expression, and the strings that meet the condition are the ones it matches.
2. Two sets of libraries
GNU/Linux has two sets of libraries for regular expression programming: the POSIX library, which comes with the system, and the PCRE library, which is Perl-compatible and more powerful. This article uses the Perl style.
3. First Experience
$ egrep "^a.*t$" /usr/share/dict/words prints the words from the words file that begin with a and end with t.
$ egrep "^a.*t$" /usr/share/dict/words | wc -l counts those words (the file holds one word per line).
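The two commands above can be tried without a dictionary file. The following is a minimal sketch, assuming a handful of made-up sample words in place of /usr/share/dict/words; grep -E is used because it is the equivalent, non-deprecated spelling of egrep.

```shell
# Hypothetical sample data standing in for /usr/share/dict/words.
words=$(printf '%s\n' about apple art at bat attempt)

# Keep only the lines that begin with a and end with t.
matches=$(printf '%s\n' "$words" | grep -E '^a.*t$')
printf '%s\n' "$matches"

# Count the matching lines; tr strips the padding some wc versions add.
count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
echo "count: $count"
```

Note that "at" matches too, because .* is allowed to match nothing at all.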
4. Character sets and words
"." matches any character except a newline; .at can match cat, #at, &at ...
[a-z]at: the first character is limited to lowercase letters only
[abc]at: the first character can only be a or b or c
[A-Za-z]at: the first character can be any English letter; [0-9] matches digits from 0 to 9.
Note: "[a-z]at" matches any line that contains [a-z]at, such as cat, a#$bat, and so on. To match only when those three characters form a complete word, such as cat, add the word constraints: \<[a-z]at\>
In Linux, "word" refers to strings separated by non-word characters on both sides.
Non-word characters are any characters except letters, numbers, and underscores.
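The difference between matching anywhere in a line and matching a whole word can be checked with a small experiment. This is a sketch under a few assumptions: the sample lines are invented, \< and \> are taken to be available as in GNU grep, and LC_ALL=C is set so that [a-z] means exactly the ASCII lowercase letters.

```shell
# Hypothetical sample lines.
sample() { printf '%s\n' 'cat' '#at' 'Bat' 'a#$bat!' 'concat' 'the cat sat'; }

# [a-z]at anywhere in the line: "concat" matches because it contains "cat".
anywhere=$(sample | LC_ALL=C grep -E -c '[a-z]at')

# With word constraints, "concat" drops out: its "cat" follows the word character n.
whole_word=$(sample | LC_ALL=C grep -E -c '\<[a-z]at\>')

echo "anywhere: $anywhere, whole word: $whole_word"
sample | LC_ALL=C grep -E '\<[a-z]at\>'
```

The line a#$bat! still matches with word constraints, since $ and ! are non-word characters and therefore delimit bat as a word.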
5. The concept of character class
$ egrep "[[:upper:]]t$" words matches lines that end with an uppercase letter followed by a lowercase t. There are many classes like [[:upper:]], such as [[:lower:]] for lowercase letters, and so on.
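A short sketch of POSIX character classes in use, assuming invented sample words. Note the double brackets: the class name [:upper:] is only valid inside a bracket expression, so the full form is [[:upper:]].

```shell
# Hypothetical sample words.
sample() { printf '%s\n' Apple apricot Zest ABt boat; }

# Lines whose first character is an uppercase letter.
upper_start=$(sample | LC_ALL=C grep -E -c '^[[:upper:]]')

# Lines that end with an uppercase letter followed by a lowercase t.
upper_then_t=$(sample | LC_ALL=C grep -E '[[:upper:]]t$')

# Lines made up of lowercase letters only.
lower_only=$(sample | LC_ALL=C grep -E -c '^[[:lower:]]+$')

echo "upper start: $upper_start, upper then t: $upper_then_t, lower only: $lower_only"
```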
6. Position Matching
"^" matches at the beginning of the line, "$" at the end.
Example: ^a[a-z]t$: the line begins with the letter a, ends with t, and has exactly one lowercase letter in between.
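To see what the anchors contribute, the same pattern can be run with and without them. A minimal sketch, assuming made-up sample words and LC_ALL=C so that [a-z] covers only ASCII lowercase letters:

```shell
# Hypothetical sample words.
sample() { printf '%s\n' art ant at a1t cart arts abut; }

# Anchored: the whole line must be a, one lowercase letter, t.
anchored=$(sample | LC_ALL=C grep -E -c '^a[a-z]t$')

# Unanchored: the three characters may appear anywhere in the line.
unanchored=$(sample | LC_ALL=C grep -E -c 'a[a-z]t')

echo "anchored: $anchored, unanchored: $unanchored"
sample | LC_ALL=C grep -E '^a[a-z]t$'
```

Without the anchors, cart and arts also match because each contains art.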
7. Character escapes \
This is consistent with the concept of escape characters in languages such as C and C++. For example, suppose you want to match a literal ".", but "." represents any character other than a newline; you can write \. to express the dot itself.
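A quick sketch of the effect of escaping the dot, assuming three invented sample lines. The pattern is placed in single quotes so that the shell passes the backslash to grep unchanged.

```shell
# Hypothetical sample lines.
sample() { printf '%s\n' 'a.b' 'axb' 'a-b'; }

# Unescaped: "." stands for any character, so all three lines match.
any_char=$(sample | grep -E -c 'a.b')

# Escaped: "\." stands for a literal dot only.
literal_dot=$(sample | grep -E -c 'a\.b')

# Alternative spelling: inside brackets the dot is also literal.
bracket_dot=$(sample | grep -E -c 'a[.]b')

echo "any: $any_char, escaped: $literal_dot, bracketed: $bracket_dot"
```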
8. Repetition
"*" means repeat any number of times (including zero)
"+" means repeat at least 1 time
"?" means repeat 0 or 1 times
"{3}" means repeat exactly 3 times; "{n,m}" means repeat at least n times and at most m times; "{n,}" means repeat at least n times, with no upper limit.
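The repetition operators can be compared side by side on one set of inputs. The sketch below assumes invented sample strings (c, then zero to four a's, then t) and a hypothetical helper named count that reports how many lines a pattern matches.

```shell
# Hypothetical sample: c, then 0 to 4 a's, then t.
sample() { printf '%s\n' ct cat caat caaat caaaat; }

# Hypothetical helper: number of sample lines matched by pattern $1.
count() { sample | grep -E -c "$1"; }

star=$(count '^ca*t$')        # zero or more a's
plus=$(count '^ca+t$')        # at least one a
opt=$(count '^ca?t$')         # zero or one a
exact=$(count '^ca{3}t$')     # exactly three a's
range=$(count '^ca{2,3}t$')   # two or three a's
open=$(count '^ca{2,}t$')     # two or more a's

echo "* $star  + $plus  ? $opt  {3} $exact  {2,3} $range  {2,} $open"
```

The anchors matter here: without ^ and $, a pattern such as a{3} would also match inside caaaat.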
9. Sub-expression
Also known as "grouping". Put plainly, a run of characters enclosed in parentheses () is treated as a single unit.
Example: "(my){2}t$" means a string that ends with t preceded by two occurrences of my. Here my is one unit, so mymyt and the like will be matched.
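Grouping changes what a repetition count applies to. The following sketch, using invented sample strings, contrasts a count applied to the group my with the same count applied to the single character y.

```shell
# Hypothetical sample strings.
sample() { printf '%s\n' mymyt myt myyt mymymyt; }

# {2} applies to the whole group: "my" twice, then t at the end of the line.
grouped=$(sample | grep -E '(my){2}t$')

# Without parentheses, {2} applies to y alone: m, then "yy", then t.
ungrouped=$(sample | grep -E 'my{2}t$')

printf 'grouped:\n%s\n' "$grouped"
printf 'ungrouped:\n%s\n' "$ungrouped"
```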
10. Negation
Negation matches the opposite of what the original set would match; the whole set is inverted.
[^y]: matches any character except y. [^aoeiu]: matches any character except a, o, e, i, u.
Note: ^[^y] means that the first character of the line is not y. Mind the difference between ^ at the beginning of the pattern (position) and ^ inside brackets (negation).
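The two roles of ^ are easy to confuse, so here is a small sketch with invented sample words that uses the character both ways.

```shell
# Hypothetical sample words.
sample() { printf '%s\n' yes no sky y; }

# [^y]: the line contains at least one character other than y.
has_non_y=$(sample | grep -E -c '[^y]')

# ^[^y]: the first ^ anchors to the line start, the second negates the set.
starts_non_y=$(sample | grep -E -c '^[^y]')

# Lines made up entirely of characters other than a, o, e, i, u.
no_vowels=$(sample | grep -E -c '^[^aoeiu]+$')

echo "has non-y: $has_non_y, starts with non-y: $starts_non_y, no vowels: $no_vowels"
```

A negated set still has to match one character, so ^[^y] does not match an empty line.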
11. Branch
^h.*t$: the line begins with h and ends with t, indicating "and".
^h|t$: the line begins with h, or ends with t, indicating "or".
Example: Jan(uary| |\.): matches January, or Jan followed by a space, or Jan followed by a period.
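A minimal sketch of branching, assuming invented sample lines. It first contrasts "and" with "or" on the same words, then tries the month pattern written without a space before the parenthesis.

```shell
# Hypothetical sample words.
words() { printf '%s\n' hat hello cat dog; }

# "and": begins with h AND ends with t.
both=$(words | grep -E -c '^h.*t$')

# "or": begins with h OR ends with t.
either=$(words | grep -E -c '^h|t$')

# Hypothetical sample dates for the month pattern.
dates() { printf '%s\n' 'January 5' 'Jan 5' 'Jan. 5' 'Jane' 'Jan5'; }
jan=$(dates | grep -E -c 'Jan(uary| |\.)')

echo "and: $both, or: $either, Jan forms: $jan"
```

The | operator has the lowest precedence, which is why ^h|t$ reads as (^h) or (t$); the parentheses in the month pattern limit the alternatives to the part after Jan.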
12. Backreference
The content captured by a subexpression (group) can be reused elsewhere in the regular expression: a backslash "\" followed by the number of the subexpression refers to what that group matched.
(\<.*\>).?( )*\1: the \1 refers to the preceding (\<.*\>), the first subexpression, which matches a word of any length.
Subexpressions are numbered from left to right in order of appearance: the first is 1, the second is 2 ...
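A common use of a backreference is spotting a word typed twice in a row. The sketch below uses a simplified pattern of my own rather than the exact one above, with invented sample lines. It assumes GNU grep, where grep -E accepts \1 along with \< and \> as extensions.

```shell
# Hypothetical sample lines.
sample() { printf '%s\n' 'this is is a test' 'no repeats here' 'the the end' 'is island'; }

# Group 1 captures a word; \1 requires the same text again after the spaces.
# The closing \> stops "is island" from counting as "is is".
pattern='\<([[:alpha:]]+) +\1\>'

count=$(sample | LC_ALL=C grep -E -c "$pattern")

# -o prints only the part of each line that matched.
doubled=$(sample | LC_ALL=C grep -E -o "$pattern")

echo "lines with a doubled word: $count"
printf '%s\n' "$doubled"
```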
......
Of course, regular expression matching is a brain-teasing subject. Usually we only write simple expressions, but specific applications call for more complex ones, and writing them is very error-prone, mainly because it is hard to consider every case.
You can also refer to the following article, which is very well written and well suited to getting started with simple applications: Regular expression 30-minute introductory tutorial
The regular expression of shell programming