Vi/vim Regular Expressions

Source: Internet
Author: User
Tags perl regular expression


Needless to say, regular expressions are used very widely in Vim. The two most common commands, / and :s, both depend on them. The following describes some of the trickier points of regular expressions in Vim.

About Magic

Vim has a magic option. It is set as follows:

:set magic      " turn magic on
:set nomagic    " turn magic off
:h magic        " view the help

Vim is, after all, an editor, and regular expressions contain a large number of metacharacters. If all of them were special without escaping (as in Perl), that would cause trouble for people who do not know regular expressions. For example, most people type /foo(1) to find the string foo(1), but read as a Perl-style regular expression, the pattern looks for foo1 instead.

For this reason Vim requires most regex metacharacters to be escaped with a backslash before they act as metacharacters. In the example above, if you really want a group, you write /foo\(1\). But for extremely common metacharacters such as . and *, adding a backslash every time is too much trouble. Tastes also differ: some people like to use regular expressions and some do not.

To solve this problem, Vim provides the magic option. Put simply, magic decides which metacharacters need a backslash and which do not:

magic (\m): every metacharacter except $ . * ^ needs a backslash.

nomagic (\M): every metacharacter except $ ^ needs a backslash.

The mode can also be switched temporarily inside a pattern with \m and \M: the part of the pattern after \m is interpreted as magic, and the part after \M as nomagic, regardless of the actual magic setting.

For example:

/\m.* # find any string

/\M.* # find the string ".*"

In addition, there are the more powerful \v and \V.

\v (very magic): no metacharacter needs a backslash.

\V (very nomagic): every metacharacter needs a backslash.

For example:

/\v(a.c){3}$ # find abcaccadc at end of line

/\m(a.c){3}$ # find (abc){3} at end of line

/\M(a.c){3}$ # find (a.c){3} at end of line

/\V(a.c){3}$ # find (a.c){3}$ anywhere
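As a rough analogy only (Python has no magic option), Python's re module always behaves much like \v, and re.escape() gives an effect similar to \V. The sketch below is not Vim code; the variable names are made up for illustration.

```python
import re

# Python's re syntax is close to Vim's \v ("very magic"):
# ( ) { } . $ are metacharacters unless they are escaped.
very_magic = re.compile(r"(a.c){3}$")
print(very_magic.search("xx abcaccadc") is not None)   # three a.c groups at end of line
print(very_magic.search("(a.c){3}$") is not None)      # the literal text does not match

# re.escape() makes every character literal, similar in spirit to \V.
very_nomagic = re.compile(re.escape("(a.c){3}$"))
print(very_nomagic.search("see (a.c){3}$ here") is not None)

# The /foo(1) pitfall: unescaped parentheses form a group,
# so the pattern looks for "foo1", not "foo(1)".
print(re.search(r"foo(1)", "foo1") is not None)
print(re.search(r"foo(1)", "foo(1)") is not None)
```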

Usage of () and [] in regular expressions:


1. (a\d){2}: the group a\d repeated exactly twice, for example a1a2

(abc)?: 0 or 1 occurrence of abc

(abc)+: 1 or more occurrences of abc

(abc)*: 0, 1, or more occurrences of abc


2. (abc|123): alternation, matches abc or 123


2.1 gr(a|e)y: matches gray or grey; equivalent to gr[ae]y

2.2 (doctor|Dr\.?): matches doctor, Dr, or Dr. (the literal dot may appear 0 or 1 times)

PS: (doctor|Dr.?) also matches doctor, Dr, and Dr., but here the unescaped . stands for any character, so the meaning is different.


3. Unexpected matches with alternation: alternation sometimes produces results you did not expect

When (a|ab) is used to match ab, only a is matched

With (ab|a), the whole of ab is matched, because the alternatives are tried from left to right
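The alternation behavior described above can be checked with a short Python sketch (Python's re tries alternatives left to right, like the engines discussed here). The variable names and sample words are illustrative only.

```python
import re

# Alternatives are tried left to right; the first one that succeeds wins.
short_first = re.match(r"(a|ab)", "ab").group(0)
long_first = re.match(r"(ab|a)", "ab").group(0)
print(short_first, long_first)

# gr(a|e)y and gr[ae]y accept the same two spellings.
words = ("gray", "grey", "gruy")
spellings = [w for w in words
             if re.fullmatch(r"gr(a|e)y", w) and re.fullmatch(r"gr[ae]y", w)]
print(spellings)

# An escaped dot is a literal "."; an unescaped dot matches any character.
samples = ("doctor", "Dr", "Dr.", "Drx")
escaped = [s for s in samples if re.fullmatch(r"(doctor|Dr\.?)", s)]
unescaped = [s for s in samples if re.fullmatch(r"(doctor|Dr.?)", s)]
print(escaped, unescaped)
```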


4. Capturing parentheses: in a regular expression, the text matched by the part of the pattern between parentheses is captured


4.1 When the pattern contains nested parentheses, groups are numbered by the position of their opening parenthesis, counting from left to right

([A-Za-z](\d{2}))((-)\d{2}): matching a22-33 gives the following groups:

group 1: a22
group 2: 22
group 3: -33
group 4: -
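A minimal Python check of the numbering rule, assuming the pattern is meant to be read with [A-Za-z] as the first character class:

```python
import re

# Groups are numbered by the position of their opening parenthesis.
m = re.fullmatch(r"([A-Za-z](\d{2}))((-)\d{2})", "a22-33")
groups = m.groups()
for number, text in enumerate(groups, start=1):
    print(number, text)
```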






5. In .NET and JavaScript, the text matched by the first group is referenced as "\1"

PS: + means "1 or more", ? means "0 or 1", * means "0 or more"

(boy)\1: matches boyboy

PS: (boy) matches "boy" and \1 is that same "boy" again, so the pattern can only match boyboy

(boy)(girl)\1\2: matches boygirlboygirl
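Python uses the same \1, \2 notation for backreferences, so the two patterns can be tried directly. This is a sketch with illustrative names.

```python
import re

doubled = re.compile(r"(boy)\1")          # "boy" followed by the same text again
paired = re.compile(r"(boy)(girl)\1\2")   # group 1, group 2, then both repeated

print(doubled.fullmatch("boyboy") is not None)
print(doubled.fullmatch("boygirl") is not None)
print(paired.fullmatch("boygirlboygirl") is not None)
```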



Next is [] (described using C# syntax; other languages differ in syntax, but the regex rules are the same)

1. Simple character groups


2. Range character group (range class, used with the hyphen "-")

To match the digits 0-9 you can write [0123456789], but a range is more concise: [0-9]

Lowercase English letters can be matched with [a-z], and uppercase English letters with [A-Z].

The key point is the hyphen "-", which should not be read as a minus sign. It means "from ... to ...", so [a-z] reads as from "a" to "z".

Some points to note

1. The hyphen (-) is a metacharacter only inside a character group (square brackets). Outside the brackets, as in a-z, it is an ordinary character.

2. Even inside a character group it is not always a metacharacter. When it is the first or last character, as in [-az] or [az-], it is an ordinary character.

In addition, many metacharacters become ordinary characters inside a character group, such as ( ) $ ? and, when it is not the first character, ^.

3. The endpoints of a range cannot be reversed: [0-9] is valid, [9-0] is not. A range is determined by character code (ASCII) values, with the smaller value first and the larger value last. For example, [0-9] covers code values 48~57, [a-z] covers 97~122, and [A-Z] covers 65~90.

4. Several ranges can be combined into one character group. Note that spaces are not allowed inside the group: some people like to add a space (for example between F and 1) to make it easier to read, but the space would then become one of the characters the group matches.
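The notes above can be verified with a small Python sketch. The helper is_valid_class and the sample class [a-f 1-9] are made up for illustration and are not from the original post.

```python
import re

# Code values decide the order of a range: the smaller value must come first.
bounds = {r: (ord(r[0]), ord(r[-1])) for r in ("0-9", "A-Z", "a-z")}
print(bounds)

def is_valid_class(body: str) -> bool:
    """Return True if [body] compiles as a character class."""
    try:
        re.compile("[" + body + "]")
        return True
    except re.error:
        return False

print(is_valid_class("0-9"), is_valid_class("9-0"))

# A hyphen that is first or last in the class is an ordinary character.
hyphen_literal = re.fullmatch(r"[-az]", "-") is not None

# A space written inside the class becomes one of the matched characters.
space_matches = re.fullmatch(r"[a-f 1-9]", " ") is not None
print(hyphen_literal, space_matches)
```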

--------------------------------------------------------------------------------------------------------------- ---------------------------

3. Exclusion character group (negated character class, used with the caret "^"): matches any character that is not listed, for example [^abc]


4. Character group operations (nested square brackets plus an operator)

Only some languages support this. Java, for example, supports combining && with nested [], but JavaScript does not



\s: a whitespace character

\s{3}: matches 3 whitespace characters

\s{1,3}: matches 1, 2, or 3 whitespace characters

(0-9): matches the literal text '0-9' (outside square brackets the hyphen is not a range)

[0-9]{1,3} vs [0-9]{1,4} vs [0-9]{1,2}: how do these differ? Only in the upper bound: at most 3, 4, or 2 digits are matched.

(a){1,3} vs (a){1,4} vs (a){1,2}: the same applies when the repeated item is a group.


{n}: repeat exactly n times

{m,n}: repeat at least m times and at most n times

{m,}: repeat at least m times. For example, a string made of the 10 digits 0-9 is matched by \d{9,}.
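The repeat counts can be tried in Python, which uses the same {n}, {m,n}, and {m,} notation. The sample strings are illustrative.

```python
import re

digits = "0123456789"   # ten digits

at_least_nine = re.fullmatch(r"\d{9,}", digits) is not None
exactly_three = re.fullmatch(r"\d{3}", digits) is not None
print(at_least_nine, exactly_three)

# A bounded repeat is greedy: it takes as many items as the upper bound allows.
taken = {upper: re.match(r"[0-9]{1,%d}" % upper, "12345").group(0)
         for upper in (2, 3, 4)}
print(taken)
```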

Thank you for the blog post of bloggers.

The default setting is magic, and Vim recommends that you keep it. When you have special needs, switch modes directly inside the pattern with \v, \m, \M, or \V.

The metacharacters used in the rest of this article all assume magic mode.


Vim's quantifiers are in no way inferior to Perl's. Here is a comparison of the two:

Vim         Perl      Meaning
*           *         0 or more (greedy)
\+          +         1 or more (greedy)
\? or \=    ?         0 or 1 (greedy); \? cannot be used in a ? command (backward search)
\{n,m}      {n,m}     n to m (greedy)
\{n,}       {n,}      at least n (greedy)
\{,m}       {,m}      at most m (greedy)
\{n}        {n}       exactly n
\{-n,m}     {n,m}?    n to m (non-greedy)
\{-}        *?        0 or more (non-greedy)
\{-1,}      +?        1 or more (non-greedy)
\{-,1}      ??        0 or 1 (non-greedy)
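To see the difference between the greedy and non-greedy rows, here is a small Python sketch using the Perl-column spellings (Python shares them). The sample text is made up.

```python
import re

text = "<b>one</b><b>two</b>"

# Greedy: .* takes as much as possible, so the match runs to the last </b>.
greedy = re.search(r"<b>.*</b>", text).group(0)

# Non-greedy: .*? takes as little as possible (Vim would write .\{-}).
lazy = re.search(r"<b>.*?</b>", text).group(0)

print(greedy)
print(lazy)
```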

Lookaround and Atomic Grouping

Vim also supports lookaround and atomic grouping, which are powerful features. For an explanation of lookaround, see the book "Mastering Regular Expressions".

Vim          Perl      Meaning
\@=          (?=       positive lookahead
\@!          (?!       negative lookahead
\@<=         (?<=      positive lookbehind
\@<!         (?<!      negative lookbehind
\@>          (?>       atomic grouping
\%(atom\)    (?:       non-capturing group
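The Perl-column forms of the lookaround operators also work in Python, so their meaning can be explored there. This is only a sketch; the sample text is made up.

```python
import re

text = "foobar bazbar"

# Positive lookbehind: "bar" only where "foo" comes immediately before it.
after_foo = [m.start() for m in re.finditer(r"(?<=foo)bar", text)]

# Negative lookbehind: "bar" where "foo" does NOT come immediately before it.
not_after_foo = [m.start() for m in re.finditer(r"(?<!foo)bar", text)]

# Positive lookahead: three word characters that are followed by "bar".
before_bar = re.findall(r"\w{3}(?=bar)", text)

# Negative lookahead: "foo" that is not followed by "bar".
foo_without_bar = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# Non-capturing group: (?:ab) groups without creating a numbered group.
only_group = re.fullmatch(r"(?:ab)+(c)", "ababc").groups()

print(after_foo, not_after_foo, before_bar, foo_without_bar, only_group)
```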

One difference from Perl is where the lookaround and atomic grouping operators are written. For example, to find bar immediately preceded by foo, Perl writes the pattern inside the lookaround parentheses, whereas Vim writes the pattern before the lookaround metacharacter.

# Perl's notation
/(?<=foo)bar/

# Vim's notation
/\(foo\)\@<=bar

Writing Vim Regular Expressions

Metacharacter    Description
.        matches any one character
[abc]    matches any one of the characters in the square brackets. You can use - to give a range of characters; for example, [a-z0-9] matches lowercase letters and digits.
[^abc]   a ^ at the start of the square brackets means that any character other than those listed is matched.
\d       matches a digit, equivalent to [0-9].
\D       matches any character other than a digit, equivalent to [^0-9].
\x       matches a hexadecimal digit, equivalent to [0-9A-Fa-f].
\X       matches any character other than a hexadecimal digit, equivalent to [^0-9A-Fa-f].
\w       matches a word character, equivalent to [0-9A-Za-z_].
\W       matches any character other than a word character, equivalent to [^0-9A-Za-z_].
\t       matches the <TAB> character.
\s       matches a whitespace character, equivalent to [ \t].
\S       matches a non-whitespace character, equivalent to [^ \t].
\a       matches any alphabetic character, equivalent to [a-zA-Z].
\l       matches a lowercase letter, equivalent to [a-z].
\L       matches any character other than a lowercase letter, equivalent to [^a-z].
\u       matches an uppercase letter, equivalent to [A-Z].
\U       matches any character other than an uppercase letter, equivalent to [^A-Z].
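Because several of these classes (\x, \a, \l, \u and their uppercase forms) do not exist in Python's re, one way to experiment with them is to translate each class into the bracket expression given in the table. The lookup table VIM_CLASS and the helper matches_vim_class below are hypothetical names used only for this sketch.

```python
import re

# Hypothetical lookup table: Vim class -> equivalent bracket expression,
# copied from the descriptions in the table above.
VIM_CLASS = {
    r"\d": "[0-9]",          r"\D": "[^0-9]",
    r"\x": "[0-9A-Fa-f]",    r"\X": "[^0-9A-Fa-f]",
    r"\w": "[0-9A-Za-z_]",   r"\W": "[^0-9A-Za-z_]",
    r"\s": r"[ \t]",         r"\S": r"[^ \t]",
    r"\a": "[a-zA-Z]",
    r"\l": "[a-z]",          r"\L": "[^a-z]",
    r"\u": "[A-Z]",          r"\U": "[^A-Z]",
}

def matches_vim_class(vim_class: str, char: str) -> bool:
    """Return True if the single character belongs to the given Vim class."""
    return re.fullmatch(VIM_CLASS[vim_class], char) is not None

print(matches_vim_class(r"\x", "F"), matches_vim_class(r"\x", "g"))
print(matches_vim_class(r"\U", "a"), matches_vim_class(r"\s", "\t"))
```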

Metacharacters that express quantity
Metacharacter    Description
*        matches 0 or more
\+       matches 1 or more (note the preceding \)
\?       matches 0 or 1 (note the preceding \)
\{n,m}   matches n to m (note the preceding \)
\{n}     matches exactly n (note the preceding \)
\{n,}    matches n or more (note the preceding \)
\{,m}    matches 0 to m (note the preceding \)
\_.      matches any character, including a line break
\{-}     the preceding item may occur 0 or more times, but as few times as possible while still letting the whole regular expression match
\=       matches an optional item
\_s      matches a whitespace character or a line break

Escaped character    Description
\*       matches the * character.
\.       matches the . character.
\\       matches the \ character.
\[       matches the [ character.

Symbols that represent a position
Metacharacter    Description
$        matches the end of a line
^        matches the beginning of a line
\<       matches the beginning of a word
\>       matches the end of a word
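Python has no \< and \>; its closest counterpart is \b, which matches at either edge of a word. The sketch below uses that approximation, with re.MULTILINE so that ^ and $ work per line as they do in Vim. The sample text is made up.

```python
import re

lines = "id = 1\nvalid = id2\nmy id"

# Vim's \<id\> corresponds roughly to \bid\b: "id" only as a whole word.
whole_words = re.findall(r"\bid\b", lines)

# With re.MULTILINE, ^ and $ match at the start and end of every line.
starts = re.findall(r"^\w+", lines, flags=re.MULTILINE)
ends = re.findall(r"\w+$", lines, flags=re.MULTILINE)

print(len(whole_words), starts, ends)
```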

Substitution Variables
By wrapping part of a regular expression in \( and \), you can later refer to the text matched between them with \1, \2, and so on.

Lazy Mode
\{-n,m}  like \{n,m}, but repeats as few times as possible
\{-}     matches the preceding item 0 or more times, as few as possible
\|       the "or" operator
\&       the "and" operator: matches the last branch only if all preceding branches also match at the same position
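Python has no direct \& operator. A common way to approximate it is a lookahead, under the assumption that a Vim pattern such as .*Peter\&.*Bob (match .*Bob only where .*Peter also matches at the same position) corresponds to (?=.*Peter).*Bob. This is a sketch of that approximation, not an exact equivalence.

```python
import re

# Approximation of Vim's  .*Peter\&.*Bob  using a lookahead.
both = re.compile(r"(?=.*Peter).*Bob")

# Vim's  Peter\|Bob  is plain alternation.
either = re.compile(r"Peter|Bob")

print(both.search("Bob and Peter") is not None)   # both names present
print(both.search("Bob only") is not None)        # Peter is missing
print(either.search("Bob only") is not None)      # one name is enough
```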

Expression Replacement
:s/pattern/\=expression
In the expression form, you can use submatch(1), submatch(2), and so on to refer to \1, \2, and so on, while submatch(0) refers to the whole match.
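Python offers a comparable feature: re.sub accepts a function as the replacement, and match.group(n) plays the role that submatch(n) plays in Vim. The function name double_number and the sample text are illustrative only.

```python
import re

def double_number(match: re.Match) -> str:
    # match.group(0) is the whole match, like submatch(0) in Vim.
    return str(int(match.group(0)) * 2)

result = re.sub(r"\d+", double_number, "3 apples, 10 pears")
print(result)
```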

What Is the Difference from Perl Regular Expressions?
Differences in metacharacters
Vim syntax    Perl syntax    Meaning
\+            +              1 or more
\?            ?              0 or 1
\{n,m}        {n,m}          n to m
\( and \)     ( and )        grouping

For example:
1. Remove trailing whitespace from every line: ":%s/\s\+$//". "%" applies the substitution to the whole file, "\s" stands for a whitespace character (space or tab), "\+" matches the preceding item one or more times (as many as possible), and "$" matches the end of the line (use "\$" for a literal "$" character). The replacement is empty. Because a line can match at most once, no special flag is needed. This one is still fairly simple.
2. Remove all blank lines: ":%s/\(\s*\n\)\+/\r/". This introduces several more items: "\(", "\)", "\n", "\r", and "*". "*" matches the preceding item (here "\s") zero or more times, as many as possible (use "\*" for a literal "*"); "\n" stands for a line break; "\r" stands for a carriage return; "\(" and "\)" group an expression so that it is treated as a single unit. The whole expression therefore replaces a run of consecutive line breaks (each possibly preceded by whitespace) with a single line break. The one peculiarity is that the pattern uses "\n", while the replacement must use "\r" rather than "\n". The reason is historical; if you are interested, see ":help NL-used-for-Nul".
3. Remove all "//" comments: ":%s!\s*//.*!!". The first thing to notice is that the delimiter has been changed to "!". The "/" character appears in the pattern, and with the default delimiter every "/" would have to be written as "\/", turning the command into ":%s/\s*\/\/.*//", which is harder to read. The command itself is simple; anyone who has used regular expressions knows that "." matches any character other than a line break.
4. Remove all "/* */" comments: ":%s!\s*/\*\_.\{-}\*/\s*! !g". This is slightly more complicated and uses several less common Vim regex features. "\_." matches any character, including a line break; "\{-}" means the preceding item may occur 0 or more times, but as few times as possible while still letting the whole regular expression match; the flag "g" allows several matches and replacements within one line. The replacement is a single space, so that code such as "int/* space not necessary around comments */main()" is still valid after the substitution.
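For comparison, the four substitutions can be approximated in Python. This is a sketch under two assumptions: Vim's \s is written as [ \t] (Python's \s would also match line breaks), and the text is a single string with "\n" line endings. The function names are made up for illustration.

```python
import re

def strip_trailing_whitespace(text: str) -> str:
    # Like :%s/\s\+$//  ($ matches at each line end with re.MULTILINE)
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)

def squeeze_blank_lines(text: str) -> str:
    # Like :%s/\(\s*\n\)\+/\r/  (a run of line breaks becomes one)
    return re.sub(r"([ \t]*\n)+", "\n", text)

def strip_line_comments(text: str) -> str:
    # Like :%s!\s*//.*!!  ("." does not match a line break by default)
    return re.sub(r"[ \t]*//.*", "", text)

def strip_block_comments(text: str) -> str:
    # Like :%s!\s*/\*\_.\{-}\*/\s*! !g  (re.DOTALL lets "." cross lines,
    # .*? is the non-greedy form, and the replacement is one space)
    return re.sub(r"[ \t]*/\*.*?\*/[ \t]*", " ", text, flags=re.DOTALL)

print(strip_block_comments("int/* space not necessary around comments */main()"))
```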

:g/^\s*$/d
Deletes blank lines (lines that are empty or contain only whitespace)

:s/\(\w\+\)\s\+\(\w\+\)/\2\t\1
Changes "data1 data2" to "data2 data1"

:%s/\(\w\+\), \(\w\+\)/\2 \1/
Changes "Doe, John" to "John Doe"

:%s/\<id\>/\=line(".")
Replaces the word id on each line with the line number

:%s/\(^\<\w\+\>\)/\=(line(".")-10).".".submatch(1)
Replaces the word at the beginning of each line with "(line number - 10).word"; for example, on line 11, word becomes 1.word
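A rough Python rendering of three of these commands, for readers who want to experiment outside Vim. The function names are hypothetical, \b stands in for \< and \>, and count=1 mirrors the fact that :s without the g flag replaces only the first match on each line.

```python
import re

def swap_names(text: str) -> str:
    # Like :%s/\(\w\+\), \(\w\+\)/\2 \1/
    return re.sub(r"(\w+), (\w+)", r"\2 \1", text, count=1)

def number_ids(text: str) -> str:
    # Like :%s/\<id\>/\=line(".")  (line numbers start at 1)
    out = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        out.append(re.sub(r"\bid\b", str(lineno), line, count=1))
    return "\n".join(out)

def prefix_first_word(text: str, offset: int = 10) -> str:
    # Like :%s/\(^\<\w\+\>\)/\=(line(".")-10).".".submatch(1)
    out = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        prefix = str(lineno - offset) + "."
        out.append(re.sub(r"^(\w+)", lambda m: prefix + m.group(1), line, count=1))
    return "\n".join(out)

print(swap_names("Doe, John"))
print(number_ids("id a\nb id"))
```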

