Regular Expressions in Vim



Vim is an editor with powerful commands and flexible configuration, and combining them well yields impressive results. Regular expressions, an important tool for processing text and data, are similar: simple combinations of metacharacters can match endlessly varied text, and some tasks have no good solution without them. Let's look at how these two powerful tools work together.



This article is a translation of http://www.vimregex.com/, a fairly comprehensive introduction to Vim regular expressions.

2. Introduction

2.1 What is Vim?



Vim (Vi IMproved) is an improved version of the vi editor, which is ubiquitous on Unix systems. Vim was created by Bram Moolenaar. It is free to use, although you are welcome to donate if you like it.

Vim has its own website, www.vim.org, and several mailing lists, which together cover every aspect of Vim. Today Vim runs on all major operating systems and is even the default editor in some Linux distributions (Red Hat).

Vim has many features of a modern editor: syntax highlighting, a customizable user interface, and easy integration with a variety of IDEs, plus some more distinctive features of its own such as crash recovery, automatic command completion and session management.

Vim has a huge user base, with more than 10 million users on Linux, and the number is still growing.

2.2 About this tutorial

The only reason I am writing this tutorial is that I love regular expressions. Nothing is more satisfying than writing a regular expression that does exactly what you need, and I hope that is worth quoting.

Seriously, though, regular expressions are a tool for handling text and data that is embedded in many programming languages and tools, such as the famous Unix program grep, which searches the contents of files for a given pattern. You can think of regular expressions as a pattern-matching language, one that is surprisingly effective on tricky text problems.

2.3 Thanks



Thanks to Benji Fisher, Zdenek Sekera, Preben "Peppe" Guldberg, Steve Kirkendall, Shaul Karl (in alphabetical order) and all the people who gave me advice.



If you have any suggestions or ideas, please write to me at any time (olontir at yahoo dot com).


3. Substitute command

3.1 Search and replace

:range s[ubstitute]/pattern/string/cgiI
c   Confirm each substitution
g   Replace all occurrences in the line (without g, only the first match in each line is replaced. Pingao's note: not to be confused with %)
i   Ignore case
I   Do not ignore case

The part of a command name enclosed in [] is optional.

3.2 Range of operation, line addressing and marks



Before we talk about patterns, let's look at line addressing. Some commands accept a line range, which limits the lines the command operates on. A range is usually made up of line specifiers separated by a comma (,) or a semicolon (;). You can also mark the current position with the command mt, where t can be any letter, and use that mark in an address later.

Specifier     Description
number        An absolute line number
.             The current line
$             The last line of the file
%             The whole file, same as 1,$
't            The position of mark t
/pattern[/]   The next line where pattern matches
?pattern[?]   The previous line where pattern matches
\/            The next line where the most recent search pattern matches
\?            The previous line where the most recent search pattern matches
\&            The next line where the most recent substitute pattern matches

If no range is given, the command applies to the current line only.



Here are some examples:

10,20

- lines 10 through 20

/section 1/+,/section 2/-

- all lines between "section 1" and "section 2", excluding those two lines themselves. + means one line after the match and - means one line before it; both can be repeated.

:/section/+ y

- find the next line matching "section" and yank (copy) the line after it

:// normal p

- find the next line matching "section" again and put (paste) the yanked text on the line after it



TIP1: If pattern contains /, be sure to escape it with \, for example,
s/\/dir1\/dir2\/dir3\/file/dir4\/dir5\/file2/g

To avoid this escaping mess, Vim lets you choose a different delimiter; I like to use a colon (:)

TIP2: Put the following two key mappings in your vimrc file,
noremap ;; :%s:::g<Left><Left><Left>
noremap ;' :%s:::cg<Left><Left><Left><Left>

These two mappings save a lot of keystrokes: they drop you straight into the pattern field, where you type the pattern, then the replacement string, and press Enter. The second mapping adds the confirmation flag.


4. Pattern Description 

4.1 Anchors

Suppose you want to replace every vi with VIM. The obvious command is

s/vi/VIM/g

But if you try it, you will find that it replaces every vi, even where vi is part of another word, which is probably not what you want.

You might then try to get the desired effect by adding spaces on both sides of vi,

s: vi : VIM :g

But this misses any vi that is followed by punctuation or sits at the start or end of a line. The correct way is to use the word boundary anchors \< and \>,

s:\<vi\>:VIM:g
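The whole-word idea can also be checked outside Vim. The sketch below uses Python's re module, which is my choice for illustration and not part of the original tutorial; Python writes the word boundary as \b where Vim uses \< and \>. The sample sentence is made up.

```python
import re

text = "vi is fine, but navigating with vi. is not moving"

# A plain substitution also hits "vi" inside "navigating" and "moving",
# just like :s/vi/VIM/g in Vim.
naive = re.sub(r"vi", "VIM", text)

# \b is Python's word-boundary anchor (Vim: \< and \>), so only the
# stand-alone word is replaced, even when punctuation follows it.
whole_word = re.sub(r"\bvi\b", "VIM", text)

print(naive)
print(whole_word)
```

Unlike the version with literal spaces around vi, the boundary anchors match at the start of the line and before the period.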



The beginning and end of a line have their own anchors, ^ and $. To replace vi only where it is the first word on a line,

s:^vi\>:VIM:

To replace vi only when it is the only text on a line,

s:^vi$:VIM:

Now suppose you want to replace not only vi but also Vi and VI. There are several ways to do this. The simplest is the i flag, as in %s:vi:VIM:gi. Another is to define a character class: :%s:[Vv]i:VIM: replaces both vi and Vi.


4.2 Escaped characters or metacharacters

So far, all of our patterns have consisted of ordinary characters. The real power of regular expressions lies in metacharacters, characters that have a special meaning. Most of them are written with a leading backslash, as shown in the following table,

#    Matches                                #    Matches
.    any character except newline
\s   whitespace character                   \S   non-whitespace character
\d   digit                                  \D   non-digit
\x   hex digit                              \X   non-hex digit
\o   octal digit                            \O   non-octal digit
\h   head of word character (a-z, A-Z, _)   \H   non-head of word character
\p   printable character                    \P   like \p, but excluding digits
\w   word character                         \W   non-word character
\a   alphabetic character                   \A   non-alphabetic character
\l   lowercase character                    \L   non-lowercase character
\u   uppercase character                    \U   non-uppercase character


For example, if you want to match 09/01/2000, you can use the following regular expression,



\d\d/\d\d/\d\d\d\d



To match a six-letter word that begins with a capital letter,



\u\w\w\w\w\w
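For readers who want to try these two patterns outside Vim, here is a small sketch in Python (my choice of language, not the tutorial's). Python has \d and \w but no \u class, so the uppercase metacharacter is written as [A-Z], which is only an ASCII approximation. The sample strings are made up.

```python
import re

# Vim: \d\d/\d\d/\d\d\d\d
date_re = re.compile(r"\d\d/\d\d/\d\d\d\d")

# Vim: \u\w\w\w\w\w  (Python has no \u class, so [A-Z] stands in for it)
cap_word_re = re.compile(r"[A-Z]\w\w\w\w\w")

date_match = date_re.search("Released on 09/01/2000 in Europe")
word_match = cap_word_re.search("the Editor window")

print(date_match.group())
print(word_match.group())
```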



If you want to match a word of unknown length, or a very long word, writing out every \w is not practical. This is where quantifiers, described next, come in.


4.3 Quantifiers, greedy and non-greedy matching

A quantifier placed after part of a pattern specifies how many times that part may repeat.

Quantifier   Description
*            0 or more; .* matches anything, even an empty line
\+           1 or more
\=           0 or 1 (Pingao's note: equivalent to ? in other regex dialects)
\{n,m}       from n to m times
\{n}         exactly n times
\{,m}        from 0 to m times
\{n,}        at least n times

Both n and m must be positive integers



It is now easy to write a pattern that matches a capitalized word of any length: \u\w\+.

These quantifiers are all greedy: they match as many characters as possible. Sometimes this causes unexpected problems. A typical example is matching text enclosed in delimiters such as quotes or parentheses. Since we do not know what is inside the delimiters, we might try /".*"/.

But this pattern matches everything between the first and the last quotation mark on the line. In the line below, the match runs from the quote before $VIM to the quote after :version.

This file is normally "$VIM/.gvimrc". You can check this with ":version".

This problem can be solved with non-greedy (lazy) quantifiers.

Quantifier   Description
\{-}         0 or more, as few as possible
\{-n,m}      n to m times, as few as possible
\{-n,}       at least n times, as few as possible
\{-,m}       up to m times, as few as possible

Let's replace the * above with \{-}. Now ".\{-}" matches only the first quoted string, "$VIM/.gvimrc":

This file is normally "$VIM/.gvimrc". You can check this with ":version".
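The difference between the two quantifiers can be reproduced in Python, used here only as a convenient test bench (an assumption of this example, not something the tutorial uses). Vim's .* is written the same way in Python, and Vim's .\{-} corresponds to Python's .*?.

```python
import re

line = 'This file is normally "$VIM/.gvimrc". You can check this with ":version".'

# Greedy: runs from the first quote to the last quote on the line.
greedy = re.search(r'".*"', line).group()

# Non-greedy: stops at the first closing quote it can find.
lazy = re.search(r'".*?"', line).group()

print(greedy)
print(lazy)
```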



\{-} did not disappoint us. Now let's see what happens when you execute the following command,

:s:.\{-}:_:g

Before execution:

n and m are decimal numbers between

After execution:

_n_ _a_n_d_ _m_ _a_r_e_ _d_e_c_i_m_a_l_ _n_u_m_b_e_r_s_ _b_e_t_w_e_e_n_

"As few as possible" here means matching 0 characters, and such an empty match occurs at every position between characters. To explain this behavior, I quote Bram's own words,

Matching 0 characters is still a match, so it replaces 0 characters with a "_", then goes on to the next position, where it matches 0 characters again.

In most cases \{-} on its own is of little use; it behaves this way mainly to be consistent with *, which can also match 0 characters. Likewise, x\{-1,} is rather useless: it always matches exactly one x, so you could just write x. Something like x\{70} is more useful. The others, such as x\{-3,}, x\{-2,} and x\{-1,}, exist only for consistency with the greedy quantifiers.

- Bram

But what if you want to match only the second quoted string? Or change only part of a quoted string? For that we need grouping and backreferences, but first let's look at character ranges.


4.4 Character Range



Typical character ranges:

[012345] matches any one of the characters inside the brackets, and [0-5] is equivalent. Similarly, we can define a range for all lowercase letters [a-z], all letters [a-zA-Z], or digits plus letters [0-9a-zA-Z]. Depending on your locale, you can also add non-ASCII characters such as à, ö, ß to a range.

Note that a character range matches exactly one of its characters, so [0123] and 0123 are different. Order does not matter inside a range: [0123] and [0231] are the same, while 0123 and 0231 are distinct patterns. See what happens when you execute the following commands,



s:[65]:Dig:g

Before execution:

High 65 to 70. Southeast wind around 10

After execution:

High DigDig to 70. Southeast wind around 10

And then execute

s:65:Dig:g

Before execution:

High 65 to 70. Southeast wind around 10

After execution:

High Dig to 70. Southeast wind around 10
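The same contrast between a character range and a literal string can be reproduced in Python, used here purely as a test bench. This sketch assumes the forecast line reads "High 65 to 70", which is what the results above imply.

```python
import re

forecast = "High 65 to 70. Southeast wind around 10"

# [65] matches one character at a time, so "6" and "5" are each replaced.
by_class = re.sub(r"[65]", "Dig", forecast)

# 65 matches the two-character sequence as a whole.
by_literal = re.sub(r"65", "Dig", forecast)

print(by_class)
print(by_literal)
```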



Placing a caret (^) at the start of a character range negates it, which makes it easy to exclude characters you do not want to match. The following matches any character except uppercase letters,

/[^A-Z]/

We can now use a character range to rewrite the pattern that matches quoted text.

/"[^"]\+"/
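As a quick check of the negated range, the sketch below runs the equivalent Python pattern (Vim's \+ is written + in Python) over the sample line from the quantifier section. Python is only my testing vehicle here, and findall is used to show every match on the line.

```python
import re

line = 'This file is normally "$VIM/.gvimrc". You can check this with ":version".'

# Vim: "[^"]\+"   a quote, one or more non-quote characters, a quote
quoted = re.findall(r'"[^"]+"', line)

print(quoted)
```

Because the range cannot cross a quotation mark, each quoted string is matched separately, with no need for a non-greedy quantifier.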



Note that metacharacters inside [] lose their special meaning. If you want a range to contain a literal -, put the - first. The following pattern matches any digit or -,

/[-0-9]/

Likewise, ^ loses its special meaning if it is not the first character in the range.

Now consider a realistic example. Suppose a grammar checker wants to find every sentence that does not begin with an uppercase letter. The following pattern does this,

\.\s\+[a-z]

It matches a period, one or more whitespace characters, and then a lowercase letter. Now that we know how to find the error, let's see how to fix it. For that we need to remember what was matched so that we can recall it in the replacement, which is where backreferences come in.


4.5 Grouping and backreferences

You can group parts of a pattern with \( and \), and then refer to what each group matched as \1, \2 ... \9. A typical example is swapping the first two words of a line,

s:\(\w\+\)\(\s\+\)\(\w\+\):\3\2\1:

\1 holds the first word, \2 holds the one or more whitespace characters between the words, and \3 holds the second word. To find out which number belongs to which group, count the opening \( from left to right.
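The word swap translates almost directly to Python, where groups are written with plain parentheses. This is only an illustrative sketch in a language of my choosing; count=1 mirrors the missing g flag in the Vim command, and the sample line is made up.

```python
import re

line = "hello   world and more"

# Group 1: first word, group 2: the whitespace between, group 3: second word.
swapped = re.sub(r"(\w+)(\s+)(\w+)", r"\3\2\1", line, count=1)

print(swapped)
```

Putting the whitespace in its own group preserves the original spacing between the two words.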


#     Meaning                                       #     Meaning
&     the whole matched pattern                     \L    the following characters are made lowercase
\0    the whole matched pattern                     \U    the following characters are made uppercase
\1    the text matched by the first pair of \(\)    \E    end of \U and \L
\2    the text matched by the second pair of \(\)   \e    end of \U and \L
...   ...                                           \r    split the line in two at this point
\9    the text matched by the ninth pair of \(\)    \l    the next character is made lowercase
~     the previous substitute string                \u    the next character is made uppercase

Here is the full command for the grammar check problem above,

s:\([.!?]\)\s\+\([a-z]\):\1  \u\2:g

As a bonus, the one or more whitespace characters after the punctuation are replaced with exactly two spaces.
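A comparable fix can be sketched in Python. Python's replacement strings have no \u, so this example uses a replacement function instead; the function name fix_sentence_starts and the sample text are hypothetical, chosen for illustration.

```python
import re

def fix_sentence_starts(text):
    """Capitalize a lowercase letter that follows ., ! or ? and whitespace."""
    # Group 1 is the closing punctuation, group 2 the lowercase letter.
    return re.sub(
        r"([.!?])\s+([a-z])",
        lambda m: m.group(1) + "  " + m.group(2).upper(),
        text,
    )

fixed = fix_sentence_starts("It works. then it stops!   why? nobody knows.")
print(fixed)
```

As in the Vim command, whatever whitespace followed the punctuation is normalized to two spaces.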


4.6 Alternation

Alternation uses \| to combine several expressions, so that the whole pattern matches as soon as one of the expressions matches, and the match is whatever that expression matched. (Pingao's note: similar to the logical operator |)

\(Date:\|Subject:\|From:\)\(\s.*\)

The pattern above puts the header name and the header content of an e-mail message into \1 and \2. Note that alternation is not greedy: the alternatives are tried from left to right and the first one that matches wins, so the remaining alternatives are not tried. This means the order of the alternatives matters.
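The header pattern and the effect of ordering can be tried in Python, where \| is written | and groups use plain parentheses. Python's re also tries alternatives from left to right, which I assume mirrors the behavior described above; the sample header is made up.

```python
import re

# Vim: \(Date:\|Subject:\|From:\)\(\s.*\)
header_re = re.compile(r"(Date:|Subject:|From:)(\s.*)")

m = header_re.match("Subject: regex tutorial")
field, body = m.group(1), m.group(2)

# Order matters: the first alternative that matches is used,
# even if a later one would match more text.
first_wins = re.match(r"a|ab", "ab").group()
longer_first = re.match(r"ab|a", "ab").group()

print(field, body, first_wins, longer_first)
```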



TIP3: A quick mapping to put \(\) in your pattern,
cmap ;\ \(\)<Left><Left>


4.7 Regular expression operator precedence

As in arithmetic expressions, the operators of regular expressions have an order of precedence. The following table lists them from highest to lowest.

Precedence   Operator          Description
1            \( \)             grouping
2            \=, \+, *, \{n}   quantifiers
3            abc \t \. \w      sequence of characters and metacharacters
4            \|                alternation

5. Global command


5.1 Global Search and execution

I would like to introduce another powerful command with a wide range of uses,

:range g[lobal][!]/pattern/cmd
Within range, executes the Ex command cmd (default: :p[rint]) on every line where pattern matches. If ! is given, cmd is executed on the lines where pattern does not match.

The global command works in two passes: it first scans the lines of range and marks every line where pattern matches, then executes cmd on each marked line. The default range is the whole file.
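The two-pass behavior can be sketched in a few lines of Python. This is only a rough model of the idea, not how Vim implements it: the function name global_command and its cmd callback (which returns the new line, or None to delete the line) are hypothetical, and real Ex commands can do far more than edit a single line.

```python
import re

def global_command(lines, pattern, cmd, invert=False):
    """Apply cmd to each line where pattern matches (or does not, if invert)."""
    regex = re.compile(pattern)
    # Pass 1: mark the lines the command should run on.
    marked = {
        i for i, line in enumerate(lines)
        if bool(regex.search(line)) != invert
    }
    # Pass 2: run cmd on each marked line and keep the others untouched.
    result = []
    for i, line in enumerate(lines):
        if i not in marked:
            result.append(line)
            continue
        new_line = cmd(line)
        if new_line is not None:
            result.append(new_line)
    return result

lines = ["alpha", "", "beta", "", ""]
no_blanks = global_command(lines, r"^$", lambda line: None)     # like :g/^$/d
shouted = global_command(lines, r"^$", str.upper, invert=True)  # like :g!/^$/ with an uppercasing command
print(no_blanks, shouted)
```

Marking first and acting second means the set of affected lines is fixed before any of them change, so an edit made by cmd cannot cause a line to be picked up or skipped later.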



Note: Ex commands are all the commands you type on the Vim command line, such as

:s[ubstitute], :co[py], :d[elete], :w[rite]

Non-Ex commands (normal mode commands) can also be executed, via

:norm[al] non-ex command

5.2 Examples

:g/^$/d

- delete all empty lines in the file

:g/^$/,/./-j

- reduce each run of blank lines to a single blank line

:10,20g/^/mo 9

- reverse the order of lines 10 through 20

Here is an example from Walter Zintz's vi tutorial, slightly modified,

:'a,'b g/^error/ . w >> errors.txt

- find the lines starting with error between marks a and b, and append them to errors.txt. Note: do not leave out the . (current line) before w, otherwise the whole file is appended to errors.txt.

You can use | as a separator to execute several commands. If you need a literal | in an argument, escape it with \. Another example from Zintz,

:g/^error:/ copy $ | s/error/copy of the error/

- copy every error line to the end of the file, then replace error with "copy of the error" in the copy. The s command has no address, so it applies to the current line, which after the copy is the new last line.

:g/^error:/ s/error/copy of the error/ | copy $

- the same operations in the opposite order: substitute first, then copy.

6. More examples 

6.1 Tips



(1) Contributed by Antonio Colombo

Remove the trailing whitespace from a line,

s:\s*$::   or   s:\s\+$::
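The second form of this substitution, which only touches lines that actually end in whitespace, can be tried in Python on a few made-up lines. Python is used here only as a convenient way to check the pattern.

```python
import re

lines = ["keep me   ", "tabs too\t\t", "already clean"]

# Vim: s:\s\+$::   one or more whitespace characters at the end of the line
stripped = [re.sub(r"\s+$", "", line) for line in lines]

print(stripped)
```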

6.2 Create an outline



This example requires you to have a bit of HTML background and we need to separate the headings and subheadings in the


(1) First, we add a named anchor to each heading,


: s:\ (




Description






(2) Next, copy the title to a place,






:%g/




The above command will copy the







First, to link the entries of the table to their respective locations, we replace name=" with href="#,

s:name=":href="#:






Second, to make the h1 and h2 entries look different, we define two CSS classes, "Majorhead" and "Minorhead",





g/





We no longer need




S:




Replace




S:/H[21]:BR:






Look at the file now,

<a class="Majorhead" name="Anchor1">Heading1></a><br>
<a class="Minorhead" name="Anchor2">Heading2></a><br>

(Pingao's note: I think the file should look like this at this point)

<a class="Majorhead" href="#">Heading1></a><br>
<a class="Minorhead" href="#">Heading2></a><br>



6.3 Working with tables

In many cases you need to process text laid out as a table. Take the following text, for example,

Asia    America  Africa   Europe
Africa  Europe   Europe   Africa
Europe  Asia     Europe   Europe

Suppose you want to replace "Europe" with "Asia" in the third column,

:%s:^\(\(\w\+\s\+\)\{2}\)Europe:\1Asia:

Asia    America  Africa   Europe
Africa  Europe   Asia   Africa
Europe  Asia     Asia   Europe

To swap the first and the last columns of the original table,

:%s:\(\w\+\)\(.*\s\+\)\(\w\+\)$:\3\2\1:

Europe    America  Africa   Asia
Africa  Europe   Europe   Africa
Europe  Asia     Europe   Europe
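Both table edits can be checked in Python against the three-row sample table. This is a sketch in a language of my choosing; the patterns are the Vim ones with the backslashes dropped from the grouping and quantifier operators. The ^ anchor in the first pattern ensures that only the third column can match.

```python
import re

table = [
    "Asia    America  Africa   Europe",
    "Africa  Europe   Europe   Africa",
    "Europe  Asia     Europe   Europe",
]

# Replace "Europe" with "Asia" in the third column only:
# group 1 captures the first two columns with their spacing.
third_col = [re.sub(r"^((\w+\s+){2})Europe", r"\1Asia", row) for row in table]

# Swap the first and last columns: group 2 is everything in between.
swapped = [re.sub(r"(\w+)(.*\s+)(\w+)$", r"\3\2\1", row) for row in table]

for row in third_col + swapped:
    print(row)
```

Since the replacement words differ in length from the originals, the columns no longer line up perfectly afterwards; the spacing captured in the groups is kept as it was.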






To be continued ...


7. Features of regular expressions in other languages






Now let's compare Vim's regular expressions with those of other languages, Perl in particular. You cannot talk about regular expressions without mentioning Perl.

The main differences between Perl and Vim (compiled with the help of Steve Kirkendall) are:

(1) Most metacharacters in Perl do not need a backslash. Personally, I think the fewer backslashes there are, the more readable the regular expression is.
(2) In Perl, appending ? to a quantifier turns it from greedy into non-greedy; for example, *? is the non-greedy version of *.
(3) Perl's regular expressions support a variety of unusual options.
(4) Perl's regular expressions can contain variables, which are replaced with their values; this is called "variable interpolation".

8. Links






In normal mode, type ":help pattern" to read the chapters on regular expressions and searching in the Vim help.

There are two good books on the market that cover Vim's regular expressions: "Learning the vi Editor" by Linda Lamb and Arnold Robbins, and "Vi IMproved - Vim" by Steve Oualline.

Jeffrey Friedl's "Mastering Regular Expressions" is the authoritative guide to regular expressions. It mainly covers Perl regular expressions and is published by O'Reilly, who offer a chapter online for free.







