Regular Expressions

Source: Internet
Author: User
Tags: i18n


(Reposted)

Hello everyone:

 

Regular expressions are an old technology that is still widely used today. They matter especially in front-end web development, where most of the data being processed is strings, and front-end libraries are full of regular expressions of every kind. If you can't read them, working with that code is painful. Most modern editors also offer regular-expression search, so being able to write one makes everyday life easier.

 

Legend has it that regular expressions came to programming from research in neuroscience. To me they look like the talismans drawn by a traditional Chinese Taoist priest: utterly mysterious while you don't understand them, and a tool for subduing demons once you do.

This time, we start with the two basic concepts of regular expressions.

 

First of all, regular expressions are a text-processing technique: they help us search a text for specific content.

For example, in the Visual Studio text editor, you can directly use regular expressions for advanced search. The following exercises can be completed directly in the Visual Studio text editor.

 

Part 2

 

The first concept: metacharacters

 

When searching text, we must first describe what we are searching for. Notepad, for example, only supports writing out the search content in full, such as the word "hello". In a regular expression, each search target is described by a sequence of metacharacters. The word "hello", for example, consists of five characters that each stand for themselves and are written as-is. Such characters are generally called literals.

 

Describing the search target character by character is certainly the most precise way, but often we cannot write it out in full. Suppose we only remember the first two letters of hello. How do we search for the word?

We can use wildcards. Each wildcard stands for a single character. In a regular expression, the dot (.) stands for any character except a line break. If we know the word has five characters in total, we can write the pattern as he... with three dots. What if the character we want to match is itself a dot? Characters with a special meaning must be escaped with a backslash, so to match the literal text he... the regular expression must be he\.\.\. instead.
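
The difference between the wildcard dot and the escaped dot can be tried out in a few lines. This is a minimal sketch that assumes Python's re module (the article itself works in the Visual Studio editor); the sample text is made up for illustration.

```python
import re

# Made-up sample text for illustration.
text = "hello help he... he"

# An unescaped dot matches any single character except a line break,
# including a space.
any_three = re.findall(r"he...", text)

# An escaped dot matches only a literal "." character.
literal_dots = re.findall(r"he\.\.\.", text)

print(any_three)     # ['hello', 'help ', 'he...']
print(literal_dots)  # ['he...']
```

Note that the second result of the first search is "help " with a trailing space: the dot is happy to match a space as well.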

 

Because the dot (.) matches any character except a line break, he*** would also match. Our target consists of word characters, so we can use a more specific metacharacter to narrow it down: \w matches any letter, digit, or underscore. Note that one metacharacter matches only one character, so \w must be written three times, giving he\w\w\w. With this pattern, he*** no longer matches.
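
To see the effect of replacing the dot with \w, here is a small sketch, again assuming Python's re module and a made-up sample string.

```python
import re

# Made-up sample text for illustration.
text = "hello he*** he_42"

# The dot accepts anything, so he*** is matched too.
loose = re.findall(r"he...", text)

# \w accepts only letters, digits, and the underscore, so he*** is rejected.
strict = re.findall(r"he\w\w\w", text)

print(loose)   # ['hello', 'he***', 'he_42']
print(strict)  # ['hello', 'he_42']
```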

 

There are 7 frequently used metacharacters:

Code/Syntax    Description
.              Matches any character except a line break
\w             Matches a letter, digit, underscore, or Chinese character
\s             Matches any whitespace character
\d             Matches a digit
\W             Matches any character that is not a letter, digit, underscore, or Chinese character
\S             Matches any character that is not whitespace
\D             Matches any character that is not a digit

 

As you can see, these metacharacters are case-sensitive, and the upper-case form means the opposite of the lower-case one: the lower-case letter matches characters in the class, and the upper-case letter matches characters outside it. This makes them easy to remember.
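
The lower-case and upper-case pairs can be compared side by side. The sketch below assumes Python 3's re module with ordinary string patterns, where \w is Unicode-aware and therefore also matches Chinese characters; other engines may restrict \w to ASCII. The sample strings are made up.

```python
import re

# Made-up sample text for illustration.
text = "Room 42, 3rd floor"

digits = re.findall(r"\d", text)        # every digit
spaces = re.findall(r"\s", text)        # every whitespace character
non_word = re.findall(r"\W", text)      # everything that is not a word character
digits_only = re.sub(r"\D", "", text)   # delete every non-digit

# With Python 3 string patterns, \w also matches Chinese characters.
word_chars = re.findall(r"\w", "a_1中")

print(digits)       # ['4', '2', '3']
print(spaces)       # [' ', ' ', ' ']
print(non_word)     # [' ', ',', ' ', ' ']
print(digits_only)  # 423
print(word_chars)   # ['a', '_', '1', '中']
```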

 

Custom character range

 

However, he123 still matches. Can we narrow the pattern further, to English letters only? Of course, but there is no predefined metacharacter for that. We have to describe the allowed range ourselves.

Square brackets [] spell out the candidate characters: any one character listed inside the brackets may match. To limit a position to lower-case English letters, for example, we can list all of them: [abcdefghijklmnopqrstuvwxyz] matches any single character from a to z and never a digit.

The regular expression then becomes he[abcdefghijklmnopqrstuvwxyz][abcdefghijklmnopqrstuvwxyz][abcdefghijklmnopqrstuvwxyz]. Dizzy yet? I find it tedious too.

To save typing, you can use a hyphen inside [] to give a range. For example, [a-z] stands for all lower-case English letters from a to z, and the digits 0 to 9 can be written as [0-9]. Note that a range has a direction: it must run from the lower character to the higher one, so [9-0] does not work.

Now we can use the regular expression he[a-z][a-z][a-z] to search for a five-character target starting with he. That feels better.
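
A short sketch of the custom range in action, assuming Python's re module and a made-up sample string. It also checks what happens with a reversed range; in Python the pattern is rejected when it is compiled.

```python
import re

# Made-up sample text for illustration.
text = "hello he123 heart he_ab"

# \w still lets digits and underscores through.
with_w = re.findall(r"he\w\w\w", text)

# [a-z] accepts lower-case English letters only.
with_range = re.findall(r"he[a-z][a-z][a-z]", text)

# A range must run from low to high; Python raises re.error for [9-0].
try:
    re.compile(r"[9-0]")
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False

print(with_w)             # ['hello', 'he123', 'heart', 'he_ab']
print(with_range)         # ['hello', 'heart']
print(reversed_range_ok)  # False
```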

 

Boundary characters

 

What happens when the text contains heabcdefg? The pattern still matches heabc, but that is not a five-character word. So we also need to express that our five characters must not be followed by another ordinary character; what follows should be a separator such as a space, tab, or line break. Put differently, the match should end where a space, tab, or line break is encountered, but without including that separator in the match.

In a regular expression this is called a boundary. A boundary is a special position, such as the start or end of the string, or the start or end of a word. A position is not a character in the usual sense, so it has to be written in a special way.

 

There are four common positions:

Code/Syntax    Description
\b             Matches the start or end of a word
^              Matches the start of the string
$              Matches the end of the string
\B             Matches a position that is not the start or end of a word

 

\b marks the boundary of a word, that is, the position between a word character and a separator such as a comma, a period, or a space. We can rewrite the expression as \bhe[a-z][a-z][a-z]\b, which matches a five-character word that starts with "he".
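
The four positions can be tried on a short string. This is a sketch under the assumption of Python's re module; the sample text is made up.

```python
import re

# Made-up sample text for illustration.
text = "hello, heabcdefg and ashes"

# Without boundaries, the first five characters of heabcdefg match too.
unbounded = re.findall(r"he[a-z][a-z][a-z]", text)

# \b on both sides accepts whole five-character words only.
bounded = re.findall(r"\bhe[a-z][a-z][a-z]\b", text)

# \B is the opposite of \b: "he" here must not sit at the start of a word.
inside_word = re.findall(r"\Bhe", text)

# ^ and $ anchor the match to the start and end of the string.
starts_with_hello = re.search(r"^hello", text) is not None
ends_with_ashes = re.search(r"ashes$", text) is not None
starts_with_ashes = re.search(r"^ashes", text) is not None

print(unbounded)    # ['hello', 'heabc']
print(bounded)      # ['hello']
print(inside_word)  # ['he']  (the one inside "ashes")
print(starts_with_hello, ends_with_ashes, starts_with_ashes)  # True True False
```

Boundaries consume no characters: the comma after hello satisfies \b, but it is not part of the match.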

 

The second concept, which is also the last one, is the quantifier.

 

Our expression \bhe[a-z][a-z][a-z]\b has become rather complicated. What can we simplify? The obvious candidate is [a-z], which is written three times, and this word is only five characters long. Have you heard of i18n? The English word internationalization is so long that people abbreviate it as i18n, for the 18 letters between the first i and the last n. Writing a pattern for such a word one character at a time would be exhausting. Quantifiers solve this problem.

 

If writing the same metacharacter over and over is too tedious, you can use the asterisk (*), which repeats the metacharacter immediately before it n times, where n can be anything from 0 upward. The regular expression \bhe[a-z]*\b therefore matches any lower-case word that starts with "he" and has at least two characters: it matches the word he, and it matches healthy as well.
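
A quick check of the asterisk, assuming Python's re module and a made-up sample string.

```python
import re

# Made-up sample text for illustration.
text = "he healthy hello ahead"

# [a-z]* allows zero or more letters after "he", so "he" itself matches.
# "ahead" is skipped because its "he" does not start at a word boundary.
star = re.findall(r"\bhe[a-z]*\b", text)

print(star)  # ['he', 'healthy', 'hello']
```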

 

If you do not want to match the word "he" itself and want [a-z] to match at least one character, use the plus sign (+). Unlike the asterisk, its n runs from 1 upward.

If you want to match only zero times or once, use the question mark (?), whose n is either 0 or 1.
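
The plus sign and the question mark can be compared on the same input. Again a sketch that assumes Python's re module and a made-up sample string.

```python
import re

# Made-up sample text for illustration.
text = "he her hero"

# + requires at least one letter after "he", so "he" alone is excluded.
plus = re.findall(r"\bhe[a-z]+\b", text)

# ? allows at most one letter after "he", so "hero" is excluded.
optional = re.findall(r"\bhe[a-z]?\b", text)

print(plus)      # ['her', 'hero']
print(optional)  # ['he', 'her']
```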

 

But hold on. After all this, none of these is what we actually need. We need the repetition to happen three times, exactly three, no more and no less. For that there is {n}, where you set n directly. Now the pattern can be written as \bhe[a-z]{3}\b. Isn't that nice?

You can also specify a minimum number of repetitions with {n,}, or a range of n to m repetitions with {n,m}.
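
The three counted forms side by side, as a sketch assuming Python's re module; the sample text and the particular counts are chosen only for illustration.

```python
import re

# Made-up sample text for illustration.
text = "he her hero hello healthy"

# {3}: exactly three letters after "he".
exactly_three = re.findall(r"\bhe[a-z]{3}\b", text)

# {2,}: two or more letters after "he".
two_or_more = re.findall(r"\bhe[a-z]{2,}\b", text)

# {1,2}: between one and two letters after "he".
one_to_two = re.findall(r"\bhe[a-z]{1,2}\b", text)

print(exactly_three)  # ['hello']
print(two_or_more)    # ['hero', 'hello', 'healthy']
print(one_to_two)     # ['her', 'hero']
```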

 

Common quantifiers

Code/Syntax    Description
*              Repeat zero or more times
+              Repeat one or more times
?              Repeat zero times or once
{n}            Repeat exactly n times
{n,}           Repeat n or more times
{n,m}          Repeat n to m times

 

 

To sum up

This section covered what regular expressions are used for, the basic metacharacters, and quantifiers.

However, this is only the most basic usage; advanced usage will be introduced later. If you are in a hurry, you can first read the article titled "Regular Expression 30-minute getting started".

 

Developers can also install RegexBuddy, a tool that combines learning, creating, and testing regular expressions. It is a real power tool for programmers.
