Regular Expression Basics

Source: Internet
Author: User

Two function libraries for regular expressions are in common use:

1. PCRE (Perl Compatible Regular Expressions), a regular expression library compatible with the Perl language.

2. POSIX regular expressions.

Syntax Rules

One Atoms

An atom is the most basic unit of a regular expression, and every pattern contains at least one. Atoms are all printable and non-printable characters that are not explicitly used as metacharacters.

1) Ordinary characters as atoms

These include uppercase and lowercase letters and all digits.

2) Special characters and metacharacters as atoms

A character that has a special meaning can be used as an ordinary atom by escaping it with a backslash (\).
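As a small illustration of escaping, the sketch below contrasts an unescaped and an escaped period. It uses Python's re module as a stand-in for a PCRE-style engine, which is an assumption, and the sample strings are made up.

```python
import re

# Unescaped "." is a metacharacter: it matches any character except a line break.
unescaped = re.findall(r"a.c", "abc a.c")

# Escaped "\." is an ordinary atom: it matches only a literal period.
escaped = re.findall(r"a\.c", "abc a.c")

print(unescaped)  # ['abc', 'a.c']
print(escaped)    # ['a.c']
```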

3) Non-printable characters as atoms

Non-printable characters are formatting control characters that are not displayed in a string.

Non-printable characters common in regular expressions

Atom    Meaning
\f      Matches a form feed (page break)
\n      Matches a line feed (line break)
\r      Matches a carriage return
\t      Matches a tab
\v      Matches a vertical tab

Note:

'/\n/' matches a line break in a string on Linux (Unix) systems.

'/\r\n/' matches a line break in a string on Windows systems, where a line ends with a carriage return followed by a line feed.
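The difference between the two line-break sequences can be checked with a short sketch. Python's re module is assumed here in place of a PCRE-style engine, and the sample strings are invented for the example.

```python
import re

unix_text = "first\nsecond"        # Linux/Unix line break: line feed only
windows_text = "first\r\nsecond"   # Windows line break: carriage return + line feed

unix_has_lf = re.search(r"\n", unix_text) is not None             # True
windows_has_crlf = re.search(r"\r\n", windows_text) is not None   # True
unix_has_crlf = re.search(r"\r\n", unix_text) is not None         # False

# "\r?\n" accepts either convention, which is handy when splitting lines.
lines = re.split(r"\r?\n", "a\r\nb\nc")
print(lines)  # ['a', 'b', 'c']
```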

4) Generic character types as atoms

Generic character types common in regular expressions

Atom    Meaning
\d      Matches any decimal digit, equivalent to [0-9]
\D      Matches any character except a decimal digit, equivalent to [^0-9]
\s      Matches any whitespace character, equivalent to [ \f\n\r\t\v]
\S      Matches any character except whitespace, equivalent to [^ \f\n\r\t\v]
\w      Matches any digit, letter, or underscore, equivalent to [0-9a-zA-Z_]
\W      Matches any character except a digit, letter, or underscore, equivalent to [^0-9a-zA-Z_]
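The character types can be tried out with a minimal sketch. It assumes Python's re module rather than PCRE, and the sample text is made up. For plain ASCII input such as this, Python's classes behave like the bracket expressions listed in the table.

```python
import re

sample = "Room 42_b, floor 7"

digits = re.findall(r"\d", sample)          # each decimal digit
non_digits = re.findall(r"\D+", "a1b22c")   # runs of non-digit characters
words = re.findall(r"\w+", sample)          # runs of letters, digits, underscores
non_words = re.findall(r"\W+", sample)      # runs of everything else
spaces = re.findall(r"\s", sample)          # each whitespace character
non_spaces = re.findall(r"\S+", sample)     # runs of non-whitespace characters

print(digits)      # ['4', '2', '7']
print(non_digits)  # ['a', 'b', 'c']
print(words)       # ['Room', '42_b', 'floor', '7']
print(non_words)   # [' ', ', ', ' ']
print(spaces)      # [' ', ' ', ' ']
print(non_spaces)  # ['Room', '42_b,', 'floor', '7']
```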

5) Custom atom table ([]) as an atom

An atom table defines a set of atoms of equal standing, any one of which may match at that position. For example:

'/[ja]sp/' matches either "jsp" or "asp".
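A minimal sketch of the atom table in use, assuming Python's re module and a hypothetical list of file names:

```python
import re

pattern = re.compile(r"[ja]sp")

file_names = ["index.jsp", "index.asp", "index.php"]  # hypothetical sample data
matched = [name for name in file_names if pattern.search(name)]

# The atom table lists lowercase letters only, so uppercase "JSP" does not match.
upper_matches = pattern.search("INDEX.JSP") is not None

print(matched)        # ['index.jsp', 'index.asp']
print(upper_matches)  # False
```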

Two Metacharacters

Metacharacters are characters with a special meaning that are used to construct regular expressions, such as '*', '+', and '?'.

Most metacharacters cannot appear alone; they modify the atom they are attached to. A metacharacter can be escaped with \ so that it loses its special meaning.

Metacharacters of regular expressions

Metacharacter   Meaning
*               Matches the preceding atom 0, 1, or more times
+               Matches the preceding atom 1 or more times
?               Matches the preceding atom 0 or 1 times
.               Matches any single character except a line break
|               Matches one of two or more branches
{n}             The preceding atom appears exactly n times
{n,}            The preceding atom appears at least n times
{n,m}           The preceding atom appears at least n times and at most m times
^ or \A         Matches the start position of the input string
$ or \Z         Matches the end position of the input string
\b              Matches a word boundary
\B              Matches any position that is not a word boundary
[]              Matches any one of the atoms listed in the square brackets
[^]             Matches any one character not listed in the square brackets
()              Treats the enclosed pattern as a single unit: one large atom made up of several individual atoms

1. Qualifiers

Qualifiers specify how many times a given atom must appear for the match to succeed. There are six: "*", "+", "?", "{n}", "{n,}", and "{n,m}". They differ only in the number of repetitions they allow.
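The six qualifiers can be compared side by side in a short sketch. Python's re module is assumed, and fully_matches is a hypothetical helper written for this example.

```python
import re

def fully_matches(pattern: str, text: str) -> bool:
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

star_zero = fully_matches(r"ab*", "a")           # True: "b" may appear 0 times
plus_zero = fully_matches(r"ab+", "a")           # False: "b" must appear at least once
optional_two = fully_matches(r"ab?", "abb")      # False: "b" may appear at most once
exactly_three = fully_matches(r"a{3}", "aaa")    # True: exactly 3 repetitions
at_least_two = fully_matches(r"a{2,}", "a")      # False: fewer than 2 repetitions
two_to_three = fully_matches(r"a{2,3}", "aaaa")  # False: more than 3 repetitions
```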

2. Boundary restrictions

Boundary restrictions limit a match to the edges of a string or a word, which gives more precise results. The metacharacters "^" and "$" refer to the beginning and end of the string, respectively, and "\b" describes the leading or trailing boundary of each word in the string.
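A minimal sketch of the boundary metacharacters, assuming Python's re module and an invented sample string:

```python
import re

text = "cat concat category"

whole_word = re.findall(r"\bcat\b", text)    # "cat" as a complete word
word_prefix = re.findall(r"\bcat\w*", text)  # words that begin with "cat"
inside_word = re.findall(r"\Bcat", text)     # "cat" not at the start of a word

starts_with_cat = re.search(r"^cat", text) is not None  # True
ends_with_cat = re.search(r"cat$", text) is not None    # False

print(whole_word)   # ['cat']
print(word_prefix)  # ['cat', 'category']
print(inside_word)  # ['cat']  (the one inside "concat")
```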

3. Period

"." matches any single character in the target except a line break, including non-printable characters.
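The sketch below checks how the period treats a non-printable tab versus a line break. It assumes Python's re module; re.DOTALL is Python's own flag for letting the period match line breaks as well and is not one of the modifiers discussed in this article.

```python
import re

# A tab is non-printable, but the period still matches it.
tab_match = re.fullmatch(r"a.b", "a\tb") is not None                        # True

# By default the period does not match a line break.
newline_match = re.fullmatch(r"a.b", "a\nb") is not None                    # False

# With re.DOTALL the period matches a line break too.
dotall_match = re.fullmatch(r"a.b", "a\nb", flags=re.DOTALL) is not None    # True
```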

4. Pattern selector (|)

The "|" selector has the lowest precedence of all metacharacters and separates alternative patterns.
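Because "|" binds so loosely, anchors on either side belong to only one branch unless the alternatives are grouped. A small sketch of that effect, assuming Python's re module and made-up sample strings:

```python
import re

# Read as (^apple) OR (pie$): each anchor applies to one branch only.
loose = re.compile(r"^apple|pie$")

# Grouping makes both anchors apply to the whole choice.
strict = re.compile(r"^(apple|pie)$")

loose_start = loose.search("apple tart") is not None    # True: begins with "apple"
loose_end = loose.search("humble pie") is not None      # True: ends with "pie"
strict_start = strict.search("apple tart") is not None  # False: not exactly "apple" or "pie"
strict_exact = strict.search("pie") is not None         # True
```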

5. Pattern unit

The metacharacter "()" combines several atoms into one large atom that is treated as a single unit.
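The effect of a pattern unit is easiest to see with a qualifier attached. In this sketch, which assumes Python's re module, the qualifier applies to only the last atom without parentheses and to the whole unit with them.

```python
import re

# Without parentheses, "+" repeats only the atom "b".
without_unit = re.fullmatch(r"ab+", "abab") is not None       # False
without_unit_bs = re.fullmatch(r"ab+", "abbb") is not None    # True

# With parentheses, "+" repeats the whole unit "ab".
with_unit = re.fullmatch(r"(ab)+", "abab") is not None        # True
```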

Three Pattern modifiers

Pattern modifiers:

Modifier   Function
i          Matching is case-insensitive
x          Whitespace in the pattern is ignored unless it is escaped
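Python's re module has no trailing modifier letters, but its re.IGNORECASE and re.VERBOSE flags behave much like the i and x modifiers. The sketch below relies on that correspondence, which is an assumption made for illustration, and uses invented sample strings.

```python
import re

# Like the i modifier: case is ignored while matching.
case_sensitive = re.search(r"jsp", "INDEX.JSP") is not None                         # False
case_insensitive = re.search(r"jsp", "INDEX.JSP", flags=re.IGNORECASE) is not None  # True

# Like the x modifier: unescaped whitespace in the pattern is ignored.
spaced = re.compile(r"\d{3} \d{4}", re.VERBOSE)
ignores_space = spaced.fullmatch("5551234") is not None       # True
needs_no_space = spaced.fullmatch("555 1234") is not None     # False

# An escaped space is kept and must appear in the target.
escaped_space = re.compile(r"\d{3}\ \d{4}", re.VERBOSE)
keeps_space = escaped_space.fullmatch("555 1234") is not None  # True
```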
