Parsing URLs with JavaScript regular expressions

Source: Internet
Author: User

A regular expression is an object that describes a character pattern.

This article does not simply hand you a regular expression for URLs and show how to parse a URL address with it; plenty of such material can already be found online. Its purpose is to use a URL regular expression as a way to understand regular expressions themselves, so that you can write relatively simple ones in your future work. To get to the point, let's take a look at an example:

var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;
var url = "http://qiji123.kerlai.net:81/GoodsBasic/Operate/12678?q#simen";
var result = parse_url.exec(url);
var names = ["url", "scheme", "slash", "host", "port", "path", "query", "hash"];
// Print each name next to the matching element of the result array.
for (var i = 0; i < names.length; i++) {
  console.log(names[i] + ":" + result[i]);
}

Let's take a look at the results first:

url:http://qiji123.kerlai.net:81/GoodsBasic/Operate/12678?q#simen
scheme:http
slash://
host:qiji123.kerlai.net
port:81
path:GoodsBasic/Operate/12678
query:q
hash:simen

The result array in the code is ['http://qiji123.kerlai.net:81/GoodsBasic/Operate/12678?q#simen', 'http', '//', 'qiji123.kerlai.net', '81', 'GoodsBasic/Operate/12678', 'q', 'simen'].

Now join the elements from the 2nd to the last. The result is "http // qiji123.kerlai.net 81 GoodsBasic/Operate/12678 q simen". Compared with the original URL, the separators ":", "/" (before the path), "?" and "#" are missing. Why is this? To answer that we need the notion of grouping in regular expressions. A group is whatever is enclosed in (). There are 4 kinds of groups: capturing, non-capturing, positive lookahead, and negative lookahead. Here I focus on the first two; you can explore the last two on your own. Each capturing group occupies one position in the result array. Text matched by a non-capturing group gets no position of its own, and neither does text matched outside any parentheses: it appears only as part of the full match at index 0 of the array returned by the exec() method.

1. Capturing group: (...)

2. Non-capturing group: (?:...)

3. Positive lookahead: (?=...)

4. Negative lookahead: (?!...)
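The four kinds of groups listed above can be compared side by side. The following is a minimal sketch; the sample strings ("http://example.com", "foobar", "foobaz") and variable names are assumptions chosen only for illustration.

```javascript
// Capturing group: the scheme is stored at index 1 of the result array.
const capturing = /([A-Za-z]+):/.exec("http://example.com");
console.log(capturing[0], capturing[1]); // "http:" "http"

// Non-capturing group: it groups the pattern but adds no array slot.
const nonCapturing = /(?:[A-Za-z]+):/.exec("http://example.com");
console.log(nonCapturing.length); // 1 (only the full match "http:")

// Positive lookahead: letters must be followed by ":", but the ":"
// itself is not consumed and is not part of the match.
const positive = /[A-Za-z]+(?=:)/.exec("http://example.com");
console.log(positive[0]); // "http"

// Negative lookahead: "foo" matches only when NOT followed by "bar".
const negative = /foo(?!bar)/;
console.log(negative.test("foobaz"), negative.test("foobar")); // true false
```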

Next we'll break down the regular expression parse_url, factor by factor.

^ represents the beginning of the string.

First factor: (?:([A-Za-z]+):)? matches the protocol (scheme) name: http

1. (?:...) represents a non-capturing group: the characters matched inside these parentheses are not placed in the result array.

2. (...) represents a capturing group: the characters matched inside these parentheses are placed in the result array, here as the scheme: http

3. [...] is a character class, which matches any one of the characters listed inside the brackets.

4. A-Za-z represents the letters A to Z and the letters a to z. [A-Za-z] matches any one character from A through Z or from a through z.

5. + means the preceding element is matched 1 or more times.

6. ? means that this group is optional.
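To see the first factor in isolation, it can be run on its own. This is a small sketch; the names schemeFactor, withScheme and withoutScheme are assumptions made for the example.

```javascript
// The first factor of parse_url, anchored at the start of the string.
const schemeFactor = /^(?:([A-Za-z]+):)?/;

// With a scheme: the ":" is matched but only "http" is captured.
const withScheme = schemeFactor.exec("http://qiji123.kerlai.net");
console.log(withScheme[0]); // "http:"
console.log(withScheme[1]); // "http"

// Without a scheme: the optional group is skipped, so the match is
// empty and the capture slot holds undefined.
const withoutScheme = schemeFactor.exec("//qiji123.kerlai.net");
console.log(withoutScheme[0]); // ""
console.log(withoutScheme[1]); // undefined
```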

Second factor: (\/{0,3}) matches: //

This is a capturing group. \/ means that a "/" should be matched (it is escaped because "/" delimits the regular expression literal), and {0,3} means the "/" is matched between 0 and 3 times.
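A quick way to check the {0,3} quantifier is to feed the factor different numbers of slashes. This is only an illustrative sketch; slashFactor and the result names are assumed.

```javascript
// The second factor on its own, anchored at the start of the string.
const slashFactor = /^(\/{0,3})/;

const twoSlashes = slashFactor.exec("//qiji123.kerlai.net")[1];
const noSlashes = slashFactor.exec("qiji123.kerlai.net")[1];
const fourSlashes = slashFactor.exec("////qiji123.kerlai.net")[1];

console.log(twoSlashes);  // "//"
console.log(noSlashes);   // "" (zero repetitions are allowed)
console.log(fourSlashes); // "///" (at most three are consumed)
```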

Third factor: ([0-9.\-A-Za-z]+) matches: qiji123.kerlai.net

This is a capturing group. It matches one or more characters, each of which is a digit, ".", "-" (written escaped as "\-"), or a letter from A to Z or a to z.

Fourth factor: (?::(\d+))? matches: 81

The leading ":" sits in the non-capturing group, so it does not appear in the returned array. \d matches a digit. The whole factor matches a leading ":" followed by one or more digits. This factor is optional.
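The host and port factors can be tried together. The sketch below combines just these two factors; the name hostPort and the added ^ and $ anchors are assumptions for the example.

```javascript
// Host factor followed by the optional port factor.
const hostPort = /^([0-9.\-A-Za-z]+)(?::(\d+))?$/;

const withPort = hostPort.exec("qiji123.kerlai.net:81");
console.log(withPort[1]); // "qiji123.kerlai.net"
console.log(withPort[2]); // "81" (the ":" is matched but not captured)

const withoutPort = hostPort.exec("qiji123.kerlai.net");
console.log(withoutPort[1]); // "qiji123.kerlai.net"
console.log(withoutPort[2]); // undefined (the optional group was skipped)
```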

Fifth factor: (?:\/([^?#]*))? matches: GoodsBasic/Operate/12678

The group starts with "/". Inside the character class, ^ means negation, so [^?#] matches any character except "?" and "#". The final ? means this factor is optional.

Sixth factor: (?:\?([^#]*))? matches: q

The group starts with "?", followed by a capturing group of 0 or more characters that are not "#".

Seventh factor: (?:#(.*))? matches: simen

The group starts with "#". (.*) matches all remaining characters, since "." matches any character except a line terminator.
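The last three factors (path, query, hash) can be exercised together. The following is a sketch only; the name tail, the anchors around the factors, and the test strings "?q" and "#a?b" are assumptions for illustration.

```javascript
// Path, query and hash factors, each optional.
const tail = /^(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;

const full = tail.exec("/GoodsBasic/Operate/12678?q#simen");
console.log(full[1], full[2], full[3]); // "GoodsBasic/Operate/12678" "q" "simen"

// Only a query: the path and hash slots stay undefined.
const queryOnly = tail.exec("?q");
console.log(queryOnly[1], queryOnly[2], queryOnly[3]); // undefined "q" undefined

// A "?" after "#" belongs to the hash, because (.*) takes everything.
const hashOnly = tail.exec("#a?b");
console.log(hashOnly[3]); // "a?b"
```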

$ indicates the end of this string.
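Putting the factors back together, the result array can be turned into an object, and the URL can be rebuilt by re-adding the separators that the non-capturing groups left out. This is a hedged sketch, not part of the original example: parseUrl and buildUrl are hypothetical helper names.

```javascript
const parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;
const names = ["url", "scheme", "slash", "host", "port", "path", "query", "hash"];

// Hypothetical helper: map the exec() result onto named properties.
// Returns null when the input does not match at all.
function parseUrl(url) {
  const result = parse_url.exec(url);
  if (result === null) return null;
  const parts = {};
  names.forEach((name, i) => { parts[name] = result[i]; });
  return parts;
}

// Hypothetical helper: rebuild the URL, restoring the ":", "/", "?"
// and "#" separators that were matched but not captured.
function buildUrl(p) {
  return (p.scheme !== undefined ? p.scheme + ":" : "") +
    p.slash + p.host +
    (p.port !== undefined ? ":" + p.port : "") +
    (p.path !== undefined ? "/" + p.path : "") +
    (p.query !== undefined ? "?" + p.query : "") +
    (p.hash !== undefined ? "#" + p.hash : "");
}

const source = "http://qiji123.kerlai.net:81/GoodsBasic/Operate/12678?q#simen";
const parts = parseUrl(source);
console.log(parts.host, parts.port);     // "qiji123.kerlai.net" "81"
console.log(buildUrl(parts) === source); // true
console.log(parseUrl("???"));            // null
```

Returning null for non-matching input mirrors what exec() itself does, so callers can check for failure the same way in both cases.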

With that, every group in the URL regular expression has been explained. As a next exercise, you can write a regular expression for phone numbers, one that can match a fixed-line (landline) phone number. This will use a new character: |
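One possible answer to this exercise is sketched below. The number formats are assumptions, not something the article specifies: a mobile number is taken to be 11 digits starting with 1, and a landline number is taken to be an area code of 0 plus 2 or 3 digits, an optional hyphen, and 7 or 8 digits. The name phone is also assumed.

```javascript
// "|" separates the two alternatives inside a non-capturing group.
// Assumed formats: 1xxxxxxxxxx (mobile) or 0xx[x][-]xxxxxxx[x] (landline).
const phone = /^(?:1\d{10}|0\d{2,3}-?\d{7,8})$/;

console.log(phone.test("13812345678"));  // true  (mobile alternative)
console.log(phone.test("010-12345678")); // true  (landline with hyphen)
console.log(phone.test("07551234567"));  // true  (landline without hyphen)
console.log(phone.test("12345"));        // false (too short for either)
```

Real numbering plans differ by country, so the two alternatives would need adjusting for actual use.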

Character   Meaning

\   Used as an escape: the character after "\" usually loses its ordinary meaning. For example, /b/ matches the character "b", but with a backslash in front, /\b/ matches a word boundary. Conversely, it restores a special character to its literal meaning: "*" matches the preceding element 0 or more times, so /a*/ matches a, aa, aaa, while /a\*/ matches only "a*".
^   Matches the beginning of the input or of a line; /^a/ matches "an A" but does not match "An a"
$   Matches the end of the input or of a line; /a$/ matches "An a" but does not match "an A"
*   Matches the preceding element 0 or more times; /ba*/ matches b, ba, baa, baaa
+   Matches the preceding element 1 or more times; /ba+/ matches ba, baa, baaa
?   Matches the preceding element 0 or 1 times; /ba?/ matches b, ba
(x)   Matches x and saves the match in the variables $1...$9
x|y   Matches x or y
{n}   Matches exactly n times
{n,}   Matches n or more times
{n,m}   Matches between n and m times
[xyz]   Character set: matches any one of the characters in the set
[^xyz]   Matches any one character that is not in the set
[\b]   Matches a backspace
\b   Matches a word boundary
\B   Matches a non-word-boundary
\cX   X is a control character; /\cM/ matches Ctrl-M
\d   Matches a digit; /\d/ = /[0-9]/
\D   Matches a non-digit character; /\D/ = /[^0-9]/
\n   Matches a line feed
\r   Matches a carriage return
\s   Matches a whitespace character, including \n, \r, \f, \t, \v, etc.
\S   Matches a non-whitespace character, roughly /[^ \n\f\r\t\v]/
\t   Matches a tab
\v   Matches a vertical tab
\w   Matches a character that can make up a word (letters, digits, and the underscore); for example, /\w/ matches the 5 in "$5.98". Equal to [a-zA-Z0-9_]
\W   Matches a character that cannot make up a word; for example, /\W/ matches the $ in "$5.98". Equal to [^a-zA-Z0-9_]
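Several rows of the table can be verified directly. The snippet below is a small sketch using the table's own examples where possible; the strings "b ba baa", "a cat sat" and "concatenate" are assumptions added for illustration.

```javascript
// Quantifiers: *, + and ? applied to the "a" after "b".
const sample = "b ba baa";
const star = sample.match(/ba*/g);     // ["b", "ba", "baa"]
const plus = sample.match(/ba+/g);     // ["ba", "baa"]
const optional = sample.match(/ba?/g); // ["b", "ba", "ba"]
console.log(star, plus, optional);

// Anchors: ^ and $ are case-sensitive like any other pattern.
console.log(/^a/.test("an A"), /^a/.test("An a")); // true false
console.log(/a$/.test("An a"), /a$/.test("an A")); // true false

// Character classes on the "$5.98" example from the table.
const price = "$5.98";
const digits = price.match(/\d/g);       // ["5", "9", "8"]
const firstWord = price.match(/\w/)[0];  // "5"
const firstNonWord = price.match(/\W/)[0]; // "$"
console.log(digits, firstWord, firstNonWord);

// Word boundary: "cat" must stand alone as a word.
console.log(/\bcat\b/.test("a cat sat"));   // true
console.log(/\bcat\b/.test("concatenate")); // false

// Alternation with counted repetition: 2 to 3 characters, each x or y.
console.log(/^(?:x|y){2,3}$/.test("xyx")); // true
console.log(/^(?:x|y){2,3}$/.test("x"));   // false
```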

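A regular expression can also be created with re = new RegExp("pattern", "flags"), where the flags argument is optional; this form is the better choice when the pattern is not known in advance. Regular expression flags: g (global: find all occurrences in the text), i (ignore case), m (multi-line matching).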
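The effect of each flag is easiest to see on a small multi-line string. The following is a sketch with an assumed sample text and assumed variable names.

```javascript
const text = "Cat\ncat\nCAT";

// No flags: only the first case-sensitive match is returned.
const first = text.match(new RegExp("cat"));
console.log(first[0], first.index); // "cat" 4

// g: all case-sensitive matches.
const all = text.match(new RegExp("cat", "g"));
console.log(all); // ["cat"]

// g + i: all matches, ignoring case.
const anyCase = text.match(new RegExp("cat", "gi"));
console.log(anyCase); // ["Cat", "cat", "CAT"]

// m: ^ and $ also match at the start and end of each line.
const wholeLines = text.match(new RegExp("^cat$", "gim"));
console.log(wholeLines); // ["Cat", "cat", "CAT"]

// Without m, ^ and $ refer to the whole input, so nothing matches.
const noMultiline = text.match(new RegExp("^cat$", "gi"));
console.log(noMultiline); // null
```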

The problem of dynamic regular expressions in JavaScript

Can a regular expression be generated dynamically? For example, given var str = "strtemp"; in JavaScript, we want to generate var re = /strtemp/;. Plain string concatenation, var re = "/" + str + "/", only produces a string. Can an actual regular expression be generated from it?
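One way to answer this question is the RegExp constructor, which builds a regular expression object from a string at run time. The sketch below assumes that approach; escapeRegExp is a hypothetical helper name, included because special characters in the string would otherwise be interpreted as pattern syntax.

```javascript
const str = "strtemp";

// The constructor takes the pattern without the surrounding slashes.
const re = new RegExp(str); // behaves like /strtemp/
console.log(re.source);                    // "strtemp"
console.log(re.test("my strtemp value"));  // true

// String concatenation alone only yields a string, not a RegExp.
const notARegExp = "/" + str + "/";
console.log(typeof notARegExp);            // "string"

// Hypothetical helper: escape characters that are special in a
// pattern so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const unescaped = new RegExp("a.b");             // "." matches any character
const literal = new RegExp(escapeRegExp("a.b")); // "." matches only "."
console.log(unescaped.test("axb")); // true
console.log(literal.test("axb"));   // false
console.log(literal.test("a.b"));   // true
```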
