"JS Review notes" 05 Regular expressions

Source: Internet
Author: User

Regular expressions: I have never actually memorized the syntax. I have always just copied them from the Internet.

Here is a short summary. If you have a regular expression worth sharing, feel free to collect it in this post. (Although I doubt anyone will, because I'm not a professional front-end developer; I'm just passing through. \(^o^)/)

Application scope: Regular expressions are primarily used to find, replace, and extract information in a string.

There are 6 methods that work with regular expressions:

regexp.exec, regexp.test, string.match, string.replace, string.search, and string.split
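To make the six methods concrete, here is a minimal sketch that calls each of them once. The sample text and the variable names (text, dateRe, and so on) are made up for illustration and are not from the book.

```javascript
// Sample input and pattern (both hypothetical, chosen only for this demo).
var text = "2021-03-04 and 2022-11-30";
var dateRe = /(\d{4})-(\d{2})-(\d{2})/;

// regexp.exec: returns an array with the match and its captured groups, or null.
var execResult = dateRe.exec(text);

// regexp.test: returns true or false only.
var testResult = dateRe.test(text);

// string.match with the g flag: returns every match, without groups.
var matchResult = text.match(/\d{4}/g);

// string.replace: returns a new string with the matches replaced.
var replaceResult = text.replace(/-/g, "/");

// string.search: returns the index of the first match, or -1.
var searchResult = text.search(/and/);

// string.split: splits the string wherever the pattern matches.
var splitResult = text.split(/\s+and\s+/);

console.log(execResult, testResult, matchResult, replaceResult, searchResult, splitResult);
```

Roughly speaking, exec and match are for extracting, test and search are for finding, and replace and split are for rewriting the string.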

Why use them: In JS, regular expressions usually have a significant performance advantage over the equivalent string processing.

Cons: As most people have noticed, they can look complicated and hard to understand. At my skill level, I would not want to maintain one. When I can't copy one from the Internet, I usually handle the problem without regular expressions and give that a nice name: code readability!

In JS, a regular expression literal must be written on a single line, and whitespace inside it is significant, so watch out for stray blanks.

Consider the following code:

    var myregexp = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;
    var url = "http://www.ora.com:8041/goodparts?q#fragment";
    var result = myregexp.exec(url);

If you can tell at a glance what this code means, good for you. Most people cannot, and even those who can need a good while to read it. That is why people don't want to write this stuff. All right, that's the end of this chapter, let's go.

Even so, I'm going to keep writing, because of what it achieves.

The result is the following array:

["http://www.ora.com:8041/goodparts?q#fragment", "http", "//", "www.ora.com", "8041", "goodparts", "q", "fragment"]

That's why I keep on writing.

Well, let's turn the pain into strength and work through the syntax:

  • ^ anchors the match to the start of the string.
  • (?:([A-Za-z]+):)?    matches a protocol name, but only when it is followed by a colon (the colon is matched but not captured). Here the protocol name is http.
      • (?:...)    represents a non-capturing group.
      • The suffix ?   indicates that the group is optional: it is matched 0 or 1 times. For example, if the URL you enter is www.baidu.com, it still matches without a protocol name.
      • (...)   represents a capturing group. A capturing group copies the text it matches and places it in the result array. Each capturing group is assigned a number. The number of the first capturing group is 1, so result[1] holds its text. result[0] is the entire matched text.
      • [...]   represents a character class. A-Za-z is easy to understand: the 26 uppercase letters and the 26 lowercase letters. The - represents a range.
      • The suffix + means that the character class is matched one or more times.
      • The : after it means that the matched letters must be followed by a colon.
  • (\/{0,3}) is capturing group 2, which here matches the two slashes //.
    • \/ is an escaped slash. The backslash is the escape character, just as in \n; it is needed so that the / is not taken as the end of the regular expression literal.
    • {0,3} means that the / is matched 0 to 3 times.
  • ([0-9.\-A-Za-z]+) is capturing group 3, which matches a host name such as www.baidu.com. It consists of one or more digits, letters, and the two characters . and -. That means a URL like www.baidu...----com--- also matches.
  • (?::(\d+))? is a non-capturing group containing capturing group 4, which matches an optional port number: a sequence of digits preceded by a colon. Only the digits are captured and placed in the result array.
    • \d represents a digit character; [0-9] achieves the same effect.
  • (?:\/([^?#]*))? is another non-capturing group, containing capturing group 5, which captures goodparts.
    • (?:\/(...))? matches a string that starts with a slash /, 0 or 1 times.
    • [^?#] matches any character except ? and #; a leading ^ inside a character class means negation.
    • The suffix * means the class is matched 0 or more times. It is almost the same as +, except that + requires at least 1 match.
  • (?:\?([^#]*))? works the same way: an optional group that starts with ? and captures the query (group 6).
  • (?:#(.*))? is roughly the same: an optional group that starts with # and captures the fragment (group 7).
      • . matches any character except a line terminator.
  • $ anchors the match to the end of the string.
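The walk-through above can be checked with a small sketch. The names array and the describeUrl helper are hypothetical labels of my own, added only so the groups are easier to read; the regular expression is the one being explained.

```javascript
// The URL-parsing expression from the walk-through, with the whitespace removed.
var parseUrl = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;

// Hypothetical labels for result[0] through result[7].
var names = ["url", "scheme", "slash", "host", "port", "path", "query", "hash"];

// Hypothetical helper: run exec and return the groups as a labeled object.
function describeUrl(url) {
    var result = parseUrl.exec(url);
    var parts = {};
    if (result === null) {
        return null; // exec returns null when the string does not match
    }
    for (var i = 0; i < names.length; i += 1) {
        parts[names[i]] = result[i];
    }
    return parts;
}

var full = describeUrl("http://www.ora.com:8041/goodparts?q#fragment");
var bare = describeUrl("www.baidu.com");          // no protocol, port, path, query, or hash
var odd = describeUrl("www.baidu...----com---");  // still accepted as a host

console.log(full, bare, odd);
```

For the bare host, the optional groups that did not take part in the match come back as undefined, while group 2 is an empty string because {0,3} happily matches zero slashes.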

To tell you the truth, while reading the book and writing up this summary, I kept thinking about one question.

When did I start finding regular expressions difficult?

It was when I was a complete novice and didn't like to study. Everything looked difficult, and I was too impatient to settle down and learn, so that impression stuck. Now it all seems quite simple.

I'll admit that I have basically never written a regular expression myself; I just copied them.

But after just one hour of study, I think I can do it. I could write a pretty slick regular expression right now, no matter how long: just write each capturing group on its own line, then join the lines into one when pasting it into the code.

Sudden insight: all a programmer needs is a calm mind and an interest in learning.

I won't hide that I'm writing this blog post while reading, so let's keep going.

At any rate, I now understand that regular expressions are not difficult, but it is still best to keep them as simple as possible.

So let's write a regular expression that matches numbers.

    var myregexp = /^-?\d+(?:\.\d*)?(?:e[+\-]?\d+)?$/i;
    var num = "-1.3e-3";
    var result = myregexp.test(num); // result is true
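To see where this number expression draws the line, here is a small sketch that runs it against a few inputs. The sample strings are my own picks, not from the book.

```javascript
// The number-matching expression from above.
var numberRe = /^-?\d+(?:\.\d*)?(?:e[+\-]?\d+)?$/i;

// Hypothetical sample inputs: the first four should pass, the last three should not.
var samples = ["-1.3e-3", "98.6", "1E5", "42.", ".5", "1.2.3", "abc"];

var verdicts = samples.map(function (s) {
    return numberRe.test(s);
});

samples.forEach(function (s, i) {
    console.log(JSON.stringify(s) + " -> " + verdicts[i]);
});
```

Note that "42." passes because the fraction part is \.\d* (zero or more digits after the dot), while ".5" fails because at least one digit is required before the dot.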

The trailing i on the regular expression above means that case is ignored when matching. Let's expand on the flags:

    • Ending with i: ignore case when matching.
    • Ending with g: global (match multiple times). Using g with the test method is not recommended, because test then resumes from lastIndex on each call. The string search method automatically ignores the g flag.
    • Ending with m: multiline (^ and $ can also match at line terminators).
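A quick sketch of how the three flags behave in practice. The strings and variable names are made up for the demo.

```javascript
// i: case is ignored.
var caseInsensitive = /hello/i.test("HELLO");

// g with test: the regexp object remembers lastIndex between calls,
// so the same call can give different answers.
var globalRe = /a/g;
var firstTest = globalRe.test("a");   // matches at index 0, lastIndex becomes 1
var secondTest = globalRe.test("a");  // resumes at index 1, finds nothing, lastIndex resets to 0

// g with search: the flag is ignored, so repeated calls agree.
var globalC = /c/g;
var searchOnce = "abcabc".search(globalC);
var searchTwice = "abcabc".search(globalC);

// g with match: all matches are returned.
var allMatches = "a1b22c333".match(/\d+/g);

// m: ^ and $ also match at line terminators inside the string.
var withM = /^b$/m.test("a\nb");
var withoutM = /^b$/.test("a\nb");

console.log(caseInsensitive, firstTest, secondTest, searchOnce, searchTwice, allMatches, withM, withoutM);
```

The second test call returning false for the very same input is the reason g and test make a poor pair.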

How to create a regular expression:

    • The simplest way is a regular expression literal, as I have been using above.
      • var myregexp = /^-?\d+$/i;
    • Another way is to use the RegExp constructor. The RegExp constructor is suitable when the regular expression must be generated dynamically at run time. Because the pattern is passed as a string, backslashes and quotes have to be escaped a second time.
      • var myregexp = new RegExp("\"(?:\\\\.|[^\\\\\\\"])*\"", 'g');

      • Properties of a RegExp object
        • global: true if the g flag was used.
        • ignoreCase: true if the i flag was used.
        • lastIndex: the index at which the next exec match starts. The initial value is 0.
        • multiline: true if the m flag was used.
        • source: the source text of the regular expression.
      • The book says that RegExp objects created from the same regular expression literal share a single instance. (I tested this myself and found it was not so. The sharing was ES3 behavior; since ES5, each evaluation of a literal creates a new RegExp object.)
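Here is a minimal sketch of the run-time case, plus a look at the properties listed above. The wordMatcher and makeLiteral helpers are hypothetical, written only for this illustration.

```javascript
// Hypothetical helper: build a whole-word matcher for a word known only at run time.
function wordMatcher(word) {
    // Put a backslash in front of characters that are special in a regular expression.
    var escaped = word.replace(/[.*+?^${}()|[\]\\\/-]/g, "\\$&");
    // In a string, \b must be written as "\\b".
    return new RegExp("\\b" + escaped + "\\b", "gi");
}

var matcher = wordMatcher("a.b");
var hits = "a.b axb A.B".match(matcher); // "axb" is skipped because the dot was escaped

// The properties of the generated object.
var props = {
    global: matcher.global,
    ignoreCase: matcher.ignoreCase,
    multiline: matcher.multiline,
    lastIndex: matcher.lastIndex,
    source: matcher.source
};

// Each evaluation of a literal creates a new object in ES5 and later.
function makeLiteral() {
    return /a/g;
}
var sameObject = makeLiteral() === makeLiteral();

console.log(hits, props, sameObject);
```

Escaping the input first matters: without it, the dot in "a.b" would match any character and "axb" would count as a hit too.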

About the elements that make up regular expressions

  • Alternation: |. Two regular expressions can be combined into one with |; if the string matches either of the alternatives separated by |, the whole choice matches.
  • Quantifiers: put simply, how many times something is matched.
    • {3,6} means matching 3 to 6 times
    • * is equivalent to {0,}
    • + is equivalent to {1,}
    • ? is equivalent to {0,1}
  • A character class that matches the ASCII special characters:
    • [!-\/:-@\[-`{-~]

      Very ugly and difficult to understand. That's regular expressions for you, alas ~ ~ ~

  • Regular expression group types
    • Capturing: (...)
    • Non-capturing: (?:...) Used for simple grouping; the matched text is not captured. It has a slight performance advantage.
    • Positive lookahead: (?=...) The author says that this feature and the next one are not good parts, so I've decided to start forgetting them.
    • Negative lookahead: (?!...)
  • Characters that must be escaped with a backslash: \ / [ ] ( ) ? + - * | . ^ $
  • Some useful escape sequences:
    • \f form feed
    • \n newline
    • \r carriage return
    • \t tab
    • \u specifies a Unicode character as a four-digit hexadecimal constant
    • \d is equivalent to [0-9]; \D is the opposite, equivalent to [^0-9]
    • \s is equivalent to [\f\n\r\t\u000B\u0020\u00A0\u2028\u2029]. This is only a partial subset of the Unicode whitespace characters. \S is the opposite.
    • \w is equivalent to [0-9A-Z_a-z]; \W is the opposite. \w is meant to represent the characters that appear in words, but it is of little use for real languages.
    • A simple letter class is [A-Za-z\u00C0-\u1FFF\u2800-\uFFFD], which includes all Unicode letters but also thousands of characters that are not letters. An exact letter class would be huge and inefficient, so this simple one is used instead.
    • \b is a word-boundary anchor, which makes it easy to match at the word boundaries of a text. However, it uses \w to find the boundaries, so it works poorly for many languages.
    • \1 \2 \3 are references to the text captured by groups 1, 2, and 3 respectively
      • So the following regular expression can be used to search a text for duplicated words separated by one or more whitespace characters:
        var doubledword = /([A-Za-z\u00C0-\u1FFF\u2800-\uFFFD]+)\s+\1/gi;
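Several of the elements above can be tried out in one small sketch: alternation, the quantifier equivalences, the ASCII special-character class, and the backreference. The sample strings and helper names are my own, chosen only for illustration.

```javascript
// Alternation: the choice matches if either alternative matches.
var choice = /^(?:cat|dog)$/;
var choiceResults = [choice.test("cat"), choice.test("dog"), choice.test("cow")];

// Quantifiers: compare each shorthand with its {m,n} form on a few inputs.
var inputs = ["", "a", "aa", "aaaa"];
function sameVerdicts(re1, re2) {
    return inputs.every(function (s) {
        return re1.test(s) === re2.test(s);
    });
}
var starSame = sameVerdicts(/^a*$/, /^a{0,}$/);
var plusSame = sameVerdicts(/^a+$/, /^a{1,}$/);
var optionalSame = sameVerdicts(/^a?$/, /^a{0,1}$/);
var threeToSix = [/^a{3,6}$/.test("aa"), /^a{3,6}$/.test("aaa"), /^a{3,6}$/.test("aaaaaaa")];

// The ASCII special-character class: count how many of the 128 ASCII codes it accepts.
var asciiSpecial = /[!-\/:-@\[-`{-~]/;
var specialCount = 0;
for (var code = 0; code < 128; code += 1) {
    if (asciiSpecial.test(String.fromCharCode(code))) {
        specialCount += 1;
    }
}

// Backreference: \1 must repeat exactly what group 1 captured.
var doubledWord = /([A-Za-z\u00C0-\u1FFF\u2800-\uFFFD]+)\s+\1/gi;
var doubled = "Paris in the the spring and and summer".match(doubledWord);

console.log(choiceResults, starSame, plusSame, optionalSame, threeToSix, specialCount, doubled);
```

One caveat about the doubled-word expression: it has no word-boundary check, so a sentence like "This is fine" can also match, because the "is" at the end of "This" is followed by another "is".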

