Update: A beta version of XRegExp 0.3 is now available as part of the RegexPal download package.
JavaScript's regular expression flavor doesn't support named capture. Well, says who? XRegExp 0.2 brings named capture support, along with several other new features. But first of all, if you haven't seen the previous version, make sure to check out my post on XRegExp 0.1, because not all of the documentation is repeated below.
Highlights
- Comprehensive named capture support (New)
- Supports regex literals through the addFlags method (New)
- Free-spacing and comments mode (x)
- Dot matches all mode (s)
- Several other minor improvements over v0.1
Named capture
There are several different syntaxes in the wild for named capture. I've compiled the following table based on my understanding of the regex support of the libraries in question. XRegExp's syntax is included at the top.
| Library | Capture | Backreference | In replacement | Stored at |
| --- | --- | --- | --- | --- |
| XRegExp | (<name>…) | \k<name> | ${name} | result.name |
| .NET | (?<name>…) (?'name'…) | \k<name> \k'name' | ${name} | Matcher.Groups('name') |
| Perl 5.10 (beta) | (?<name>…) (?'name'…) | \k<name> \k'name' \g{name} | $+{name} | ?? |
| Python | (?P<name>…) | (?P=name) | \g<name> | result.group('name') |
| PHP preg (PCRE) | (.NET, Perl, and Python styles) | (.NET, Perl, and Python styles) | $regs['name'] | $result['name'] |
No other major regex library currently supports named capture, although the JGsoft engine (used by products like RegexBuddy) supports both .NET and Python syntax. XRegExp does not use a question mark at the beginning of a named capturing group because this would prevent it from being used in regex literals (JavaScript would immediately throw an "invalid quantifier" error).
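As a rough illustration of how this syntax can be mapped onto native regexes, here is a minimal sketch. It is not XRegExp's actual code: the compileNamed function is a hypothetical helper, and unlike the real library it ignores character classes, so a "(" inside "[...]" would be miscounted.

```javascript
// Simplified sketch: translate (<name>...) and \k<name> into a native pattern.
// compileNamed is a hypothetical name; character classes are NOT handled.
function compileNamed(pattern, flags) {
    var names = [];

    // Pass 1: leave escaped characters alone. For every capturing group,
    // record its name (or null if unnamed) and strip the <name> part.
    // Groups that start with "(?" do not capture and are not matched here.
    var source = pattern.replace(/\\[\S\s]|\((?!\?)(?:<([$\w]+)>)?/g, function (match, name) {
        if (match.charAt(0) === "\\") return match;
        names.push(name || null);
        return "(";
    });

    // Pass 2: rewrite \k<name> as the corresponding numbered backreference.
    // Unknown names and all other escapes are returned unchanged.
    source = source.replace(/\\k<([$\w]+)>|\\[\S\s]/g, function (match, name) {
        var index = name ? names.indexOf(name) : -1;
        return index > -1 ? "\\" + (index + 1) : match;
    });

    return { regex: new RegExp(source, flags), names: names };
}

var compiled = compileNamed("\\b(<word>\\w+)\\s+\\k<word>\\b", "gi");
// compiled.regex.source: \b(\w+)\s+\1\b
// compiled.names: ["word"]
```

Keeping the list of names alongside the compiled regex is what later allows numbered captures to be exposed under their names.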
XRegExp supports named capture on an on-request basis. You can add named capture support to any regex through the use of the new "k" flag. This is done for compatibility reasons and to ensure that regex compilation time remains as fast as possible in all situations.
Following are several examples of using named capture:
    // Add named capture support using the XRegExp constructor
    var repeatedWords = new XRegExp("\\b (<word> \\w+ ) \\s+ \\k<word> \\b", "gixk");

    // Add named capture support using RegExp, after overriding the native constructor
    XRegExp.overrideNative();
    var repeatedWords = new RegExp("\\b (<word> \\w+ ) \\s+ \\k<word> \\b", "gixk");

    // Add named capture support to a regex literal
    var repeatedWords = /\b (<word> \w+ ) \s+ \k<word> \b/.addFlags("gixk");

    var data = "The the test data.";

    // Check if data contains repeated words
    var hasDuplicates = repeatedWords.test(data);
    // hasDuplicates: true

    // Use the regex to remove repeated words
    var output = data.replace(repeatedWords, "${word}");
    // output: "The test data."
In the above code, I've also used the x flag provided by XRegExp, to improve readability. Note that the addFlags method can be called multiple times on the same regex (e.g., /pattern/g.addFlags("k").addFlags("s")), but I'd recommend adding all flags in one shot, for efficiency.
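To see why one call is cheaper than several, consider this minimal sketch of flag merging. The withFlags function is a hypothetical stand-in that handles native flags only (it is not addFlags itself); the point is simply that every call has to build a brand new regex object.

```javascript
// Hypothetical helper, native flags only: returns a copy of `regex` with
// `extraFlags` added. Every call compiles a new RegExp from the source.
function withFlags(regex, extraFlags) {
    // Read the existing flags from the literal form, e.g. "/pattern/g" gives "g"
    var flags = String(regex).replace(/^[\S\s]+\//, "");
    for (var i = 0; i < extraFlags.length; i++) {
        var flag = extraFlags.charAt(i);
        if (flags.indexOf(flag) === -1) flags += flag; // skip duplicates
    }
    return new RegExp(regex.source, flags);
}

var chained = withFlags(withFlags(/pattern/g, "i"), "m"); // compiles twice
var oneShot = withFlags(/pattern/g, "im");                // compiles once
// Both end up equivalent to /pattern/gim
```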
Here are a few more examples of using named capture, with an overly simplistic URL-matching regex (for comprehensive URL parsing, see parseUri):
    var url = "http://microsoft.com/path/to/file?q=1";
    // Groups: protocol, host, path, query
    var urlParser = new XRegExp("^(<protocol>[^:/?]+)://(<host>[^/?]*)(<path>[^?]*)\\?(<query>.*)", "k");
    var parts = urlParser.exec(url);
    /* The result:
    parts.protocol: "http"
    parts.host: "microsoft.com"
    parts.path: "/path/to/file"
    parts.query: "q=1" */

    // Named backreferences are also available in replace() callback functions as properties of the first argument
    var newUrl = url.replace(urlParser, function (match) {
        return match.replace(match.host, "yahoo.com");
    });
    // newUrl: "http://yahoo.com/path/to/file?q=1"
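The way names end up on the match array can be sketched without the library. In this minimal illustration, execNamed is a hypothetical helper and the list of names is passed in by hand; it runs a native regex with ordinary numbered groups and then copies each capture onto a property named after its group.

```javascript
// Sketch: copy numbered captures onto named properties of a match array.
// `names` lists the group names in order; null marks an unnamed group.
function execNamed(regex, names, str) {
    var result = regex.exec(str);
    if (!result) return result;
    for (var i = 1; i < result.length; i++) {
        var name = names[i - 1];
        if (name) result[name] = result[i];
    }
    return result;
}

var namedParts = execNamed(
    /^([^:\/?]+):\/\/([^\/?]*)([^?]*)\?(.*)/,
    ["protocol", "host", "path", "query"],
    "http://microsoft.com/path/to/file?q=1"
);
// namedParts.protocol: "http"
// namedParts.host: "microsoft.com"
// namedParts.path: "/path/to/file"
// namedParts.query: "q=1"
```

One consequence of storing names directly on the array is that a group called index, input, or length would collide with properties the match array already has, so such names are best avoided.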
Note that XRegExp's named capture functionality does not support deprecated JavaScript features, including the last-match properties of the global RegExp object and the RegExp.prototype.compile() method.
Singleline (s) and extended (x) modes
The other non-native flags XRegExp supports are s (singleline) for "dot matches all" mode, and x (extended) for "free-spacing and comments" mode. For full details about these modifiers, see the FAQ in my XRegExp 0.1 post. However, one difference from the previous version is that XRegExp 0.2, when using the x flag, now allows whitespace between a regex token and its quantifier (quantifiers are, e.g., +, *?, or {1,3}). Although the previous version's handling/limitation in this regard was documented, it was atypical compared to other regex libraries. This has been fixed.
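The quantifier behavior can be illustrated with a simplified sketch of free-spacing preprocessing. This is not the library's code: stripFreeSpacing and its regex are hypothetical and skip several edge cases (for example, unclosed character classes). Whitespace and # comments are removed, but a quantifier that follows them is kept so that it still applies to the preceding token.

```javascript
// Escapes and character classes are matched first and kept verbatim.
// Whitespace and # comments are matched second, together with any
// quantifier that immediately follows them.
var freeSpacing = /\\[\S\s]|\[\^?\]?(?:[^\\\]]|\\[\S\s])*\]|(\s*#[^\n\r]*\s*|\s+)([?*+]|\{\d+(?:,\d*)?\})?/g;

function stripFreeSpacing(pattern) {
    return pattern.replace(freeSpacing, function (match, space, quantifier) {
        if (!space) return match; // escape or character class: keep as is
        // Keep a trailing quantifier so it binds to the preceding token.
        // Otherwise leave an empty group behind, so that neighbors such as
        // "\1" and "0" are not fused into a different token ("\10").
        return quantifier || "(?:)";
    });
}

// stripFreeSpacing("\\w +")     gives \w+
// stripFreeSpacing("\\1 0")     gives \1(?:)0
// stripFreeSpacing("[ #] {2}")  gives [ #]{2}
```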
The Code
    /* XRegExp 0.2.2; MIT License
    By Steven Levithan */

    /* Protect this from running more than once, which would break its references to native functions */
    if (window.XRegExp === undefined) {
        var XRegExp;

        (function () {
            var native = {
                RegExp: RegExp,
                exec: RegExp.prototype.exec,
                match: String.prototype.match,
                replace: String.prototype.replace
            };

            XRegExp = function (pattern, flags) {
                return native.RegExp(pattern).addFlags(flags);
            };

            RegExp.prototype.addFlags = function (flags) {
                var pattern = this.source,
                    useNamedCapture = false,
                    re = XRegExp._re;

                flags = (flags || "") + native.replace.call(this.toString(), /^[\S\s]+\//, "");

                if (flags.indexOf("x") > -1) {
                    pattern = native.replace.call(pattern, re.extended, function ($0, $1, $2) {
                        return $1 ? ($2 ? $2 : "(?:)") : $0;
                    });
                }

                if (flags.indexOf("k") > -1) {
                    var captureNames = [];
                    pattern = native.replace.call(pattern, re.capturingGroup, function ($0, $1) {
                        if (/^\((?!\?)/.test($0)) {
                            if ($1) useNamedCapture = true;
                            captureNames.push($1 || null);
                            return "(";
                        } else {
                            return $0;
                        }
                    });
                    if (useNamedCapture) {
                        /* Replace named with numbered backreferences */
                        pattern = native.replace.call(pattern, re.namedBackreference, function ($0, $1, $2) {
                            var index = $1 ? captureNames.indexOf($1) : -1;
                            return index > -1 ? "\\" + (index + 1).toString() + ($2 ? "(?:)" + $2 : "") : $0;
                        });
                    }
                }

                /* If "]" is the leading character in a character class, replace it with "\]" for consistent
                cross-browser handling. This is needed to maintain correctness without the aid of browser sniffing
                when constructing the regexes which deal with character classes. They treat a leading "]" within a
                character class as a non-terminating, literal character, which is consistent with IE, .NET, Perl,
                PCRE, Python, Ruby, JGsoft, and most other regex engines. */
                pattern = native.replace.call(pattern, re.characterClass, function ($0, $1) {
                    /* This second regex is only run when a leading "]" exists in the character class */
                    return $1 ? native.replace.call($0, /^(\[\^?)]/, "$1\\]") : $0;
                });

                if (flags.indexOf("s") > -1) {
                    pattern = native.replace.call(pattern, re.singleline, function ($0) {
                        return $0 === "." ? "[\\S\\s]" : $0;
                    });
                }

                var regex = native.RegExp(pattern, native.replace.call(flags, /[sxk]+/g, ""));

                if (useNamedCapture) {
                    regex._captureNames = captureNames;
                /* Preserve capture names if adding flags to a regex which has already run through addFlags("k") */
                } else if (this._captureNames) {
                    regex._captureNames = this._captureNames.valueOf();
                }

                return regex;
            };

            String.prototype.replace = function (search, replacement) {
                /* If search is not a regex which uses named capturing groups, just run the native replace method */
                if (!(search instanceof native.RegExp && search._captureNames)) {
                    return native.replace.apply(this, arguments);
                }

                if (typeof replacement === "function") {
                    return native.replace.call(this, search, function () {
                        /* Convert arguments[0] from a string primitive to a string object which can store properties */
                        arguments[0] = new String(arguments[0]);
                        /* Store named backreferences on the first argument before calling replacement */
                        for (var i = 0; i < search._captureNames.length; i++) {
                            if (search._captureNames[i]) arguments[0][search._captureNames[i]] = arguments[i + 1];
                        }
                        return replacement.apply(window, arguments);
                    });
                } else {
                    return native.replace.call(this, search, function () {
                        var args = arguments;
                        return native.replace.call(replacement, XRegExp._re.replacementVariable, function ($0, $1, $2) {
                            /* Numbered backreference or special variable */
                            if ($1) {
                                switch ($1) {
                                    case "$": return "$";
                                    case "&": return args[0];
                                    case "`": return args[args.length - 1].substring(0, args[args.length - 2]);
                                    case "'": return args[args.length - 1].substring(args[args.length - 2] + args[0].length);
                                    /* Numbered backreference */
                                    default:
                                        /* What does "$10" mean?
                                        - Backreference 10, if at least 10 capturing groups exist
                                        - Backreference 1 followed by "0", if at least one capturing group exists
                                        - Else, it's the string "$10" */
                                        var literalNumbers = "";
                                        $1 = +$1; /* Cheap type-conversion */
                                        while ($1 > search._captureNames.length) {
                                            literalNumbers = $1.toString().match(/\d$/)[0] + literalNumbers;
                                            $1 = Math.floor($1 / 10); /* Drop the last digit */
                                        }
                                        return ($1 ? args[$1] : "$") + literalNumbers;
                                }
                            /* Named backreference */
                            } else if ($2) {
                                /* What does "${name}" mean?
                                - Backreference to named capture "name", if it exists
                                - Else, it's the string "${name}" */
                                var index = search._captureNames.indexOf($2);
                                return index > -1 ? args[index + 1] : $0;
                            } else {
                                return $0;
                            }
                        });
                    });
                }
            };

            RegExp.prototype.exec = function (str) {
                var result = native.exec.call(this, str);
                if (!(this._captureNames && result && result.length > 1)) return result;

                for (var i = 1; i < result.length; i++) {
                    var name = this._captureNames[i - 1];
                    if (name) result[name] = result[i];
                }

                return result;
            };

            String.prototype.match = function (regexp) {
                if (!regexp._captureNames || regexp.global) return native.match.call(this, regexp);
                return regexp.exec(this);
            };
        })();
    }

    /* Regex syntax parsing with support for escapings, character classes, and various other context and cross-browser issues */
    XRegExp._re = {
        extended: /(?:[^[#\s\\]+|\\(?:[\S\s]|$)|\[\^?]?(?:[^\\\]]+|\\(?:[\S\s]|$))*]?)+|(\s*#[^\n\r]*\s*|\s+)([?*+]|{\d+(?:,\d*)?})?/g,
        singleline: /(?:[^[\\.]+|\\(?:[\S\s]|$)|\[\^?]?(?:[^\\\]]+|\\(?:[\S\s]|$))*]?)+|\./g,
        characterClass: /(?:[^\\[]+|\\(?:[\S\s]|$))+|\[\^?(]?)(?:[^\\\]]+|\\(?:[\S\s]|$))*]?/g,
        capturingGroup: /(?:[^[(\\]+|\\(?:[\S\s]|$)|\[\^?]?(?:[^\\\]]+|\\(?:[\S\s]|$))*]?|\((?=\?))+|\((?:<([$\w]+)>)?/g,
        namedBackreference: /(?:[^\\[]+|\\(?:[^k]|$)|\[\^?]?(?:[^\\\]]+|\\(?:[\S\s]|$))*]?|\\k(?!<[$\w]+>))+|\\k<([$\w]+)>(\d*)/g,
        replacementVariable: /(?:[^$]+|\$(?![1-9$&`']|{[$\w]+}))+|\$(?:([1-9]\d*|[$&`'])|{([$\w]+)})/g
    };

    XRegExp.overrideNative = function () {
        /* Override the global RegExp constructor/object with the XRegExp constructor. This precludes accessing
        properties of the last match via the global RegExp object. However, those properties are deprecated as
        of JavaScript 1.5, and the values are available on RegExp instances or via RegExp/string methods. It also
        affects the result of (/x/.constructor == RegExp) and (/x/ instanceof RegExp). */
        RegExp = XRegExp;
    };

    /* indexOf method from Mootools 1.11; MIT License */
    Array.prototype.indexOf = Array.prototype.indexOf || function (item, from) {
        var len = this.length;
        for (var i = (from < 0) ? Math.max(0, len + from) : from || 0; i < len; i++) {
            if (this[i] === item) return i;
        }
        return -1;
    };
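One of the less obvious pieces of the replacement handling is deciding what a token like "$10" refers to. The following standalone sketch isolates that rule; resolveNumbered is a hypothetical name used only for this illustration, and it assumes the digits never start with "0".

```javascript
// `digits` is the text after "$" (e.g. "10"); `groupCount` is the number of
// capturing groups in the regex. Returns the group index to substitute
// (0 means "no such group") plus any digits to output literally after it.
function resolveNumbered(digits, groupCount) {
    var index = +digits;
    var literal = "";
    while (index > groupCount) {
        literal = (index % 10) + literal; // move the last digit to the literal tail
        index = Math.floor(index / 10);   // and drop it from the index
    }
    return { index: index, literal: literal };
}

// resolveNumbered("10", 12): index 10, literal ""   (backreference 10)
// resolveNumbered("10", 3):  index 1,  literal "0"  (backreference 1, then "0")
// resolveNumbered("10", 0):  index 0,  literal "10" (the string "$10")
```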
You can download it, or get the packed version (2.7 KB).
XRegExp has been tested in IE 5.5–7, Firefox 2.0.0.4, Opera 9.21, Safari 3.0.2 beta for Windows, and Swift 0.2.
Finally, note that the XRE object from v0.1 has been removed. XRegExp now only creates one global variable: XRegExp. To permanently override the native RegExp constructor/object, you can now run XRegExp.overrideNative();