Learn how to write a regular expression:
---------------------------------------------------------------
http://geekswithblogs.net/brcraju/articles/235.aspx
What is a regular expression?
A regular expression is a pattern that can match various text strings; it is commonly used for validation.
Where and when to use regular expressions?
They can be used in any programming language that has a built-in regular expression class or supports a third-party regular expression library.
Regular expressions can validate different types of data without bloating the code with if and case conditions. A number of if conditions can often be replaced by a single regular expression check.
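As a rough illustration of that claim, the sketch below checks whether a string consists only of the digits 0 to 9, first with a hand-written loop and then with a single pattern. The helper names IsAllDigitsManual and IsAllDigitsRegex are hypothetical and exist only for this comparison; the sample inputs are assumptions as well.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: validates character by character with explicit conditions.
bool IsAllDigitsManual(string text)
{
    foreach (char c in text)
    {
        if (c < '0' || c > '9')
        {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: the same rule expressed as one pattern.
bool IsAllDigitsRegex(string text)
{
    return Regex.IsMatch(text, "^[0-9]*$");
}

Console.WriteLine(IsAllDigitsManual("2003")); // True
Console.WriteLine(IsAllDigitsRegex("2003"));  // True
Console.WriteLine(IsAllDigitsManual("20a3")); // False
Console.WriteLine(IsAllDigitsRegex("20a3"));  // False
```

If the rule later changes (say, to allow a leading minus sign), only the pattern string needs to change in the second helper, while the loop in the first would need new conditions.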
Benefits of regular expressions:
The following are some benefits (not an exhaustive list) of using regular expressions.
A) The number of lines of code can be reduced.
B) Faster coding.
C) Easy maintenance (if the validation criteria change, you don't need to change the code, just the regular expression string).
D) Easy to understand (you don't need to follow the programmer's logic through large if and case statements).
Elements of regular expression:
Here are the basic elements of Regular Expression characters/literals, which can be used to build big Regular Expressions:
^ ----> Start of a string.
$ ----> End of a string.
. ----> Any character (except \n newline).
{...} ----> Explicit quantifier notation.
[...] ----> Explicit set of characters to match.
(...) ----> Logical grouping of part of an expression.
* ----> 0 or more of previous expression.
+ ----> 1 or more of previous expression.
? ----> 0 or 1 of previous expression; also forces minimal matching when an expression might match several strings within a search string.
\ ----> Preceding one of the above, it makes it a literal instead of a special character. Preceding certain other characters, it forms a special matching character; see below.
\w ----> matches any word character, roughly equivalent to [a-zA-Z0-9_].
\W ----> matches any non-word character, roughly equivalent to [^a-zA-Z0-9_].
\s ----> matches any white space character, equivalent to [ \f\n\r\t\v].
\S ----> matches any non-white space character, equivalent to [^ \f\n\r\t\v].
\d ----> matches any decimal digit, equivalent to [0-9] for ASCII input.
\D ----> matches any non-digit character, equivalent to [^0-9] for ASCII input.
\a ----> matches a bell (alarm) \u0007.
\b ----> matches a backspace \u0008 if in a [] character class; otherwise it denotes a word boundary.
\t ----> matches a tab \u0009.
\r ----> matches a carriage return \u000D.
\v ----> matches a vertical tab \u000B.
\f ----> matches a form feed \u000C.
\n ----> matches a new line \u000A.
\e ----> matches an escape \u001B.
$number ----> Substitutes the last substring matched by group number (decimal).
${name} ----> Substitutes the last substring matched by a (?<name> ) group.
$$ ----> Substitutes a single "$" literal.
$& ----> Substitutes a copy of the entire match itself.
$` ----> Substitutes all the text of the input string before the match.
$' ----> Substitutes all the text of the input string after the match.
$+ ----> Substitutes the last group captured.
$_ ----> Substitutes the entire input string.
(?(expression)yes|no) ----> Matches the yes part if expression matches, otherwise the no part; the no part can be omitted.
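The substitution elements at the end of this list are used in replacement strings rather than in the pattern itself. A minimal sketch with Regex.Replace follows; the input strings and group names (first, last) are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical inputs, chosen only to illustrate the substitution elements.
// $1 and $2 refer to the first and second numbered groups.
string swappedByNumber = Regex.Replace("John Smith", @"^(\w+) (\w+)$", "$2, $1");

// ${name} refers to a named group.
string swappedByName = Regex.Replace(
    "John Smith", @"^(?<first>\w+) (?<last>\w+)$", "${last}, ${first}");

// $& is the entire match; $$ is a literal dollar sign.
string bracketed = Regex.Replace("price 42", @"\d+", "[$&]");
string withDollar = Regex.Replace("price 42", @"\d+", "$$$&");

Console.WriteLine(swappedByNumber); // Smith, John
Console.WriteLine(swappedByName);   // Smith, John
Console.WriteLine(bracketed);       // price [42]
Console.WriteLine(withDollar);      // price $42
```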
Simple Example:
Let us start with a small example: validating integer values.
An integer is always made up of a fixed series of characters, i.e. 0 to 9, and we will use that to write the regular expression in steps.
A) The regular expression starts with "^".
B) As we are validating against a set of characters, we can use [].
C) So the expression becomes "^[1234567890]".
D) As the series is continuous, we can use "-" to reduce the length of the expression. It becomes "^[0-9]".
E) This works for only one digit; to make it work for any number of digits, we can use "*". Now the expression becomes "^[0-9]*". (Because "*" also allows zero digits, an empty string passes; use "+" instead if at least one digit is required.)
F) As the expression started with "^", it should end with "$", so the final expression becomes "^[0-9]*$".
Note: Double quotes are not part of the expression; I used them just to set the expressions apart from the sentences.
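The effect of each step can be checked with Regex.IsMatch. The following is a small sketch, not part of the original walkthrough; the sample inputs are assumptions picked to show where each intermediate pattern falls short.

```csharp
using System;
using System.Text.RegularExpressions;

// Steps C/D: only the first character is checked, so trailing letters slip through.
bool firstCharOnly = Regex.IsMatch("1abc", "^[0-9]");
Console.WriteLine(firstCharOnly); // True

// Step F: anchoring both ends validates the whole string.
bool allDigits = Regex.IsMatch("12345", "^[0-9]*$");
bool mixed = Regex.IsMatch("12a45", "^[0-9]*$");
Console.WriteLine(allDigits); // True
Console.WriteLine(mixed);     // False

// "*" means zero or more, so the empty string is accepted; "+" requires one digit.
bool emptyWithStar = Regex.IsMatch("", "^[0-9]*$");
bool emptyWithPlus = Regex.IsMatch("", "^[0-9]+$");
Console.WriteLine(emptyWithStar); // True
Console.WriteLine(emptyWithPlus); // False
```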
Is this the only way to write it?
This is one of the ways you can write a regular expression. Depending on the requirements and personal expertise, a regular expression can be made much shorter; for example, the above regular expression can be reduced as follows.
A) The regular expression starts with "^".
B) As we are checking for digits, there is a special character to check for digits: "\d".
C) As digits can follow digits, we use "*".
D) As the expression ends with "$", the final regular expression becomes
"^\d*$"
Digits can be validated with different regular expressions:
1) ^[1234567890]*$
2) ^[0-9]*$
3) ^\d*$
Which one to choose?
All of the above expressions work the same way for the digits 0 to 9 (in .NET, \d additionally matches other Unicode decimal digits unless RegexOptions.ECMAScript is specified). Choose the one you are comfortable with. It is always recommended to keep expressions short, self-expressive, and understandable, as this matters when you write big regular expressions.
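A quick way to compare the three forms is to run them against the same inputs. The sketch below does that, and also shows the one case in .NET where \d behaves differently from [0-9]: decimal digits from other scripts. The sample inputs are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string[] patterns = { "^[1234567890]*$", "^[0-9]*$", @"^\d*$" };
string[] inputs = { "2003", "20a3" };

// All three patterns agree on plain ASCII input.
foreach (string pattern in patterns)
{
    foreach (string input in inputs)
    {
        Console.WriteLine($"{pattern} on \"{input}\": {Regex.IsMatch(input, pattern)}");
    }
}

// Arabic-Indic digits one, two, three: Unicode decimal digits outside 0-9.
string arabicIndic = "\u0661\u0662\u0663";
bool bracketClass = Regex.IsMatch(arabicIndic, "^[0-9]*$");
bool defaultD = Regex.IsMatch(arabicIndic, @"^\d*$");
bool ecmaD = Regex.IsMatch(arabicIndic, @"^\d*$", RegexOptions.ECMAScript);

Console.WriteLine(bracketClass); // False
Console.WriteLine(defaultD);     // True
Console.WriteLine(ecmaD);        // False
```

If only 0 to 9 should ever be accepted, [0-9] (or \d with RegexOptions.ECMAScript) states that intent most directly.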
Example on exclude options:
There are situations which require us to exclude only a certain portion or certain characters.
E.g.: a) accept all alphanumeric characters and special symbols except "&"
b) accept all digits except "7"
In such cases we need not prepare a big list which includes everything that is allowed; instead we match everything and exclude the characters/symbols which must not be accepted.
E.g.: "^[^&]*$" is the solution to accept all alphanumeric characters and special symbols except "&".
Other examples:
A) There should not be "1" as the first digit:
^[^1]\d*$ excludes 1 as the first digit. (Note that [^1] also accepts a non-digit in the first position; use ^[02-9]\d*$ to allow digits only.)
B) There should not be "1" at any place:
^[^1]*$ excludes 1 at any place in the sequence. (Use ^[02-9]*$ to allow digits only.)
Note: Here the ^ operator is used not only to mark the start of the string but also, as the first character inside [], to negate the set.
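The exclusion idea can be tried out as follows. This is a sketch under the assumption that the goal is "any character except the excluded one"; the patterns are this sketch's own reading of the examples above, and the digit-only variants using [02-9] are an addition of the sketch, not something the article prescribes.

```csharp
using System;
using System.Text.RegularExpressions;

// Anything except "&".
bool noAmpOk = Regex.IsMatch("a+b=c", "^[^&]*$");
bool noAmpBad = Regex.IsMatch("a&b", "^[^&]*$");
Console.WriteLine(noAmpOk);  // True
Console.WriteLine(noAmpBad); // False

// First character must not be "1"; [^1] alone also lets a letter through.
bool firstNotOne = Regex.IsMatch("2345", @"^[^1]\d*$");
bool firstIsOne = Regex.IsMatch("1234", @"^[^1]\d*$");
bool letterFirst = Regex.IsMatch("x234", @"^[^1]\d*$");
bool letterFirstStrict = Regex.IsMatch("x234", @"^[02-9]\d*$");
Console.WriteLine(firstNotOne);       // True
Console.WriteLine(firstIsOne);        // False
Console.WriteLine(letterFirst);       // True
Console.WriteLine(letterFirstStrict); // False

// No "1" anywhere, digits only.
bool noOneAnywhere = Regex.IsMatch("2030", "^[02-9]*$");
bool oneInside = Regex.IsMatch("2130", "^[02-9]*$");
Console.WriteLine(noOneAnywhere); // True
Console.WriteLine(oneInside);     // False
```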
Testing of regular expression:
There are several ways of testing a regular expression:
A) You can write a Windows based program.
B) You can write a Web based application.
C) You can even write a service based application.
Windows-based sample code:
Here are the steps for checking a regular expression in .NET:
A) Import the System.Text.RegularExpressions namespace to use the Regex class.
B) Create a Regex object as follows:
Regex regDollar = new System.Text.RegularExpressions.Regex("^[0-9]*$");
C) Call the IsMatch(string) method of the Regex class, which returns true or false.
D) Depending on the returned value, you can decide whether the passed string is valid for the regular expression or not.
Here is the code snippet as a function:
public bool IsValid(string regexpObj, string passedString)
{
    // This method checks directly, without any exception handling.
    Regex regDollar = new System.Text.RegularExpressions.Regex(regexpObj);
    return regDollar.IsMatch(passedString);
}
With minor changes, the above function can be used in a Windows-based or Web-based application, or even as a service.
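A possible way to exercise the function from a small console program is sketched below. It restates the function as a local function so the block runs on its own; the sample inputs and the deliberately broken pattern are assumptions for illustration. Since the function does no exception handling, the sketch also shows that an unparsable pattern makes the Regex constructor throw ArgumentException.

```csharp
using System;
using System.Text.RegularExpressions;

// Same check as the article's function, written as a local function.
bool IsValid(string regexpObj, string passedString)
{
    Regex regDollar = new Regex(regexpObj);
    return regDollar.IsMatch(passedString);
}

Console.WriteLine(IsValid("^[0-9]*$", "12345")); // True
Console.WriteLine(IsValid("^[0-9]*$", "12a45")); // False

// A caller that accepts patterns from users may want to catch this case.
bool patternRejected = false;
try
{
    IsValid("[0-9", "12345"); // missing closing bracket
}
catch (ArgumentException)
{
    patternRejected = true;
}
Console.WriteLine(patternRejected); // True
```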
Another way -- online checking:
Finally, if you are fed up with the above, have an Internet connection, and don't have time to write a sample, use the following link to test online:
http://www.regexplib.com/RETester.aspx
More Info:
You can find more information on these types of characters here:
http://msdn.microsoft.com/library/default.asp?url=/library/en-US/cpgenref/html/cpconcharacterescapes.asp
http://msdn.microsoft.com/library/default.asp?url=/library/en-US/cpgenref/html/cpconcharacterclasses.asp
http://msdn.microsoft.com/library/default.asp?url=/library/en-US/cpgenref/html/cpcongroupingconstructs.asp
-- Here is the end of the article; I hope this basic foundation will definitely be useful for writing big and good regular expressions ---
Express your code with regular expressions :))
Posted on Thursday, October 23,200 AM
Feedback # Re: Learn How to Write a regular expression:
An excellent article. Great write-up, Raju.
Some of the finer points I believe you can add to this article are:
1. The input is considered to be text which is parsed to return the matches.
2. The ^ regex literal matches the start of a line of input.
3. The $ literal matches the end of a line of input.
4. Hence the example about integers which you have mentioned would not do anything other than validate whether the complete input is a numeric word or not.
If the input is "I am 123", it will not match the input or return the 123 in it.
It will only return matches for inputs like "123" or "234".
So if you are trying to convey something like "this pattern is going to return all the numbers in the given input" (I understood it that way after reading through your article), then it should be modified to \d*. That's more than enough.
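The difference between validating a whole input and extracting numbers from it can be sketched as below. The sketch uses \d+ rather than \d* for extraction, which is its own choice: \d* can also match the empty string between non-digit characters, so Regex.Matches would return empty matches alongside the number.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "I am 123";

// Anchored: validates the whole input, so this mixed text is rejected.
bool wholeInputIsNumeric = Regex.IsMatch(input, @"^\d*$");
Console.WriteLine(wholeInputIsNumeric); // False

// Unanchored: scans the input and returns each run of digits.
MatchCollection numbers = Regex.Matches(input, @"\d+");
foreach (Match m in numbers)
{
    Console.WriteLine(m.Value); // 123
}
```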
Another point I would like you to add: you can ask people who are new to regex to use tools like Expresso (http://www.codeproject.com/dotnet/Expresso.asp), with which you can build expressions and test them immediately (the beauty of this tool is that it uses plain English to help you build the expression). A must-have for a developer who is new to regex.
Regards,
Ansari
10/25/2003 7:45 am | Tameem Ansari # Re: Learn How to Write a regular expression:
Thanks, Ansari, for your input. I definitely agree with your four points. I tried to concentrate on an easy way of understanding, which can be applied to a small set of characters, like integers, and you are right that I concentrated mostly on numerics, except in the section
"Example on exclude options"
where I have given an example of excluding & from an alphanumeric string. At the end of the string it has printed some junk:
"&amp;#8221;
Actually it is & within double quotes; I think the site is not handling that part.
I should have mentioned this tool; the tool (Expresso) is cool and very useful for beginners to play with. Thanks for reminding me. 10/27/2003 6:33 am | ramchander # Re: Learn How to Write a regular expression:
DOTNET nuke 2.2
10/5/2004 :59 am | bangtech, Inc. # Please send me the RegEx for validating URL in ASP
Please send me the RegEx for validating URL in ASP
10/7/2004 11 am | Deepak Chauhan # Re: Learn How to Write a regular expression:
A very good article!
By the way, could you tell me how to validate a data field with Unicode support?
For example, I want to check that the input name contains only these characters (a-z, A-Z, 0-9).
So, I created the regexp: ^\[a-zA-Z0-9]*$.
When I input the string, e.g. "Smith", it is OK. But when I put
"Freinke's t finished", the regex didn't work. Do you have any ideas?
Thanks so much!
Am | Nghia NGO # Re: Learn How to Write a regular expression:
Nghia NGO,
In your example you have used a special character, the single quote, in the input, which you have not included in your regular expression. With respect to Unicode support,
look at the following lines from the Unicode organization:
Unicode is a large character set; regular expression engines that are only adapted to handle small character sets will not scale well.
Unicode encompasses a wide variety of languages which can have very different characteristics than English or other Western European text.
The following link briefly outlines how you can write expressions for your purpose with regard to Unicode, with examples:
http://www.unicode.org/reports/tr18/tr18-9.html
3/11/2005 8: 55 am | ramchander # Re: Learn How to Write a regular expression:
Hi there, useful article. Thanks.
Is it possible to check the following expression:
1. Matches any word ([a-zA-Z0-9])
2. Quantified: from 4 to 10, excluding A, B, and C words.
Thank you
5/10/2005 Pm | Ruslan # Regular Expression 5/16/2005 am | C # developer's blog # Re: Learn How to Write a regular expression:
I've used this regex to find HTML remarks, even if they contain tags:
"(<!--(?>[^<>]+|<(?<Nest>)|>(?<-Nest>))*(?(Nest)(?!)>)"
The only thing is, I want to ignore RTF color tags but still match the
rest of the expression. To find RTF color tags I use:
"(\cf\d{1})"
(which matches '\cf2, \cf1 etc'). I cannot match it into its own group.
Is there a regex for exclusive matching?
OK, this may be the deep end, so I might have to settle for Regex.Split...
5/16/2005 Pm | Dan # Regular Expression 5/16/2005 am | C # developer's blog # Re: Learn How to Write a regular expression:
I am trying to validate an email address but would like to exclude certain domains (e.g. hotmail, Yahoo etc.). I am currently using the following, but it doesn't seem to work.
"*\w+([-+.]\w+)*@[^(Hotmail|Yahoo)]\w+([-.]\w+)*\\.\w+([-.]\w+)**"
Could you help? Thanks!
Nirmal
Nirmal.parikh@comcast.net
5/26/2005 12:31 am | Nirmal # Re: Learn How to Write a regular expression:
How do I match double quotes since I am specifying the pattern inside the quotes?
It doesn't like it when I try to escape it:
For example, matching on a word inside quotes like below does not work.
Regexp myreg = new Regexp ("\" W + \"");
6/21/2005 :21 am | Ali # Re: Learn How to Write a regular expression:
Hi,
I have an email validation, but it does not allow hyphens in the domain name. I have tried inserting '-' into the [] like this: [-a-zA-Z], but it still returns false. How do I allow hyphens?
if (str.match(/^(.*|[A-Za-z]\w*)@(\[\d{1,3}(\.\d{1,3}){3}]|[A-Za-z]\w*(\.[A-Za-z]\w*)+)$/) == null)
{
    alert("the e-mail address seems incorrect.");
    return false;
}
6/23/2005 :57 Pm | Shane # Re: Learn How to Write a regular expression:
How do I write a regular expression (in PHP) for a password?
The password must have a capital letter, a digit, and a small letter, and must not contain (IBM, sun, HP). Maximum length is 16.
Thanks
6/28/2005 Pm | George