Application of Regular Expressions in UBB Forums



I. Reader's guide

This guide outlines the article so that you can decide early whether it is for you, instead of reading most of it before finding out that it is not.
If you write programs in ASP, or are building something like a BBS, a guestbook or form-data validation, it is worth a look.

If you are already familiar with regular expressions, you do not need to read line by line; just look at the templates I wrote, compare them with your own, and take what is useful.
If this is your first contact with regular expressions, you had better read line by line and test as you go.

Once you have mastered regular expressions, you will find them fun.

II. The concept of regular expressions

What is UBB code? What is a regular expression?

UBB code is a variant of HTML. In general, UBB forums do not allow you to use HTML code; UBB code is used in its place.
UBB code is a set of popular tags with a fixed, uniform format. Users get the effect they want as long as they follow the code rules. For example:
To show "how are you" in bold, you enter [b]how are you[/b] instead of typing <b>how are you</b>

You may ask: how does ASP convert [b]how are you[/b] to <b>how are you</b>?
The answer to this question is: use regular expressions.
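As a minimal sketch of that conversion, here is the same idea in Python's `re` module, used here only as a runnable stand-in for the VBScript RegExp object the article relies on. The function name `ubb_bold_to_html` is hypothetical, and the non-greedy `(.+?)` is a choice of this sketch so that two bold runs on one line stay separate.

```python
import re

def ubb_bold_to_html(text: str) -> str:
    """Convert [b]...[/b] pairs into <b>...</b>."""
    # \[ and \] escape the square brackets, which are otherwise special.
    # (.+?) captures the text between the tags; \1 re-inserts it.
    return re.sub(r"\[b\](.+?)\[/b\]", r"<b>\1</b>", text, flags=re.IGNORECASE)

print(ubb_bold_to_html("[b]how are you[/b]"))  # <b>how are you</b>
```
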

III. Uses of regular expressions

When we process web form data (especially in a UBB forum), we often need data validation and string substitution. A UBB forum in particular has to do a lot of security checking and string substitution.

Posts in an ordinary forum do not support HTML syntax, which makes it impossible for users to change fonts, insert pictures, and so on. The forum thereby loses a powerful way to attract users, and a full-featured forum matters a great deal in attracting them. Hence the UBB solution: even though the forum does not support HTML syntax, users can still customize the style of their posts, insert pictures, add links, embed pages and much more, achieving much the same effect as HTML support, while the forum is far more secure than one that accepts HTML. Users are basically unable to mount malicious attacks on the forum.

IV. Syntax rules and tags for regular expressions

Now we formally begin learning regular expressions. I will explain their use through examples. After reading, you will find that writing UBB code is simple; follow me step by step and by the end of this article you will be a UBB master. The exciting part is that you can write your own UBB tags and no longer need to copy ready-made code and templates from other people. Fortunately, VBScript 5.0 gives us a "regular expression" object (RegExp); as long as IE 5.x is installed on your server, it will run.

Character descriptions:

The ^ symbol matches the beginning of a string. For example:
^abc matches "abc xyz" but does not match "xyz abc"

The $ symbol matches the end of a string. For example:
abc$ matches "xyz abc" but does not match "abc xyz".
Note: If you use both the ^ symbol and the $ symbol, an exact match is made. For example:
^abc$ only matches "abc"
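The anchor rules above can be checked with a short sketch. It uses Python's `re.search`, which scans the whole string, so only the anchors decide where a match may sit; the helper name `matches` is hypothetical.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

print(matches(r"^abc", "abc xyz"))   # True: "abc" is at the start
print(matches(r"^abc", "xyz abc"))   # False
print(matches(r"abc$", "xyz abc"))   # True: "abc" is at the end
print(matches(r"abc$", "abc xyz"))   # False
print(matches(r"^abc$", "abc"))      # True: exact match
print(matches(r"^abc$", "abc abc"))  # False
```
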

The * symbol matches 0 or more of the preceding character. For example:
ab* can match "a", "ab", "abb", "abbb", etc.

The + symbol matches at least one of the preceding character. For example:
ab+ can match "ab", "abb", "abbb" and so on, but does not match "a".

The ? symbol matches 0 or 1 of the preceding character. For example:
ab?c? can and only can match "a", "ab", "ac" and "abc"

The . symbol matches any character other than a newline character. For example:
(.)+ matches any string that contains no line breaks

x|y matches "x" or "y". For example:
abc|xyz can match "abc" or "xyz", while "ab(c|x)yz" matches "abcyz" and "abxyz"

{n} matches the preceding character exactly n times (n is a non-negative integer). For example:
a{2} can match "aa", but does not match "a"

{n,} matches the preceding character at least n times (n is a non-negative integer). For example:
a{3,} matches "aaa", "aaaa" and so on, but does not match "a" and "aa".
Note: a{1,} is equivalent to a+
a{0,} is equivalent to a*

{m,n} matches the preceding character at least m and at most n times. For example:
a{1,3} only matches "a", "aa" and "aaa".
Note: a{0,1} is equivalent to a?
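The quantifier, dot and alternation rules above can be verified with a small table-driven sketch. It uses Python's `re.fullmatch`, so each pattern must cover the whole test string; the helper name `whole` and the list `CHECKS` are hypothetical.

```python
import re

def whole(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# (pattern, text, expected result)
CHECKS = [
    ("ab*", "a", True), ("ab*", "abbb", True),
    ("ab+", "ab", True), ("ab+", "a", False),
    ("ab?c?", "ac", True), ("ab?c?", "abbc", False),
    ("(.)+", "abc", True), ("(.)+", "a\nb", False),
    ("abc|xyz", "xyz", True), ("ab(c|x)yz", "abxyz", True),
    ("a{2}", "aa", True), ("a{2}", "a", False),
    ("a{3,}", "aaaa", True), ("a{3,}", "aa", False),
    ("a{1,3}", "aaa", True), ("a{1,3}", "aaaa", False),
]

for pattern, text, expected in CHECKS:
    print(f"{pattern!r:14} {text!r:10} {whole(pattern, text)}")
```
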

[xyz] represents a character set and matches any one of the characters in the brackets. For example:
[abc] matches "a", "b" and "c"

[^xyz] represents a negated character set and matches any character that is not in the brackets. For example:
[^abc] can match any character except "a", "b" and "c"

[a-z] represents a range of characters and matches any character within the specified interval. For example:
[a-z] matches any lowercase letter from "a" to "z"

[^m-n] represents the characters outside a range and matches any character that is not in the specified range. For example:
[^m-n] matches any character except those from "m" to "n"
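A quick sketch of the four character-set forms, using Python's `re.findall` to list every single character each set accepts. The helper name `chars` and the sample strings are illustrative only.

```python
import re

def chars(pattern: str, text: str) -> list:
    """Return every single-character match of the pattern in the text."""
    return re.findall(pattern, text)

print(chars(r"[abc]", "a1b2c3d4"))     # ['a', 'b', 'c']
print(chars(r"[^abc]", "a1b2c3d4"))    # ['1', '2', '3', 'd', '4']
print(chars(r"[a-z]", "Hello World"))  # lowercase letters only
print(chars(r"[^m-n]", "lmno"))        # ['l', 'o']
```
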

The \ symbol is the escape operator. For example:
\n line feed
\f form feed
\r carriage return
\t tab
\v vertical tab

\\ matches "\"
\/ matches "/"

\s matches any whitespace character, including spaces, tabs, form feeds and so on. Equivalent to "[ \f\n\r\t\v]"
\S matches any non-whitespace character. Equivalent to "[^ \f\n\r\t\v]"
\w matches any word character, including letters, digits and the underscore. Equivalent to "[A-Za-z0-9_]"
\W matches any non-word character. Equivalent to "[^A-Za-z0-9_]"

\b matches a word boundary, that is, the position at the start or end of a word. For example:
ve\b matches the "ve" in the word "love" and so on, but does not match "very", "even" and so on

\B matches a position that is not a word boundary. For example:
ve\B matches the "ve" in the word "very" and so on, but does not match "love" and so on

\d matches a numeric character, equivalent to [0-9]. For example:
abc\dxyz matches "abc2xyz", "abc4xyz", etc., but does not match "abcaxyz", "abc-xyz", etc.

\D matches a non-numeric character, equivalent to [^0-9]. For example:
abc\Dxyz matches "abcaxyz", "abc-xyz", etc., but does not match "abc2xyz", "abc4xyz", etc.
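The shorthand classes and the word-boundary assertions can be tried out as follows. This is a sketch in Python, where `\w` and `\d` also accept non-ASCII letters and digits by default, which does not affect these ASCII samples; the helper name `found` is hypothetical.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Return True if the pattern occurs anywhere in the text."""
    return re.search(pattern, text) is not None

# \b: "ve" must be followed by a word boundary (end of the word).
print(found(r"ve\b", "love"), found(r"ve\b", "very"))  # True False
# \B: "ve" must NOT be followed by a word boundary.
print(found(r"ve\B", "very"), found(r"ve\B", "love"))  # True False
# \d accepts a digit, \D accepts anything that is not a digit.
print(found(r"abc\dxyz", "abc2xyz"), found(r"abc\dxyz", "abc-xyz"))  # True False
print(found(r"abc\Dxyz", "abc-xyz"), found(r"abc\Dxyz", "abc2xyz"))  # True False
# \s splits on whitespace runs, \w+ collects word characters.
print(re.split(r"\s+", "a b\tc\nd"))        # ['a', 'b', 'c', 'd']
print(re.findall(r"\w+", "user_1 @ home!"))  # ['user_1', 'home']
```
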

\num (where num is a positive integer) is a back-reference to the remembered match number num. For example:
(.)\1 matches two consecutive identical characters.

An octal escape, written as \ followed by octal digits, matches the character with that octal value (the value must be less than 256). For example:
\011 matches a tab

\xnum matches the character whose hexadecimal value is num (the value must be less than 256). For example:
\x41 matches the character "A"
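A short sketch of the back-reference and the numeric escapes, again in Python's `re` as a stand-in. The helper name `doubled` is hypothetical; note that Python writes the octal escape as three digits (`\011`).

```python
import re

def doubled(text: str) -> list:
    """Return every run of two consecutive identical characters."""
    # (.) remembers one character; \1 requires the same character again.
    return [m.group(0) for m in re.finditer(r"(.)\1", text)]

print(doubled("aabbcd"))                         # ['aa', 'bb']
print(re.fullmatch(r"\x41", "A") is not None)    # True: hex 41 is "A"
print(re.fullmatch(r"\011", "\t") is not None)   # True: octal 011 is tab
```
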


V. Example analysis

1. Accurately find link addresses in a string

((http|https|ftp):(\/\/|\\\\)((\w)+[.]){1,}(net|com|cn|org|cc|tv|[0-9]{1,3})(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*(((([?](\w)+){1}[=]*))*((\w)+){1}([\&](\w)+[\=](\w)+)*)*)

As we know, link addresses generally begin with http, https or ftp. As a first summary, a link address must meet the following conditions:

Condition 1
It starts with http:// or https:// or ftp:// (there are of course other forms; only the main ones are covered here)

Condition 2
http:// must be followed by word characters and then a "." (this combination must occur one or more times). After the last "." comes the domain suffix (such as net, com or cn; for an IP address it can be a number)

Condition 3
After the host part, one or more levels of directories may also appear (note that a personal home page address may contain the "~" symbol)

Condition 4
The link address can end with parameters, such as the typical ?pageno=2&action=display

Now we match the above conditions one by one with the following code:

1. ((http|https|ftp):(\/\/|\\\\) satisfies condition 1
It means that http://, http:\\, https://, https:\\, ftp:// and ftp:\\ all match (this allows for users who mistakenly type "\\" instead of "//")
Note: "|" means "or" and "\" is the escape character. "\/\/" means "//" and "\\\\" means "\\"

2. ((\w)+[.]){1,}(net|com|cn|org|cc|tv|[0-9]{1,3}) satisfies condition 2
"((\w)+[.]){1,}" means that a run of word characters followed by a dot can appear one or more times (this allows for users who like to omit www and write http://www.w3c.com as http://w3c.com)
"(net|com|cn|org|cc|tv|[0-9]{1,3})" means the host must end with net, com, cn, org, cc, tv, or a number of up to three digits
"[0-9]{1,3}" represents a number of up to three digits, because no segment of an IP address can exceed 255

3. (((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)* satisfies condition 3
"(\/[\~]*|\\[\~]*)" means "/" or "\", optionally followed by "~" ("[\~]*" means ~ may or may not appear)
"((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+" means that word characters must follow (that is, a directory name, or a file name with an extension)
Note: The final "*" means that the bracketed group above may appear any number of times or not at all, because not every link address has a subdirectory; without it, only link addresses with exactly one directory level would match.

4. (((([?](\w)+){1}[=]*))*((\w)+){1}([\&](\w)+[\=](\w)+)*)* satisfies condition 4
"((([?](\w)+){1}[=]*))*((\w)+){1}" means that a string such as "?pageno=2" may or may not appear; if it does, the "?" occurs only once (because a link address does not contain two "?" characters).

"([\&](\w)+[\=](\w)+)*" means that a string such as "&action=display" may or may not appear (since not every page has two or more parameters).

The whole "(((([?](\w)+){1}[=]*))*((\w)+){1}([\&](\w)+[\=](\w)+)*)*" means that a string such as "?pageno=2&action=display" may or may not appear (that is, the link address may or may not have parameters)


With the combination above, we can match link addresses fairly comprehensively, rather than using a simple "(http:\/\/\S+)" to match them. Readers can test and compare the two themselves. Of course, this code still has many deficiencies; I hope you will keep improving it.
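To make that comparison concrete, here is a sketch that runs both patterns in Python's `re` module (a stand-in for VBScript's RegExp; the pattern syntax used here is common to both). The names `LINK_PATTERN`, `SIMPLE_PATTERN` and `find_links` are hypothetical, and the sample strings are made up for illustration.

```python
import re

# The article's link pattern, split over several lines for readability.
LINK_PATTERN = re.compile(
    r"((http|https|ftp):(\/\/|\\\\)((\w)+[.]){1,}"
    r"(net|com|cn|org|cc|tv|[0-9]{1,3})"
    r"(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*"
    r"(((([?](\w)+){1}[=]*))*((\w)+){1}([\&](\w)+[\=](\w)+)*)*)",
    re.IGNORECASE,
)
# The simple alternative mentioned in the text.
SIMPLE_PATTERN = re.compile(r"http:\/\/\S+", re.IGNORECASE)

def find_links(text: str) -> list:
    """Return every link address the detailed pattern finds."""
    return [m.group(0) for m in LINK_PATTERN.finditer(text)]

sample = "(see http://w3c.com/a.html)."
print(find_links(sample))              # stops before the ")"
print(SIMPLE_PATTERN.findall(sample))  # also swallows the trailing ")."
print(find_links("see http://www.w3c.com/page?pageno=2&action=display and ftp://192.168.0.1"))
```

The simple pattern takes everything up to the next whitespace, so trailing punctuation ends up inside the link; the detailed pattern stops where the address structure ends.
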

2. Replace a typical UBB tag: [b][/b]
Our aim is to replace the [b][/b] pair with <b></b>. Here is the template that does it:
(\[b\])(.+)(\[\/b\])
Here we use "(.+)" to match the entire string between [b] and [/b]. When substituting, we write:
str=checkexp(re,str,"<b>$2</b>")
(Note: checkexp is my custom function, which is given later. It replaces [b][/b] according to the template we provide.)

You might ask what "$2" is here. This $2 is important: it represents the entire string that "(.+)" matched.
Why is it $2 and not $1 or $3? Because $1 represents the "[b]" string that (\[b\]) matched, and $3 represents the "[/b]" string that (\[\/b\]) matched. Clearly what we need here is $2, not $1 or $3.
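The group numbering can be inspected directly. The sketch below uses Python's `re`, where a group reference in the replacement is written `\2` rather than VBScript's `$2`; the sample sentence and the variable names are made up.

```python
import re

BOLD = re.compile(r"(\[b\])(.+)(\[\/b\])", re.IGNORECASE)
sample = "say [b]how are you[/b] now"

m = BOLD.search(sample)
print(m.group(1))  # [b]          (what $1 holds)
print(m.group(2))  # how are you  (what $2 holds)
print(m.group(3))  # [/b]         (what $3 holds)

# Keep only group 2 and wrap it in real HTML tags.
converted = BOLD.sub(r"<b>\2</b>", sample)
print(converted)   # say <b>how are you</b> now
```
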


VI. UBB regular expression template example
Here is a UBB function I wrote that basically turns your forum into a good UBB-code forum. Of course, by improving it, you can get a more powerful UBB forum.

Function rethestr(str) ' str is the post text to convert
Dim re

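re="\>" ' Escape ">" so that raw HTML is displayed as text
str=checkexp(re,str,"&gt;")

re="\<" ' Escape "<"
str=checkexp(re,str,"&lt;")

re="\n\r\n" ' A blank line becomes a paragraph break
str=checkexp(re,str,"<P>")

re=Chr(32) ' A space becomes a non-breaking space
str=checkexp(re,str,"&nbsp;")

re="\r" ' Drop remaining carriage returns
str=checkexp(re,str,"")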

re="\[img\]((http:(\/\/|\\\\)){1}((\w)+[.]){1,3}(net|com|cn|org|cc|tv)(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*(\w)+[.]{1}(gif|jpg|png))\[\/img\]" ' Find picture address
str=checkexp(re,str,"<img src='$1'>") ' Assumed form: the original replacement string is missing from the source

re="\[w\](http:(\/\/|\\\\)((\w)+[.]){1,}(net|com|cn|org|cc|tv)(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*(((([?](\w)+){1}[=]*))*((\w)+){1}([\&](\w)+[\=](\w)+)*)*)\[\/w\]" ' Find frame address
str=checkexp(re,str,"<iframe src='$1'></iframe>") ' The original width and height values are missing from the source

re="([^('>])(<br>)*((http|https|ftp):(\/\/|\\\\)((\w)+[.]){1,}(net|com|cn|org|cc|tv|([0-9]{1,3}))(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*(((([?](\w)+){1}[=]*))*((\w)+){1}([\&](\w)+[\=](\w)+)*)*)" ' Find link address
str=checkexp(re,str,"$1$2<a href='$3' target=_blank>$3</a>")

re="([^(http://|http:\\)])((www|cn)[.](\w)+[.]{1,}(net|com|cn|org|cc)(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*(((([?](\w)+){1}[=]*))*((\w)+){1}([\&](\w)+[\=](\w)+)*)*)" ' Find addresses that do not start with http://
str=checkexp(re,str,"$1<a href='http://$2' target=_blank>$2</a>")

re="([^(=)])((\w)+[@]{1}((\w)+[.]){1,3}(\w)+)" ' Find mail address
str=checkexp(re,str,"$1<a href='mailto:$2'>$2</a>")

re="\[color=(#([0-9a-f]{6})|(\w)+)\]((.)+)\[\/color\]" ' Replace font color (the opening part of this pattern is reconstructed)
str=checkexp(re,str,"<font color='$1'>$4</font>")

re="\[size=([0-9]{1})\]((.)+)\[\/size\]" ' Replace font size
str=checkexp(re,str,"<font size='$1'>$2</font>")

re="\[font=((\w)+)\]((.)+)\[\/font\]" ' Replace font
str=checkexp(re,str,"<font face='$1'>$3</font>")

re="(\[b\])(.+)(\[\/b\])" ' Bold
str=checkexp(re,str,"<b>$2</b>")

re="(\[u\])(.+)(\[\/u\])" ' Underline
str=checkexp(re,str,"<u>$2</u>")

re="(\[li\])(.+)(\[\/li\])" ' List item
str=checkexp(re,str,"<li>$2</li>")

re="(\[quote\])(.+)(\[\/quote\])" ' Quote
str=checkexp(re,str,"<blockquote>Quote:<hr>$2<hr></blockquote>")

re="\[email=((\w)+[@]{1}((\w)+[.]){1,3}(\w)+)\](.+)(\[\/email\])" ' Mail
str=checkexp(re,str,"<a href=mailto:$1>$6</a>")

re="(\[center\])(.+)(\[\/center\])" ' Centered
str=checkexp(re,str,"<center>$2</center>")

' Filter banned words (two of the five entries are missing from the source)
re="Damn it"
str=checkexp(re,str,"***")

re="Sex"
str=checkexp(re,str,"***")

re="TMD"
str=checkexp(re,str,"***")

rethestr=str ' Return the converted text
End Function
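The overall shape of this function (escape raw HTML first, then apply an ordered list of pattern and replacement pairs) can be sketched as follows. This is an illustrative Python reduction, not a port of the full function: `RULES` and `ubb_to_html` are hypothetical names, only a few tags are covered, `html.escape` stands in for the manual replacement of "<" and ">", and the non-greedy `(.+?)` is this sketch's choice.

```python
import html
import re

# (pattern, replacement) pairs, applied in order; a small illustrative subset.
RULES = [
    (r"\[b\](.+?)\[/b\]", r"<b>\1</b>"),
    (r"\[u\](.+?)\[/u\]", r"<u>\1</u>"),
    (r"\[size=([0-9])\](.+?)\[/size\]", r"<font size='\1'>\2</font>"),
    (r"\[center\](.+?)\[/center\]", r"<center>\1</center>"),
]

def ubb_to_html(text: str) -> str:
    """Neutralise raw HTML, then turn the supported UBB tags into HTML."""
    text = html.escape(text, quote=False)  # "<" and ">" become &lt; and &gt;
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text

print(ubb_to_html("<i>x</i> [b]bold[/b]"))
print(ubb_to_html("[size=3][u]hi[/u][/size]"))
```

Escaping before converting is what keeps user-typed HTML inert: only the tags that the rules themselves emit reach the page as markup.
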

The UBB code is as follows:
[b] [/b] [i] [/i] [u] [/u]
[url] [/url] [email=] [/email] [img] [/img]
[quote] [/quote]
[li] [/li] [font=impact] [/font] [color=yellow] [/color]

The test code is as follows:
[img] [/img]http://cn.yahoo.com
aol.com 192.168.0.1
www.yahoo.com [b]how are you[/b]
Page2000.xiloo.com/~page2000?pageno=2&action=del
Lucaihui@cmmail.com, everybody, http:\\page2000.shit.
<font Color=red>http://test.com</font>http://test
All of these produce the expected results.

VII. The ASP regular expression object function is as follows:
Function checkexp(patrn,strng,tagstr)
Dim regex,matches

Set regex=New RegExp ' Create a new RegExp object
regex.Pattern=patrn ' Set the pattern (template)
regex.IgnoreCase=True ' True makes the search case-insensitive; False makes it case-sensitive
regex.Global=True ' Search the entire string

matches=regex.Replace(strng,tagstr) ' Match and replace

checkexp=matches ' Return the function result
End Function
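For readers who want to try the templates outside ASP, a rough Python equivalent of this helper might look like the sketch below. It assumes the template only uses `$1` to `$9` group references, and it translates them into Python's `\g<n>` form; this translation step is an addition of the sketch, not part of the original function.

```python
import re

def checkexp(pattern: str, text: str, template: str) -> str:
    """Replace every case-insensitive match of pattern in text using template."""
    # Protect literal backslashes, then turn $1..$9 into \g<1>..\g<9>.
    escaped = template.replace("\\", "\\\\")
    py_template = re.sub(r"\$(\d)", r"\\g<\1>", escaped)
    # re.sub replaces all matches by default, like Global=True.
    return re.sub(pattern, py_template, text, flags=re.IGNORECASE)

print(checkexp(r"(\[b\])(.+)(\[\/b\])", "[B]hi[/B]", "<b>$2</b>"))  # <b>hi</b>
```
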


Save the two functions above in one page (such as ubbcode.asp); together they make up a complete UBB function.
Include this page in your forum and it becomes a forum that supports UBB code. Just call the function when you display a post, like this:
text=rethestr(text)


