Application of Regular Expressions in UBB Forum

Source: Internet
Author: User


I. Reader Guide

This guide summarizes the article so that you can tell quickly whether it is relevant to you.
If you write ASP programs, or are building something like a BBS, a guestbook, or form data validation, it is worth reading.

If you already know regular expressions, you do not need to read the syntax rules line by line. Just look at the templates I wrote and compare them with your own to get the essentials.
If you are new to regular expressions, you had better read the rules line by line and test each one.

Once you are comfortable with regular expressions, you will find them enjoyable to use.

II. Concepts of Regular Expressions

What is UBB code? What is a regular expression?

UBB code is a variant of HTML. A UBB forum generally does not let you post HTML; you use UBB code in its place.
UBB code is a fixed set of widely used tags with a uniform format. Users only need to follow the rules of the code to get the effect they want. For example:
To show "how are you" in bold, enter [B]how are you[/B] instead of <B>how are you</B>.

You may ask: how does ASP convert [B]how are you[/B] to <B>how are you</B>?
The answer is to use a regular expression.

III. Usage of Regular Expressions

When processing website form data we often need data validation and string substitution. A UBB forum in particular needs a great deal of both.

Ordinary forums do not accept HTML in posts, so users cannot change fonts, insert images, or use similar features. The forum loses a powerful way to attract users, and rich features matter for attracting them. UBB is the solution: even though the forum does not accept HTML, users can still style their posts, insert images, add links, embed web pages, and so on. The result can look the same as HTML support, while the forum is much safer than one that accepts raw HTML, because users cannot inject arbitrary markup into the page.

IV. Regular Expression Syntax Rules and Tags

Now we formally start learning regular expressions. I will explain their usage through examples. If you follow this article step by step, you will find that writing UBB code is easy, and you will be able to write your own UBB tags instead of copying ready-made code and templates from someone else. Fortunately, VBScript 5.0 provides a "regular expression" (RegExp) object. As long as your server has IE 5.x installed, you can use it.

Character descriptions:

^ matches the start of a string. For example:
^abc matches "abc xyz" but not "xyz abc"

$ matches the end of a string. For example:
abc$ matches "xyz abc" but not "abc xyz"
Note: if both ^ and $ are used, the match must be exact. For example:
^abc$ matches only "abc"
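
The anchor rules above are easy to check by running them. The following is a minimal sketch in Python's re module, used here only for illustration (the article's own code is VBScript); the anchor syntax is the same in both. The helper name matches is hypothetical.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

# ^ ties the pattern to the start of the string
print(matches(r"^abc", "abc xyz"))   # True
print(matches(r"^abc", "xyz abc"))   # False

# $ ties the pattern to the end of the string
print(matches(r"abc$", "xyz abc"))   # True
print(matches(r"abc$", "abc xyz"))   # False

# Both together force an exact match
print(matches(r"^abc$", "abc"))      # True
print(matches(r"^abc$", "abc abc"))  # False
```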

* matches zero or more of the preceding character. For example:
ab* matches "a", "ab", "abb", and "abbb"

+ matches one or more of the preceding character. For example:
ab+ matches "ab", "abb", and "abbb", but not "a"

? matches zero or one of the preceding character. For example:
ab?c? matches only "a", "ab", "ac", and "abc"

. matches any single character except a line break. For example:
(.)+ matches any string that contains no line breaks

x|y matches "x" or "y". For example:
abc|xyz matches "abc" or "xyz", while ab(c|x)yz matches "abcyz" and "abxyz"

{n} matches the preceding character exactly n times (n is a non-negative integer). For example:
a{2} matches "aa" but not "a"

{n,} matches the preceding character at least n times (n is a non-negative integer). For example:
a{3,} matches "aaa" and "aaaa", but not "a" or "aa"
Note: a{1,} is equivalent to a+
a{0,} is equivalent to a*

{m,n} matches the preceding character at least m and at most n times. For example:
a{1,3} matches only "a", "aa", and "aaa"
Note: a{0,1} is equivalent to a?
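
The quantifier rules can be verified with whole-string matches. This is a small sketch in Python's re module (chosen for illustration; the quantifier syntax is the same in VBScript's RegExp object). The helper name full is hypothetical.

```python
import re

def full(pattern, text):
    """Return True only if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

samples = ["a", "ab", "ac", "abc", "abb", "abbc"]

print([s for s in samples if full(r"ab*", s)])    # ['a', 'ab', 'abb']
print([s for s in samples if full(r"ab+", s)])    # ['ab', 'abb']
print([s for s in samples if full(r"ab?c?", s)])  # ['a', 'ab', 'ac', 'abc']

print(full(r"a{2}", "aa"), full(r"a{2}", "a"))         # True False
print(full(r"a{3,}", "aaaa"), full(r"a{3,}", "aa"))    # True False
print(full(r"a{1,3}", "aaa"), full(r"a{1,3}", "aaaa")) # True False
print(full(r"ab(c|x)yz", "abxyz"))                     # True
```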

[xyz] is a character set. It matches any one of the characters in the brackets. For example:
[abc] matches "a", "b", or "c"

[^xyz] is a negated character set. It matches any character not in the brackets. For example:
[^abc] matches any character except "a", "b", and "c"

[a-z] is a character range. It matches any character in the specified range. For example:
[a-z] matches any lowercase letter from "a" to "z"

[^m-n] is a negated character range. It matches any character not in the specified range. For example:
[^m-n] matches any character except those from "m" to "n"
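
To see which characters each kind of set picks out, the sketch below lists the matches one by one. It uses Python's re module for illustration only; the bracket syntax is the same in VBScript. The sample strings are made up for the demonstration.

```python
import re

text = "abc-MNO-123"

print(re.findall(r"[abc]", text))     # ['a', 'b', 'c']
print(re.findall(r"[^abc]", text))    # ['-', 'M', 'N', 'O', '-', '1', '2', '3']
print(re.findall(r"[a-z]", text))     # ['a', 'b', 'c']
print(re.findall(r"[^M-N]", "LMNO"))  # ['L', 'O']
```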

\ is the escape character. For example:
\n line feed
\f form feed
\r carriage return
\t tab
\v vertical tab

\\ matches "\"
\/ matches "/"

\s any whitespace character, including space, tab, and form feed. Equivalent to "[ \f\n\r\t\v]"
\S any non-whitespace character. Equivalent to "[^ \f\n\r\t\v]"
\w any word character: letters, digits, and the underscore. Equivalent to "[A-Za-z0-9_]"
\W any non-word character. Equivalent to "[^A-Za-z0-9_]"

\b matches a word boundary, that is, the position between a word character and a non-word character. For example:
ve\b matches the "ve" in "love", but not the "ve" in "very" or "even"

\B matches a position that is not a word boundary. For example:
ve\B matches the "ve" in "very", but not the "ve" in "love"
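
The difference between the two boundary escapes is easiest to see side by side. A minimal sketch in Python's re module (for illustration; \b and \B behave the same way in VBScript), using the three words from the examples:

```python
import re

words = ["love", "very", "even"]

# ve\b: "ve" must be followed by a word boundary (here, the end of the word)
ends_with_ve = [w for w in words if re.search(r"ve\b", w)]

# ve\B: "ve" must NOT be followed by a word boundary
ve_inside = [w for w in words if re.search(r"ve\B", w)]

print(ends_with_ve)  # ['love']
print(ve_inside)     # ['very', 'even']
```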

\d matches a digit. Equivalent to [0-9]. For example:
abc\dxyz matches "abc2xyz" and "abc4xyz", but not "abcaxyz" or "abc-xyz"

\D matches a non-digit character. Equivalent to [^0-9]. For example:
abc\Dxyz matches "abcaxyz" and "abc-xyz", but not "abc2xyz" or "abc4xyz"
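
The shorthand classes \d, \D, \w, and \s can be tried out as follows. This is a sketch in Python's re module, used for illustration only. One difference worth knowing: Python's versions of these classes are Unicode-aware by default, while the equivalences listed above are ASCII-only. For the ASCII samples here the results are the same.

```python
import re

print(bool(re.fullmatch(r"abc\dxyz", "abc2xyz")))  # True
print(bool(re.fullmatch(r"abc\dxyz", "abc-xyz")))  # False
print(bool(re.fullmatch(r"abc\Dxyz", "abc-xyz")))  # True
print(bool(re.fullmatch(r"abc\Dxyz", "abc4xyz")))  # False

print(re.findall(r"\w+", "page_no = 2"))  # ['page_no', '2']
print(re.split(r"\s+", "a \t b\nc"))      # ['a', 'b', 'c']
```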

\NUM is a back-reference to the remembered (captured) match number NUM, where NUM is a positive integer. For example:
(.)\1 matches two consecutive identical characters

\NUM can also be an octal escape: an octal value less than 256, written with a leading zero, matches the character with that code. For example:
\011 matches a tab character

\xNUM matches the character with hexadecimal code NUM (a hexadecimal value less than 256). For example:
\x41 matches the character "A"
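
Back-references and numeric escapes can be checked in a few lines. A minimal sketch in Python's re module (for illustration; the word "bookkeeper" is just a convenient sample with doubled letters):

```python
import re

# (.)\1 : any character followed by the same character again
print(re.findall(r"(.)\1", "bookkeeper"))  # ['o', 'k', 'e']

# \x41 is the character with hexadecimal code 41, that is "A"
print(re.search(r"\x41", "xAx").group(0))  # A

# \011 is the character with octal code 011, that is a tab
print(bool(re.search(r"\011", "a\tb")))    # True
```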


V. Worked Examples

1) Precisely finding link addresses in a string

((http|https|ftp):(\/\/|\\\\)((\w)+[.]){1,}(net|com|cn|org|cc|tv|[0-9]{1,3})(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*((([?](\w)+){1}[=]*)*((\w)+){1}([\&](\w)+[\=](\w)+)*)*)

A link address usually starts with http, https, or ftp. It must meet the following conditions:

Condition 1
It must start with http://, https://, or ftp://. (There are other forms, of course; only the main ones are listed here.)

Condition 2
http:// must be followed by one or more word characters and then a "." (this combination must appear one or more times). After the last "." comes the domain suffix, such as net, com, or cn; if the address is an IP address, it is a number.

Condition 3
After the host name there may be one or more levels of directories (note that personal homepage addresses may contain the "~" symbol).

Condition 4
The link address may end with parameters, for example the typical page-number form ?PageNo=2&action=display.

Now we match these conditions one by one with the following pieces of the pattern:

1. (http|https|ftp):(\/\/|\\\\) meets condition 1
http://, http:\\, https://, https:\\, ftp://, and ftp:\\ all match (some users mistype "//" as "\\", so that common error is allowed for)
Note: "|" means "or" and "\" is the escape character. "\/\/" stands for "//" and "\\\\" stands for "\\"
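
The scheme piece can be tested on its own. The sketch below uses Python's re module for illustration (the article's code is VBScript); the constant name SCHEME and the sample inputs are made up for the demonstration.

```python
import re

# Piece 1: the scheme, followed by "//" or by two backslashes
SCHEME = r"(http|https|ftp):(\/\/|\\\\)"

samples = ["http://x", "https://x", "ftp://x", r"http:\\x", "mailto:x", "http:/x"]
accepted = [s for s in samples if re.match(SCHEME, s)]
print(accepted)  # the first four samples
```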

2. ((\w)+[.]){1,}(net|com|cn|org|cc|tv|[0-9]{1,3}) meets condition 2
"((\w)+[.]){1,}" means that one or more word characters followed by a dot can appear one or more times (some users like to omit www and write http://www.w3c.com as http://w3c.com)
"(net|com|cn|org|cc|tv|[0-9]{1,3})" means the host must end with net, com, cn, org, cc, tv, or a number of up to three digits
"[0-9]{1,3}" stands for a number of up to three digits, because no segment of an IP address can exceed 255
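
The host piece can also be run in isolation to see what it accepts. This is a sketch in Python's re module, for illustration only; HOST and host_of are hypothetical names. Two things it shows: the three-digit rule limits the number of digits, not the value, and a known suffix is matched even when it is only the beginning of a longer one.

```python
import re

# Piece 2: dotted labels followed by a known suffix or 1-3 digits
HOST = r"((\w)+[.]){1,}(net|com|cn|org|cc|tv|[0-9]{1,3})"

def host_of(text):
    """Return the part of text matched by HOST from the start, or None."""
    m = re.match(HOST, text, re.IGNORECASE)
    return m.group(0) if m else None

print(host_of("www.w3c.com"))        # www.w3c.com
print(host_of("w3c.com"))            # w3c.com
print(host_of("192.168.0.1"))        # 192.168.0.1
print(host_of("localhost"))          # None (no dot)
print(host_of("example.info"))       # None (suffix not in the list)
print(host_of("999.1"))              # 999.1 (the value 255 is not enforced)
print(host_of("example.community"))  # example.com (stops after the suffix)
```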

3. (((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)* meets condition 3
"(\/[\~]*|\\[\~]*)" matches "/~" or "\~" ("[\~]*" means the "~" may or may not appear)
"((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+" means that word characters must follow (that is, a directory name or a file name with an extension)
Note: the final "*" means the whole bracketed group may appear any number of times or not at all, because not every link address has a lower-level directory. Without it, only addresses with a lower-level directory would match.
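
The path piece behaves as follows when tried alone. A minimal sketch in Python's re module (for illustration; PATH and path_of are hypothetical names, and the sample paths are invented):

```python
import re

# Piece 3: any number of "/name", "\name", "/~name" or ".ext" segments
PATH = r"(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*"

def path_of(text):
    """Return the leading part of text that PATH matches (possibly empty)."""
    return re.match(PATH, text).group(0)

print(path_of("/~page2000/index.asp"))  # /~page2000/index.asp
print(path_of(r"\docs\readme.txt"))     # \docs\readme.txt
print(repr(path_of("?PageNo=2")))       # '' (the trailing * allows no path at all)
```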

4. ((([?](\w)+){1}[=]*)*((\w)+){1}([\&](\w)+[\=](\w)+)*)* meets condition 4
"(([?](\w)+){1}[=]*)*((\w)+){1}" means that a string like "?PageNo=2" may or may not appear. If it appears, it is meant to appear only once (because an address cannot contain two "?").

"([\&](\w)+[\=](\w)+)*" means that a string like "&action=display" may or may not appear (because not every page has two or more parameters).

The whole "((([?](\w)+){1}[=]*)*((\w)+){1}([\&](\w)+[\=](\w)+)*)*" means that a string of the form "?PageNo=2&action=display" may or may not appear (that is, the link address may or may not have parameters).
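
The parameter piece can be exercised the same way. This sketch uses Python's re module for illustration; QUERY and query_of are hypothetical names. Note the last case: because the "?name=" part is itself optional, a bare run of word characters is accepted too.

```python
import re

# Piece 4: optional "?name=value" followed by any number of "&name=value"
QUERY = r"((([?](\w)+){1}[=]*)*((\w)+){1}([\&](\w)+[\=](\w)+)*)*"

def query_of(text):
    """Return the leading part of text that QUERY matches (possibly empty)."""
    return re.match(QUERY, text).group(0)

print(query_of("?PageNo=2&action=display"))  # ?PageNo=2&action=display
print(query_of("?PageNo=2"))                 # ?PageNo=2
print(repr(query_of("")))                    # '' (parameters are optional)
print(query_of("abc"))                       # abc (a bare word also matches)
```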


Combining the pieces above gives a pattern that matches a fairly complete link address. This is better than matching a link address with a simple "(http:\/\/\S+)". You can test and compare them yourself. Of course, this pattern still has many shortcomings, and I hope you will keep improving it.
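
One way to run the comparison is sketched below in Python's re module (for illustration; the article's code is VBScript). The long pattern is assembled from the four pieces as read from the article with stray spaces removed, and the sample sentence is invented. The simple pattern swallows the trailing comma, while the combined pattern stops at the end of the address.

```python
import re

URL = (r"((http|https|ftp):(\/\/|\\\\)((\w)+[.]){1,}"
       r"(net|com|cn|org|cc|tv|[0-9]{1,3})"
       r"(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*"
       r"((([?](\w)+){1}[=]*)*((\w)+){1}([\&](\w)+[\=](\w)+)*)*)")
SIMPLE = r"(http:\/\/\S+)"

text = "See http://page2000.xiloo.com/~page2000?PageNo=2&action=del, thanks."

precise = re.search(URL, text, re.IGNORECASE).group(0)
simple = re.search(SIMPLE, text).group(0)

print(precise)  # http://page2000.xiloo.com/~page2000?PageNo=2&action=del
print(simple)   # http://page2000.xiloo.com/~page2000?PageNo=2&action=del,
```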

2) Replacing a typical UBB tag: [B][/B]
Our goal is to replace [B][/B] with <B></B>. Here is the template that does it:
(\[B\])(.+)(\[\/B\])
Here "(.+)" matches the entire string between [B] and [/B]. The replacement is written like this:
str = checkexp(re, str, "<B>$2</B>")
(Note: checkexp is my own function, given later. It replaces [B][/B] according to the template we provide.)

You may ask what "$2" is. It is important: $2 stands for the whole string matched by "(.+)".
Why $2 rather than $1 or $3? Because $1 stands for the "[B]" matched by (\[B\]) and $3 stands for the "[/B]" matched by (\[\/B\]). What we need here is clearly $2, not $1 or $3.
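
The group numbering can be inspected directly. The sketch below uses Python's re module for illustration; Python writes the group reference as \2 where VBScript writes $2. The last lines show a limitation worth knowing: "(.+)" is greedy, so when two bold tags appear on one line it captures everything from the first [B] to the last [/B].

```python
import re

BOLD = r"(\[B\])(.+)(\[\/B\])"

m = re.search(BOLD, "[B]how are you[/B]", re.IGNORECASE)
print(m.group(1))  # [B]
print(m.group(2))  # how are you
print(m.group(3))  # [/B]

# The VBScript template "<B>$2</B>" is written r"<B>\2</B>" in Python
single = re.sub(BOLD, r"<B>\2</B>", "[B]how are you[/B]", flags=re.IGNORECASE)
print(single)      # <B>how are you</B>

# Greedy matching spans both tags when there are two on one line
double = re.sub(BOLD, r"<B>\2</B>", "[B]a[/B] and [B]b[/B]", flags=re.IGNORECASE)
print(double)      # <B>a[/B] and [B]b</B>
```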


6) A UBB regular expression template
Below is a UBB function I wrote. It is enough to give your forum basic UBB code support, and with improvements you can build a more powerful UBB forum.

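Function ReThestr(str)
Dim re

' Escape angle brackets first so that raw HTML typed by the user is not rendered
re = "\>"
str = checkexp(re, str, "&gt;")

re = "\<"
str = checkexp(re, str, "&lt;")

' A blank line becomes a paragraph break
re = "\n\r\n"
str = checkexp(re, str, "<P>")

' Keep spaces visible in the rendered post
re = chr(32)
str = checkexp(re, str, "&nbsp;")

' Drop the remaining carriage returns
re = "\r"
str = checkexp(re, str, "")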

re = "\[img\](http:(\/\/|\\\\){1}((\w)+[.]){1,3}(net|com|cn|org|cc|tv)" & _
     "(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*(\w)+[.]{1}(gif|jpg|png))\[\/img\]" ' find the image address
str = checkexp(re, str, "<img src='$1'>")

re = "\[w\](http:(\/\/|\\\\)((\w)+[.]){1,}(net|com|cn|org|cc|tv)" & _
     "(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*" & _
     "((([?](\w)+){1}[=]*)*((\w)+){1}([\&](\w)+[\=](\w)+)*)*)\[\/w\]" ' find the frame address
str = checkexp(re, str, "<iframe width='000000' height='000000' src='$1'></iframe>")

re = "([^('>)])(<br>)*((http|https|ftp):(\/\/|\\\\)((\w)+[.]){1,}(net|com|cn|org|cc|tv|[0-9]{1,3})" & _
     "(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*" & _
     "((([?](\w)+){1}[=]*)*((\w)+){1}([\&](\w)+[\=](\w)+)*)*)" ' find the link address
str = checkexp(re, str, "$1$2<a href='$3' target=_blank>$3</a>")

re = "([^(http://|http:\\)])((www|cn)[.](\w)+[.]{1,}(net|com|cn|org|cc)" & _
     "(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*" & _
     "((([?](\w)+){1}[=]*)*((\w)+){1}([\&](\w)+[\=](\w)+)*)*)" ' find addresses that do not start with http://
str = checkexp(re, str, "$1<a href='http://$2' target=_blank>$2</a>")

re = "([^(=)])((\w)+[@]{1}((\w)+[.]){1,3}(\w)+)" ' find the email address
str = checkexp(re, str, "$1<a href='mailto:$2'>$2</a>") ' $1 keeps the character in front of the address

re = "\[color=((\w)+|#([0-9A-F]{6}))\]((.)+)\[\/color\]" ' replace the font color
str = checkexp(re, str, "<font color='$1'>$4</font>")

re = "\[size=([0-9]{1})\]((.)+)\[\/size\]" ' replace the font size
str = checkexp(re, str, "<font size='$1'>$2</font>")

re = "\[font=((\w)+)\]((.)+)\[\/font\]" ' replace the font face
str = checkexp(re, str, "<font face='$1'>$3</font>")

re = "(\[B\])(.+)(\[\/B\])" ' bold
str = checkexp(re, str, "<B>$2</B>")

re = "(\[u\])(.+)(\[\/u\])" ' underline
str = checkexp(re, str, "<u>$2</u>")

re = "(\[li\])(.+)(\[\/li\])" ' list item
str = checkexp(re, str, "<li>$2</li>")

re = "(\[QUOTE\])(.+)(\[\/QUOTE\])" ' quotation
str = checkexp(re, str, "<BLOCKQUOTE>Quote:<HR>$2<HR></BLOCKQUOTE>")

re = "\[email=((\w)+[@]{1}((\w)+[.]){1,3}(\w)+)\](.+)(\[\/email\])" ' mail link
str = checkexp(re, str, "<a href=mailto:$1>$6</a>")

re = "(\[center\])(.+)(\[\/center\])" ' center
str = checkexp(re, str, "<center>$2</center>")

' Filter out bad words
re = "fuck"
str = checkexp(re, str, "***")

re = "sex"
str = checkexp(re, str, "***")

re = "TMD"
str = checkexp(re, str, "***")

re = "shit"
str = checkexp(re, str, "***")

ReThestr = str
End Function
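
The order of the rules matters: angle brackets typed by the user are neutralized before any tags are generated, so only the markup produced by the UBB rules reaches the page. The sketch below illustrates this with a two-rule subset in Python (for illustration only; the article's code is VBScript). It assumes that the first two rules are meant to turn ">" and "<" into the entities &gt; and &lt;. The function name render is hypothetical.

```python
import re

def render(text):
    """Escape raw HTML first, then expand the [B] tag (a two-rule subset)."""
    # Assumption: the first rules replace angle brackets with HTML entities
    text = text.replace("<", "&lt;").replace(">", "&gt;")
    # Only now generate markup, so the <B> tags below are never escaped
    return re.sub(r"(\[B\])(.+)(\[\/B\])", r"<B>\2</B>", text, flags=re.IGNORECASE)

print(render("[b]hi[/b] <script>alert(1)</script>"))
# <B>hi</B> &lt;script&gt;alert(1)&lt;/script&gt;
```

If the two steps were swapped, the generated <B> tags would be escaped as well and would show up as literal text.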

The UBB codes are as follows:
[B] [/B] [I] [/I] [u] [/u]
[url] [/url] [email=] [/email] [img] [/img]
[QUOTE] [/QUOTE]
[li] [/li] [font=impact] [color=Yellow]

The test input is as follows:
[img][/img] http://cn.yahoo.com
aol.com 192.168.0.1
www.yahoo.com [B]how are you[/B]
page2000.xiloo.com/~page2000?PageNo=2&action=del
lucaihui@cmmail.com Hello everyone http:\\page2000.shit
<font color=red>http://test.com</font> http://test
All of these give the expected results.

7) The ASP regular expression function:
Function CheckExp(patrn, strng, tagstr)
Dim regEx

Set regEx = New RegExp ' create a new RegExp object
regEx.Pattern = patrn ' set the pattern
regEx.IgnoreCase = True ' True makes the search case-insensitive; False makes it case-sensitive
regEx.Global = True ' apply the search to the entire string, not just the first match

CheckExp = regEx.Replace(strng, tagstr) ' match, replace, and return the result
End Function
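
For readers who want to experiment outside ASP, a rough Python counterpart of CheckExp might look like the sketch below. It is an illustration under stated assumptions, not a port of the author's code: the name check_exp is hypothetical, only single-digit references $1 to $9 are converted, and the template is assumed to contain no backslashes.

```python
import re

def check_exp(pattern, text, template):
    """Replace every case-insensitive match of pattern in text.

    The template uses VBScript-style $1..$9 group references, which are
    converted to Python's named-reference form before substitution.
    """
    # "$2" becomes "\g<2>", the unambiguous Python form of a group reference
    py_template = re.sub(r"\$(\d)", r"\\g<\1>", template)
    # re.sub replaces all matches, like Global = True in VBScript
    return re.sub(pattern, py_template, text, flags=re.IGNORECASE)

print(check_exp(r"(\[B\])(.+)(\[\/B\])", "[b]how are you[/b]", "<B>$2</B>"))
# <B>how are you</B>
print(check_exp("shit", "Shit happens", "***"))
# *** happens
```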


Save the two functions above in one page (such as ubbcode.asp) to get a complete UBB module.
Include it in your forum and the forum supports UBB code. You only need to call the function where it is needed, in this form:
text = ReThestr(text)


I wrote this article a long time ago, and I am not sure how practical it still is.
It has been so long that I may have forgotten some of the details. :)

Note that this post has no switch for turning off smiley and UBB conversion, so it seems that the UBB tags in the article conflict with the forum's own UBB parsing.
