Applying ASP Regular Expressions in a UBB Forum

Source: Internet
Author: User
I. Reader's guide
This guide outlines the article, so that you do not read most of it only to find that it is not what you need.
If you write programs in ASP, or you are building something like a BBS, a guestbook, or form data validation, this article is worth a look.

If you are already familiar with regular expressions, you do not need to read line by line; just look at the patterns I wrote, compare them with your own, and take what is useful.
If this is your first contact with regular expressions, you had better read line by line and test each example as you go.

Once you have mastered regular expressions, you will find them fun.

II. The concept of regular expressions

What is UBB code? What is a regular expression?

UBB code is a variant of HTML. In general, UBB forums do not let you use HTML code; you use UBB code in its place.
UBB code is a set of popular tags with a fixed, uniform format. Users get the effect they want as long as they follow the tag rules. For example:
To display "how are you" in bold, you enter [b]how are you[/b] instead of typing <b>how are you</b>.

You may ask: how does ASP convert [b]how are you[/b] to <b>how are you</b>?
The answer to this question is: use regular expressions.
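As a rough illustration of the idea, the sketch below performs the [b]...[/b] to <b>...</b> substitution with a regular expression. It uses Python's re module as a stand-in for the VBScript RegExp object, and the helper name ubb_bold_to_html is made up for this example; a real forum would handle many more tags.

```python
import re

# Matches one [b]...[/b] pair; (.+?) is non-greedy so that two bold
# runs on the same line are converted separately.
BOLD_TAG = re.compile(r"\[b\](.+?)\[/b\]", re.IGNORECASE | re.DOTALL)


def ubb_bold_to_html(text: str) -> str:
    """Replace every [b]...[/b] pair with <b>...</b> (hypothetical helper)."""
    return BOLD_TAG.sub(r"<b>\1</b>", text)


print(ubb_bold_to_html("[b]how are you[/b]"))  # <b>how are you</b>
```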

III. The use of regular expressions

When we process Web form data (especially in a UBB forum), we often need data validation and string substitution; a UBB forum in particular has to do a great deal of security checking and string substitution.

A typical forum does not allow HTML syntax in posts, so users cannot change fonts, insert images, and so on. The forum thereby loses a powerful way to attract users, and a feature-rich forum matters a great deal for attracting them. UBB is the solution: even though the forum does not accept HTML, users can still style their own posts, insert images, add links, embed pages, and more, achieving much the same effect as HTML. It also makes the forum considerably more secure than one that accepts raw HTML, because users have far less room to mount malicious attacks on it.
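One way to picture the security argument is the sketch below: raw HTML in the post is neutralized first, and only then are the permitted UBB tags turned into HTML. This is an illustration in Python rather than ASP; the function name render_post and the choice to support only [b] are assumptions made for this example.

```python
import html
import re

BOLD_TAG = re.compile(r"\[b\](.*?)\[/b\]", re.IGNORECASE | re.DOTALL)


def render_post(raw: str) -> str:
    """Escape user-supplied HTML, then expand the allowed UBB tag."""
    safe = html.escape(raw)              # <script> becomes &lt;script&gt;
    return BOLD_TAG.sub(r"<b>\1</b>", safe)


print(render_post("[b]hi[/b] <script>alert(1)</script>"))
```

Escaping before substitution matters: the only angle brackets left in the output are the ones the forum itself generated.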

IV. Syntax rules and metacharacters of regular expressions

Now we formally begin learning regular expressions. I will explain their use through examples. After reading, you will find that writing UBB code is simple; follow this article step by step and you can become a UBB master. The exciting part is that you will be able to write your own UBB tags and no longer need to copy ready-made code and templates from other people. Fortunately, VBScript 5.0 gives us a regular expression object (RegExp); it runs as long as IE 5.x is installed on your server.

Character descriptions:

The ^ symbol matches the beginning of a string. For example:
^abc matches "abc xyz" but does not match "xyz abc".

The $ symbol matches the end of a string. For example:
abc$ matches "xyz abc" but does not match "abc xyz".
Note: If you use both the ^ symbol and the $ symbol, the whole string must match exactly. For example:
^abc$ matches only "abc".
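The anchor rules can be checked with a few lines of code. The sketch below uses Python's re module in place of the VBScript RegExp object; the helper name found is made up for this example.

```python
import re


def found(pattern: str, text: str) -> bool:
    """True if the pattern matches somewhere in text."""
    return re.search(pattern, text) is not None


print(found(r"^abc", "abc xyz"))   # True: "abc" is at the start
print(found(r"^abc", "xyz abc"))   # False
print(found(r"abc$", "xyz abc"))   # True: "abc" is at the end
print(found(r"abc$", "abc xyz"))   # False
print(found(r"^abc$", "abc"))      # True: the whole string is "abc"
print(found(r"^abc$", "abc abc"))  # False
```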

The * symbol matches the preceding character 0 or more times. For example:
ab* can match "a", "ab", "abb", "abbb", etc.

The + symbol matches the preceding character at least once. For example:
ab+ can match "ab", "abb", "abbb" and so on, but does not match "a".

The ? symbol matches the preceding character 0 or 1 times. For example:
ab?c? matches only "a", "ab", "ac" and "abc".
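To see exactly which strings the *, + and ? quantifiers accept, the sketch below tests whole strings against each pattern. It is written in Python as an illustration; the helper name whole is an assumption of this example.

```python
import re


def whole(pattern: str, text: str) -> bool:
    """True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# * allows zero or more "b", so a bare "a" is accepted.
star = [s for s in ["a", "ab", "abbb", "b"] if whole(r"ab*", s)]
# + needs at least one "b", so a bare "a" is rejected.
plus = [s for s in ["a", "ab", "abb"] if whole(r"ab+", s)]
# ? makes "b" and "c" each optional, but never repeated.
optional = [s for s in ["a", "ab", "ac", "abc", "abbc", "abcc"]
            if whole(r"ab?c?", s)]

print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abb']
print(optional)  # ['a', 'ab', 'ac', 'abc']
```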

The . symbol matches any single character other than a newline. For example:
(.)+ matches any non-empty string that contains no line breaks.

x|y matches "x" or "y". For example:
abc|xyz can match "abc" or "xyz", while "ab(c|x)yz" matches "abcyz" and "abxyz".
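A short check of the dot and the | operator, again using Python's re module as a stand-in for VBScript; the helper name whole is made up for this example.

```python
import re


def whole(pattern: str, text: str) -> bool:
    """True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


print(whole(r"(.)+", "any text at all"))  # True
print(whole(r"(.)+", "line1\nline2"))     # False: . stops at the newline
print(whole(r"abc|xyz", "xyz"))           # True
print(whole(r"ab(c|x)yz", "abcyz"))       # True
print(whole(r"ab(c|x)yz", "abxyz"))       # True
print(whole(r"ab(c|x)yz", "abyz"))        # False: one of c or x is required
```

Without the parentheses, | splits the whole pattern; with them, it only chooses between "c" and "x".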

{n} matches the preceding character exactly n times (n is a non-negative integer). For example:
a{2} matches "aa", but does not match "a".

{n,} matches the preceding character at least n times (n is a non-negative integer). For example:
a{3,} matches "aaa", "aaaa" and so on, but does not match "a" or "aa".
Note: a{1,} is equivalent to a+
a{0,} is equivalent to a*

{m,n} matches the preceding character at least m times and at most n times. For example:
a{1,3} matches only "a", "aa" and "aaa".
Note: a{0,1} is equivalent to a?
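The brace quantifiers and the equivalences noted above can be verified over a handful of sample strings. This is a Python sketch for illustration; the names whole and samples are assumptions of the example.

```python
import re


def whole(pattern: str, text: str) -> bool:
    """True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


samples = ["", "a", "aa", "aaa", "aaaa"]

exactly_two = [s for s in samples if whole(r"a{2}", s)]
at_least_three = [s for s in samples if whole(r"a{3,}", s)]
one_to_three = [s for s in samples if whole(r"a{1,3}", s)]

# Each pair should accept and reject the same sample strings.
same_as_plus = all(whole(r"a{1,}", s) == whole(r"a+", s) for s in samples)
same_as_star = all(whole(r"a{0,}", s) == whole(r"a*", s) for s in samples)
same_as_opt = all(whole(r"a{0,1}", s) == whole(r"a?", s) for s in samples)

print(exactly_two)     # ['aa']
print(at_least_three)  # ['aaa', 'aaaa']
print(one_to_three)    # ['a', 'aa', 'aaa']
print(same_as_plus, same_as_star, same_as_opt)  # True True True
```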

[xyz] represents a character set; it matches any one of the characters in the brackets. For example:
[abc] matches "a", "b", and "c".

[^xyz] represents a negated character set; it matches any character that is not in the brackets. For example:
[^abc] matches any character except "a", "b" and "c".

[a-z] represents a range of characters; it matches any character within the specified interval. For example:
[a-z] matches any lowercase letter from "a" to "z".

[^m-n] represents a negated range; it matches any character that is not in the specified range. For example:
[^m-n] matches any character except those from "m" to "n".
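The four bracket forms can be compared side by side by filtering one test string through each of them. This is a Python sketch; the helper name keep is made up for the example.

```python
import re


def keep(pattern: str, chars: str) -> str:
    """Return the characters of chars that the one-character pattern accepts."""
    return "".join(ch for ch in chars if re.fullmatch(pattern, ch))


print(keep(r"[abc]", "abcdXY9"))   # abc
print(keep(r"[^abc]", "abcdXY9"))  # dXY9
print(keep(r"[a-z]", "abcdXY9"))   # abcd
print(keep(r"[^m-n]", "lmno"))     # lo
```

Note that character sets are case-sensitive unless the engine is told to ignore case, which is why "X" and "Y" fall outside [a-z].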

The \ symbol is the escape character. For example:
\n matches a line feed
\f matches a form feed (page break)
\r matches a carriage return
\t matches a tab
\v matches a vertical tab

\\ matches "\"
\/ matches "/"

\s matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to "[ \f\n\r\t\v]"
\S matches any non-whitespace character. Equivalent to "[^ \f\n\r\t\v]"
\w matches any word character: letters, digits, and the underscore. Equivalent to "[A-Za-z0-9_]"
\W matches any non-word character. Equivalent to "[^A-Za-z0-9_]"
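The shorthand classes can be checked one character at a time. The sketch below is Python, not VBScript; it passes re.ASCII so that \s and \w are limited to the ASCII sets listed above (by default Python would also accept Unicode letters and spaces). The helper name is_one is an assumption of this example.

```python
import re


def is_one(pattern: str, ch: str) -> bool:
    """True if the single character ch matches the pattern (ASCII rules)."""
    return re.fullmatch(pattern, ch, re.ASCII) is not None


all_whitespace = all(is_one(r"\s", ch) for ch in " \f\n\r\t\v")
word_chars = [ch for ch in "aZ7_- " if is_one(r"\w", ch)]
non_word_chars = [ch for ch in "aZ7_- " if is_one(r"\W", ch)]

print(all_whitespace)   # True
print(word_chars)       # ['a', 'Z', '7', '_']
print(non_word_chars)   # ['-', ' ']
print(is_one(r"\S", "x"), is_one(r"\S", " "))  # True False
```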

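\b matches a word boundary, that is, the position between a word character and a non-word character (or the start or end of the string). For example:
ve\b matches the "ve" at the end of "love", but does not match the "ve" in "very" or "even".

\B matches a position that is not a word boundary. For example:
ve\B matches the "ve" in "very", but does not match the "ve" at the end of "love".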
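The difference between \b and \B is easiest to see on the words from the examples. This is a Python sketch for illustration; the helper name found is made up.

```python
import re


def found(pattern: str, text: str) -> bool:
    """True if the pattern matches somewhere in text."""
    return re.search(pattern, text) is not None


# \b after "ve": the "ve" must be followed by a word boundary.
print(found(r"ve\b", "love"))  # True
print(found(r"ve\b", "very"))  # False
print(found(r"ve\b", "even"))  # False

# \B after "ve": the "ve" must be followed by another word character.
print(found(r"ve\B", "very"))  # True
print(found(r"ve\B", "love"))  # False

# \b before "ve": the "ve" must begin a word.
print(found(r"\bve", "very"))  # True
print(found(r"\bve", "love"))  # False
```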

\d matches a digit character, equivalent to [0-9]. For example:
abc\dxyz matches "abc2xyz", "abc4xyz", etc., but does not match "abcaxyz", "abc-xyz", etc.

\D matches a non-digit character, equivalent to [^0-9]. For example:
abc\Dxyz matches "abcaxyz", "abc-xyz", etc., but does not match "abc2xyz", "abc4xyz", etc.
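Since \d and \D are complements, every candidate with a single character between "abc" and "xyz" should be accepted by exactly one of the two patterns. A Python sketch (re.ASCII restricts \d to 0-9; the variable names are assumptions of this example):

```python
import re

candidates = ["abc2xyz", "abc4xyz", "abcaxyz", "abc-xyz"]

with_digit = [s for s in candidates
              if re.fullmatch(r"abc\dxyz", s, re.ASCII)]
with_non_digit = [s for s in candidates
                  if re.fullmatch(r"abc\Dxyz", s, re.ASCII)]

print(with_digit)      # ['abc2xyz', 'abc4xyz']
print(with_non_digit)  # ['abcaxyz', 'abc-xyz']
```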

\num is a backreference to a remembered match, where num is a positive integer identifying the group. For example:
(.)\1 matches two consecutive identical characters.
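As a small demonstration of the backreference, the sketch below lists every doubled character in a word. It is written in Python; the sample word is chosen only for illustration.

```python
import re

# (.) remembers one character; \1 demands the same character again.
DOUBLED = re.compile(r"(.)\1")

pairs = [m.group(0) for m in DOUBLED.finditer("bookkeeper")]
print(pairs)                  # ['oo', 'kk', 'ee']
print(DOUBLED.search("abc"))  # None: no character repeats immediately
```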

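\num also matches a character by its octal code when num is an octal escape value less than 256 (written with a leading 0). For example:
\011 matches a tab

\xnum matches a character by its hexadecimal code, where num is a two-digit hexadecimal escape value (less than 256). For example:
\x41 matches the character "A"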
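The numeric and literal escapes can be confirmed directly. This sketch uses Python's re module, which accepts the same \011 and \x41 forms inside a pattern; behaviour in other engines is assumed to be similar but is not verified here.

```python
import re

print(re.fullmatch(r"\011", "\t") is not None)  # True: octal 011 is the tab
print(re.fullmatch(r"\x41", "A") is not None)   # True: hex 41 is "A"
print(re.fullmatch(r"\\", "\\") is not None)    # True: \\ is one backslash
print(re.fullmatch(r"\/", "/") is not None)     # True: \/ is a slash
```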


V. Example analysis

1. Finding link addresses precisely in a string

((http|https|ftp):(\/\/|\\\\)((\w)+[.]){1,}(net|com|cn|org|cc|tv|[0-9]{1,3})(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*(((([?](\w)+){1}[=]*))*((\w)+){1}([\&](\w)+[\=](\w)+)*)*)

As we know, link addresses generally begin with http, https, or ftp. As a first summary, a link address must meet the following conditions:

Condition 1
It starts with http://, https://, or ftp:// (there are of course other forms; only the main ones are covered here).

Condition 2
http:// must be followed by word characters and then a "."; this combination must occur one or more times. After the last "." comes the domain name suffix (such as net, com, or cn; for an address in IP form it can be a number).

Condition 3
After the host part of the address, one or more levels of directories may follow (note also that the address of a personal home page may contain the "~" symbol).

Condition 4
The end of the link address may carry parameters, such as the typical ?pageno=2&action=display.

Now we match the conditions above one by one, using the following pieces of the pattern:

1. ((http|https|ftp):(\/\/|\\\\) satisfies condition 1 (the leading parenthesis opens the group that wraps the whole expression)
This means that http://, http:\\, https://, https:\\, ftp://, and ftp:\\ all match (this allows for users who mistakenly type "\\" instead of "//").
Note: "|" means "or", and "\" is the escape character. "\/\/" means "//", and "\\\\" means "\\".
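The scheme part can be tried on its own. The sketch below is Python rather than VBScript, drops the outer opening parenthesis so the fragment is balanced, and uses the made-up helper name has_scheme. In the Python string literals, each pair of backslashes stands for one real backslash in the URL.

```python
import re

SCHEME = re.compile(r"(http|https|ftp):(\/\/|\\\\)")


def has_scheme(url: str) -> bool:
    """True if url starts with one of the accepted scheme prefixes."""
    return SCHEME.match(url) is not None


print(has_scheme("http://www.w3c.com"))   # True
print(has_scheme("https://www.w3c.com"))  # True
print(has_scheme("ftp:\\\\w3c.com"))      # True: ftp:\\w3c.com
print(has_scheme("www.w3c.com"))          # False: no scheme
```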

2. ((\w)+[.]){1,}(net|com|cn|org|cc|tv|[0-9]{1,3}) satisfies condition 2
"((\w)+[.]){1,}" means that a run of word characters followed by a dot can appear one or more times (this takes into account users who like to omit www and write http://www.w3c.com as http://w3c.com).
"(net|com|cn|org|cc|tv|[0-9]{1,3})" means the host must end with net, com, cn, org, cc, or tv, or with a number of at most three digits.
[0-9]{1,3} stands for a number of up to three digits, because no segment of an IP address can exceed 255.
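To see what the host part accepts, the sketch below applies it to whole host names. It is a Python illustration (re.ASCII keeps \w to letters, digits, and underscore); the helper name is_host is made up for this example.

```python
import re

HOST = re.compile(r"((\w)+[.]){1,}(net|com|cn|org|cc|tv|[0-9]{1,3})", re.ASCII)


def is_host(text: str) -> bool:
    """True if the entire text matches the host part of the pattern."""
    return HOST.fullmatch(text) is not None


print(is_host("www.w3c.com"))   # True
print(is_host("w3c.com"))       # True: www may be omitted
print(is_host("192.168.0.1"))   # True: IP form
print(is_host("localhost"))     # False: no dot
print(is_host("w3c.info"))      # False: suffix not in the list
print(is_host("10.0.0.1234"))   # False: last segment has four digits
print(is_host("1.2.3.999"))     # True: {1,3} limits length, not value
```

As the last line shows, [0-9]{1,3} only caps the number of digits; it does not by itself reject values above 255.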

3. (((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)* satisfies condition 3
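The directory part can be exercised separately as well. The sketch below assumes the balanced form (((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)* of this piece and runs it in Python; the helper name is_path and the sample paths are made up for illustration.

```python
import re

PATH = re.compile(r"(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*", re.ASCII)


def is_path(text: str) -> bool:
    """True if the entire text matches the directory part of the pattern."""
    return PATH.fullmatch(text) is not None


print(is_path("/~user/index.html"))  # True: "~" allowed after a slash
print(is_path("\\docs\\page.asp"))   # True: \docs\page.asp with backslashes
print(is_path(""))                   # True: the path is optional
print(is_path("/a b"))               # False: a space is not a word character
print(is_path("/dir/"))              # False: a slash must be followed by a name
```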


