Before I read RFC 1738, I assumed that a regular expression for URLs would be simple. I did not expect there to be so many kinds of URL, and I certainly did not expect that even a pattern for ordinary HTTP URLs would be this involved.
Here is a commonly used HTTP regular expression that I found:
- http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
Of course, this already meets most people's needs, but strict validation has to follow RFC 1738.
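To see where the simple pattern falls short, here is a minimal sketch. The class name `SimpleHttpCheck`, the method `LooksLikeHttp`, the sample URLs, and the `^`/`$` anchors are my own additions for illustration; they are not part of the pattern as I found it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimpleHttpCheck
{
    // The loose pattern quoted above, with ^ and $ added (an assumption)
    // so that a match cannot be a mere substring of the input.
    private static readonly Regex Loose =
        new Regex(@"^http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?$");

    public static bool LooksLikeHttp(string url)
    {
        return Loose.IsMatch(url);
    }

    public static void Main()
    {
        // A typical URL passes.
        Console.WriteLine(LooksLikeHttp("http://www.example.com/index.html?id=1")); // True
        // So does a host made only of hyphens, which RFC 1738 does not allow.
        Console.WriteLine(LooksLikeHttp("http://-.-/"));                            // True
        // A port number is rejected, although RFC 1738 permits one.
        Console.WriteLine(LooksLikeHttp("http://www.example.com:8080/"));           // False
    }
}
```

So the simple pattern is both too permissive (odd host names) and too strict (no port), which is why the grammar in the RFC is worth following when it matters.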
RFC 1738 defines these URL types: HTTP, FTP, news, NNTP, Telnet, Gopher, WAIS, mailto, file, Prospero, and a generic form for everything else (otherurl).
Without further ado, here is the code.
- #region HTTP
- string lowalpha = @"[a-z]";
- string hialpha = @"[A-Z]";
- string alpha = string.Format(@"({0}|{1})", lowalpha, hialpha);
- string digit = @"[0-9]";
- string safe = @"(\$|-|_|\.|\+)";
- string extra = @"(!|\*|'|\(|\)|,)";
- string hex = string.Format(@"({0}|A|B|C|D|E|F|a|b|c|d|e|f)", digit);
- string escape = string.Format(@"(%{0}{0})", hex);
- string unreserved = string.Format(@"({0}|{1}|{2}|{3})", alpha, digit, safe, extra);
- string uchar = string.Format(@"({0}|{1})", unreserved, escape);
- string reserved = @"(;|/|\?|:|@|&|=)";
- string xchar = string.Format(@"({0}|{1}|{2})", unreserved, reserved, escape);
- string digits = string.Format(@"({0}+)", digit);
- string alphadigit = string.Format(@"({0}|{1})", alpha, digit);
- string domainlabel = string.Format(@"({0}|{0}({0}|-)*{0})", alphadigit);
- string toplabel = string.Format(@"({0}|{0}({1}|-)*{1})", alpha, alphadigit);
- string hostname = string.Format(@"(({0}\.)*{1})", domainlabel, toplabel);
- string hostnumber = string.Format(@"{0}\.{0}\.{0}\.{0}", digits);
- string host = string.Format(@"({0}|{1})", hostname, hostnumber);
- string port = digits;
- // "?" stands in for the {0,1} quantifier, which string.Format would read as a format item.
- string hostport = string.Format(@"({0}(:{1})?)", host, port);
- string hsegment = string.Format(@"(({0}|;|:|@|&|=)*)", uchar);
- string search = string.Format(@"(({0}|;|:|@|&|=)*)", uchar);
- string hpath = string.Format(@"{0}(/{0})*", hsegment);
- string httpurl = string.Format(@"http://{0}(/{1}(\?{2})?)?", hostport, hpath, search);
- #endregion
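A caution when composing patterns this way: string.Format treats braces as format-item delimiters, so a regex quantifier such as {0,1} written inside the template is not passed through literally. The sketch below shows the effect; the class name `FormatBraces` and the sample `host` and `port` values are placeholders chosen for illustration only.

```csharp
using System;

public static class FormatBraces
{
    public static void Main()
    {
        string host = "example";
        string port = "[0-9]+";

        // {0,1} is parsed as "argument 0, minimum width 1", so the host is
        // inserted a second time and the quantifier is lost.
        string wrong = string.Format(@"({0}(:{1}){0,1})", host, port);
        Console.WriteLine(wrong);   // (example(:[0-9]+)example)

        // Doubled braces produce literal braces in the output.
        string escaped = string.Format(@"({0}(:{1}){{0,1}})", host, port);
        Console.WriteLine(escaped); // (example(:[0-9]+){0,1})

        // The ? quantifier means the same thing and needs no escaping.
        string shorter = string.Format(@"({0}(:{1})?)", host, port);
        Console.WriteLine(shorter); // (example(:[0-9]+)?)
    }
}
```

Either escaped braces or `?` gives a correct pattern; `?` is simply easier to read inside a format template.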
- #region FTP
- string user = string.Format(@"(({0}|;|\?|&|=)*)", uchar);
- string password = string.Format(@"(({0}|;|\?|&|=)*)", uchar);
- string login = string.Format(@"(({0}(:{1})?@)?{2})", user, password, hostport);
- string fsegment = string.Format(@"(({0}|\?|:|@|&|=)*)", uchar);
- string ftptype = @"(A|I|D|a|i|d)";
- string fpath = string.Format(@"({0}(/{0})*)", fsegment);
- string ftpurl = string.Format(@"ftp://{0}(/{1}(;type={2})?)?", login, fpath, ftptype);
- #endregion
- #region news
- string group = string.Format(@"({0}({0}|{1}|-|\.|\+|_)*)", alpha, digit);
- string article = string.Format(@"(({0}|;|/|\?|:|&|=)+@{1})", uchar, host);
- string grouppart = string.Format(@"(\*|{0}|{1})", group, article);
- string newsurl = string.Format(@"(news:{0})", grouppart);
- #endregion
- #region nntpurl
- string nntpurl = string.Format(@"nntp://{0}/{1}(/{2})?", hostport, group, digits);
- #endregion
- #region Telnet
- string telneturl = string.Format(@"telnet://{0}/?", login);
- #endregion
- #region Gopher
- string gtype = xchar;
- string selector = string.Format(@"({0}*)", xchar);
- string gopherplus_string = string.Format(@"({0}*)", xchar);
- // Each optional part nests inside the previous one, as in the RFC 1738 grammar.
- string gopherurl = string.Format(@"gopher://{0}(/({1}({2}(%09{3}(%09{4})?)?)?)?)?", hostport, gtype, selector, search, gopherplus_string);
- #endregion
- #region wais
- string database = string.Format(@"({0}*)", uchar);
- string wtype = string.Format(@"({0}*)", uchar);
- string wpath = string.Format(@"({0}*)", uchar);
- string waisdatabase = string.Format(@"(wais://{0}/{1})", hostport, database);
- string waisindex = string.Format(@"(wais://{0}/{1}\?{2})", hostport, database, search);
- string waisdoc = string.Format(@"(wais://{0}/{1}/{2}/{3})", hostport, database, wtype, wpath);
- string waisurl = string.Format(@"({0}|{1}|{2})", waisdatabase, waisindex, waisdoc);
- #endregion
- #region mailto
- string encoded822addr = string.Format(@"({0}+)", xchar);
- string mailtourl = string.Format(@"mailto:{0}", encoded822addr);
- #endregion
- #region File
- // The host part is optional and may also be the literal "localhost".
- string fileurl = string.Format(@"file://({0}|localhost)?/{1}", host, fpath);
- #endregion
- #region prosperourl
- string fieldname = string.Format(@"(({0}|\?|:|@|&)*)", uchar);
- string fieldvalue = string.Format(@"(({0}|\?|:|@|&)*)", uchar);
- string fieldspec = string.Format(@"(;{0}={1})", fieldname, fieldvalue);
- string psegment = string.Format(@"(({0}|\?|:|@|&|=)*)", uchar);
- string ppath = string.Format(@"({0}(/{0})*)", psegment);
- string prosperourl = string.Format(@"prospero://{0}/{1}({2})*", hostport, ppath, fieldspec);
- #endregion
- #region otherurl
- // otherurl is the same as genericurl
- string urlpath = string.Format(@"(({0})*)", xchar);
- string scheme = string.Format(@"(({0}|{1}|\+|-|\.)+)", lowalpha, digit);
- string ip_schemepart = string.Format(@"(//{0}(/{1})?)", login, urlpath);
- string schemepart = string.Format(@"(({0})*|{1})", xchar, ip_schemepart);
- string genericurl = string.Format(@"{0}:{1}", scheme, schemepart);
- string otherurl = genericurl;
- #endregion
With the patterns in place, the rest is straightforward: it is just regular expression matching. Take HTTP as an example.
The HTTP pattern is the string httpurl. If the URL to check is in the string url, the validation code looks like this:
- // Anchor the pattern so that the whole string, not just part of it, has to match.
- Regex regex = new Regex("^" + httpurl + "$");
- bool isMatchHttp = regex.IsMatch(url);
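Putting the pieces together, the following is a self-contained sketch of an HTTP check along these lines. It is a condensed variant, not the code above verbatim: the single-character alternations are folded into character classes, and the names `HttpUrlValidator` and `IsHttpUrl` as well as the sample URLs are my own assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HttpUrlValidator
{
    // RFC 1738 building blocks, condensed into character classes.
    static readonly string AlphaDigit = "[a-zA-Z0-9]";
    // uchar = unreserved | escape
    static readonly string UChar = @"([a-zA-Z0-9$\-_.+!*'(),]|%[0-9A-Fa-f]{2})";
    static readonly string DomainLabel =
        string.Format(@"({0}|{0}({0}|-)*{0})", AlphaDigit);
    static readonly string TopLabel =
        string.Format(@"([a-zA-Z]|[a-zA-Z]({0}|-)*{0})", AlphaDigit);
    static readonly string HostName =
        string.Format(@"(({0}\.)*{1})", DomainLabel, TopLabel);
    static readonly string HostNumber = @"([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)";
    static readonly string HostPort =
        string.Format(@"(({0}|{1})(:[0-9]+)?)", HostName, HostNumber);
    // hsegment and search share the same definition in the grammar.
    static readonly string Segment =
        string.Format(@"(({0}|[;:@&=])*)", UChar);
    static readonly string HPath =
        string.Format(@"({0}(/{0})*)", Segment);
    static readonly string HttpUrl =
        string.Format(@"http://{0}(/{1}(\?{2})?)?", HostPort, HPath, Segment);

    // Anchored so that a valid URL embedded in other text does not count.
    static readonly Regex Anchored = new Regex("^" + HttpUrl + "$");

    public static bool IsHttpUrl(string url)
    {
        return Anchored.IsMatch(url);
    }

    public static void Main()
    {
        Console.WriteLine(IsHttpUrl("http://www.example.com/index.html?id=1")); // True
        Console.WriteLine(IsHttpUrl("http://www.example.com:8080/"));           // True
        Console.WriteLine(IsHttpUrl("http://-.-/"));                            // False
        Console.WriteLine(IsHttpUrl("see http://www.example.com"));             // False
    }
}
```

The last call shows why the anchors matter: without them, IsMatch would report a match for any text that merely contains a valid URL.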