Using regular expressions in PHP to validate URLs

Source: Internet
Author: User
Tags: regular expression, trim

A while ago I was putting together a short-URL program. The work went smoothly overall; the only place I got stuck was URL validation, which ate up half of the total time. After a lot of searching, tinkering, and testing, I finally found a perfect solution.

In a short-URL program, validating the URL is undoubtedly important. Leaving the various security issues aside, it is annoying enough when opportunistic sites use up large numbers of short URLs.

When it comes to validating URLs, I believe most people think of regular expressions first. That is perfectly reasonable, but unfortunately I am hopeless at them: matching a moderately complex piece of HTML usually costs me half a day of repeated debugging, and I cannot even fully sort out the structure of a URL. So I decided to take a detour rather than make myself miserable.

esc_url_raw()

In day-to-day development I validate URLs by calling a WordPress core function directly, so I considered simply porting WordPress's esc_url_raw() function to save time and effort.

It turned out that I was too naive. While copying it I found that it depends on many other modules, each of which hooks into yet more. Porting it completely would have been far too much work, and in the end I could no longer tell what depended on what. At that rate I might as well build the whole thing on WordPress.

On reflection, this function is really meant to "filter" a URL, turning an invalid URL into a valid one wherever possible. No protocol? It adds one. Illegal characters? It strips them out. Whether the browser or server can actually resolve the result is not its concern.

Its only goal is to make sure the URL contains no fundamental errors, so that one small URL cannot break the whole program. In a WordPress environment that is enough most of the time, but for our short-URL program it is clearly not strict enough.
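The difference between filtering and validating can be sketched with PHP's built-in filters. This is only an illustration: it uses filter_var() with FILTER_SANITIZE_URL as a stand-in for a sanitizer, not WordPress's esc_url_raw(), and the sample URL is made up.

```php
<?php
// A made-up input containing a space, which is not allowed in a URL.
$input = 'http://example.com/my page';

// Sanitizing silently drops the illegal character and returns a string.
$clean = filter_var($input, FILTER_SANITIZE_URL);
var_dump($clean);

// Validating rejects the same input outright.
$valid = filter_var($input, FILTER_VALIDATE_URL);
var_dump($valid);
```

The sanitizer returns a usable-looking string even though the input was never a valid URL, which is why a filter alone is too lenient for a short-URL service.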

filter_var()

That idea was stillborn before it ever ran, so I had no choice but to turn to the almighty search engine. After a long search, and after reading lengthy answers from the experts on Stack Overflow (although I seem to be getting lengthy myself...), I decided to try the most recommended option, the filter_var() function. I had heard of it before but never had the chance to use it, so this was a good opportunity to learn.

filter_var() is a built-in PHP function. As the name suggests, it validates (filters) variables. As I recall, it can validate not only URLs but also email addresses and more.

If you want to use it to validate URLs, you only need to:

if (filter_var($url, FILTER_VALIDATE_URL) !== false)
    echo 'OK, this is a proper URL.'; // Validation succeeded
else
    echo 'What on earth is this?'; // Validation failed; filter_var() returned false

It is very simple. Note, however, that although failure returns false, you cannot just write if (!filter_var(...)) to decide that the URL is invalid, because on success the function returns the filtered value rather than true. What if that value is '0'? That cannot happen when validating a URL, but it is a bad habit to form. When I was younger I modified some example code this way and got burned by strpos() (in fact I still often try to be clever, pay the price, and then carry on being clever).
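The pitfall described above can be sketched as follows. The inputs are made up for illustration; FILTER_VALIDATE_INT is used only because it makes the falsy-but-successful case easy to trigger.

```php
<?php
// filter_var() returns the filtered value on success, not true.
$value = filter_var('0', FILTER_VALIDATE_INT);
var_dump($value);            // int(0): validation succeeded
var_dump(!$value);           // bool(true): a loose check wrongly reports failure
var_dump($value !== false);  // bool(true): a strict check reports success

// strpos() has the same trap: a match at offset 0 is falsy.
$pos = strpos('http://example.com', 'http');
var_dump($pos);              // int(0)
var_dump($pos !== false);    // bool(true): the substring was found
```

Comparing with !== false keeps a legitimate 0 or '0' result from being mistaken for failure.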

OK, after some testing there seemed to be no major problems, so I handed the code over. Soon afterwards someone else ran a quick test and found that URLs such as Baidu and Google search-result URLs failed validation, which set off another round of urgent research.

The conclusion: filter_var() treats every URL that contains Chinese characters as invalid. This can be worked around by percent-encoding the Chinese part, but I had already decoded all the URLs to avoid duplicates, so I felt that would get confusing. I also found that, even without Chinese characters, it had some problems with anchors (the part after the "#" in a URL), so I decisively abandoned it.
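The behavior with Chinese characters can be reproduced with a small sketch. The URL is a made-up example, and the file is assumed to be saved as UTF-8.

```php
<?php
$base = 'http://www.example.com/search?q=';

// Raw non-ASCII characters in the URL: validation fails.
$raw = $base . '搜索';
var_dump(filter_var($raw, FILTER_VALIDATE_URL));

// The same URL with the query value percent-encoded: validation passes.
$encoded = $base . rawurlencode('搜索');
var_dump(filter_var($encoded, FILTER_VALIDATE_URL));
```

The first call returns false and the second returns the encoded URL string, so the filter is usable only if every URL is percent-encoded before it is checked.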

Happy ending

Okay, here comes the main event.

With no other choice I kept searching for a solution, and happened upon a page that felt like stumbling into paradise: it had almost everything I needed.

How to correctly use regular expressions to validate URL links

The author of that page collected a number of different regular expressions and tested them against a series of URLs, some of which should pass validation and some of which should not.

In the end only one entry, by Diego Perini, passed the test and met every requirement perfectly. His regular expression is a full 502 characters long and looks terrifying at first glance; I certainly cannot make sense of it...

To make it easy to use, I wrapped the regular expression in a PHP function that can be called directly:

/**
 * Check whether a URL is valid.
 *
 * @link https://www.bgbk.org/regex-url/
 * @link https://gist.github.com/dperini/729294
 *
 * @param string $url The URL to check.
 * @return bool Whether the URL is valid.
 */
function is_url($url) {
    // Cheap pre-checks so that obviously bad input never reaches the long regex.
    if (!trim($url))
        return false;

    if (strlen($url) < 10)
        return false;

    // Diego Perini's URL pattern (see the gist linked above).
    $pattern = '_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS';

    // preg_match() returns 1 on a match and 0 otherwise; cast to bool.
    $result = preg_match($pattern, $url);
    $result = (bool) $result;

    return $result;
}
Use it like this:

if (is_url($url))
    echo 'OK, this is a proper URL.'; // Validation succeeded
else
    echo 'What on earth is this?'; // Validation failed

Although I do not understand the regular expression, something this long obviously costs performance, so I added a few simple pre-checks on the URL in the hope of reducing the load on the server.
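A quick way to try the validator is to run it over a few sample URLs. The sketch below is self-contained, so it repeats the pattern inside a hypothetical helper named check_url(); the sample URLs are made up, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical stand-alone copy of the validator, so this snippet runs by itself.
function check_url(string $url): bool
{
    $pattern = '_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS';

    // Length pre-check first, then the expensive regex.
    return strlen(trim($url)) >= 10 && preg_match($pattern, $url) === 1;
}

$samples = [
    'http://example.com/path#anchor',       // anchor
    'http://www.example.com/search?q=搜索',  // Chinese characters
    'http://10.1.1.1/',                     // private IP address
    'example.com/no-scheme',                // missing protocol
];

foreach ($samples as $sample) {
    echo $sample, ' => ', check_url($sample) ? 'valid' : 'invalid', "\n";
}
```

The first two samples should be reported as valid and the last two as invalid, covering exactly the anchor and Chinese-character cases that filter_var() struggled with.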

If you want to understand how it works, or want to use this regular expression in JavaScript, refer to the author's page on GitHub. It has the original JavaScript version, with plenty of comments to make it easier to learn and use.

You can also click to download the list of all the test URLs, to try out other regular expressions or to verify your own approach. If you find a better way, be sure to share it in the comments!

Bonus material

Finally, here is a page that I also found by accident. It explains the rules for how a URL is put together very clearly; interested readers can click here to visit.


In JavaScript, use window.location. plus the name of the attribute shown in the diagram to get that part of the current page's URL. For example, window.location.protocol will generally return "http:" or "https:".

Of course, the page above only shows the basic components of a URL. If you want to know more, such as which characters may not appear in each part of a URL, which text needs to be escaped, and the various conventions, rules, and caveats, you can find it all here; it is very complete.

Every resource available on the Web (HTML documents, images, video clips, programs, and so on) is located by a Uniform Resource Identifier ("URI"). The following regular expression splits a URI into its parts, with the capture groups numbered underneath:


^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9


The test code is as follows:

<?php
// Split a URL into its components with one regular expression.
$search = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~i';
$url = 'http://www.jb51.net/pub/ietf/uri/#Gonn';
$url = trim($url);
preg_match_all($search, $url, $rr);
printf("<p>Output URL data is:</p><pre>%s</pre>\n", var_export($rr, true));

/*
The groups are as follows:
$1 = http:
$2 = http
$3 = //www.jb51.net
$4 = www.jb51.net
$5 = /pub/ietf/uri/
$6 = <undefined>
$7 = <undefined>
$8 = #Gonn
$9 = Gonn
*/
?>

The regular expression above can extract any part of a URL. The following code is simpler and extracts only the domain name:


<?php
// Get the host name from the URL
preg_match('/^(http:\/\/)?([^\/]+)/i', 'http://www.jb51.net/index.html', $matches);
$host = $matches[2];
// Get the last two segments of the host name
preg_match('/[^\.\/]+\.[^\.\/]+$/', $host, $matches);
echo "Domain name is: {$matches[0]}\n";
?>
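For comparison, PHP's built-in parse_url() function splits a URL into the same kinds of parts without a hand-written regular expression. This is a sketch of an alternative, not part of the original article's approach; it reuses the sample URL from the test code above.

```php
<?php
$url = 'http://www.jb51.net/pub/ietf/uri/#Gonn';

// parse_url() returns an associative array holding only the parts it finds.
$parts = parse_url($url);
echo $parts['scheme'], "\n";    // http
echo $parts['host'], "\n";      // www.jb51.net
echo $parts['path'], "\n";      // /pub/ietf/uri/
echo $parts['fragment'], "\n";  // Gonn

// Keep the last two labels of the host name.
$labels = explode('.', $parts['host']);
$domain = implode('.', array_slice($labels, -2));
echo "Domain name is: {$domain}\n";  // Domain name is: jb51.net
```

Note that parse_url() only splits a URL; it does not validate it, so it complements a validator rather than replacing one.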
