A short talk on extracting image addresses with PHP regular expressions

Source: Internet
Author: User
I am fascinated by regular expressions and keep experimenting with new tricks. For that I first have to thank TNA for its partial RSS feed, and then SH for forcing me to learn. Without TNA I would never have looked into regular expressions, let alone known that something this powerful existed; and if SH had not insisted that he did not know how, I would not have gritted my teeth to work things out and improve. The regular expression for a given goal is not unique: there is nothing you cannot do, only things you have not thought of. You could say regular expressions are about playing with rules, and I love that kind of thing. Nothing is more exciting than setting the rules that filter things.

Here are some cautions about using regular expressions to extract image addresses in a PHP environment:

The standard HTML for an image is nothing more than a single img tag, whose parts are numbered 1 to 6 here.

Parts 1 and 2 are required; to pass XHTML validation, parts 4, 5 and 6 are essential; part 3 is the core content and of course cannot be left out.

The shortest match I wrote is:

(?<=img.+?src=").*?(?=")

However, this does not work in PHP. It produces:

Warning: preg_match_all() [function.preg-match-all]: Compilation failed: lookbehind assertion is not fixed length at offset ****

I struggled with this for a long time. Why does it fail? After many attempts I finally found the problem in the zero-width assertion (?<=img.+?src="). In PHP, a lookbehind assertion does not accept quantifiers such as "*" or "+" that can repeat an unlimited number of times, hence the error. Replacing ".+?" with something of fixed length fixes it. However, the distance between "img" and "src=" is basically impossible to fix. Usually img and src are separated by a single space, but you cannot rule out cases where alt, title and other attributes sit after img and before src.
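To make the failure concrete, here is a minimal sketch. The sample markup is hypothetical, and the second pattern is not from the original post: it swaps the lookbehind for `\K`, a PCRE escape that discards everything matched so far, so the part before the address may have any length.

```php
<?php
// Hypothetical markup with an attribute between "img" and "src".
$html = '<p><img alt="logo" src="images/logo.png" /></p>';

// Unbounded quantifier inside a lookbehind: PCRE refuses to compile the
// pattern, so preg_match_all() returns false (the warning is silenced by @).
$bad = @preg_match_all('/(?<=img.+?src=").*?(?=")/', $html, $m1);

// Alternative sketch: match the variable-length prefix normally, then use \K
// to drop it from the reported match.
$ok   = preg_match_all('/<img\b[^>]*?\bsrc="\K[^"]*/i', $html, $m2);
$urls = $m2[0];
```

With this input `$bad` is `false`, while `$urls` holds the single address `images/logo.png`.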

So

(?<=img.src=").*?(?=")

or

(?<=img\ssrc=").*?(?=")

may work, but there is no guarantee that they will never miss anything.
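A small sketch of that limitation, using two made-up tags: the fixed-length lookbehind finds the address when exactly one whitespace character separates img and src, and finds nothing once another attribute comes first.

```php
<?php
// Fixed-length lookbehind: exactly one whitespace character between img and src.
$pattern = '/(?<=img\ssrc=").*?(?=")/';

// Hypothetical samples: src directly after img, and src after an alt attribute.
$plain   = '<img src="a.jpg" alt="A" />';
$withAlt = '<img alt="B" src="b.jpg" />';

preg_match_all($pattern, $plain, $hit);    // finds a.jpg
preg_match_all($pattern, $withAlt, $miss); // finds nothing
```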

You may ask: what about the simple

(?<=src=").*?(?=")

Won't that do? Usually it will. But anyone who has looked through page source knows that image addresses are not the only thing that starts with src: JavaScript addresses start with src too! And there are too many other unpredictable factors hidden in a page, so this seemingly short and perfect pattern will not work.
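A quick sketch of that false positive, with a hypothetical page fragment that contains one script tag and one image:

```php
<?php
// Hypothetical fragment: a script and an image both carry a src attribute.
$html = '<script src="js/app.js"></script><img src="pic/cat.gif" />';

// The bare src=" lookbehind cannot tell the two tags apart.
preg_match_all('/(?<=src=").*?(?=")/', $html, $m);
$found = $m[0]; // contains the script address as well as the image address
```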

You may also ask: if the clever short version fails, surely adding the image suffix will always work, such as

(?<=src=").*?\.(jpg|jpeg|gif|png|bmp|JPG|JPEG|GIF|PNG|BMP)

Indeed, this pattern is quite honest, but have you ever seen an image without a suffix? WWE.com has many examples of this.

RAW http://us.wwe.com/content/media/images/Headers/15559182
SmackDown http://us.wwe.com/content/media/images/Headers/15854138
NXT http://us.wwe.com/content/media/images/Headers/15929136
Superstars http://us.wwe.com/content/media/images/Headers/15815850

The URLs above are all images, but none of them has a traditional suffix, so the honest suffix-based pattern cannot catch them.

What do we do? Here is my approach.
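The following sketch shows the suffix-based pattern missing one of the addresses listed above. As an assumption for brevity, it uses the `i` modifier instead of listing upper-case suffixes, and the surrounding markup is made up.

```php
<?php
// Suffix-based pattern; the i modifier replaces the upper-case alternatives.
$pattern = '/(?<=src=").*?\.(?:jpg|jpeg|gif|png|bmp)/i';

// Hypothetical markup: one image with a suffix, one WWE-style address without.
$html = '<img src="photo.JPG" />'
      . '<img src="http://us.wwe.com/content/media/images/Headers/15559182" />';

preg_match_all($pattern, $html, $m);
$found = $m[0]; // only the address that ends in a known suffix
```

Only `photo.JPG` is found; the second image is silently skipped.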

Unlike the expressions above, array[0] of this result is not what we want; the image address we want is in array[2]. Why? Because the pattern uses two (.*) groups, and whatever each "()" matches is automatically stored in its own group: array[0] holds the full matches, array[1] holds everything between img and src, and only array[2] holds the image address we want. This pattern matches images with a traditional suffix, also matches image files with no suffix, and does not mistakenly pick up other src= files. Personally I think it is pretty good, hehe. Of course, if you have a better suggestion, please leave a message; people all over the world will thank you!
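The pattern itself did not survive in this copy of the post, so the one below is only an assumed reconstruction that fits the description (two lazy groups, the second one holding the address); the markup is hypothetical too.

```php
<?php
// Hypothetical markup: a script, an image with alt before src, a plain image.
$html = '<script src="js/app.js"></script>'
      . '<img alt="Raw" src="http://us.wwe.com/content/media/images/Headers/15559182" />'
      . '<img src="a.jpg" />';

// Assumed pattern: group 1 = everything between "<img" and "src",
// group 2 = the image address.
preg_match_all('/<img(.*?)src="(.*?)"/i', $html, $m);

$between = $m[1]; // text between img and src for each image
$images  = $m[2]; // the image addresses
```

The script address is ignored because the match must start at `<img`, and the address without a suffix is kept.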

What kind of image do you want: a fixed format or something else? It depends on the specific situation.

My advice:

If the image addresses you want always have the form img, one space, src=, please use (?<=img.src=").*?(?="); the result then has only one group, array[0].

Otherwise, please use the two-group pattern described above.

More on extracting image addresses with PHP regular expressions

The day before yesterday I wrote a short talk on extracting image addresses with PHP regular expressions. In fact, extracting the image address inside src= is not enough, because there is no guarantee that the address is an absolute, complete address. What if it is relative? What if the address looks like:

albums/candids/thumb_p1050338.jpg
/content/media/touts/5271608/5271654/15320982

What's the best way to do that?

Sometimes these addresses need http://example1.com/ in front, and some even need http://example1.com/example2/.../, so writing one rule that satisfies every case is impossible. You can only treat each case on its own terms. Sometimes you need to cut from the front, sometimes from the back.

Today I was surprised to learn something: http://example.com/ and http://example.com////// lead to the same place!

http://img3.douban.com/pics/nav/lg_main_a6.png

and

http://img3.douban.com////pics////nav///lg_main_a6.png

both get you to the same image.

So, for the two relative addresses mentioned at the start, if you want to force a prefix onto them to restore an absolute address, do not worry about whether a "/" is already there; just add one. Better one too many than one too few: with an extra "/" the image still displays normally, but with a "/" missing, well, it will not work. At first I did not realize this and copied a large chunk of code, ending up with two copies of the same thing that differed only in the one added character. What a waste of time; I must be from Mars.
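A minimal sketch of the prefixing rule. The function names are hypothetical, and the second variant, which trims slashes on both sides before joining, is an addition for comparison rather than something the post itself uses.

```php
<?php
// True when the address already starts with http:// or https://.
function is_absolute(string $src): bool
{
    return preg_match('#^https?://#i', $src) === 1;
}

// The rule from the text: always insert a "/" between prefix and address.
function add_prefix(string $prefix, string $src): string
{
    return is_absolute($src) ? $src : $prefix . '/' . $src;
}

// Tidier variant (an assumption): trim slashes first so exactly one remains.
function add_prefix_tidy(string $prefix, string $src): string
{
    return is_absolute($src) ? $src : rtrim($prefix, '/') . '/' . ltrim($src, '/');
}

$loose = add_prefix('http://example1.com/', 'albums/candids/thumb_p1050338.jpg');
$tidy  = add_prefix_tidy('http://example1.com/', '/content/media/touts/5271608/5271654/15320982');
```

Here `$loose` ends up with a doubled slash after the host, which relies on the server tolerating repeated slashes, while `$tidy` contains exactly one.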

Here are two addresses for publicly testing how well a page's images can be fetched:

For any web page, except those that require login: http://xyark.serw5.com/img.php
For the Coppermine Photo Gallery system: http://xyark.serw5.com/g.php (if you need the JS page that shows the original image, I am afraid I have to disappoint you)

The general page attempts to grab images from any page, while the system page is there to show what case-by-case analysis means. Those who try it will find that the general page does not work on some sites that use the Coppermine Photo Gallery system. Why? Because of the prefix! The system page, however, avoids this problem nicely.

If you find any bug during testing, please leave a message to let me know. Please test discreetly; thank you for your cooperation.

Note: the above is purely a technical discussion for its own sake and is not intended for any illegitimate purpose. I accept no responsibility for any mishap caused by improper use.

Please indicate the original source and author information of the article and this statement in the form of a hyperlink.
http://www.blogbus.com/xrspook-logs/85330456.html
