A short talk on extracting image addresses with PHP regular expressions


I have become fascinated by regular expressions and keep trying new tricks. First I have to thank TNA for its incomplete RSS output, and then SH for forcing me to learn. Without TNA I would never have looked at regular expressions or known that such a powerful kind of expression existed; without SH insisting he did not know how, I would not have gritted my teeth, thought it through, and improved. The regular expression that achieves a given goal is never unique: it is not that something cannot be done, only that you have not thought of it yet. In effect you are playing at setting rules, and I love this kind of thing. Nothing is more exciting than writing the rules that filter things.

Here are some notes on extracting image addresses with regular expressions in PHP:

Standard HTML markup for an image is nothing more than


Parts 1 and 2 are optional; parts 4, 5 and 6 are required to pass XHTML validation; part 3 is the core content and must never be left out.

The shortest pattern I have written that ought to be right is:

(?<=img.+?src=").*?(?=")

However, this does not work in PHP. It produces:

Warning: preg_match_all() [function.preg-match-all]: Compilation failed: lookbehind assertion is not fixed length at offset *** in ***

I puzzled over this for a long time. Why? After many attempts I finally traced the problem to the zero-width assertion (?<=img.+?src="). In PHP, a lookbehind assertion does not support unbounded quantifiers such as "*" and "+", hence the error; the ".+?" has to be replaced with something of fixed length. However, fixing the length of what sits between "img" and "src=" is basically impossible. Usually img and src are separated by a single space, but alt, title or other attributes may well appear after img and before src.
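The compile failure can be reproduced with a minimal sketch like the one below. The sample markup and URL are invented for illustration, and the @ operator is used only to hide the warning so that the return value can be inspected.

```php
<?php
// Sample markup; the tag and URL are made up for illustration.
$html = '<img src="http://example.com/a.png" />';

// Unbounded quantifier inside the lookbehind: PCRE refuses to compile it.
$variable = '/(?<=img.+?src=").*?(?=")/';
// @ hides the compile warning so only the return value is inspected.
$failed = @preg_match_all($variable, $html, $m1);
var_dump($failed); // false signals that the pattern did not compile

// A single "." gives the lookbehind a fixed length, so it compiles and runs.
$fixed = '/(?<=img.src=").*?(?=")/';
$count = preg_match_all($fixed, $html, $m2);
var_dump($count, $m2[0]);
```

Checking for a strict false return is a simple way to tell a pattern that failed to compile from one that compiled but found nothing (which returns 0).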

So

(?<=img.src=").*?(?=")

or

(?<=img\ssrc=").*?(?=")

may work, but there is no guarantee that either works 100% of the time.
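A small sketch of why the fixed-length version is not guaranteed: it finds the address when src follows img directly, and silently misses it when another attribute comes first. Both sample tags are invented for illustration.

```php
<?php
$pattern = '/(?<=img\ssrc=").*?(?=")/';

// src directly after img: the lookbehind is satisfied.
$simple = '<img src="http://example.com/a.png" alt="a" />';
// alt sits between img and src: the lookbehind never matches.
$extra  = '<img alt="b" src="http://example.com/b.png" />';

preg_match_all($pattern, $simple, $hit);
preg_match_all($pattern, $extra, $miss);

var_dump($hit[0]);  // contains the address
var_dump($miss[0]); // empty array
```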

You may ask: will the simple

(?<=src=").*?(?=")

not do? Usually it will, but anyone who has looked through page source knows that image addresses are not the only thing introduced by src: JavaScript files are referenced with src too! With so many unpredictable factors involved, this short and seemingly perfect pattern is not workable.
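To see the problem concretely, here is a short sketch with one script tag and one image tag (both invented for illustration). The pattern returns both addresses, because nothing in it ties the match to an img tag.

```php
<?php
$html = '<script src="http://example.com/app.js"></script>'
      . '<img src="http://example.com/a.png" />';

// Matches whatever follows any src=", regardless of the tag it belongs to.
preg_match_all('/(?<=src=").*?(?=")/', $html, $m);

print_r($m[0]); // the script address is returned alongside the image address
```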

You may also ask: if the clever short patterns fail, surely listing the image suffixes will always work, for example

(?<=src=").*?\.(jpg|jpeg|gif|png|bmp|JPG|JPEG|GIF|PNG|BMP)

Indeed, this is a very straightforward approach, but have you ever seen an image with no suffix? WWE.com has many examples.

RAW http://us.wwe.com/content/media/images/Headers/15559182
SmackDown http://us.wwe.com/content/media/images/Headers/15854138
NXT http://us.wwe.com/content/media/images/Headers/15929136
Superstars http://us.wwe.com/content/media/images/Headers/15815850

All of the URLs above are images, yet none has a conventional suffix. However straightforward the suffix list is, it cannot catch them.
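A brief sketch of the miss, using one invented .jpg address and one of the WWE addresses listed above. As a simplification that is not in the original pattern, the i modifier stands in for listing the upper-case suffixes, and the group is made non-capturing.

```php
<?php
$html = '<img src="http://example.com/photo.jpg" />'
      . '<img src="http://us.wwe.com/content/media/images/Headers/15559182" />';

// Suffix list; the i modifier covers JPG, PNG and so on.
preg_match_all('/(?<=src=").*?\.(?:jpg|jpeg|gif|png|bmp)/i', $html, $m);

print_r($m[0]); // only the .jpg address; the suffix-less one is missing
```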

So what do we do? I use a pattern with two capture groups.

Unlike the expressions above, array[0] in this result is not what we want; the image address we want is in array[2]. Why? Because the pattern uses two (.*) groups, and whatever each "()" matches is stored in its own group: array[0] holds the full matches, array[1] holds everything between img and src, and only array[2] holds the image addresses. This approach matches images with conventional suffixes as well as image files with no suffix, without wrongly catching other src= files. Personally I think it works well. Of course, if you have a better suggestion, please leave a comment, and the whole world will thank you!
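A pattern of the shape described, with one group between img and src and one group for the address, might look like the sketch below. The exact pattern is an assumption reconstructed from the description (including the lazy quantifiers and the i modifier), and the sample markup is invented apart from the WWE address listed earlier.

```php
<?php
$html = '<script src="http://example.com/app.js"></script>'
      . '<img alt="raw" src="http://us.wwe.com/content/media/images/Headers/15559182" />'
      . '<img src="http://example.com/a.png" />';

// Group 1: whatever sits between img and src. Group 2: the address itself.
// Hypothetical pattern; the lazy quantifiers are an assumption.
preg_match_all('/<img(.*?)src="(.*?)"/i', $html, $m);

print_r($m[0]); // full matches, from <img up to the closing quote
print_r($m[1]); // text between img and src
print_r($m[2]); // the image addresses
```

Because the match has to start at an img tag, the script address is skipped, and the suffix-less address is kept because no suffix list is involved.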

What kind of image address do you want: a fixed format, or anything at all? Each case needs its own analysis.

My advice is:

If the image addresses you want always have the form img, one space, src=, use (?<=img.src=").*?(?=") and read the results straight from array[0].

Otherwise, use the two-group pattern described above and read array[2].

More on extracting image addresses with PHP regular expressions

The day before yesterday I wrote a short post about extracting image addresses with PHP regular expressions. In practice, though, extracting the address inside src= is not enough, because nothing guarantees that it is an absolute, complete address. What if it is relative? What if the address looks like:

Albums/candids/thumb_p1050338.jpg
/content/media/touts/5271608/5271654/15320982

What should we do then?

Some of these addresses need http://example1.com/ in front, and some even need http://example1.com/example2/.../, so writing one rule that meets every requirement is impossible. The remedy has to fit the case: sometimes you cut from the front, and sometimes from the back.

Today I was surprised to learn that http://example.com/ and http://example.com////// are the same!

Yun_qi_img/lg_main_a6.png

And

Yun_qi_img/lg_main_a6.png

both reach the same place in the end.

So for the two relative addresses mentioned at the start, if you want to force a prefix onto them to get an absolute address, just add a "/" regardless of whether one is already there. Better to add one too many than one too few: with an extra "/" the image still displays normally, but with one "/" missing it will not load at all. At first I did not realize this and copied a large chunk of code so that the same thing was done twice, once with the "/" added and once without. What a waste of effort.
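The "always add a slash" idea can be sketched as follows. The helper name addPrefix is hypothetical, and the prefix http://example1.com is the placeholder used earlier in this post. Whether a server tolerates the doubled slash is the observation made above, not something the code checks.

```php
<?php
// Hypothetical helper: always puts one "/" between the prefix and the src value.
function addPrefix(string $prefix, string $src): string
{
    return $prefix . '/' . $src;
}

$relative = addPrefix('http://example1.com', 'Albums/candids/thumb_p1050338.jpg');
$rooted   = addPrefix('http://example1.com', '/content/media/touts/5271608/5271654/15320982');

echo $relative, "\n"; // single slash after the host
echo $rooted, "\n";   // double slash after the host
```

If the doubled slash is unwanted, replacing $src with ltrim($src, '/') inside the helper would strip the leading slash before joining.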

Here are two addresses for public testing of fetching images from web pages:

For any web page except those that require login: http://xyark.serw5.com/img.php
For sites running the Coppermine Photo Gallery system: http://xyark.serw5.com/g.php (if you also want the JS pop-up page that shows the original image, I am afraid I will have to disappoint you)

The general page attempts to grab any image; the system-specific page is there to show what case-by-case analysis means. Anyone who tries them will find that the general page does not work for some sites running Coppermine Photo Gallery. Why? The prefix! The system-specific page avoids this problem nicely.

If you find any bugs while testing, please let me know. Please test discreetly; thank you for your cooperation.

Note: this post discusses technique purely for its own sake and must not be used for improper purposes. The author takes no responsibility for any mishap caused by improper use.

When reproducing this article, please indicate the original source and author information in the form of a hyperlink:
http://www.blogbus.com/xrspook-logs/85330456.html
