Extracting image addresses with regular expressions in PHP

Source: Internet
Author: User
Tags: regular expression, relative

  While developing a program recently, I needed to extract the image addresses from a piece of content. Here is a simple method to share; anyone who needs it can refer to the following.

I have become fascinated by regular expressions and keep trying new tricks. First of all, thanks to TNA for its incomplete RSS output, and thanks again to SH for forcing me to learn. Without TNA I would never have looked into regular expressions, nor known that something this powerful existed; had SH not insisted that he did not know how, I would not have gritted my teeth to think it through and improve. For any given goal the regular expression is never unique: nothing is impossible, there are only things you have not thought of yet. You could say it is a game of setting rules, and I love that kind of thing. Nothing is more exciting than setting the rules that filter things.

Here are some notes on extracting image addresses with regular expressions in PHP.

The standard HTML for an image is nothing more than the following:

    <img title="2" src="http://www.xlanda.net/wp-admin/%E5%9B%A73" alt="4" title="2" width="5" height="6" />

Parts 1 and 2 are optional; 4, 5, and 6 are needed to pass XHTML validation; 3 is the core content and of course cannot be left out.

Using a positive lookbehind, the shortest match I could write is:

    (?<=img.+?src=").*?(?=")

However, this does not work in PHP. It produces:

    Warning: preg_match_all() [function.preg-match-all]: Compilation failed: lookbehind assertion is not fixed length at offset ****

I was stuck on this for a long time. Why does it fail? After many attempts I found that the problem is the zero-width assertion (?<=img.+?src="). In PHP, a lookbehind assertion does not support unbounded quantifiers such as "*" and "+", so the ".+?" triggers the error and has to be replaced with something of fixed length. But fixing the length between "img" and "src=" is basically impossible. Usually img and src are separated by a single space, but you cannot rule out cases where alt, title, or other attributes come after img and before src.

So the code is as follows:

    (?<=img.src=").*?(?=")

or:

    (?<=img\ssrc=").*?(?=")

These may work, but they are not guaranteed to work 100% of the time.

You may ask whether the simpler

    (?<=src=").*?(?=")

would do. Usually it would, but anyone who has looked through page source knows that image addresses are not the only thing introduced by src: JavaScript addresses begin with src too. There are too many unpredictable factors, so this short and seemingly perfect pattern is not workable.

You may also ask: what if I list the image suffixes? For example:

    (?<=src=").*?\.(jpg|jpeg|gif|png|bmp|JPG|JPEG|GIF|PNG|BMP)

That is an honest way to write it, but have you ever seen an image with no suffix? WWE.com has many examples:

    RAW http://us.wwe.com/content/media/images/headers/15559182
    SmackDown http://us.wwe.com/content/media/images/headers/15854138
    NXT http://us.wwe.com/content/media/images/Headers/15929136
    Superstars http://us.wwe.com/content/media/images/headers/15815850

The URLs above are all images, but none of them has a traditional suffix, so the honest suffix list cannot catch them.

What then? You can also do this:

    <img(.*?)src="(.*?)(?=")

Unlike the expressions above, this time array[0] is not the result we want; the image addresses are in array[2]. Why? Because the pattern uses two (.*?) groups, and whatever each "()" matches is stored in its own group. array[0] holds the full matches, array[1] holds everything between img and src, and only array[2] holds the image addresses we want. This pattern matches images with traditional suffixes as well as images with no suffix, without wrongly catching other src= files. Personally I think it works well. Of course, if you have a better suggestion, please leave a comment; the whole world will thank you!

Which pattern to use depends on what kind of image address you want: a fixed format, or anything at all.
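The patterns discussed above can be tried out with a short PHP sketch. The sample markup and the example.com addresses are made up for illustration, and the variable names are assumptions of this sketch, not part of the original post. It first shows that the variable-length lookbehind is rejected, then reads the addresses from group 2 of the capture-group pattern.

```php
<?php
// Hypothetical sample markup; the example.com URLs are illustrative only.
$html = '<p><img alt="a" src="http://example.com/a.png" width="5" />'
      . '<script src="http://example.com/app.js"></script>'
      . '<img src="/content/media/images/headers/15559182" /></p>';

// 1) Lookbehind containing ".+?": PCRE refuses to compile it, so
//    preg_match_all() returns false. "@" only hides the warning here.
$ok = @preg_match_all('/(?<=img.+?src=").*?(?=")/', $html, $unused);
var_dump($ok);

// 2) Capture-group pattern: group 1 is whatever sits between "<img" and
//    "src=", group 2 is the address itself.
preg_match_all('/<img(.*?)src="(.*?)(?=")/', $html, $matches);

// With the default PREG_PATTERN_ORDER, $matches[2] lists every group-2 hit.
$addresses = $matches[2];
print_r($addresses);
```

One caveat of this sketch: because (.*?) can run past the end of a tag, an img tag that has no src attribute could pick up the src of a later tag on the same line, so the result may still need checking.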
Each case has to be analyzed on its own terms. My advice is:

If the image addresses you want always have the form img, one space, src=, use (?<=img.src=").*?(?="). The results are then all in array[0].

Otherwise, use <img(.*?)src="(.*?)(?="), and remember to pay attention to which array position holds the useful content!

More on extracting image addresses with PHP

The day before yesterday I wrote a short piece on extracting image addresses with PHP. In fact, extracting the address inside src= is not enough, because there is no guarantee that the address is an absolute, complete address. What if it is relative? What should be done with addresses such as:

    albums/candids/thumb_p1050338.jpg
    /content/media/touts/5271608/5271654/15320982

Some of these addresses need http://example1.com/ in front, and some even need http://example1.com/example2/.../, so writing a single rule that satisfies every requirement is impossible. You can only apply the right remedy to each case. Sometimes you need to cut from the front and sometimes from the back.

Today I was surprised to learn that http://example.com/ and http://example.com////// lead to the same place, since many web servers collapse repeated slashes. For example, the address of lg_main_a6.png written with a single slash and the same address written with extra slashes both end up at the same image. So for the two relative addresses mentioned at the start, if you want to force a prefix onto them to get an absolute address, just add a "/" whether or not the address already begins with one. Better one slash too many than one too few: with an extra "/" the image usually still displays normally, but with one "/" missing you have no chance of success. At first I did not realize this and copied a large block of code, so that the same thing was written out twice, once adding "/" and once not. What a waste of effort.
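As a sketch of the prefixing step, the helper below joins a prefix and an extracted address. The function name to_absolute and the choice of prefix are assumptions made for illustration; the right prefix still has to be worked out per site, as described above. Note that, instead of relying on the server to tolerate doubled slashes, this version trims both sides so that exactly one "/" joins them.

```php
<?php
// Hypothetical helper (not from the original post): join a site prefix and
// an extracted address with exactly one "/" between them.
function to_absolute(string $prefix, string $address): string
{
    // Addresses that already carry a scheme are returned unchanged.
    if (preg_match('#^https?://#i', $address)) {
        return $address;
    }
    // Trim slashes on both sides, then join with a single "/".
    return rtrim($prefix, '/') . '/' . ltrim($address, '/');
}

// The two relative addresses from the text, with an assumed prefix.
echo to_absolute('http://example1.com/', 'albums/candids/thumb_p1050338.jpg'), "\n";
echo to_absolute('http://example1.com', '/content/media/touts/5271608/5271654/15320982'), "\n";
```

This only covers the simple case of a known prefix; addresses containing "../" or addresses relative to a subdirectory would need a per-site prefix or fuller URL resolution.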
Here are two addresses where you can see how fetching images from a web page works in practice:

    For any page, except those that require login: http://xyark.serw5.com/img.php
    For sites running the Coppermine Photo Gallery system: http://xyark.serw5.com/g.php (if you think it ought to pop up the original JS page, I can only shrug)

The general page is an attempt to grab the images from any page; the Coppermine page is there to show what case-by-case analysis means. Anyone who tries them will find that the general page does not work for some sites running Coppermine Photo Gallery. Why? The prefix! The Coppermine page, however, avoids this problem nicely.

If you find any bugs during testing, please let me know. Please keep your testing low-key; thank you for your cooperation.

NOTE: The discussion above is about technique purely for its own sake and must not be used for improper purposes. The author accepts no responsibility for any mishap caused by improper use.
