Principles and Methods of Matching Image Paths with PHP Regular Expressions

Source: Internet
Author: User
Tags: PHP, regular expression

This article introduces the principles and implementation methods for matching image paths with PHP regular expressions. Refer to it if you need the details.

Extracting the image address from src= is not enough, because there is no guarantee that the address is absolute. What if it is relative? Suppose the address is:
Albums/Candids/thumb_P1050338.jpg
/Content/media/touts/5271608/5271654/15320982/
What should we do then?
Some of these addresses need http://example1.com/ prepended, and some even need http://example1.com/example2/.../. Writing a single rule that meets every requirement is simply a fantasy; each case needs its own remedy. Sometimes you have to add something at the front, and sometimes you have to cut something off at the back.
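The idea of prepending the right prefix can be sketched as a small helper. This is only an illustration: the function name resolveImageUrl and the page URL http://example1.com/example2/index.html are assumptions, and the sketch ignores ports, query strings, and "../" segments.

```php
<?php
// Hypothetical helper: resolve an image src against the URL of the page it came from.
function resolveImageUrl(string $src, string $pageUrl): string
{
    // Already absolute (starts with a scheme such as http://): return unchanged.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $src)) {
        return $src;
    }
    $parts  = parse_url($pageUrl);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    // Protocol-relative address ("//host/path"): reuse the page's scheme.
    if (strpos($src, '//') === 0) {
        return $parts['scheme'] . ':' . $src;
    }
    // Root-relative address: attach it to the site root.
    if (strpos($src, '/') === 0) {
        return $origin . $src;
    }
    // Document-relative address: attach it to the directory of the page.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $src;
}

$page = 'http://example1.com/example2/index.html'; // assumed page URL
echo resolveImageUrl('Albums/Candids/thumb_P1050338.jpg', $page), "\n";
echo resolveImageUrl('/Content/media/touts/5271608/5271654/15320982/', $page), "\n";
```

The first address is joined to the page's directory and the second to the site root, which is why no single prefix fits both.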
Today I was surprised to learn that http://example.com/ and http://example.com////// are treated as the same address!
http://img3.douban.com/pics/nav/lg_main_a6.png
and
http://img3.douban.com////pics////nav///lg_main_a6.png
both reach the same image in the end. Many web servers collapse repeated slashes this way, although not every server is guaranteed to.
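If you would rather store one canonical form than rely on the server's tolerance, repeated slashes can be collapsed before saving. A minimal sketch, assuming a hypothetical helper named collapseSlashes; note that it would also alter any "//" appearing inside a query string.

```php
<?php
// Hypothetical helper: collapse runs of slashes into one,
// except the "//" that directly follows the colon of the scheme.
function collapseSlashes(string $url): string
{
    return preg_replace('#(?<!:)/{2,}#', '/', $url);
}

echo collapseSlashes('http://img3.douban.com////pics////nav///lg_main_a6.png'), "\n";
```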

The HTML markup for an image, per the specification, is nothing more than an img tag with its src attribute.

?? and ?? are not required, but if you want to pass XHTML validation, ??, ??, ??, and ?? are required.

As for the regular expression itself, the shortest pattern I wrote is

(?<=img.+?src=").*?(?=")

However, this pattern does not work in PHP. It produces:

Warning: preg_match_all() [function.preg-match-all]: Compilation failed: lookbehind assertion is not fixed length at offset ***** in ***

I struggled with this for a long time without success. Why? After many attempts I finally found the problem in the zero-width assertion (?<=img.+?src="). In PHP, a lookbehind assertion cannot contain unbounded quantifiers such as "*" or "+", so an error is reported. The fix is to change ".+?" to something of fixed length. However, it is basically impossible to fix the length between "img" and "src=". Usually img and src are separated by just a single space, but it cannot be ruled out that alt, title, or other attributes come before src.
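The difference can be reproduced with a short sketch. The sample HTML string is made up for illustration, and the @ operator is used only to keep the compile warning out of the output.

```php
<?php
$html = '<img src="a.png" alt="x"><script src="app.js"></script>'; // sample markup

// Unbounded ".+?" inside the lookbehind: the pattern fails to compile,
// so preg_match() returns false instead of 0 or 1.
$variable = @preg_match('/(?<=img.+?src=").*?(?=")/', $html, $m1);

// Exactly one character between "img" and "src": the lookbehind has a fixed
// length, the pattern compiles, and the match is the address itself.
$fixed = preg_match('/(?<=img.src=").*?(?=")/', $html, $m2);

var_dump($variable); // bool(false)
var_dump($m2[0]);    // string(5) "a.png"
```

Because the lookbehind and lookahead consume nothing, the whole match ($m2[0]) is already the bare address.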

So

(?<=img.src=").*?(?=")

or

(?<=img\ssrc=").*?(?=")

will work, but neither is guaranteed to be right 100% of the time.

You may ask: wouldn't the simple

(?<=src=").*?(?=")

do? In general it does. However, anyone who has scraped pages knows that image addresses are not the only thing introduced by src: JavaScript file addresses start with src too! There are also too many other unpredictable factors, so this seemingly short and perfect pattern won't work.
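A quick sketch shows the problem. The sample markup is invented for the demonstration.

```php
<?php
$html = '<img src="logo.png"><script src="app.js"></script>'; // sample markup

// Every src="..." value is captured, whatever tag it belongs to.
preg_match_all('/(?<=src=").*?(?=")/', $html, $matches);

print_r($matches[0]); // contains both logo.png and app.js
```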
You may say: if short and clever doesn't work, I can list the image suffixes, like this:

(?<=src=").*?\.(jpg|jpeg|gif|png|bmp|JPG|JPEG|GIF|PNG|BMP)

Indeed, this approach is very straightforward, but have you ever seen images without suffixes? wwe.com has many such examples.

RAW http://us.wwe.com/content/media/images/Headers/15559182
SmackDown http://us.wwe.com/content/media/images/Headers/15854138
NXT http://us.wwe.com/content/media/images/Headers/15929136
Superstars http://us.wwe.com/content/media/images/Headers/15815850

The URLs above are all images, but none of them has a traditional suffix. The straightforward suffix list is useless here; you still cannot capture them.
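The limitation is easy to check. In this sketch the "i" flag stands in for listing the upper-case suffixes separately, which is an assumption about intent rather than the exact pattern from the article.

```php
<?php
// Suffix-list pattern; the "i" flag replaces the separate upper-case alternatives.
$pattern = '/(?<=src=").*?\.(?:jpg|jpeg|gif|png|bmp)/i';

$withSuffix    = '<img src="Albums/Candids/thumb_P1050338.jpg">';
$withoutSuffix = '<img src="http://us.wwe.com/content/media/images/Headers/15559182">';

$found1 = preg_match($pattern, $withSuffix, $m1);    // 1: the suffix is present
$found2 = preg_match($pattern, $withoutSuffix, $m2); // 0: no suffix, so no match

var_dump($found1, $m1[0], $found2);
```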

What should we do? You can also use a pattern with two capture groups, one for everything between img and src and one for the address.

Unlike the expressions above, the content of array[0] in this result is not what we want. The image address we want is in array[2]. Why? Because we use two (.*?) groups, and each "()" is automatically captured as a group: array[0] holds the whole match, array[1] holds everything between img and src, and array[2] holds the image address we want. This approach matches images with traditional suffixes as well as images without them, and it does not catch other src= files. I personally think it works well. Of course, if you have a better suggestion, please leave a comment right away. Everyone will thank you!
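A minimal sketch of this two-group approach follows. The exact pattern is an assumption reconstructed from the description above (two (.*?) groups, the first between img and src), and the sample markup is made up.

```php
<?php
$html = '<img alt="RAW" src="http://us.wwe.com/content/media/images/Headers/15559182">'
      . '<script src="app.js"></script>'
      . '<img src="Albums/Candids/thumb_P1050338.jpg">'; // sample markup

// Assumed pattern: group 1 is whatever sits between "img" and "src",
// group 2 is the address inside the quotes.
preg_match_all('/<img(.*?)src="(.*?)"/i', $html, $matches);

print_r($matches[1]); // text between "img" and "src" for each image
print_r($matches[2]); // the image addresses; the script's src is not included
```

One caveat of this sketch: because "." also matches ">", an img tag without any src could let group 1 run on into a later tag. Replacing the first (.*?) with ([^>]*?) keeps the match inside a single tag.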
Which approach to use depends on the images you want: do their tags follow a fixed format or not? Judge by the specific situation.
My suggestion is:
If the image tags you want always have the form img, space, src=, use (?<=img.src=").*?(?=") and the result array contains only the addresses.
Otherwise, use the pattern with two capture groups described above.


Here is a regular expression I have used in a project for a long time:


/"'s]*)/i


I use KindEditor to save articles, but I need to retrieve the address of the nth image to use as the article's logo image. The article's HTML content is saved to one field in the database, and the image address is saved to another field. I used the regular expression above to solve this problem.

Note that the expression above directly obtains the value of the src attribute in the img tag. Try accessing this path from the PHP page: if the image can be found, use it directly. If not, use preg_match_all to save all the addresses into an array and then process the paths. For example, you can extract the file name (without the path), recompose the URL, or delete the image.
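The path processing mentioned here might look like the following sketch. The sample addresses and the base URL http://example1.com/uploads/ are assumptions made for illustration.

```php
<?php
// Sample addresses, as they might come out of preg_match_all().
$addresses = [
    'http://img3.douban.com/pics/nav/lg_main_a6.png',
    '/Content/media/touts/thumb_P1050338.jpg?v=2',
];
$base = 'http://example1.com/uploads/'; // hypothetical new location

$rebuilt = [];
foreach ($addresses as $src) {
    // Keep only the path part (drops scheme, host, and query string).
    $path = parse_url($src, PHP_URL_PATH);
    // Keep only the file name (drops the directories).
    $file = basename($path);
    $rebuilt[] = $base . $file;
}

print_r($rebuilt);
```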

My example:


preg_match_all("/"'s]*)/i", str_ireplace("", "", $content), $arr);


The content has been escaped by PHP, so I first remove the escaping with str_ireplace("", "", $content), then save the matched content to the two-dimensional array $arr.
$arr[1] is the array that stores the paths.
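The whole flow can be sketched as follows. The pattern here is an assumption and not the article's exact expression, stripslashes() is swapped in for the str_ireplace call as one common way to remove the escaping backslashes, and nthImageSrc is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return the address of the nth image (1-based)
// found in saved article HTML, or null if there are fewer images.
function nthImageSrc(string $content, int $n): ?string
{
    // Assumed pattern: capture the src value of each img tag into group 1.
    $pattern = '/<img[^>]*?\bsrc\s*=\s*["\']?([^"\'\s>]+)/i';
    // stripslashes() removes the backslashes added when the content was escaped.
    preg_match_all($pattern, stripslashes($content), $arr);
    // $arr[1] holds the captured paths, in document order.
    return $arr[1][$n - 1] ?? null;
}

// Sample escaped content, as it might be stored in the database field.
$content = '<p><img alt=\"a\" src=\"/upload/1.jpg\" /><img src=\"/upload/2.png\" /></p>';

var_dump(nthImageSrc($content, 1)); // "/upload/1.jpg"
var_dump(nthImageSrc($content, 2)); // "/upload/2.png"
var_dump(nthImageSrc($content, 3)); // NULL
```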



