Recently, while developing a program, I needed to extract the image addresses from a block of fetched content. Here is a simple example; if you need the same thing, feel free to use it as a reference and try new tricks with regular expressions. First, I would like to thank TNA for its incomplete RSS output, and then SH for forcing me to learn. Without TNA I would never have touched regular expressions, and I would not even know that something this cool exists. And if SH had not insisted that he did not know how to do it, I would not have worked it out myself. For one and the same purpose, a regular expression can be written in endlessly different ways: nothing is impossible, there is only what you have not thought of. You could say regular expressions are all about defining patterns. I love this kind of thing. Nothing excites me more, or feels more awesome, than setting up a regex filter.
In a PHP environment, take care when using regular expressions to extract image addresses:
Standard HTML for an image is nothing more than the following.
The code is as follows:
Markers 1 and 2 are optional. Markers 4, 5, and 6 are indispensable if the page is to pass XHTML validation. Marker 3 is the core content and of course cannot be omitted.
Speaking purely in terms of regular expressions, the shortest pattern I wrote is
The code is as follows:
(?<=img.+?src=").*?(?=")
However, this pattern does not work in PHP, and produces:
Warning: preg_match_all() [function.preg-match-all]: Compilation failed: lookbehind assertion is not fixed length at offset ***** in ***
I struggled with this for a long time without success. Why? After many attempts I finally found the problem in the zero-width assertion (?<=img.+?src="). In PHP, a lookbehind assertion does not support unbounded quantifiers such as "*" or "+", so an error is reported. The fix is to change ".+?" to something of fixed length. However, it is basically impossible to fix the length between "img" and "src=". Usually img and src are separated by a single space, but it cannot be ruled out that attributes such as alt or title come before src.
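The failure is easy to reproduce. The following is a minimal sketch, assuming a made-up HTML sample in `$html` (not from the original post). The warning is suppressed with `@` so that only the return value is inspected.

```php
<?php
// Hypothetical sample input, used only for illustration.
$html = '<img alt="logo" src="a.png" />';

// The lookbehind contains ".+?", which has no fixed length,
// so the pattern fails to compile. "@" hides the warning quoted above.
$result = @preg_match_all('/(?<=img.+?src=").*?(?=")/', $html, $matches);

// preg_match_all() returns false when the pattern cannot be compiled.
var_dump($result);
```

Checking for a `false` return value is a reasonable way to catch this kind of pattern error in a script instead of relying on the warning text.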
So
The code is as follows:
(?<=img.src=").*?(?=")
Or
The code is as follows:
(?<=img\ssrc=").*?(?=")
These work, but they are not guaranteed to work 100% of the time.
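To see where the fixed-length version falls short, here is a small sketch. The HTML in `$html` is an assumed sample: the first tag has src directly after img, the second has an alt attribute in between.

```php
<?php
// Hypothetical sample: only the first tag has the form img, whitespace, src=.
$html = '<img src="a.png" /><img alt="b" src="b.png" />';

// Fixed-length lookbehind: "img", exactly one whitespace character, then src=".
preg_match_all('/(?<=img\ssrc=").*?(?=")/', $html, $matches);

// Only a.png is found; b.png is missed because alt sits between img and src.
print_r($matches[0]);
```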
You may ask: what about the simpler
The code is as follows:
(?<=src=").*?(?=")
Won't that do? Usually it will. However, anyone who has looked through page source knows that src introduces not only image addresses but also JavaScript addresses! There are too many other unpredictable factors as well, so this seemingly short and perfect pattern won't do.
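A quick sketch of the problem, using an assumed two-tag sample in `$html`: the src-only pattern returns the script address alongside the image address.

```php
<?php
// Hypothetical sample containing one script and one image.
$html = '<script src="app.js"></script><img src="a.png" />';

// Anything that follows src=" is captured, whatever tag it belongs to.
preg_match_all('/(?<=src=").*?(?=")/', $html, $matches);

// Both app.js and a.png end up in the result.
print_r($matches[0]);
```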
You may say: if short and clever does not work, I can simply list the image suffixes, like this:
The code is as follows:
(?<=src=").*?\.(jpg|jpeg|gif|png|bmp|JPG|JPEG|GIF|PNG|BMP)
Indeed, this approach is very honest, but have you ever seen images without a suffix? wwe.com has many such examples.
RAW http://us.wwe.com/content/media/images/Headers/15559182
SmackDown http://us.wwe.com/content/media/images/Headers/15854138
NXT http://us.wwe.com/content/media/images/Headers/15929136
Superstars http://us.wwe.com/content/media/images/Headers/15815850
All of the URLs above are images, yet none has a traditional suffix. However honest the pattern is, it still cannot catch them.
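The miss can be shown with a short sketch. The `$html` sample is assumed; it pairs an ordinary .png with one of the suffix-less addresses listed above.

```php
<?php
// Hypothetical sample: one image with a suffix, one without.
$html = '<img src="a.png" />'
      . '<img src="http://us.wwe.com/content/media/images/Headers/15559182" />';

// The suffix-list pattern from the text.
preg_match_all(
    '/(?<=src=").*?\.(jpg|jpeg|gif|png|bmp|JPG|JPEG|GIF|PNG|BMP)/',
    $html,
    $matches
);

// Only a.png is returned; the suffix-less address is skipped.
print_r($matches[0]);
```

Note also that ".*?" in this pattern is not stopped by the closing quote, so if a suffix-less address comes before a suffixed one, the match can run across both tags.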
So what can be done? Another option is:
The code is as follows:
Unlike the expressions above, the content of array[0] in this result is not what we want. The image address we want is in array[2]. Why? Because the pattern uses two (.*?) groups, and each "()" automatically becomes a capture group. array[0] holds the full matches, array[1] holds everything between img and src, and array[2] holds the image address we want. This approach matches images with traditional suffixes as well as images without any suffix, without also catching other src= files. I am personally quite pleased with it, heh. Of course, if you have a better suggestion, please leave a comment right away. The rest of the world will thank you!
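The pattern itself did not survive in this copy of the post, so the following is only a sketch of a pattern consistent with the description: two (.*?) groups, the first between img and src, the second holding the address. Both the pattern and the `$html` sample are assumptions, not the author's exact code.

```php
<?php
// Hypothetical sample: a script, an image with alt before src, and a suffix-less image.
$html = '<script src="app.js"></script>'
      . '<img alt="b" src="b.png" />'
      . '<img src="http://us.wwe.com/content/media/images/Headers/15559182" />';

// Assumed pattern: group 1 = whatever sits between "img" and "src",
// group 2 = the image address.
preg_match_all('/<img(.*?)src="(.*?)"/', $html, $array);

// $array[0] holds the full matches, $array[1] the text between img and src,
// and $array[2] the addresses we actually want.
print_r($array[2]);
```

In this sample the script address is left out, and both image addresses are returned regardless of suffix.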
Which images do you want? Do they follow a fixed format or not? It depends on the specific situation.
My suggestion is:
If the image address always has the form img, one space, src=, use (?<=img.src=").*?(?=") and the result array contains only the addresses.
Otherwise, use the capture-group pattern described above.
PHP regular expressions: extracting image addresses
The day before yesterday I wrote a note about extracting image addresses with PHP regular expressions. In practice, though, extracting the address from src= is not enough, because there is no guarantee that it is an absolute, complete address. What if it is relative? What if the address is:
Albums/Candids/thumb_P1050338.jpg
/Content/media/touts/5271608/5271654/15320982/
So what should we do?
Sometimes these addresses need http://example1.com/ in front of them, and some even need http://example1.com/example2/.../. Writing one rule that satisfies every case is simply a fantasy; the remedy has to fit the case. Sometimes you have to cut at the front, sometimes at the back.
Today I was surprised to learn that http://example.com/ and http://example.com////// are the same!
http://img3.douban.com/pics/nav/lg_main_a6.png
and
http://img3.douban.com////pics////nav///lg_main_a6.png
both lead to the same place.
Therefore, to restore the two relative addresses mentioned at the beginning to absolute addresses by forcibly adding a prefix, you only need to add a "/", regardless of whether the address already starts with one. "Better to kill by mistake than to let one go": one "/" too many still displays normally, but one "/" too few breaks the address. At first I did not realize this. I copied a large chunk of code and kept two copies of the same thing, one with ".." and one without. I must be from Mars; what a waste of time.
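A minimal sketch of the "always add a slash" idea follows. The helper name `add_prefix` is hypothetical, and http://example1.com is the placeholder prefix used earlier in the text. It relies on the observation above that repeated slashes still resolve, which not every server is guaranteed to honor.

```php
<?php
// Hypothetical helper: joins a prefix and a relative address with a "/".
function add_prefix(string $prefix, string $relative): string
{
    // Always insert a "/": a doubled slash is tolerated, a missing one is not.
    return $prefix . '/' . $relative;
}

$first  = add_prefix('http://example1.com', 'Albums/Candids/thumb_P1050338.jpg');
$second = add_prefix('http://example1.com', '/Content/media/touts/5271608/5271654/15320982/');

echo $first, "\n";   // single slash between prefix and path
echo $second, "\n";  // doubled slash, because the path already starts with "/"
```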
Two addresses are released for public beta:
For any web page except those that require login: http://xyark.serw5.com/img.php
For the Coppermine Photo Gallery system: http://xyark.serw5.com/g.php (if you expect it to handle the JavaScript page that pops up the original image as well, I give up)
The general page attempts to capture images from any page; the system page shows what case-by-case analysis means. Those who have tried it will know that the general page does not work for some websites that use the Coppermine Photo Gallery system. Why? The prefix is the culprit! The system page avoids this problem.
If you find any bugs while testing, leave a comment. Please test discreetly. Thank you for your cooperation.
Note: this post is purely about regular expressions and technique, and must not be put to improper use. I accept no liability for any tragedy caused by improper use.
When reprinting, please use a hyperlink to indicate the original source of the article, the author information, and this statement:
http://www.blogbus.com/xrspook-logs/85330456.html