I recently collected a large amount of data, and most of it has to be cleaned of material left over from the original site. The img tag is one such place. Examples of this kind of cleanup are probably easy to find online already, but this site has gone a few days without a new post, so let's use pictures in img tags as the example. The img tags in the collected content carry a lot of JavaScript and other useless information, which has to be replaced with what you actually want, such as your own alt text. Here is the kind of content to be filtered; I simply copied a sample:
The code is as follows:
sdfsdfsdf<img ... 500) {this.resized = true; this.style.width = 500;}">sfsdfsdfasdfsadf<img ... 500) {this.resized = true; this.style.width = 500;}">sdfsadf<img ... 500) {this.resized = true; this.style.width = 500;}">sdfsdf
The content above should be replaced so that each img tag keeps nothing but its src, plus any attribute you add yourself, such as alt. The address src="http://www.xxx.com/upimg/080330/120D1232295023X0.gif" has to be retained, because it is the source address of the picture.
The general method is: first find the img tags in the content, then extract the src of each img tag, combine it into a new tag of your own, and finally replace the original tag with it.
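As a rough illustration of the first two steps, the sketch below finds every img tag in a made-up string and collects its src. The sample string (including its onload handler), both patterns, and the variable names are assumptions for illustration, not the data or code of the original site.

```php
<?php
// Hypothetical sample content; the real input comes from the collected pages.
$html = 'aaa<img alt="x" src="http://www.xxx.com/upimg/080330/120D1232295023X0.gif" '
      . 'onload="if(this.width>500) {this.resized=true;}">bbb';

// Quoted attribute values are consumed whole, so a ">" inside onload
// does not end the tag early.
$tagPattern = '/<img\b(?:"[^"]*"|\'[^\']*\'|[^>"\'])*>/i';
preg_match_all($tagPattern, $html, $tags);

$sources = [];
foreach ($tags[0] as $tag) {
    // Pull the double-quoted src value out of each matched tag.
    if (preg_match('/\ssrc\s*=\s*"([^"]+)"/i', $tag, $m)) {
        $sources[] = $m[1];
    }
}

print_r($sources);
```

Here $tags[0] holds the full text of each matched tag, and $sources ends up with one URL per img tag that has a double-quoted src. Single-quoted or unquoted src values are ignored by this sketch.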
preg_match_all is the function I need. It fills a nested (two-dimensional) array with everything the regular expression matched: $arr[0] holds the full matches, and each later index holds the text captured by the corresponding group. You can loop over the matches and replace them one by one. If you are not familiar with the function, please refer to the PHP manual; I will not introduce it here. Function code:
The code is as follows:
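function replace($str)
{
    // $arr[0] collects every full <img ...> tag; quoted values may contain ">"
    preg_match_all('/<img\b(?:"[^"]*"|\'[^\']*\'|[^>"\'])*>/i', $str, $arr);
    for ($i = 0, $j = count($arr[0]); $i < $j; $i++) {
        // Keep only the src attribute and rebuild the tag around it;
        // alt is left empty here, put your own text in it
        if (preg_match('/\s(src\s*=\s*"[^"]+")/i', $arr[0][$i], $src)) {
            $str = str_replace($arr[0][$i], '<img ' . $src[1] . ' alt="" />', $str);
        }
    }
    return $str;
}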
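The same cleanup can also be done in a single pass with preg_replace_callback, which avoids searching the string again for every match. This is only a sketch of that variant: the function name strip_img_attributes, the $alt parameter, and the sample string are hypothetical, and it assumes the src value is double-quoted.

```php
<?php
// Sketch: rebuild each <img> tag in one pass, keeping only its src.
// The function name and the $alt parameter are hypothetical.
function strip_img_attributes(string $html, string $alt = ''): string
{
    // Quoted attribute values are matched whole, so ">" inside them is safe.
    $tagPattern = '/<img\b(?:"[^"]*"|\'[^\']*\'|[^>"\'])*>/i';

    $result = preg_replace_callback($tagPattern, function (array $m) use ($alt): string {
        // Leave the tag untouched if it has no double-quoted src.
        if (!preg_match('/\ssrc\s*=\s*"([^"]+)"/i', $m[0], $src)) {
            return $m[0];
        }
        return '<img src="' . $src[1] . '" alt="' . htmlspecialchars($alt, ENT_QUOTES) . '" />';
    }, $html);

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

// Hypothetical sample content for illustration.
$html = 'aaa<img alt="x" src="http://www.xxx.com/upimg/080330/120D1232295023X0.gif" '
      . 'onload="if(this.width>500) {this.resized=true;}">bbb';

echo strip_img_attributes($html, 'photo'), "\n";
```

For the sample string this prints aaa<img src="http://www.xxx.com/upimg/080330/120D1232295023X0.gif" alt="photo" />bbb. Tags without a double-quoted src are returned unchanged, which may or may not be what you want for your own data.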