Using a regular expression in PHP to get image addresses from a web page

Source: Internet
Author: User
Tags file url mkdir regular expression trim


1. Get the address
The main job of this function is to use a regular expression to match image addresses in the page source code. The regular expression used here is:

The code is as follows:

//iU
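Only the delimiters and the i (case-insensitive) and U (ungreedy) modifiers of the pattern are visible above. As a minimal sketch of a pattern of that shape, assuming the src attribute is quoted with single or double quotes, the following captures each image address in group 1. The function name extractImageSrcs and the sample HTML are hypothetical.

```php
<?php
// Hypothetical helper: return every src value found in <img> tags.
// Assumes src is quoted; unquoted attribute values are not matched.
function extractImageSrcs(string $html): array
{
    // i = case-insensitive, U = quantifiers are ungreedy by default.
    // \s before "src" keeps attributes such as data-src from matching.
    $pattern = '/<img[^>]*\ssrc\s*=\s*["\']([^"\']*)["\'][^>]*>/iU';
    preg_match_all($pattern, $html, $matches);
    return $matches[1]; // group 1 holds the captured addresses
}

$html = '<p><IMG class="a" SRC="/img/a.jpg"><img src=\'http://example.com/b.gif\' alt="b"></p>';
print_r(extractImageSrcs($html));
```

preg_match_all fills $matches[1] with the contents of the first capture group for every match, which is why the later example iterates over $match[1].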

First, the HTML code of the requested page is obtained through the file reading function that PHP provides, and then a regular expression is used to match the src address. Two points to note:

■ file_get_contents can only obtain static page content. That is, if images on the page are rendered by JavaScript, their information cannot be obtained with this tool.
■ Some websites restrict requests made with file_get_contents: if a page is not opened through a browser, the server rejects the request. In this case, add a configuration setting to the PHP program so that the collector simulates a browser's UA (user agent). This can be done with the following code:

// Simulate a browser in a Windows environment.
ini_set('user_agent', 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; 4399Box.560; .NET4.0C; .NET4.0E)');

Once these two points are handled, obtaining the page source is straightforward. All that remains is to use a regular expression to match the image addresses.
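The ini_set call above changes the user agent for every later request in the script. As an alternative sketch, a stream context can carry the user agent for a single file_get_contents call. The helper name buildBrowserContext and the example URL are hypothetical, and the sketch assumes allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: build a stream context that sends a browser-like User-Agent.
function buildBrowserContext(string $userAgent, int $timeoutSeconds = 5)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent,      // sent as the User-Agent header
            'timeout'    => $timeoutSeconds, // read timeout in seconds
        ],
    ]);
}

$ua = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)';
$context = buildBrowserContext($ua);

// Usage (needs network access; the URL is a placeholder):
// $html = file_get_contents('http://example.com/', false, $context);
```

Passing the context as the third argument limits the setting to that one request, which may be preferable when the same script also calls services that should see the default user agent.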

Example 1

The code is as follows:

/**
 * Download the images referenced in a document and replace their paths with local ones.
 * @param string $xstr    content of the web page
 * @param string $keyword name of the folder created for the images; the author used upimg
 * @param string $oriweb  site URL prepended to relative image paths, generally empty
 * @return string
 */
function replaceimg($xstr, $keyword, $oriweb) {
    $basedir = dirname(__FILE__);

    // Save path
    $d = date('Ym', time());
    $dirslsitss = $basedir . '/.../uploads/' . $keyword . '/' . $d;
    // Create the directory if it does not exist
    if (!is_dir($dirslsitss)) {
        @mkdir($dirslsitss, 0777, true);
    }

    // Match the src of each image; group 1 captures the address
    // (the middle of this pattern is a reconstruction)
    preg_match_all('#<img[^>]*src="([^"]*)"[^>]*>#i', $xstr, $match);

    foreach ($match[1] as $imgurl) {

        $imgurl = $imgurl;

        // Addresses containing "http" are used as-is; others get the site prefix
        if (is_int(strpos($imgurl, 'http'))) {
            $arcurl = $imgurl;
        } else {
            $arcurl = $oriweb . $imgurl;
        }
        $img = file_get_contents($arcurl);

        if (!empty($img)) {

            // Save the image to the server
            $fileimgname = time() . "-" . rand(1000, 9999) . ".jpg";
            $filecachs = $dirslsitss . "/" . $fileimgname;
            $fanhuistr = file_put_contents($filecachs, $img);
            $saveimgfile = "/uploads/$keyword" . "/" . $d . "/" . $fileimgname;

            // Point the document at the local copy
            $xstr = str_replace($imgurl, $saveimgfile, $xstr);
        }
    }
    return $xstr;
}
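The strpos check on "http" in replaceimg treats any address that contains "http" anywhere as absolute, so a relative path such as /img/http-logo.png would not get the site prefix. A stricter variant, sketched here under the assumption that only http:// and https:// addresses count as absolute, tests the start of the string instead. The function name resolveImageUrl is hypothetical; unlike replaceimg, it also normalizes the slash between the two parts, and it does not handle protocol-relative addresses (//host/path).

```php
<?php
// Hypothetical helper: return an absolute image URL.
// $base plays the role of $oriweb in replaceimg.
function resolveImageUrl(string $imgurl, string $base): string
{
    // Absolute only if the address starts with http:// or https://.
    if (preg_match('#^https?://#i', $imgurl) === 1) {
        return $imgurl;
    }
    // Otherwise join base and path with exactly one slash between them.
    return rtrim($base, '/') . '/' . ltrim($imgurl, '/');
}

echo resolveImageUrl('/img/http-logo.png', 'http://example.com/'), "\n";
echo resolveImageUrl('https://cdn.example.com/a.jpg', 'http://example.com'), "\n";
```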

As some readers may know, file_get_contents can perform poorly when fetching remote files. curl can be used to fetch the image instead:

The code is as follows:


/*
 * Function: download a remote image with PHP and save it locally.
 * Parameters: file URL, directory to save to, file name to save as, download method
 * When the file name is empty, a name is built from the current time and the remote file's extension.
 */
function getImage($url, $save_dir = '', $filename = '', $type = 0) {
    if (trim($url) == '') {
        return array('file_name' => '', 'save_path' => '', 'error' => 1);
    }
    if (trim($save_dir) == '') {
        $save_dir = './';
    }
    if (trim($filename) == '') { // No file name given: build one
        $ext = strrchr($url, '.');
        if ($ext != '.gif' && $ext != '.jpg') {
            return array('file_name' => '', 'save_path' => '', 'error' => 3);
        }
        $filename = time() . $ext;
    }
    // Make sure the directory ends with a slash
    if (substr($save_dir, -1) !== '/') {
        $save_dir .= '/';
    }
    // Create the save directory
    if (!file_exists($save_dir) && !mkdir($save_dir, 0777, true)) {
        return array('file_name' => '', 'save_path' => '', 'error' => 5);
    }
    // Method used to obtain the remote file
    if ($type) {
        // Non-zero $type: fetch with curl
        $ch = curl_init();
        $timeout = 5;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $img = curl_exec($ch);
        curl_close($ch);
    } else {
        // Default: read the file through output buffering
        ob_start();
        readfile($url);
        $img = ob_get_contents();
        ob_end_clean();
    }
    // $size = strlen($img); // file size
    $fp2 = @fopen($save_dir . $filename, 'a');
    fwrite($fp2, $img);
    fclose($fp2);
    unset($img, $url);
    return array('file_name' => $filename, 'save_path' => $save_dir . $filename, 'error' => 0);
}
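getImage takes the extension with strrchr($url, '.'), which returns everything from the last dot onward, so an address with a query string such as photo.jpg?size=large yields .jpg?size=large and is rejected with error 3. One possible refinement, assuming the extension should come from the URL path only, combines parse_url and pathinfo. The function name imageExtension is hypothetical; the allowed list mirrors the .gif/.jpg check in getImage, and lowercasing the extension is an added assumption.

```php
<?php
// Hypothetical helper: return ".gif" or ".jpg" for an allowed image URL, or null otherwise.
function imageExtension(string $url): ?string
{
    // Look only at the path component, ignoring query string and fragment.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // no path component, or a malformed URL
    }
    $ext = '.' . strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, ['.gif', '.jpg'], true) ? $ext : null;
}

var_dump(imageExtension('http://example.com/img/photo.JPG?size=large'));
var_dump(imageExtension('http://example.com/page.php'));
```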
