1. Use cURL to fetch pages from another site
For details, please refer to my previous note: http://www.jb51.net/article/46432.htm
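To illustrate step 1, here is a minimal sketch of a cURL fetch in PHP. It assumes the curl extension is enabled. The helper name fetch_page, the timeout and the User-Agent string are illustrative choices and are not taken from the linked note.

```php
<?php
// Minimal cURL fetch helper (sketch). fetch_page is a hypothetical name;
// the option values are illustrative, not the author's code.
function fetch_page(string $url, int $timeout = 10): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,     // follow HTTP redirects
        CURLOPT_TIMEOUT        => $timeout, // give up after $timeout seconds
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-scraper)',
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        // curl_exec returns false on failure; surface the cURL error text
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body; // the handle is released when $ch goes out of scope
}

// Usage (hypothetical URL):
// $contents = fetch_page('http://example.com/');
```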
2. Encoding conversion
First, view the page source to find which character encoding the target site uses, then transcode the page with the mb_convert_encoding function.
Usage:
// The source string is $str
// The source encoding is known to be GBK; convert it to UTF-8
$str = mb_convert_encoding($str, "UTF-8", "GBK");
// The source encoding is unknown; let "auto" detect it, then convert to UTF-8
$str = mb_convert_encoding($str, "UTF-8", "auto");
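Instead of reading the encoding off the page source by hand, you could look for the charset declared in a <meta> tag and pass it to mb_convert_encoding, falling back to "auto". The sketch below assumes the mbstring extension. detect_charset and to_utf8 are hypothetical helper names. On PHP 8, a charset name that mbstring does not recognize makes mb_convert_encoding throw a ValueError.

```php
<?php
// Sketch: read the declared charset from <meta> and convert to UTF-8.
// detect_charset and to_utf8 are hypothetical helpers; requires mbstring.
function detect_charset(string $html): ?string
{
    // Matches both <meta charset="gbk"> and
    // <meta http-equiv="Content-Type" content="text/html; charset=gbk">
    if (preg_match('/<meta[^>]+charset=["\']?([\w-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

function to_utf8(string $html, ?string $from = null): string
{
    // Explicit encoding first, then the <meta> declaration, then mbstring's "auto"
    $from = $from ?? detect_charset($html) ?? 'auto';
    return mb_convert_encoding($html, 'UTF-8', $from);
}

// "\xD6\xD0\xCE\xC4" is the text 中文 encoded in GBK
$page = '<meta charset="gbk">' . "\xD6\xD0\xCE\xC4";
echo to_utf8($page), "\n"; // <meta charset="gbk">中文
```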
3. To keep line breaks and whitespace from getting in the way of matching, first strip the line breaks, spaces and tabs from the collected source.
Method one: replace with str_replace
$contents = str_replace("\r\n", "", $contents); // remove CRLF line breaks
$contents = str_replace("\n", "", $contents); // remove LF line breaks
$contents = str_replace("\t", "", $contents); // remove tabs
$contents = str_replace(" ", "", $contents); // remove spaces
Method two: replace with a regular expression
$contents = preg_replace("/[\r\n\t ]+/", "", $contents); // remove line breaks, tabs and spaces in one pass
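The two methods can be wrapped as functions and checked against each other. This is a sketch: the function names are hypothetical, and the method-one version also removes a lone \r so that it covers exactly the same characters as the regular expression.

```php
<?php
// Method one as a function (hypothetical name): str_replace also accepts an
// array of search strings, each replaced with the empty string in turn.
function strip_ws_replace(string $html): string
{
    return str_replace(["\r\n", "\n", "\r", "\t", " "], '', $html);
}

// Method two as a function (hypothetical name): one character class, one pass.
function strip_ws_regex(string $html): string
{
    return preg_replace('/[\r\n\t ]+/', '', $html);
}

$sample = "<ul>\r\n\t<li>one</li>\n\t<li>two</li>\n</ul>";
echo strip_ws_regex($sample), "\n"; // <ul><li>one</li><li>two</li></ul>
```

Removing every space also glues words together ("a b" becomes "ab") and turns <p class="x"> into <pclass="x">, so the patterns in step 4 must be written against the compacted text.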
4. Use a regular expression to locate the snippet you want, matching it with preg_match_all
Function signature:
int preg_match_all(string $pattern, string $subject, array &$matches [, int $flags])
It returns the number of full-pattern matches (false on error).
$pattern is the regular expression
$subject is the text to search
$matches is the array that receives the results
$flags controls how $matches is arranged ($arr1, $arr2 and $arr3 below stand for $matches); it can be:
PREG_PATTERN_ORDER (the default): $matches is a two-dimensional array grouped by subpattern. $arr1[0] holds all full matches, including the boundary text (such as the surrounding tags), and $arr1[1] holds the text captured by the first pair of parentheses, without the boundaries; further groups follow in $arr1[2], and so on.
PREG_SET_ORDER: $matches is a two-dimensional array grouped by match. $arr2[0][0] is the first full match including the boundaries, $arr2[0][1] is the first match's captured text without the boundaries, $arr2[1] holds the second match, and so on.
PREG_OFFSET_CAPTURE: every entry becomes a [text, offset] pair, so with the default ordering $matches is three-dimensional. $arr3[0][0][0] is the first full match including the boundaries and $arr3[0][0][1] is its byte offset in $subject; $arr3[1][0][0] is the first match's captured text without the boundaries and $arr3[1][0][1] is that text's byte offset.
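To see the three layouts side by side, the sketch below runs the same pattern with each flag on a small sample string (the sample and variable names are illustrative). The commented values follow from the sample: '<p>first</p>' is 12 bytes long, so the second match's captured text starts at byte 15.

```php
<?php
$contents = '<p>first</p><p>second</p>';
$pattern  = '/<p>(.*?)<\/p>/';

// PREG_PATTERN_ORDER (the default): grouped by subpattern
$n = preg_match_all($pattern, $contents, $arr1);
// $n    === 2                                     number of full matches
// $arr1 === [['<p>first</p>', '<p>second</p>'],   full matches
//            ['first', 'second']]                 first subpattern

// PREG_SET_ORDER: grouped by match
preg_match_all($pattern, $contents, $arr2, PREG_SET_ORDER);
// $arr2 === [['<p>first</p>', 'first'], ['<p>second</p>', 'second']]

// PREG_OFFSET_CAPTURE: each entry becomes [text, byte offset]
preg_match_all($pattern, $contents, $arr3, PREG_OFFSET_CAPTURE);
// $arr3[0][0] === ['<p>first</p>', 0]
// $arr3[1][0] === ['first', 3]
// $arr3[1][1] === ['second', 15]

echo $n, "\n"; // 2
```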
In practice:
preg_match_all('/<p>(.*?)<\/p>/', $contents, $out, PREG_SET_ORDER);
$out now holds every match:
$out[0][0] is the first full match, including the <p> and </p> tags
$out[0][1] is only the text captured by (.*?) inside the parentheses
Likewise, the captured text of the nth match is
$out[n-1][1]
If the regular expression contains several pairs of parentheses, the text captured by the mth group of the nth match is
$out[n-1][m]
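The $out[n-1][m] rule is easiest to see with a pattern that has two groups. In this sketch the table markup is made up for illustration, and nth_capture is a hypothetical convenience wrapper, not a PHP built-in.

```php
<?php
// Hypothetical helper: text of the mth group of the nth match (1-based n),
// or null if that match does not exist. $out must come from PREG_SET_ORDER.
function nth_capture(array $out, int $n, int $m): ?string
{
    return $out[$n - 1][$m] ?? null;
}

// Already whitespace-free, as it would be after step 3
$contents = '<tr><td>Apple</td><td>3</td></tr><tr><td>Pear</td><td>5</td></tr>';
preg_match_all('/<td>(.*?)<\/td><td>(.*?)<\/td>/', $contents, $out, PREG_SET_ORDER);

echo nth_capture($out, 2, 1), "\n"; // Pear  (2nd match, 1st group)
echo nth_capture($out, 2, 2), "\n"; // 5     (2nd match, 2nd group)
```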
5. Once you have the matched text, remove any remaining HTML tags with PHP's built-in strip_tags function.
Example:
$result = strip_tags($out[0][1]);
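Putting steps 3 to 5 together, a minimal sketch might look like this. It assumes the page has already been downloaded (step 1) and converted to UTF-8 (step 2). extract_texts is a hypothetical helper name, and the pattern passed in must have one capture group around the wanted text.

```php
<?php
// Hypothetical helper combining steps 3-5 on an already-UTF-8 page.
function extract_texts(string $contents, string $pattern): array
{
    // Step 3: remove line breaks, tabs and spaces
    $contents = preg_replace('/[\r\n\t ]+/', '', $contents);

    // Step 4: collect every match, grouped per match
    preg_match_all($pattern, $contents, $out, PREG_SET_ORDER);

    // Step 5: keep only the first capture group, minus any HTML tags
    $texts = [];
    foreach ($out as $match) {
        $texts[] = strip_tags($match[1]);
    }
    return $texts;
}

$page = "<div>\n\t<p><b>Name:</b>PHP</p>\n\t<p><i>Version:</i>8</p>\n</div>";
// Prints "Name:PHP" and "Version:8" on separate lines
echo implode("\n", extract_texts($page, '/<p>(.*?)<\/p>/')), "\n";
```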