A short note on processing data collected with preg_match_all (encoding conversion and regular expression matching)

Source: Internet
Author: User
1. Use cURL to fetch the remote page

For details, please refer to my previous note: http://www.jb51.net/article/46432.htm
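
For readers without access to that note, the fetch step might look roughly like the sketch below. It is not taken from the earlier note; the function name fetch_page and the timeout default are assumptions made for illustration, and only standard cURL options are used.

```php
<?php
// Hypothetical helper: fetch a page body with cURL, or return null on failure.
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow HTTP redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    $body = curl_exec($ch);                           // false when the request fails
    return $body === false ? null : $body;
}
```

A call such as $contents = fetch_page($url); would then supply the $contents string that the later steps work on.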

2. Encoding conversion
First, view the page source to find out which character encoding the site uses, then transcode the text with the mb_convert_encoding function.

Usage:

The code is as follows:
// The source string is $str

// When the source encoding is known to be GBK, convert it to UTF-8
$utf8 = mb_convert_encoding($str, "UTF-8", "GBK");

// When the source encoding is unknown, let "auto" detect it and convert to UTF-8
$utf8 = mb_convert_encoding($str, "UTF-8", "auto");
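
As far as the PHP manual describes it, the list of encodings that "auto" tries depends on the mbstring.language setting and may not include GBK, so detection can miss. One possible alternative, sketched below, is to keep text that is already valid UTF-8 and otherwise convert from an assumed fallback encoding. The function name to_utf8 and the GBK default are assumptions for illustration.

```php
<?php
// Hypothetical helper: return $str as UTF-8, assuming $fallback when it is not valid UTF-8.
function to_utf8(string $str, string $fallback = 'GBK'): string
{
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;                                   // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($str, 'UTF-8', $fallback);
}

// "\xD6\xD0\xCE\xC4" is the GBK byte sequence for the two characters 中文.
$utf8 = to_utf8("\xD6\xD0\xCE\xC4");
```

The fallback is only a guess about the site, so it is still worth confirming the encoding from the page source as described above.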

3. To keep line breaks and spaces from getting in the way of matching, first strip line breaks, spaces, and tabs from the collected source

The code is as follows:
// Method 1: replace with str_replace
$contents = str_replace("\r\n", "", $contents); // remove Windows line breaks
$contents = str_replace("\n", "", $contents);   // remove Unix line breaks
$contents = str_replace("\t", "", $contents);   // remove tabs
$contents = str_replace(" ", "", $contents);    // remove spaces

// Method 2: replace with a regular expression
// (inside a character class "|" is a literal character, so it is not needed as a separator)
$contents = preg_replace("/[\r\n\t ]+/", "", $contents);
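
To see what this cleanup does to a page, the two methods can be run on a small sample string. The sample below is made up for illustration; the point is that both methods give the same result and that spaces inside the text are removed as well.

```php
<?php
// Made-up sample containing a Windows line break, a tab, a space, and a Unix line break.
$sample = "<p>\r\n\tHello world\n</p>";

// Method 1: str_replace accepts an array of search strings.
$viaStrReplace = str_replace(["\r\n", "\n", "\t", " "], "", $sample);

// Method 2: one character class covering CR, LF, tab, and space.
$viaRegex = preg_replace('/[\r\n\t ]+/', '', $sample);

// Both variables now hold "<p>Helloworld</p>".
```

Because the space between words disappears too, this step suits cases where only the markup structure matters for matching; if word spacing must survive, replacing each run of whitespace with a single space is a common variation.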

4. Use a regular expression to locate the fragment you want, and run the match with preg_match_all

The code is as follows:
Function explanation:
int preg_match_all(string pattern, string subject, array matches [, int flags])
pattern is the regular expression
subject is the text to be searched
matches is the array that receives the results
flags selects how the results are stored. In the descriptions below, the "boundaries" are the literal parts of the pattern around the capturing group. The flags include:
PREG_PATTERN_ORDER: the result is a two-dimensional array. $arr1[0] is the array of full matches (boundaries included); $arr1[1] is the array of captured strings with the boundaries removed.
PREG_SET_ORDER: the result is a two-dimensional array. $arr2[0][0] is the first full match (boundaries included), $arr2[0][1] is the first match with the boundaries removed, and so on.
PREG_OFFSET_CAPTURE: the result is a three-dimensional array. $arr3[0][0][0] is the first full match (boundaries included) and $arr3[0][0][1] is the offset at which that match starts (the opening boundary is not counted in the offset), and so on; $arr3[1][0][0] is the first match with the boundaries removed and $arr3[1][0][1] is the offset at which that captured string starts (the opening boundary is counted).
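
The three layouts are easier to compare side by side. The sketch below runs the same pattern over a made-up two-paragraph string with each flag; the subject string and variable names are chosen only for illustration.

```php
<?php
$subject = '<p>a</p><p>bc</p>';
$pattern = '/<p>(.*?)<\/p>/';

// PREG_PATTERN_ORDER (the default): index first by group, then by match.
preg_match_all($pattern, $subject, $arr1, PREG_PATTERN_ORDER);
// $arr1 is [['<p>a</p>', '<p>bc</p>'], ['a', 'bc']]

// PREG_SET_ORDER: index first by match, then by group.
preg_match_all($pattern, $subject, $arr2, PREG_SET_ORDER);
// $arr2 is [['<p>a</p>', 'a'], ['<p>bc</p>', 'bc']]

// PREG_OFFSET_CAPTURE: every entry becomes [text, byte offset in $subject].
preg_match_all($pattern, $subject, $arr3, PREG_OFFSET_CAPTURE);
// $arr3[0][0] is ['<p>a</p>', 0] and $arr3[1][1] is ['bc', 11]
```

Note that the flag names are constants and must be written in upper case, while the function name itself is case-insensitive.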

Practical application:
preg_match_all('/<p>(.*?)<\/p>/', $contents, $out, PREG_SET_ORDER);
// $out receives every match
// $out[0][0] is the full text of the first match, including <p> and </p>
// $out[0][1] is only the text captured by the (.*?) group in parentheses

// By analogy, the captured text of the n-th match is
// $out[n-1][1]

// If the expression contains several parenthesized groups, the m-th group of the n-th match is
// $out[n-1][m]
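
The indexing rule with several groups can be checked on a small case. In the sketch below the list markup, the pattern, and the helper name match_group are all invented for illustration; only the $out[n-1][m] rule comes from the text.

```php
<?php
// Made-up markup with two capturing groups per list item.
$contents = '<li><b>PHP</b><i>8</i></li><li><b>Perl</b><i>5</i></li>';
preg_match_all('/<li><b>(.*?)<\/b><i>(.*?)<\/i><\/li>/', $contents, $out, PREG_SET_ORDER);

// Hypothetical helper: m-th group of the n-th match (both 1-based), or null if absent.
function match_group(array $out, int $n, int $m): ?string
{
    return $out[$n - 1][$m] ?? null;
}

$firstName     = match_group($out, 1, 1);  // 'PHP'
$secondVersion = match_group($out, 2, 2);  // '5'
$missing       = match_group($out, 3, 1);  // null, there is no third match
```

Returning null for an out-of-range index avoids an undefined-offset warning when a page yields fewer matches than expected.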

5. Once the matched text has been obtained, remove any HTML tags it still contains; PHP's built-in strip_tags function makes this easy

The code is as follows:
// Example
$result = strip_tags($out[0][1]);
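
Putting steps 3 to 5 together, a minimal end-to-end sketch might look like this. The function name extract_paragraphs and the sample HTML are assumptions for illustration, and the input is assumed to be UTF-8 already.

```php
<?php
// Hypothetical helper: return the tag-free text of every <p>...</p> block.
function extract_paragraphs(string $html): array
{
    // Step 3: strip line breaks, tabs, and spaces.
    $html = preg_replace('/[\r\n\t ]+/', '', $html);

    // Step 4: collect every paragraph, one entry per match.
    preg_match_all('/<p>(.*?)<\/p>/', $html, $out, PREG_SET_ORDER);

    // Step 5: drop any tags left inside the captured text.
    $results = [];
    foreach ($out as $match) {
        $results[] = strip_tags($match[1]);
    }
    return $results;
}

$html = "<div>\n<p><b>First</b> item</p>\n<p>Second<br>item</p>\n</div>";
$paragraphs = extract_paragraphs($html);
// $paragraphs is ['Firstitem', 'Seconditem']
```

The missing space in 'Firstitem' is a consequence of step 3 removing all spaces, not of strip_tags.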
