Truncating an HTML code string in PHP

Source: Internet
Author: User
Tags: tidy
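The data is an HTML code string, such as this:
<div class="aaa"><a href="/aaa.php?id=1">Zhang San</a> commented on <a href="/aaa.php?id=444">John Doe</a> shared <a href="bbb.htm ...
Rendered, it reads: Zhang San commented on a long list of articles shared by John Doe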
The task is to truncate what is inside the div tag while keeping the HTML tags, so that only the text is cut. For example, the cut might leave just the character "Li" from "John Doe". If that result is sent to the front end as it stands, the a tag in front of "John Doe" is never closed, so the truncation has to guarantee that the HTML syntax is still correct afterwards.
This problem is not easy to deal with, and it frustrated me for two days. Note that this is just a string; its content happens to be HTML code, but there is no DOM. On the front end you would simply get the DOM node, process the nodes inside it, and read the result back out through innerHTML. That is not possible here, so a different line of thought is needed. A colleague's idea was this:
Iterate through each character of the string and keep a flag. When a < that starts a tag is encountered, set the flag to 1 and stop counting the characters that follow; when the > is encountered, start counting again. While processing the text between tags, also check whether the current character might be Chinese. In general a Chinese character encoded as UTF-8 is 3 bytes long in PHP, so when one is encountered, the next two bytes must be skipped and not counted. At this point my head was starting to swell. I find this method very uncomfortable: such delicate logic is hard to keep under control, and a Chinese character in UTF-8 can be 3 or 4 bytes long, so the rigor of the code is questionable.
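To make the byte-length concern concrete, here is a small sketch that reports how many bytes each character of a string occupies in UTF-8. The helper name utf8_byte_lengths is made up for illustration, and the sketch assumes the mbstring extension and PHP 7.4 or later (for mb_str_split).

```php
<?php
// Hypothetical helper: map each character of $text to its UTF-8 byte length.
// Assumes ext-mbstring and PHP 7.4+ (for mb_str_split()).
function utf8_byte_lengths(string $text): array
{
    $lengths = [];
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        $lengths[$char] = strlen($char); // strlen() counts bytes, not characters
    }
    return $lengths;
}

// "a", U+00E9 (e with acute), U+4E2D (a common Chinese character)
// and U+20000 (a rarer ideograph outside the Basic Multilingual Plane)
print_r(utf8_byte_lengths("a\u{00E9}\u{4E2D}\u{20000}"));
```

The four sample characters take 1, 2, 3 and 4 bytes respectively, so a scanner that always skips exactly two bytes after the first byte of a Chinese character would miscount the last one.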
My own idea is to use Tidy (see the PHP manual for specific usage). I studied Tidy yesterday and found it very useful. First, convert the string to a tidy object like this:
$tidy = tidy_parse_string($str, array(), 'utf8');
The last argument sets the encoding. Note that it is utf8 here, not utf-8: there is no hyphen in the middle.
Then get the body from $tidy (after parsing, $tidy automatically adds html, body and other tags around the fragment):
$body = tidy_get_body($tidy);
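The two calls can be tried together as in the sketch below. The wrapper name parse_body and the sample fragment are made up for illustration, and the sketch assumes the tidy extension; it returns null when the extension is not loaded.

```php
<?php
// Hypothetical helper wrapping tidy_parse_string() and tidy_get_body().
// Returns null when ext-tidy is not loaded or parsing fails.
function parse_body(string $html): ?tidyNode
{
    if (!extension_loaded('tidy')) {
        return null;
    }
    $tidy = tidy_parse_string($html, [], 'utf8'); // 'utf8', without a hyphen
    return $tidy === false ? null : tidy_get_body($tidy);
}

$body = parse_body('<div class="aaa"><a href="#">sdf</a></div>'); // made-up fragment
if ($body !== null) {
    echo $body->name, "\n";            // the body element that Tidy added
    echo $body->child[0]->name, "\n";  // the div from the input string
}
```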
At this point you can use var_dump to look at the structure of $body. You will find that it turns each tag into a corresponding object, which has the corresponding properties. For example, for <a href="#">sdf</a>, some of the properties are:
name => "a"
value => "<a href="#">sdf</a>"
child => array { [0] => a text node object whose value is "sdf" }
attribute => array { "href" => "#" }
... other properties
As you can see, we can work directly on the value of the text node underneath the node for the a tag, which does not break the integrity of the HTML. Originally I thought that changing the value of the text node inside the a tag would also change the value of the a tag itself, so that I could simply return the value of the a node. It turned out not to work that way, so after processing the text, the new HTML still has to be pieced together by hand.
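Reaching those text nodes means walking the child arrays recursively. The sketch below is one possible way to do that; the function name collect_text and the sample fragment are made up for illustration, and it assumes the tidy extension. Values are trimmed because Tidy may add whitespace when it serializes a node.

```php
<?php
// Hypothetical helper: collect the text of every text node under $node,
// in document order. Assumes ext-tidy.
function collect_text(tidyNode $node): array
{
    if ($node->isText()) {
        return [trim($node->value)];
    }
    $texts = [];
    foreach ($node->child ?? [] as $child) { // child is null for leaf nodes
        $texts = array_merge($texts, collect_text($child));
    }
    return $texts;
}

if (extension_loaded('tidy')) {
    $tidy = tidy_parse_string('<div><a href="#">sdf</a> tail</div>', [], 'utf8');
    $link = tidy_get_body($tidy)->child[0]->child[0]; // body > div > a
    echo $link->name, ' ', $link->attribute['href'], "\n";
    print_r(collect_text($link));
}
```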
Once the structure of the tidy object is known, the rest is straightforward: traverse all the nodes. For this requirement, that means finding the div tag and then processing the nodes inside it. The code is as follows:
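// Runs inside the loop over the div's child nodes.
if (mb_strwidth($subchild->value, 'utf-8') >= $len)
{
    // This node uses up the remaining width: trim it and stop.
    $subchild->value = mb_strimwidth($subchild->value, 0, $len, '...', 'utf-8');
    $trimed_str .= $subchild->value;
    break;
}
else
{
    // The whole node fits: keep it and reduce the remaining width.
    $trimed_str .= $subchild->value;
    $len = $len - mb_strwidth($subchild->value, 'utf-8');
}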
Here $subchild is a child node. Note that mb_strwidth is used to get the width of the string. I strongly recommend mb_strwidth: it counts each Chinese character as two columns wide, which is exactly what this requirement needs. mb_strimwidth, which is used to do the truncation, treats Chinese characters as two columns wide in the same way. The functions that start with mb_ are very handy.
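The difference between character count and display width can be checked with a few lines. The sample strings below are made up for illustration, and the sketch assumes the mbstring extension.

```php
<?php
// Assumes ext-mbstring. Sample strings are made up for illustration.
$ascii = 'Zhang';
$cjk   = '张三'; // two Chinese characters

echo mb_strlen($cjk, 'UTF-8'), "\n";     // number of characters
echo mb_strwidth($cjk, 'UTF-8'), "\n";   // display columns, 2 per Chinese character
echo mb_strwidth($ascii, 'UTF-8'), "\n"; // 1 column per ASCII character

// Cut to at most 7 columns; the width budget includes the trim marker.
$short = mb_strimwidth('张三评论了文章', 0, 7, '...', 'UTF-8');
echo $short, "\n";
```

With a budget of 7 columns and a 3-column marker, 4 columns are left for text, so the last line prints 张三... here.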
I will not post the complete code, because it was written for one specific requirement and is not in a general form. When I have time someday I will write a general version and release it.
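For readers who want a starting point, the sketch below shows roughly what a general helper could look like. It is not the Tidy-based code described above: it swaps in a simple regular-expression tokenizer, so it only suits well-formed fragments, and it counts an entity such as &amp; by its source characters. The function name truncate_html_text, its parameters and the sample fragment are all made up for illustration; the mbstring extension is assumed.

```php
<?php
// Hypothetical helper, not the article's Tidy-based code: truncate the text
// of a well-formed HTML fragment to $width display columns, keep every tag
// seen so far, and close whatever is still open. Assumes ext-mbstring.
function truncate_html_text(string $html, int $width, string $marker = '...'): string
{
    $void = ['br', 'hr', 'img', 'input', 'link', 'meta']; // never need closing
    // Split into tag tokens and text tokens, keeping the tags.
    $tokens = preg_split('/(<[^>]+>)/u', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $out = '';
    $open = [];          // stack of currently open element names
    $remaining = $width; // columns of text still allowed

    foreach ($tokens as $token) {
        if ($token[0] === '<') {
            $out .= $token;
            if (strncmp($token, '</', 2) === 0) {
                array_pop($open); // closing tag
            } elseif (preg_match('/^<([a-z][a-z0-9]*)/i', $token, $m)
                && !in_array(strtolower($m[1]), $void, true)
                && substr($token, -2) !== '/>') {
                $open[] = $m[1];  // opening tag that needs a close later
            }
            continue;
        }
        $tokenWidth = mb_strwidth($token, 'UTF-8');
        if ($tokenWidth > $remaining) {
            // Cut inside this text token, then stop; the marker is extra.
            $cut = $remaining > 0
                ? mb_strimwidth($token, 0, $remaining, '', 'UTF-8')
                : '';
            $out .= $cut . $marker;
            break;
        }
        $out .= $token;
        $remaining -= $tokenWidth;
    }
    while ($open) { // close open elements, innermost first
        $out .= '</' . array_pop($open) . '>';
    }
    return $out;
}

echo truncate_html_text(
    '<div class="aaa"><a href="/u/1">Zhang San</a> commented on <a href="/u/2">John Doe</a> today</div>',
    24
), "\n";
```

In this sample the cut falls inside the second link, and the a and div elements that were still open are closed after the marker, which is the property the requirement asks for. Unlike the fragment above, the marker here is appended on top of the width budget rather than counted inside it.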
Unfortunately, Firefox does not support the text-overflow property; otherwise there would be no need to truncate the hard way like this. If you have a better way, please share it. It would be greatly appreciated.
