Truncating an HTML code string in PHP


Last Update:2016-07-21 Source: Internet

Author: User

Tags tidy


The data is a string of HTML code, such as this:
<div class="aaa"><a href="/aaa.php?id=1">Zhang San</a> commented on <a href="/aaa.php?id=444">John Doe</a> shared <a href="bbb.htm ...
which renders as: Zhang San commented on a long list of articles shared by John Doe
The task is to truncate the content inside the div tag while keeping the HTML tags, so that only the text is processed. For example, the cut might land in the middle of "John Doe"; if that result is sent to the front end as-is, the a tag in front of "John Doe" is never closed. The truncated string therefore has to remain valid HTML.
This problem is awkward to solve, and it frustrated me for two days. Note that the input is just a string: its content happens to be HTML code, but there is no DOM. On the front end this would be easy: get the DOM, process the nodes inside it, and output the result through innerHTML or similar. That is not possible here, so a different approach is needed. A colleague's idea was this:
Iterate through each character of the string and keep a flag. When a < is encountered, the flag is set to 1 and the following characters are not counted; when a > is encountered, counting starts again. While processing the text between tags, you also have to check whether the current character might be Chinese: in UTF-8, a Chinese character generally takes 3 bytes, so on meeting one you skip the next two bytes without counting them... By this point my head was starting to hurt. I find this method very uncomfortable. First, such delicate logic is hard to get right. Second, a Chinese character encoded in UTF-8 can take either 3 or 4 bytes, so the rigor of the code is questionable.
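The byte-length concern can be checked with a small sketch. It assumes the mbstring extension and PHP 7 or later (for the "\u{...}" escape); the two sample characters are chosen only for illustration.

```php
<?php
// Byte length versus character count for UTF-8 strings.
// strlen() counts bytes; mb_strlen() counts characters.
$common = '李';         // U+674E, a common Chinese character
$rare   = "\u{20000}";  // U+20000, a CJK character outside the Basic Multilingual Plane

printf("%d %d\n", strlen($common), mb_strlen($common, 'UTF-8')); // 3 bytes, 1 character
printf("%d %d\n", strlen($rare), mb_strlen($rare, 'UTF-8'));     // 4 bytes, 1 character
```

A scanner that always skips two bytes after the lead byte of a Chinese character would lose its place on the second string.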
My own idea is to use Tidy (see the PHP manual for usage details). I studied Tidy yesterday and found it very useful. First, convert the string to a Tidy object:
$tidy = tidy_parse_string($str, array(), 'utf8');
The last argument sets the encoding. Note that it is utf8, not utf-8: there is no hyphen in the middle.
Then get the body from $tidy (after the conversion, Tidy automatically adds html, body and other tags):
$body = tidy_get_body($tidy);
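Put together, the two calls might be used as in the following minimal sketch. It assumes the tidy extension is installed; the sample markup is shortened for illustration.

```php
<?php
if (!extension_loaded('tidy')) {
    exit("This sketch needs the tidy extension.\n");
}

// Illustrative input: an HTML fragment held in a plain string.
$str = '<div class="aaa"><a href="/aaa.php?id=1">Zhang San</a> commented</div>';

// An empty config array keeps Tidy's defaults; the encoding name is 'utf8'.
$tidy = tidy_parse_string($str, array(), 'utf8');

// Tidy wraps the fragment in html/head/body, so take the body node.
$body = tidy_get_body($tidy);

echo $body->name, "\n";          // name of the node returned by tidy_get_body()
echo count($body->child), "\n";  // number of direct children of the body
```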
At this point you can use var_dump to inspect the structure of $body. You will find that each tag has been turned into a corresponding object with its own properties. For example, for <a href="#">SDF</a>, some of the properties are:
name => "a"
value => "<a href="#">SDF</a>"
child => array { [0] => a text node object whose value is "SDF" }
attribute => array { "href" => "#" }
... other properties
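As an alternative to reading a full var_dump, the tree can be walked recursively. This is only a sketch: dump_node is a hypothetical helper name, and it assumes the tidy extension and PHP 7.1 or later.

```php
<?php
if (!extension_loaded('tidy')) {
    exit("This sketch needs the tidy extension.\n");
}

// Hypothetical helper: print each node's name and attributes, indented by depth.
function dump_node(tidyNode $node, int $depth = 0): void
{
    $indent = str_repeat('  ', $depth);
    if ($node->type === TIDY_NODETYPE_TEXT) {
        // Text nodes have no tag name; show their text instead.
        echo $indent, 'text: ', trim($node->value), "\n";
        return;
    }
    echo $indent, $node->name;
    // attribute and child are null when a node has none.
    foreach ($node->attribute ?? array() as $key => $val) {
        echo ' ', $key, '="', $val, '"';
    }
    echo "\n";
    foreach ($node->child ?? array() as $child) {
        dump_node($child, $depth + 1);
    }
}

$tidy = tidy_parse_string('<a href="#">SDF</a>', array(), 'utf8');
dump_node(tidy_get_body($tidy));
```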
As you can see, we can work directly on the value of the text node under the a tag's node, which does not break the integrity of the HTML. Originally I thought that changing the value of the text node inside the a tag would also change the value of the a tag, so that I could simply return the value of the a tag's node. It turned out not to work that way, so after processing the text, the new HTML still has to be assembled by hand.
Once the structure of the Tidy object is understood, the rest is straightforward: traverse all the nodes and, for this requirement, find the div tag and process the nodes inside it. The code is as follows:
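// Runs inside the loop over the div's child nodes; $len is the remaining width.
$text = $subchild->value; // work on a copy of the text node's value
if (mb_strwidth($text, 'utf-8') >= $len)
{
    // Not enough room left: trim this text node and stop.
    $trimed_str .= mb_strimwidth($text, 0, $len, '...', 'utf-8');
    break;
}
else
{
    // The whole node fits: keep it and reduce the remaining width.
    $trimed_str .= $text;
    $len = $len - mb_strwidth($text, 'utf-8');
}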
Here $subchild is a child node. Note that mb_strwidth is used to measure the string. I strongly recommend mb_strwidth: it counts each Chinese character as two columns wide, which is exactly what is needed here. mb_strimwidth, used to truncate the string, treats Chinese characters as two columns wide in the same way. The mb_ functions are very useful.
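The width counting described above can be seen in a short sketch, assuming the mbstring extension; the sample string is made up for illustration.

```php
<?php
$text = '张三评论了文章';  // 7 Chinese characters

echo mb_strlen($text, 'UTF-8'), "\n";    // 7 characters
echo mb_strwidth($text, 'UTF-8'), "\n";  // 14 columns: each Chinese character counts as 2
echo mb_strwidth('abc', 'UTF-8'), "\n";  // 3 columns: each ASCII character counts as 1

// The width limit includes the trim marker, so 7 columns leave room for
// 4 columns of text plus the 3-column '...'.
echo mb_strimwidth($text, 0, 7, '...', 'UTF-8'), "\n";  // 张三...
```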
I will not post the complete code, because it was written for one specific requirement and is not in a general form. When I have time, I will write a general version and release it.
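For readers who want a starting point, here is a rough sketch of how the pieces could fit together. It is not the original code: render_truncated and truncate_div are hypothetical names, the 'wrap' => 0 option is an added assumption to stop Tidy from wrapping long text, and it assumes the tidy and mbstring extensions with PHP 7.1 or later. It does not handle a cut that lands inside an HTML entity, and Tidy's whitespace handling in text nodes should be checked against real data.

```php
<?php
if (!extension_loaded('tidy')) {
    exit("This sketch needs the tidy extension.\n");
}

// Hypothetical helper: rebuild the HTML for $node while spending the
// display-width budget $len on text nodes. $done is set once the cut is made.
function render_truncated(tidyNode $node, int &$len, bool &$done): string
{
    if ($node->type === TIDY_NODETYPE_TEXT) {
        $text = $node->value;
        if (mb_strwidth($text, 'UTF-8') >= $len) {
            // Budget used up: trim this text node and stop adding text after it.
            $done = true;
            return mb_strimwidth($text, 0, $len, '...', 'UTF-8');
        }
        $len -= mb_strwidth($text, 'UTF-8');
        return $text;
    }
    if (!$node->hasChildren()) {
        // Leaf nodes (br, img, comments, ...) keep Tidy's own serialisation.
        return rtrim($node->value, "\r\n");
    }

    // Rebuild the opening tag from the node's name and attributes.
    $html = '<' . $node->name;
    foreach ($node->attribute ?? array() as $key => $val) {
        $html .= ' ' . $key . '="' . htmlspecialchars($val, ENT_QUOTES, 'UTF-8', false) . '"';
    }
    $html .= '>';

    foreach ($node->child as $child) {
        if ($done) {
            break; // skip everything after the cut, but still close open tags
        }
        $html .= render_truncated($child, $len, $done);
    }
    return $html . '</' . $node->name . '>';
}

// Hypothetical wrapper: truncate the text inside the first top-level div of $str.
function truncate_div(string $str, int $len): string
{
    $tidy = tidy_parse_string($str, array('wrap' => 0), 'utf8');
    $body = tidy_get_body($tidy);
    foreach ($body->child ?? array() as $child) {
        if ($child->name === 'div') {
            $done = false;
            return render_truncated($child, $len, $done);
        }
    }
    return '';
}

$str = '<div class="aaa"><a href="/aaa.php?id=1">Zhang San</a> commented on '
     . '<a href="/aaa.php?id=444">John Doe</a></div>';
echo truncate_div($str, 28), "\n";
```

With a budget of 28 columns the cut falls inside the second link, and the a and div tags are still closed after the trimmed text.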
Unfortunately, Firefox does not support the text-overflow property; otherwise there would be no need to truncate on the back end the hard way. If you have a better approach, please share it. It would be greatly appreciated.




