PHP code for truncating a string of HTML

Last Update: 2013-10-16 Source: Internet

Author: User

Tags: tidy


The data is a string of HTML code, for example:
<div class="aaa"><a href="/aaa.php?...">...</a> commented on <a href="/aaa.php?id=444">Li Si</a> <a href="#bbb.html">a long string of articles</a></div>
When truncating, you must cut only the content inside the div, keep the HTML tags, and process only the text. For example, the cut may fall right after "Li" in "Li Si"; if that result is sent to the front end as-is, the a tag in front of "Li Si" is left unclosed. So the HTML must still be syntactically correct after truncation.
This problem is genuinely awkward, and it frustrated me for two days. Note that the input is only a string: its content is HTML code, but there is no DOM. On the front end you could get the DOM directly, process its nodes, and output innerHTML at the end. That does not work here, so a different approach is needed. My colleague's idea was this:
Traverse the string character by character and keep a flag. When a tag starts (a < is encountered), set the flag to 1 and stop counting characters; resume counting once the tag's closing > is reached. When processing the text between tags, first check whether the current character might be Chinese. A Chinese character in UTF-8 generally takes 3 bytes in PHP, so on meeting one you skip the next two bytes without counting them... At this point my head began to swell. I find this method very uncomfortable. First, such delicate logic is hard to get right. Second, a Chinese character in UTF-8 may be 3 or 4 bytes long, so the rigor of the code is questionable.
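The flag idea described above can be sketched roughly as follows. This is not my colleague's actual code, only an illustration; the function name truncate_text_keep_tags is hypothetical. It assumes valid UTF-8 input, splits on character boundaries with a /u regular expression instead of counting bytes by hand, counts characters rather than display width, and will misbehave if a > appears inside an attribute value.

```php
<?php
// Hypothetical helper: keeps every tag, copies visible text until $limit
// characters have been emitted, and drops all text after that point.
function truncate_text_keep_tags(string $html, int $limit): string
{
    // Split on UTF-8 character boundaries so multi-byte characters stay whole.
    // preg_split returns false on invalid UTF-8; fall back to an empty list.
    $chars = preg_split('//u', $html, -1, PREG_SPLIT_NO_EMPTY) ?: array();
    $out = '';
    $inTag = false; // the flag: true between '<' and '>'
    $count = 0;     // visible characters emitted so far

    foreach ($chars as $ch) {
        if ($ch === '<') {
            $inTag = true;
        }
        if ($inTag) {
            $out .= $ch; // tag characters are copied but never counted
            if ($ch === '>') {
                $inTag = false;
            }
            continue;
        }
        if ($count < $limit) {
            $out .= $ch;
            $count++;
        }
    }
    return $out;
}

echo truncate_text_keep_tags('<a href="#">Li Si</a> <b>wrote</b>', 2), "\n";
// prints: <a href="#">Li</a><b></b>
```

Because every tag is copied through, the tags stay balanced, but emptied elements such as the b above are left in the output.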
My own idea is to use Tidy (see the PHP manual for usage details). I studied Tidy yesterday and found it quite useful. First, parse the string into a Tidy object, as in the following code:
$tidy = tidy_parse_string($str, array(), 'utf8'); // The last argument sets the encoding. Note that it is 'utf8', not 'UTF-8' (no hyphen).
Then get the body from $tidy (after parsing, Tidy automatically wraps the fragment in html and body elements):
$body = tidy_get_body($tidy);
At this point you can use var_dump to inspect the structure of $body. You will find that each tag has been converted into a node object with corresponding properties. For example, for <a href="#">sdf</a> the properties include:
name => "a"
value => "<a href="#">sdf</a>"
child => array { [0] => a text node object whose value is "sdf" }
attribute => array { "href" => "#" }
... other properties
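As an alternative to reading raw var_dump output, the node tree can be printed one node per line. The sketch below assumes the tidy extension is installed; dump_tidy_tree is a hypothetical helper name, while isText(), hasChildren(), name, value and child are members of PHP's tidyNode class.

```php
<?php
// Hypothetical helper: prints one line per node, indented by depth.
function dump_tidy_tree(tidyNode $node, int $depth = 0): void
{
    // Text nodes carry their text in ->value; element nodes have a tag ->name.
    $label = $node->isText()
        ? '#text ' . trim($node->value)
        : '<' . $node->name . '>';
    echo str_repeat('  ', $depth), $label, "\n";

    if ($node->hasChildren()) {
        foreach ($node->child as $child) {
            dump_tidy_tree($child, $depth + 1);
        }
    }
}

// Only run the demo when the tidy extension is available.
if (function_exists('tidy_parse_string')) {
    $tidy = tidy_parse_string('<a href="#">sdf</a>', array(), 'utf8');
    $body = tidy_get_body($tidy);
    dump_tidy_tree($body);
}
```

The exact tree depends on how Tidy repairs the fragment, so it is worth checking the output for your own input before relying on node positions.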
So the value of the text node under the a node can be processed on its own without damaging the HTML. I assumed that once the text node's value was changed, the value of the a node would change too, so that I could simply return the a node's value. It does not: after processing the text, you still have to assemble the new HTML yourself.
Once you know the structure of the Tidy object, the rest is easy. Traverse all the nodes, find the div this requirement is about, and process its child nodes. The code is as follows:
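// Runs inside the loop over the child nodes; $len is the remaining width budget.
if (mb_strwidth($subchild->value, 'utf-8') >= $len)
{
    // This text node uses up the budget: trim it, append it, and stop.
    $subchild->value = mb_strimwidth($subchild->value, 0, $len, '...', 'utf-8');
    $trimed_str .= $subchild->value;
    break;
}
else
{
    // The whole text fits: append it and reduce the remaining budget.
    $trimed_str .= $subchild->value;
    $len = $len - mb_strwidth($subchild->value, 'utf-8');
}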
$subchild here is a child node. Note that mb_strwidth is used to get the string length. I strongly recommend mb_strwidth: it counts a Chinese character as two columns wide, which is exactly what this requirement needs. mb_strimwidth, which does the truncation, also counts a Chinese character as width two. The functions starting with mb_ are really handy.
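To make the difference between bytes, characters and display width concrete, here is a small comparison. It assumes the mbstring extension is enabled and that the source file is saved as UTF-8; the sample strings are made up for illustration.

```php
<?php
$name = '李四';

echo strlen($name), "\n";               // 6: bytes in UTF-8
echo mb_strlen($name, 'UTF-8'), "\n";   // 2: characters
echo mb_strwidth($name, 'UTF-8'), "\n"; // 4: each Chinese character counts as width 2

// Trim to a total width of 10; the width of the '...' marker is included.
$short = mb_strimwidth('李四 wrote a long article', 0, 10, '...', 'UTF-8');
echo $short, "\n";
echo mb_strwidth($short, 'UTF-8'), "\n"; // at most 10
```

Because the trim marker counts toward the limit, very small widths leave little or no room for the text itself.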
I will not post the complete code, because it was written for one specific requirement and is not in a general form. When I have time I will make it generic and release it.
Also, Firefox (before version 7) does not support the CSS text-overflow property; otherwise there would be no need to truncate so laboriously on the back end. If you have a better method, please share it. I would be very grateful.
