Requirement: truncate a passage of text to a given display width. Note that the limit is not the number of bytes in the string. In UTF-8 a Chinese character takes 3 or 4 bytes, but when displayed it occupies two columns, while an English character occupies only one, and full-width characters differ again. The data is a string of HTML code, for example:
Zhang San commented on a long string of things in an article shared by Li Si
The goal is to truncate the content inside the p tag while keeping the HTML tags, so only the text inside the tags needs to be processed. For example, the cut might fall right after "Li" in "Li Si"; if that were sent to the front end as is, the tag opened before "Li Si" would never be closed. The HTML must therefore still be valid after truncation.
This problem is not easy, and I was stuck on it for two days. Note that this is only a string: its content is HTML code, but there is no DOM. On the front end you could get the DOM directly, process its nodes, and output the innerHTML at the end. That is not possible here, so a different approach is needed. My colleague's idea was this:
Traverse the string character by character and keep a flag. When a tag opens with <, set the flag to 1 and stop counting the characters that follow; resume counting once the closing > is reached. For the text between tags, first check whether the current character might be Chinese. In PHP a UTF-8 Chinese character is usually 3 bytes, so on meeting one you must skip the next two bytes without counting them... At this point my head started to hurt. I find this approach very awkward. First, such intricate logic is hard to keep under control. Second, a Chinese character in UTF-8 may be 3 or 4 bytes long, so the rigor of the code is questionable.
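For comparison, here is a rough sketch of the scanning idea that sidesteps the 3-or-4-byte question by splitting the string into whole characters first. It is not my colleague's code. The function name truncate_html_by_scan and the short list of void tags are made up for illustration, and it assumes valid UTF-8, well-formed and properly nested tags, and no < or > inside attribute values or text.

```php
<?php
// Hypothetical helper: truncate the visible text of an HTML fragment to
// $maxWidth display columns while leaving the tags intact.
// Assumptions: UTF-8 input, well-formed nested tags, no '<' or '>' inside
// attribute values, and entities such as &amp; counted character by character.
function truncate_html_by_scan(string $html, int $maxWidth): string
{
    $void  = ['br', 'img', 'hr', 'input'];  // tags that never need closing
    $chars = preg_split('//u', $html, -1, PREG_SPLIT_NO_EMPTY);
    $out   = '';
    $tag   = '';      // text of the tag currently being read
    $inTag = false;   // the flag from the description above
    $open  = [];      // stack of currently open tag names
    $used  = 0;       // display columns consumed so far

    foreach ($chars as $ch) {
        if ($inTag) {
            $tag .= $ch;
            if ($ch === '>') {
                $inTag = false;
                $out .= $tag;
                if (preg_match('#^<(/?)([a-zA-Z][a-zA-Z0-9]*)#', $tag, $m)) {
                    $name = strtolower($m[2]);
                    if ($m[1] === '/') {
                        array_pop($open);           // closing tag
                    } elseif (!in_array($name, $void, true) && substr($tag, -2) !== '/>') {
                        $open[] = $name;            // opening tag
                    }
                }
            }
            continue;
        }
        if ($ch === '<') {
            $inTag = true;
            $tag = $ch;
            continue;
        }
        $w = mb_strwidth($ch, 'UTF-8');
        if ($used + $w > $maxWidth) {
            break;    // the next character would not fit
        }
        $out  .= $ch;
        $used += $w;
    }

    // Close whatever is still open, innermost first.
    while ($open) {
        $out .= '</' . array_pop($open) . '>';
    }
    return $out;
}

$scanned = truncate_html_by_scan('<p><a href="#">张三</a> shared <b>李四</b> today</p>', 14);
echo $scanned, "\n";   // <p><a href="#">张三</a> shared <b>李</b></p>
```

Even in this small form the bookkeeping (flag, tag buffer, open-tag stack) is easy to get wrong on real-world markup, which is the control problem described above.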
My own idea is to use Tidy (see the PHP Manual for usage). I looked into Tidy yesterday and found it quite useful. First, convert the string into a Tidy object, as shown in the following code:
$tidy = tidy_parse_string($str, array(), 'utf8'); // The last argument sets the encoding. Note that it is utf8, not UTF-8: there is no hyphen.
Then get the body from $tidy (Tidy automatically adds the html and body elements during conversion):
$body = tidy_get_body($tidy);
At this point you can use var_dump to inspect the structure of $body. You will find that each tag has been converted into a corresponding object with its own properties. For example, for <a href="#">sdf</a>, some of the properties are:
name => "a"
value => "<a href="#">sdf</a>"
child => array { [0] => a text node object whose value is sdf }
attribute => array { "href" => "#" }
..... other properties
So the value of the text node under the a node can be processed on its own, without damaging the integrity of the HTML. I assumed that once the value of the text node inside tag a was changed, the value of tag a itself would change too, so that I could simply return the value of the a node. It does not work that way: after processing the text, you still have to assemble the new HTML yourself.
Once the structure of the Tidy object is known, the rest is easy: traverse all the nodes, find the p tag this requirement is about, and then process its child nodes. The code is as follows:
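As a more compact alternative to var_dump, the tree can be printed with a small recursive walk. This is only a sketch: describe_nodes is a made-up helper name, and it assumes the tidy extension is installed. It relies on the name, value and child properties and the isText() method of tidyNode.

```php
<?php
// Hypothetical helper: return one "name: value" line per node, indented by depth.
function describe_nodes(tidyNode $node, int $depth = 0): array
{
    // Text nodes have no tag name, so label them explicitly.
    $label = $node->isText() ? '#text' : $node->name;
    $lines = [str_repeat('  ', $depth) . $label . ': ' . trim($node->value)];

    // child is null when a node has no children.
    foreach ($node->child ?? [] as $child) {
        $lines = array_merge($lines, describe_nodes($child, $depth + 1));
    }
    return $lines;
}

$lines = [];
if (extension_loaded('tidy')) {
    $tidy  = tidy_parse_string('<p><a href="#">sdf</a> tail</p>', [], 'utf8');
    $body  = tidy_get_body($tidy);
    $lines = describe_nodes($body);
    echo implode("\n", $lines), "\n";
}
```

The output makes it easy to see which nodes are elements and which are the text nodes that actually need trimming.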
if (mb_strwidth($subchild->value, 'utf-8') >= $len)
{
    // This text node uses up the remaining width: trim it and stop.
    $subchild->value = mb_strimwidth($subchild->value, 0, $len, '...', 'utf-8');
    $trimed_str .= $subchild->value;
    break;
}
else
{
    // The whole text node fits: keep it and reduce the remaining width.
    $trimed_str .= $subchild->value;
    $len = $len - mb_strwidth($subchild->value, 'utf-8');
}
$subchild here is a child node. Note that mb_strwidth is used to get the width of the string. I strongly recommend mb_strwidth: it counts a Chinese character as two columns wide, which is exactly what this requirement needs. mb_strimwidth is used to truncate the string, and it also counts a Chinese character as two columns. The functions starting with mb_ are really handy.
I will not post the complete code, because it was written for one specific requirement and not in a general form. When I have time I will make it generic and release it.
Also, at the time of writing Firefox does not support the text-overflow property; otherwise there would be no need to truncate so laboriously on the back end. If you have a better method, please share it. I would be very grateful.
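For anyone who wants a starting point in the meantime, a generic version might look roughly like the sketch below. It is not the code I actually used. find_first and render_trimmed are made-up names, and the sketch assumes the tidy and mbstring extensions, attribute values that need escaping on output, and no void elements such as br inside the p. It does not handle the case where fewer columns remain than the '...' marker needs, and a cut can land in the middle of an entity such as &amp;.

```php
<?php
// Hypothetical helper: depth-first search for the first element named $name.
function find_first(tidyNode $node, string $name): ?tidyNode
{
    if ($node->name === $name) {
        return $node;
    }
    foreach ($node->child ?? [] as $child) {
        $hit = find_first($child, $name);
        if ($hit !== null) {
            return $hit;
        }
    }
    return null;
}

// Hypothetical helper: rebuild the HTML of $node by hand, spending at most
// $budget display columns on text. $budget shrinks as text is emitted and
// $done is set once the limit has been hit.
function render_trimmed(tidyNode $node, int &$budget, bool &$done): string
{
    if ($node->isText()) {
        $text  = $node->value;
        $width = mb_strwidth($text, 'UTF-8');
        if ($width > $budget) {
            $done = true;   // this node uses up the remaining width
            return mb_strimwidth($text, 0, $budget, '...', 'UTF-8');
        }
        $budget -= $width;
        return $text;
    }

    // Re-create the opening tag from the node's name and attributes.
    $attrs = '';
    foreach ($node->attribute ?? [] as $key => $val) {
        $attrs .= ' ' . $key . '="' . htmlspecialchars($val, ENT_QUOTES, 'UTF-8') . '"';
    }

    $inner = '';
    foreach ($node->child ?? [] as $child) {
        if ($done) {
            break;          // skip everything after the cut
        }
        $inner .= render_trimmed($child, $budget, $done);
    }
    // The closing tag is always written, so the result stays well-formed.
    return "<{$node->name}{$attrs}>{$inner}</{$node->name}>";
}

$result = '';
if (extension_loaded('tidy')) {
    $html = '<p><a href="#">张三</a> shared a post by <a href="#">李四</a></p>';
    $tidy = tidy_parse_string($html, [], 'utf8');
    $p    = find_first(tidy_get_body($tidy), 'p');
    $budget = 12;
    $done   = false;
    $result = $p !== null ? render_trimmed($p, $budget, $done) : '';
    echo $result, "\n";
}
```

Because every element writes its own closing tag on the way back up the recursion, the cut can fall anywhere in the text without leaving a tag open.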