Today's topic is a requirement: truncate a piece of text to a given display width. Note that the limit is not the number of bytes in the string. In UTF-8 a Chinese character takes 3 or 4 bytes, yet on screen it is two columns wide, while an English character takes only one; full-width and half-width characters differ as well.
The data is a string of HTML code, such as this:
- <div class="AAA">
- <a href="/aaa.php?id=1">
- Tom
- </a>
- commented on
- <a href="/aaa.php?id=444">
- John Doe
- </a>
- shared
- <a href="bbb.html">
- An article with a very, very long title
- </a>
- </div>
The PHP code has to truncate what is inside the DIV tag while keeping the HTML tags, so only the text is processed. For example, the cut may fall in the middle of "John Doe"; if that result is sent to the front end as is, the A tag in front of "John Doe" is never closed. The truncated result must therefore still be valid HTML.
This problem is genuinely awkward and had me frustrated for two days. Note that this is just a string: its content happens to be HTML, but there is no DOM. On the front end it would be easy: get the DOM directly, process the nodes inside it, and output the result through innerHTML or similar. That is not possible here, so a different approach is needed. A colleague's idea was this:
Iterate over each character of the string. Keep a flag: set it to 1 on reaching the < that starts a tag and stop counting, then resume counting after the >. For the text between tags, also check whether the current character might be Chinese; a Chinese character in UTF-8 generally occupies 3 bytes in PHP, so on meeting one, skip the next two bytes without counting them ... At this point my head starts to swell. I find this method very uncomfortable. First, such delicate logic is hard to keep under control; second, a UTF-8 encoded Chinese character can be 3 or 4 bytes long, so the rigor of the code is questionable.
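For comparison, the flag-based scan can be sketched as below. This is not my colleague's code: the function name visible_width is hypothetical, and instead of skipping bytes by hand it splits the string into whole UTF-8 characters with preg_split and the u modifier, which avoids the 3-or-4-byte guesswork.

```php
<?php
// Hypothetical helper: display width of the text in an HTML string,
// ignoring everything between '<' and '>'.
function visible_width(string $html): int
{
    $inTag = false; // the flag: true while we are inside a tag
    $width = 0;

    // Split into whole UTF-8 characters rather than bytes.
    $chars = preg_split('//u', $html, -1, PREG_SPLIT_NO_EMPTY);

    foreach ($chars as $ch) {
        if ($ch === '<') {
            $inTag = true;            // tag starts: stop counting
        } elseif ($ch === '>') {
            $inTag = false;           // tag ends: resume counting
        } elseif (!$inTag) {
            $width += mb_strwidth($ch, 'UTF-8'); // 2 for Chinese, 1 for ASCII
        }
    }
    return $width;
}

echo visible_width('<a href="#">Tom</a> 评论'), "\n"; // prints 8
```

Even in this form the scan only measures the text. It is fooled by a > inside an attribute value, and it knows nothing about which tags are still open at the cut point, so closing them correctly would need yet more bookkeeping.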
My own idea is to use Tidy (see the PHP manual for usage). I studied Tidy yesterday and found it very useful. First, convert the string into a Tidy object:
- $tidy = tidy_parse_string($str, array(), 'utf8');
- // The last argument sets the encoding; note that it is written utf8 here.
Then get the body from $tidy (after parsing, Tidy automatically adds the wrapping tags such as html and body):
$body = tidy_get_body($tidy);
Now you can use var_dump to look at the structure of $body. You will find that every tag has been turned into a corresponding object with corresponding properties. For example, for <a href="#">SDF</a>, some of the properties are:
name => "a"
value => '<a href="#">SDF</a>'
child => array([0] => a text node object whose value is SDF)
attribute => array("href" => "#")
... other properties
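The properties listed above can be read directly from the node object. The following is a minimal sketch, assuming the tidy extension is installed; the markup <a href="#">SDF</a> is a stand-in matching the property listing, and the variable names are mine.

```php
<?php
// Assumption: the tidy extension is available; otherwise nothing is printed.
if (extension_loaded('tidy')) {
    $tidy = tidy_parse_string('<a href="#">SDF</a>', array(), 'utf8');
    $a    = tidy_get_body($tidy)->child[0]; // first node inside <body>

    echo 'name: ', $a->name, "\n";                  // tag name of the element
    echo 'href: ', $a->attribute['href'], "\n";     // attributes are an array
    echo 'children: ', count($a->child), "\n";      // child nodes are an array
    echo 'text: ', trim($a->child[0]->value), "\n"; // value of the text node
}
```

The trim call is only a precaution in case the node value carries surrounding whitespace from Tidy's output formatting.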
As you can see, we only need to process the value of the text node under the A node, so the truncation never breaks the integrity of the HTML. I originally thought that changing the value of the text node inside the a tag would also change the value of the a tag itself, so that I could simply return the value of the a node. It turned out not to work that way, so after processing the text I still have to assemble the new HTML myself.
Once the structure of the Tidy object is known, the rest is straightforward: traverse all the nodes. For this requirement that means finding the div tag and then processing the nodes inside it. The code is as follows:
- if (mb_strwidth($subchild->value, 'utf-8') >= $len)
- {
-     // This text uses up the remaining width: cut it and stop.
-     $subchild->value = mb_strimwidth($subchild->value, 0, $len, '...', 'utf-8');
-     $trimed_str .= $subchild->value;
-     break;
- }
- else
- {
-     // This text fits entirely: keep it and reduce the remaining width.
-     $trimed_str .= $subchild->value;
-     $len = $len - mb_strwidth($subchild->value, 'utf-8');
- }
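The fragment above runs inside a loop over the nodes of the div. One way to reach that div is a small depth-first search, sketched here. The helper name find_first and the sample markup are assumptions of mine, not part of the original code, and the tidy extension is assumed to be installed.

```php
<?php
// Hypothetical helper: depth-first search for the first element
// with the given tag name, or null if there is none.
function find_first(tidyNode $node, string $name): ?tidyNode
{
    if ($node->name === $name) {
        return $node;
    }
    // $node->child is null for leaf nodes, hence the array cast.
    foreach ((array) $node->child as $child) {
        $found = find_first($child, $name);
        if ($found !== null) {
            return $found;
        }
    }
    return null;
}

if (extension_loaded('tidy')) {
    $str  = '<div class="AAA"><a href="/aaa.php?id=1">Tom</a> commented on</div>';
    $tidy = tidy_parse_string($str, array(), 'utf8');
    $div  = find_first(tidy_get_body($tidy), 'div');

    // Each $subchild is either an element (here the a tag) or a text node.
    foreach ($div->child as $subchild) {
        echo $subchild->isText() ? 'text' : $subchild->name, "\n";
    }
}
```

For an element such as the a tag, the same loop would be applied again to its own children to reach the text nodes inside it.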
$subchild here is a child node. Note that mb_strwidth is used to get the length of the string. I strongly recommend mb_strwidth: it counts a Chinese character as two columns wide, which is exactly what this requirement needs. The truncation itself uses mb_strimwidth, which also treats a Chinese character as two columns wide. The mb_ functions are really useful.
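The width budget can be tried out without Tidy at all. The sketch below applies the same if/else logic to a plain array of strings standing in for the text nodes; the function name trim_segments and the sample texts are hypothetical, and mbstring is assumed to be loaded.

```php
<?php
// Hypothetical helper: spend one shared display-width budget on a list of
// text segments (stand-ins for the text nodes found inside the div).
function trim_segments(array $texts, int $len): string
{
    $trimmed = '';
    foreach ($texts as $text) {
        $width = mb_strwidth($text, 'UTF-8');
        if ($width >= $len) {
            // This segment uses up the budget: cut it and stop.
            // The width passed to mb_strimwidth includes the '...' marker.
            $trimmed .= mb_strimwidth($text, 0, $len, '...', 'UTF-8');
            break;
        }
        $trimmed .= $text;
        $len -= $width; // a segment that fits reduces the remaining budget
    }
    return $trimmed;
}

echo trim_segments(['Tom', ' commented on ', '一篇很长的文章'], 24), "\n";
// prints "Tom commented on 一篇..."
```

Here "Tom" takes 3 columns and " commented on " takes 14, which leaves 7 for the Chinese title: two characters (4 columns) plus the three-column marker. In the real code each segment would be written back between its original tags, which is the part that keeps the HTML valid.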
I will not post the complete truncation code, because it was written for one specific requirement and is not in a general form. Some day, when I have time, I will write a general version and publish it.
Also, unfortunately Firefox does not support the text-overflow property; otherwise there would be no need to go to such trouble to truncate on the server. If you have a better way, please share it. Much appreciated.