How to Truncate HTML Code in PHP

Source: Internet
Author: User
Tags: tidy

Requirement: truncate a piece of text to a given display width. Note that what matters is not the number of bytes in the string. In UTF-8 a Chinese character takes 3 or 4 bytes, but when displayed it occupies the width of two characters, while an English character occupies only one; full-width and half-width characters differ in width in the same way.
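To make the difference between bytes, characters, and display width concrete, here is a small sketch. It assumes the mbstring extension is loaded; the sample strings and variable names are illustrative and not taken from the original code.

```php
<?php
// Sample strings (illustrative): a two-character Chinese name and an ASCII name.
$chinese = "\u{5F20}\u{4E09}"; // "Zhang San" written in Chinese characters
$english = "Li Si";

// strlen() counts bytes: each of these Chinese characters is 3 bytes in UTF-8.
$bytes = strlen($chinese);

// mb_strlen() counts characters (code points), regardless of byte length.
$chars = mb_strlen($chinese, 'UTF-8');

// mb_strwidth() counts display columns: wide East Asian characters count as 2.
$width = mb_strwidth($chinese, 'UTF-8');

printf("bytes=%d chars=%d width=%d\n", $bytes, $chars, $width);
printf("ascii width=%d\n", mb_strwidth($english, 'UTF-8'));
```

Only the last of the three measures matches what the requirement calls length.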

The data is an HTML code string, for example:

 
 
    <div class="aaa">
      <a href="/aaa.php?id=1">
        Zhang San
      </a>
      Commented
      <a href="/aaa.php?id=444">
        Li Si
      </a>
      Shared
      <a href="bbb.html">
        A long string of articles
      </a>
    </div>

When truncating HTML in PHP, only the text inside the div should be truncated; the HTML tags must be kept. For example, the cut might fall right after "Li" in "Li Si"; if that result is sent to the front end as is, the a tag opened before "Li Si" is never closed. So the HTML must still be syntactically correct after truncation.
This problem is genuinely awkward, and it frustrated me for two days. Keep in mind that the input is just a string whose content happens to be HTML; there is no DOM. On the front end you could get the DOM directly, process its nodes, and output innerHTML at the end. That is not possible here, so a different approach is needed. My colleague's idea was this:

Traverse the string character by character and keep a flag. When a tag begins with <, set the flag to 1 and stop counting characters; resume counting once the closing > has been passed. For the text between tags, first check whether the current byte could belong to a Chinese character. A Chinese character in UTF-8 is generally 3 bytes, so on meeting one you skip the next two bytes without counting them... At this point my head started to spin. I find this approach uncomfortable: the fiddly logic is hard to get right, and since a Chinese character in UTF-8 may be 3 or 4 bytes long, the rigor of such code is questionable.
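For comparison, the counting half of the flag idea can be sketched as follows. This is not the colleague's code: the function name visibleWidth is made up for illustration, and the sketch walks whole UTF-8 characters (via preg_split with the u modifier) instead of bytes, which sidesteps the 3-or-4-byte question. It only measures width; it does not truncate or close tags, and it assumes < and > appear only as tag delimiters.

```php
<?php
/**
 * Display width of the text in an HTML fragment, ignoring everything
 * between '<' and '>'. Naive: no handling of comments, scripts, or
 * unescaped angle brackets inside attribute values.
 */
function visibleWidth(string $html): int
{
    // Split into whole UTF-8 characters so multi-byte sequences are never cut.
    // preg_split() returns false on invalid UTF-8; treat that as empty input.
    $chars = preg_split('//u', $html, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $inTag = false; // the flag: true while between '<' and '>'
    $width = 0;

    foreach ($chars as $ch) {
        if ($ch === '<') {
            $inTag = true;                        // tag starts: stop counting
        } elseif ($ch === '>' && $inTag) {
            $inTag = false;                       // tag ends: resume counting
        } elseif (!$inTag) {
            $width += mb_strwidth($ch, 'UTF-8');  // wide characters add 2
        }
    }

    return $width;
}

echo visibleWidth('<a href="/aaa.php?id=444">Li Si</a> shared'), "\n";
```

Even in this reduced form, the approach still needs a stack of open tags before it can produce valid truncated HTML, which is the part that makes it hard to control.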

My own idea was to use Tidy (see the PHP manual for usage details). I looked into Tidy yesterday and found it quite useful. First, parse the string into a Tidy object, as shown in the following code:

 
 
    // The last argument is the encoding. Note that it is 'utf8',
    // not 'UTF-8': there is no hyphen in the middle.
    $tidy = tidy_parse_string($str, array(), 'utf8');

Then get the body from $tidy (after parsing, Tidy automatically wraps the fragment in a complete document, including the html and body tags):

    $body = tidy_get_body($tidy);

At this point you can use var_dump to inspect the structure of $body. You will find that each tag has been turned into an object with corresponding properties. For example, for <a href="#">sdf</a> the properties include:

name => "a"
value => "<a href="#">sdf</a>"
child => array([0] => a text node object whose value is "sdf")
attribute => array("href" => "#")
... other properties
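A short sketch of walking this structure, assuming the tidy extension is installed. The helper name collectText and the sample markup are hypothetical; tidyNode::isText() and the child property come from the tidy extension. The exact whitespace in the returned text depends on how Tidy normalizes the input.

```php
<?php
/**
 * Recursively collect the values of all text nodes below $node.
 */
function collectText(tidyNode $node, array &$out): void
{
    if ($node->isText()) {
        $out[] = (string) $node->value;
        return;
    }
    // ->child is null for nodes without children.
    foreach ($node->child ?? [] as $child) {
        collectText($child, $out);
    }
}

if (extension_loaded('tidy')) {
    $str  = '<div class="aaa"><a href="#">sdf</a> commented</div>';
    $tidy = tidy_parse_string($str, array(), 'utf8');
    $body = tidy_get_body($tidy);

    $texts = array();
    if ($body !== null) {
        collectText($body, $texts);
    }
    print_r($texts); // the text pieces, in document order
}
```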

As you can see, the value of each text node under a tag can be processed separately, so truncating the text never damages the HTML structure. I had assumed that changing the value of the text node inside an a tag would also change the value of the a tag itself, so that I could simply return the value of the a node. It does not: after processing the text you still have to assemble the new HTML yourself.
Once the structure of the Tidy object is understood, the rest is straightforward: traverse the nodes, find the div for this requirement, and process the nodes under it. The code is as follows:

 
 
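    // Runs inside a loop over the child nodes; $len is the display
    // width still available and $trimed_str collects the output text.
    $text = $subchild->value;
    if (mb_strwidth($text, 'utf-8') >= $len) {
        // This node uses up the remaining width: trim it and stop.
        // The trimmed text is kept locally because writing it back
        // to the node would not update the parent's value.
        $trimed_str .= mb_strimwidth($text, 0, $len, '…', 'utf-8');
        break;
    } else {
        // The whole node fits: keep it and reduce the remaining width.
        $trimed_str .= $text;
        $len = $len - mb_strwidth($text, 'utf-8');
    }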


$subchild here is a child node. Note that the string width is obtained with mb_strwidth. I strongly recommend this function: it counts a Chinese character as two characters wide, which is exactly what this requirement needs. The truncation itself uses mb_strimwidth, which also counts Chinese characters as two characters wide. The functions starting with mb_ are really handy.
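The width-budget logic can be tried in isolation on a plain list of strings. The sketch below is a stand-alone variation, not the author's full code: the function name trimTexts, its parameters, and the early exit when no width is left are assumptions added for illustration. It assumes the mbstring extension is loaded.

```php
<?php
/**
 * Concatenate text pieces until a total display width of $len is used up.
 * The piece that reaches the limit is trimmed with mb_strimwidth(), which
 * counts the trim marker as part of the allowed width.
 */
function trimTexts(array $texts, int $len, string $marker = '…'): string
{
    $trimmed = '';

    foreach ($texts as $text) {
        if ($len <= 0) {
            break; // nothing left to spend
        }
        $w = mb_strwidth($text, 'UTF-8');
        if ($w >= $len) {
            // This piece exhausts the budget: trim it and stop.
            $trimmed .= mb_strimwidth($text, 0, $len, $marker, 'UTF-8');
            break;
        }
        // The whole piece fits: keep it and shrink the budget.
        $trimmed .= $text;
        $len -= $w;
    }

    return $trimmed;
}

echo trimTexts(array('Zhang San', ' commented ', 'Li Si'), 14, '...'), "\n";
```

In the real task each piece would be the value of a text node, and the surrounding tags would be re-emitted around the pieces that survive.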

I won't post the complete PHP code, because it was written for one specific requirement and is not generic. When I have time I will generalize it and publish it.
Also, Firefox did not support the text-overflow property at the time of writing; otherwise there would be no need to truncate so laboriously on the back end. If you have a better method, please share it. I would be very grateful.

