A study of the Cuyahoga code: the Cuyahoga.Core project

Source: Internet
Author: User

Text.cs defines a class for text processing. It has only one static method, TruncateText(string fullText, int numberOfCharacters).
The source code is as follows:

// Requires: using System.Text.RegularExpressions;
public static string TruncateText(string fullText, int numberOfCharacters)
{
    string text;
    if (fullText.Length > numberOfCharacters)
    {
        // Find the first space at or after the cut position so that no word is split.
        int spacePos = fullText.IndexOf(" ", numberOfCharacters);
        if (spacePos > -1)
        {
            text = fullText.Substring(0, spacePos) + "";
        }
        else
        {
            // No space after the cut position: keep the whole string.
            text = fullText;
        }
    }
    else
    {
        text = fullText;
    }
    // Replace every HTML tag with a space.
    Regex regexStripHtml = new Regex("<[^>]+>", RegexOptions.IgnoreCase | RegexOptions.Compiled);
    text = regexStripHtml.Replace(text, " ");
    return text;
}

This static method takes two parameters. The first is the string to process (which may contain HTML), and the second is the number of characters to keep. It works as follows:
1. Check whether the string is longer than the number of characters to keep.
2. If it is, search for the next space starting at that index, and take the text from the first character up to that space. If the string is not longer, or no space is found, the whole string is kept.
3. Remove the HTML tags from the result using a regular expression.
Two techniques here are worth noting:
The first is int spacePos = fullText.IndexOf(" ", numberOfCharacters)
This finds the first space at or after the given position, so that the cut falls on a word boundary. Suppose we want 25 characters, but the word "program" starts at the 23rd character. Taking exactly the first 25 characters would cut that word in half. The method here is more flexible: it moves the cut to the end of the word so that the word stays whole.
The second is the regular expression "<[^>]+>", which matches any tag. The [^>] in the middle matches any character except >, and the + after it requires at least one such character between < and >. Replacing every match removes the HTML tags from the truncated string.
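
The two techniques can be tried in isolation. The following is a minimal sketch, not code from Cuyahoga: the class name TruncateTipsDemo and the helper names CutAtWordBoundary and StripTags are made up for illustration, and the sample sentence is chosen so that "program" starts at the 23rd character, as in the example above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; none of these names come from Cuyahoga.
public static class TruncateTipsDemo
{
    // Technique 1: cut at the first space at or after `limit`,
    // so the last word is never split.
    public static string CutAtWordBoundary(string s, int limit)
    {
        if (s.Length <= limit) return s;
        int spacePos = s.IndexOf(" ", limit);
        return spacePos > -1 ? s.Substring(0, spacePos) : s;
    }

    // Technique 2: replace every <...> tag with a single space.
    public static string StripTags(string html)
    {
        return Regex.Replace(html, "<[^>]+>", " ");
    }

    public static void Main()
    {
        string sentence = "He wrote a very short program today";

        // A hard cut at 25 characters splits the word "program".
        Console.WriteLine(sentence.Substring(0, 25));        // He wrote a very short pro

        // Searching for the next space keeps the word whole.
        Console.WriteLine(CutAtWordBoundary(sentence, 25));  // He wrote a very short program

        // Each tag becomes a space; the text between tags is kept.
        Console.WriteLine("[" + StripTags("<p>Hello <b>world</b></p>") + "]");

        // "<>" is left alone because + needs at least one character inside.
        Console.WriteLine(StripTags("a <> b"));              // a <> b
    }
}
```

Because each tag is replaced by a space rather than removed outright, the stripped text can contain runs of spaces where tags were adjacent.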

The method above is good, but I do not think it is perfect. It handles general HTML text without paragraph formatting well; since the HTML tags are removed, any formatting in the text is lost. However, if the string contains non-text elements (such as IMG or BUTTON), the result will be unsatisfactory.
This kind of scenario is very common. I believe many developers who have built news systems have run into it, because news summaries have to be generated.
My own suggestion, however, is to add a dedicated summary column (field) to hold the abstract. This is the most flexible and extensible approach, because it lets editors enter the abstract freely. It has two advantages: first, the abstract is more accurate; second, its length is easier to control.
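
The text does not say exactly what goes wrong with non-text elements, so the following sketch shows two cases that are only my reading of it. It repeats TruncateText as printed in this article so that it runs on its own; the class name TruncateLimitsDemo and the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class wrapping a copy of the method shown in this article.
public static class TruncateLimitsDemo
{
    public static string TruncateText(string fullText, int numberOfCharacters)
    {
        string text;
        if (fullText.Length > numberOfCharacters)
        {
            int spacePos = fullText.IndexOf(" ", numberOfCharacters);
            text = spacePos > -1 ? fullText.Substring(0, spacePos) + "" : fullText;
        }
        else
        {
            text = fullText;
        }
        Regex regexStripHtml = new Regex("<[^>]+>", RegexOptions.IgnoreCase | RegexOptions.Compiled);
        return regexStripHtml.Replace(text, " ");
    }

    public static void Main()
    {
        // Case 1: the content is only an image, so nothing readable survives.
        string imageOnly = "<img src=\"logo.png\">";
        Console.WriteLine("[" + TruncateText(imageOnly, 100) + "]");  // prints "[ ]"

        // Case 2: the string is cut before the tags are stripped, and a space
        // inside a tag counts as a word break. The cut leaves "<img" with no
        // closing ">", so the regular expression no longer matches it.
        string mixed = "See the chart <img src=\"chart.png\" alt=\"Sales chart\"> for details.";
        Console.WriteLine("[" + TruncateText(mixed, 16) + "]");       // prints "[See the chart <img]"
    }
}
```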
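
As a rough sketch of this suggestion, a news item could carry its own summary field and fall back to automatic truncation only when the editor left it empty. Everything here is an assumption made for illustration: the NewsItem class, its Body and Summary properties, and the GetSummary method are not part of Cuyahoga. The fallback also differs from TruncateText in that it strips the tags before cutting.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical news item; neither the class nor its members come from Cuyahoga.
public class NewsItem
{
    public string Body { get; set; } = "";

    // Abstract written by an editor; may be left empty.
    public string Summary { get; set; } = "";

    public string GetSummary(int maxCharacters)
    {
        // Prefer the editor's abstract whenever one was entered.
        if (!string.IsNullOrWhiteSpace(Summary))
        {
            return Summary.Trim();
        }

        // Fallback (an assumption, not the article's method): strip the tags
        // first, then cut at the next word boundary.
        string plain = Regex.Replace(Body, "<[^>]+>", " ").Trim();
        if (plain.Length <= maxCharacters) return plain;
        int spacePos = plain.IndexOf(" ", maxCharacters);
        return spacePos > -1 ? plain.Substring(0, spacePos) : plain;
    }
}

public static class SummaryDemo
{
    public static void Main()
    {
        var edited = new NewsItem
        {
            Body = "<p>Quarterly sales rose sharply this year.</p>",
            Summary = "Sales are up."
        };
        var unedited = new NewsItem
        {
            Body = "<p>Quarterly sales rose sharply this year.</p>"
        };

        Console.WriteLine(edited.GetSummary(20));    // Sales are up.
        Console.WriteLine(unedited.GetSummary(20));  // Quarterly sales rose
    }
}
```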
