Character Escaping Techniques

Source: Internet
Author: User

During the project test phase, testers enter special characters such as <Table>, <, / or &lt;. With such input the page reports an error, and if the data is exported, the exported Excel file is also broken. On pages that output the data directly, the &lt;, &gt;, &amp; and &nbsp; entered by the user are rendered as <, >, & and a space. The cause is that the Java code does not escape special characters.

Because <, > and & have special meanings in HTML (the first two delimit tags, and & starts an escape sequence), they cannot be used directly. To output these three characters, use their escape sequences.

The escape sequence for & is &amp; or &#38;
The escape sequence for < is &lt; or &#60;
The escape sequence for > is &gt; or &#62;
In each pair, the former is the character (named) escape sequence and the latter is the numeric escape sequence.
For example, &lt;font&gt; is displayed as <font>. Written directly, <font> would be interpreted as a tag.
Note:
A. An escape sequence cannot contain spaces.
B. An escape sequence must end with a semicolon (;).
C. A standalone & is not treated as the start of an escape sequence.
D. Escape sequences are case-sensitive.
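The numeric escape sequences listed above are simply the decimal character codes wrapped in &# and a semicolon. A minimal sketch of that relationship follows; the class name NumericEscapeSketch and the method toNumericEscape are hypothetical names used only for illustration.

```java
final class NumericEscapeSketch {

    // Builds the decimal numeric escape sequence for one character,
    // for example '<' (character code 60) becomes "&#60;".
    static String toNumericEscape(char c) {
        return "&#" + (int) c + ";";
    }

    public static void main(String[] args) {
        for (char c : new char[] {'&', '<', '>'}) {
            System.out.println(c + " -> " + toNumericEscape(c));
        }
    }
}
```

The sequence contains no spaces and ends with a semicolon, matching notes A and B.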
Another character that needs to be escaped is the double quotation mark ("). Its escape sequence is &quot; or &#34;.
Note that you must also escape &. Some developers escape only <, > or even quotation marks, but not &. Because & is the start of an HTML escape sequence, leaving it unescaped lets text such as &lt; entered by the user be interpreted as an escape sequence. Similarly, if a character such as "<" is used directly in an XML document, the parser reports an error because it treats the character as the start of a new element. Therefore & must be escaped as well.
The solution is to define a utility class, Tools.
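To see why skipping & is a problem, compare two escaping routines on the inputs < and &lt;. The sketch below is illustrative only: the class EscapeAmpersandSketch and both method names are hypothetical, and it uses the JDK's String.replace rather than any project code.

```java
final class EscapeAmpersandSketch {

    // Escapes only the angle brackets and leaves & untouched.
    static String escapeWithoutAmp(String s) {
        return s.replace("<", "&lt;").replace(">", "&gt;");
    }

    // Escapes & first, then the angle brackets.
    static String escapeWithAmp(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // Without escaping &, two different inputs produce the same output,
        // so the browser shows "<" for both.
        System.out.println(escapeWithoutAmp("<"));    // &lt;
        System.out.println(escapeWithoutAmp("&lt;")); // &lt;

        // With & escaped, the two inputs stay distinguishable.
        System.out.println(escapeWithAmp("<"));       // &lt;
        System.out.println(escapeWithAmp("&lt;"));    // &amp;lt;
    }
}
```

When & is left alone, a user who types the four characters &lt; sees < on the page, which is the symptom described at the start of this article.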

/**
 * Replaces every occurrence of a specified substring in a string.
 * @param strData     the original string
 * @param regEx       the substring to be replaced (matched literally, not as a regular expression)
 * @param replacement the substitution string
 * @return the string after replacement
 */
public static String replaceString(String strData, String regEx,
        String replacement)
{
    if (strData == null)
    {
        return null;
    }
    // An empty search string would never advance the loop below.
    if (regEx == null || regEx.isEmpty())
    {
        return strData;
    }
    int index = strData.indexOf(regEx);
    if (index < 0)
    {
        return strData;
    }
    StringBuilder strNew = new StringBuilder();
    while (index >= 0)
    {
        // Copy the text before the match, then the replacement.
        strNew.append(strData, 0, index).append(replacement);
        strData = strData.substring(index + regEx.length());
        index = strData.indexOf(regEx);
    }
    // Append whatever remains after the last match.
    strNew.append(strData);
    return strNew.toString();
}

/**
 * Escapes the special characters in a string.
 */
public static String encodeString(String strData)
{
    if (strData == null)
    {
        return "";
    }
    // Escape & first so that the & in the sequences added below is not escaped again.
    strData = replaceString(strData, "&", "&amp;");
    strData = replaceString(strData, "<", "&lt;");
    strData = replaceString(strData, ">", "&gt;");
    strData = replaceString(strData, "'", "&apos;");
    strData = replaceString(strData, "\"", "&quot;");
    return strData;
}

/**
 * Restores the special characters in a string.
 */
public static String decodeString(String strData)
{
    strData = replaceString(strData, "&lt;", "<");
    strData = replaceString(strData, "&gt;", ">");
    strData = replaceString(strData, "&apos;", "'");
    strData = replaceString(strData, "&quot;", "\"");
    // Restore & last so that text such as &amp;lt; decodes to &lt; rather than <.
    strData = replaceString(strData, "&amp;", "&");
    return strData;
}
The first function, replaceString, replaces every occurrence of one string with another. The second function, encodeString, escapes the following special characters: &, <, >, ' and ". The third function, decodeString, restores these special characters.
Therefore, you can call Tools.encodeString() to escape the text.
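The encode and decode round trip can also be sketched with the JDK's built-in String.replace in place of the hand-written helper. This is an illustrative alternative under that assumption, not the author's Tools class; the class name EscapeRoundTripSketch and its method names are hypothetical.

```java
final class EscapeRoundTripSketch {

    // Escapes the five special characters; & goes first so the & in the
    // sequences added afterwards is not escaped a second time.
    static String encode(String s) {
        if (s == null) {
            return "";
        }
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("'", "&apos;")
                .replace("\"", "&quot;");
    }

    // Reverses encode(); &amp; is restored last so that "&amp;lt;"
    // becomes "&lt;" and not "<".
    static String decode(String s) {
        if (s == null) {
            return "";
        }
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&apos;", "'")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String input = "<Table> & \"x\"";
        String encoded = encode(input);
        System.out.println(encoded);                       // &lt;Table&gt; &amp; &quot;x&quot;
        System.out.println(decode(encoded).equals(input)); // true
    }
}
```

The order matters in both directions: escaping & after < would turn &lt; into &amp;lt;, and restoring &amp; before &lt; would turn a user's literal &lt; into <.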
Exporting to Excel is a special case. If content such as <Table> is kept in the exported data, it corrupts the layout of the Excel sheet. If the data is escaped with this utility class instead, Excel shows the literal text &lt;Table&gt;, which is not what the user entered, so escaping is not a good fit here. Testing showed that Excel treats "&lt; &gt; &amp; &nbsp;" and similar HTML escape sequences as plain strings without errors, so only content such as <Table> causes problems. How can this be solved? Since only < and > cause errors, why not convert the half-width angle brackets into full-width ones? Testing confirmed that this works, and that is the solution adopted.
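The bracket conversion for Excel export might look like the sketch below. It assumes the full-width replacements are U+FF1C (FULLWIDTH LESS-THAN SIGN) and U+FF1E (FULLWIDTH GREATER-THAN SIGN); the class and method names are hypothetical, and the article does not show the code that was actually used.

```java
final class FullWidthBracketSketch {

    // Replaces the half-width angle brackets with their full-width
    // counterparts so the text can no longer be read as a tag.
    static String toFullWidthBrackets(String s) {
        if (s == null) {
            return "";
        }
        return s.replace('<', '\uFF1C').replace('>', '\uFF1E');
    }

    public static void main(String[] args) {
        // The result looks like a tag to the reader but contains no '<' or '>'.
        String converted = toFullWidthBrackets("<Table>");
        System.out.println(converted.indexOf('<') < 0 && converted.indexOf('>') < 0);
    }
}
```

Unlike escaping, this conversion is not meant to be reversed: the exported cell keeps the full-width characters.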
Another point: no matter how many consecutive spaces the user enters, the page displays a single space. This is inherent to HTML, which collapses consecutive ordinary spaces and preserves only &nbsp;. The solution would be to escape ordinary spaces as &nbsp;, but that change has a wide reach and a large impact, which must be taken into account: not only the display tags and custom tags but also the Struts tags would have to be modified. The workload would be heavy, the change might introduce new problems, and the current behavior does not affect normal operation, so after repeated consideration I decided not to make it.
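For reference, the conversion that was considered and rejected could be as small as the sketch below. It is hypothetical (the class and method names are invented here) and would have to run after the & escaping step, otherwise the & in &nbsp; would itself be escaped.

```java
final class SpaceEscapeSketch {

    // Replaces every ordinary space with &nbsp; so that runs of spaces
    // are not collapsed when the text is rendered as HTML.
    static String preserveSpaces(String s) {
        if (s == null) {
            return "";
        }
        return s.replace(" ", "&nbsp;");
    }

    public static void main(String[] args) {
        System.out.println(preserveSpaces("a   b")); // a&nbsp;&nbsp;&nbsp;b
    }
}
```

One side effect of this approach is that text made entirely of non-breaking spaces no longer wraps at those positions, which is part of the wide impact mentioned above.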
