Get a character's Unicode code point
package com.xs.test;

public class Test {
    public static void main(String[] args) throws Exception {
        int decimal = (int) '中';
        System.out.println(decimal); // Unicode code point in decimal
        String hex = Integer.toHexString(decimal);
        System.out.println(hex); // Unicode code point in hexadecimal
        // true: the escape sequence for U+4E2D denotes the same character
        System.out.println("中".contains("\u4e2d"));
    }
}
Output:
20013
4e2d
true
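Casting a char to int, as above, gives its UTF-16 code unit. That equals the code point only for characters in the Basic Multilingual Plane such as 中; a character above U+FFFF (many emoji, for example) takes two chars in a Java String. The following is a minimal sketch, assuming Java 8 or later, that walks a string by code point with String.codePoints() instead. The class name CodePointInfo and its methods are hypothetical, chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: reports each code point of a string in
// decimal, lowercase hex (as Integer.toHexString prints it) and U+XXXX form.
class CodePointInfo {

    // Formats one code point as "decimal hex U+XXXX"
    static String describe(int codePoint) {
        return codePoint + " " + Integer.toHexString(codePoint)
                + " " + String.format("U+%04X", codePoint);
    }

    // Walks the string by code point rather than by char, so a character
    // outside the BMP (stored as a surrogate pair) is reported only once
    static List<String> describeAll(String text) {
        List<String> result = new ArrayList<>();
        text.codePoints().forEach(cp -> result.add(describe(cp)));
        return result;
    }

    public static void main(String[] args) {
        // CJK character U+4E2D followed by U+1F600, an emoji outside the BMP
        String text = "\u4e2d\uD83D\uDE00";
        describeAll(text).forEach(System.out::println);
        // length() counts UTF-16 chars, codePointCount() counts code points
        System.out.println(text.length() + " chars, "
                + text.codePointCount(0, text.length()) + " code points");
    }
}
```

For the string above, length() reports 3 chars while codePointCount() reports 2 code points, which is why the plain (int) cast is only safe for BMP characters.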
Filter out the special character ZWNJ (zero-width non-joiner)
ZWNJ (U+200C) is an invisible special character that often ends up stored as garbled text in the database, so it should be filtered out.
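The program in this section removes only U+200C. If other invisible characters cause the same trouble, one broader option (a swapped-in technique, not the author's method) is to drop every code point whose Unicode general category is Cf ("format"), which includes ZWNJ, ZWJ (U+200D) and the byte order mark (U+FEFF). A minimal sketch using Character.getType(int) and Character.FORMAT; the class and method names are hypothetical.

```java
// Hypothetical helper class: strips all Unicode format (Cf) characters,
// not just ZWNJ.
class FormatCharFilter {

    // True when the code point's general category is Cf (format)
    static boolean isFormatChar(int codePoint) {
        return Character.getType(codePoint) == Character.FORMAT;
    }

    // Rebuilds the string from every code point that is not a format character
    static String stripFormatChars(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints()
                .filter(cp -> !isFormatChar(cp))
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Contains a ZWNJ between "ab" and "cd" and a BOM between "cd" and "ef"
        String dirty = "ab\u200Ccd\uFEFFef";
        String clean = stripFormatChars(dirty);
        System.out.println(dirty.length() + " chars before, " + clean.length() + " after: " + clean);
    }
}
```

ZWNJ carries meaning in Persian and several Indic scripts, so removing Cf characters wholesale suits data like the Chinese text here but not text in those scripts.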
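The file A.xml contains two ZWNJ characters in its body text 中华人民共和国 ("People's Republic of China"): one right after 华 and one right after 民. Because ZWNJ is invisible, the file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<body>中华人民共和国</body>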
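Since the two ZWNJ characters cannot be seen, copying the listing above will not reproduce them. The following sketch regenerates A.xml with a ZWNJ after 华 (U+534E) and after 民 (U+6C11), writing the Chinese text as Unicode escapes so the source compiles under any default encoding. It assumes Java 7 or later (java.nio.file). The class name and the output location (the working directory) are hypothetical; for the Test2 program below, the file has to sit next to Test2.class in package com.xs.test so that getResource("A.xml") finds it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class: writes a sample A.xml containing two invisible ZWNJs.
class MakeSampleXml {

    static final char ZWNJ = '\u200C';

    // Body text: zhong hua, ZWNJ, ren min, ZWNJ, gong he guo
    static String buildXml() {
        String body = "\u4e2d\u534e" + ZWNJ + "\u4eba\u6c11" + ZWNJ + "\u5171\u548c\u56fd";
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<body>" + body + "</body>\n";
    }

    public static void main(String[] args) throws IOException {
        // Assumed output location: the current working directory
        Path out = Paths.get("A.xml");
        Files.write(out, buildXml().getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```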
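The following program prints the hexadecimal code of the character right after 华 and right after 民, then filters out every ZWNJ: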
package com.xs.test;

import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;

public class Test2 {
    public static void main(String[] args) throws IOException {
        // A.xml is loaded from the classpath, in the same package as Test2;
        // its body text is 中华人民共和国 with a ZWNJ after 华 and after 民
        String path = Test2.class.getResource("A.xml").getFile();
        String content = FileUtils.readFileToString(new File(path), "UTF-8");
        System.out.println(content);

        // The character right after 华 is the first ZWNJ
        char c1 = content.charAt(content.indexOf('华') + 1);
        int unicode1 = (int) c1;
        String hexUnicode1 = Integer.toHexString(unicode1);
        System.out.println(hexUnicode1);

        // The character right after 民 is the second ZWNJ
        char c2 = content.charAt(content.indexOf('民') + 1);
        int unicode2 = (int) c2;
        String hexUnicode2 = Integer.toHexString(unicode2);
        System.out.println(hexUnicode2);

        System.out.println("-------------------------after filtering");
        // Remove every ZWNJ (U+200C)
        System.out.println(content.replaceAll("\u200c", ""));
    }
}
Output:
<?xml version="1.0" encoding="UTF-8"?>
<body>中华人民共和国</body>
200c
200c
-------------------------after filtering
<?xml version="1.0" encoding="UTF-8"?>
<body>中华人民共和国</body>
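The same check-and-filter can be done without the Commons IO dependency. The sketch below, assuming Java 7 or later, reads the file with java.nio.file, reports the index of each ZWNJ, and strips them with String.replace. Unlike replaceAll, replace matches its argument literally instead of compiling it as a regular expression; ZWNJ has no special meaning in a regex, so the result is the same, but the intent is more explicit. The class name, method names and the default path "A.xml" are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: locates and removes ZWNJ characters.
class ZwnjFilter {

    static final String ZWNJ = "\u200C";

    // Returns the char index of every ZWNJ in the text
    static List<Integer> findZwnj(String text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = text.indexOf(ZWNJ); i >= 0; i = text.indexOf(ZWNJ, i + 1)) {
            positions.add(i);
        }
        return positions;
    }

    // Literal (non-regex) replacement of every ZWNJ
    static String stripZwnj(String text) {
        return text.replace(ZWNJ, "");
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: A.xml in the working directory; pass another path as args[0]
        String path = args.length > 0 ? args[0] : "A.xml";
        String content = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        for (int pos : findZwnj(content)) {
            System.out.println("ZWNJ at index " + pos);
        }
        System.out.println(stripZwnj(content));
    }
}
```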