Converting between a hexadecimal Unicode-encoded string and a Chinese string
The following URL came up in a library client project:
String baseurl = "http://innopac.lib.xjtu.edu.cn/availlim/search~S1*chx?/X{u848B}{u4ECB}{u77F3}&searchscope=1&SORT=DZ/X{u848B}{u4ECB}{u77F3}&searchscope=1&SORT=DZ&extended=0&SUBKEY=%E8%92%8B%E4%BB%8B%E7%9F%B3/51%2C607%2C607%2CB/browse";
If you send an HTTP GET request with this URL directly, an exception is thrown reporting invalid characters. That is, the URL cannot contain "{" or "}".
After looking into what the braces contain, I found that it is the hexadecimal Unicode encoding of Chinese characters: {u848B}{u4ECB}{u77F3} above stands for the Chinese characters "蒋介石" (Chiang Kai-shek).
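To double-check this reading, the sketch below builds a string from the three hexadecimal values and also percent-encodes the result as UTF-8, which should reproduce the SUBKEY value already present in the URL. The class name `CodePointCheck` and its method names are hypothetical, chosen only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class CodePointCheck {
    /** Builds a string from hexadecimal code point values such as "848B". */
    static String fromHexCodePoints(String... hexValues) {
        StringBuilder sb = new StringBuilder();
        for (String hex : hexValues) {
            sb.appendCodePoint(Integer.parseInt(hex, 16));
        }
        return sb.toString();
    }

    /** Percent-encodes the text as UTF-8, the form used by the SUBKEY parameter. */
    static String percentEncodeUtf8(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String name = fromHexCodePoints("848B", "4ECB", "77F3");
        System.out.println(name);
        System.out.println(percentEncodeUtf8(name));
    }
}
```

On a console that can display Chinese, the first line printed should be 蒋介石 and the second %E8%92%8B%E4%BB%8B%E7%9F%B3, which matches the SUBKEY parameter of the original URL.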
In this case, you need to convert the hexadecimal Unicode-encoded string into a Chinese string. The code is as follows:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Convert a Chinese string to a hexadecimal Unicode encoded string.
 *
 * @param s the Chinese string
 * @return the encoded string
 */
public static String stringToUnicode(String s) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
        int ch = s.charAt(i);
        if (ch > 255) {
            // Zero-pad to four hex digits so that unicodeToString can match the result.
            sb.append(String.format("\\u%04x", ch));
        } else {
            // Characters up to U+00FF are written as "\" plus their hex value;
            // unicodeToString does not convert this form back.
            sb.append("\\").append(Integer.toHexString(ch));
        }
    }
    return sb.toString();
}

/**
 * Convert a hexadecimal Unicode encoded string to a Chinese string,
 * e.g. \u848B\u4ECB\u77F3 becomes "蒋介石" (Chiang Kai-shek). Note the format.
 *
 * @param str e.g. \u848B\u4ECB\u77F3
 * @return the decoded string, e.g. "蒋介石"
 */
public static String unicodeToString(String str) {
    // Group 1 is the whole escape, group 2 is its four hex digits.
    Pattern pattern = Pattern.compile("(\\\\u(\\p{XDigit}{4}))");
    Matcher matcher = pattern.matcher(str);
    char ch;
    while (matcher.find()) {
        ch = (char) Integer.parseInt(matcher.group(2), 16);
        str = str.replace(matcher.group(1), ch + "");
    }
    return str;
}
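As a quick check of how the two directions fit together, here is a minimal round-trip sketch. The class name `UnicodeEscapes` and its method names are hypothetical, not part of the original project. It assumes that only characters above U+00FF should be escaped and that everything else is copied unchanged. Each value is zero-padded to four digits because `Integer.toHexString` does not pad on its own, so a value such as U+0100 would otherwise come out as three digits and a four-digit pattern could not match it. Decoding is done in a single pass with `Matcher.appendReplacement`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeEscapes {
    // A backslash, the letter u, then exactly four hexadecimal digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u(\\p{XDigit}{4})");

    /** Escapes every char above U+00FF; other chars are copied unchanged (an assumption). */
    static String encode(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch > 255) {
                sb.append(String.format("\\u%04X", (int) ch));
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    /** Replaces each four-digit escape with the character it names, in one pass. */
    static String decode(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char ch = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' in the decoded text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(ch)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("\\u848B\\u4ECB\\u77F3");
        System.out.println(decoded);
        System.out.println(encode(decoded));
    }
}
```

Under these assumptions, encoding the decoded text gives back the escaped form it started from.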
After that, processing the URL is easy. First, remove every "}" from the URL (replace it with ""), then replace every "{" with "\". Finally, convert \u848B\u4ECB\u77F3 into Chinese characters.
import android.util.Log;

/**
 * Replace "{" with "\" and remove "}" in the URL, then convert the Unicode
 * escapes to Chinese characters.
 *
 * @param baseUrl e.g.
 *     "http://innopac.lib.xjtu.edu.cn/availlim/search~S1*chx?/X{u848B}{u4ECB}{u77F3}&searchscope=1&SORT=DZ/X{u848B}{u4ECB}{u77F3}&searchscope=1&SORT=DZ&extended=0&SUBKEY=%E8%92%8B%E4%BB%8B%E7%9F%B3/51%2C607%2C607%2CB/browse"
 * @return the URL with the escapes replaced by Chinese characters
 */
public static String replaceUni2Chinese(String baseUrl) {
    // TAG is the log tag constant of the enclosing class.
    Log.d(TAG, "original URL -->" + baseUrl);
    if (baseUrl.contains("{")) {
        Log.d(TAG, "the original URL contains Chinese characters");
        // Remove the closing braces.
        String removeLast = baseUrl.replace("}", "");
        // Turn each opening brace into a backslash so that unicodeToString can match it.
        String replaceBefore = removeLast.replace("{", "\\");
        String result = unicodeToString(replaceBefore);
        Log.d(TAG, "After unicode is converted to a string: -->" + result);
        return result;
    } else {
        Log.d(TAG, "no Chinese characters in the original URL");
        return baseUrl;
    }
}
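The same result can also be reached without the intermediate backslash form. The following is a minimal sketch, not the code used in the project: it matches the `{uXXXX}` groups directly with one regular expression, so braces that do not follow this pattern and any backslashes already present in the URL are left untouched. The class name `BraceEscapes` and the method name `decodeBraces` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BraceEscapes {
    // An opening brace, the letter u, four hexadecimal digits, a closing brace.
    private static final Pattern BRACED = Pattern.compile("\\{u(\\p{XDigit}{4})\\}");

    /** Replaces each braced four-digit group with the character it names. */
    static String decodeBraces(String url) {
        Matcher m = BRACED.matcher(url);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char ch = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' in the decoded text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(ch)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // A fragment of the URL from the beginning of this post.
        System.out.println(decodeBraces("/X{u848B}{u4ECB}{u77F3}&searchscope=1&SORT=DZ"));
    }
}
```

Because only well-formed groups are rewritten, an input without any such group comes back unchanged, which covers the "no Chinese characters" branch without a separate check.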