Business Background:
The server is written in Java and the client in C#. The client sends requests to the server over HTTP, and the protocol requires the URL to be URL-encoded as GBK. However, we found that the server occasionally received garbled text from the GBK strings the client submitted. URL-encoding the same string as GBK on both sides gave results that differed in only a few places.

Example string: "Please refer to the following example: the real estate in the Eastern Region will be released."

C# encoding result:
%daS%82m%e6%82%c5R%e2O%b8%f3%bct%95N%cc%96%9ee%98I%85%5e%cc%96%e4k%b0l%8fS%84%a2%96%7c%85%5e%9e%b3%98I%85%5e%8c%8d%98I%ccm%84%93%fcS%e8A%aef%90a

Java encoding result:
%DA%53%82%6D%E6%82%C5%52%E2%4F%B8%F3%BC%74%95%4E%CC%96%9E%65%98%49%5E%CC%96%E4%6B%B0%6C%8F%53%84%A2%96%7C%85%5E%9E%B3%98%49%85%5E%8C%8D%98%49%CC%6D%84%93%FC%53%E8%41%AE%66%61

Comparing the two, I could not see what the problem was for a long time. Finally I checked the GBK code ranges (http://blog.csdn.net/gaoqingyu/article/details/5709958) and found that the Java output is correct and the C# output is wrong. Every Chinese character in GBK takes two bytes, so it should encode to four hex digits (each pair prefixed with %). For some characters, however, C# produced only three characters, as in %daS. Staring at the bytes C# produced and comparing them with the GBK table, we found that the last of these characters is sometimes a letter beyond F, such as N, M, S, W or Z, which cannot be a hex digit. We therefore assumed that the last of the three characters had to be converted to hexadecimal, and after converting it the result was exactly the same as Java's. The underlying reason is that the second (trail) byte of a GBK character can fall in the ASCII range 0x40-0x7E, and HttpUtility.UrlEncode writes such a byte as the literal character when it is a letter instead of escaping it, so 0x53 appears as S rather than %53.
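The sketch below reproduces the symptom on the Java side, under one assumption: that the server decodes parameters with java.net.URLDecoder or something that behaves like it. URLDecoder collects a run of consecutive %XX escapes, decodes that run as one byte sequence, and then copies literal characters one by one, so in %daS the lead byte 0xDA is decoded on its own and becomes a replacement character. The class and method names are illustrative only, and the Charset overloads need Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;

final class GbkTrailByteDemo {
    static final Charset GBK = Charset.forName("GBK");

    // The character whose GBK code is 0xDA 0x53 (first character of the sample).
    static String sampleChar() {
        return new String(new byte[] {(byte) 0xDA, 0x53}, GBK);
    }

    // How java.net.URLEncoder encodes it: both bytes escaped, uppercase hex.
    static String javaEncode(String s) {
        return URLEncoder.encode(s, GBK);
    }

    // Assumed server-side decoding (hypothetical: the post does not say
    // which decoder the server uses).
    static String serverDecode(String encoded) {
        return URLDecoder.decode(encoded, GBK);
    }

    public static void main(String[] args) {
        String ch = sampleChar();
        String javaForm = javaEncode(ch);   // "%DA%53"
        String csharpForm = "%daS";         // trail byte 0x53 written as 'S'

        // "%DA%53" is decoded as one two-byte sequence and round-trips.
        System.out.println(serverDecode(javaForm).equals(ch));   // true
        // In "%daS" the escape run stops at 'S', so 0xDA is decoded alone
        // (U+FFFD) and 'S' is appended as a separate character.
        System.out.println(serverDecode(csharpForm).equals(ch)); // false
    }
}
```

Both strings describe the same two bytes; the difference only matters because this kind of decoder converts each %XX run to text separately.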
The following is the C# code:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;
using System.Web; // HttpUtility (.NET Framework, reference System.Web.dll)

public static class GbkUrlEncoder
{
    public static void ConvertURLGBKEnCode()
    {
        string str2 = "please refer to the relevant industry regions of the eastern region and their actual business regions";
        Encoding gbk = Encoding.GetEncoding("GBK");
        Regex reg = new Regex(@"[\u4e00-\u9fa5]"); // matches a basic CJK Unified Ideograph
        StringBuilder sb1 = new StringBuilder();

        for (int i = 0; i < str2.Length; i++)
        {
            string tempStr = str2[i].ToString();
            string urlEnCodeStr = HttpUtility.UrlEncode(tempStr, gbk);

            // If it is a Chinese character, check its URL-encoded form
            if (reg.IsMatch(tempStr))
            {
                // A two-byte GBK character normally encodes to six chars ("%xx%xx").
                // Exactly four chars ("%xxS") means the second byte was written as a
                // literal ASCII character, so convert that character to hex and append it.
                // (Testing == 4 rather than <= 4 keeps Substring(3, 1) from throwing
                // on three-char results such as "%5e".)
                if (urlEnCodeStr.Length == 4)
                {
                    string firstStr = urlEnCodeStr.Substring(0, 3);
                    string lastStr = urlEnCodeStr.Substring(3, 1);
                    StringBuilder sb = new StringBuilder(firstStr + "%");
                    byte[] targetData = gbk.GetBytes(lastStr);
                    foreach (byte b in targetData)
                    {
                        sb.Append(b.ToString("x2"));
                    }
                    sb1.Append(sb.ToString());
                }
                else
                {
                    sb1.Append(urlEnCodeStr);
                }
            }
            else
            {
                sb1.Append(urlEnCodeStr);
            }
        }

        Console.WriteLine(sb1.ToString());
        Console.ReadLine();
    }
}
```
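Instead of patching character by character, the C# output can also be repaired as a whole string. The sketch below is written in Java to match the server side and ports directly to C#; the class and method names are hypothetical. It assumes the input was percent-encoded byte by byte from GBK with URL-safe bytes left literal (as HttpUtility.UrlEncode does) and that GBK lead bytes lie in 0x81-0xFE. It remembers whether the previous escape was a lead byte and, if so, escapes the next byte even when it was written as a literal letter; it also upper-cases the hex digits to match URLEncoder. It does not change how single-byte characters such as "(" are handled.

```java
// Hypothetical helper: turns HttpUtility.UrlEncode-style GBK output into the
// form java.net.URLEncoder produces for the same characters.
final class GbkUrlNormalizer {

    // Assumptions: input was percent-encoded byte by byte from GBK, with
    // URL-safe bytes left literal; GBK lead bytes lie in 0x81-0xFE.
    static String normalize(String encoded) {
        StringBuilder out = new StringBuilder();
        boolean expectTrail = false; // true right after a GBK lead byte
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            int b;
            if (c == '%' && i + 2 < encoded.length()) {
                // An escaped byte such as "%da": parse the two hex digits.
                b = Integer.parseInt(encoded.substring(i + 1, i + 3), 16);
                i += 3;
            } else if (expectTrail) {
                // A literal character right after a lead byte is the trail
                // byte itself, e.g. 'S' in "%daS" is 0x53.
                b = c;
                i++;
            } else {
                // An ordinary single-byte character: keep it unchanged.
                out.append(c);
                i++;
                continue;
            }
            out.append('%').append(String.format("%02X", b));
            // A lead byte starts a two-byte character; anything else ends one.
            expectTrail = !expectTrail && b >= 0x81;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prefix of the C# sample output from the comparison above.
        System.out.println(normalize("%daS%82m%e6%82%c5R%e2O"));
        // prints %DA%53%82%6D%E6%82%C5%52%E2%4F
    }
}
```

The printed prefix matches the beginning of the Java result in the comparison above. Tracking lead/trail state avoids the regex, so double-byte symbols outside \u4e00-\u9fa5 are covered as well.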
This solved the problem for rare and traditional Chinese characters, but frustratingly, some special characters still failed as soon as I ran them through it.
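In the C# code above, changing if (reg.IsMatch(tempStr)) to if (urlEnCodeStr.Length > 1) lets the special (non-Chinese) GBK characters convert correctly as well. Note that the inner check must then test for exactly four characters (Length == 4): with <= 4, a three-character result such as %5e for "^" would make Substring(3, 1) throw. However, whether other characters convert the same way on both sides is still uncertain.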
For example, the hyphen "-":
C# result: "-"; Java result: "-".
The character "(":
C# result: "(", but Java produces "%28". I checked this left parenthesis: it is a plain single-byte ASCII character, not a double-byte GBK character, so it should not need any special handling and should be output directly, like letters, digits and "-". Yet Java converts it to "%28", and I don't know why.
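The difference most likely comes from the two libraries' sets of characters that are left unescaped. java.net.URLEncoder keeps only letters, digits and the four characters . - * _ as they are (a space becomes +) and escapes everything else, so "(" becomes %28 even though it is plain ASCII. .NET's HttpUtility.UrlEncode, going by the .NET Framework reference source, additionally leaves ! ( ) unescaped. The sketch below checks the Java side; dotNetLeavesLiteral is a hypothetical helper that only mirrors that reading of the .NET rule for comparison.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;

final class SafeCharDemo {
    static final Charset GBK = Charset.forName("GBK");

    // java.net.URLEncoder keeps a-z, A-Z, 0-9 and ". - * _", turns ' ' into '+'
    // and percent-encodes everything else (Charset overload: Java 10+).
    static String javaEncode(String s) {
        return URLEncoder.encode(s, GBK);
    }

    // Hypothetical mirror of the set HttpUtility.UrlEncode leaves literal,
    // per the .NET Framework reference source: letters, digits and "-_.!*()".
    static boolean dotNetLeavesLiteral(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || "-_.!*()".indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        for (char c : "-_.*!()".toCharArray()) {
            String s = String.valueOf(c);
            System.out.println(s + "  Java: " + javaEncode(s)
                    + "  .NET literal: " + dotNetLeavesLiteral(c));
        }
    }
}
```

So "-" matches on both sides, while "!", "(" and ")" are escaped only by Java. Either form decodes to the same character, but to make the strings identical the C# side would have to escape those three as well.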
How these characters are converted still needs further study.