Resolving differences between C# and Java results when URL-encoding GBK text (uncommon and Traditional Chinese characters)

Source: Internet
Author: User

Business Background:

The server is written in Java and the client in C#. The client calls the server over HTTP, and the request requires the URL to be encoded as GBK. However, the server occasionally received garbled characters in the GBK text submitted by the client. Encoding the same string to GBK on both sides showed that the two results differ only slightly.

Example string: Please refer to the following example: the real estate in the Eastern Region will be released.

C# transcoding result:

%daS%82m%e6%82%c5R%e2O%b8%f3%bct%95N%cc%96%9ee%98I%85%5e%cc%96%e4k%b0l%8fS%84%a2%96%7c%85%5e%9e%b3%98I%85%5e%8c%8d%98I%ccm%84%93%fcS%e8A%aef%90a

Java transcoding result:

%DA%53%82%6D%E6%82%C5%52%E2%4F%B8%F3%BC%74%95%4E%CC%96%9E%65%98%49%85%5E%CC%96%E4%6B%B0%6C%8F%53%84%A2%96%7C%85%5E%9E%B3%98%49%85%5E%8C%8D%98%49%CC%6D%84%93%FC%53%E8%41%AE%66%90%61

I compared the two results for a long time without seeing what the problem was. Finally I checked the GBK character range (http://blog.csdn.net/gaoqingyu/article/details/5709958) and found that the Java output is correct and the C# output is wrong. Every Chinese character encoded as GBK should come out as four hexadecimal digits (two bytes), but for some characters C# produces only three characters (not counting the "%" that prefixes each pair of hex digits). Staring at the C# output and comparing it with the GBK table, I noticed that the last letter of some of these groups is greater than F. Hexadecimal digits stop at F, so letters such as N, M, S, W, and Z cannot be hex digits. I therefore assumed that the last of the three characters is a literal character that still has to be converted to hexadecimal. For example, C# writes the first character as %daS where Java writes %DA%53, and 53 is the hexadecimal ASCII code of "S". I tried converting it, and apart from letter case the result was exactly the same as the Java output.
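The difference described above can be reproduced on the Java side alone. The following is a minimal sketch, not the code used in the original investigation: encodeJavaStyle calls java.net.URLEncoder, while encodeLeavingSafeBytes is a hypothetical helper that imitates the C#-style output by leaving URL-safe ASCII bytes unescaped (the safe set it uses is an assumption). The class and method names are illustrative. It also shows what java.net.URLDecoder does with each form; the original does not say how the server actually decodes.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;

final class GbkUrlEncodingComparison {
    // Assumption: the runtime provides the GBK charset (full JDK builds do).
    static final Charset GBK = Charset.forName("GBK");

    // Builds a one-character string from a GBK lead/trail byte pair.
    static String gbkChar(int lead, int trail) {
        return new String(new byte[] {(byte) lead, (byte) trail}, GBK);
    }

    // Java style: URLEncoder escapes every byte of a non-ASCII character.
    // The Charset overloads used in this class need Java 10 or later.
    static String encodeJavaStyle(String text) {
        return URLEncoder.encode(text, GBK);
    }

    // Hypothetical emulation of the C#-style output: escape byte by byte and
    // leave bytes that look like URL-safe ASCII untouched. The safe set below
    // is an assumption made for this sketch.
    static String encodeLeavingSafeBytes(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(GBK)) {
            int v = b & 0xFF;
            boolean safe = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9') || "-_.!*()".indexOf(v) >= 0;
            if (safe) {
                out.append((char) v);
            } else if (v == ' ') {
                out.append('+');
            } else {
                out.append(String.format("%%%02x", v));
            }
        }
        return out.toString();
    }

    static String decode(String encoded) {
        return URLDecoder.decode(encoded, GBK);
    }

    public static void main(String[] args) {
        String ch = gbkChar(0xDA, 0x53); // first character of the example string
        String javaStyle = encodeJavaStyle(ch);
        String csharpStyle = encodeLeavingSafeBytes(ch);
        System.out.println(javaStyle);                      // %DA%53
        System.out.println(csharpStyle);                    // %daS
        System.out.println(decode(javaStyle).equals(ch));   // true
        System.out.println(decode(csharpStyle).equals(ch)); // false
    }
}
```

URLDecoder only groups consecutive %XX escapes into one byte sequence, so in %daS the lead byte 0xDA is decoded on its own and becomes a replacement character. A server that decodes this way would show this kind of occasional garbling, but that is a guess about the setup described here.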
The following is the C# code:

    using System;
    using System.Text;
    using System.Text.RegularExpressions;
    using System.Web; // HttpUtility

    public static void ConvertURLGBKEnCode()
    {
        string str2 = "please refer to the relevant industry regions of the eastern region and their actual business regions";

        // On .NET Core / .NET 5+, GBK is only available after calling
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
        Encoding gbk = Encoding.GetEncoding("GBK");
        Regex reg = new Regex(@"[\u4e00-\u9fa5]"); // matches Chinese characters
        StringBuilder sb1 = new StringBuilder();

        for (int i = 0; i < str2.Length; i++)
        {
            string tempStr = str2[i].ToString();
            string urlEnCodeStr = HttpUtility.UrlEncode(tempStr, gbk);

            // If it is a Chinese character, URL-encode it and fix up the hex form.
            if (reg.IsMatch(tempStr))
            {
                // Exactly four characters ("%xx" plus one literal character) means
                // the last character must still be converted to hexadecimal.
                if (urlEnCodeStr.Length == 4)
                {
                    string firstStr = urlEnCodeStr.Substring(0, 3); // "%xx"
                    string lastStr = urlEnCodeStr.Substring(3, 1);  // literal trail character

                    sb1.Append(firstStr);
                    foreach (byte b in gbk.GetBytes(lastStr))
                    {
                        sb1.Append('%').Append(b.ToString("x2"));
                    }
                }
                else
                {
                    sb1.Append(urlEnCodeStr);
                }
            }
            else
            {
                sb1.Append(urlEnCodeStr);
            }
        }

        Console.WriteLine(sb1.ToString());
        Console.ReadLine();
    }
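The repair rule in the C# code (keep the first three characters, then turn the trailing literal character into a hex pair) can be looked at in isolation. Below is a small Java sketch of just that step, assuming the input is the encoded form of a single character; repairToken and the class name are hypothetical names chosen for illustration, and the lowercase hex mirrors the "x2" format used in the C# code.

```java
import java.nio.charset.Charset;

final class GbkTokenRepair {
    // Assumption: the runtime provides the GBK charset.
    static final Charset GBK = Charset.forName("GBK");

    // encoded: the URL-encoded form of ONE character, e.g. "%daS" or "%b8%f3".
    static String repairToken(String encoded) {
        // Only the four-character shape "%xx" + one literal character needs repair.
        if (encoded.length() != 4 || encoded.charAt(0) != '%') {
            return encoded;
        }
        String head = encoded.substring(0, 3); // "%xx", the lead byte
        String tail = encoded.substring(3);    // literal trail character
        StringBuilder out = new StringBuilder(head);
        for (byte b : tail.getBytes(GBK)) {
            out.append(String.format("%%%02x", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(repairToken("%daS"));   // %da%53
        System.out.println(repairToken("%b8%f3")); // %b8%f3 (already complete)
    }
}
```

Compared case-insensitively, the repaired %da%53 is the same as the %DA%53 that Java produces for that character.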

 

The problem with uncommon Chinese characters and Traditional Chinese characters has been solved, but frustratingly, a test run with some special characters turned up new problems.

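In the C# code above, change if (reg.IsMatch(tempStr)) to if (urlEnCodeStr.Length > 1), and the special characters in GBK are converted smoothly as well. This is only safe if the inner length check tests for exactly four characters, because a three-character result such as "%2b" would make Substring(3, 1) throw. However, how some GBK characters should be converted is still uncertain.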

For example, the symbol "-"

C# conversion result: "-"; Java: "-"

Character: "("

C# conversion result: "(", but the Java result is "%28". I checked this left parenthesis: it is not a GBK-specific character, so it should not count as a special character. I expected it to be passed through unchanged, like letters, digits, and "-". But for a reason I don't know, Java converts it to "%28".

The conversion of GBK characters is still to be studied.
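One possible explanation for the left parenthesis case, offered as an assumption rather than something the investigation above confirmed: the documentation of java.net.URLEncoder lists only letters, digits, ".", "-", "*", and "_" as characters that stay unchanged, and turns a space into "+". Everything else, including "(", is escaped regardless of the charset. The sketch below checks this behavior; the class and method names are illustrative.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;

final class UrlEncoderSafeSet {
    // Assumption: the runtime provides the GBK charset.
    static final Charset GBK = Charset.forName("GBK");

    // Thin wrapper around URLEncoder (Charset overload, Java 10 or later).
    static String encode(String text) {
        return URLEncoder.encode(text, GBK);
    }

    public static void main(String[] args) {
        System.out.println(encode("("));       // %28
        System.out.println(encode("-"));       // -
        System.out.println(encode("a.b*c_d")); // a.b*c_d
        System.out.println(encode("a b"));     // a+b
    }
}
```

If this is the cause, the "(" difference comes from the two libraries using different sets of unescaped characters and has nothing to do with GBK itself.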

 
