How to Handle Garbled Chinese Characters During Character Conversion

Source: Internet
Author: User

I have been using JNI for the past two days to call a DLL for a massive dictionary from a Java program. While using JNI's GetStringChars and NewString functions, I ran into garbled Chinese characters and spent a whole night on the problem. My notes from the documentation are summarized below:

I. Related Concepts
Java represents strings in 16-bit Unicode (UTF-16), so every character in the Basic Multilingual Plane, Chinese or English, takes 2 bytes. JNI uses UTF-8 (strictly speaking, a modified UTF-8) for its char * string functions; UTF-8 is a variable-length encoding of Unicode in which ASCII characters generally take 1 byte and Chinese characters 3 bytes. C/C++ works with raw bytes: ASCII is 1 byte per character, and Chinese text is typically GB2312-encoded, with 2 bytes per Chinese character.
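To make these byte counts concrete, here is a minimal C++ sketch that encodes a single code point as standard UTF-8. The name utf8Encode is hypothetical (it is not a JNI function), and the sketch assumes the code point lies in the Basic Multilingual Plane (U+0000 to U+FFFF) and is not a surrogate, which covers common Chinese characters. JNI's modified UTF-8 differs from this only for U+0000 and for supplementary characters.

```cpp
#include <string>

// Hypothetical helper (not part of JNI): encode one code point from the
// Basic Multilingual Plane (U+0000..U+FFFF) as standard UTF-8.
// Assumption: cp is not a surrogate and does not exceed 0xFFFF.
std::string utf8Encode(unsigned int cp)
{
    std::string out;
    if (cp < 0x80) {                      // ASCII: 1 byte
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {              // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                              // 3 bytes: most Chinese characters
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, utf8Encode(0x41) gives the single byte 0x41 ("A"), while utf8Encode(0x4E2D) (the character "中") gives the three bytes E4 B8 AD. The same character is one 16-bit unit (0x4E2D) in UTF-16 and the two bytes D6 D0 in GB2312.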

Once the concepts are clear, the procedure follows naturally. The two directions of data flow are described separately below.

1. Java --> C/C++

In this direction, Java passes a UTF-16 string when it makes the call, the JVM hands that string to JNI, and the C/C++ code receives it as a jstring. JNI provides two functions for reading it: GetStringUTFChars, which yields a UTF-8 string, and GetStringChars, which yields a UTF-16 string. Whichever function is used, a string that contains Chinese characters must be further converted to GB2312. The flow is as follows:
            String
           (UTF-16)
               |
[Java]         |
-------------------- JNI call
[CPP]          |
               v
            jstring
           (UTF-16)
               |
      +--------+---------+
      | GetStringChars   | GetStringUTFChars
      |                  |
      v                  v
  wchar_t *           char *
  (UTF-16)           (UTF-8)

2. C/C++ --> Java

For a string that JNI returns to Java, the C/C++ code is responsible for first converting it to UTF-8 or UTF-16, then wrapping it in a jstring with NewStringUTF or NewString and returning it to Java.

            String
           (UTF-16)
               ^
               |
[Java]         |
-------------------- JNI return
[CPP]          |
            jstring
           (UTF-16)
               ^
               |
      +--------+---------+
      ^                  ^
      |                  |
      | NewString        | NewStringUTF
  wchar_t *           char *
  (UTF-16)           (UTF-8)

If the string contains no Chinese characters, only standard ASCII, you can use GetStringUTFChars/NewStringUTF directly: in that case the UTF-8 and ASCII encodings are identical, so no conversion is required.
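A quick way to decide whether this shortcut applies is to check that every byte is below 0x80. The following is a minimal sketch; isPureAscii is a hypothetical helper name, not a JNI or Win32 function.

```cpp
#include <cstddef>

// Hypothetical helper: returns true when every byte of a NUL-terminated
// string is 7-bit ASCII. Such a string has the same bytes in ASCII,
// UTF-8 and GB2312, so it needs no encoding conversion.
bool isPureAscii(const char *s)
{
    for (std::size_t i = 0; s[i] != '\0'; ++i) {
        // Cast to unsigned char: plain char may be signed, in which case
        // bytes >= 0x80 would otherwise compare as negative values.
        if (static_cast<unsigned char>(s[i]) >= 0x80)
            return false;
    }
    return true;
}
```

With this check, isPureAscii("hello") is true, while a GB2312 string such as "\xD6\xD0" ("中") is rejected and has to go through the conversion described next.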

However, if a string contains Chinese characters, the encoding must be converted on the C/C++ side. Two conversion functions are needed: one from UTF-8/UTF-16 to GB2312, and one from GB2312 to UTF-8/UTF-16.

Note that both Linux and Win32 support the wide-character type wchar_t. On Win32 it is 16 bits wide and holds UTF-16, so a C/C++ program there that used wchar_t throughout would, in theory, not need this conversion at all. On Linux, however, wchar_t is typically 32 bits wide, so it does not map directly onto Java's 16-bit characters. In practice wchar_t cannot completely replace char anyway, so most applications still need the conversion.
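Because the width of wchar_t depends on the platform, it is worth checking before a jchar buffer is cast to wchar_t. The sketch below assumes a C++11 compiler; wcharIsUtf16Sized and utf16Length are hypothetical helper names. It uses char16_t, which has the same width as JNI's 16-bit jchar on mainstream platforms.

```cpp
#include <cstddef>

// Hypothetical helper: true when wchar_t has the same size as a UTF-16
// code unit, so a jchar buffer may be reinterpreted as wchar_t.
constexpr bool wcharIsUtf16Sized()
{
    return sizeof(wchar_t) == sizeof(char16_t);
}

// Hypothetical helper: count the UTF-16 code units in a NUL-terminated
// char16_t string. This counts code units, as JNI's GetStringLength does,
// not user-visible characters.
std::size_t utf16Length(const char16_t *s)
{
    std::size_t n = 0;
    while (s[n] != 0)
        ++n;
    return n;
}
```

On Windows wcharIsUtf16Sized() is true; with glibc on Linux it is false, because wchar_t occupies 4 bytes there. utf16Length(u"\u4E2D\u6587") is 2, one code unit per Chinese character.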

II. A Conversion Method

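Use the wide-character type as the intermediate form. The functions below rely on the Win32 APIs WideCharToMultiByte and MultiByteToWideChar with CP_ACP, the system's ANSI code page (GB2312/GBK on Simplified Chinese Windows).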

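char *jstringToWindows(JNIEnv *env, jstring jstr)
{   // UTF-16 (jstring) to the ANSI code page (GB2312/GBK)
    int length = env->GetStringLength(jstr);
    const jchar *jcstr = env->GetStringChars(jstr, 0);
    if (jcstr == NULL)
        return NULL;                        // JVM could not provide the characters
    // At most 2 bytes per UTF-16 unit in GB2312/GBK, plus the terminating NUL.
    char *rtn = (char *) malloc(length * 2 + 1);
    int size = 0;
    if (rtn != NULL && length > 0)
        size = WideCharToMultiByte(CP_ACP, 0, (LPCWSTR) jcstr, length, rtn, length * 2 + 1, NULL, NULL);
    env->ReleaseStringChars(jstr, jcstr);   // always release, even on failure
    if (rtn == NULL || (length > 0 && size <= 0))
    {
        free(rtn);                          // avoid leaking the buffer on failure
        return NULL;
    }
    rtn[size] = 0;
    return rtn;                             // the caller must free() the result
}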

jstring windowsToJstring(JNIEnv *env, const char *str)
{   // ANSI code page (GB2312/GBK) to UTF-16 (jstring)
    jstring rtn = 0;
    int slen = (int) strlen(str);
    unsigned short *buffer = 0;
    if (slen == 0)                          // comparison, not assignment
        rtn = env->NewStringUTF(str);       // empty string: nothing to convert
    else
    {
        // The first call only returns the number of UTF-16 units required.
        int length = MultiByteToWideChar(CP_ACP, 0, (LPCSTR) str, slen, NULL, 0);
        buffer = (unsigned short *) malloc(length * 2 + 1);
        if (buffer && MultiByteToWideChar(CP_ACP, 0, (LPCSTR) str, slen, (LPWSTR) buffer, length) > 0)
            rtn = env->NewString((jchar *) buffer, length);
    }
    if (buffer)
        free(buffer);
    return rtn;
}

Supplement (header files to be included):

/* Includes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* strlen */
#include <malloc.h>
#include <memory.h>
#include <windows.h>
#include <jni.h>      /* JNIEnv, jstring, jchar */

Http://hi.baidu.com/menghaisheng/blog/item/33e19cfc5a237354d6887d07.html
