Transferred from: HTTP://BLOG.SINA.COM.CN/FANGAOSJTU
Over the past two days I have been learning JNI in order to call, from a Java program, a DLL that wraps a large dictionary. While using the JNI functions GetStringChars and NewString I ran into garbled Chinese text and struggled with it for a whole night. I consulted some references and summarize what I found below:
I. Related Concepts
- Java internally uses 16-bit Unicode encoding (UTF-16) to represent strings; every character in the Basic Multilingual Plane, Chinese or English, takes 2 bytes;
- JNI internally uses UTF-8 (strictly, a "modified UTF-8" variant) to represent strings. UTF-8 is a variable-length Unicode encoding: an ASCII character takes 1 byte and a Chinese character takes 3 bytes;
- C++ works with raw bytes: an ASCII character is one byte, and Chinese text is generally GB2312 encoded, with two bytes per Chinese character.
With these concepts clear, the operations are much easier to follow. Below, each direction of string flow is explained separately.
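The byte counts listed above can be checked with a small sketch. The helper names utf8Bytes and utf16Bytes are hypothetical and not part of JNI; the sketch follows standard UTF-8, whereas JNI's modified UTF-8 differs for U+0000 and for characters outside the Basic Multilingual Plane.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes a Unicode code point occupies in standard UTF-8.
std::size_t utf8Bytes(char32_t cp) {
    assert(cp <= 0x10FFFF);       // must be inside the Unicode range
    if (cp < 0x80) return 1;      // ASCII
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;   // BMP, which includes common Chinese characters
    return 4;
}

// Hypothetical helper: bytes a Unicode code point occupies in UTF-16.
std::size_t utf16Bytes(char32_t cp) {
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 2 : 4;  // characters beyond the BMP need a surrogate pair
}
```

For the character U+4E2D (the "zhong" in "Zhongwen"), utf8Bytes gives 3 and utf16Bytes gives 2; in GB2312 the same character is the two bytes 0xD6 0xD0.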
1. Java -> C++
In this direction, the Java side calls the native method with a UTF-16 encoded string and the JVM passes it through JNI, so the C/C++ code receives a jstring. Two kinds of JNI functions are then available: GetStringUTFChars returns a UTF-8 encoded string, and GetStringChars returns a UTF-16 encoded string. Whichever function is used, if the resulting string contains Chinese characters it must be further converted to GB2312 before it reaches code that expects GB2312. As follows:
          String
         (UTF-16)
             |
[Java]       |
-------------------- JNI call
[C++]        |
             V
          jstring
         (UTF-16)
             |
    +--------+---------+
    | GetStringChars   | GetStringUTFChars
    |                  |
    V                  V
  jchar*             char*
 (UTF-16)           (UTF-8)
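To make the relationship between the two branches concrete, here is a minimal sketch in portable C++ that encodes UTF-16 code units (the form GetStringChars delivers) as standard UTF-8. The function name utf16ToUtf8 is hypothetical, and the output matches what GetStringUTFChars returns only for non-NUL characters in the Basic Multilingual Plane, because JNI uses modified UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode UTF-16 code units as standard UTF-8.
// Unpaired surrogates are not rejected; they are encoded as 3-byte sequences.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // Combine a high/low surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {                       // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes: e.g. Chinese characters
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes: beyond the BMP
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Neither form is GB2312, which is why a further conversion step is still needed before handing the text to GB2312-based code.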
2. C++ -> Java
When native code returns a string to Java, it is responsible for first converting the string to UTF-8 or UTF-16, then wrapping it as a jstring with NewStringUTF or NewString and returning it to Java.
          String
         (UTF-16)
             ^
             |
[Java]       |
-------------------- JNI return
[C++]        |
          jstring
         (UTF-16)
             ^
             |
    +--------+---------+
    ^                  ^
    |                  |
    | NewString        | NewStringUTF
  jchar*             char*
 (UTF-16)           (UTF-8)
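For the return direction, native code that holds UTF-8 text and wants to use the NewString branch needs UTF-16 code units first. The following is a minimal sketch of that decoding step in portable C++; the function name utf8ToUtf16 is hypothetical, and the sketch assumes well-formed standard UTF-8 input and performs no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode well-formed standard UTF-8 into UTF-16 code units.
// Assumption: the input is valid; malformed bytes are not detected.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte determines the sequence length and its payload bits.
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        if (i + len > in.size()) break;  // truncated trailing sequence: stop
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {                         // split into a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

The resulting buffer and its length in code units are the two values NewString expects, after casting the data pointer to const jchar*.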
If the string contains no Chinese characters, only standard ASCII, then GetStringUTFChars/NewStringUTF are enough on their own, because in that case the UTF-8 encoding is identical to ASCII and no conversion is needed.
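A simple way to decide whether the shortcut applies is to scan the bytes for anything above 0x7F. This is only a sketch, and the function name isPureAscii is hypothetical:

```cpp
// Hypothetical helper: true when every byte of the NUL-terminated string is
// 7-bit ASCII, in which case UTF-8 and GB2312 represent the text identically.
bool isPureAscii(const char* s) {
    for (; *s != '\0'; ++s) {
        if (static_cast<unsigned char>(*s) > 0x7F)
            return false;  // lead or trail byte of a multi-byte character
    }
    return true;
}
```

When it returns true, the buffer can be passed to NewStringUTF as it is; otherwise an encoding conversion is required.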
However, if the string contains Chinese characters, an encoding conversion on the C/C++ side is required. We need two conversion functions: one to turn UTF-8/UTF-16 into GB2312, and the other to turn GB2312 into UTF-8/UTF-16.
A note here: both Linux and Win32 support wchar_t. On Win32 it is 16 bits wide and holds UTF-16, so if a C/C++ program used wchar_t throughout, this conversion would in theory not be needed there (on Linux, wchar_t is typically 32 bits wide, so it does not map directly onto jchar). In practice, however, char cannot be replaced entirely with wchar_t, so the conversion is still necessary for most applications today.
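Because the width of wchar_t is platform-dependent (16 bits on Win32, commonly 32 bits with GCC on Linux), code that moves UTF-16 data into wide strings has to account for both cases. The following is a rough sketch with a hypothetical function name, utf16ToWide, not something provided by JNI:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copy UTF-16 code units into a std::wstring.
// With a 16-bit wchar_t the units are copied unchanged; with a wider
// wchar_t each surrogate pair is merged into one code point.
std::wstring utf16ToWide(const std::u16string& in) {
    std::wstring out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (sizeof(wchar_t) > 2 && cp >= 0xD800 && cp <= 0xDBFF &&
            i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed
        }
        out += static_cast<wchar_t>(cp);
    }
    return out;
}
```

On Win32 this reduces to a plain copy, which is why a jchar buffer can be cast to a wide-character pointer there.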
II. A Conversion Method
Convert by way of the wide character type, using the Win32 functions WideCharToMultiByte and MultiByteToWideChar.
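#include <windows.h>
#include <jni.h>
#include <stdlib.h>
#include <string.h>

char* jstringToWindows(JNIEnv* env, jstring jstr)
{ // UTF-16 converted to GB2312 (strictly, the system ANSI code page CP_ACP)
    int length = env->GetStringLength(jstr);
    const jchar* jcstr = env->GetStringChars(jstr, 0);
    if (jcstr == NULL)
        return NULL; // the JVM could not provide the characters
    char* rtn = (char*) malloc(length * 2 + 1);
    int size = 0;
    if (rtn != NULL) // capacity length*2 keeps one byte free for the terminator
        size = WideCharToMultiByte(CP_ACP, 0, (LPCWSTR) jcstr, length, rtn, length * 2, NULL, NULL);
    // Release before any return so the characters are not leaked.
    env->ReleaseStringChars(jstr, jcstr);
    if (size <= 0)
    {
        free(rtn); // free(NULL) is a no-op
        return NULL;
    }
    rtn[size] = 0;
    return rtn; // the caller must free() the result
}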
jstring windowsToJstring(JNIEnv* env, const char* str)
{ // GB2312 (strictly, the system ANSI code page CP_ACP) converted to UTF-16
    jstring rtn = 0;
    int slen = (int) strlen(str);
    unsigned short* buffer = 0;
    if (slen == 0)
        rtn = env->NewStringUTF(str);
    else
    {
        // The first call only measures the required number of UTF-16 code units.
        int length = MultiByteToWideChar(CP_ACP, 0, (LPCSTR) str, slen, NULL, 0);
        buffer = (unsigned short*) malloc(length * 2 + 1);
        if (buffer != NULL && MultiByteToWideChar(CP_ACP, 0, (LPCSTR) str, slen, (LPWSTR) buffer, length) > 0)
            rtn = env->NewString((const jchar*) buffer, length);
    }
    if (buffer)
        free(buffer);
    return rtn;
}
Handling Chinese character encoding conversion in JNI