I have been working with JNI for the past two days, calling a DLL for a massive dictionary from a Java program. While using JNI's GetStringChars and NewString functions I ran into garbled Chinese text, and it cost me a whole night. My notes are summarized below:
I. Related Concepts
Java represents strings in 16-bit Unicode (UTF-16), so a Chinese character and an English letter each take 2 bytes (characters outside the Basic Multilingual Plane take two 16-bit units). JNI's UTF functions use a modified UTF-8, a variable-length Unicode encoding in which an ASCII character takes 1 byte and a Chinese character generally takes 3 bytes. C/C++ works on raw bytes: an ASCII character is 1 byte, and on a Chinese-locale system a Chinese character is generally GB2312-encoded in 2 bytes.
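To make the byte counts above concrete, here is a small sketch that writes one Chinese character (U+4E2D, "中") and one ASCII letter by hand in each of the three encodings. The array names and the byte_size helper are made up for this illustration; only the byte values come from the encoding standards.

```cpp
#include <cstddef>

// U+4E2D ("中") spelled out by hand in three encodings.
static const char16_t      kZhongUtf16[]  = { 0x4E2D };            // one 16-bit code unit
static const unsigned char kZhongUtf8[]   = { 0xE4, 0xB8, 0xAD };  // three bytes
static const unsigned char kZhongGb2312[] = { 0xD6, 0xD0 };        // two bytes

// The ASCII letter 'A' in the same three encodings.
static const char16_t      kAUtf16[]  = { 0x0041 };  // still one 16-bit unit
static const unsigned char kAUtf8[]   = { 0x41 };    // one byte
static const unsigned char kAGb2312[] = { 0x41 };    // one byte, same as ASCII

// Number of bytes occupied by a fixed-size array (hypothetical helper).
template <typename T, std::size_t N>
constexpr std::size_t byte_size(const T (&)[N]) { return N * sizeof(T); }
```

With a 2-byte char16_t, byte_size reports 2, 3 and 2 bytes for the Chinese character and 2, 1 and 1 bytes for the ASCII letter, which matches the description above.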
Once these concepts are clear, the operations follow naturally. The two directions of data flow are described separately below.
1. Java --> C/C++
In this direction, Java makes the call with a UTF-16 string and the JVM hands it to JNI, so the C/C++ side receives a jstring. JNI provides two functions for reading it: GetStringUTFChars, which yields a UTF-8 string, and GetStringChars, which yields a UTF-16 string. Whichever function is used, if the string contains Chinese characters and the C/C++ code expects GB2312, a further conversion to GB2312 is required. The flow is as follows:
           String
          (UTF-16)
              |
[Java]        |
-------------------- JNI call
[CPP]         |
              v
           jstring
          (UTF-16)
              |
     +--------+---------+
     | GetStringChars   | GetStringUTFChars
     |                  |
     v                  v
 wchar_t *            char *
 (UTF-16)            (UTF-8)
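The relationship between the two branches of the diagram can be sketched without JNI at all. The function below, utf16_to_utf8, is a hypothetical helper written for this note; it turns UTF-16 code units into standard UTF-8. Note that GetStringUTFChars actually returns JNI's modified UTF-8, which differs from this for U+0000 and for characters outside the Basic Multilingual Plane, so treat this only as an illustration of the two forms.

```cpp
#include <cstddef>
#include <string>

// Encode UTF-16 code units as standard UTF-8 (hypothetical helper).
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // stray low surrogate
        }
        if (cp < 0x80) {                 // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {         // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {       // 3 bytes: most Chinese characters
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                         // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Neither form is GB2312, which is why a further conversion is still needed when the native code expects GB2312 bytes.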
2. C/C++ --> Java
When JNI returns a string to Java, the C/C++ side is responsible for first converting it to UTF-8 or UTF-16, then wrapping it in a jstring with NewStringUTF or NewString and returning that to Java.
           String
          (UTF-16)
              ^
              |
[Java]        |
-------------------- JNI return
[CPP]         |
           jstring
          (UTF-16)
              ^
              |
     +--------+---------+
     ^                  ^
     |                  |
     | NewString        | NewStringUTF
 wchar_t *            char *
 (UTF-16)            (UTF-8)
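For the return direction, the native side has to end up with either UTF-8 bytes or UTF-16 code units before calling NewStringUTF or NewString. As a rough illustration of how the two forms relate, the hypothetical helper utf8_to_utf16 below decodes standard UTF-8 into UTF-16. It is a simplified decoder written for this note: it replaces malformed sequences with U+FFFD but does not reject overlong encodings.

```cpp
#include <cstddef>
#include <string>

// Decode standard UTF-8 into UTF-16 code units (hypothetical helper).
// A malformed or truncated sequence yields U+FFFD and skips one byte.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0xFFFD;   // default for invalid lead bytes
        std::size_t len = 1;
        if (b0 < 0x80)                { cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        if (len > 1) {
            // Fold in the continuation bytes (each must look like 10xxxxxx).
            bool ok = i + len <= in.size();
            for (std::size_t k = 1; ok && k < len; ++k) {
                unsigned char b = static_cast<unsigned char>(in[i + k]);
                if ((b & 0xC0) != 0x80) ok = false;
                else cp = (cp << 6) | (b & 0x3F);
            }
            if (!ok) { cp = 0xFFFD; len = 1; }
        }
        if (cp > 0x10FFFF) cp = 0xFFFD;
        if (cp >= 0x10000) {
            // Outside the BMP: emit a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<char16_t>(cp);
        }
        i += len;
    }
    return out;
}
```

The three UTF-8 bytes E4 B8 AD decode to the single code unit 0x4E2D, the form NewString expects.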
If the string contains no Chinese characters, only standard ASCII, you can use GetStringUTFChars/NewStringUTF directly: for such strings the UTF-8 encoding and the ASCII encoding are identical, so no conversion is required.
However, if the string contains Chinese characters, the C/C++ side must convert between encodings. Two conversion functions are needed: one from UTF-8/UTF-16 to GB2312, and one from GB2312 to UTF-8/UTF-16.
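One way to decide whether the shortcut applies is to scan the bytes first. The helper below, is_pure_ascii, is a hypothetical name used only for this sketch; it reports whether every byte of a C string is 7-bit ASCII.

```cpp
// Returns true if every byte of the NUL-terminated string is 7-bit ASCII
// (hypothetical helper). Such a string is byte-for-byte identical in
// ASCII, UTF-8 and GB2312.
bool is_pure_ascii(const char* s) {
    for (; *s != '\0'; ++s) {
        if (static_cast<unsigned char>(*s) >= 0x80) {
            return false;  // high bit set: part of a multi-byte character
        }
    }
    return true;
}
```

A string that passes this check can go straight to NewStringUTF; anything else needs a real conversion.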
Note that both Linux and Win32 support wchar_t. On Win32, wchar_t is 16 bits wide and holds UTF-16, the same layout as jchar, so a C/C++ program that used wchar_t throughout would in theory need no conversion at all. On Linux, however, wchar_t is typically 32 bits wide, so this shortcut applies to Win32 only. In practice char cannot be replaced entirely by wchar_t anyway, so most applications still need the conversion.
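Because the width of wchar_t depends on the platform, code that moves UTF-16 data into wchar_t should not assume 16 bits. The sketch below uses made-up names (kWcharIs16Bit, utf16_to_wstring): it copies code units unchanged when wchar_t is 16 bits wide and combines surrogate pairs into single code points when it is wider.

```cpp
#include <cstddef>
#include <string>

// True where wchar_t has the same width as a UTF-16 code unit (e.g. Win32).
constexpr bool kWcharIs16Bit = (sizeof(wchar_t) == 2);

// Copy UTF-16 code units into a std::wstring (hypothetical helper).
// With a 16-bit wchar_t the units are copied unchanged; with a wider
// wchar_t each valid surrogate pair becomes one code point.
// Unpaired surrogates are copied as they are.
std::wstring utf16_to_wstring(const std::u16string& in) {
    std::wstring out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char16_t u = in[i];
        bool is_pair = !kWcharIs16Bit &&
                       u >= 0xD800 && u <= 0xDBFF &&
                       i + 1 < in.size() &&
                       in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF;
        if (is_pair) {
            unsigned long cp = 0x10000UL +
                               ((static_cast<unsigned long>(u) - 0xD800UL) << 10) +
                               (static_cast<unsigned long>(in[i + 1]) - 0xDC00UL);
            out += static_cast<wchar_t>(cp);
            ++i;  // the low surrogate has been consumed
        } else {
            out += static_cast<wchar_t>(u);
        }
    }
    return out;
}
```

For characters in the Basic Multilingual Plane, which covers the GB2312 repertoire, the result has one wchar_t per character on either kind of platform.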
II. A Conversion Method
The conversion goes through the wide-character type, using the Win32 functions WideCharToMultiByte and MultiByteToWideChar.
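char *jstringToWindows(JNIEnv *env, jstring jstr)
{   // UTF-16 (jstring) to the ANSI code page (GB2312 on Chinese Windows)
    int length = env->GetStringLength(jstr);
    const jchar *jcstr = env->GetStringChars(jstr, 0);
    if (jcstr == NULL)
        return NULL;    // GetStringChars failed
    // Each UTF-16 code unit becomes at most 2 bytes in GB2312, plus the terminator.
    char *rtn = (char *)malloc(length * 2 + 1);
    int size = 0;
    if (rtn != NULL && length > 0)  // WideCharToMultiByte fails on a zero-length input
        size = WideCharToMultiByte(CP_ACP, 0, (LPCWSTR)jcstr, length, rtn, length * 2, NULL, NULL);
    // Release the JNI buffer on every path, including failure.
    env->ReleaseStringChars(jstr, jcstr);
    if (rtn == NULL || (length > 0 && size <= 0))
    {
        free(rtn);
        return NULL;
    }
    rtn[size] = 0;
    return rtn;         // the caller must free() the result
}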
jstring windowsToJstring(JNIEnv *env, const char *str)
{   // ANSI code page (GB2312 on Chinese Windows) to UTF-16 (jstring)
    jstring rtn = 0;
    int slen = (int)strlen(str);
    jchar *buffer = 0;
    if (slen == 0)      // empty string: nothing to convert
        rtn = env->NewStringUTF(str);
    else
    {
        // First call: ask how many wide characters are needed.
        int length = MultiByteToWideChar(CP_ACP, 0, (LPCSTR)str, slen, NULL, 0);
        if (length > 0)
            buffer = (jchar *)malloc(length * sizeof(jchar));
        // Second call: perform the conversion into the buffer.
        if (buffer != NULL && MultiByteToWideChar(CP_ACP, 0, (LPCSTR)str, slen, (LPWSTR)buffer, length) > 0)
            rtn = env->NewString(buffer, length);
    }
    if (buffer)
        free(buffer);
    return rtn;
}
Supplement (header files to include):
/* Required headers */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <malloc.h>
#include <memory.h>
#include <windows.h>
#include <jni.h>
http://hi.baidu.com/menghaisheng/blog/item/33e19cfc5a237354d6887d07.html