Handling Chinese character encoding conversion in JNI

Last Update:2015-09-13 Source: Internet

Author: User

Transferred from: HTTP://BLOG.SINA.COM.CN/FANGAOSJTU

I have spent the last two days learning JNI in order to call, from a Java program, a DLL that provides a large dictionary. While using the JNI functions GetStringChars and NewString, I ran into garbled Chinese text and struggled with it for a whole night. I consulted some references and summarize them as follows:

I. Related Concepts

Java uses 16-bit Unicode (UTF-16) internally to represent strings; a character takes 2 bytes whether it is Chinese or English (characters outside the Basic Multilingual Plane take two 16-bit units);
JNI uses UTF-8 (strictly, a modified UTF-8) to represent C strings. UTF-8 is a variable-length Unicode encoding: an ASCII character takes 1 byte and a Chinese character generally takes 3 bytes;
C++ works with raw bytes: an ASCII character is one byte, and Chinese text is generally GB2312-encoded, with two bytes per Chinese character.
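
To make these byte counts concrete, here is a minimal sketch in portable C++ that encodes a single Unicode code point as standard UTF-8 and counts the UTF-16 units it needs. The function names encodeUtf8 and utf16Units are hypothetical helpers, not part of JNI, and the sketch does not reject surrogate code points.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one Unicode code point as standard UTF-8 (1 to 4 bytes).
// Hypothetical helper for illustration; not a JNI function.
std::string encodeUtf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);  // values above this are not Unicode
    std::string out;
    if (cp < 0x80) {                       // ASCII: 1 byte
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: most Chinese characters
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: supplementary planes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Number of 16-bit code units UTF-16 needs for one code point.
int utf16Units(std::uint32_t cp) { return cp < 0x10000 ? 1 : 2; }
```

For the character U+4E2D (中), encodeUtf8 yields the three bytes E4 B8 AD and utf16Units returns 1 (one 2-byte unit), while the same character in GB2312 is the two bytes D6 D0. The same text therefore has three different byte layouts on its way from Java to C++.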

With these concepts clear, the procedure becomes easier to follow. Each direction of the character flow is explained separately below.

1. Java -> C++

In this direction, Java calls the native method with a UTF-16 encoded string and the JVM passes it to JNI, so the C/C++ code receives a jstring. Two JNI functions can then be used: GetStringUTFChars, which returns a UTF-8 encoded string, and GetStringChars, which returns a UTF-16 encoded string. Whichever function is used, if the resulting string contains Chinese characters it must be further converted to GB2312. As follows:

            String
           (UTF-16)
              |
[Java]        |
--------------------  JNI call
[CPP]         |
              v
           jstring
           (UTF-16)
              |
     +--------+---------+
     | GetStringChars   | GetStringUTFChars
     |                  |
     v                  v
  jchar*              char*
 (UTF-16)            (UTF-8)
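
The relationship between the two buffers can be sketched without a JVM. The portable C++ below converts UTF-16 code units, the layout of a GetStringChars buffer, into standard UTF-8. The names appendUtf8 and utf16ToUtf8 are hypothetical, and std::u16string stands in for the jchar buffer as an assumption. Note that GetStringUTFChars returns modified UTF-8, which differs from this output for U+0000 and for characters outside the Basic Multilingual Plane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append one code point to `out` as standard UTF-8 (hypothetical helper).
static void appendUtf8(std::string& out, std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 code units to standard UTF-8.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        // A high surrogate followed by a low surrogate forms one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        // Any surrogate still left is unpaired: emit U+FFFD instead.
        if (cp >= 0xD800 && cp <= 0xDFFF) cp = 0xFFFD;
        appendUtf8(out, cp);
    }
    return out;
}
```

Neither form is GB2312, which is why a further conversion step is still needed before the bytes reach code that expects the local code page.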

2. C++ -> Java

When JNI code returns a string to Java, the native code is responsible for first converting the string to UTF-8 or UTF-16, then wrapping it as a jstring with NewStringUTF or NewString and returning it to Java.

            String
           (UTF-16)
              ^
              |
[Java]        |
--------------------  JNI return
[CPP]         |
           jstring
           (UTF-16)
              ^
              |
     +--------+---------+
     ^                  ^
     |                  |
     | NewString        | NewStringUTF
  jchar*              char*
 (UTF-16)            (UTF-8)

If the string contains no Chinese characters, only standard ASCII, then GetStringUTFChars/NewStringUTF is enough, because in that case the UTF-8 and ASCII encodings are identical and no conversion is needed.
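
A quick way to tell whether a char* buffer falls into this ASCII-only case is to check that no byte has its high bit set. This is a minimal sketch; isPureAscii is a hypothetical helper name, not a JNI or Win32 function.

```cpp
#include <cassert>

// Returns true when every byte before the terminating NUL is 7-bit ASCII,
// in which case the buffer reads the same as ASCII, UTF-8, or GB2312.
bool isPureAscii(const char* s) {
    assert(s != nullptr);
    for (const unsigned char* p = reinterpret_cast<const unsigned char*>(s); *p != 0; ++p) {
        if (*p >= 0x80) {
            return false;  // high bit set: part of a multi-byte character
        }
    }
    return true;
}
```

Native code could use such a check to pass a string straight to NewStringUTF and run the code-page conversion only when it returns false.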

However, if the string contains Chinese characters, encoding conversion on the C/C++ side is mandatory. We need two conversion functions: one that turns UTF-8/UTF-16 into GB2312, and one that turns GB2312 into UTF-8/UTF-16.

A note here: both Linux and Win32 support wchar_t. On Win32 it is 16 bits wide and holds UTF-16; on Linux it is typically 32 bits wide. So if a C/C++ program on Win32 used wchar_t throughout, this conversion would in theory not be required. In practice, however, char cannot be replaced entirely with wchar_t, so the conversion is still necessary for most applications today.
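
Because the width of wchar_t depends on the platform, code that copies UTF-16 units into a wide string has to account for it. The sketch below is one portable way to do so, under the assumption that std::u16string stands in for the JNI jchar buffer; utf16ToWide is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Convert UTF-16 code units to a std::wstring.
// With a 16-bit wchar_t the units are copied unchanged; with a wider
// wchar_t each surrogate pair is first combined into one code point.
std::wstring utf16ToWide(const std::u16string& in) {
    std::wstring out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (sizeof(wchar_t) > 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        // With a 16-bit wchar_t, every value written must fit in 16 bits.
        assert(sizeof(wchar_t) > 2 || cp <= 0xFFFF);
        out += static_cast<wchar_t>(cp);
    }
    return out;
}
```

With a 16-bit wchar_t this is a plain copy, which matches the observation that no conversion is needed when wide characters are used throughout.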

II. A conversion method

Use the wide character type, together with the Win32 functions WideCharToMultiByte and MultiByteToWideChar, to convert.

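#include <windows.h>
#include <jni.h>
#include <stdlib.h>
#include <string.h>

char* jstringToWindows(JNIEnv* env, jstring jstr)
{ // UTF-16 converted to the system ANSI code page (GB2312 on Chinese Windows)
    int length = env->GetStringLength(jstr);
    const jchar* jcstr = env->GetStringChars(jstr, 0);
    if (jcstr == NULL)
        return NULL; // GetStringChars failed
    char* rtn = (char*)malloc(length * 2 + 1); // at most 2 bytes per character in GB2312
    int size = 0;
    if (rtn != NULL)
        size = WideCharToMultiByte(CP_ACP, 0, (LPCWSTR)jcstr, length, rtn, (length * 2 + 1), NULL, NULL);
    env->ReleaseStringChars(jstr, jcstr); // release on every path, including failure
    if (size <= 0)
    {
        free(rtn); // avoid leaking the buffer when the conversion fails
        return NULL;
    }
    rtn[size] = 0;
    return rtn; // the caller must free() the result
}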

jstring windowsToJstring(JNIEnv* env, const char* str)
{ // system ANSI code page (GB2312 on Chinese Windows) converted to UTF-16
    jstring rtn = 0;
    int slen = (int)strlen(str);
    unsigned short* buffer = 0;
    if (slen == 0)
        rtn = env->NewStringUTF(str);
    else
    {
        // First call: ask how many UTF-16 code units are needed
        int length = MultiByteToWideChar(CP_ACP, 0, (LPCSTR)str, slen, NULL, 0);
        buffer = (unsigned short*)malloc(length * 2 + 1);
        // Second call: perform the conversion into the buffer
        if (buffer && MultiByteToWideChar(CP_ACP, 0, (LPCSTR)str, slen, (LPWSTR)buffer, length) > 0)
            rtn = env->NewString((jchar*)buffer, length);
    }
    if (buffer)
        free(buffer);
    return rtn;
}
