Summary of Chinese Character Handling Problems in JNI

Source: Internet
Author: User

For work reasons, I have to use JNI to call methods and pass data between C++ and Java programs. Until now I had only used JNI in an English-language environment, so I had not paid much attention to Chinese encoding problems (the same applies to other non-English languages). I recently took some time to study the issue and have written up my findings below for discussion and reference.
Before going further, some background knowledge is needed:

  1. Inside Java, all strings are encoded in Unicode, specifically UCS-2, which uses two bytes for each character (more precisely, UTF-16 since Java 5, where characters outside the Basic Multilingual Plane take two 16-bit units). A defining feature of Unicode is that it covers the characters of all the world's scripts. The encoding used in each region can therefore be mapped to and from Unicode, and Java relies on this mapping to convert between different languages;
  2. UTF-8 is an encoding scheme distinct from UCS-2/UCS-4; UTF stands for UCS Transformation Format. It is a variable-length encoding: a character in the UCS-2 range takes 1 to 3 bytes (the original design allowed up to 6 bytes; the current standard limits it to 4).
    With UCS-2/UCS-4, an encoded string contains special bytes such as \0 (that is, 0x0: the first byte of the encoding of every character from 0 to 255 is zero). In some situations (such as transmission or parsing) this causes trouble, and it also wastes space for ordinary English letters. UTF-8 is also said to have properties that the fixed-width Unicode encodings lack (I do not fully understand this!), so Unicode is often used only as an intermediate code for logical representation. For more information about Unicode/UTF-8, see reference 1;
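The difference between the fixed two-byte form and the variable-length UTF-8 form can be made concrete with a small sketch. The helper name encodeUtf8 is an assumption made for this illustration and is not part of Java or JNI; it handles only code points in the UCS-2 range, which need 1 to 3 bytes.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for illustration: encode one code point in the
// UCS-2 range (U+0000..U+FFFF) as standard UTF-8.
std::string encodeUtf8(char32_t cp) {
    assert(cp <= 0xFFFF);  // this sketch does not cover code points above U+FFFF
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx (ASCII is unchanged)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Under these assumptions, the letter A (0x41) stays a single byte, while each of the two characters 0x4E2D and 0x6587 used later in this article becomes three bytes.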

Garbled Chinese text in Java can arise in many situations: in different applications, on different platforms, and so on. These problems are already covered by many excellent articles, so I will not discuss them in depth here; for details, see references 2, 3, 4, and 5. The following is a brief summary:

  1. When we save a source file with the default encoding, the file content is encoded according to the system settings. That setting is the file.encoding property, which can be obtained with the following program:
    public class Encoding {
        public static void main(String[] args) {
            System.out.println(System.getProperty("file.encoding"));
        }
    }

    If javac is run without the -encoding option and the system locale does not match the file's actual encoding, the source is decoded incorrectly. This typically happens when compiling a file that came from another environment;

  2. At run time, strings exist inside Java in Unicode form, but string data in a class file is stored as UTF-8 (strictly, a modified form of UTF-8); Unicode serves only as the intermediate, logical representation;
  3. For web applications, take Tomcat as an example. The JSP conversion tool (jspc) provided by the JSP/servlet engine looks in the JSP file for the charset specified by <%@ page contentType="text/html; charset=<JSP-charset>" %>. If <JSP-charset> is not specified in the JSP file, the system default file.encoding is used (on a Chinese-language platform this value is GBK; it can be changed through the Regional Options in Control Panel). jspc then interprets every character in the JSP file, Chinese and ASCII alike, in a way equivalent to the command "javac -encoding <JSP-charset>", converts the characters to Unicode, then to UTF-8, and saves the result as a Java file.
    I once accidentally saved a JSP file as UTF-8 while the charset declared inside the file was gb2312. Chinese text never displayed correctly at run time, and everything returned to normal once I saved the file with the default encoding again. As long as the encoding the file is stored in matches the charset declared at the top of the JSP, the page displays correctly (although I have not managed to make this work with a file saved as UTF-16);
  4. In an XML file, encoding states the encoding of the file itself. If it differs from the encoding the file is actually saved in, decoding may fail, so encoding should always be set to match the file's storage encoding. The charset of JSP/HTML, by contrast, states which character set is used to decode the strings read from the file (here a "string" should be understood as a sequence of bytes, which can map to different characters under different charsets).
    I once discussed the exact meaning of encoding with others online: if encoding refers to the encoding of the file itself, how can an application that reads the file interpret it correctly before it knows the encoding setting?
    From that discussion and my own understanding, the handler (such as jspc) always reads the input file as ISO8859-1 first and then examines the first few bytes of the file (the byte order mark, or BOM) to determine the format the file is saved in. For details about the BOM check, see the getEncodingName method of $SOURCE_DIR/jasper/jasper2/src/share/org/apache/jasper/xmlparser/XMLEncodingDetector.java in the Tomcat source code; the "Page Character Encoding" section of the JSP specification also describes this in detail. Once the encoding option has been parsed, if it differs from the file's actual storage format, the handler tries to convert, but the conversion may fail with an encoding error when the file is actually saved in a byte-oriented encoding such as ISO8859-1/UTF-8 while encoding is set to Unicode, UTF-16, or similar.
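The byte order mark check mentioned in item 4 can be sketched as follows. This is a minimal illustration and not Tomcat's code: the function name detectBom and the returned names are assumptions, and UTF-32 marks are ignored for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not Tomcat's implementation: guess an encoding name
// from the byte order mark (BOM) at the start of a buffer.
// Returns an empty string when no BOM is recognized.
std::string detectBom(const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";     // EF BB BF
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";  // FE FF
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";  // FF FE (UTF-32LE is not distinguished in this sketch)
    return "";
}
```

A file with no BOM yields an empty result, in which case a reader has to fall back on the declared encoding or on a default.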

The rest of this article focuses on what to watch for when passing data between C++ and Java programs through JNI.

In JNI, a jstring is UCS-2 encoded, which is consistent with the string encoding inside Java. In C++, however, a string is made of either char (8 bits) or wchar_t (16 bits and Unicode encoded, the same as jchar, although not every development platform uses Unicode encoding for it; for details, see reference 6). The following program demonstrates this (build environment: VC6):

#include <iostream>
#include <locale>
using namespace std;

int main()
{
    locale loc("Chinese-simplified");
    // locale loc("chs");
    // locale loc("ZHI");
    // locale loc(".936");
    wcout.imbue(loc);

    wcout << L"中文" << endl; // without the L prefix, the output may be wrong

    wchar_t wch[] = {0x4e2d, 0x6587, 0x0}; // Unicode encoding of the two characters "中文"
    wcout << wch << endl;

    return 0;
}

JNI provides several functions for converting between jstring and char/wchar_t:

jsize GetStringLength(jstring str)
const jchar *GetStringChars(jstring str, jboolean *isCopy)
void ReleaseStringChars(jstring str, const jchar *chars)

In addition, to make UTF-8 easier to transmit and store, JNI also provides several functions that operate on the UTF format:

jsize GetStringUTFLength(jstring str)
const char* GetStringUTFChars(jstring str, jboolean *isCopy)
void ReleaseStringUTFChars(jstring str, const char* chars)

GetStringChars returns a Unicode (UCS-2) encoded string, while GetStringUTFChars returns a UTF-8 encoded string (strictly, the modified UTF-8 form that the JVM uses).
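Because the two families of functions use different encodings, their length functions also count different things: GetStringLength counts 16-bit units, while GetStringUTFLength counts bytes of modified UTF-8. The sketch below reproduces that byte count for an array of 16-bit units. It is only an illustration: the name modifiedUtf8Length is hypothetical, and std::uint16_t stands in for jchar so that the code compiles without jni.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of bytes needed to store `len` 16-bit units
// in modified UTF-8. std::uint16_t is an assumed stand-in for jchar.
std::size_t modifiedUtf8Length(const std::uint16_t* chars, std::size_t len) {
    assert(chars != nullptr || len == 0);
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < len; ++i) {
        std::uint16_t c = chars[i];
        if (c >= 0x0001 && c <= 0x007F)
            bytes += 1;  // ASCII except NUL
        else if (c <= 0x07FF)
            bytes += 2;  // includes U+0000, which modified UTF-8 stores as C0 80
        else
            bytes += 3;  // everything else; each surrogate half is encoded separately
    }
    return bytes;
}
```

For the two characters 0x4E2D and 0x6587, the 16-bit length is 2 but the modified UTF-8 length is 6, which is why a buffer sized from one length must not be filled using the other encoding.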
To create a jstring from a char string, you can use the following function:

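// Create a jstring from a string in the ANSI code page (Windows only).
jstring NewJString(JNIEnv* env, LPCSTR str)
{
    if (!env || !str)
        return 0;

    int slen = (int)strlen(str);
    // The wide string never needs more 16-bit units than the input has bytes;
    // one extra element leaves room for a terminator.
    jchar* buffer = new jchar[slen + 1];
    int len = MultiByteToWideChar(CP_ACP, 0, str, slen,
                                  reinterpret_cast<LPWSTR>(buffer), slen);
    buffer[len] = 0;    // len is 0 if the conversion failed or str is empty

    jstring js = env->NewString(buffer, len);
    delete [] buffer;
    return js;
}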

To convert a jstring into a char array, you can use:

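// Convert a jstring to a string in the ANSI code page (Windows only).
// Returns the length of the result, -1 for bad arguments, -2 if desc is too small.
int JStringToChar(JNIEnv* env, jstring str, LPSTR desc, int desc_len)
{
    if (env == NULL || desc == NULL || str == NULL)
        return -1;

    // Check buffer size (assumes at most 2 bytes per character, as in GBK)
    jsize wlen = env->GetStringLength(str);
    if (wlen * 2 + 1 > desc_len)
    {
        return -2;
    }
    memset(desc, 0, desc_len);

    // The array returned by GetStringChars is not guaranteed to be
    // null-terminated, so use the length from GetStringLength, not wcslen.
    const jchar* w_buffer = env->GetStringChars(str, 0);
    if (w_buffer == NULL)
        return -1;

    int len = 0;
    if (wlen > 0)
        len = WideCharToMultiByte(CP_ACP, 0, reinterpret_cast<LPCWSTR>(w_buffer), wlen,
                                  desc, desc_len - 1, NULL, NULL);
    env->ReleaseStringChars(str, w_buffer);

    desc[len] = 0;
    return len;
}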
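One detail is worth isolating when working with GetStringChars: as far as I know, the JNI specification does not promise that the returned array ends with a zero, so the safe pattern is to pair the pointer with the length from GetStringLength and copy exactly that many units. The sketch below shows the pattern without depending on jni.h; the name copyUnits is hypothetical, and std::uint16_t stands in for jchar.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: copy exactly `len` 16-bit units into an owning string.
// With JNI, the two arguments would come from GetStringChars and
// GetStringLength; std::uint16_t is an assumed stand-in for jchar.
std::u16string copyUnits(const std::uint16_t* chars, std::size_t len) {
    assert(chars != nullptr || len == 0);
    std::u16string out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i)
        out.push_back(static_cast<char16_t>(chars[i]));
    return out;  // std::u16string supplies its own terminator via c_str()
}
```

The copy stays valid after the JNI buffer has been released, and its c_str() result is properly terminated even when the source array is not.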

Of course, as the analysis above shows, you can also use the result of GetStringChars directly as a wchar_t string (on platforms where wchar_t is 16 bits). Or, if you wish, you can convert the result of GetStringUTFChars to a UCS-2 encoded string with MultiByteToWideChar and then convert that to a multibyte string with WideCharToMultiByte:

const char* pstr = env->GetStringUTFChars(str, 0);
int nLen = MultiByteToWideChar(CP_UTF8, 0, pstr, -1, NULL, 0); // number of wide characters needed, including the terminator
LPWSTR lpwsz = new WCHAR[nLen];
MultiByteToWideChar(CP_UTF8, 0, pstr, -1, lpwsz, nLen); // the result is a UCS-2 encoded string
int nLen1 = WideCharToMultiByte(CP_ACP, 0, lpwsz, nLen, NULL, 0, NULL, NULL); // number of bytes needed
LPSTR lpsz = new char[nLen1];
WideCharToMultiByte(CP_ACP, 0, lpwsz, nLen, lpsz, nLen1, NULL, NULL); // convert the UCS-2 string into a multibyte string

cout << "Out: " << lpsz << endl;

delete [] lpwsz; delete [] lpsz;
env->ReleaseStringUTFChars(str, pstr); // release the buffer obtained from GetStringUTFChars

Of course, I suspect that few people want or need to do this.
Note that GetStringChars returns const jchar*, while GetStringUTFChars returns const char*.
Besides the functions above, there is another option when you frequently need to convert between jstring and char*: the class below. It was originally written by a foreign developer named Roger S. Renault. The idea is very good, but the class is not convenient in practice, because the author concentrated on UTF-format strings, whereas in practice we more often work with ACP (ANSI code page) strings. The original author's program is as follows:

class UTFString {
private:

    UTFString ();   // Default ctor - disallowed

public:

    // Create a new instance from the specified jstring
    UTFString(JNIEnv* env, const jstring& str) :
        mEnv (env),
        mJstr (str),
        mUtfChars ((char* )mEnv->GetStringUTFChars (mJstr, 0)),
        mString (mUtfChars) { }

    // Create a new instance from the specified string
    UTFString(JNIEnv* env, const string& str) :
        mEnv (env),
        mString (str),
        mJstr (env->NewStringUTF (str.c_str ())),
        mUtfChars ((char* )mEnv->GetStringUTFChars (mJstr, 0)) { }

    // Create a new instance as a copy of the specified UTFString
    UTFString(const UTFString& rhs) :
        mEnv (rhs.mEnv),
        mJstr (mEnv->NewStringUTF (rhs.mUtfChars)),
        mUtfChars ((char* )mEnv->GetStringUTFChars (mJstr, 0)),
        mString (mUtfChars) { }

    // Delete the instance and release allocated storage
    ~UTFString() { mEnv->ReleaseStringUTFChars (mJstr, mUtfChars); }

    // assign a new value to this instance from the given string
    UTFString & operator =(const string& rhs) {
        mEnv->ReleaseStringUTFChars (mJstr, mUtfChars);
        mJstr = mEnv->NewStringUTF (rhs.c_str ());
        mUtfChars = (char* )mEnv->GetStringUTFChars (mJstr, 0);
        mString = mUtfChars;
        return *this;
    }

    // assign a new value to this instance from the given char*
    UTFString & operator =(const char* ptr) {
        mEnv->ReleaseStringUTFChars (mJstr, mUtfChars);
        mJstr = mEnv->NewStringUTF (ptr);
        mUtfChars = (char* )mEnv->GetStringUTFChars (mJstr, 0);
        mString = mUtfChars;
        return *this;
    }

    // Supply operator methods for converting the UTFString to a string
    // or char*, making it easy to pass UTFString arguments to functions
    // that require string or char* parameters.
    string & GetString() { return mString; }
    operator string() { return mString; }
    operator const char* () { return mString.c_str (); }
    operator jstring() { return mJstr; }

private:

    JNIEnv* mEnv;       // The environment pointer for this native method.
    jstring mJstr;      // A copy of the jstring object that this UTFString represents
    char* mUtfChars;    // Pointer to the data returned by GetStringUTFChars
    string mString;     // string buffer for holding the "value" of this instance
};

I modified it as follows:

class JNIString {
private:

    JNIString ();   // Default ctor - disallowed

public:

    // Create a new instance from the specified jstring
    JNIString(JNIEnv* env, const jstring& str) :
        mEnv (env) {
        // The array returned by GetStringChars is not guaranteed to be
        // null-terminated, so take the length from GetStringLength, not wcslen
        jsize wlen = env->GetStringLength (str);
        const jchar* w_buffer = env->GetStringChars (str, 0);
        mJstr = env->NewString (w_buffer, wlen);    // Deep Copy, in usual case we only need
                                                    // Shallow Copy as we just need this class to
                                                    // provide some convenience for handling jstring

        mChars = new char[wlen * 2 + 1];
        int len = 0;
        if (wlen > 0)
            len = WideCharToMultiByte (CP_ACP, 0, reinterpret_cast<LPCWSTR> (w_buffer), wlen,
                                       mChars, wlen * 2, NULL, NULL);
        mChars[len] = 0;
        env->ReleaseStringChars (str, w_buffer);

        mString = mChars;
    }

    // Create a new instance from the specified string
    JNIString(JNIEnv* env, const string& str) :
        mEnv (env) {
        int slen = (int) str.length ();
        jchar* buffer = new jchar[slen + 1];
        int len = MultiByteToWideChar (CP_ACP, 0, str.c_str (), slen,
                                       reinterpret_cast<LPWSTR> (buffer), slen);
        buffer[len] = 0;    // len is 0 if the conversion failed or str is empty

        mJstr = env->NewString (buffer, len);
        delete [] buffer;

        mChars = new char[str.length () + 1];
        strcpy (mChars, str.c_str ());

        mString = str;
    }

    // Create a new instance as a copy of the specified JNIString
    JNIString(const JNIString& rhs) :
        mEnv (rhs.mEnv) {
        jsize wlen = mEnv->GetStringLength (rhs.mJstr);
        const jchar* wstr = mEnv->GetStringChars (rhs.mJstr, 0);
        mJstr = mEnv->NewString (wstr, wlen);
        mEnv->ReleaseStringChars (rhs.mJstr, wstr);

        mChars = new char[strlen (rhs.mChars) + 1];
        strcpy (mChars, rhs.mChars);

        mString = rhs.mString;
    }

    // Delete the instance and release allocated storage
    ~JNIString() { delete [] mChars; }

    // assign a new value to this instance from the given string
    JNIString & operator =(const string& rhs) {
        delete [] mChars;

        int slen = (int) rhs.length ();
        jchar* buffer = new jchar[slen + 1];
        int len = MultiByteToWideChar (CP_ACP, 0, rhs.c_str (), slen,
                                       reinterpret_cast<LPWSTR> (buffer), slen);
        buffer[len] = 0;

        mJstr = mEnv->NewString (buffer, len);
        delete [] buffer;

        mChars = new char[rhs.length () + 1];
        strcpy (mChars, rhs.c_str ());

        mString = rhs;

        return *this;
    }

    // assign a new value to this instance from another JNIString
    // (the compiler-generated operator would copy mChars shallowly,
    // and the buffer would then be deleted twice)
    JNIString & operator =(const JNIString& rhs) {
        if (this != &rhs)
            *this = rhs.mString;
        return *this;
    }

    // Supply operator methods for converting the JNIString to a string
    // or char*, making it easy to pass JNIString arguments to functions
    // that require string or char* parameters.
    string & GetString() { return mString; }
    operator string() { return mString; }
    operator const char* () { return mString.c_str (); }
    operator jstring() { return mJstr; }

private:

    JNIEnv* mEnv;       // The environment pointer for this native method.
    jstring mJstr;      // A copy of the jstring object that this JNIString represents
    char* mChars;       // Pointer to an ANSI code page char array
    string mString;     // string buffer for holding the "value" of this instance (ANSI code page)
};

The modified class not only switches from UTF-oriented to ANSI-oriented encoding, it also drops the operator=(const char* ptr) definition, because operator=(const string& rhs) can take its place without any extra code. (Under the C++ rules, a const char* argument is implicitly converted to a temporary string that binds to the const reference. For more information, see my other article "About const references".)
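The point about the const reference can be checked in isolation. The sketch below uses a hypothetical class named Holder, an assumed stand-in for JNIString that keeps only the one assignment operator under discussion, so it compiles without JNI or Windows headers.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for JNIString, reduced to the single
// assignment operator that matters for this point.
class Holder {
public:
    // Accepts std::string directly, and const char* through the implicit
    // conversion to a temporary std::string bound to the const reference.
    Holder& operator=(const std::string& rhs) {
        mString = rhs;
        return *this;
    }
    const std::string& get() const { return mString; }

private:
    std::string mString;
};
```

With this class, both `h = "text";` and `h = std::string("text");` go through the same operator, so a separate const char* overload adds nothing except one avoided temporary.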
If you like, you can round out JNIString by adding a JNIString(JNIEnv* env, const wstring& str) constructor and an operator=(const wstring& rhs) overload :). This is easy, so I leave it to interested readers.
The following is an example of using this class (the code that actually demonstrates it is very short; most of the rest is routine boilerplate :)):

#include <stdio.h>
#include <assert.h>
#include <iostream>
#include <string>
#include <jni.h>
// The JNIString class above, and the headers it needs (such as <windows.h>),
// must also be visible here.

using namespace std;

int main() {
    int res;
    JavaVM* jvm;
    JNIEnv* env;
    JavaVMInitArgs vm_args;
    JavaVMOption options[3];

    options[0].optionString = const_cast<char*>("-Djava.compiler=NONE");
    options[1].optionString = const_cast<char*>("-Djava.class.path=..."); // ... is specific to this project
    options[2].optionString = const_cast<char*>("-verbose:jni");
    vm_args.version = JNI_VERSION_1_4;
    vm_args.nOptions = 3;
    vm_args.options = options;
    vm_args.ignoreUnrecognized = JNI_TRUE;
    res = JNI_CreateJavaVM(&jvm, (void**)&env, &vm_args);

    if (res < 0) {
        fprintf(stderr, "Can't create Java VM\n");
        return 1;
    }

    jclass cls = env->FindClass("jni/test/Demo");
    assert(0 != cls);

    // "<init>" is the JNI name of a constructor
    jmethodID mid = env->GetMethodID(cls, "<init>", "(Ljava/lang/String;)V");
    assert(0 != mid);

    // The cast is valid only where wchar_t is 16 bits, as on Windows
    const wchar_t* p = L"中国";
    jobject obj = env->NewObject(cls, mid,
        env->NewString(reinterpret_cast<const jchar*>(p), (jsize)wcslen(p)));
    assert(0 != obj);

    mid = env->GetMethodID(cls, "getMessage", "()Ljava/lang/String;");
    assert(0 != mid);

    jstring str = (jstring)env->CallObjectMethod(obj, mid);

    // Use JNIString for easier handling.
    JNIString jnistr(env, str);
    cout << "JNIString: " << jnistr.GetString() << endl;

    jnistr = "中文";
    cout << jnistr.GetString() << endl;

    jvm->DestroyJavaVM();
    fprintf(stdout, "Java VM destroyed.\n");

    return 0;
}
