Wide-character encoding conversion in the Android NDK using the ICU library

Source: Internet
Author: User

Original post: http://topic.csdn.net/u/20101022/16/1b2e0cec-b9d2-42ea-8d9c-4f1bb8320a54.html?r=70149216. I read it, implemented it, and am recording the result here for reuse.

In the Java layer, the String class can convert between encodings. The NDK exposes no public tools for this, but ICU4C can be used.

ICU4C is the C/C++ implementation of ICU (International Components for Unicode), IBM's internationalization library. It ships as part of the Android system, but the NDK exposes no API for it, so you need to load the dynamic library yourself and call the conversion function through it.

On Android the ICU library is located at "/system/lib/libicuuc.so". The conversion function is named ucnv_convert_?_?, where the question marks are version digits that differ between ICU releases. In the Android 2.2 emulator's libicuuc.so the function is named ucnv_convert_4_2, while the 2.1 emulator uses ucnv_convert_3_8. It seems each version has to be handled separately; I have not found a unified solution.
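One way to cope with the changing suffix is to probe a range of candidate names at run time. The following is only a minimal sketch: the helper names (format_versioned_name, find_versioned_symbol), the resolver typedef and the probed range 3_0 to 4_9 are assumptions made for illustration, and it covers only the "_major_minor" naming pattern described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same shape as dlsym(), so dlsym itself can be passed as the resolver. */
typedef void *(*symbol_resolver)(void *handle, const char *name);

/* Hypothetical helper: writes "<base>_<major>_<minor>" into out.
 * Returns 0 on success, -1 if the name does not fit in out_size bytes. */
int format_versioned_name(char *out, size_t out_size,
                          const char *base, int major, int minor)
{
    int n;
    assert(out != NULL && base != NULL);
    n = snprintf(out, out_size, "%s_%d_%d", base, major, minor);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Hypothetical helper: tries every suffix in an assumed range (3_0 .. 4_9)
 * and returns the first symbol the resolver finds, or NULL if none match. */
void *find_versioned_symbol(symbol_resolver resolve, void *handle,
                            const char *base)
{
    char name[64];
    int major, minor;

    for (major = 3; major <= 4; ++major) {
        for (minor = 0; minor <= 9; ++minor) {
            void *sym;
            if (format_versioned_name(name, sizeof name, base, major, minor) != 0)
                continue;               /* name too long, skip it */
            sym = resolve(handle, name);
            if (sym != NULL)
                return sym;
        }
    }
    return NULL;
}
```

On a device, the call would look like find_versioned_symbol(dlsym, pDL, "ucnv_convert"), where pDL is the handle returned by dlopen. Taking the resolver as a parameter keeps the sketch free of any link-time dependency and lets it be exercised off-device.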

Function prototype:

    int32_t ucnv_convert (const char *toConverterName, const char *fromConverterName, char *target, int32_t targetCapacity, const char *source, int32_t sourceLength, int32_t *pErrorCode);

In the ICU headers the last parameter is declared as UErrorCode *, an enum type. Those headers are not available in the NDK, so int32_t * is used here. The return value is the length of the converted output in bytes.

Usage:

    // Requires <dlfcn.h>, <stdint.h> and <string.h>
    // Declare the function pointer
    int32_t (*ucnv_convert) (const char *, const char *, char *, int32_t, const char *, int32_t, int32_t *) = 0;
    // Load the dynamic library
    void *pDL = dlopen ("/system/lib/libicuuc.so", RTLD_LAZY);
    // Android 2.2 is used as the example here; the function name is ucnv_convert_4_2
    if (pDL) {
        ucnv_convert = (int32_t (*) (const char *, const char *, char *, int32_t, const char *, int32_t, int32_t *)) dlsym (pDL, "ucnv_convert_4_2");
    }
    // Loaded successfully
    if (ucnv_convert) {
        const char *cbuf = "...";
        char buffer[100];
        int32_t errcode = 0;
        // "utf8" is the target encoding, "ucs2" is the source encoding
        // buffer receives the converted characters; it is 100 bytes long
        // cbuf points to the string to be converted
        // errcode receives the ICU error code (0 means no error)
        // Note: strlen stops at the first zero byte, which UCS-2 text usually
        // contains, so pass the real byte length when converting real data
        ucnv_convert ("utf8", "ucs2", buffer, 100, cbuf, strlen (cbuf), &errcode);
    }

 

The converted string is placed in the buffer. If an error occurs, the error code is put in errcode.
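The usage above passes strlen (cbuf) as the source length. For UCS-2 input that is fragile, because any character below U+0100 contains a zero byte and strlen stops there. A minimal sketch of a length helper follows; the name ucs2_byte_length is hypothetical, and it assumes the text is held as zero-terminated 16-bit units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes in a zero-terminated UCS-2 string,
 * not counting the 16-bit terminator. */
size_t ucs2_byte_length(const uint16_t *s)
{
    size_t units = 0;
    assert(s != NULL);
    while (s[units] != 0)       /* stop at the 16-bit terminator, not at a zero byte */
        ++units;
    return units * sizeof(uint16_t);
}
```

The result can then be passed as the source length, for example (int32_t) ucs2_byte_length (text), with the source pointer cast to const char *.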

 

As the title suggests, the NDK also has a wide-character problem, namely wchar_t, which makes porting code from other platforms troublesome.

On Linux, wchar_t is 4 bytes by default, whereas on Windows (including CE and Mobile) and Symbian it is 2 bytes. The fix is to add the compiler switch -fshort-wchar to LOCAL_CFLAGS in Android.mk, for example LOCAL_CFLAGS := -fshort-wchar. This forces the compiler to treat wchar_t as 2 bytes, although it produces warnings during the build.

Even so, while the compiler now treats wchar_t as 2 bytes, the prebuilt libc was compiled with a 4-byte wchar_t, so wcslen and similar functions become unusable (in fact, wcslen in the NDK is of little use anyway). You could rebuild libc, but the simplest fix is to implement wcslen yourself.
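An alternative that avoids depending on the width of wchar_t at all is to copy the text into fixed 16-bit units before handing it to ICU. This is not the approach used in this post, only a sketch of the idea: the name wchar_to_u16 is hypothetical, and it assumes every character lies in the Basic Multilingual Plane (anything above 0xFFFF is replaced with '?').

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: copies a wchar_t string into 16-bit units.
 * Works whether wchar_t is 2 or 4 bytes and does not call wcslen.
 * dst_units is the capacity of dst in units, including the terminator.
 * Returns the number of units written, not counting the terminator. */
size_t wchar_to_u16(const wchar_t *src, uint16_t *dst, size_t dst_units)
{
    size_t i = 0;
    assert(src != NULL && dst != NULL && dst_units > 0);
    while (src[i] != 0 && i + 1 < dst_units) {
        unsigned long c = (unsigned long)src[i];
        /* Assumption: BMP only; larger code points become '?' */
        dst[i] = (c <= 0xFFFFul) ? (uint16_t)c : (uint16_t)'?';
        ++i;
    }
    dst[i] = 0;                 /* always terminate, truncating if needed */
    return i;
}
```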

The following code was copied from the web; I have forgotten the exact source. It converts a wchar_t string to a char string so that the ICU library can then convert it to any encoding.

 

    // Requires <stddef.h> and <wchar.h>; assumes a 2-byte wchar_t (-fshort-wchar)

    // Obtain the length of a wchar_t string
    int wlen (const wchar_t *strString) {
        int i = 0;
        while (1) {
            if (strString[i] == 0) {
                break;
            } else {
                i++;
            }
        }
        return i;
    }

    // Write one wchar_t as two bytes, high byte first
    char *W2C (const wchar_t *pw, char *pc)
    {
        *pc++ = (char) (*pw >> 8);
        *pc = (char) *pw;
        return 0;
    }

    // Convert the string; pcstr must hold at least 2 * len + 1 bytes
    char *wstr2cstr (const wchar_t *pwstr, char *pcstr, size_t len) {
        char *ptemp = pcstr;
        if (pwstr != NULL && pcstr != NULL) {
            size_t wstr_len = wlen (pwstr);
            len = (len > wstr_len) ? wstr_len : len;
            while (len-- > 0) {
                W2C (pwstr, pcstr);
                pwstr++;
                pcstr += 2;
            }
            *pcstr = '\0';
            return ptemp;
        }
        return 0;
    }

 

wstr2cstr performs the conversion. There is also the matter of byte order: W2C as written emits the high byte of each wchar_t first, and whether the high or the low byte should come first depends on the encodings before and after the conversion.
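To make the byte-order choice explicit rather than implicit, each order can get its own writer. The following is a minimal sketch with hypothetical names (put_u16_be, put_u16_le, u16_to_bytes); the idea is to pair the chosen order with the matching converter name, on the assumption that the ICU build in use accepts "UTF-16BE" and "UTF-16LE".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes one 16-bit unit high byte first (big-endian). */
void put_u16_be(uint16_t unit, unsigned char *out)
{
    out[0] = (unsigned char)(unit >> 8);
    out[1] = (unsigned char)(unit & 0xFF);
}

/* Writes one 16-bit unit low byte first (little-endian). */
void put_u16_le(uint16_t unit, unsigned char *out)
{
    out[0] = (unsigned char)(unit & 0xFF);
    out[1] = (unsigned char)(unit >> 8);
}

/* Hypothetical helper: serialises count units into out, which must hold
 * at least 2 * count bytes. Returns the number of bytes written. */
size_t u16_to_bytes(const uint16_t *src, size_t count,
                    unsigned char *out, int big_endian)
{
    size_t i;
    assert(src != NULL && out != NULL);
    for (i = 0; i < count; ++i) {
        if (big_endian)
            put_u16_be(src[i], out + 2 * i);
        else
            put_u16_le(src[i], out + 2 * i);
    }
    return 2 * count;
}
```

Because the length is returned explicitly, the output does not rely on a zero terminator, which matters since the serialised bytes may themselves contain zeros.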
