Some advice on Windows CE and Unicode encoding

Last Update:2018-12-07 Source: Internet

Author: User


We spent most of our time porting an existing application to Microsoft Windows CE. In general, such a project is not too difficult. We started with Microsoft Win32 code, and Windows CE is, of course, based on the Win32 application programming interface (API). Our application (Raima Data Manager) had the advantage of a simple interface: it is a library of approximately 150 functions written in C that can be used to create, manage, and access databases.
Given the way the application is built, we thought porting it to Windows CE would be a relatively simple C programming exercise. However, we soon ran into difficulties. They ranged from careless mistakes, such as using the Microsoft Windows NT libraries on the Windows NT-based Windows CE simulator, to violations of Windows CE programming rules, such as "Do not place Unicode (ISO 10646) characters at odd memory addresses".
About 90% of the problems were related to Unicode in one way or another. Although Unicode programming is not difficult, it is easy to make mistakes when adapting code that was written for single-byte characters (I made many).
The following suggestions are based on our experience porting Raima Data Manager to Windows CE, but I believe they are worth knowing before you start any Windows CE program. After all, most Windows developers writing their first Windows CE application are really applying their existing Win32 knowledge.
Do not use the Windows NT library on the simulator
The first mistake discussed here is a silly one, but I fell into it, and you may too. When you use Microsoft Visual C++ (version 5.0) to build a Windows CE program, you will find that the include path, the library path, and the executable path are adjusted automatically to match the selected target environment. For example, when you build an application for the Windows CE simulator, the include path does not point to the Win32 include files (under the VC directory) but to the Windows CE include files (under the WCE directory). Never change this.
Because the Windows CE simulator runs on Windows NT, a program running on the simulator can call functions in any Windows NT dynamic-link library (DLL), even if that DLL is not part of the simulator. Obviously this is not a good idea, because the same functions may be unavailable on the handheld PCs (H/PCs) or other Windows CE devices on which your software will eventually run.
When you first build a non-Unicode application for the Windows CE simulator, you will find that many of the functions it uses are not supported, such as the ANSI (American National Standards Institute) function strcpy(). This may tempt you to link against the Windows NT runtime library to make all the problems go away.
If you are just starting out with Windows CE programming, it may not be obvious which include files and libraries to use. The answer is that you should not use the include files and libraries that you use when writing ordinary Win32, non-Windows CE programs.
Do not confuse TCHARs and bytes
If you are porting a non-Unicode application to Windows CE, you will want to convert all strings from single-byte characters (char) to wide characters (for example, the C type wchar_t). Almost all Win32 and runtime library functions supported by Windows CE require wide-character arguments. Windows 95, on the other hand, does not support Unicode. To keep the code portable, you should therefore use the TCHAR type defined in tchar.h rather than using wchar_t directly.
Whether TCHAR is defined as wchar_t or char depends on whether the preprocessor symbol UNICODE is defined. Similarly, all the macros for string-handling functions, such as _tcsncpy, map either to the Unicode function wcsncpy or to the ANSI function strncpy, depending on whether UNICODE is defined.
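The mapping described above can be pictured with a simplified sketch. The names my_tchar, MY_T, my_tcsncpy, my_tcslen and copy_name are hypothetical stand-ins invented for this illustration; they are not the real definitions in tchar.h, which cover far more functions. Defining UNICODE before the block selects the wide-character variant.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical, simplified imitation of the tchar.h mapping.
   The real header defines TCHAR, _tcsncpy and many more names. */
#ifdef UNICODE
typedef wchar_t my_tchar;
#define MY_T(s)     L##s
#define my_tcsncpy  wcsncpy
#define my_tcslen   wcslen
#else
typedef char my_tchar;
#define MY_T(s)     s
#define my_tcsncpy  strncpy
#define my_tcslen   strlen
#endif

/* Copies src into a buffer that holds dst_count characters (not bytes)
   and always terminates the result. Returns the length of the copy. */
size_t copy_name(my_tchar *dst, size_t dst_count, const my_tchar *src)
{
    if (dst_count == 0)
        return 0;
    my_tcsncpy(dst, src, dst_count - 1);
    dst[dst_count - 1] = 0;
    return my_tcslen(dst);
}
```

The same source line, copy_name(buf, 4, MY_T("abcdef")), compiles against either character type, which is the point of writing against the generic names.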
Existing Windows applications may contain code that implicitly assumes a character is one byte long. This often shows up when memory is allocated for strings, for example:
int myfunc(char *p)
{
    char *pszfilename;
    pszfilename = malloc(maxfilelen);
    if (pszfilename)
        strncpy(pszfilename, p, maxfilelen);
    /* etc. */

In this code, the size of the allocated block should strictly be written as maxfilelen * sizeof(char), but most programmers simplify it to maxfilelen, because sizeof(char) is 1 on every platform. However, when you replace chars with TCHARs, it is easy to forget this implicit assumption and write the code as follows:
int myfunc(TCHAR *p)
{
    TCHAR *pszfilename;
    pszfilename = (TCHAR *)malloc(maxfilelen);      /* wrong: size is in bytes */
    if (pszfilename)
        _tcsncpy(pszfilename, p, maxfilelen);       /* count is in TCHARs */
    /* etc. */

This will not work; it causes an error right away. The mistake is that the size passed to malloc() is in bytes, whereas the third argument of _tcsncpy() is a count of TCHARs, not bytes. When UNICODE is defined, a TCHAR is two bytes wide, so the copy can write past the end of the allocated block.
The code above should be rewritten as:
int myfunc(TCHAR *p)
{
    TCHAR *pszfilename;
    pszfilename = (TCHAR *)malloc(maxfilelen * sizeof(TCHAR));
    if (pszfilename)
        _tcsncpy(pszfilename, p, maxfilelen);
    /* etc. */
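One way to keep character counts and byte counts apart is to route every string allocation through a helper that takes a character count and does the multiplication in one place. The following is a minimal sketch under that assumption, not code from the original port. It uses wchar_t in place of TCHAR so that it compiles outside Windows, and COUNT_OF, alloc_chars and dup_name are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Number of elements (not bytes) in a fixed-size array. */
#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Allocates room for `count` wide characters.
   Returns NULL if the byte size would overflow or malloc() fails. */
wchar_t *alloc_chars(size_t count)
{
    if (count > SIZE_MAX / sizeof(wchar_t))
        return NULL;
    return (wchar_t *)malloc(count * sizeof(wchar_t));
}

/* Duplicates at most max_count - 1 characters of src into a new,
   always terminated buffer of max_count characters. */
wchar_t *dup_name(const wchar_t *src, size_t max_count)
{
    wchar_t *dst;

    if (max_count == 0)
        return NULL;
    dst = alloc_chars(max_count);
    if (dst) {
        wcsncpy(dst, src, max_count - 1);
        dst[max_count - 1] = L'\0';
    }
    return dst;
}
```

With this arrangement the caller only ever handles character counts, and sizeof() appears in a single function.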
Do not place Unicode strings at odd memory addresses
On Intel-series processors you can store any variable or array at an odd memory address without causing a fatal error. On an H/PC, however, this is not always possible. You must be careful with any data type larger than one byte, including wchar_t, which is defined as unsigned short. Placing such data at an odd address causes a fault when you try to access it.
The compiler usually takes care of these issues for you. You do not control the addresses of stack variables, and the compiler makes sure that these addresses are suitable for the variable types. Similarly, the runtime library must guarantee that memory allocated from the heap always starts on a word boundary, so you generally do not have to worry about these two cases. However, if the application contains code that copies memory regions with memcpy(), or uses some kind of pointer arithmetic to compute memory addresses, problems can occur. Consider the following example:
int send_name(TCHAR *pszname)
{
    char *p, *q;
    int nlen = (_tcslen(pszname) + 1) * sizeof(TCHAR);
    p = malloc(header_size + nlen);
    if (p)
    {
        q = p + header_size;
        _tcscpy((TCHAR *)q, pszname);
    }
    /* etc. */

This code allocates memory from the heap and copies a string into it, leaving header_size bytes free in front of the string. If UNICODE is defined, the string is a wide-character string. If header_size is even, the code works normally. If header_size is odd, however, the code fails, because q points to an odd address.
Note that this problem does not show up when you test the code on the Windows CE simulator running on an Intel-series processor.
In this example, you only need to make sure that header_size is even to avoid the problem. In some cases, however, you may not be able to do that. For example, if a program imports data from a desktop PC, you may have to use a predefined binary format even though it is not well suited to the H/PC. In that case, you must manipulate the strings with functions that use char pointers rather than TCHAR pointers. If you know the length of the string, you can use memcpy() to copy it. A function that scans the Unicode string one byte at a time may therefore be all you need to determine the length of the string in wide characters.
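A minimal sketch of that byte-wise approach follows. It is an illustration under stated assumptions, not the authors' code: uint16_t stands in for the 16-bit wchar_t of Windows CE, and unaligned_wlen and unaligned_wcopy are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length, in 16-bit characters, of a zero-terminated string that may
   start at an odd address. Only single bytes are read, so no aligned
   access is required. */
size_t unaligned_wlen(const unsigned char *p)
{
    size_t n = 0;

    while (p[2 * n] != 0 || p[2 * n + 1] != 0)
        n++;
    return n;
}

/* Copies the string, including its terminator, into an aligned
   destination that holds dst_count characters. Returns the number of
   characters copied (without the terminator), or (size_t)-1 if the
   string does not fit. */
size_t unaligned_wcopy(uint16_t *dst, size_t dst_count,
                       const unsigned char *src)
{
    size_t n = unaligned_wlen(src);

    if (n + 1 > dst_count)
        return (size_t)-1;
    memcpy(dst, src, (n + 1) * sizeof(uint16_t));
    return n;
}
```

Once the string sits in a properly aligned buffer, the usual wide-character functions can be applied to it safely.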
Converting between ANSI and Unicode strings
If your Windows CE application interfaces with a Windows PC, you may have to handle ANSI string data (that is, char strings) coming from the PC. This is true even if the program itself uses only Unicode strings.
You cannot process ANSI strings on Windows CE, because there are no library functions to manipulate them. The best solution is to convert each ANSI string to a Unicode string on the H/PC, and to convert Unicode strings back to ANSI strings for use on the PC. To perform these conversions, you can use the Win32 API functions MultiByteToWideChar() and WideCharToMultiByte().
A hack for string conversion in Windows CE 1.0
In Windows CE 1.0, these Win32 API functions are not yet implemented. So if you want to support both CE 1.0 and CE 2.0, you must use other functions. You can use wsprintf() to convert an ANSI string to a Unicode string: its first parameter is a wide-character string, and it recognizes "%S" (uppercase), which denotes an ANSI string. Since wsscanf() and wsprintfA() are not available, you must find another way to convert Unicode strings back to ANSI strings. Because Windows CE 1.0 has no NLS support, you may have to resort to a hack, as shown below:
/*
 * Definitions/prototypes of conversion functions
 * Multi-byte (ANSI) to wide char (Unicode)
 * atow() converts from ANSI to wide char
 * wtoa() converts from wide char to ANSI
 */
#if (_WIN32_WCE >= 101)
#define atow(stra, strw, lenw) \
    MultiByteToWideChar(CP_ACP, 0, stra, -1, strw, lenw)
#define wtoa(strw, stra, lena) \
    WideCharToMultiByte(CP_ACP, 0, strw, -1, stra, lena, NULL, NULL)
#else /* _WIN32_WCE >= 101 */
/*
 * MultiByteToWideChar() and WideCharToMultiByte() are not supported on
 * Windows CE 1.0
 */
int atow(char *stra, wchar_t *strw, int lenw);
int wtoa(wchar_t *strw, char *stra, int lena);
#endif /* _WIN32_WCE >= 101 */
#if (_WIN32_WCE < 101)
int atow(char *stra, wchar_t *strw, int lenw)
{
    int len;
    char *pa;
    wchar_t *pw;
    /*
     * Start with len = 1, not len = 0, as the string length returned
     * must include the null terminator, as in MultiByteToWideChar()
     */
    for (pa = stra, pw = strw, len = 1; lenw; pa++, pw++, lenw--, len++)
    {
        /* Go through unsigned char so values above 127 are not sign-extended */
        *pw = (lenw == 1) ? 0 : (wchar_t)(unsigned char)(*pa);
        if (!(*pw))
            break;
    }
    return len;
}
int wtoa(wchar_t *strw, char *stra, int lena)
{
    int len;
    char *pa;
    wchar_t *pw;
    /*
     * Start with len = 1, not len = 0, as the string length returned
     * must include the null terminator, as in WideCharToMultiByte()
     */
    for (pa = stra, pw = strw, len = 1; lena; pa++, pw++, lena--, len++)
    {
        *pa = (lena == 1) ? 0 : (char)(*pw);
        if (!(*pa))
            break;
    }
    return len;
}
#endif /* _WIN32_WCE < 101 */

For Windows CE 1.0, this approach is easier to use than wsprintf(), because with wsprintf() it is harder to limit the length of the string written through the destination pointer.
Select the correct string comparison function
If you want to sort Unicode strings, you can choose among the following functions:
wcscmp(), wcsncmp(), wcsicmp(), and wcsnicmp()
wcscoll(), wcsncoll(), wcsicoll(), and wcsnicoll()
CompareString()
The first group of functions compares strings without regard to the locale or to non-English characters. These functions are useful if you never intend to support other languages, or if you only want to test whether two strings have the same content.
The second group of functions compares two strings using the current locale setting (the system setting, unless you call wsetlocale() before the comparison function). These functions also sort non-English characters correctly. If the "C" locale is selected, they behave the same as the first group.
The third option is the Win32 function CompareString(). It is similar to the second group, but it lets you specify the locale as a parameter rather than using the current locale setting. CompareString() also lets you specify the lengths of the two strings. You can set the second parameter to NORM_IGNORECASE to make the comparison case-insensitive.
In our experience, CompareString() usually orders strings without regard to case even when the second parameter is not set to NORM_IGNORECASE, whereas wcsncoll() does distinguish uppercase from lowercase unless the "C" locale is in use. In our code, therefore, we do not use CompareString() for case-sensitive comparisons; we use wcsncoll() instead.
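The difference between the first two groups can be sketched with the standard C functions wcscmp() and wcscoll(). This is only an illustration: the wrapper names compare_binary, compare_collated and sign_of are hypothetical, and the sketch does not cover CompareString(), which is specific to Win32.

```c
#include <locale.h>
#include <wchar.h>

/* Reduces a comparison result to -1, 0 or 1. */
static int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/* Binary comparison: looks at character codes only and ignores the
   locale, so every uppercase ASCII letter sorts before any lowercase one. */
int compare_binary(const wchar_t *a, const wchar_t *b)
{
    return sign_of(wcscmp(a, b));
}

/* Locale-aware comparison: uses the collation rules of the current
   LC_COLLATE locale, as selected with setlocale(). */
int compare_collated(const wchar_t *a, const wchar_t *b)
{
    return sign_of(wcscoll(a, b));
}
```

After setlocale(LC_COLLATE, "C") both wrappers should give the same ordering; under another locale, compare_collated() may order accented or mixed-case strings differently from compare_binary().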
Do not use relative paths
Unlike Windows NT, Windows CE has no concept of a current directory. Every path is therefore interpreted relative to the root directory. If your software uses relative paths for files or directories, you will probably have to rework them. For example, the path ".\ABC" is treated as "\ABC" on Windows CE.
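One possible way to cope, sketched here as an assumption rather than as the authors' solution, is to keep an application-defined base directory and turn every relative name into an absolute path before passing it to the file functions. make_absolute_path is a hypothetical helper, and base_dir is assumed to be an absolute path.

```c
#include <stddef.h>
#include <wchar.h>

/* Joins base_dir (assumed absolute, e.g. L"\\Data") and rel_path into
   out, which holds out_count characters. A leading ".\" on rel_path is
   dropped; a rel_path that already starts with '\' is copied as is.
   Returns 0 on success, -1 if the result does not fit. */
int make_absolute_path(const wchar_t *base_dir, const wchar_t *rel_path,
                       wchar_t *out, size_t out_count)
{
    size_t base_len, rel_len, sep;

    if (rel_path[0] == L'\\') {
        base_dir = L"";                       /* already absolute */
    } else if (rel_path[0] == L'.' && rel_path[1] == L'\\') {
        rel_path += 2;                        /* drop the leading ".\" */
    }
    base_len = wcslen(base_dir);
    rel_len = wcslen(rel_path);
    /* Insert a separator only if the base does not already end in one. */
    sep = (base_len > 0 && base_dir[base_len - 1] != L'\\') ? 1 : 0;

    if (base_len + sep + rel_len + 1 > out_count)
        return -1;
    wmemcpy(out, base_dir, base_len);
    if (sep)
        out[base_len] = L'\\';
    wmemcpy(out + base_len + sep, rel_path, rel_len + 1);
    return 0;
}
```

For example, make_absolute_path(L"\\Data", L".\\ABC", out, 32) fills out with "\Data\ABC".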
Remove calls to the calloc() and time() functions
The calloc() function of the C runtime library is not available, but malloc() can be used in its place. Do not forget that calloc() initializes the allocated memory to zero, whereas malloc() does not. Similarly, the time() function is not available, but you can use the Win32 function GetSystemTime() instead.
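A calloc() substitute built on malloc() might look like the following minimal sketch. zero_alloc is a hypothetical name, and the block covers only the calloc() half of this advice, not the replacement for time().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocates count * size bytes with malloc() and clears them, which is
   the zero-initialization that calloc() would have done.
   Returns NULL on overflow or if malloc() fails. */
void *zero_alloc(size_t count, size_t size)
{
    void *p;
    size_t total;

    if (size != 0 && count > SIZE_MAX / size)
        return NULL;                  /* count * size would overflow */
    total = count * size;
    p = malloc(total ? total : 1);    /* avoid a zero-byte request */
    if (p)
        memset(p, 0, total);
    return p;
}
```

Memory obtained this way is released with free(), exactly as with calloc().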
After all these warnings, you will be glad to learn that the last two pieces of advice are pleasant surprises.
You do not need to change calls to the Win32 file input/output (I/O) functions
Windows CE also supports the Win32 file input and output functions, so you can access files the same way you would on a Win32 file system. The CreateFile() function does not recognize the FILE_FLAG_RANDOM_ACCESS flag on Windows CE. However, this flag is only a hint for optimizing disk access and does not affect how the function behaves.
Do not worry about byte order
When porting an application to Windows CE, you will find that numeric data types on Windows CE use the same byte order as on Intel processors, on every processor that Windows CE supports.
Like almost all database engines, Raima Database Manager stores numeric data in binary form in its database files. This means that a record is treated as a sequence of bytes whenever it is written to or read from the database. As long as the database files are never transferred to another system, byte order is not an issue. If a database file is accessed by a processor whose byte order differs from that of the system that created it, the numeric data will be misinterpreted.
This problem arises whenever you transfer files between machines with different processors. Fortunately, all the processors that Windows CE supports use the same byte order.
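If files ever do have to travel to a machine with a different byte order, one common defence is to read and write each number byte by byte in a fixed order. The source does not say that Raima Database Manager does this; the following is only a sketch of the technique, with put_u32_le, get_u32_le and host_is_little_endian as hypothetical names.

```c
#include <stdint.h>

/* Stores v in buf[0..3] in little-endian order, whatever the byte
   order of the host processor. */
void put_u32_le(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v & 0xFF);
    buf[1] = (unsigned char)((v >> 8) & 0xFF);
    buf[2] = (unsigned char)((v >> 16) & 0xFF);
    buf[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Reads a little-endian 32-bit value from buf[0..3]. */
uint32_t get_u32_le(const unsigned char *buf)
{
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}

/* Returns 1 on a little-endian host (such as Intel), 0 otherwise. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;

    return *(const unsigned char *)&probe == 1;
}
```

Because the two accessors never reinterpret memory as an integer, a file written with put_u32_le() reads back identically on any processor.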
Keeping these suggestions in mind when you start working with Windows CE should help you avoid unnecessary detours.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
