Five Years of PowerBuilder Experience (1): PB's Unicode Support


Copyright description

This series of articles is published on Cnblogs. Apart from being reposted on the Internet, it may not be copied, edited, printed, published, produced, or disseminated in any other way without the author's knowledge. This includes making or distributing electronic documents in any format. It may not be copied in any form for commercial purposes without the author's permission. The author reserves the right to pursue liability for infringement of this series of articles.

If you wish to make an electronic document for non-commercial use, please keep the copyright information below and contact the author to send a copy by email.

Author: Zhang Nan

Network Name: SummerHeart

Email: Costware@163.com

Blog: http://summerheart.cnblogs.com/

http://blog.csdn.net/summerheart

Time: 2008.5.26

Copyright: 2008

P.S. If you repost this article, please retain the above copyright information.


Chapter 1 Details of PowerBuilder That Cannot Be Ignored

1. PowerScript

PB's core programming language is PowerScript. It was developed by PowerSoft, the company that originally created PB. Unlike VB, Delphi, or C, it has its own distinct system. It is not widely used as a general-purpose programming language, but it plays an indispensable role in PowerBuilder. Let's look at some of its features.

1.1 Unicode support

The following example uses PB8.02. In the debugger, string a ends up holding only half of a Chinese character, and the length of the name comes out as 6. This happens because the Len function counts each Chinese character as two bytes and each English letter as one byte. As a result, when a string contains Chinese characters, the lengths and positions used by the string functions (including Mid and Right) no longer match the number of characters.
Figure 1 (PB8.02)    Figure 2 (PB11.2)
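The PB8.02 behaviour can be reproduced outside PB with a small C sketch. It assumes a GBK/GB2312-encoded string, in which 汉 is stored as 0xBA 0xBA and 字 as 0xD7 0xD6. The helpers `byte_len`, `byte_left` and `ends_with_half_char` are hypothetical names used only for illustration. They mimic a byte-counting Len() and Left(): cutting "a汉字" after 2 bytes keeps only the lead byte of 汉, which is half a character.

```c
#include <stddef.h>
#include <string.h>

/* "a汉字" in GBK (assumed encoding): one ASCII byte, then two double-byte characters. */
static const char kMixed[] = "a\xBA\xBA\xD7\xD6";

/* Byte length, which is what a byte-based Len() such as PB8.02's reports. */
size_t byte_len(const char *s) { return strlen(s); }

/* Copy the first n bytes, like a byte-based Left(); out must hold n + 1 bytes. */
void byte_left(const char *s, size_t n, char *out) {
    size_t len = strlen(s);
    if (n > len) n = len;
    memcpy(out, s, n);
    out[n] = '\0';
}

/* Returns 1 if s ends with a lone lead byte (> 127 with no trailing byte), i.e. half a character. */
int ends_with_half_char(const char *s) {
    size_t len = strlen(s), i = 0;
    while (i < len) {
        if ((unsigned char)s[i] > 127) {
            if (i + 1 >= len) return 1;   /* lead byte with nothing after it */
            i += 2;                       /* skip both bytes of a double-byte character */
        } else {
            i += 1;                       /* single-byte character */
        }
    }
    return 0;
}
```

For example, `byte_len(kMixed)` is 5 even though the string holds only 3 characters.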

PB11.2 solves this problem. Chinese characters and English letters are both treated uniformly as one character. This is possible because PB uses Unicode, a universal character encoding that represents the characters of every country's writing system as single characters. In PB, each character occupies two bytes. This comes at a cost: an ASCII character, which used to take one byte, now takes two. The effort is worthwhile, though, because it eliminates the inconsistent handling of different languages and character sets. This, too, is a kind of globalization. Ha!
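The same text can be stored as 16-bit code units, and a short C sketch shows the effect. It assumes a UTF-16 representation like PB's and only characters from the Basic Multilingual Plane; characters outside it need two units (surrogate pairs), which this sketch ignores. The names `kMixedUtf16`, `utf16_len` and `utf16_bytes` are hypothetical. The sketch shows that every character, ASCII or Chinese, counts as one unit and costs two bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* "a汉字" as UTF-16 code units: U+0061, U+6C49, U+5B57, then a terminating zero. */
static const uint16_t kMixedUtf16[] = {0x0061, 0x6C49, 0x5B57, 0x0000};

/* Number of code units before the terminator: a character-based Len(). */
size_t utf16_len(const uint16_t *s) {
    size_t n = 0;
    while (s[n] != 0) n++;
    return n;
}

/* Storage size in bytes: two per unit, so 'a' now takes one byte more than in ANSI. */
size_t utf16_bytes(const uint16_t *s) { return utf16_len(s) * sizeof(uint16_t); }
```

Here `utf16_len` gives 3, matching the number of visible characters, while the storage grows to 6 bytes.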

Back in the PB9 era, functions such as LeftA and LenA, and LeftW and LenW, appeared. These function forms were meant to handle Asian character sets alongside ANSI, which extends ASCII with additional single-byte character codes. The functions ending in "A" process strings byte by byte, and those ending in "W" process double-byte characters. Now that Unicode is available, the "W" functions have been deprecated. We therefore recommend not using them in future code. They are retained only for compatibility with programs developed in PB9 and earlier versions. The "A" functions can still be used for single-byte processing.

(Figure: the PB11.2 help description of the "W" functions.)



The results of the three function forms show the following. In PB11.2, the regular functions (such as Left and Len) fully handle double-byte characters, so deprecating the "W" functions is entirely justified. The single-byte "A" functions still process strings byte by byte. They also distinguish single-byte from double-byte characters more reliably than PB8.02 did.
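The difference between a regular function and an "A" function can be sketched in C by counting the same UTF-16 text in two ways. This is not PB's implementation. `len_chars` counts characters, as Len() does in PB11.2. `len_ansi_bytes` roughly estimates a LenA()-style byte count under a loudly stated assumption: a double-byte ANSI code page such as GBK, in which ASCII (below 0x80) takes one byte and every other character takes two. Characters the code page cannot represent, which a real conversion would replace, are ignored. Both names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Character count over zero-terminated UTF-16 units, like the regular Len(). */
size_t len_chars(const uint16_t *s) {
    size_t n = 0;
    while (s[n] != 0) n++;
    return n;
}

/* Rough LenA()-style byte count, ASSUMING a double-byte ANSI code page:
   ASCII takes one byte, every other character takes two. */
size_t len_ansi_bytes(const uint16_t *s) {
    size_t n = 0;
    for (; *s != 0; s++) n += (*s < 0x80) ? 1 : 2;
    return n;
}
```

For "a汉字" ({0x0061, 0x6C49, 0x5B57, 0}), the character count is 3 and the estimated ANSI byte count is 5.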



PB11.2's Unicode support also shows up when calling an API function written in VC++. The function prototype is:

int foo(char *ip)

The call results in PB8.02 and PB11.2 are as follows:


Figure 1 (PB8.02)    Figure 2 (PB11.2)

The declaration and the call are the same in both versions:

// Global external function declaration
Function uLong foo (ref String ipaddress) Library "GetIP.dll"

// Script: preallocate a 128-character buffer for the DLL to fill, then call it
string ls_HostIP = space (128)
if foo (ls_HostIP) = 0 then
    sle_2.text = trim (ls_HostIP)
end if


The API function written in VC++ takes a char * argument, which points to single-byte characters. The two figures show that PB8.02 processes strings as single bytes and does not support Unicode. PB11.2, by contrast, supports Unicode and treats every string as Unicode characters, so passing the string to a char * parameter can produce garbled characters.

In PB11.2, add ;ANSI to the alias in the original declaration, so that it reads:

Function uLong foo (ref String ipaddress) Library "GetIP.dll" alias for "foo;ANSI"

With this change, the call returns the expected value.
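Conceptually, ";ANSI" means the DLL sees an ANSI (single-byte) buffer and PB converts the result back to Unicode. The C sketch below models only the DLL side and the conversion back. The `foo` body is a hypothetical stand-in, since the article does not show the real GetIP.dll code. The address it writes is made up. `ascii_to_utf16` is a hypothetical helper that is valid only for characters below 0x80, such as the digits and dots of an IP address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the DLL function: writes an ANSI string into ip. */
int foo(char *ip) {
    strcpy(ip, "192.168.1.100");   /* made-up address, for illustration only */
    return 0;
}

/* Widen ASCII bytes to UTF-16 units, copying the terminator too.
   Valid only for characters below 0x80. */
void ascii_to_utf16(const char *in, uint16_t *out) {
    while ((*out++ = (uint16_t)(unsigned char)*in++) != 0) { }
}
```

Each byte becomes its own 16-bit unit, so "1" stays "1", unlike the garbled result described next.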

How does PB11.2 convert the string?

In fact, this is not PB's fault. It happens because a single character occupies a different number of bytes in the two character sets. Checking the string's length with LenA() gives 13, while Len() gives 7. Strictly speaking, LenA in PB11.2 does not return the true byte length here, because PB has already treated the string as Unicode when the character pointer came back from the API. Clearly, PB treats every two bytes as one Unicode character. The ASCII codes of the first two characters, "1" and "9", are 0x31 and 0x39, so together they form the byte pair 31 39. Bytes are stored in memory from low to high (little-endian), so this pair is read as the single code unit 0x3931 (bytes 39 31 in value order). Looking up U+3931 in the Unicode table explains the box: the local machine has no font containing this character, so it displays a box instead.
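The reinterpretation can be checked in C. The sketch uses the hypothetical address "192.168.1.100", chosen only because it starts with "1" and "9" and has 13 characters, matching the LenA value above. It assumes the DLL writes those 13 bytes plus a terminating zero byte. The hypothetical helper `bytes_as_utf16le` pairs the bytes little-endian, with the first byte of each pair as the low byte. The result is 7 units, which matches Len = 7, and the first unit is 0x3931.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a byte buffer as UTF-16LE code units: the first byte of each pair
   is the low byte. Returns the number of complete units produced. */
size_t bytes_as_utf16le(const unsigned char *bytes, size_t nbytes, uint16_t *units) {
    size_t n = 0;
    for (size_t i = 0; i + 1 < nbytes; i += 2)
        units[n++] = (uint16_t)(bytes[i] | (bytes[i + 1] << 8));
    return n;
}
```

Passing the 13 characters plus the terminator (14 bytes) yields 7 units. Unit 0 is 0x3931 from "1" (0x31) and "9" (0x39), and the last unit is "0" paired with the zero byte.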

Note: Unicode table lookup:

http://www.wiki.cn/wiki/Unicode%E7%BC%96%E7%A0%81%E8%A1%A8/3000-3FFF

Workarounds for Versions Before PB9

Versions before PB9 lacked these functions, so you had to write your own to process strings containing Chinese characters. Below is f_mylen(), which the author defined as a replacement for len() for strings with Chinese characters. It takes a string argument a_str:

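// f_mylen(string a_str) returns integer
// Counts characters, treating a byte greater than 127 as the first byte of a two-byte Chinese character
Char l_ch
Int li_len, li_p
String ls_str

ls_str = a_str
li_p = 1   // current byte position

Do While Len(ls_str) >= li_p
    l_ch = Mid(ls_str, li_p, 1)
    If Asc(l_ch) > 127 Then
        li_p += 2   // double-byte character: skip both bytes
    Else
        li_p += 1   // single-byte character
    End If
    li_len += 1     // one character counted either way
Loop
Return li_len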


Similarly, the left() function can be replaced by a custom f_myleft(), which takes a string a_str and a character count a_len:

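// f_myleft(string a_str, integer a_len) returns string
// Returns the first a_len characters without splitting a double-byte character
Char l_ch
Int li_len, li_p
String ls_str
String ls_rtn

li_len = f_mylen(a_str)
If a_len >= li_len Then Return a_str
ls_str = a_str
li_p = 1   // current byte position

// a_len serves as a byte limit; it grows by 1 for each double-byte character copied
Do While li_p <= a_len And Len(ls_str) >= li_p
    l_ch = Mid(ls_str, li_p, 1)
    If Asc(l_ch) > 127 Then
        ls_rtn = ls_rtn + Mid(ls_str, li_p, 2)   // copy both bytes of a Chinese character
        li_p += 2
        a_len += 1
    Else
        ls_rtn = ls_rtn + Mid(ls_str, li_p, 1)   // copy a single-byte character
        li_p += 1
    End If
Loop
Return ls_rtn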


Replacements for functions such as mid() and right() can be written along the same lines as f_myleft().
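As an illustration of that approach, here is a C sketch of character-based Mid and Right over a double-byte string. It uses f_mylen's rule that a byte above 127 starts a two-byte character, which assumes a code page such as GBK. All names (`char_width`, `dbcs_len`, `dbcs_mid`, `dbcs_right`) are hypothetical, and `start` is 1-based as in PowerScript. Unlike f_mylen, `char_width` treats a lead byte at the very end of the string as one byte, so the code never reads past the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Width of the character at s[i]: 2 for a lead byte (> 127) with a following byte, else 1. */
static size_t char_width(const char *s, size_t i, size_t len) {
    return ((unsigned char)s[i] > 127 && i + 1 < len) ? 2 : 1;
}

/* Character count, using the same rule as f_mylen. */
size_t dbcs_len(const char *s) {
    size_t len = strlen(s), i = 0, n = 0;
    while (i < len) { i += char_width(s, i, len); n++; }
    return n;
}

/* Character-based Mid(): up to count characters from 1-based character position start.
   out must have room for strlen(s) + 1 bytes. */
void dbcs_mid(const char *s, size_t start, size_t count, char *out) {
    size_t len = strlen(s), i = 0, k = 1;
    while (i < len && k < start) { i += char_width(s, i, len); k++; }  /* skip start - 1 characters */
    size_t from = i;
    while (i < len && count > 0) { i += char_width(s, i, len); count--; }
    memcpy(out, s + from, i - from);
    out[i - from] = '\0';
}

/* Character-based Right(): the last count characters. */
void dbcs_right(const char *s, size_t count, char *out) {
    size_t total = dbcs_len(s);
    size_t start = (count >= total) ? 1 : total - count + 1;
    dbcs_mid(s, start, count, out);
}
```

With the GBK string "a汉字" ("a\xBA\xBA\xD7\xD6"), `dbcs_mid(s, 2, 1, out)` yields 汉 and `dbcs_right(s, 1, out)` yields 字. Neither call splits a double-byte character. Right is built on Mid, so only one function has to walk the lead bytes.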

