Informix Database Support for the UTF-8 Character Set
As enterprises continue to digitize their operations, more and more enterprise data must be stored and managed in databases. This information is often built on different platforms and stored in different database products, so unified standards across databases for data exchange have become a top priority for enterprises.
Character sets are one such problem. Taking the financial industry as an example, many banks and other financial institutions run business systems on platforms from different database vendors, such as IBM Informix, IBM DB2, and Oracle. These business systems need to exchange, centralize, and analyze data both within a bank and between banks, so unifying the character set of all business systems has become one of the challenges facing the financial sector. In this article, we introduce the Chinese character sets of Informix and its support for the UTF-8 character set.
1. Character set basics
The earliest widely used character encoding scheme was ASCII, which is also the most common. It originated in the early 1960s, when it was developed by the American Standards Association (ASA) as a common standard for information interchange. It became the American national standard ASCII (American Standard Code for Information Interchange) and then evolved into the international standard ISO 646 (full name: 7-bit coded character set for information interchange), becoming the foundation of computer encoding schemes.
The earliest locale supported by Informix databases is en_US.8859-1, which is based on the 8-bit ISO 8859-1 code set.
English characters are generally stored in one byte. A 7-bit encoding can represent at most 128 characters, and an extended 8-bit encoding at most 256, which is far from enough as computing spreads worldwide. Storing the complex characters of Asian languages requires more bits, so a variety of multibyte encoding schemes emerged.
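The capacity limits above follow from the bit width: an n-bit fixed-width code has 2^n distinct values. The following Python sketch, added here for illustration rather than taken from the original article, checks that arithmetic and shows that neither 7-bit ASCII nor the 8-bit ISO 8859-1 code set (the basis of en_US.8859-1) can encode Chinese characters. The helper names `code_capacity` and `fits_in` are hypothetical.

```python
def code_capacity(bits: int) -> int:
    """Number of distinct codes an n-bit fixed-width encoding can represent."""
    return 2 ** bits


def fits_in(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded in the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(code_capacity(7))                 # 7-bit ASCII
print(code_capacity(8))                 # 8-bit extension such as ISO 8859-1
print(fits_in("cafe", "ascii"))         # plain English letters fit in ASCII
print(fits_in("café", "ascii"))         # é is outside the 7-bit range
print(fits_in("café", "iso-8859-1"))    # but fits in the 8-bit ISO 8859-1
print(fits_in("中文", "iso-8859-1"))    # Chinese needs a multibyte scheme
```

The same check against `"utf-8"` succeeds for all of these strings, which is the motivation for Unicode discussed next.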
To accommodate all characters and symbols of the world's languages and to solve the compatibility and conversion problems between different encodings, more than 10 companies jointly founded the Unicode Consortium in January 1991, which then published the Unicode encoding standard.
The Unicode Consortium's slogan is: a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
Unicode originally used 2 bytes (16 bits) per character, which can hold at most 65,536 characters. This proved insufficient, so the encoding was extended: the Unicode 3.1 standard added definitions for supplementary characters. At the time of writing, the Unicode 5.0 standard has been released. For more information, see the official Unicode website (http://www.unicode.org). Unicode has three main implementations: UTF-8, UCS-2, and UTF-16. The Informix International Language Supplement supports UTF-8.
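To make the differences between these encoding forms concrete, the sketch below (an illustration, not part of the original article) uses Python's built-in codecs to count the bytes a few characters need. Python has no UCS-2 codec, so the `fits_ucs2` field simply checks whether the code point lies within the 16-bit range (2^16 = 65,536 values); U+20000, a CJK supplementary character, falls outside it and needs a 4-byte surrogate pair in UTF-16. The function name `encoded_sizes` is hypothetical.

```python
def encoded_sizes(ch: str) -> dict:
    """Byte counts of a single character under UTF-8 and UTF-16."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8_bytes": len(ch.encode("utf-8")),
        # "utf-16-le" is used so that no 2-byte byte-order mark is prepended.
        "utf16_bytes": len(ch.encode("utf-16-le")),
        # UCS-2 is a fixed 16-bit code, so it covers only code points <= 0xFFFF.
        "fits_ucs2": ord(ch) <= 0xFFFF,
    }


for ch in ["A", "é", "中", "\U00020000"]:
    print(encoded_sizes(ch))
```

For these four characters UTF-8 needs 1, 2, 3, and 4 bytes respectively, while UTF-16 uses 2 bytes for each of the first three and 4 bytes for U+20000.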
Given the characteristics of these standards, if a database needs to store characters and symbols from different languages and to exchange and analyze data across databases, Unicode encoding is generally recommended. Unicode can represent more characters, but it also brings extra overhead, so you still need to consider carefully when selecting the most suitable database character set.
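One cost of choosing Unicode is storage size. The sketch below is an illustration under a stated assumption: it compares UTF-8 with GB18030, a Chinese national encoding chosen here only for comparison (the article does not name one), using Python's built-in codecs. The helper name `storage_bytes` is hypothetical.

```python
def storage_bytes(text: str, encodings=("utf-8", "gb18030")) -> dict:
    """Number of bytes needed to store text in each of the given encodings."""
    return {enc: len(text.encode(enc)) for enc in encodings}


print(storage_bytes("Informix"))     # ASCII text: same size in both encodings
print(storage_bytes("中文字符集"))   # common Chinese characters: 3 vs 2 bytes each
```

For Chinese-heavy data, UTF-8 therefore needs roughly 50% more space than a two-byte Chinese encoding, while English text costs nothing extra; this is the kind of trade-off to weigh when choosing the database locale.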
2. Example: using the UTF-8 character set in Informix
A standard Informix database installation often does not include a UTF-8 locale. To check, run the glfiles command under $INFORMIXDIR/ to generate the lc11.txt file, then inspect its contents to see whether the current version supports the utf8 character set.
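If this check has to be repeated on several servers, it can be scripted. The following is a minimal sketch with hypothetical helper names; it assumes nothing about the layout of lc11.txt beyond the locale name appearing in it, and simply performs a case-insensitive search for zh_cn.utf8.

```python
from pathlib import Path

TARGET_LOCALE = "zh_cn.utf8"


def has_zh_cn_utf8(listing: str) -> bool:
    """Case-insensitive search for the zh_cn.utf8 locale name in a glfiles listing."""
    return TARGET_LOCALE in listing.lower()


def check_lc_file(path: str) -> bool:
    """Read a glfiles output file and report whether it lists zh_cn.utf8."""
    # latin-1 maps every byte to a character, so reading never raises
    # a decode error even if the listing contains non-ASCII bytes.
    text = Path(path).read_text(encoding="latin-1")
    return has_zh_cn_utf8(text)
```

For example, `check_lc_file("lc11.txt")`, run in the directory where glfiles wrote the file, returns True only when the zh_cn.utf8 locale is installed.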
In this example, we used Informix Dynamic Server 11.50.FC3 for AIX 5.3 to test Informix's support for the UTF-8 character set.
Step 1: Run glfiles to generate lc11.txt. In our check, the installed version did not include the zh_cn.utf8 locale;
Step 2: Download and install the Informix International Language Supplement (ILS); this example uses version ILS 3.40.MC3. After installation, run glfiles again and confirm that the new lc11.txt includes the zh_cn.utf8 locale;
Step 3: Confirm that the operating system supports the Chinese character set: run locale -a | grep -i zh_cn and check that the Chinese locale ZH_CN.UTF-8 is listed;
Step 4: Set the environment variables: export LANG=ZH_CN.UTF-8, export DB_LOCALE=zh_cn.utf8, and export CLIENT_LOCALE=zh_cn.utf8;
Step 5: Restart the Informix database instance so that the new environment variables take effect, then create a new database with the zh_cn.utf8 locale. Connect to the sysmaster database and run SELECT * FROM sysdbslocale to verify: if the dbs_collate value for the new database is "zh_cn.57372", a database that supports the zh_cn.utf8 character set has been created successfully.
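Steps 3 to 5 can be partly automated. The Python sketch below is illustrative only: all function names are hypothetical, `list_os_locales` simply runs `locale -a` (the same command as Step 3), and the SQL query is still run in dbaccess, with the resulting dbs_collate value passed to `is_utf8_collation`. The default ZH_CN.UTF-8 is the AIX locale name used in this example; other systems may name it differently (for example zh_CN.utf8 on Linux).

```python
import subprocess


def list_os_locales() -> str:
    """Return the output of `locale -a` (Step 3)."""
    return subprocess.run(["locale", "-a"], capture_output=True,
                          text=True, check=True).stdout


def find_zh_cn_utf8_locales(locale_output: str) -> list:
    """Like `locale -a | grep -i zh_cn`, but keep only UTF-8 variants."""
    found = []
    for line in locale_output.splitlines():
        name = line.strip()
        low = name.lower()
        if low.startswith("zh_cn") and ("utf-8" in low or "utf8" in low):
            found.append(name)
    return found


def informix_utf8_env(os_locale: str = "ZH_CN.UTF-8") -> dict:
    """Environment variables from Step 4 (variable names are case-sensitive)."""
    return {
        "LANG": os_locale,
        "DB_LOCALE": "zh_cn.utf8",
        "CLIENT_LOCALE": "zh_cn.utf8",
    }


def to_export_lines(env: dict) -> list:
    """Render the variables as shell export statements."""
    return [f"export {key}={value}" for key, value in env.items()]


def is_utf8_collation(dbs_collate: str) -> bool:
    """Step 5: dbs_collate in sysmaster:sysdbslocale should read zh_cn.57372."""
    return dbs_collate.strip().lower() == "zh_cn.57372"
```

For example, `print("\n".join(to_export_lines(informix_utf8_env())))` prints the three export lines from Step 4, which can then be added to the Informix user's shell profile.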
Next, use a terminal emulator that supports UTF-8, such as SecureCRT or PuTTY-CJK, and enable its UTF-8 option in the session settings. You can then use dbaccess or other Informix tools to query and insert UTF-8 data conveniently.
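Why the terminal setting matters can be shown with a simplified model: the database returns UTF-8 bytes, and the terminal renders them using whatever encoding it is configured for. The sketch below (hypothetical helper `render_as`, an illustration rather than how any particular terminal is implemented) decodes the same bytes as UTF-8, GB18030, and ISO 8859-1; only the first reproduces the original text.

```python
def render_as(data: bytes, terminal_encoding: str) -> str:
    """Decode bytes the way a terminal set to terminal_encoding would display them."""
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return data.decode(terminal_encoding, errors="replace")


stored = "中文".encode("utf-8")            # bytes as returned by a UTF-8 database
print(render_as(stored, "utf-8"))          # correct: terminal matches the data
print(render_as(stored, "gb18030"))        # garbled Chinese-looking characters
print(render_as(stored, "iso-8859-1"))     # six unrelated Latin-1 characters
```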
As one of the mainstream OLTP databases on the market, Informix has long offered good support for Unicode character sets. We therefore have reason to believe that Informix databases, with their good Unicode support, will play an increasingly important role in enterprise data management, data analysis, and data integration.