A practical way to provide Unicode support for DB2 UDB for Linux, UNIX, and Windows

Source: Internet
Author: User


Introduction



Today's applications are often designed for international use. These applications may need to handle strings in different languages. Unicode is a language-independent character representation standard.



Because the Java programming language already uses Unicode internally to represent characters, developing internationalized applications in Java is much easier. However, the application side is only part of the picture: the back-end database must also be able to handle Unicode characters. This article discusses several topics that help developers build internationalized DB2 UDB applications.



What Unicode standards are supported in DB2?



There is only one Unicode standard, but there are different Unicode character encoding schemes. The most common Unicode character encodings are UTF-8 and UCS-2:



UTF-8: Uses 1 to 4 bytes to represent each character. ASCII characters are encoded in one byte, and non-ASCII characters are encoded in multiple bytes (up to 4).



UCS-2: Each Unicode character is encoded in 2 bytes. This encoding scheme can represent more than 65,000 characters, which covers most characters of the world's most widely used languages. Java's 16-bit char type is based on the same 2-byte representation.
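
The size difference between the two schemes can be checked with a short sketch. This is only an illustration: the function name encoded_sizes is hypothetical, and because Python has no codec named UCS-2, the sketch assumes big-endian UTF-16 as a stand-in, which produces the same bytes for characters that fit in 2 bytes.

```python
def encoded_sizes(ch):
    """Return (UTF-8 byte count, UCS-2 byte count) for one 2-byte-range character."""
    if len(ch) != 1 or ord(ch) > 0xFFFF:
        raise ValueError("expected a single character representable in UCS-2")
    # Assumption: for these characters, big-endian UTF-16 output equals the UCS-2 form.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for ch in ("A", "é", "中"):
    print(ch, encoded_sizes(ch))
# A (1, 2)
# é (2, 2)
# 中 (3, 2)
```

ASCII text is therefore smaller in UTF-8, while many Asian characters take 3 bytes in UTF-8 but only 2 in UCS-2.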



The complete Unicode standard defines more characters than the roughly 65,000 that UCS-2 can represent. Characters outside this range are mostly ones that are no longer in use or have limited use; for example, some Asian characters are used only in names. These extra characters, known as supplementary characters, can be represented in UTF-8 with 4 bytes. Another encoding scheme, UTF-16, can also represent supplementary characters; it does so with two 2-byte units (4 bytes in total).
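
As a rough illustration of the paragraph above, the sketch below encodes one character from outside the 2-byte range (U+20000, a rarely used CJK ideograph chosen here only as an example) and shows that it occupies 4 bytes in UTF-8 and two 2-byte units in UTF-16. The helper name utf16_code_units is hypothetical.

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    # Each code unit is one unsigned big-endian 16-bit value.
    return list(struct.unpack(f">{len(data) // 2}H", data))


supplementary = "\U00020000"  # example character beyond the 2-byte range

print(len(supplementary.encode("utf-8")))                  # 4
print([hex(u) for u in utf16_code_units(supplementary)])   # ['0xd840', '0xdc00']
print(utf16_code_units("A"))                               # [65]
```

The two units returned for U+20000 form what Unicode calls a surrogate pair; code that assumes one 2-byte unit per character will miscount such strings.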



DB2 UDB supports only the UTF-8 and UCS-2 encodings. Although DB2 does not support supplementary characters as such, they can still be stored in DB2 UDB. Note that the more than 65,000 characters available are sufficient for most applications. Processing supplementary characters in a Java application requires a special mechanism, so handling them requires application support as well as database support.



How to use Unicode encoding schemes to store data in DB2



You can store Unicode data in a DB2 Unicode database. In a Unicode database, all tables use Unicode encoding schemes to store character data. DB2 also allows character data to be stored in Unicode format in Unicode tables within non-Unicode databases.



Defining a Unicode Database



A Unicode database stores character data in Unicode format; this concerns storage, not how the data is displayed. When you create a database, DB2 determines the code page for the database. The code page can be determined implicitly or explicitly.



DB2 UDB for Linux, UNIX, and Windows can determine the code page implicitly from the environment. When a database is created with the CREATE DATABASE command, its code page is derived from the locale setting of the operating system. On Windows operating systems, the code page is derived from the ANSI code page setting in the registry. On UNIX operating systems, it is derived from the locale, which includes the language, territory, and code set. Note that the registry variable DB2CODEPAGE can be used to override the code page. However, setting DB2CODEPAGE to an incorrect value can cause unpredictable results and potential data loss.
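
To make the three parts of a UNIX locale concrete, here is a minimal sketch that splits a POSIX-style locale name of the form language[_territory][.codeset][@modifier]. The function name parse_posix_locale is hypothetical, and the sketch only illustrates the naming convention; it does not reproduce how DB2 itself maps a locale to a code page.

```python
def parse_posix_locale(name):
    """Split a POSIX locale name into (language, territory, codeset).

    Missing parts are returned as None; an @modifier suffix is ignored.
    """
    base, _, _modifier = name.partition("@")
    lang_territory, _, codeset = base.partition(".")
    language, _, territory = lang_territory.partition("_")
    return language, territory or None, codeset or None


print(parse_posix_locale("en_US.UTF-8"))            # ('en', 'US', 'UTF-8')
print(parse_posix_locale("de_DE.ISO8859-1@euro"))   # ('de', 'DE', 'ISO8859-1')
print(parse_posix_locale("ja_JP"))                  # ('ja', 'JP', None)
```

Checking a value such as the LANG environment variable this way before creating a database shows whether a code set is named at all, which is one reason to prefer the explicit form.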



You can also specify the code page explicitly with the USING CODESET clause of the CREATE DATABASE statement. A CREATE DATABASE statement with a USING CODESET UTF-8 clause indicates that the database can contain UTF-8 or UCS-2 encoded character data. For clarity, it is recommended that you define a Unicode database explicitly:



CREATE DATABASE db_name USING CODESET codeset
TERRITORY territory_name COLLATE USING collating_sequence



The codeset should be specified in uppercase characters, such as UTF-8. The territory_name is the territory code that provides region-specific support, and the collating sequence specifies how character data is compared.



For example:



CREATE DATABASE UCSAMPLE USING CODESET UTF-8 TERRITORY US
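
When such commands are generated by a script, the statement can be assembled programmatically. The following is a sketch under stated assumptions: the helper build_create_database is hypothetical, it only builds the command text shown above (it does not run it), and the character check is a precaution added for this example rather than a DB2 naming rule.

```python
import re


def build_create_database(db_name, codeset="UTF-8", territory="US", collate=None):
    """Assemble a CREATE DATABASE command string (hypothetical helper)."""
    values = [db_name, codeset, territory] + ([collate] if collate else [])
    for value in values:
        # Assumption: accept only letters, digits, '_' and '-' so that the
        # result stays a single, predictable command.
        if not re.fullmatch(r"[A-Za-z0-9_-]+", value):
            raise ValueError(f"unexpected characters in {value!r}")
    # The code set is written in uppercase, as the article recommends.
    parts = ["CREATE DATABASE", db_name,
             "USING CODESET", codeset.upper(),
             "TERRITORY", territory]
    if collate:
        parts += ["COLLATE USING", collate]
    return " ".join(parts)


print(build_create_database("UCSAMPLE", "utf-8", "US"))
# CREATE DATABASE UCSAMPLE USING CODESET UTF-8 TERRITORY US
```

The returned string could then be passed to whatever mechanism is used to run DB2 commands in a given environment.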




