When you store multilingual data in a DB2 UDB database, Unicode is often the only encoding that adapts to the entire range of data. DB2 can store and process Unicode data, but its sorting function is limited to binary ordering and three cultural sorts. In addition, a database can have only one collation, which is determined when the database is created. These sorting restrictions are an obvious challenge for databases that need to support users and data in multiple languages.
IBM created a function library for managing Unicode data, called the ICU (International components for Unicode). This library provides a complete set of functions to manipulate Unicode data for the C and Java™ programming languages. This library is published under an unrestricted open source license, which makes it available for use in many applications.
The ICU provides functions that implement the standard Unicode collation algorithm (UCA) as well as many cultural tuning functions. It is easy to combine the appropriate ICU functions into a single UDF (user-defined function) so that DB2 can take full advantage of the ICU sort support. (literally, DB2 for Linux, UNIX, and Windows uses the ICU library to sort the three cultures supported on Unicode data.) )
This article focuses on DB2 V8 for Linux, UNIX, and Windows, but this UDF example can also be used on upcoming Viper. This UDF should also be able to be used on DB2 V7 FP3 or later, but this is not tested. In addition, this UDF may be able to be used on DB2 for zseries® and iseries™, but this is not tested and this article does not discuss these platforms.
Installing the sample Database
This article assumes that the database you are using is called SAMPLE, but you can use any database. Regardless of which database you use, it must be a Unicode database. To determine whether a database is a Unicode database, execute the following command:
DB2 get Database configuration for sample
The Database code page entry at the top of the configuration information must be "Utf-8". If you do not have a Unicode database available, you can create one with the following command:
DB2 CREATE DATABASE sample using CodeSet Utf-8 territory CA
This article provides binary code for 32-bit Intel and AMD platforms. The following describes how to install pre-built binaries on Linux and Windows. For all other platforms, instructions are provided in compiling the ICU and UDF.
Pre-built binary code for Linux (32-bit Intel or AMD)
To install a pre-built binary code on Linux:
Download the file Sortkey-linux-x86.zip from download and extract it into a temporary directory. The following files:
sortkey
libicui18n.so.34
libicuuc.so.34
libicudata.so.34
Copied to the Sqllib/function.
Connect to the database and run the DDL script to catalog this function:
db2 connect to sample
db2 -tvf createfn.db2
Note: This Linux pre-built binaries are compiled with GCC 2.96 on Red Hat Linux 7.2.
Pre-built binary code for Windows (32-bit)
To install a pre-built binary code on Windows:
Download the file Sortkey-windows-32.zip from download and extract it into a temporary directory.
The following files:
Sortkey.dll
Icuin34.dll
Icuuc34.dll
Icudt34.dll
Copied to the Sqllib/function.
Use a DB2 command window to connect to the database and run the DDL script to catalog the function:
DB2 Connect to sample
DB2-TVF CREATEFN.DB2