Wrong codepoints for non-ascii characters inserted in UTF-8 database using CLP

Last Update:2017-08-02 Source: Internet

Author: User

Tags db2 locale ibm support

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Technote (troubleshooting) problem (Abstract)

During Insert from the CLP there was no codepage conversion if operating system codepage and database codepage are both UTF -8. In the case data to is inserted should also is in UTF-8 encoding.
If data has a different encoding and the database codepage (this can is verified using any hex editor) and then the Operatin G system codepage should is changed to match the data's encoding in order to enforce the data conversion to the database C Odepage.

Symptom

Error executing Select SQL statement. Caught by Java.io.CharConversionException. errorcode=-4220

caused by:java.nio.charset.MalformedInputException:Input length = 4759 at com.ibm.db2.jcc.b.u.a (u.java:19) at COM.IBM.DB2.JCC.B.BC.A (bc.java:1762)

Cause

During an insert of data using CLP characters, they does not go through codepage conversion. IF operating system and database codepage both is UTF-8, but the data to was inserted is not Unicode, then data in the DAT Abase might has incorrect codepoints (Not-unicode) and the above error would be a result during data retrieval.

To verify the encoding for data to is inserted you can use any editor that shows hex representation of characters. Verify the codepoints for non-ascii characters so you try to insert. If you see only 1 bytes per Non-ascii characters then you need to force the database conversion during insert from CLP to U TF-8 database.

To force codepage conversion during inserts from the CLP make sure, the operating system codepage is Non-unicode and MA Tching to the codepage of data when you insert data to Unicode database from Non-unicode data source.

problem Details An example problem scenario is as follows:

Create a database of type UTF-8:
CREATE DATABASE <db> USING codeset utf-8 TERRITORY US
Create a table that holds character data:
CREATE TABLE Test (col char (20))
Check Operating system locale:
Locale Lang=en_us. UTF-8 lc_ctype= "en_US. UTF-8 "
Insert the non-ascii characters '? ', ' 3 ', '? ' which has codepoint 0x ' C3 ', 0x ' B3 ', 0x ' A9 ' in codepage 819 into the table:
INSERT into Test VALUES ('? ') INSERT into test values (' 3 ') insert into test values ('? ')
By running the following statement, you can see that all inserts statements caused only one byte to being inserted into the TA ble
SELECT Col, HEX (col) from test
? C3 3 B3? A9
However, the UTF-8 representation of those characters are:0x ' C383 ' for '? ', 0x ' c2b3 ' for ' 3 ', and 0x ' c2a9 ' for '? '. So these three rows in the table contain invalid characters in UTF-8.
When selecting from a column using the JDBC application, the following error would occur. This is expected because the table contains invalid UTF-8 data:error executing Select SQL statement. Caught by Java.io.CharConversionException. ERRORCODE=-4220 caused by:java.nio.charset.MalformedInputException:Input length = 4759 at Com.ibm.db2.jcc.b.u.a ( u.java:19) at COM.IBM.DB2.JCC.B.BC.A (bc.java:1762)
Delete all rows with incorrect Unicode codepoints from the test table:delete * from test
The change from the locale to one, matching codepage of data to be Inserted:export Locale=en_us. One of the the-determine the codepage for your data can is found Here:http://www.codeproject.com/articles/17201/detect -encoding-for-in-and-outgoing-text. IF you prepare data yourself using some editor please check the documentation for your editor to find out how to set up th e codepage for data being prepared by the editor.
Insert data to the Table:insert into Test VALUES ('? ') INSERT into test values (' 3 ') insert into test values ('? ')
Verify that inserted data were converted to UTF-8 during Insert:select Col, HEX (col) from test
? C383 3 c2b3? C2a9
Run your Java application selecting Unicode data. No exception should be reported.

Environment

UNIX, Linux, Unicode database

Diagnosing the problem

Verify that NON-ASCII data has a proper Unicode codepoints in Unicode database

Resolving the problem

Reinsert data with codepage conversion enforced by setting the operation system codepage matching to the codepage of data To be inserted

Related information

Export Data:

Community Questions and discussion

By adding a comment, we accept our Terms of use. Your comments entered on this IBM support site does not represent the views or opinions of IBM. IBM, in it sole discretion, reserves the right-to-remove any comments from this site. IBM is isn't responsible for, and does isn't validate or confirm, the correctness or accuracy of any comments you post. IBM does not endorse any of your comments. All IBM comments was provided "as is" and was not a warranted by the IBM in any.

Wrong codepoints for non-ascii characters inserted in UTF-8 database using CLP

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More