Wrong codepoints for non-ASCII characters inserted in UTF-8 database using CLP


Technote (troubleshooting)

Problem (Abstract)

During an insert from the DB2 command line processor (CLP), no codepage conversion takes place if the operating system codepage and the database codepage are both UTF-8. In this case the data to be inserted must also be UTF-8 encoded.
If the data has a different encoding than the database codepage (this can be verified with any hex editor), change the operating system codepage to match the data's encoding in order to force conversion to the database codepage.
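
The rule above can be pictured with a small Python model. This is only a sketch of the behaviour the technote describes, not DB2's actual implementation: the function `bytes_stored` and its parameters are hypothetical, and ISO-8859-1 (codepage 819) is assumed as the encoding of the data.

```python
import codecs


def bytes_stored(data: bytes, client_codepage: str, db_codepage: str = "utf-8") -> bytes:
    """Hypothetical model: return the bytes that would end up in the database."""
    # Compare canonical codec names so that aliases such as "utf8" also match.
    same = codecs.lookup(client_codepage).name == codecs.lookup(db_codepage).name
    if same:
        # Client and database codepages match: bytes pass through unconverted.
        return data
    # Codepages differ: interpret the bytes in the client codepage and
    # re-encode them in the database codepage.
    return data.decode(client_codepage).encode(db_codepage)


latin1_data = "\u00c3".encode("iso-8859-1")  # the single byte 0xC3

# UTF-8 locale, UTF-8 database: the ISO-8859-1 byte is stored unchanged.
print(bytes_stored(latin1_data, "utf-8").hex())       # c3
# ISO-8859-1 locale, UTF-8 database: the byte is converted to UTF-8.
print(bytes_stored(latin1_data, "iso-8859-1").hex())  # c383
```

In the first call the lone byte 0xC3 is not well-formed UTF-8, which is the situation that later surfaces as a conversion error on retrieval.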

Symptom

Error executing Select SQL statement. Caught by java.io.CharConversionException. ERRORCODE=-4220

Caused by: java.nio.charset.MalformedInputException: Input length = 4759 at com.ibm.db2.jcc.b.u.a(u.java:19) at com.ibm.db2.jcc.b.bc.a(bc.java:1762)


Cause

During an insert of data using the CLP, characters do not go through codepage conversion when the operating system codepage and the database codepage are both UTF-8. If the data to be inserted is not actually Unicode, the data in the database may end up with incorrect (non-Unicode) codepoints, and the error above occurs when the data is retrieved.

To verify the encoding of the data to be inserted, you can use any editor that shows the hex representation of characters. Check the codepoints of the non-ASCII characters that you are trying to insert. If you see only 1 byte per non-ASCII character, you need to force codepage conversion during the insert from the CLP into the UTF-8 database.
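
If no hex editor is at hand, the same check can be approximated in a few lines of Python. The helper name `non_ascii_report` and the sample string are illustrative assumptions; the function lists every byte above 0x7F and reports whether the input decodes as strict UTF-8.

```python
def non_ascii_report(data: bytes):
    """Return (is_valid_utf8, [(offset, hex_byte), ...]) for the non-ASCII bytes."""
    high_bytes = [(i, f"{b:02X}") for i, b in enumerate(data) if b >= 0x80]
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
        valid = True
    except UnicodeDecodeError:
        valid = False
    return valid, high_bytes


# "x" followed by U+00B3, saved as ISO-8859-1: one byte for the non-ASCII character.
print(non_ascii_report("x\u00b3".encode("iso-8859-1")))  # (False, [(1, 'B3')])
# The same text saved as UTF-8: two bytes for the non-ASCII character.
print(non_ascii_report("x\u00b3".encode("utf-8")))       # (True, [(1, 'C2'), (2, 'B3')])
```

This is a heuristic rather than proof: some single-byte data happens to form valid UTF-8 sequences, so a result of True should still be compared against the expected characters.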

To force codepage conversion during inserts from the CLP, make sure that the operating system codepage is non-Unicode and matches the codepage of the data when you insert data from a non-Unicode data source into a Unicode database.
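
Before starting the CLP it can help to confirm which codeset the current session actually uses. The following sketch relies on Python's standard `locale` and `codecs` modules on a Unix-like system; the helper names and the value of `data_encoding` are assumptions made for illustration, and the script must run in the same shell session that will launch the CLP.

```python
import codecs
import locale


def same_codeset(a: str, b: str) -> bool:
    """Compare two codeset names after normalising aliases (e.g. utf8 vs UTF-8)."""
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        return False  # a name Python does not know cannot be compared


def session_codeset() -> str:
    """Codeset of the current locale as derived from LANG / LC_* (Unix only)."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")  # adopt the environment's locale
    except locale.Error:
        pass  # unsupported locale name in the environment: keep the default
    return locale.nl_langinfo(locale.CODESET)


data_encoding = "ISO-8859-1"  # assumption: encoding of the data to be inserted
current = session_codeset()
if same_codeset(current, data_encoding):
    print(f"locale codeset {current} matches the data encoding")
else:
    print(f"locale codeset {current} differs from {data_encoding}")
```

If the two differ and the session codeset is UTF-8, the situation described in this technote applies.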

Problem Details

An example problem scenario is as follows:

  1. Create a database of type UTF-8:
    CREATE DATABASE <db> USING CODESET UTF-8 TERRITORY US
  2. Create a table that holds character data:
    CREATE TABLE test (col CHAR(20))
  3. Check the operating system locale:
    locale
    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
  4. Insert the non-ASCII characters 'Ã', '³', '©', which have the codepoints 0xC3, 0xB3, 0xA9 in codepage 819, into the table:
    INSERT INTO test VALUES ('Ã')
    INSERT INTO test VALUES ('³')
    INSERT INTO test VALUES ('©')
  5. By running the following statement, you can see that each INSERT statement caused only one byte to be inserted into the table:
    SELECT col, HEX(col) FROM test
    Ã C3
    ³ B3
    © A9
    However, the UTF-8 representations of those characters are 0xC383 for 'Ã', 0xC2B3 for '³', and 0xC2A9 for '©'. So these three rows in the table contain invalid characters in UTF-8.
  6. When selecting from the column using a JDBC application, the following error occurs. This is expected because the table contains invalid UTF-8 data:
    Error executing Select SQL statement. Caught by java.io.CharConversionException. ERRORCODE=-4220
    Caused by: java.nio.charset.MalformedInputException: Input length = 4759 at com.ibm.db2.jcc.b.u.a(u.java:19) at com.ibm.db2.jcc.b.bc.a(bc.java:1762)
  7. Delete all rows with incorrect Unicode codepoints from the test table:
    DELETE FROM test
  8. Change the locale to one that matches the codepage of the data to be inserted:
    export LANG=en_US
    One way to determine the codepage of your data is described here: http://www.codeproject.com/articles/17201/detect-encoding-for-in-and-outgoing-text
    If you prepare the data yourself using an editor, check the documentation of your editor to find out how to set the codepage for the data it produces.
  9. Insert the data into the table:
    INSERT INTO test VALUES ('Ã')
    INSERT INTO test VALUES ('³')
    INSERT INTO test VALUES ('©')
  10. Verify that the inserted data was converted to UTF-8 during the insert:
    SELECT col, HEX(col) FROM test
    Ã C383
    ³ C2B3
    © C2A9
  11. Run your Java application selecting Unicode data. No exception should be reported.
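
The hex values quoted in the scenario can be cross-checked independently of DB2. The short Python sketch below encodes the three characters both ways; the function name `scenario_encodings` is hypothetical, and Python's `iso-8859-1` codec is used as the equivalent of codepage 819.

```python
def scenario_encodings(chars: str = "\u00c3\u00b3\u00a9"):
    """Return (codepoint, codepage-819 hex, UTF-8 hex) for each character."""
    rows = []
    for ch in chars:
        cp819 = ch.encode("iso-8859-1").hex().upper()  # one byte per character
        utf8 = ch.encode("utf-8").hex().upper()        # two bytes per character
        rows.append((f"U+{ord(ch):04X}", cp819, utf8))
    return rows


for codepoint, cp819, utf8 in scenario_encodings():
    print(f"{codepoint}  codepage 819: {cp819}  UTF-8: {utf8}")
# U+00C3  codepage 819: C3  UTF-8: C383
# U+00B3  codepage 819: B3  UTF-8: C2B3
# U+00A9  codepage 819: A9  UTF-8: C2A9
```

The single-byte column corresponds to the HEX output of the failed inserts, and the two-byte column to the output after the locale was changed.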

Environment

UNIX, Linux, Unicode database

Diagnosing the problem

Verify that non-ASCII data has proper Unicode codepoints in the Unicode database.
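
One possible way to automate this check is to collect the output of HEX(col) for the suspect rows and test each value for well-formed UTF-8. The sketch below assumes the hex strings have already been fetched by some other means; the function name `invalid_utf8_rows` and the sample values are illustrative only.

```python
def invalid_utf8_rows(hex_values):
    """Return the indexes of HEX(col) strings that are not well-formed UTF-8."""
    bad = []
    for index, hex_value in enumerate(hex_values):
        raw = bytes.fromhex(hex_value)
        try:
            raw.decode("utf-8")  # strict decoding, no replacement characters
        except UnicodeDecodeError:
            bad.append(index)
    return bad


# Assumption: CHAR(20) values are blank-padded, so the hex ends in a run of 20s.
sample = [
    "C3" + "20" * 19,    # lone lead byte followed by blanks
    "C383" + "20" * 18,  # the same character stored as proper UTF-8
    "A9" + "20" * 19,    # lone continuation byte followed by blanks
]
print(invalid_utf8_rows(sample))  # [0, 2]
```

Rows reported here are candidates for deletion and reinsertion with conversion enabled.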

Resolving the problem

Reinsert the data with codepage conversion enforced by setting the operating system codepage to match the codepage of the data to be inserted.

Related information

Export Data:
