Technote (troubleshooting) problem (Abstract)
During Insert from the CLP there was no codepage conversion if operating system codepage and database codepage are both UTF -8. In the case data to is inserted should also is in UTF-8 encoding.
If data has a different encoding and the database codepage (this can is verified using any hex editor) and then the Operatin G system codepage should is changed to match the data's encoding in order to enforce the data conversion to the database C Odepage.
Symptom
Error executing Select SQL statement. Caught by Java.io.CharConversionException. errorcode=-4220
caused by:java.nio.charset.MalformedInputException:Input length = 4759 at com.ibm.db2.jcc.b.u.a (u.java:19) at COM.IBM.DB2.JCC.B.BC.A (bc.java:1762)
Cause
During an insert of data using CLP characters, they does not go through codepage conversion. IF operating system and database codepage both is UTF-8, but the data to was inserted is not Unicode, then data in the DAT Abase might has incorrect codepoints (Not-unicode) and the above error would be a result during data retrieval.
To verify the encoding for data to is inserted you can use any editor that shows hex representation of characters. Verify the codepoints for non-ascii characters so you try to insert. If you see only 1 bytes per Non-ascii characters then you need to force the database conversion during insert from CLP to U TF-8 database.
To force codepage conversion during inserts from the CLP make sure, the operating system codepage is Non-unicode and MA Tching to the codepage of data when you insert data to Unicode database from Non-unicode data source.
problem Details An example problem scenario is as follows:
- Create a database of type UTF-8:
CREATE DATABASE <db> USING codeset utf-8 TERRITORY US
- Create a table that holds character data:
CREATE TABLE Test (col char (20))
- Check Operating system locale:
Locale Lang=en_us. UTF-8 lc_ctype= "en_US. UTF-8 "
- Insert the non-ascii characters '? ', ' 3 ', '? ' which has codepoint 0x ' C3 ', 0x ' B3 ', 0x ' A9 ' in codepage 819 into the table:
INSERT into Test VALUES ('? ') INSERT into test values (' 3 ') insert into test values ('? ')
- By running the following statement, you can see that all inserts statements caused only one byte to being inserted into the TA ble
SELECT Col, HEX (col) from test
? C3 3 B3? A9
However, the UTF-8 representation of those characters are:0x ' C383 ' for '? ', 0x ' c2b3 ' for ' 3 ', and 0x ' c2a9 ' for '? '. So these three rows in the table contain invalid characters in UTF-8.
- When selecting from a column using the JDBC application, the following error would occur. This is expected because the table contains invalid UTF-8 data:error executing Select SQL statement. Caught by Java.io.CharConversionException. ERRORCODE=-4220 caused by:java.nio.charset.MalformedInputException:Input length = 4759 at Com.ibm.db2.jcc.b.u.a ( u.java:19) at COM.IBM.DB2.JCC.B.BC.A (bc.java:1762)
- Delete all rows with incorrect Unicode codepoints from the test table:delete * from test
- The change from the locale to one, matching codepage of data to be Inserted:export Locale=en_us. One of the the-determine the codepage for your data can is found Here:http://www.codeproject.com/articles/17201/detect -encoding-for-in-and-outgoing-text. IF you prepare data yourself using some editor please check the documentation for your editor to find out how to set up th e codepage for data being prepared by the editor.
- Insert data to the Table:insert into Test VALUES ('? ') INSERT into test values (' 3 ') insert into test values ('? ')
- Verify that inserted data were converted to UTF-8 during Insert:select Col, HEX (col) from test
? C383 3 c2b3? C2a9
- Run your Java application selecting Unicode data. No exception should be reported.
Environment
UNIX, Linux, Unicode database
Diagnosing the problem
Verify that NON-ASCII data has a proper Unicode codepoints in Unicode database
Resolving the problem
Reinsert data with codepage conversion enforced by setting the operation system codepage matching to the codepage of data To be inserted
Related information
Export Data:
Community Questions and discussion
By adding a comment, we accept our Terms of use. Your comments entered on this IBM support site does not represent the views or opinions of IBM. IBM, in it sole discretion, reserves the right-to-remove any comments from this site. IBM is isn't responsible for, and does isn't validate or confirm, the correctness or accuracy of any comments you post. IBM does not endorse any of your comments. All IBM comments was provided "as is" and was not a warranted by the IBM in any.
Wrong codepoints for non-ascii characters inserted in UTF-8 database using CLP