Troubleshooting a failed insert of special characters into a database with the DbUnit framework

Source: Internet
Author: User
Tags: cdata, control characters, xml parser

This article records how I tracked down an error that occurred when inserting special characters into a database with the DbUnit test framework. I hope it gives some ideas to beginners like me when they run into similar problems.
In the unit test of a database interaction module, the ext field of a database table has to be written first so that it can then be read and processed. The format of the ext field is key1 Ctrl^D value1 Ctrl^C key2 Ctrl^D value2. The unit tests use DbUnit, a database testing framework that extends JUnit. In this project, the data to be inserted into the database is organized in XML format. Part of the XML file is shown below:

<?xml version="1.0" encoding="UTF-8"?>
<dataset>
    <feed_item_0007 ID="323723909" Biz_number="11223345" ext="item_price\u0003498.00\u0002postage\u000310.00\u0002location\u0003 guangzhou shenzhen \u0002properties\u0003186026840 : 125200612;6939376:922305\u0002 "/>
</dataset>

In Java, Ctrl^D is (char) 4 and Ctrl^C is (char) 3. As a quick reminder, the string \u0003 in Java source code is a Unicode escape: it stands for the single character whose value is (char) 3. In the unit test, I found that the data read back from the database was wrong. What should have been the single character Ctrl^C, that is \u0003, had turned into six characters: \, u, 0, 0, 0, 3.
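The difference between the two forms can be seen in a few lines of Java. This is only a small sketch; the class and constant names are made up for illustration. The Java compiler turns the escape into one control character, whereas the same six characters sitting in a text file (simulated here with a doubled backslash) stay six characters, because nothing outside the Java compiler interprets them.

```java
class UnicodeEscapeDemo {
    // One character: the compiler replaces the escape with the control character Ctrl^C.
    static final String COMPILED = "\u0003";

    // Six characters: the doubled backslash stops the compiler from treating this as an escape,
    // so the string holds the same visible text that sits in the XML data file.
    static final String RAW_TEXT = "\\u0003";

    public static void main(String[] args) {
        System.out.println(COMPILED.length());        // 1
        System.out.println((int) COMPILED.charAt(0)); // 3
        System.out.println(RAW_TEXT.length());        // 6
    }
}
```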

The troubleshooting process:

Since the value read from the database was wrong, the value inserted must have been wrong. At first, because I was not very familiar with Java and had not looked at the packages imported by the unit test, I did not realize that JUnit and DbUnit were being used. As a result I spent a while blindly looking for a problem in DbunitBaseTest, the unit test base class developed by a colleague, assuming it was a transcoding issue. That led nowhere. Only after reading the source of the base class did I learn that DbUnit existed at all, and I found that the base class does nothing special: it initializes a DataSource from a configuration file and then inserts the data from the XML data file into the corresponding tables in the database.

Next I searched Google with keywords such as dbunit \u0003. These were too specific, so I found little useful information, and I struggled to come up with any idea for solving the problem.
Later a colleague suggested that CDATA might help, so I looked up how CDATA is used, which gave me some ideas.

"CDATA is a keyword used in an XML document to tell the XML parser that this part is not to be parsed and is intended for other applications, such as JavaScript. All text in an XML document is parsed by the parser; only the text inside a CDATA section is ignored by the parser."

Later, on one of the pages from those search results, I found something useful: the numeric character reference.

Because XML syntax uses some characters for tags and attributes, it isn't possible to directly use those characters inside XML tags or attribute values. To include special characters inside XML files you must use the numeric character reference instead of that character. The numeric character reference must be UTF-8 because the supported encoding for XML files is defined in the prolog as encoding="UTF-8" and should not be changed.

The numeric character reference uses the format:

&#nn; decimal form

&#xhh; hexadecimal form

Based on this, I tried the following options:
1. Write \u0004 directly as a numeric character reference, that is, &#4;. This failed with the error: Character reference "&#4" is an invalid XML character.
2. Replace only the backslash of \u0004 with a numeric character reference, that is, &#92;u0004. This still produces 6 characters, the same result as writing \u0004 directly.
3. Use CDATA: ext=<![CDATA["postage\u000410.0\u0003"]]>. This way of writing it is probably not correct; the error reported was that the XML is not well formed.
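The three attempts above can be reproduced outside DbUnit with the JAXP parser that ships with the JDK. The sketch below is an illustration under assumptions: the element name row, the helper parseExt, and the shortened attribute values are invented here, and the parser DbUnit used in the original project may report slightly different messages. A reference to the tab character (&#9;) is included as a contrast, since tab is one of the few control characters XML 1.0 allows.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ControlCharAttempts {
    static final String HEAD = "<?xml version=\"1.0\"?>";

    /** Returns the ext attribute of the root element, or null if the parser rejects the XML. */
    static String parseExt(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("ext");
        } catch (SAXException e) {
            return null; // the document is not well-formed XML
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Contrast: a reference to tab is legal in XML 1.0 and yields one character.
        System.out.println(parseExt(HEAD + "<row ext=\"a&#9;b\"/>").length());       // 3
        // Option 1: a reference to character 4 is rejected by an XML 1.0 parser.
        System.out.println(parseExt(HEAD + "<row ext=\"a&#4;b\"/>"));                // null
        // Option 2: only the backslash is a reference, so six literal characters remain.
        System.out.println(parseExt(HEAD + "<row ext=\"a&#92;u0004b\"/>").length()); // 8
        // Option 3: a CDATA section cannot stand in for an attribute value.
        System.out.println(parseExt(HEAD + "<row ext=<![CDATA[\"a\"]]>/>"));         // null
    }
}
```

The JDK parser also prints a "[Fatal Error]" line to standard error for each rejected document; that is its default error handler, not part of the sketch.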

At this point I felt I was getting close to the truth, but every search was still about \u0004 specifically, which was too narrow to turn up an answer. Later, when I talked the problem over with colleagues, one of them mentioned that these control characters were not being encoded correctly, and that suddenly woke me up. I searched directly for whether XML supports control characters and found the following:

Specifically, 0x1-0x1f and 0x7f-0x9f must be encoded as escapes in XML 1.1. The former were forbidden and the latter were optionally not-escaped in 1.0.

This explains why option 1 failed: XML 1.0 does not support these control characters, so the parser reported that &#4 is an invalid XML character. According to the search result above, XML 1.1 does support these control characters, so I happily changed the XML version in the data file from 1.0 to 1.1. The result was another error:

org.dbunit.dataset.DataSetException: Line 1: XML version "1.1" is not supported, only XML 1.0 is supported.
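For comparison, the sketch below feeds the same character reference to a parser under both XML versions. It assumes the JAXP parser bundled with a current JDK, which to my knowledge accepts XML 1.1; the parser behind DbUnit in the project described here evidently did not, so this shows what the version switch was expected to achieve rather than something that worked in that setup. The class name, the helper parseExt, and the element name row are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class Xml11Demo {
    static final String BODY = "<row ext=\"a&#4;b\"/>";

    /** Returns the ext attribute of the root element, or null if the parser rejects the XML. */
    static String parseExt(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("ext");
        } catch (SAXException e) {
            return null; // rejected by the parser
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String v10 = parseExt("<?xml version=\"1.0\"?>" + BODY);
        String v11 = parseExt("<?xml version=\"1.1\"?>" + BODY);
        System.out.println(v10 == null ? "1.0: rejected" : "1.0: accepted");
        // With an XML 1.1 capable parser the reference becomes the single character Ctrl^D.
        System.out.println(v11 == null ? "1.1: rejected" : "1.1: accepted, length " + v11.length());
    }
}
```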

In the end I settled on a simple, crude solution: for this field of the table, use the DataSource directly and execute an SQL statement through a Statement to update the field to the value we want.
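A minimal sketch of that workaround might look like the following. Several things here are assumptions rather than the project's actual code: the class and method names are hypothetical, the table and column names are taken from the XML sample above, the separators follow the key Ctrl^D value Ctrl^C format described at the start, and a PreparedStatement is used in place of a plain Statement so that the control characters are passed as a bound parameter instead of being spliced into the SQL text.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;
import javax.sql.DataSource;

class ExtFieldFixup {
    static final char KV_SEP = (char) 4;   // Ctrl^D, between a key and its value
    static final char PAIR_SEP = (char) 3; // Ctrl^C, between two key/value pairs

    /** Joins the pairs into the ext format using real control characters. */
    static String buildExt(Map<String, String> pairs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            if (sb.length() > 0) {
                sb.append(PAIR_SEP);
            }
            sb.append(e.getKey()).append(KV_SEP).append(e.getValue());
        }
        return sb.toString();
    }

    /** Overwrites the ext column of one row; returns the number of rows updated. */
    static int updateExt(DataSource dataSource, long id, String ext) throws SQLException {
        String sql = "UPDATE feed_item_0007 SET ext = ? WHERE ID = ?"; // names assumed from the XML sample
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, ext);
            ps.setLong(2, id);
            return ps.executeUpdate();
        }
    }
}
```

In a test, updateExt would be called after DbUnit has inserted the row from the XML file, with the DataSource that the base class already initializes.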

PS: Later I ran into another problem: when a Java properties file contains Chinese text, the program reads it back garbled. Looking at the code of DbunitBaseTest, the unit test base class written by my colleague, I found that the properties file is loaded through the Properties class with prop.load(new FileInputStream(file)). I looked up the documentation of the load method of the Properties class and found the cause:

The input stream is in a simple line-oriented format as specified in load(Reader) and is assumed to use the ISO 8859-1 character encoding.
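The effect can be reproduced without a real file. In the sketch below (class and method names are made up, and the in-memory byte array stands in for a properties file saved as UTF-8), loading through an InputStream decodes each byte as one ISO 8859-1 character and garbles the value, while loading through a Reader with an explicit charset is one possible way to get the original text back.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesEncodingDemo {
    // The two escapes are the Chinese characters for "Guangzhou"; each takes 3 bytes in UTF-8.
    static final String CITY = "\u5e7f\u5dde";
    static final byte[] FILE_BYTES = ("city=" + CITY + "\n").getBytes(StandardCharsets.UTF_8);

    /** Loads via load(InputStream): every byte is read as one ISO 8859-1 character. */
    static String loadFromStream() {
        try {
            Properties props = new Properties();
            props.load(new ByteArrayInputStream(FILE_BYTES));
            return props.getProperty("city");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads via load(Reader) with an explicit UTF-8 decoder. */
    static String loadFromReader() {
        try {
            Properties props = new Properties();
            props.load(new InputStreamReader(new ByteArrayInputStream(FILE_BYTES), StandardCharsets.UTF_8));
            return props.getProperty("city");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadFromStream().length());      // 6, one garbled char per byte
        System.out.println(loadFromReader().equals(CITY));  // true
    }
}
```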
