DBCoffer and Oracle Character Set

Source: Internet
Author: User

Introduction

Database Coffer (DBCoffer) is an Oracle data security enhancement product built on the Oracle extension mechanism. It offers strong data security, complete transparency to applications, and efficient access to ciphertext. DBCoffer can prevent external attacks that bypass the firewall, data theft by internal high-privilege users, and data leaks caused by stolen disks or tapes.

As a security enhancement product, DBCoffer inevitably operates on data stored inside Oracle, mainly to encrypt the data that needs protection. Because DBCoffer processes the data outside Oracle and then writes it back into the database, character set compatibility and character set conversion come into play. Dealing with the resulting problems calmly requires developers and testers to have some grounding in Oracle character sets. This article starts from the basic principles of character sets and then works through the problems step by step.

1 Oracle Character Set Overview

An Oracle character set is a collection of symbols used to interpret byte data. Character sets differ in size, and one character set can contain another. Oracle's National Language Support (NLS) architecture lets you store, process, and retrieve data in a local language, and it makes database tools, error messages, sort order, date, time, currency, number, and calendar conventions adapt automatically to the local language and platform.
The most important parameter that affects the character set of Oracle databases is the NLS_LANG parameter. The format is as follows:
NLS_LANG = language_territory.charset

It has three components (language, territory, and character set), each of which controls a subset of NLS features: language specifies the language of server messages, territory specifies the date and number formats, and charset specifies the character set. For example: AMERICAN_AMERICA.ZHS16GBK
As the composition of NLS_LANG shows, the part that actually affects the character set is the third one. The query "select userenv('language') from dual" returns a value of the form "language_territory.charset", in which charset is the character set of the currently connected database. The result can also serve as the basis for configuring the client character set.
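
The three-part structure of NLS_LANG can be illustrated with a small parser. This is only a sketch: the class name NlsLang and its fields are hypothetical, and it handles only the full "language_territory.charset" form, not the shortened forms Oracle also accepts.

```java
// Hypothetical helper (not part of Oracle or DBCoffer): splits an NLS_LANG
// value of the form language_territory.charset into its three components.
final class NlsLang {
    final String language;
    final String territory;
    final String charset;

    private NlsLang(String language, String territory, String charset) {
        this.language = language;
        this.territory = territory;
        this.charset = charset;
    }

    // Parses e.g. "AMERICAN_AMERICA.ZHS16GBK". Only the full form is handled.
    static NlsLang parse(String value) {
        int underscore = value.indexOf('_');   // separates language from territory
        int dot = value.lastIndexOf('.');      // separates territory from charset
        if (underscore < 0 || dot < underscore) {
            throw new IllegalArgumentException(
                "Expected language_territory.charset but got: " + value);
        }
        return new NlsLang(
            value.substring(0, underscore).trim(),
            value.substring(underscore + 1, dot).trim(),
            value.substring(dot + 1).trim());
    }
}
```

For instance, NlsLang.parse("AMERICAN_AMERICA.ZHS16GBK").charset yields "ZHS16GBK", the part that determines how bytes are interpreted.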

For data storage, two concepts matter: the database character set and the national character set. Both can be specified when Oracle is installed and the database is created. The database character set is used to store, process, and retrieve data in the CHAR, VARCHAR2, CLOB, LONG, and similar data types. The national character set is an additional character set chosen to strengthen Oracle's character handling, because the NCHAR data types can use the fixed-length multi-byte encodings common for Asian languages. It stores the NCHAR, NVARCHAR2, and NCLOB data types, and only the Unicode encodings AL16UTF16 and UTF8 can be selected for it. The default is AL16UTF16.

Because Oracle offers a wide variety of character sets, and because storing, retrieving, and migrating Oracle data all depend on character set settings, database developers and administrators often run into character set problems in practice.

 

2 Analysis of Common Character Set principles in Oracle

Early database systems had only one character set, ASCII. Because ASCII covers only a small number of characters, many other encoding schemes were introduced later, and most of them include ASCII. The Oracle character set US7ASCII is 7-bit ASCII. Describing every Oracle character set would be a large task, so the following covers only some common Oracle character set encodings.

2.1 single-byte encoding

Single-byte 7-bit character sets can contain 128 characters. The most common one is US7ASCII.

Single-byte 8-bit character sets can contain 256 characters and suit most Western European languages, for example WE8ISO8859P1 (Western Europe, 8-bit, ISO 8859-1 encoding).

2.2 multi-byte encoding

Variable-length multi-byte encoding. Some characters are represented by one byte and others by two or more bytes. This kind of encoding is commonly used for Asian languages such as Japanese, Chinese, and Hindi. Examples are AL32UTF8 (AL stands for ALL, meaning it applies to all languages) and ZHS16CGB231280.

Fixed-length multi-byte encoding. Every character is encoded with the same number of bytes. Currently the only fixed-length multi-byte encoding Oracle supports is AL16UTF16, and it can be used only as the national character set.

2.3 Unicode encoding

Unicode is a single encoding scheme that covers all known characters in use around the world; that is, it assigns a unique code to every character. UTF-16 is the 16-bit Unicode encoding. Oracle treats it as a fixed-length multi-byte encoding in which 2 bytes represent one character (characters outside the Basic Multilingual Plane take two such 2-byte units), and AL16UTF16 is the UTF-16 character set. UTF-8 is the 8-bit Unicode encoding, a variable-length multi-byte encoding that uses 1, 2, or 3 bytes for most characters and 4 bytes for supplementary characters in AL32UTF8. AL32UTF8 and UTF8 are UTF-8 character sets, and UTFE plays the same role on EBCDIC platforms.
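
The byte counts described in sections 2.1 to 2.3 can be checked from Java. The sketch below is illustrative only: the class name EncodedLength is hypothetical, the charset names are Java's names rather than Oracle's, and it assumes the JDK in use ships the GBK charset (standard full JDKs do).

```java
import java.nio.charset.Charset;

// Hypothetical helper: reports how many bytes a piece of text occupies
// in a given Java charset.
final class EncodedLength {
    // javaCharsetName is a Java charset name such as "US-ASCII", "GBK",
    // "UTF-8", or "UTF-16BE" (these are not Oracle character set names).
    static int byteLength(String text, String javaCharsetName) {
        return text.getBytes(Charset.forName(javaCharsetName)).length;
    }
}
```

With this helper, the letter "A" takes 1 byte in US-ASCII, GBK, and UTF-8, while the Chinese character "\u4e2d" takes 2 bytes in GBK, 3 bytes in UTF-8, and 2 bytes in UTF-16BE. UTF-16BE is used instead of Java's "UTF-16" because the latter prepends a byte order mark when encoding.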

When the encoding values of one character set (character set A) include all the encoding values of another (character set B), and every shared encoding value represents the same character in both, character set A is a superset of character set B, and character set B is a subset of character set A. Because US7ASCII is the earliest Oracle encoding, many character sets are supersets of it, for example WE8ISO8859P1, ZHS16CGB231280, and ZHS16GBK. Oracle's internal character set conversion guarantees correct results only when converting from a subset to a superset.
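
The superset definition can be tested mechanically for the 7-bit ASCII range. The following sketch uses Java's charset implementations as stand-ins for the Oracle character sets (an assumption; the two are not guaranteed to match byte for byte), and the class name AsciiSupersetCheck is hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical check: does a charset keep every 7-bit ASCII encoding value
// and assign it the same character? If so, it is a superset of ASCII in the
// sense defined above.
final class AsciiSupersetCheck {
    static boolean extendsAscii(String javaCharsetName) {
        Charset cs = Charset.forName(javaCharsetName);
        for (int b = 0; b < 128; b++) {
            // Decode the single byte b and compare with the ASCII character b.
            String decoded = new String(new byte[] {(byte) b}, cs);
            if (decoded.length() != 1 || decoded.charAt(0) != (char) b) {
                return false;
            }
        }
        return true;
    }
}
```

ISO-8859-1, GBK, and UTF-8 pass this check, whereas UTF-16BE does not, because a single byte is not a complete UTF-16 code unit.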

 

3 Communication Architecture Analysis Between DBCoffer and Oracle

With that background, we can look at the communication architecture between DBCoffer and Oracle, which shows where character set conversion can be involved. As Figure 1 shows, the relevant links are: application and Oracle client, external library and Oracle server, Oracle client and Oracle server, Oracle server and the Exp export, and the Imp import and Oracle server. The point to stress is that character set conversion may occur wherever something communicates directly with the Oracle server. In DBCoffer, an external library is called to communicate directly with DBCSecureService, which processes the data and sends it back to the database. This is only a small part of DBCoffer. The following subsections discuss the possible situations according to whether the two sides of the communication use the same character set.

Figure 1

 

3.1 Analysis of consistent character sets

If the Oracle client and the server use the same character set, no character set conversion takes place: whatever the client sends is stored in the database byte for byte. This is why Chinese data can also be stored in databases that use the 7-bit US7ASCII, the 8-bit WE8ISO8859P1, or the UTF8 character set. However, when these character sets are used for Chinese text, the corresponding client programs become much harder to develop and maintain, and garbled text is much more likely than with ZHS16GBK. The following example uses a Java thin connection to an Oracle database whose character set is WE8ISO8859P1.

A Java thin connection can also be regarded as an Oracle client, and the character set this "client" uses generally follows the Java default character set. Experiments show that when the Java thin driver connects to a server whose character set is ZHS16GBK or UTF8, DML statements execute and result sets display correctly with no character set conversion in the application. When the database character set is WE8ISO8859P1, problems appear. SQL statements to be executed, especially those containing Chinese data, must first be transcoded. The transcoding function is as follows:

import java.io.UnsupportedEncodingException;

public String changeCharset(String str, String newCharset, String oldCharset) throws UnsupportedEncodingException {
    if (str != null) {
        // Encode the string to bytes using oldCharset,
        // then reinterpret those bytes as text in newCharset.
        byte[] bs = str.getBytes(oldCharset);
        return new String(bs, newCharset);
    }
    return null;
}

Table 1
 
DML statements are converted from GBK to ISO8859-1 before they are sent, and result sets go the other way, from "ISO8859-1" to "GBK", so that Chinese characters display correctly. Why is no conversion needed when the server uses the UTF8 character set? The answer lies in the subset and superset relationship described above. UTF-8 is a variable-length Unicode encoding that can represent almost every script in the world, so in terms of character repertoire UTF8 covers everything in ZHS16GBK, and the driver's conversion between the two loses or changes no data. Here the default Java string encoding is "GBK" (the JVM default depends on the platform), which is why Java does not need to transcode SQL statements or result sets whether the database character set is ZHS16GBK or UTF8. It is also why some prompts in DBCStudio are garbled when the database character set is WE8ISO8859P1: the text must be transcoded before it is displayed.
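
The two directions of transcoding described above can be sketched as a round trip. This is a minimal illustration, not DBCoffer's or the JDBC driver's actual code: the class and method names are hypothetical, and it assumes the JDK provides the GBK charset. It works because Java's ISO-8859-1 maps every byte value 0-255 to exactly one character, so the GBK bytes survive the detour unchanged.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of carrying GBK bytes through an ISO-8859-1 channel.
final class Latin1RoundTrip {
    private static final Charset GBK = Charset.forName("GBK");

    // "GBK to ISO8859-1": encode the text as GBK bytes, then view each byte
    // as one ISO-8859-1 character. Used before sending a statement.
    static String toLatin1Carrier(String text) {
        return new String(text.getBytes(GBK), StandardCharsets.ISO_8859_1);
    }

    // "ISO8859-1 to GBK": turn the carrier characters back into the same
    // bytes and decode them as GBK. Used on values read from a result set.
    static String fromLatin1Carrier(String carrier) {
        return new String(carrier.getBytes(StandardCharsets.ISO_8859_1), GBK);
    }
}
```

A two-character Chinese string becomes a four-character carrier string, and fromLatin1Carrier(toLatin1Carrier(s)) returns the original text. The carrier string itself looks garbled, which is what a tool sees if it skips the reverse step.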
 

3.2 Analysis of inconsistent character sets

Inconsistency here mainly means a mismatch between the character set that extproc relies on when calling the external library and the character set of the Oracle database. If the Oracle client and the server are on the same machine, they can communicate over IPC (inter-process communication), which speeds up data access. Like the Oracle client, the external process takes its character set from the environment variable NLS_LANG. A mismatch makes later application development and maintenance harder, and once character set conversion is required, the system faces risks such as runtime exceptions and data loss. Let us see what happens in the DBC system.

First, modify the local listener.ora to enable external procedure calls in Oracle. Only the Oracle listener configuration changes; see the following table.

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = ljb)
      (ORACLE_HOME = E:\oracle\product\10.2.0\db_1)
    )
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = E:\oracle\product\10.2.0\db_1)
      (ENVS = EXTPROC_DLLS = ANY)
      (PROGRAM = extproc)
    )
  )

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = extproc))
      (ADDRESS = (PROTOCOL = TCP)(HOST = Jiabo)(PORT = 1521))
    )
  )

 

Table 2

 

For example, suppose the database character set is ZHS16GBK and the NLS_LANG value in the registry is SIMPLIFIED CHINESE_CHINA.ZHS16GBK. Start DBCoffer and encrypt a table in an ordinary schema. The table has 830 records, and a NUMBER column is selected for encryption. At this point the encryption completes normally. Then add the system environment variable NLS_LANG = AMERICAN_AMERICA.UTF8, and be sure to restart the machine so that the environment variable is synchronized to the registry. After the restart, connecting to Oracle with PL/SQL Developer brings up a warning dialog saying that the client and server character sets are inconsistent. Next, delete the system environment variable NLS_LANG. Connecting with PL/SQL Developer again produces no warning, which indicates that the client character set is ZHS16GBK again.

However, an authorized query against the encrypted table now fails with the error ORA-01722: invalid number. Querying the underlying table shows that the ciphertext fields all still have the same length, as the following table shows:

SQL> select count(*) from odct#orders where lengthb(customerid) = 18 and lengthb(employeeid) = 18;

  COUNT(*)
----------
       830

Table 3

 

So the data is being decrypted incorrectly at query time. Deleting the system environment variable NLS_LANG and restarting the listener and the Oracle instance does not help, because at that point the system-level NLS_LANG in the registry is still AMERICAN_AMERICA.UTF8. Only after the machine is restarted again does the query return to normal. It is therefore certain that the extproc process takes its character set from the NLS_LANG environment variable in the registry.

We recommend setting NLS_LANG in the listener configuration by replacing the ENVS line of the PLSExtProc entry in Table 2 (marked in red in the original) with:

(ENVS = "EXTPROC_DLLS=ANY,NLS_LANG=AMERICAN_AMERICA.UTF8")

Then restart the listener for the change to take effect.

 

4 Exporting and Importing ZHS16GBK Data

Before discussing this question, we need to be clear about how GBK is encoded. Its code space is 0x8140-0xFEFE, so a Chinese character occupies two bytes. The byte value ranges of the GBK encoding are:
         |  First byte           |  Second byte
GBK      |  0x81-0xFE (129-254)  |  0x40-0xFE (64-254)
Table 5

Oracle's character set conversion for ZHS16GBK follows the same rule: when a byte is in the range 129-254, the Oracle server takes the next byte together with it to form one complete Chinese character code.
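
The byte ranges above translate directly into two range checks. The sketch below is illustrative and is not Oracle's implementation; the class name GbkBytes is hypothetical. It additionally excludes 0x7F from the second-byte range, which the GBK definition leaves unused.

```java
// Hypothetical predicates for the GBK byte ranges given in the table above.
final class GbkBytes {
    // First byte of a two-byte GBK character: 0x81-0xFE (129-254).
    static boolean isLeadByte(byte b) {
        int v = b & 0xFF; // Java bytes are signed, so convert to 0-255 first
        return v >= 0x81 && v <= 0xFE;
    }

    // Second byte of a two-byte GBK character: 0x40-0xFE (64-254), except 0x7F.
    static boolean isTrailByte(byte b) {
        int v = b & 0xFF;
        return v >= 0x40 && v <= 0xFE && v != 0x7F;
    }
}
```

Note that ordinary ASCII letters such as 0x41 ("A") are valid second bytes but never first bytes, so whether a byte starts a two-byte character depends only on its own value.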

When both ends of the IPC connection use the same character set, no conversion occurs when data is sent to the database, so the data is stored exactly as it is. The problem arises when an encrypted ciphertext field has an odd length: because ciphertext bytes can take any value, there is no guarantee that the last byte falls outside 129-254. Consider the following table:

Byte number  |  Byte 1  |  Byte 2  |  Byte 3  |  ......  |  Byte 2n-1  |  Byte 2n  |  Byte 2n+1
ASCII value  |  ...     |  ...     |  ...     |  ...     |  ...        |  ...      |  129-254

Table 4

As long as no character set conversion occurs, nothing is lost when this data is stored in the database. When a conversion does occur, however, the first 2n bytes are parsed, and then byte 2n+1, being in the range 129-254, needs a following byte to form a GBK character code. There is none, so an exception or data loss is inevitable.
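
The failure mode can be reproduced with a simplified scan that walks the bytes the way a GBK parser would. This is a sketch of the idea only, not Oracle's conversion routine: the class name GbkScan is hypothetical, and the scan does not validate second bytes.

```java
// Hypothetical simulation of GBK parsing: finds a first byte (129-254) at the
// end of the data that has no second byte to pair with.
final class GbkScan {
    // Returns the index of the unpaired first byte, or -1 if the data parses cleanly.
    static int danglingLeadByteIndex(byte[] data) {
        int i = 0;
        while (i < data.length) {
            int v = data[i] & 0xFF;
            boolean lead = v >= 0x81 && v <= 0xFE;
            if (!lead) {
                i += 1;            // single-byte character
            } else if (i + 1 < data.length) {
                i += 2;            // first byte plus the byte that follows it
            } else {
                return i;          // first byte at the very end: nothing to pair with
            }
        }
        return -1;
    }
}
```

For the bytes 0x41 0xD6 0xD0 the scan returns -1, while appending 0xB0 makes it return 3, the position of the byte a conversion would be unable to interpret. Because the pairing depends on where two-byte characters start, what matters is whether the final byte is left unpaired, which is the situation the 2n+1 case in the table describes.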

Here is an example. An old version of DBCoffer is used to encrypt a field, and the ciphertext table containing that field is exported with the EXP command. The underlying ciphertext table is then renamed, the exported table is imported back into the database with the IMP command, and the lengths of the two ciphertext fields are compared, as shown in:

Figure 2

Some of the ciphertext values in the newly imported table have lost one byte. To find out whether the loss happens during the first export or during the import, we export the re-imported table with EXP again and compare the two export files with a hexadecimal comparison tool, as shown in:

Figure 3

The left side of the image is the first export file and the right side is the second. The data is clearly lost during import, and the lost bytes all fall in the range 81-FE. In the same place, a byte shown as device control 1 (0x11) changes to data link escape (0x10), that is, its value drops by one. This matches the theory exactly: the first 2n bytes are parsed into characters, byte 2n+1 needs a following byte, but the data has already ended, so Oracle treats it as an invalid byte and discards it. For DBCoffer, this damages data integrity and causes the system to run abnormally.

The analysis above explains the causes and consequences of the problem; what remains is how to solve it. Readers who have followed this far probably have solutions of their own in mind. DBCoffer goes a step further: for better compatibility it provides two solutions to this problem that can be configured flexibly. Readers interested in the details can write to liaojiabo@schina.cn.

 

5 Exploring DBCoffer Support for Oracle Character Sets

In theory, DBCoffer should support the common Oracle character sets, but verifying this is not a simple test. It requires a deep understanding of how the various Oracle character sets are encoded, and because Oracle character sets touch on a wide range of knowledge, the underlying principles need careful, patient thought.

Thanks to the development team's sustained effort and rigorous approach, DBCoffer currently supports the ZHS16GBK, US7ASCII, and WE8ISO8859P1 character sets well, and UTF8 is being tested. We believe DBCoffer will soon be compatible with more of the common Oracle character sets.
This article has presented DBCoffer and Oracle character sets through both theory and practical experiments, for your reference.
