DBCoffer and Oracle Character Set

Source: Internet
Author: User
As an Oracle data security enhancement product, it is inevitable to perform operations on internal Oracle data, which is mainly to encrypt the data to be protected in Oracle

As an Oracle data security enhancement product, it is inevitable to perform operations on internal Oracle data, which is mainly to encrypt the data to be protected in Oracle

Introduction

Database coffer (DBCoffer) is an Oracle data security enhancement product based on the Oracle extension mechanism, featuring high data security, completely transparent applications, and efficient ciphertext access. DBCoffer can prevent external data attacks that bypass the firewall, data theft from Internal High-Permission users, and Data leaks caused by disk and tape theft.

As an Oracle data security enhancement product, it is inevitable to perform operations on internal Oracle data, which is mainly to encrypt the data to be protected in Oracle, however, because DBCoffer protects and processes data externally from Oracle and then imports the data into the database again, which involves issues related to Character Set compatibility, Character Set conversion, and so on, when you encounter problems, how can we deal with them with ease? This requires developers and testers to have a certain knowledge base on Oracle character sets. The following describes some knowledge principles of character sets as the starting point, then, we will discuss the problem further and step by step.

1 Oracle Character Set Overview

The Oracle character set is a collection of symbols for the interpretation of byte data. It can be divided into different sizes and have an inclusive relationship. ORACLE supports the national language architecture, allowing you to store, process, and retrieve data in a localized language. It makes database tools, error messages, sorting order, date, time, currency, numbers, and calendar automatically adapt to localization languages and platforms.
The most important parameter that affects the character set of Oracle databases is the NLS_LANG parameter. The format is as follows:
NLS_LANG = language_territory.charset

It has three components (language, region, and Character Set), each of which controls the NLS subset features. Where: Language specifies the Language of the server message, territory specifies the date and digital format of the server, and charset specifies the character set. For example: AMERICAN _ AMERICA. ZHS16GBK
From the composition of NLS_LANG, we can see that the real impact on the database character set is actually the third part. When the "select userenv ('language') from dual" statement is used for query, the database server returns the "language_territory.charset" structure, and the charset corresponds to the character set of the currently connected database, the query results can also be used as the basis for configuring the client character set.

In terms of data storage, we have to mention two concepts: database character set and national character set. When installing Oracle, you can specify the database character set and the national character set. The function is to store, process, and retrieve data in the national language and format, for example, to store CHAR, VARCHAR2, CLOB, LONG and other data types. The National Character Set is essentially an additional character set selected for Oracle. It is mainly used to enhance the character processing capability of Oracle, because the NCHAR data type can support the use of fixed-length multi-byte encoding in Asia, you can only select AF16UTF16 and UTF8 in unicode encoding to store NCHAR, NVARCHAR2, NCLOB, and other data types. The default value is AF16UTF16.

Due to the wide variety of oracle character sets, and the multiple links in the storage, retrieval, and migration of oracle Data are closely related to Character Set settings, in practical applications, database developers and administrators often encounter Oracle Character Set problems.

2 Analysis of Common Character Set principles in Oracle

In the original database system, the character set had only one ASCII type. Since ASCII supports a lot of characters, many encoding schemes were introduced later. Most of these encoding schemes included ASCII, the Oracle Character Set US7ASCII is a 7-bit ASCII character set. Of course, it is not easy to clarify all the character sets of Oracle. The following describes some common Oracle character set encoding.

2.1 single-byte encoding

A single-byte 7-bit character set can contain 128 characters. The most common character set is US7ASCII.

Single-byte 8-bit character set, which can contain 256 characters. It is suitable for most European countries, such as WE8ISO8859P1 (Western Europe, 8-bit, ISO standard 8859P1 encoding)

2.2 multi-byte encoding

Variable-length multi-byte encoding. Some characters are represented by one byte. Other characters are represented by two or more characters. variable-length multi-byte encoding is commonly used for Asian languages, for example, Japanese, Chinese, and Hindi. For example, AL32UTF8 (in which AL stands for ALL and applies to ALL languages) and ZHS16GBK231280.

Fixed-length multi-byte encoding. Each character uses a fixed-length multi-byte encoding scheme. Currently, Oracle only supports the fixed-length multi-byte encoding AF16UTF16 and is only used for national character sets.

2.3 Unicode encoding

Unicode is a single encoding scheme that covers all the known characters currently used around the world, that is, Unicode provides a unique encoding for each character. UTF-16 is a Unicode 16-bit encoding method, a fixed length multi-byte encoding, with 2 bytes representing a Unicode character, AF16UTF16 is the UTF-16 encoding character set. UTF-8 is Unicode 8-bit encoding, is a variable-length multi-byte encoding, this encoding can use 1, 2, 3 bytes to represent a Unicode character, AL32UTF8, UTF8 and UTFE are UTF-8 encoded character sets.

When the encoding value of A character set (character set A) contains all the encoding values of another character set (Character Set B), and the same encoding value of the two character sets represents the same character, character Set A is the Super character of Character Set B, or Character Set B is the subset of Character Set. Because US7ASCII is the earliest Oracle Database encoding format, many character sets are supersets of US7ASCII, such as WE8ISO8859P1, ZHS16CGB231280, ZHS16GBK, oracle internal character set conversion only ensures that the conversion from the subset to the superset is normal.

3 Communication Architecture Analysis Between DBCoffer and Oracle

With the basic knowledge of the character set above, let's talk about the communication architecture between DBCoffer and Oracle. I believe it will soon help you understand where Character Set conversion is involved in the figure, as shown in: application and Oracle client, external database and Oracle server, Oracle client and Oracle server, Oracle server and Exp export, Imp import and Oracle server, the point mentioned here is that the Oracle Server communicates directly and character sets may occur. In DBCoffer, the external database is called to directly communicate with DBCSecureService to process the data and send it back to the database. Of course, this is only the tip of the iceberg of DBCoffer, next we will discuss the possible situations from the differences between the two sides of the communication.

Figure 1

3.1 Analysis of consistent character sets

As we all know, if the character sets at both ends are consistent between the Oracle client and the server, there will be no character set conversion, that is, what is input at the client, then what is the database storage, this is why Chinese data is also well supported for 7-bit US7ASCII, 8-bit WE8ISO8859P1, and UTF8 character sets, but when these character sets are used to process Chinese characters, the development and maintenance of the corresponding client programs will be much more difficult, and the possibility of garbled code generation is much higher than that of ZHS16GBK. The following example shows an Oracle database with the java thin connection Character Set WE8ISO8859P1.

The java thin connection can also be considered as a client of Oracle. the character set used by this "client" is generally referenced by the JAVA default character set. After the experiment, when the java thin method is used to connect to Oracle, when the server character set is ZHS16GBK and UTF8, the DML statement execution and result set display are displayed, yes, you do not need to convert the relevant character set. When the Oracle database character set is WE8ISO8859P1, the problem arises. For the SQL statement to be executed, in particular, the SQL statement containing Chinese data must first perform a transcoding process. The transcoding function is as follows:

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.