For a database management system to support a given character encoding, three components are involved:
Database server support.
Data access interface support.
Client tool support.
1 Database server character encoding:
A database server supports an encoding when it can receive characters in that encoding from clients (both identifiers and character field values), store them, and return them to clients, and when it can convert them to other encodings (for example, from UTF-8 to GBK).
1.1 Specify the database server encoding:
PostgreSQL:
Specify when creating database:
CREATE DATABASE ... ENCODING ...
Possible values include SQL_ASCII, UTF-8, EUC_CN, ...
1.2 View the database server encoding
PostgreSQL:
SHOW server_encoding
2 Database access interface encoding
A data access interface supports an encoding when it can read and write characters in that encoding correctly, without losing or distorting data.
Take the JDBC interface as an example:
A JDBC driver typically sets client_encoding itself, choosing the value according to the JVM's file.encoding property.
When sending, it converts each string to a byte stream encoded in client_encoding and passes that stream to the server; in effect, String.getBytes(client_encoding).
When it receives a byte stream from the server, it uses client_encoding to construct the String object that getString returns to the application; in effect, new String(byte[], ..., client_encoding).
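The round trip described above can be sketched in Python, used here only as a stand-in for the Java calls: `str.encode` plays the role of `String.getBytes` and `bytes.decode` the role of the `String` constructor. The helper names `to_wire` and `from_wire` and the value chosen for `CLIENT_ENCODING` are assumptions made for illustration and do not come from any driver.

```python
# Assumed session setting; a real driver would negotiate this with the server.
CLIENT_ENCODING = "gb18030"


def to_wire(text: str, client_encoding: str = CLIENT_ENCODING) -> bytes:
    """Encode application text into the bytes sent to the server."""
    return text.encode(client_encoding)


def from_wire(payload: bytes, client_encoding: str = CLIENT_ENCODING) -> str:
    """Decode bytes received from the server back into application text."""
    return payload.decode(client_encoding)


original = "\u8d1d\u94a2"  # the two sample characters used in section 4.1
wire = to_wire(original)
restored = from_wire(wire)

print(wire.hex())            # bytes as they would travel to the server
print(restored == original)  # the round trip must not change the text
```

As long as both directions use the same client_encoding, the text survives unchanged. Data is lost or distorted when the two sides disagree on that value.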
3 Client encoding
A client tool supports an encoding when it can display characters in that encoding read from the database and can submit characters in that encoding to the server.
3.1 Specify the client encoding of a PostgreSQL session
SET client_encoding TO 'value'
3.2 View the client encoding
SHOW client_encoding
4 View the byte strings of characters under different encodings
The following are the byte strings stored in the database for several characters under different encodings. In PostgreSQL, SELECT decode(name, 'escape') FROM test shows the byte strings held in the database server.
4.1 Take "贝钢" as an example
GBK encoded as: B1B4 B8D6
UTF-8 encoded as: E8B49D E992A2
GB18030 encoded as: B1B4 B8D6
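The byte values listed above can be reproduced outside the database. The following is a minimal sketch using Python's built-in codecs. The sample text is written as the code points U+8D1D and U+94A2, which is what the UTF-8 bytes above decode to, and the dictionary name `encoded` is chosen only for this illustration.

```python
# The two sample characters, written as escapes so the source stays ASCII.
text = "\u8d1d\u94a2"

# Encode the same text under each encoding and keep the hex form.
encoded = {
    name: text.encode(name).hex().upper()
    for name in ("gbk", "utf-8", "gb18030")
}

for name, hex_bytes in encoded.items():
    print(f"{name:8} {hex_bytes}")
```

GBK and GB18030 produce the same bytes here because GB18030 keeps the two-byte GBK codes for these characters.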
4.2 Take the two characters U+E81C and U+E819 as an example
GBK encoded as: FE57 FE54
UTF-8 encoded as: EEA09C EEA099
GB18030 encoded as: 8336C938 8336C935
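To see which code points the UTF-8 bytes above stand for, they can be decoded directly. The sketch below decodes only the UTF-8 line, because the GBK and GB18030 mappings for these characters may differ between codec implementations. The variable names are illustrative.

```python
import unicodedata

# UTF-8 bytes copied from the list above; fromhex() ignores the space.
utf8_bytes = bytes.fromhex("eea09c eea099")

decoded = utf8_bytes.decode("utf-8")
code_points = [f"U+{ord(ch):04X}" for ch in decoded]
categories = [unicodedata.category(ch) for ch in decoded]

print(code_points)  # the code points behind the bytes
print(categories)   # "Co" marks a private-use code point
```

Both code points fall in Unicode's private-use area, which explains why many fonts and tools cannot display them.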
5 Encoding conversion example
The following concrete example shows the conversions in practice. Here the client uses GBK/GB18030 encoding, the interface uses GB18030 encoding at both ends, and the database server uses UTF-8 encoding:
Conversion involves:
Conversion between the encoding used in the application and the client encoding of the connection
Conversion between the encoding at the server end of the connection and the database server encoding
In the image above, the orange-red arrows indicate
For example, the byte strings in the database server under the different encodings are:
GBK encoded as: FE57 FE54
UTF-8 encoded as: EEA09C EEA099
GB18030 encoded as: 8336C938 8336C935
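The hops in this example (application in GBK, connection in GB18030, server in UTF-8) can be traced with a short sketch. It uses the sample text from section 4.1 (U+8D1D U+94A2) rather than the characters listed here, since the byte values of private-use characters may vary between codec implementations. The helper `hop` is hypothetical and only re-encodes bytes; it does not talk to a database.

```python
def hop(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from the source encoding into the target encoding."""
    return data.decode(source).encode(target)


# Application side: the text exists as GBK bytes.
app_bytes = "\u8d1d\u94a2".encode("gbk")

# Connection: the interface uses GB18030 at both ends.
wire_bytes = hop(app_bytes, "gbk", "gb18030")

# Server: converts from client_encoding (GB18030) to server_encoding (UTF-8).
stored_bytes = hop(wire_bytes, "gb18030", "utf-8")

for label, data in (("application", app_bytes),
                    ("connection", wire_bytes),
                    ("server", stored_bytes)):
    print(f"{label:12} {data.hex().upper()}")
```

For this text the first hop leaves the bytes unchanged, and only the conversion on the server produces a different byte string.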
Socket:
The programming interface guarantees that the encoding of the characters sent to the server matches the client_encoding of the current session.
It can set client_encoding to the encoding of the characters obtained from the application.
Alternatively, it can read the client_encoding of the current session and convert the characters obtained from the application into that encoding.
Server:
The server converts between client_encoding and server_encoding.
The conversion follows the database's encoding conversion algorithm; a character that cannot be represented in the target encoding is converted to a question mark "?".
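The question-mark substitution described above can be imitated with Python's `errors="replace"` option. This is only an analogy under stated assumptions: the emoji U+1F600 is chosen here as a character that GBK cannot represent, and whether a particular database substitutes a character or reports an error instead depends on the product and its settings.

```python
# Sample text: two GBK-representable characters around one that GBK lacks.
text = "\u8d1d\U0001F600\u94a2"

# A strict conversion refuses the text outright.
try:
    text.encode("gbk")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# A lossy conversion substitutes "?" for the unrepresentable character.
lossy = text.encode("gbk", errors="replace")

print(strict_failed)
print(lossy.hex())
```

The substitution is irreversible: once the "?" byte is stored, the original character cannot be recovered from it.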
6 Commonly encountered problems
Characters are parsed with the wrong encoding, resulting in garbled text.
Characters do not exist in both character sets, so those characters become "?".
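The first problem above can be reproduced in a few lines. In this sketch the bytes are written as UTF-8 and then read back as GBK, which is one assumed way for the two sides to disagree. The variable names are illustrative, and the garbled text itself is not printed because its exact appearance depends on the terminal.

```python
original = "\u8d1d\u94a2"

# Bytes as stored by a UTF-8 server.
stored = original.encode("utf-8")

# Wrong: a client that assumes GBK parses the same bytes differently.
garbled = stored.decode("gbk", errors="replace")

# Right: decoding with the encoding the bytes were written in.
recovered = stored.decode("utf-8")

print(garbled == original)    # False: the text is garbled
print(recovered == original)  # True: matching encodings restore it
```

The stored bytes are intact in this case, so reading them with the correct encoding is enough to repair the text. The second problem is different, because the substituted "?" has already replaced the original data.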