MySQL Practice Class: Character Set, Data Type, and Storage Engine Selection

Last Update:2014-10-07 Source: Internet

Author: User

How to choose the right storage engine

Features of several common storage engines

Here we focus on several common storage engines and compare their differences and recommended uses.

Feature              MyISAM      BDB         Memory      InnoDB      Archive
Storage limit        No          No          Yes         64TB        No
Transaction-safe     -           Yes         -           Yes         -
Lock granularity     Table       Page        Table       Row         Row
B-tree index         Yes         Yes         Yes         Yes         -
Hash index           -           -           Yes         Yes         -
Full-text index      Yes         -           -           -           -
Clustered index      -           -           -           Yes         -
Data cache           -           -           Yes         Yes         -
Index cache          Yes         -           Yes         Yes         -
Data compression     Yes         -           -           -           Yes
Space usage          Low         Low         N/A         High        Very low
Memory usage         Low         Low         Medium      High        Low
Bulk insert speed    High        High        High        Low         Very high
Foreign keys         -           -           -           Yes         -

A dash means the feature is not supported or not applicable.

The two most commonly used storage engines:

MyISAM was the default storage engine before MySQL 5.5 (InnoDB has been the default since 5.5). When CREATE makes a new table and no storage engine is specified, the default engine is used. Each MyISAM table is stored on disk as three files. The files share the table's name and have the extensions .frm (table definition), .MYD (MYData, the data), and .MYI (MYIndex, the indexes). Data files and index files can be placed in different directories to spread I/O evenly and improve speed.

The InnoDB storage engine provides transaction safety with commit, rollback, and crash recovery. Compared to MyISAM, however, InnoDB writes are less efficient, and it consumes more disk space to hold data and indexes.

Selection criteria: choose the storage engine that fits the characteristics of the application. A complex application system can combine several storage engines as the actual situation requires.

The following are the applicable environments for common storage engines:

MyISAM: The default MySQL pluggable storage engine before version 5.5, and one of the most commonly used engines in Web, data warehousing, and other application environments.

InnoDB: For transactional applications. It has many features, including ACID transaction support.

Memory: Keeps all data in RAM and provides extremely fast access in environments that need quick lookups of reference data and similar data.

Merge: Allows a MySQL DBA or developer to logically group a series of identical MyISAM tables and reference them as one object. Ideal for VLDB environments such as data warehousing.

Basic principles for selecting data types

Premise: Use a suitable storage engine.

Selection principle: Choose data types according to the storage engine you have selected.

The following selection methods are categorized by storage engine:

MyISAM storage engine and data columns: For MyISAM tables, it is best to use fixed-length (CHAR) columns instead of variable-length (VARCHAR) columns.

Memory storage engine and data columns: Memory tables are currently stored using fixed-length rows, so it makes no difference whether you use CHAR or VARCHAR columns. Both are handled as CHAR.

InnoDB storage engine and data columns: VARCHAR types are recommended.

For InnoDB tables, the internal row storage format does not distinguish between fixed-length and variable-length columns (every row uses a header containing pointers to the column values), so fixed-length CHAR columns are not inherently simpler to handle than variable-length VARCHAR columns. The main performance factor is therefore the total amount of storage the rows use. Since CHAR takes up more space on average than VARCHAR, it is better to use VARCHAR to minimize the storage and disk I/O for the rows that need to be processed.

The following sections compare fixed-length and variable-length column types.

CHAR vs. VARCHAR

The CHAR and VARCHAR types are similar, but they are stored and retrieved in different ways. They also differ in maximum length and in whether trailing spaces are retained. No case conversion takes place during storage or retrieval.

The following table shows the result of storing various string values in CHAR(4) and VARCHAR(4) columns, illustrating the difference between CHAR and VARCHAR:

Value        CHAR(4)   Storage required   VARCHAR(4)   Storage required
''           '    '    4 bytes            ''           1 byte
'ab'         'ab  '    4 bytes            'ab'         3 bytes
'abcd'       'abcd'    4 bytes            'abcd'       5 bytes
'abcdefgh'   'abcd'    4 bytes            'abcd'       5 bytes

Note that the value of the last row in the table above applies only when strict mode is not used, and if MySQL is running in strict mode, values that exceed the column length are not saved and an error occurs.
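The storage figures in the table above can be reproduced with a small calculation. The Python sketch below is an illustration only, not MySQL code. It assumes a single-byte character set, non-strict mode (over-long values are silently truncated), and a 1-byte VARCHAR length prefix, which applies to columns of up to 255 bytes. The function names are hypothetical.

```python
def char_storage(value, n):
    """Stored form and byte count for a CHAR(n) column (single-byte charset)."""
    stored = value[:n].ljust(n)      # truncate, then right-pad with spaces
    return stored, n                 # CHAR(n) always occupies n bytes


def varchar_storage(value, n):
    """Stored form and byte count for a VARCHAR(n) column with n <= 255."""
    stored = value[:n]               # truncate only; no padding
    return stored, len(stored) + 1   # data bytes plus a 1-byte length prefix


for v in ["", "ab", "abcd", "abcdefgh"]:
    print(repr(v), char_storage(v, 4), varchar_storage(v, 4))
```

Each printed line corresponds to one row of the table: the CHAR(4) size stays at 4 bytes, while the VARCHAR(4) size follows the length of the value.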

Values retrieved from CHAR(4) and VARCHAR(4) columns are not always the same, because trailing spaces are removed from CHAR columns on retrieval. The following example illustrates the difference:

mysql> CREATE TABLE vc (v VARCHAR(4), c CHAR(4));
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO vc VALUES ('ab  ', 'ab  ');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT CONCAT(v, '+'), CONCAT(c, '+') FROM vc;
+----------------+----------------+
| CONCAT(v, '+') | CONCAT(c, '+') |
+----------------+----------------+
| ab  +          | ab+            |
+----------------+----------------+
1 row in set (0.00 sec)

TEXT and BLOB

Keep the following points in mind when using the TEXT and BLOB column types so that the database performs well.

① BLOB and TEXT values cause some problems of their own, especially when a large number of DELETE or UPDATE operations are performed. Deleting such values leaves large "holes" in the data table, and the records later written into these holes may be of different lengths. To improve performance, defragment such tables periodically with OPTIMIZE TABLE.

② Use a synthetic index. A synthetic index column is useful in some situations. One approach is to compute a hash value from the contents of other columns and store it in a separate column; you can then find a row by looking up its hash value. Note that this technique works only for exact-match queries (hash values are of no use with range operators such as < or >=). You can generate the hash with the MD5() function, with SHA1() or CRC32(), or with your own application logic. Keep in mind that numeric hash values can be stored very efficiently. Also, if the hash algorithm can produce strings with trailing spaces, do not store them in CHAR or VARCHAR columns, which are affected by trailing-space removal.

A synthetic hash index is particularly useful for BLOB or TEXT columns. Looking up the hash value is much faster than searching the BLOB column itself.
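The synthetic-index idea can be sketched outside the database. The Python example below is only an illustration under stated assumptions: the rows, the dictionary standing in for an indexed hash column, and the function names are all hypothetical. It shows why the lookup is exact-match only, and why the full value should still be compared after the hash matches.

```python
import hashlib
import zlib


def md5_hex(text):
    """32-character hexadecimal MD5 digest of the UTF-8 bytes of text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def crc32_num(text):
    """Unsigned 32-bit checksum: compact to store, but collisions are likelier."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


# Hypothetical rows: id -> large text body.
rows = {1: "first long document ...", 2: "second long document ..."}

# Stand-in for an indexed hash column: hash value -> list of row ids.
hash_index = {}
for row_id, body in rows.items():
    hash_index.setdefault(md5_hex(body), []).append(row_id)


def find_exact(body):
    """Exact-match lookup: probe by hash, then confirm against the full value."""
    candidates = hash_index.get(md5_hex(body), [])
    return [i for i in candidates if rows[i] == body]


print(find_exact("second long document ..."))  # [2]
print(find_exact("second"))                    # [] (prefix searches cannot use the hash)
```

The numeric CRC32 form needs only 4 bytes per row, which is the storage advantage the text mentions, at the cost of more frequent collisions than MD5.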

③ Avoid retrieving large BLOB or TEXT values when they are not needed. For example, a SELECT * query is not a good idea unless you are sure that the WHERE clause restricts the result to exactly the rows you want. Otherwise you may transfer a large number of values over the network to no purpose. This is another case where keeping identifying information about the BLOB or TEXT value in a synthetic index column helps: you can search the index column, determine which rows you need, and then retrieve the BLOB or TEXT values from only the qualifying rows.

④ Move BLOB or TEXT columns into a separate table. In some environments it makes sense to move these columns to a second table if doing so lets you convert the remaining columns of the original table to a fixed-length row format. This reduces fragmentation in the main table and gives you the performance advantage of fixed-length rows. It also lets you run SELECT * queries on the main table without transmitting large BLOB or TEXT values over the network.

Floating-point and fixed-point numbers

To draw attention to the issue, let us look at an example before introducing floating-point and fixed-point numbers:

mysql> CREATE TABLE test (c1 FLOAT(10,2), c2 DECIMAL(10,2));
Query OK, 0 rows affected (0.29 sec)

mysql> INSERT INTO test VALUES (131072.32, 131072.32);
Query OK, 1 row affected (0.07 sec)

mysql> SELECT * FROM test;
+-----------+-----------+
| c1        | c2        |
+-----------+-----------+
| 131072.31 | 131072.32 |
+-----------+-----------+
1 row in set (0.00 sec)

From this example we see that the value in column c1 changed from 131072.32 to 131072.31. This is caused by the inexactness of floating-point numbers.
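The change from .32 to .31 can be reproduced without a database. The Python sketch below assumes that a FLOAT column holds a 4-byte IEEE 754 single-precision value and mimics that by round-tripping the number through the struct module. The helper name as_float32 is hypothetical.

```python
import struct
from decimal import Decimal


def as_float32(x):
    """Round-trip x through a 4-byte IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


stored_float = as_float32(131072.32)
stored_decimal = Decimal("131072.32")

print(stored_float)              # 131072.3125, the nearest single-precision value
print(f"{stored_float:.2f}")     # 131072.31 when shown with two decimal places
print(stored_decimal)            # 131072.32, kept exactly as a fixed-point value
```

Near 131072 (2^17), adjacent single-precision values are 1/64 apart, so 131072.32 cannot be represented and the closest value, 131072.3125, is stored instead.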

In MySQL, FLOAT and DOUBLE (or REAL) are floating-point types, and DECIMAL (or NUMERIC) is the fixed-point type.

The advantage of floating-point numbers over fixed-point numbers is that, for a given storage length, they can represent a larger range of values. Their disadvantage is that they can cause precision problems. When working with floating-point and fixed-point numbers, remember the following points:

Floating-point numbers are subject to rounding error;

Precision-sensitive data such as currency should be represented and stored as fixed-point numbers;

In programs that use floating-point numbers, pay special attention to rounding error and try to avoid floating-point comparisons;

Pay attention to the handling of special values in floating-point numbers.
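The last two points can be illustrated in application code. The Python sketch below is an assumption-laden illustration of how a client program might compare values and guard against special values; it is not MySQL behavior.

```python
import math
from decimal import Decimal

total = 0.1 + 0.2

print(total == 0.3)                              # False: exact comparison fails
print(math.isclose(total, 0.3, rel_tol=1e-9))    # True: compare within a tolerance
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True: fixed-point is exact

nan = float("nan")
print(nan == nan)                  # False: NaN never compares equal, even to itself
print(math.isnan(nan))             # True: test for NaN explicitly instead
print(math.isinf(float("inf")))    # True: infinity also needs an explicit check
```

The tolerance in math.isclose is a choice made for this example; a suitable value depends on the magnitude and meaning of the data being compared.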

Character Set Overview

A character set is a set of symbols together with their encoding rules. Both Oracle and MySQL require you to choose a character set. If the correct character set is not chosen when the database is created, you may have to change it later, and changing a character set is a relatively expensive operation that also carries some risk. We therefore recommend choosing the right character set according to your requirements at the start of the application, to avoid unnecessary adjustments later.

Introduction to MySQL supported character sets

The MySQL server supports multiple character sets (you can list them all with the SHOW CHARACTER SET command), and you can specify different character sets for different columns on the same server, in the same database, or even in the same table. This makes MySQL considerably more flexible than database management systems such as Oracle, where a single database can use only one character set.

The MySQL character set support involves two concepts: the character set (CHARACTER SET) and the collation (COLLATION). The character set defines how MySQL stores strings, and the collation defines how strings are compared. Character sets and collations have a one-to-many relationship; MySQL supports more than 70 collations for more than 30 character sets.

Each character set has at least one collation. You can use the SHOW COLLATION LIKE 'utf8%'; command to view the collations of a given character set.
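The split between "how strings are stored" and "how strings are compared" can be shown with a loose analogy in Python. This sketch does not reproduce any MySQL collation; the two comparison functions are hypothetical stand-ins for a binary rule and a case-insensitive rule.

```python
text_a, text_b = "mysql", "MySQL"

# Character set: how each string is turned into stored bytes.
print(text_a.encode("utf-8"), text_b.encode("utf-8"))


def equal_binary(a, b):
    """Binary-style rule: the byte sequences must match exactly."""
    return a.encode("utf-8") == b.encode("utf-8")


def equal_case_insensitive(a, b):
    """Case-insensitive-style rule: differences in letter case are ignored."""
    return a.casefold() == b.casefold()


print(equal_binary(text_a, text_b))             # False
print(equal_case_insensitive(text_a, text_b))   # True

words = ["b", "A", "a", "B"]
print(sorted(words))                    # ['A', 'B', 'a', 'b'] in code-point order
print(sorted(words, key=str.casefold))  # ['A', 'a', 'b', 'B'] ignoring case
```

The stored bytes are identical under both rules; only the outcome of comparison and sorting changes, which is why one character set can have several collations.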

Unicode Overview

Unicode is an encoding specification. Here we briefly summarize its history.

Start with ASCII. ASCII is also an encoding specification, but it was created for English and defines only 128 characters (256 in its 8-bit extensions). For Chinese, Arabic, and other complex scripts that is obviously not enough, so countries and organizations defined standards for their own languages, such as gb2312 and Big5. Having everyone set their own standard clearly has many drawbacks, and so the Unicode specification came into being.

Unicode is also a character encoding scheme, but it is designed by international organizations to accommodate all the world's languages. Its formal name is "Universal Multiple-Octet Coded Character Set", abbreviated UCS; UCS can also be read informally as "Unicode Character Set".

Unicode has two encoding forms, UCS-2 and UCS-4: the former represents each character in 2 bytes, the latter in 4 bytes. UCS-2, for example, can represent 2^16 = 65536 characters, which is enough for essentially all European and American characters and most Asian characters.
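The 2-byte and 4-byte widths can be checked with Python's built-in codecs. This sketch assumes that the utf-16-be codec matches UCS-2 for characters inside the 16-bit range and that utf-32-be matches UCS-4. The sample characters and the helper name widths are chosen for illustration.

```python
def widths(ch):
    """Byte lengths of one character in 2-byte-unit and 4-byte-unit encodings."""
    return len(ch.encode("utf-16-be")), len(ch.encode("utf-32-be"))


print(2 ** 16)   # 65536 distinct 2-byte values

for ch in ["A", "\u00e9", "\u4e2d"]:        # Latin A, e with acute accent, a Chinese character
    print(hex(ord(ch)), widths(ch))         # each is 2 bytes and 4 bytes respectively

outside = "\U0001F600"                      # a code point beyond the 16-bit range
print(hex(ord(outside)), widths(outside))   # UTF-16 needs 4 bytes (a surrogate pair)
```

The last line shows the limit of a fixed 2-byte form: code points above 0xFFFF do not fit in a single 16-bit unit.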

How to choose the right character set

We recommend using the smallest character set that fully satisfies your application. A smaller character set saves storage space, reduces the number of bytes transmitted over the network, and, because less data is stored, indirectly improves system performance.
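The space difference between character sets is easy to measure. The Python sketch below uses Python's own codecs as stand-ins for the MySQL character sets of the same names; the sample strings and the helper name byte_length are hypothetical.

```python
samples = ["hello", "\u6570\u636e\u5e93", "hello \u6570\u636e\u5e93"]  # the Chinese word means "database"
encodings = ["latin-1", "gb2312", "gbk", "utf-8"]


def byte_length(text, encoding):
    """Bytes needed for text in the encoding, or None if it cannot be represented."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


for text in samples:
    print(text, {enc: byte_length(text, enc) for enc in encodings})
```

In this sketch each Chinese character takes 2 bytes in gb2312 and gbk but 3 bytes in utf-8, while ASCII text takes 1 byte per character in all of them.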

Many character sets can hold Chinese characters, such as UTF8, gb2312, and GBK (Latin1 is sometimes used as well, but it can only hold them as uninterpreted bytes); gb2312 and GBK are the most commonly used. Because the gb2312 character repertoire is smaller than that of GBK, some rare characters cannot be stored in gb2312. When choosing a character set you must therefore weigh how likely such rare characters are to appear in the application and what the impact would be. If you cannot answer that with certainty, GBK is the better choice.
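Whether a particular character fits in gb2312 can be tested directly. The Python sketch below again relies on Python's codecs as stand-ins for the MySQL character sets. The rare character used here (U+9555) is an illustrative choice and is not the example from the original article; the helper name can_encode is hypothetical.

```python
def can_encode(ch, encoding):
    """True if the character is representable in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


common = "\u4e2d"   # a very common Chinese character
rare = "\u9555"     # a rare character, chosen for illustration only

for ch in (common, rare):
    print(hex(ord(ch)), {enc: can_encode(ch, enc) for enc in ("gb2312", "gbk", "utf-8")})
```

A check like this over a sample of real application data is one way to estimate how often rare characters would be lost under gb2312.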

Settings for the MySQL character set

MySQL character sets and collations have four levels of default settings: server level, database level, table level, and column level. They are set in different places and have different effects.

The server character set and collation are determined when the MySQL service starts. They can be set in my.cnf:

[mysqld]

character-set-server=utf8

Or specified as a startup option:

mysqld --character-set-server=utf8

(MySQL releases before 5.5 also accepted the older spelling default-character-set for this server option, which is what the original example used.)

Or specified at compile time, for source builds that use configure:

./configure --with-charset=utf8

If no server character set is specified, latin1 is used by default. The three settings above specify only the character set and no collation, so the default collation of that character set is used. To use a non-default collation, specify it together with the character set.

You can query the current server character set with SHOW VARIABLES LIKE 'character_set_server'; and the server collation with SHOW VARIABLES LIKE 'collation_server';.

This article is from the "Ops said: from rookie to veteran" blog. Please keep this source attribution: http://liuqunying.blog.51cto.com/3984207/1560964

This article is an English version of an article originally written in Chinese on aliyun.com.
