MySQL High Performance table design specification

Source: Internet
Author: User
Tags uuid

Good logical and physical design is the cornerstone of high performance, and the schema should be designed according to the query statement that the system will execute, which often requires a tradeoff between various factors.

First, select the optimized data type

MySQL supports a very high number of data types, and choosing the right data type is critical to gaining performance.

smaller is usually better

Smaller data types are typically faster because they consume less disk, memory, and CPU cache, and require less CPU cycles to process.

it's simple.

Operations of simple data types typically require less CPU cycles. For example, an integer is less expensive than a word Fu Cao because character sets and collation rules (collations) make it more complex to compare characters than integral types.

try to avoid null

If the query contains nullable columns, it is more difficult for MySQL to optimize because nullable columns make indexing, index statistics, and value comparisons more complex. Nullable columns use more storage space, and special handling is required in MySQL. When a nullable column is indexed, an extra byte is required for each index record, and in MyISAM it can even cause a fixed-size index (for example, an index of only one integer column) to become a variable-size index.


There are, of course, exceptions, such as InnoDB, which uses a separate bit to store null values, so it has good spatial efficiency for sparse data.

1. Integer type

There are two types of numbers: integers (whole number) and real numbers. If you store integers, you can use these kinds of integer types: TINYINT, SMALLINT, Mediumint, INT, BIGINT. Use 8,16, 24, 32, 64-bit storage space, respectively.

An integer type has an optional **unsigned * property, which means that negative values are not allowed, which can roughly increase the upper limit of a positive number by one more. such as TINYINT. UNSIGNED can store a range of 0–255, while the TINYINT storage range is-128-127.

Signed and unsigned types use the same storage space and have the same performance, so you can choose the right type based on the actual situation.

Your choice determines how MySQL saves data in memory and on disk. However, integer calculations generally use a 64-bit BIGINT integer, even in a 32-bit environment. (Some aggregate functions are exceptions, and they are computed using a decimal or DOUBLE).

MySQL can specify widths for integer types, such as INT (11), which makes no sense for most applications: it does not limit the legal scope of a value, but only specifies how many of the MySQL interactive tools (such as the MySQL command-line client) are used to display the number of characters. For storage and Computation, int (1) and int (20) are the same.

2. Real type

A real number is one with a fractional part. However, they are not just for storing fractional parts, but also for using decimal to store integers larger than BIGINT.

The float and double types support approximate calculations using standard floating-point operations.


The decimal type is used to store exact decimals.


Both floating-point and decimal types can specify precision. For the decimal column, you can specify the maximum number of digits allowed before and after the decimal point. This affects the space consumption of the column.

There are several ways to specify the precision required for floating-point columns, which allows MySQL to choose a different data type, or to trade-offs the values when stored. These precision definitions are non-standard, so we recommend specifying only the data type and not specifying the precision.

Floating-point types typically use less space than decimal when they store values of the same range. Float uses 4 bytes of storage. Double takes 8 bytes, which has a higher precision and a larger range than float. As with the integer type, only the storage type can be selected; MySQL uses a double as the type of the internal floating-point calculation.

Because additional space and computational overhead are required, you should try to use decimal only when you are making accurate calculations of decimals. However, when the data is the most large, consider using bigint instead of decimal, the unit of currency that needs to be stored is multiplied by the number of decimal places.

3. String TypeVARCHAR

Used to store variable string, length support to 65535
Need to use 1 or 2 extra bytes to record the length of a string
Fit: The maximum degree of a string is more than average?

CHAR

Fixed length, range is 1~255
Fit: Stores very short strings, or all values close to the same length; change frequently

generosity is unwise.

The space overhead for storing ' hello ' with varchar (5) and varchar (200) is the same. What are the advantages of using shorter columns?

It turns out to be a big advantage. Longer columns consume more memory because MySQL typically allocates a fixed-size block of memory to hold internal values. This is especially bad when you use a temporary table of memory to sort or manipulate. It's just as bad when using disk staging tables for sorting.

So the best strategy is to allocate only the space you really need.

4.BLOB and text types

Both blob and text are string data types that are designed to store large data and are stored in binary and character mode, respectively.

Unlike other types, MySQL treats each blob and text value as a separate object. The storage engine typically does special processing when it is stored. When the blob and text values are too large, InnoDB uses a dedicated "external" storage area for storage, where each value requires a 1–4 byte storage area to store the actual value in the row.

The only difference between blob and text is that the BLOB type stores binary data, there is no collation or character set, and the TEXT type has character sets and collations

5. Date and Time type

There is no substitute for most time types, so there is no problem with the best choice. The only question is what to do when you save the date and time. MySQL offers two similar date types: Date Time and TIMESTAMP.

But for now we are more suggesting how to store timestamps, so there is no longer much explanation for date time and timestamp.

5. Other types5.1 Select identifiers

The minimum data type should be selected in the case where the demand for a range of values can be met and the future growth space is reserved.

Integer type

Integers are usually the best choice for identifying columns because they are fast and can use Auto_increment.

Enum and set types

Emum and set types are often a bad choice for identity columns, although it may not be a problem for some static "definition tables" that contain only fixed states or types. Enum and set columns are suitable for storing fixed information, such as ordered state, product type, and gender.

String type

If possible, you should avoid using string types as identity columns because they are very space-consuming and often slower than numeric types.

There is also a need to pay attention to a completely "random" string, such as a string produced by MDS (), Shal (), or UUID (). The new values generated by these functions are arbitrarily distributed in a large amount of space, which causes the INSERT and some SELECT statements to become slow. If you store UUID values, you should remove the "-" symbol.

5.2 Special types of data

Some types of data wells are not directly consistent with built-in types. The timestamp of low-nanosecond precision is an example, and another example is a 1pv4 address, where a varchar (15) column is often used to store IP addresses, however, they are actually 32-bit unsigned integers, not strings. The representation of dividing an address into four segments with a decimal point is just to make it easier for people to read. Therefore, the IP address should be stored with an unsigned integer. MySQL provides the Inet_aton () and Inet_ntoa () functions to convert between the two presentation methods.

Second, table structure design

1. Paradigm and Inverse paradigm

There are often many ways to represent any given data, from full normalization to full inverse normalization, and a compromise between the two. In a normalized database, each fact data appears and appears only once. Conversely, in a reverse-normalized database, information is redundant and may be stored in multiple places.

advantages and disadvantages of the paradigm

When considering performance improvements, it is often recommended to have a paradigm design of the schema, especially when writing dense scenes.

    • Normalized update operations are usually faster than inverse normalization.

    • When data is well-normalized, there is little or no duplication of data, so only less data needs to be modified.

    • A normalized table is usually smaller and better placed in memory, so it is faster to perform the operation.

    • Little extra data means less need for DISTINCT or GROUP by statements when retrieving list data.

advantages and disadvantages of inverse paradigm

The worst-case scenario for most queries-even if the table is not using an index-is a full table scan, without the need for an associated table. This can be much faster than association when the data is larger than memory, because it avoids random i/0.

A separate table can also use a more efficient indexing strategy.

the normalization and inverse normalization of mixed use

It is often necessary to mix in real-world applications, possibly using partially normalized schemas, cache tables, and other techniques.


Table increases the number of redundant fields appropriately, such as performance first, but adds complexity. You can avoid table association queries.

Simple familiarity with database paradigms

The first paradigm (1NF): Field values are atomic and can no longer be divided (all relational database systems satisfy the first normal form );

For example: Last Name field , where the last name and first name are a whole , if you distinguish the first and last name then you must set up two separate fields ;

Second Normal (2NF): A table must have a primary key , that is, each row of data can be uniquely differentiated ;

Note: First paradigm must be satisfied first ;

Third Paradigm (3NF): A table cannot contain information about non-critical fields in other related tables , i.e. data tables cannot have redundant fields ;

Note: The second paradigm must be satisfied first ;

2. table field Few but good

      • I/O efficient

      • Easy to maintain field separation

      • Single table 1 g volume 500W row evaluation

      • Single line not exceeding 200Byte

      • No more than 50 int fields in a single table

      • No more than 20 char (10) fields in a single table

      • It is recommended that the number of fields in a table be controlled within 20

      • Splitting Text/blob,text type processing performance is much lower than varchar, forcing the generation of hard disk temporary tables to waste more space.

< ext.: Http://mp.weixin.qq.com/s/ES2qdpD_Gzx-5mDIjHUpAA >

MySQL High Performance table design specification

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.