MySQL Schema design optimization

Last Update:2015-07-28 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Good logic design and physical design is the cornerstone of high performance, should be based on the system to execute the query statement to design the schema. This often requires a balance of factors.

For example, anti-paradigm design can speed up certain types of queries, but it may also slow down other types of queries. For example, adding a count table and a summary table is a good way to refine the query.

However, the maintenance costs of these tables can be high. MySQL's unique features and implementation details have a significant impact on performance.

Simple principles for selecting an optimized data type:

1. Smaller usually better

In general, you should try to use the smallest data type that correctly stores your data.

2. Simple is good

Operations of simple data types typically require less CPU cycles.

3. Avoid null as much as possible

If the query contains nullable columns, it is more difficult for MySQL to optimize because nullable columns make indexes, index statistics, and value comparisons more complex.

Nullable columns use more storage space, and special handling is required in MySQL.

The first step in schema design is to build a table, and the first step in building a table is to design the field, but understanding the MySQL data type can also help with the design field.

Let's start by understanding MySQL's basic data type.

1. Integer type

TINYINT, Smallint,mediumint,int,bigint

8,16,24,32,64, bit storage space.

MySQL can set a width for the certificate type, such as int (11), but this is meaningless for most applications; it does not limit the value's legal scope. For storage and compute

Int (1) and int (20) are the same

2. Real type

Float,double,decimal (for precise calculation)

3. String type

Varchare

The most common type of string. It is more space-saving than a fixed-length type because it only applies the necessary space. There is one case exception if the MySQL table is created using row_format=fixed.

varchar requires a length of 1 or 2 extra bytes to record the string. If the maximum length of a column is less than or equal to 255, a byte is used to record the length.

CHAR

is a fixed length. MySQL always allocates enough space based on the defined string. When you store a char value, MySQL deletes all trailing spaces.

Char values are padded as needed to facilitate comparisons.

Char is suitable for storing very short strings. Char is also better than varchar for frequently changed data because the fixed-length char type is not prone to fragmentation.

Blob and text

Both blob and text are string data types that are designed to store large data and are stored in binary and character mode, respectively.

Using enumerations (enums) instead of string types

4. Date and Time type

Datetime

This type can hold a wide range of values. From 1001 to 9999, the precision was wonderful.

TIMESTAMP

Storage space is small, and depending on the time zone changes, there is a special automatic update capability. Precision is wonderful.

If you want to store time that is smaller than the granularity of seconds, you can store a microsecond-level timestamp with the bigint type.

5. Bit data type

BIT

Selection of identifiers:

Integer type

Integers are usually the best choice for identifying columns. Because they are quick and can use Auto_increment

Enum and set types

String type

If possible, avoid using string types as identifiers, because they are very space-consuming and often slower than numeric types.

Especially be careful when using strings as identity columns in the MyISAM table. MyISAM uses a compressed index on a string by default, which makes the query slow.

For a completely ' random ' string, you also need to be more careful, such as MD5 (), SHA1 (), or UUID () to produce a string.

The new values generated by these functions are arbitrarily distributed in a large amount of space, which causes the INSERT and some SELECT statements to become slow:

1. Because the insertion value is randomly written to a different location in the index, it makes the INSERT statement slower. This causes page splitting, random disk access, and clustered index fragmentation for the clustered storage engine.

2.select statements can become slower because logically neighboring behaviors are distributed in different places on disk and memory.

3. Random values cause the cache to be poorly performing for all types of query statements.

Pitfalls in Schema design

1. Too many columns

2. Too many associations

3. The Almighty Enumeration

Paradigm and Inverse paradigm

Three main paradigms:

1. must have primary key, column is not divided　　

2. When a table is a composite primary key, the non-primary key field does not depend on the partial primary key (that is, it must depend on all primary key fields)

3. All non-primary attributes in relational mode R (U,F) do not have transitive dependencies on any candidate keywords

The inverse paradigm, as the name implies, does not follow the three paradigms.

First, let's look at the pros and cons of paradigms:

Advantages:

Normalized update operations are usually faster than inverse paradigms.

When data is well-normalized, there is little or no duplication of data, so you only need to modify less data

The normalized table is usually small, and can be better placed in memory, so the operation will be faster.

There is little extra data that means fewer distinct or group by statements are needed to retrieve list data.

Disadvantages:

Overly dependent associations. This is expensive and may invalidate some index policies.

Inverse paradigm Advantages and disadvantages:

Because all the data in a table, can be very good to avoid association.

If the association is not required, the worst case for most queries, even if the index is not used, is a full table scan, which may be much faster than the association when the data is larger than memory.

Because this avoids the random I/O.

A separate table makes better use of the indexing strategy.

　Mixed paradigm and inverse paradigm are the mainstream of the schema.

Cache tables and Summary tables

Easy to query, difficult to maintain.

Materialized view

Counter table

MySQL Schema design optimization

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

MySQL Schema design optimization

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

MySQL Schema design optimization

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support