MySQL Optimization Guide

Source: Internet
Author: User

Database Design Principles

Standardized, normalized design: the normal forms up to 3NF

First Normal Form

Atomic attributes: each column holds a single value

In any relational database, the first normal form (1NF) is the basic requirement for a relational schema. A database that does not satisfy 1NF is not a relational database.
1NF means that every column of a table is an indivisible basic data item. A column cannot hold multiple values: an attribute of an entity cannot have multiple values or repeated attributes. If repeated attributes exist, you may need to define a new entity made up of those attributes, with a one-to-many relationship between the new entity and the original one. In 1NF, each row of a table holds information about exactly one instance. For example, in the Employee Information table in Figure 3-2, the employee information cannot all be stored in a single column, nor can two or more of its columns be merged into one. Each row represents exactly one employee, and each employee's information appears only once in the table. In short, 1NF means no repeating columns.
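The 1NF rule can be shown with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names (employees_bad, employees, employee_phones) are hypothetical. The first table packs several phone numbers into one column. The 1NF version moves the repeating attribute into its own table, giving one employee many phone rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Violates 1NF: one column holds several values.
cur.execute("CREATE TABLE employees_bad (emp_id INTEGER PRIMARY KEY, name TEXT, phones TEXT)")
cur.execute("INSERT INTO employees_bad VALUES (1, 'Alice', '555-0100,555-0101')")

# 1NF: the repeating attribute becomes a separate entity (one employee -> many phones).
cur.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE employee_phones ("
    " emp_id INTEGER NOT NULL REFERENCES employees(emp_id),"
    " phone TEXT NOT NULL,"
    " PRIMARY KEY (emp_id, phone))"
)

# Migrate: split the packed column into one row per value.
for emp_id, name, packed in cur.execute("SELECT emp_id, name, phones FROM employees_bad").fetchall():
    cur.execute("INSERT INTO employees VALUES (?, ?)", (emp_id, name))
    for phone in packed.split(","):
        cur.execute("INSERT INTO employee_phones VALUES (?, ?)", (emp_id, phone))

# Each value is now individually queryable, with no string parsing.
phones = [row[0] for row in cur.execute(
    "SELECT phone FROM employee_phones WHERE emp_id = 1 ORDER BY phone")]
print(phones)  # ['555-0100', '555-0101']
```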

Second Normal Form

Each row is uniquely identifiable
The second normal form (2NF) builds on 1NF: a table in 2NF must first be in 1NF. 2NF requires that each instance (row) of a table be uniquely distinguishable. To achieve this, a column is usually added to the table to store a unique identifier for each instance.
For example, an employee number (emp_id) column is added to the Employee Information table. Because each employee's number is unique, every employee can be uniquely distinguished. Such a unique attribute column is called the primary key.
2NF also requires that an entity's attributes depend fully on the primary key. Full dependency means that no attribute may depend on only part of the primary key. If such an attribute exists, it should be split out, together with that part of the key, into a new entity that has a one-to-many relationship with the original entity. In short, in 2NF no non-key attribute depends on only part of the primary key.
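A partial dependency and its fix can be sketched as follows. This again uses sqlite3 as a stand-in for MySQL, with hypothetical order/product tables. The key of order_items_bad is (order_id, product_id), but product_name depends only on product_id, which is part of the key. Splitting it into a products table removes the redundancy, and a join reconstructs the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Violates 2NF: the key is (order_id, product_id), but product_name
# depends only on product_id, i.e. on part of the key.
cur.execute(
    "CREATE TABLE order_items_bad ("
    " order_id INTEGER, product_id INTEGER, product_name TEXT, quantity INTEGER,"
    " PRIMARY KEY (order_id, product_id))"
)
cur.executemany(
    "INSERT INTO order_items_bad VALUES (?, ?, ?, ?)",
    [(1, 10, "Keyboard", 2), (2, 10, "Keyboard", 1), (2, 20, "Mouse", 3)],
)

# 2NF: move the partially dependent attribute into its own entity.
cur.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE order_items ("
    " order_id INTEGER, product_id INTEGER REFERENCES products(product_id),"
    " quantity INTEGER NOT NULL, PRIMARY KEY (order_id, product_id))"
)
cur.execute("INSERT INTO products SELECT DISTINCT product_id, product_name FROM order_items_bad")
cur.execute("INSERT INTO order_items SELECT order_id, product_id, quantity FROM order_items_bad")

# Each product name is now stored once; a join rebuilds the original rows.
rebuilt = cur.execute(
    "SELECT i.order_id, i.product_id, p.product_name, i.quantity"
    " FROM order_items i JOIN products p ON p.product_id = i.product_id"
    " ORDER BY i.order_id, i.product_id"
).fetchall()
original = cur.execute(
    "SELECT * FROM order_items_bad ORDER BY order_id, product_id").fetchall()
print(rebuilt == original)  # True
```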

Third Normal Form

Each fact is stored only once
A table in the third normal form (3NF) must first be in 2NF. In short, 3NF requires that a table not contain non-key information already held in other tables. For example, suppose there is a Department Information table in which each department has a department number (dept_id), a department name, a department description, and so on. Once the Employee Information table in Figure 3-2 lists the department number, it must not also include the department name, description, or other department-related information. If no Department Information table exists, 3NF says one should be created; otherwise there will be a large amount of redundant data. In short, in 3NF no non-key attribute depends on another non-key attribute.
Formal condition for 3NF:
A relation $R$ is in 3NF if, for every non-trivial functional dependency $A_1 A_2 \ldots A_n \to B$ that holds in $R$, either $\{A_1, A_2, \ldots, A_n\}$ is a superkey of $R$, or $B$ is a member of some key of $R$ (a prime attribute).
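The formal 3NF condition can be checked mechanically. Below is a minimal Python sketch that finds candidate keys by brute force, so it is only suitable for small schemas. It is applied to a hypothetical attribute set modeled on the employee/department example above. Functional dependencies are written as (left-hand set, right-hand set) pairs.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal superkeys, found by brute force (only suitable for small schemas)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) == schema:
                keys.append(attrs)
    return keys

def is_3nf(schema, fds):
    """True if every non-trivial FD X -> B has X a superkey or B a prime attribute."""
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == schema
        for b in rhs - lhs:  # skip the trivial part of the dependency
            if not is_superkey and b not in prime:
                return False
    return True

# Hypothetical schema mirroring the employee/department example.
employee = {"emp_id", "emp_name", "dept_id", "dept_name"}
employee_fds = [({"emp_id"}, {"emp_name", "dept_id"}), ({"dept_id"}, {"dept_name"})]
before = is_3nf(employee, employee_fds)  # dept_id -> dept_name breaks 3NF

# After moving department data into its own table:
after_emp = is_3nf({"emp_id", "emp_name", "dept_id"}, [({"emp_id"}, {"emp_name", "dept_id"})])
after_dept = is_3nf({"dept_id", "dept_name"}, [({"dept_id"}, {"dept_name"})])
print(before, after_emp, after_dept)  # False True True
```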

Denormalization

A database designed strictly by the normal forms has a clear design and a sound structure. Sometimes, however, the normal forms need to be broken to some degree.
There is no contradiction here. The higher the normal form, the more tables the design tends to have and the more complex the relationships between them. Performance is not necessarily good as a result, because every additional table adds joins.
The most common way to break the normal forms is redundancy. You trade space for time by storing the same data in several tables, so that queries can reduce or avoid joins between tables.
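The redundancy trade-off can be sketched with sqlite3 as a stand-in for MySQL, using hypothetical users/orders tables. The orders table keeps a redundant copy of the user's name, so the read needs no join. The cost is that every change to users.name must also update the copies. Here both updates run in one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Denormalized: user_name is a redundant copy of users.name, stored to avoid a join.
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, uid INTEGER NOT NULL,"
    " user_name TEXT NOT NULL, amount INTEGER NOT NULL)"
)
cur.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 1, "alice", 30), (2, 2, "bob", 50), (3, 1, "alice", 20)],
)

# Normalized read: needs a join.
with_join = cur.execute(
    "SELECT o.id, u.name, o.amount FROM orders o JOIN users u ON u.id = o.uid ORDER BY o.id"
).fetchall()
# Denormalized read: a single table, no join.
without_join = cur.execute("SELECT id, user_name, amount FROM orders ORDER BY id").fetchall()
print(with_join == without_join)  # True

# The price of redundancy: a name change must update every copy, in one transaction.
with conn:
    cur.execute("UPDATE users SET name = 'alice2' WHERE id = 1")
    cur.execute("UPDATE orders SET user_name = 'alice2' WHERE uid = 1")
```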

Data-driven

A data-driven approach, rather than hard-coding, makes many policy changes and much maintenance far easier. It also greatly improves the flexibility and extensibility of the system.
For example, if the user interface needs to access external data sources (files, XML documents, other databases, etc.), the connection and path information can be stored in a UI support table. Likewise, if the user interface performs workflow tasks (sending mail, printing letters, changing record status, etc.), the data that drives the workflow can also be stored in the database. Role and permission management can also be data-driven. In fact, if a process is data-driven, you can hand users considerable responsibility for maintaining their own workflows.
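Data-driven role and permission management can be sketched as follows, using sqlite3 as a stand-in for MySQL. The role_permissions table and the can() helper are hypothetical. The rules live in rows rather than in if/else code, so a policy change is an INSERT or DELETE, not a redeploy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical schema: which role may perform which action.
cur.execute(
    "CREATE TABLE role_permissions (role TEXT NOT NULL, action TEXT NOT NULL,"
    " PRIMARY KEY (role, action))"
)
cur.executemany(
    "INSERT INTO role_permissions VALUES (?, ?)",
    [("admin", "read"), ("admin", "delete"), ("viewer", "read")],
)

def can(role: str, action: str) -> bool:
    """Look the rule up in data; changing policy means changing rows, not code."""
    row = cur.execute(
        "SELECT 1 FROM role_permissions WHERE role = ? AND action = ?", (role, action)
    ).fetchone()
    return row is not None

print(can("admin", "delete"), can("viewer", "delete"))  # True False

# A policy change is a data change: grant viewers delete without touching code.
cur.execute("INSERT INTO role_permissions VALUES ('viewer', 'delete')")
print(can("viewer", "delete"))  # True
```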

Plan for change and record basic metadata

When designing a database, consider which fields may change in the future.
For example, record the creation time, the last update time, and a user's registration IP and login IP.

Building the Database

Table names
    1. Table names should be descriptive; do not use pinyin or mixed pinyin/English names.
    2. Table names may contain only letters, digits, and underscores; no other characters are allowed. A table name must begin with a letter, not a digit or an underscore.
    3. All table names use a common prefix, joined to the rest of the name by an underscore. A prefix allows several installations of the same project to share one database.
    4. Table names are lowercase, with words separated by underscores.
    5. Table names must not exceed 64 characters.
    6. When a table name is a countable noun, use the plural form, for example xs_users (the users table).
    7. Avoid MySQL reserved words in table names.
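Several of the table-name rules above are mechanical and can be checked in code. The following is a minimal Python sketch. The xs_ prefix comes from the example above. The reserved-word set is only a small hypothetical sample of MySQL's much longer list. The descriptive-name and plural rules still need human review.

```python
import re

# Project prefix from the example above, plus a small sample of MySQL reserved
# words (the real list is much longer; see the MySQL manual).
PREFIX = "xs_"
RESERVED_SAMPLE = {"order", "group", "select", "table", "key", "index"}
NAME_RE = re.compile(r"[a-z][a-z0-9_]*")

def table_name_problems(name: str) -> list[str]:
    """Return the mechanically checkable naming rules that name violates."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("use only lowercase letters, digits, underscores; start with a letter")
    if not name.startswith(PREFIX):
        problems.append(f"missing the {PREFIX!r} prefix")
    if len(name) > 64:
        problems.append("longer than 64 characters")
    if name in RESERVED_SAMPLE:
        problems.append("is a reserved word")
    return problems

print(table_name_problems("xs_users"))      # []
print(len(table_name_problems("XsUsers")))  # 2
```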
Column names
    1. Column names should be descriptive; do not use pinyin or mixed pinyin/English names.
    2. Column names may contain only letters, digits, and underscores; no other characters are allowed. Column names should begin with a word related to the table's content; digits are allowed but discouraged.
    3. Column names are lowercase, with words separated by underscores.
    4. Column names must not exceed 64 characters.
    5. Column types and lengths must be consistent across tables: the same column must not be an integer in one table and a character type in another.
    6. When tables are joined on related columns, keep the joined column names consistent. For example, the uid column in xs_orders and the uid column in xs_carts both store the id from xs_users.
    7. Columns that store multiple items or counts should also use plural names, for example views.
    8. Every table should have an auto-increment ID column, named either in full form or simply id.
Index names
    1. Index names may contain only letters, digits, and underscores; no other characters are allowed.
    2. Create a non-clustered (secondary) index on every foreign key column.
    3. Do not index TEXT/BLOB columns, and do not index too many characters of a string column.
    4. Create composite indexes according to business requirements.
    5. Index names must not exceed 64 characters.
    6. Do not create too many indexes on tables whose data changes frequently.
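The effect of item 2 can be sketched with sqlite3 as a stand-in for MySQL, using hypothetical xs_users/xs_orders tables. The sketch compares the plan for a lookup on the foreign key column before and after adding a secondary index. It reads the detail column of SQLite's EXPLAIN QUERY PLAN, whose format differs from MySQL's EXPLAIN. In MySQL, InnoDB creates an index on a foreign key column automatically when a FOREIGN KEY constraint is declared. The rule therefore matters most when, as this article recommends, the constraint itself is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE xs_users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE xs_orders (id INTEGER PRIMARY KEY,"
    " uid INTEGER NOT NULL REFERENCES xs_users(id), amount INTEGER NOT NULL)"
)

def plan(sql: str) -> str:
    """Concatenate the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return " | ".join(row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT id, amount FROM xs_orders WHERE uid = 42"
before = plan(query)  # full scan of xs_orders
cur.execute("CREATE INDEX idx_xs_orders_uid ON xs_orders (uid)")
after = plan(query)   # search through the secondary index on the foreign key
print(before)
print(after)
```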
Column structure

Table structures should be just right and deliberated over repeatedly, so that data storage is compact and efficient.

    1. For a nullable column, the database must first check whether a value is NULL or NOT NULL before comparing it. For efficiency, therefore, columns should not be nullable: declare every column NOT NULL.
    2. A column that never stores negative numbers must be declared UNSIGNED, which roughly doubles the maximum value for the same storage size.
    3. In every table, size columns on the principle of "enough, but not wasteful".
    4. Watch out for particular column types when designing data structures: replace ENUM with TINYINT.
    5. A table containing any VARCHAR, TEXT, or other variable-length column is a variable-length table; otherwise it is a fixed-length table. Use fixed-length data types wherever possible, because fixed-length tables are fast to query, retrieve, and update. If necessary, split a critical, frequently accessed table into one table of fixed-length columns and another of variable-length columns.
    6. Smaller column types are always processed much faster than larger ones. For character types, processing time is directly related to string length. In general, smaller tables are processed faster. For a fixed-length table, choose the smaller type as long as it holds the values you need. A TEXT value records its length in 2 bytes, while a LONGTEXT value uses 4 bytes; if stored values never exceed 64 KB, use TEXT and save 2 bytes per value.
    7. Numeric operations are generally faster than string operations. A comparison, for example, compares two numbers in a single operation, whereas a string comparison proceeds byte by byte, and the longer the strings, the more comparisons are needed. If a string column holds only a limited number of distinct values, use an ordinary integer type to gain the speed of numeric operations.
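The arithmetic behind items 2 and 4 is shown below. The sketch computes the signed and UNSIGNED ranges of MySQL's integer types from their storage sizes in bytes. UNSIGNED shifts the whole range to non-negative values, so the maximum becomes 2 × signed maximum + 1. It also shows that a 1-byte TINYINT has room for 256 codes when it replaces an ENUM.

```python
# Storage sizes in bytes of MySQL's integer types.
INT_TYPES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}

def int_range(type_name: str, unsigned: bool = False) -> tuple[int, int]:
    """Value range of a signed (two's complement) or unsigned integer of that size."""
    bits = 8 * INT_TYPES[type_name]
    if unsigned:
        return 0, 2**bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for name in INT_TYPES:
    lo, hi = int_range(name)
    _, uhi = int_range(name, unsigned=True)
    print(f"{name:<9} signed {lo}..{hi}  unsigned 0..{uhi}")
```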
SQL Optimization

Optimization goals

Reduce I/O. I/O is always the part of a database that becomes a bottleneck most easily, because of the work a database is responsible for: most database operations spend more than 90% of their time on I/O. Reducing the amount of I/O is the first priority of SQL optimization and, of course, the most visible optimization.
Reduce CPU work. Besides the I/O bottleneck, SQL optimization must also consider CPU usage. ORDER BY, GROUP BY, DISTINCT, and similar operations are all CPU-intensive (they are essentially in-memory comparisons of data). Once I/O has been optimized to a certain point, reducing CPU work becomes an important goal of SQL optimization.

Common Misconceptions

Misconception: COUNT(1) and COUNT(primary_key) are faster than COUNT(*)

Many people use COUNT(1) or COUNT(primary_key) instead of COUNT(*) to count records, believing it performs better. In fact this is a myth. In some scenarios it may even perform worse, because databases apply special optimizations to COUNT(*).

Misconception: COUNT(column) and COUNT(*) are the same

This myth is common even among many senior engineers and DBAs, and many take it for granted. In fact, COUNT(column) and COUNT(*) are completely different operations with completely different meanings.
COUNT(column) returns the number of records in the result set in which column is not NULL.
COUNT(*) returns the number of records in the entire result set.
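The semantic difference can be seen in a few lines. This sketch uses sqlite3 as a stand-in for MySQL and a hypothetical users table with a nullable email column. The NULL-handling of COUNT is standard SQL, so the counts match what MySQL returns for the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
cur.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com"), (4, None)],
)

# COUNT(*) and COUNT(1) count rows; COUNT(email) counts non-NULL emails only.
total, non_null, ones = cur.execute(
    "SELECT COUNT(*), COUNT(email), COUNT(1) FROM users"
).fetchone()
print(total, non_null, ones)  # 4 2 4
```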

Misconception: SELECT a, b FROM ... accesses less data than SELECT a, b, c FROM ...

This misconception is widespread among developers, mainly because they do not fully understand how databases store data.
In fact, most relational databases store data by row, and data is accessed in fixed-size I/O units (called blocks or pages), typically 4 KB, 8 KB, and so on. Usually each I/O unit holds several rows, and each row stores all of its columns (except for special types such as LOBs).
So whether we read one column or several, the amount of table data the database must access is actually the same.
There are exceptions, of course: a query may be answerable entirely from an index. For example, if an index contains only columns a and b, a query that selects only a and b does not need to go back to the table. Column c is not in the index, so retrieving it requires a lookup in the table. In such cases, the I/O of the two queries differs significantly.
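The covering-index exception can be observed in a query plan. This sketch uses sqlite3 as a stand-in for MySQL, with a hypothetical table t and index idx_t_a_b on (a, b). SQLite marks plans that never touch the table as using a "COVERING INDEX". MySQL's EXPLAIN reports the same situation as "Using index" in the Extra column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)")
cur.execute("CREATE INDEX idx_t_a_b ON t (a, b)")

def plan(sql: str) -> str:
    """Concatenate the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return " | ".join(row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

only_ab = plan("SELECT a, b FROM t WHERE a = 1")    # answered from the index alone
with_c = plan("SELECT a, b, c FROM t WHERE a = 1")  # index lookup plus a table lookup for c
print(only_ab)
print(with_c)
```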

Misconception: ORDER BY always requires a sort

Index data is ordered. Suppose the order we need matches the order of an index and the query is executed using that index. The database then usually skips the sort and returns the data directly, because it knows the data already meets the ordering requirement.
In fact, using indexes to optimize SQL that has ordering requirements is a very important optimization technique.
Extended reading: http://blog.csdn.net/zzxian/article/details/7927810
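The effect of an index on ORDER BY can be seen in a query plan. This sketch uses sqlite3 as a stand-in for MySQL, with a hypothetical events table and index idx_events_created_at. Without the index, SQLite's plan contains a separate sort step ("USE TEMP B-TREE FOR ORDER BY"). With it, the rows are read in index order and the sort step disappears. In MySQL's EXPLAIN, the corresponding sort step shows up as "Using filesort" in the Extra column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")

def plan(sql: str) -> str:
    """Concatenate the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return " | ".join(row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT id, created_at FROM events ORDER BY created_at"
before = plan(query)  # full scan plus a separate sort step
cur.execute("CREATE INDEX idx_events_created_at ON events (created_at)")
after = plan(query)   # walks the index in order, no sort step
print(before)
print(after)
```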

Basic Principles

Minimize the use of foreign key constraints

Database accounts, permissions, constraints, and triggers were designed for client/server (C/S) architectures, on the premise that the client is not trusted. In the browser/server (B/S) model, the security boundary moves forward to the web service layer, and the application and the database trust each other. It is then more flexible for the application to enforce these rules itself.

Use as few joins as possible

MySQL's strength is simplicity, but in some respects that is also a weakness. The MySQL optimizer is efficient, but it has limited statistics, so its plans are more likely to go astray. For complex multi-table joins, MySQL still lags behind Oracle and other established relational databases, both because of these optimizer limitations and because less development effort has gone into joins. For simple single-table queries, however, the gap is very small, and in some scenarios MySQL even outperforms them.

Avoid SELECT * where possible

Many people find this point hard to understand. Didn't the misconceptions above say that the number of columns in the SELECT clause does not affect how much data is read? Most of the time it does not affect the I/O volume. When there is an ORDER BY, however, the number of columns in the SELECT clause greatly affects sorting efficiency. Moreover, as noted above, "most of the time" is not always: when a query's results can be obtained from an index alone, I/O is greatly reduced.

Use joins instead of subqueries where possible

Although join performance is poor, it still has a clear advantage over MySQL subqueries. MySQL's execution plans for subqueries have long been a major problem. The issue has existed for many years and is present in every widely used stable release without much improvement. The MySQL developers recognized it early and promised to fix it as soon as possible, but at the time of writing we had not seen a release that solves it well.
MySQL 5.6 optimized subqueries: http://www.linuxidc.com/Linux/2012-08/67606.htm
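A subquery can be rewritten as a join, as sketched below with sqlite3 as a stand-in for MySQL and hypothetical users/orders tables. The sketch only checks that both forms return the same rows; it does not measure MySQL's relative speed. One detail matters for correctness: IN returns each user once, while a plain join returns a user once per matching order. The join form therefore needs DISTINCT, or a join against already-unique keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, uid INTEGER NOT NULL, amount INTEGER NOT NULL)")
cur.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob"), (3, "carol")])
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 150), (2, 1, 200), (3, 2, 80), (4, 3, 120)],
)

# Subquery form.
sub = cur.execute(
    "SELECT id, name FROM users"
    " WHERE id IN (SELECT uid FROM orders WHERE amount > 100) ORDER BY id"
).fetchall()

# Join form. DISTINCT is required: alice has two qualifying orders, and a plain
# join would return her twice, whereas IN returns each user once.
joined = cur.execute(
    "SELECT DISTINCT u.id, u.name FROM users u"
    " JOIN orders o ON o.uid = u.id WHERE o.amount > 100 ORDER BY u.id"
).fetchall()
print(sub == joined, sub)  # True [(1, 'alice'), (3, 'carol')]
```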

Use OR as little as possible

When multiple conditions in a WHERE clause are combined with OR, the MySQL optimizer does not optimize the execution plan well, and MySQL's layered SQL/storage-engine architecture makes performance worse still. Replacing OR with UNION ALL, or with UNION when necessary, often works better.
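The OR-to-UNION rewrite is sketched below, using sqlite3 as a stand-in for MySQL and a hypothetical orders table with one index per filtered column. Each branch of the UNION can use its own index. The sketch also shows when UNION, rather than UNION ALL, is necessary. If a row can satisfy both conditions (order 2 here), UNION ALL would return it twice. It checks result equivalence only, not MySQL performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, uid INTEGER NOT NULL, product_id INTEGER NOT NULL)")
cur.execute("CREATE INDEX idx_orders_uid ON orders (uid)")
cur.execute("CREATE INDEX idx_orders_product ON orders (product_id)")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 3, 5), (2, 3, 7), (3, 4, 7), (4, 5, 9)],
)

or_rows = cur.execute(
    "SELECT id, uid, product_id FROM orders WHERE uid = 3 OR product_id = 7 ORDER BY id"
).fetchall()

# Each branch can use its own index; UNION removes the row that matches both.
union_rows = cur.execute(
    "SELECT id, uid, product_id FROM orders WHERE uid = 3"
    " UNION"
    " SELECT id, uid, product_id FROM orders WHERE product_id = 7"
    " ORDER BY id"
).fetchall()

# UNION ALL would keep both copies of order 2.
union_all_ids = cur.execute(
    "SELECT id FROM orders WHERE uid = 3"
    " UNION ALL SELECT id FROM orders WHERE product_id = 7"
).fetchall()
print(or_rows == union_rows, len(union_all_ids))  # True 4
```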

Use UNION ALL instead of UNION where possible

The difference between UNION and UNION ALL is that UNION merges two (or more) result sets and then removes duplicates. This involves sorting, adds a lot of CPU work, and increases resource consumption and latency. So when we are sure the result sets cannot contain duplicates, or we do not care about duplicates, use UNION ALL instead of UNION.
Extended reading: http://jingyan.baidu.com/article/2d5afd69e8dfd285a3e28e66.html
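When the branches are disjoint by construction, UNION and UNION ALL return the same rows, and only UNION pays for de-duplication. This sketch uses sqlite3 as a stand-in for MySQL, with a hypothetical orders table whose branches filter on different status values. SQLite's plan for UNION includes a temporary B-tree used to remove duplicates, while the UNION ALL plan simply concatenates the branches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "paid"), (2, "refunded"), (3, "paid"), (4, "pending")],
)

def plan(sql: str) -> str:
    """Concatenate the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return " | ".join(row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

# Different status values: duplicates between the branches are impossible.
branch_a = "SELECT id, status FROM orders WHERE status = 'paid'"
branch_b = "SELECT id, status FROM orders WHERE status = 'refunded'"
union_sql = f"{branch_a} UNION {branch_b}"
union_all_sql = f"{branch_a} UNION ALL {branch_b}"

same_rows = sorted(cur.execute(union_sql).fetchall()) == sorted(cur.execute(union_all_sql).fetchall())
union_plan = plan(union_sql)          # includes a temporary B-tree for de-duplication
union_all_plan = plan(union_all_sql)  # no de-duplication step
print(same_rows)  # True
```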

Filter as early as possible

This strategy is most commonly seen in index design, where the more selective filter columns are placed first.
The same principle applies when writing SQL, for example to optimize joins. When paging data from multiple tables, it is best to filter and page on a single table first, then join the paged result set with the other table. This avoids as much unnecessary I/O as possible and greatly reduces the time spent on I/O.
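The "page first, then join" pattern can be sketched with sqlite3 as a stand-in for MySQL, using hypothetical users/orders tables. The two queries return the same page only because every order has a matching user. With an inner join and orphaned orders, paging before the join could yield fewer rows. The sketch checks equivalence of results, not speed. Whether an engine already stops the join early depends on its optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, uid INTEGER NOT NULL, amount INTEGER NOT NULL)")
cur.executemany("INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(1, 11)])
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 10 + 1, i * 10) for i in range(1, 101)],  # every uid exists in users
)

page_size, offset = 5, 20

# Join, then page: logically, the join runs before LIMIT discards rows.
join_then_page = cur.execute(
    "SELECT o.id, o.amount, u.name FROM orders o JOIN users u ON u.id = o.uid"
    " ORDER BY o.id LIMIT ? OFFSET ?",
    (page_size, offset),
).fetchall()

# Page on the single table first, then join only the rows of that page.
page_then_join = cur.execute(
    "SELECT p.id, p.amount, u.name"
    " FROM (SELECT id, uid, amount FROM orders ORDER BY id LIMIT ? OFFSET ?) AS p"
    " JOIN users u ON u.id = p.uid ORDER BY p.id",
    (page_size, offset),
).fetchall()
print(page_then_join == join_then_page)  # True
print(page_then_join[0])  # (21, 210, 'user2')
```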

Avoid type conversions

"Type conversion" here means the conversion that occurs when the type of a column in the WHERE clause differs from the type of the value passed in:
Applying a conversion function to column_name directly prevents MySQL (and, in fact, other databases as well) from using the index. If a conversion is unavoidable, apply it to the value passed in instead.
If the type of the value we pass differs from the column type and we perform no conversion ourselves, MySQL may convert our data itself or leave it to the storage engine. Either can make the index unusable and cause execution plan problems.
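The first point can be shown with sqlite3 as a stand-in for MySQL, using a hypothetical users table with an indexed text phone column. Wrapping the column in CAST forces a full scan, while converting the parameter keeps the indexed search. MySQL converts implicitly as well: comparing a VARCHAR column with a numeric value converts the column values and defeats the index. SQLite's type-affinity rules differ, so this sketch uses an explicit CAST instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone TEXT NOT NULL, name TEXT NOT NULL)")
cur.execute("CREATE INDEX idx_users_phone ON users (phone)")

def plan(sql: str, params=()) -> str:
    """Concatenate the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = cur.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[3] for row in rows)

# Conversion applied to the column: the index on phone cannot be used.
bad = plan("SELECT * FROM users WHERE CAST(phone AS INTEGER) = ?", (13800000000,))
# Conversion applied to the parameter instead: the index is usable.
good = plan("SELECT * FROM users WHERE phone = ?", (str(13800000000),))
print(bad)
print(good)
```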

Prioritize high-concurrency SQL over low-frequency "big" SQL

In terms of potential damage, high-concurrency SQL always outweighs low-frequency SQL. If high-concurrency SQL goes wrong, it can bring the system down without giving us any breathing room. SQL that consumes a lot of I/O and responds slowly but runs infrequently will, at worst, slow the whole system down for a while, which at least gives us a chance to react.

Optimize from a global perspective rather than in isolation

SQL optimization cannot target one statement in isolation; it must take all SQL in the system into account. In particular, when tuning indexes to improve one statement's execution plan, do not fix that statement at the expense of others.

Run EXPLAIN on every SQL statement in the database whenever possible

To optimize SQL, you need to know its execution plan in order to judge whether there is room for optimization and whether the plan has problems. After the SQL running in the database has been optimized for a while, obviously problematic SQL becomes rare and the rest has to be hunted down. At that point, you need to run EXPLAIN extensively to collect execution plans and decide what needs optimizing.
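Running EXPLAIN over a whole workload can be automated. The sketch below uses sqlite3 as a stand-in for MySQL. It flags two kinds of plan step: full table scans (a SCAN step that uses no index) and extra sort B-trees. The orders table and the workload list are hypothetical. In practice, the statements would come from application logs or the slow query log. In MySQL, the analogous signals are type=ALL in EXPLAIN's output and "Using filesort" or "Using temporary" in the Extra column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, uid INTEGER NOT NULL, created_at TEXT NOT NULL)")
cur.execute("CREATE INDEX idx_orders_uid ON orders (uid)")

def suspicious_steps(sql: str) -> list[str]:
    """Plan steps worth a closer look: full table scans and extra sort B-trees."""
    flagged = []
    for row in cur.execute("EXPLAIN QUERY PLAN " + sql):
        detail = row[3]
        full_scan = detail.startswith("SCAN") and "USING" not in detail
        if full_scan or "TEMP B-TREE" in detail:
            flagged.append(detail)
    return flagged

# Hypothetical workload.
workload = [
    "SELECT * FROM orders WHERE uid = 7",
    "SELECT * FROM orders WHERE created_at > '2020-01-01'",
    "SELECT * FROM orders WHERE uid = 7 ORDER BY created_at",
]
report = {sql: suspicious_steps(sql) for sql in workload}
for sql, steps in report.items():
    print("OK  " if not steps else "LOOK", sql, steps)
```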

Further reading: MySQL queries, subqueries, and join queries

Http://www.cnblogs.com/rollenholt/archive/2012/05/15/2502551.html

Preliminary optimization plan for large MySQL data volumes:

MySQL should only do simple things. For a table with tens of millions of rows, no matter how it is optimized, the same SQL will not run as fast as on a table with around 100,000 rows.
If you are designing large tables, consider the following:
1. Split into multiple databases (sharding), removing the relationships between tables
2. Horizontal table splitting / MySQL partitioning
3. Vertical splitting
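Horizontal splitting needs a rule that maps each row to its database and table. The sketch below uses simple modulo routing on the user id, which is one common scheme but not one the text prescribes. The shop_N database names, the xs_orders_N table names, and the counts are hypothetical. All of a user's orders land in the same table, so per-user queries never cross shards. The trade-off is that changing TABLE_COUNT later means moving data.

```python
# Hypothetical layout: xs_orders split into 4 tables spread over 2 databases.
TABLE_COUNT = 4
DB_COUNT = 2

def route(uid: int) -> tuple[str, str]:
    """Return the (database, table) pair that holds this user's orders."""
    table_no = uid % TABLE_COUNT
    db_no = table_no % DB_COUNT
    return f"shop_{db_no}", f"xs_orders_{table_no}"

print(route(10))  # ('shop_0', 'xs_orders_2')
print(route(7))   # ('shop_1', 'xs_orders_3')
```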
