MySQL Database (3), Optimization Part 1: Table Optimization

Source: Internet
Author: User

First, optimizing the table structure

1. Normalization

Normalization is the process of organizing data in a database. It includes creating tables according to design rules and establishing relationships between them; by eliminating redundancy and inconsistent dependencies, these rules both protect the data and make the database more flexible. Normalizing a database usually means making the design conform to a certain normal form, typically the third normal form (3NF). Boyce-Codd normal form (BCNF), the fourth normal form, and the fifth normal form also exist, but they are seldom considered in practical design. Ignoring these rules may leave the database design less than perfect, but it should not affect functionality.
Characteristics of a normalized design:

1) Every "object" lives in its own table, with no redundancy.
2) The schema is usually generated from an E-R diagram.
3) It is concise; updating an attribute usually requires updating only a few records.
4) Join operations are time consuming.
5) There are relatively few ways to optimize SELECT and sort operations.
6) It is suitable for OLTP applications.

Characteristics of a denormalized design:

1) A lot of data is stored in a single table, so data is redundant.
2) Updates are expensive; updating one attribute may touch many tables and many records.
3) Data may be lost when records are deleted.
4) There are many ways to optimize SELECT and ORDER BY.
5) It is suitable for DSS applications.

Normalization and denormalization each have advantages and disadvantages, and they are usually mixed within one database design: some tables are normalized, while others keep some redundant data:

1) Use a normalized design for OLTP and a denormalized design for DSS.
2) Use materialized views. MySQL does not support this feature directly, but a MyISAM table can be used as a substitute.
3) Store some data redundantly in a table, for example keeping both ref_id and name in the same table. Be aware of the update problem this creates.
4) For some simple objects, use the value itself as the key, for example an IP address.

2. Moderate denormalization (the key word is moderate)

We all know that a model based on the three normal forms is generally the most efficient way to store data and the easiest to extend. When developing an application, the database design should adhere to the three normal forms as far as possible, especially for OLTP systems, where they are rules that must be followed. The biggest problem with the three normal forms is that queries often have to join many tables, which can make queries very slow. So, for performance reasons, we sometimes need to deliberately violate the normal forms and introduce moderate redundancy to improve query efficiency. Note that this denormalization must be moderate, and every such decision needs sufficient justification.

3. Appropriate indexing

When it comes to improving database performance, indexes are the cheapest tool available. There is no need to add memory, change the program, or tune SQL: with the right CREATE INDEX, query speed may improve by hundreds or even thousands of times, which is very tempting. But there is no free lunch. The gain in query speed comes at the cost of INSERT, UPDATE, and DELETE performance, because these write operations now incur extra I/O. And because an index is stored in a different structure from the table, the indexes of a table can take up even more space than the data itself. This means we do a lot of extra work on every write, solely to make reads more efficient. So whenever we create an index, we must make sure it pays for itself.

All MySQL column types can be indexed. Using indexes on the relevant columns is the best way to improve the performance of SELECT operations. Note the following points when using indexes:

1) MySQL uses only the leftmost prefix of a composite index. For example, with KEY (a, b), a query with only WHERE b = 5 will not use the index.
2) Pay attention to index selectivity. An index on a column with very few distinct values, such as a sex column, is of little use.
3) Define a UNIQUE index on columns whose values are unique.
4) Avoid creating indexes that are never used.
5) With a B-tree index (InnoDB uses B-trees), you can index the columns that need to be sorted.
6) Avoid duplicate indexes.
7) Avoid creating an index on the prefix of an existing index. For example, if INDEX (a, b) exists, remove INDEX (a).
8) Limit the length of a single index. Use KEY (name(8)) to index only the first few characters of the data.
9) The shorter the key value the better; integers are best.
10) Make sure queries use the index (check with EXPLAIN); this reduces the number of disk reads and speeds up reading data.
11) Similar key values are better than random ones. AUTO_INCREMENT is better than a UUID.
OPTIMIZE TABLE can compact and sort indexes, but be careful not to run it too often.
ANALYZE TABLE updates the key distribution statistics that the optimizer relies on.
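The leftmost-prefix rule in point 1 can be observed with a small experiment. The sketch below uses Python's built-in sqlite3 module purely as a stand-in, because it also uses B-tree indexes and runs without a server; it is not MySQL. The table t, the index idx_ab, and the helper plan() are hypothetical names chosen for illustration. On MySQL you would inspect the output of EXPLAIN instead.

```python
import sqlite3

def plan(conn, sql):
    """Return the query-plan description SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row is a human-readable description.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_ab ON t (a, b)")  # composite index, like KEY (a, b)
conn.executemany(
    "INSERT INTO t (a, b, c) VALUES (?, ?, ?)",
    [(i % 10, i % 7, "row %d" % i) for i in range(1000)],
)

# Filtering on the leading column can seek into the composite index.
plan_a = plan(conn, "SELECT * FROM t WHERE a = 1")
# Filtering only on the second column cannot, so the table is scanned.
plan_b = plan(conn, "SELECT * FROM t WHERE b = 5")

print(plan_a)
print(plan_b)
```

The exact wording of the plan differs between SQLite versions, so the thing to look for is whether idx_ab is mentioned at all.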

4. Select the appropriate data type

One of the most basic optimizations is to make the table occupy as little space on disk as possible. This can bring a large performance gain: smaller data is faster to read from disk, and less memory is needed to process the table contents during a query. Indexes on smaller columns also take fewer resources. The following techniques help make a table perform better and minimize storage space:

1) Use the appropriate type; do not store numbers as strings.
2) Use the most efficient (smallest) data type possible. MySQL has many specialized types that save disk space and memory.
3) Use smaller integer types wherever possible to make the table smaller. For example, MEDIUMINT is often better than INT because a MEDIUMINT column uses 25% less space.
4) Declare columns NOT NULL if possible. It makes everything faster and saves one bit per column. If your application really needs NULL, by all means use it; just avoid allowing it on every column by default.
5) MyISAM tables with no variable-length columns (VARCHAR, TEXT, or BLOB) use a fixed-size row format. It is faster, but unfortunately may waste some space. Even if the table has VARCHAR columns, you can hint that you want fixed-length rows with the CREATE TABLE option ROW_FORMAT=FIXED.
6) Use a simple character set such as latin1. Use utf-8 as little as possible, because utf-8 can take up to 3 times as much space as latin1. Fields that do not need utf-8, such as mail addresses and URLs, can use latin1.
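The size arguments in points 3 and 6 can be checked with a little arithmetic. This is only a rough sketch: the integer sizes are the storage sizes documented for MySQL, while the helper names saving() and encoded_size() and the sample strings are made up for illustration.

```python
# Storage size in bytes of the MySQL integer types.
INT_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}

def saving(smaller, larger):
    """Fraction of space saved by using `smaller` instead of `larger`."""
    return 1 - INT_BYTES[smaller] / INT_BYTES[larger]

def encoded_size(text, charset):
    """Number of bytes needed to encode `text` in the given character set."""
    return len(text.encode(charset))

mediumint_saving = saving("MEDIUMINT", "INT")  # 3 bytes instead of 4

# ASCII-only data encodes to the same number of bytes either way.
ascii_latin1 = encoded_size("user@example.com", "latin-1")
ascii_utf8 = encoded_size("user@example.com", "utf-8")

# Characters outside latin1, such as CJK, need 3 bytes each in utf-8.
cjk_utf8 = encoded_size("数据库", "utf-8")

print(mediumint_saving, ascii_latin1, ascii_utf8, cjk_utf8)
```

Note that for ASCII-only values the encoded bytes are identical; the extra cost of utf-8 shows up with non-ASCII characters, or where an engine may reserve the maximum width per character for fixed-length columns.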

    1. Numeric types: Do not use DOUBLE unless you absolutely have to; it is not only a matter of storage length but also of precision. Similarly, DECIMAL is not recommended for fixed-precision decimals; instead, convert the value to an integer by multiplying it by a fixed factor, which saves a great deal of storage space without any additional maintenance cost. For integers, when the amount of data is large, choose carefully between TINYINT, INT, and BIGINT, because the storage they occupy differs considerably. For fields that are certain never to hold negative numbers, add the UNSIGNED attribute. Of course, for a database with a small amount of data, you need not distinguish strictly between the three integer types.
    2. Character types: Do not use the TEXT type unless you absolutely have to; the way it is handled means its performance is lower than that of CHAR or VARCHAR. For fixed-length fields, CHAR is recommended; for variable-length fields, use VARCHAR, and set only an appropriate maximum length rather than an arbitrarily large one, because MySQL handles storage differently for different length ranges.
    3. Time types: Use the TIMESTAMP type where possible, because it requires only half the storage space of DATETIME. For data that only needs to be accurate to the day, the DATE type is recommended, because it requires only 3 bytes, even less than TIMESTAMP. Storing a UNIX timestamp in an INT column is not recommended: it is not intuitive, causes unnecessary inconvenience in maintenance, and brings no benefit.
    4. ENUM & SET: For a status field, try using ENUM, because it can greatly reduce storage space, and even if a new value is needed later, as long as it is added at the end, changing the structure does not require rebuilding the table data. What about storing a set of predefined attributes? Try the SET type: it handles multiple attributes easily and also saves a lot of storage space.
    5. LOB types: I strongly oppose storing LOB data in the database. Although the database provides this feature, it is not what a database is good at. We should let each tool do what it is good at so that it can perform at its best. Storing LOB data in a database is like asking a marketing professional who learned a little Java at school years ago to write Java code.
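The advice in item 1 about multiplying fixed-precision decimals by a fixed factor can be sketched as follows. This is a minimal illustration, not a complete money type: SCALE, to_stored(), and from_stored() are hypothetical names, and the sketch assumes amounts never have more than two decimal places.

```python
from decimal import Decimal

SCALE = 100  # assumed precision: two decimal places, so amounts are stored as cents

def to_stored(amount):
    """Convert a decimal string such as '19.99' to the integer that is stored."""
    # Assumes at most two decimal places; further digits would be truncated.
    return int(Decimal(amount) * SCALE)

def from_stored(value):
    """Convert a stored integer back to a Decimal for display."""
    return Decimal(value) / SCALE

# Binary floating point (as with DOUBLE) cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2

# Scaled integers add up exactly.
int_sum = to_stored("0.10") + to_stored("0.20")

print(float_sum == 0.3)      # False
print(int_sum)               # 30
print(from_stored(int_sum))  # 0.3
```

The application then has to remember the scale factor whenever it reads or writes the column, which is the trade-off for the smaller, exact integer storage.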

5. Always set an ID for each table

Every table in the database should have an ID as its primary key, preferably an INT (UNSIGNED recommended) with the AUTO_INCREMENT flag set.
Even if your users table has a unique field such as "email", do not make it the primary key. Using a VARCHAR column as the primary key degrades performance. In addition, your program should use the table's ID to build its data structures. Also, in the MySQL storage engines some operations rely on the primary key, and in those cases the performance and definition of the primary key become very important, for example in clustering and partitioning.

6. Use NOT NULL whenever possible

Unless you have a very specific reason to use NULL values, you should always declare your fields NOT NULL. Do not assume that NULL takes no space: it requires extra space, and comparisons involving NULL make your program more complex. This is not to say that you can never use NULL; reality is complicated, and there will still be cases where a NULL value is needed.

7. Fixed-length tables are faster

If every field in a table is fixed length, the whole table is considered "static" or "fixed-length"; that is, the table contains no VARCHAR, TEXT, or BLOB fields. As soon as it includes one of these fields, the table is no longer a fixed-length static table, and the MySQL engine handles it differently.
Fixed-length tables can improve performance because MySQL can search them faster: with fixed-length rows it is easy to calculate the offset of the next row, so reads are naturally fast. If rows are not fixed length, finding the next row requires the engine to look it up through the primary key. Fixed-length tables are also easier to cache and rebuild. The only side effect is that fixed-length fields waste some space, because the full declared space is allocated whether or not you use it. Using the "vertical split" technique, you can split a table into two: one with fixed-length fields and one with variable-length fields.
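Why fixed-length rows make offsets easy to compute can be shown with a toy file format. This is only an illustrative sketch and not how MyISAM actually lays out its data files; the row layout (a 4-byte id plus an 8-byte name) and the helpers write_rows() and read_row() are assumptions made for the example.

```python
import io
import struct

# Hypothetical fixed-length row: 4-byte unsigned id + 8-byte name, like CHAR(8).
ROW = struct.Struct("<I8s")

def write_rows(rows):
    """Pack rows back to back into an in-memory 'data file'."""
    buf = io.BytesIO()
    for row_id, name in rows:
        # struct pads the name with NUL bytes up to the full 8 bytes,
        # which is the wasted space the text mentions.
        buf.write(ROW.pack(row_id, name.encode("ascii")))
    return buf

def read_row(buf, n):
    """Jump straight to row n: its offset is simply n * row size."""
    buf.seek(n * ROW.size)
    row_id, raw = ROW.unpack(buf.read(ROW.size))
    return row_id, raw.rstrip(b"\x00").decode("ascii")

table = write_rows([(1, "alice"), (2, "bob"), (3, "carol")])
third = read_row(table, 2)
print(ROW.size, third)
```

With variable-length rows there is no such formula: the reader would have to walk the preceding rows or consult an index to find where row n starts.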

8. Choose the right engine

MySQL offers many storage engines; the ones we use most are MyISAM, InnoDB, and MEMORY. According to the official manual, MyISAM reads faster than InnoDB, about 3 times faster. But the book cannot be fully trusted: O'Reilly's "High Performance MySQL" compares MyISAM and InnoDB, and in its tests MyISAM performs worse than InnoDB. As for MEMORY, it is quite useful. It is a good choice for temporary tables in batch jobs (if memory is large enough). In one of my batch jobs, the speed ratio was almost 1:10.

9. Store files, pictures, and other large objects in the file system, not in the database

It goes without saying; this is an iron law! The database should store only the paths.

10. Use ENUM instead of VARCHAR

The ENUM type is very fast and compact. Internally it is stored like a TINYINT, but it appears as a string. This makes it ideal for fields that hold a list of options. If you have a field such as "gender", "country", "nation", "state", or "department", whose values you know to be limited and fixed, you should use ENUM instead of VARCHAR.
MySQL also has a "suggestion" feature that tells you how to reorganize your table structure. When you have a VARCHAR field, it may suggest changing it to an ENUM type. You can get this advice with PROCEDURE ANALYSE().

11. Store IP addresses as unsigned integers

Many programmers create a VARCHAR(15) field for IP addresses without realizing that they can be stored as integers. An INT takes only 4 bytes and is a fixed-length field. It also gives you an advantage in queries, especially with a WHERE condition such as: ip BETWEEN ip1 AND ip2. You must use UNSIGNED INT because an IP address uses the full range of a 32-bit unsigned integer. In your queries, you can use INET_ATON() to convert a string IP into an integer and INET_NTOA() to convert an integer back into a string IP. PHP has the equivalent functions ip2long() and long2ip().
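The conversion itself is simple enough to sketch outside the database. The Python functions below mirror what INET_ATON() and INET_NTOA() do for IPv4 addresses, using the standard ipaddress module; they are named after the MySQL functions only for readability, and in_range() is a hypothetical helper standing in for a BETWEEN condition.

```python
import ipaddress

def inet_aton(ip):
    """Dotted-quad string -> 32-bit unsigned integer."""
    return int(ipaddress.IPv4Address(ip))

def inet_ntoa(value):
    """32-bit unsigned integer -> dotted-quad string."""
    return str(ipaddress.IPv4Address(value))

def in_range(ip, low, high):
    """Equivalent of: ip BETWEEN low AND high, compared as integers."""
    return inet_aton(low) <= inet_aton(ip) <= inet_aton(high)

as_int = inet_aton("192.168.1.1")
print(as_int)             # 3232235777
print(inet_ntoa(as_int))  # 192.168.1.1

# The largest address does not fit in a signed 32-bit INT, hence UNSIGNED.
print(inet_aton("255.255.255.255") > 2**31 - 1)  # True

# As strings, "10.0.0.10" sorts before "10.0.0.9"; as integers it does not.
print("10.0.0.10" < "10.0.0.9")                          # True
print(inet_aton("10.0.0.10") < inet_aton("10.0.0.9"))    # False
```

The last two lines show why range conditions only behave sensibly on the integer form.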

12. Prepared statements

A prepared statement is much like a stored procedure: a collection of SQL statements that runs in the background. Prepared statements bring many benefits, in both performance and security.
Prepared statements check the variables you bind, which protects your program from "SQL injection" attacks. In terms of performance, they give a considerable advantage when the same query is executed many times: you define parameters for the prepared statement, and MySQL parses it only once.
Recent versions of MySQL also transmit prepared statements in binary form, which makes network transfer very efficient.
There are, of course, cases where prepared statements should be avoided, because they did not support the query cache; this is said to be supported since version 5.1. To use prepared statements in PHP, see the manual for the mysqli extension, or use a database abstraction layer such as PDO.
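The security benefit of bound parameters can be demonstrated in a few lines. The sketch below uses Python's built-in sqlite3 module as a stand-in for a MySQL connection, so it shows parameter binding in general rather than MySQL's server-side prepared statements specifically; the users table and the malicious input are invented for the example, and the placeholder syntax varies between drivers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# Input crafted to change the meaning of a concatenated query.
malicious = "nobody' OR '1'='1"

# Unsafe: string concatenation lets the input rewrite the WHERE clause.
unsafe_sql = "SELECT name FROM users WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the value is bound to a placeholder and treated purely as data.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe_rows))  # 2, every row leaks
print(len(safe_rows))    # 0, no user has that literal name
```

The same statement text with a placeholder can also be executed repeatedly with different values, which is where the parse-once performance benefit described above comes from.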

Second, splitting tables

1. Horizontal splitting

If a table has too many records, say 10 million, and is queried frequently, it needs to be split into smaller tables. If it is split into 100 tables, each table holds only 100,000 records. Of course, this requires that the data can be divided logically. A good splitting criterion keeps the program simple and makes full use of the advantages of horizontal splitting. For example, if the system interface only offers queries by month, the table can be split by month into 12 tables, and each query needs to touch only one table. If instead the table is split by region, then even though each table is smaller, every query still has to combine all the tables, and it would be better not to split at all. So choosing a good splitting criterion is what matters most.
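The monthly split described above needs only a small routing function in the application. This is a minimal sketch under assumed names: the base table name orders, the suffix scheme orders_01 to orders_12, and the column order_date are all hypothetical.

```python
from datetime import date

def table_for(day, base="orders"):
    """Pick the monthly table that holds rows for the given date."""
    return "%s_%02d" % (base, day.month)

def monthly_query(day):
    """Build a query that touches exactly one of the 12 tables."""
    # The table name comes from our own routing function, never from user
    # input; the date itself is left as a placeholder to be bound.
    return "SELECT * FROM %s WHERE order_date = ?" % table_for(day)

march = table_for(date(2024, 3, 15))
december = table_for(date(2024, 12, 1))
sql = monthly_query(date(2024, 3, 15))

print(march)     # orders_03
print(december)  # orders_12
print(sql)
```

A query that filtered on something other than the month would have no single table to go to, which is exactly the situation the text warns about.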

2. Vertical splitting

Some tables do not have many records, perhaps only 20,000 to 30,000, but their fields are very long. Such a table takes up a lot of space and requires a large amount of I/O to read, which greatly reduces performance. In this case, the large fields should be moved into a separate table that has a one-to-one relationship with the original table.
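A vertical split might look like the following. The sketch uses Python's built-in sqlite3 module as a stand-in so that it runs without a server; the tables article and article_body and their columns are hypothetical, and on MySQL the column types would differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Narrow table: small columns that are read all the time.
CREATE TABLE article (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
-- Wide table: the large field, one row per article (one-to-one).
CREATE TABLE article_body (
    article_id INTEGER PRIMARY KEY REFERENCES article(id),
    body       TEXT NOT NULL
);
""")
conn.execute("INSERT INTO article (id, title) VALUES (1, 'Table optimization')")
conn.execute(
    "INSERT INTO article_body (article_id, body) VALUES (1, ?)",
    ("x" * 100000,),
)

# Listing pages read only the narrow table and never touch the big field.
titles = conn.execute("SELECT id, title FROM article").fetchall()

# The large field is joined in only when a single article is displayed.
row = conn.execute(
    "SELECT a.title, length(b.body) FROM article a "
    "JOIN article_body b ON b.article_id = a.id WHERE a.id = ?",
    (1,),
).fetchone()

print(titles)
print(row)
```

Sharing the primary key between the two tables is what keeps the relationship one-to-one.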

Third, table partitioning

1. What is table partitioning

In layman's terms, table partitioning divides one large table into several smaller tables according to some condition. MySQL has supported table partitioning since version 5.1. For example, if a user table has more than 6 million records, you can partition it by the date the record was stored or by location. You can of course also partition by other conditions.

2. Why partition a table
To improve scalability, manageability, and database efficiency for large tables and tables with varied access patterns.

4. When to use partitions

• Very large tables

• Historical tables that need fast queries; an archive + partition approach can be used.

• Tables whose index is larger than the server's available memory

• For large tables, especially when the index is much larger than the server's available memory, partitioning can be more effective than relying on the index.

5. Using table partitioning does not guarantee a performance improvement. It depends on the following factors:

      • How queries use the column used for partitioning;
      • The partitioning function, if the original field is not of type INT;
      • Server speed;
      • The amount of memory.
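The first factor, whether queries filter on the partitioning column, decides how many partitions have to be read. The sketch below models RANGE partitioning in plain Python to make that visible; it is a simplified model, not MySQL's implementation, and the bounds, the partition names p0 to p2, and the helper functions are assumptions for illustration.

```python
from bisect import bisect_right

# Hypothetical RANGE partitions on a year column: each partition holds
# values LESS THAN its bound, as in PARTITION p0 VALUES LESS THAN (2010).
BOUNDS = [2010, 2015, 2020]
NAMES = ["p0", "p1", "p2"]

def partition_for(year):
    """Return the partition a row with this year would be stored in."""
    idx = bisect_right(BOUNDS, year)
    if idx == len(BOUNDS):
        raise ValueError("no partition for value %r" % year)
    return NAMES[idx]

def partitions_for_range(low, high):
    """Partitions that a query with year BETWEEN low AND high must read."""
    first = bisect_right(BOUNDS, low)
    last = min(bisect_right(BOUNDS, high), len(BOUNDS) - 1)
    return NAMES[first:last + 1]

def partitions_without_filter():
    """A query that does not filter on the partitioning column reads all."""
    return list(NAMES)

print(partition_for(2009))               # p0
print(partition_for(2010))               # p1, because 2010 is not < 2010
print(partitions_for_range(2011, 2013))  # ['p1']
print(partitions_for_range(2008, 2016))  # ['p0', 'p1', 'p2']
print(partitions_without_filter())       # ['p0', 'p1', 'p2']
```

When the condition is on the partitioning column, only one partition needs to be read in the narrow case; otherwise partitioning saves nothing on that query.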

Fourth, read/write splitting

If the load on the database is too heavy for a single machine, use MySQL replication to synchronize several machines and spread the load across them.

Master -> Slave1, Slave2, Slave3

The master handles writes, while slave1 through slave3 handle SELECT queries, so each database carries much less load.

To achieve this, the program needs a special design: all writes go to the master and all reads go to the slaves, which places an additional burden on development. There is also middleware that acts as a proxy for this, making it transparent to the program which database it reads from or writes to. There is an official mysql-proxy, but it is still an alpha version. Sina's Amoeba for MySQL can also serve this purpose.
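The application-side design mentioned above can be as small as a routing class. This is a minimal sketch with invented names (ReadWriteRouter, and plain strings standing in for real connections); it is not how mysql-proxy or Amoeba work internally.

```python
import itertools

class ReadWriteRouter:
    """Send writes to the master and spread reads over the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # simple round-robin

    def route(self, sql):
        # Naive classification by the first keyword. A real router would
        # also have to handle transactions and SELECT ... FOR UPDATE.
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        if verb == "SELECT":
            return next(self._slaves)
        return self.master

router = ReadWriteRouter("master", ["slave1", "slave2", "slave3"])
statements = [
    "INSERT INTO users (name) VALUES ('alice')",
    "SELECT * FROM users",
    "SELECT * FROM users WHERE id = 1",
    "UPDATE users SET name = 'bob' WHERE id = 1",
    "SELECT COUNT(*) FROM users",
    "SELECT 1",
]
targets = [router.route(s) for s in statements]
print(targets)
```

Because replication is asynchronous by default, a read sent to a slave immediately after a write may not see that write yet, so reads that must be current are usually sent to the master as well.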

Fifth, the table optimization tool PROCEDURE ANALYSE()

PROCEDURE ANALYSE() asks MySQL to analyze your fields and their actual data and give you some useful suggestions. The suggestions are only useful when the table contains real data, because sound decisions need data as their basis.
For example, if you created an INT field as your primary key but the table holds little data, PROCEDURE ANALYSE() will suggest changing the field type to MEDIUMINT. Or, for a VARCHAR field with little data, you might get a suggestion to change it to ENUM. Such suggestions may simply result from there not being enough data, so they are not necessarily accurate.
In phpMyAdmin, you can view these suggestions by clicking "Propose table structure" while viewing a table. Remember that the suggestions only become accurate as the amount of data in your table grows, and that you are the one who makes the final decision. Note also that PROCEDURE ANALYSE() was deprecated in MySQL 5.7.18 and removed in MySQL 8.0.
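To see why such suggestions depend so heavily on the data present, here is a rough sketch of the kind of reasoning involved for integer columns. It is not the algorithm PROCEDURE ANALYSE() uses; suggest_int_type() is a hypothetical helper, and only the value ranges of the MySQL integer types are taken as given.

```python
# (type name, signed minimum, signed maximum, unsigned maximum)
INT_TYPES = [
    ("TINYINT", -128, 127, 255),
    ("SMALLINT", -32768, 32767, 65535),
    ("MEDIUMINT", -8388608, 8388607, 16777215),
    ("INT", -2147483648, 2147483647, 4294967295),
    ("BIGINT", -2**63, 2**63 - 1, 2**64 - 1),
]

def suggest_int_type(values):
    """Suggest the smallest integer type that fits the values seen so far."""
    low, high = min(values), max(values)
    for name, signed_min, signed_max, unsigned_max in INT_TYPES:
        if low >= 0 and high <= unsigned_max:
            return name + " UNSIGNED"
        if signed_min <= low and high <= signed_max:
            return name
    raise ValueError("values do not fit in BIGINT")

# With only a few rows, a small type looks sufficient...
early = suggest_int_type([1, 2, 3, 200])
# ...but the suggestion changes once the ids have grown.
later = suggest_int_type([1, 2, 3, 200, 70000])
signed = suggest_int_type([-1, 40000])

print(early)   # TINYINT UNSIGNED
print(later)   # MEDIUMINT UNSIGNED
print(signed)  # MEDIUMINT
```

The suggestion is derived purely from the observed minimum and maximum, so it says nothing about how large the values will become, which is why the final decision stays with you.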
