MySQL Database Optimization

Source: Internet
Author: User


For a data-centric application, the quality of the database directly affects the performance of the program, so database performance is crucial. In general, to keep a database efficient, we should work on four things: SQL statement optimization and indexing, database (table) design, database parameter configuration, and appropriate hardware resources and operating systems. This order also reflects how much each of the four affects performance.


Let's examine each of these in turn:


I. SQL statement optimization and Indexing

1. SQL statement optimization tool: the slow query log

Use the MySQL slow query log to find inefficient SQL statements.

If the system is slow and you cannot tell where the problem is, this is the tool to reach for. You only need to set a few parameters and MySQL records slow SQL statements by itself. The configuration is simple; add the following to the parameter file:

slow_query_log = 1
slow_query_log_file = d:/slow.txt
long_query_time = 2

(On older MySQL versions, the single setting log-slow-queries = d:/slow.txt plays the role of the first two lines.)

Statements whose execution time exceeds 2 seconds are then written to d:/slow.txt, and you can locate problems from that file.

· mysqldumpslow

Slow log files can grow large and become hard to read. Here the mysqldumpslow tool that ships with MySQL helps: it formats the slow log, grouping statements that differ only in their parameters into one class. For example, select * from a where id = 1 and select * from a where id = 2 both collapse into select * from a where id = N, which is much more comfortable to read. The tool can also sort the output (for example, mysqldumpslow -s t -t 10 d:/slow.txt sorts by query time and shows the top 10), so we can target the worst offenders.

2. Explain

Now we know which statements are slow, but why are they slow? Let's see how MySQL executes them. You can view the MySQL execution plan using EXPLAIN. The following usage comes from the manual.

EXPLAIN syntax (obtain SELECT-related information)

EXPLAIN [EXTENDED] SELECT select_options

The EXPLAIN statement can be used as a synonym for DESCRIBE, or obtain information about how MySQL executes the SELECT statement:

· EXPLAIN tbl_name is a synonym for DESCRIBE tbl_name or show columns from tbl_name.

· If the keyword EXPLAIN is placed before a SELECT statement, MySQL explains how it would process that SELECT, including the order in which the tables are joined.

This section describes the second usage of EXPLAIN.

With the help of EXPLAIN, you can know when to add an index to the table to obtain a faster SELECT statement that uses indexes to search for records.

If EXPLAIN shows the wrong indexes being used, run ANALYZE TABLE to update the table statistics (such as the key distribution), which can change the optimizer's choices.

You can also see whether the optimizer joins the tables in an optimal order. To force the optimizer to join the tables in the order in which they are named in the statement, begin the statement with SELECT STRAIGHT_JOIN rather than just SELECT.

EXPLAIN returns a row of information for each table used in the SELECT statement. The tables are listed in the order MySQL reads them while processing the query. MySQL resolves all joins using a single-sweep multi-join method: it reads a row from the first table, finds a matching row in the second table, then in the third, and so on. When all tables have been processed, MySQL outputs the selected columns and backtracks through the table list until it finds a table with more matching rows; it reads the next row from that table and continues with the next table.

When the EXTENDED keyword is used, EXPLAIN produces additional information that can be viewed with SHOW WARNINGS. This information shows how the optimizer qualifies the table and column names in the SELECT statement, what the SELECT looks like after rewriting and after optimization rules have been applied, and possibly other notes about the optimization process.
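As a small sketch (the score table and its index are assumed names for illustration, not from the original): given a table indexed on student_id, EXPLAIN shows at a glance whether that index is used.

```sql
-- Hypothetical table: score(student_id, question_id, score), indexed on student_id.
EXPLAIN SELECT * FROM score WHERE student_id = 1;
-- In the output, type = ref with a non-NULL key column means the index is used;
-- type = ALL means a full table scan and usually calls for a new index.
```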


3. If nothing else works, try a covering index

If a statement cannot be optimized any further, you can try another method: a covering index.

If a statement can retrieve all the data it needs from the index alone, it does not have to go through the index to read the table rows, which saves a lot of I/O. For example:

If I want to count each student's score on each question, then in addition to indexing the primary and foreign keys of each table, we should also index the actual score field of [score]. In this way, the entire query can be satisfied from the indexes alone.
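A minimal sketch of the idea, assuming a score table with student_id, question_id, and score columns (names are illustrative): put every column the query touches into one composite index, so MySQL never has to read the table rows.

```sql
CREATE INDEX idx_score_cover ON score (student_id, question_id, score);

-- This query can be answered from the index alone; EXPLAIN shows
-- "Using index" in the Extra column when a covering index is in effect.
SELECT student_id, question_id, score FROM score WHERE student_id = 1;
```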


II. Database (Table) Design

Denormalize in moderation; the key word is moderation

We all know that a design based on the three normal forms is the most efficient way to store data and the model that is easiest to extend. When developing an application, the database design should comply with the three normal forms as far as possible; for OLTP systems in particular, they are rules that must be followed. However, their biggest drawback is that queries often need to join many tables, which hurts query efficiency. So sometimes, for performance reasons, we deliberately violate the normal forms and introduce moderate redundancy to speed up queries. Note that the denormalization here must be moderate, and you must be able to give a sufficient justification for it. First, a bad example:

To speed up retrieval of student activity records, the unit name was made redundant in the student activity record table. The unit table contains 500 records, while the student activity record table accumulates about 2 million records per year. Without the redundant unit name field, the activity record table contains only three int fields and one timestamp field, occupying just 16 bytes per row — a narrow table. Adding a redundant varchar(32) field makes each row three times its original size, with correspondingly more I/O on every retrieval. Worse, the record counts differ enormously (500 versus 2,000,000), so renaming a single unit forces an update of roughly 4,000 redundant records. Clearly this redundancy is counterproductive.

The following redundancy is good.

Here, the [total score of the student examination] table is redundant, since the total can be computed entirely from the [score] table. But in [total score of the student examination] each student has only one record per exam, while in [score] each student has one record per question on the paper, so the record counts differ by a large factor. Moreover, scores rarely change once entered, so the update frequency is low. This redundancy, therefore, pays off.

1. Create an index as appropriate

Indexing is the cheapest and most effective way to improve database performance: no extra memory, no program changes, no SQL tuning — just execute the correct CREATE INDEX and query speed may improve many times over, which is truly tempting. But there is no free lunch: faster queries come at the cost of slower inserts, updates, and deletes, because every write must also maintain the indexes, adding a lot of I/O. Since index storage is structured differently from table storage, a heavily indexed table may use more space for its indexes than for its data. In other words, we do a lot of extra work on every write purely to improve read efficiency. An index must therefore earn its keep. Follow these rules:

· The indexed field should be one that is frequently used in query conditions;

· In an index on multiple fields, only queries that use the leading field benefit; a query filtering only on the second field cannot use the index;

· The indexed field must be selective enough (sufficient discrimination);

· For long fields, MySQL supports prefix indexes that cover only the first N characters.
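The second and fourth rules can be sketched as follows (table and column names are assumed for illustration):

```sql
CREATE INDEX idx_ab ON t (a, b);

SELECT * FROM t WHERE a = 1;            -- can use idx_ab
SELECT * FROM t WHERE a = 1 AND b = 2;  -- can use idx_ab
SELECT * FROM t WHERE b = 2;            -- cannot use idx_ab: leading field a is absent

-- Prefix index on a long field: index only the first 10 characters.
CREATE INDEX idx_name_prefix ON t2 (long_name(10));
```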


2. Horizontal table splitting

Horizontal splitting solves the problem of too many rows in a single table. All of the split tables share exactly the same structure.

If a table holds too many records — say tens of millions — and is searched frequently, it must be broken into pieces. Split it into 100 tables and each table holds only 100,000 records. Of course, the data should divide along some logical line: a good splitting key keeps the program simple while fully exploiting the benefits of horizontal splitting. For example, if the system's interface only offers monthly queries, split the table into 12 by month and every query touches a single table. If you split by region instead, the tables are smaller, but every query still has to join all of them — in that case you might as well not split at all. A good splitting key is therefore the most important thing.
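The monthly split could look like this (table names are assumed for illustration):

```sql
-- One table per month, all with identical structure.
CREATE TABLE activity_record_01 LIKE activity_record;
CREATE TABLE activity_record_02 LIKE activity_record;
-- ... and so on, 12 tables in total.

-- A query for January then touches a single small table:
SELECT * FROM activity_record_01 WHERE student_id = 1;
```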

Here is a good example.

This table records every question each student has done, right or wrong, and each question maps to one or more knowledge points; we analyze a student's mastery of knowledge points from the wrong answers. The table easily grows to tens of millions of rows and urgently needs splitting. So what do we split on? Looking at the requirements, both teachers and students ultimately focus on one student at a time — students care about themselves, and a teacher cares about the students in his own class. Moreover, the knowledge points differ per subject. So the natural choice is to split on the combination of the subject and knowledge-point fields. Each resulting table then holds about 20,000 rows, and retrieval is very fast.


3. Vertical Table Partitioning

So-called vertical splitting breaks a table with many columns into several tables, solving the problem of over-wide rows. Vertical splitting can generally follow these principles:

(1) store infrequently used fields in a single table;

(2) store large fields in a table independently;

(3) Put frequently used fields together.


Some tables have few records — perhaps twenty or thirty thousand — but very long fields, so the table occupies a lot of space and retrieval requires heavy I/O, seriously degrading performance. In this case, split the large fields off into a separate table that has a one-to-one relationship with the original.

The question content and answer tables started out as several fields inside the question information table. The content and answer fields are very long; with 30,000 records the table occupied 1 GB and listing questions was very slow. Analysis showed that the system pages through questions by conditions such as paper, unit, type, category, and difficulty, joining these tables on every search — and scanning a 1 GB table every time is dispiriting. So we split the content and answer off into another table, reading that big table only when a question's details are displayed. The result is two separate tables: question content and answer information.
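A sketch of the split, with assumed column names: the narrow table serves the paged list, and the wide one-to-one table is read only on the detail page.

```sql
CREATE TABLE question (
    question_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    unit_id     INT UNSIGNED NOT NULL,
    type_id     INT UNSIGNED NOT NULL,
    difficulty  TINYINT      NOT NULL
);

CREATE TABLE question_content (
    question_id INT UNSIGNED NOT NULL PRIMARY KEY,  -- same key: 1:1 with question
    content     TEXT,
    answer      TEXT
);
```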


4. Select appropriate field types, especially for the primary key

The general principle is to prefer small fields over large ones: use the type with the fewest bytes that still fits the data. For example, we strongly recommend an auto-increment integer rather than a GUID as the primary key. Why? To save space. And what is space? Space is efficiency! Locating a record by a 4-byte key versus a 32-byte key — the difference is obvious, and it becomes even more pronounced when several tables join on that key. Also worth mentioning: datetime occupies 8 bytes while timestamp occupies only 4 — half as much — and timestamp's range of roughly 1970 to 2038 is more than enough for most applications, especially for things like exam times and login times.
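For instance (a hypothetical login table, names assumed): a 4-byte auto-increment key instead of a 32-byte GUID string, and a 4-byte TIMESTAMP instead of an 8-byte DATETIME.

```sql
CREATE TABLE login_log (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    student_id INT UNSIGNED NOT NULL,
    login_time TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP
);
```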


5. Store files, images, and other large objects in the file system, not in the database

Needless to say, this is an iron rule!!! The database stores only the path.


6. Declare foreign keys explicitly to facilitate index creation

We all know that when you model a relationship between two entities in PowerDesigner, the generated physical model automatically indexes the foreign key. So do not be afraid that relationship lines will clutter the diagram — create every relationship that should exist.


7. Choose the right moment to write to a table

With the same schema, how the database is used also matters greatly for performance, and when a table is written has a significant impact on everything that follows. Take the moderate-redundancy example above:

Our original goal in recording each examinee's total score was to speed up retrieval, so the table was written when scores were entered. Then came the requirement: list all student scores for a test, showing the student's name with a blank total if no score has been entered yet. That query requires [student information] LEFT OUTER JOIN [student exam total score], and we all know an outer join is slower than an inner join. To avoid this, we instead write the table when the exam is scheduled, inserting a row for every student with a NULL score, so a plain join achieves the same effect. There is another benefit: suppose a whole class is scheduled for a test and every student is scored, and then a new student joins the class. With the score-entry approach, querying that student's scores would list the newcomer with no recorded result — plainly wrong. Writing at scheduling time records exactly who actually sat the exam. The timing of writing this table makes all the difference.
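The scheduling-time write could be sketched like this (all table and column names are assumed for illustration):

```sql
-- When the exam is scheduled, insert one row per examinee with a NULL total.
INSERT INTO exam_total_score (exam_id, student_id, total_score)
SELECT ea.exam_id, ea.student_id, NULL
FROM exam_arrangement ea
WHERE ea.exam_id = 1001;

-- The score list then needs only an inner join, never an outer join:
SELECT s.student_name, t.total_score
FROM student s
JOIN exam_total_score t ON t.student_id = s.student_id
WHERE t.exam_id = 1001;
```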


8. Concentrate operations into batches; avoid frequent small reads and writes

The system awards points, which students and teachers earn through their activity. The rules are complex: each kind of operation earns a different number of points, and each person has a daily cap per point type. For example, a login earns 1 point, but no matter how many times you log in, only one login point accrues per day. That one is still simple; some rules are convoluted. Teacher points for grading homework, for instance, work like this: the teacher grades the homework and finds a student's mistake; the student corrects it; if on re-grading everything is now correct, the teacher gets extra points; if mistakes remain, the cycle repeats until everything is correct, and only then are the extra points awarded. Handling this in application code would mean every feature carrying a pile of extra point-handling logic — not only would our programming colleagues lose sight of their real task, it would also put constant pressure on the database. After discussing it with the requirements people, we concluded the points need not accumulate in real time, so we adopted a background batch script: in the dead of night, let the machine do the work.

The batch read for that convoluted grading rule looks like this:

select person_id, @semester_id, 301003, 0, @one_marks, assign_date, @one_marks
from hom_assignmentinfo ha, hom_assign_class hac
where ha.assignment_id = hac.assignment_id
  and ha.assign_date between @time_begin and @time_end
  and ha.assignment_id not in
      (
        select haa.assignment_id
        from hom_assignment_appraise haa, hom_check_assignment hca
        where haa.appraise_id = hca.appraise_id and haa.if_submit = 1
          and (
                (hca.recheck_state = 3004001 and hca.check_result in (3003002, 3003003))
                or
                (hca.recheck_state = 3004002 and hca.recheck_result in (3003002, 3003003))
              )
      )
  and ha.assignment_id not in
      (
        select assignment_id from hom_assignment_appraise
        where if_submit = 0 and result_type = 0
      )
  and ha.assignment_id in
      (
        select haa.assignment_id
        from hom_assignment_appraise haa, hom_check_assignment hca
        where haa.appraise_id = hca.appraise_id and haa.if_submit = 1
          and hca.check_result in (3003002, 3003003)
      );

 

And this is only one intermediate step. If the program tried to process this in real time, then even if the programmers didn't go on strike, the database would.


9. Select an appropriate storage engine

MySQL offers many engines; we use MyISAM, InnoDB, and MEMORY. The official manual says MyISAM is faster than InnoDB — about three times faster — but I simply don't believe it: the O'Reilly book High Performance MySQL compares MyISAM and InnoDB, and MyISAM did not come out ahead of InnoDB in their tests. As for MEMORY — ha, it is genuinely handy. A MEMORY temporary table is a good choice in batch processing (if you have enough memory); in one of my batch jobs it sped things up noticeably.
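A MEMORY scratch table in a batch job might look like this (a sketch with assumed names; the intermediate result must fit in memory):

```sql
CREATE TEMPORARY TABLE tmp_points (
    person_id INT UNSIGNED NOT NULL,
    points    INT          NOT NULL
) ENGINE = MEMORY;

-- Accumulate intermediate results here, then flush to the real table once.
```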


 

III. Database parameter configuration

The database runs on top of the OS. Most MySQL instances today are installed on Linux, so some operating-system parameters also affect MySQL performance.

The most important resource is memory. We mainly use the InnoDB engine, so the following two parameters are the most worth tuning:

# Additional memory pool that is used by InnoDB to store metadata
# information. If InnoDB requires more memory for this purpose it will
# start to allocate it from the OS. As this is fast enough on most
# recent operating systems, you normally do not need to change this
# value. SHOW INNODB STATUS will display the current amount used.
innodb_additional_mem_pool_size = 64M

# InnoDB, unlike MyISAM, uses a buffer pool to cache both indexes and
# row data. The bigger you set this the less disk I/O is needed to
# access data in tables. On a dedicated database server you may set this
# parameter up to 80% of the machine physical memory size. Do not set it
# too large, though, because competition of the physical memory may
# cause paging in the operating system. Note that on 32bit systems you
# might be limited to 2-3.5G of user level memory per process, so do not
# set it too high.
innodb_buffer_pool_size = 5G

For MyISAM, the parameter to adjust is key_buffer_size.

Of course, what to adjust depends on observed state. Use SHOW STATUS to inspect the current counters and decide which parameters to change:

· Created_tmp_disk_tables: if high, increase tmp_table_size.

· Handler_read_key: high is good — it means indexes are being used; a high Handler_read_rnd means they are not.

· Key_reads / Key_read_requests: the key cache miss rate; it should be below 0.01, otherwise increase key_buffer_size.

· Opened_tables / Open_tables: if Opened_tables grows quickly, increase table_cache.

· Select_full_join: the number of joins performed without a usable index; if not 0, check your indexes.

· Select_range_check: if not 0, check the table indexes.

· Sort_merge_passes: the number of merge passes the sort algorithm has made; if large, increase sort_buffer_size.

· Table_locks_waited: the number of table locks that could not be acquired immediately; if high, optimize your queries.

· Threads_created: the number of threads created to handle connections; if large, increase thread_cache_size. The thread cache miss rate can be computed as Threads_created / Connections.
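For example, the key cache miss rate from the list above can be read off like this:

```sql
SHOW STATUS LIKE 'Key_read%';
-- Divide Key_reads by Key_read_requests; if the ratio exceeds 0.01,
-- increase key_buffer_size.

SHOW STATUS LIKE 'Threads_created';
SHOW STATUS LIKE 'Connections';
-- Threads_created / Connections is the thread cache miss rate.
```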

 

IV. Reasonable hardware resources and operating systems

If your machine has more than 4 GB of memory, there is no question: use a 64-bit operating system and 64-bit MySQL. The extra cost is not significant.

Read/write splitting

If the database is under great pressure and cannot be supported by one machine, you can use mysql replication to synchronize multiple machines to distribute the database pressure.

Master

Slave1

Slave2

Slave3

The master handles writes while slave1 through slave3 handle selects, so each database carries far less load.

To implement this, the application must be specially designed: writes go to the master and reads go to the slaves, which adds a burden to program development. Of course, middleware already exists to proxy this, making it transparent to the program which database is read or written. MySQL officially provides mysql-proxy, but it is still alpha; Sina's amoeba for mysql can also achieve this.

For usage instructions, see the amoeba manual.

