Database performance Optimization

Last Update:2018-01-23 Source: Internet

Author: User

Tags joins mysql query mysql version

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

First, MySQL The main application scenarios

1. Web Site System

2. Logging System

3. Data Warehouse System

4. Embedded system

Second, MySQL Frame Composition:

Third, MySQL Storage Engine Overview

1 ) MyISAM Storage Engine

MyISAM Storage Engine table in the database, each table is stored as a three physical file named after the table name. There must be a. frm file that holds the table structure definition information, as well as the. MYD and. MYI files, which store the table's data (. MYD) and index data (. MYI), respectively, for any storage engine that is indispensable. Each table has and only such three files are stored as a table of the MyISAM storage type, meaning that no matter how many indexes the table has, it is stored in the same one. The Myi file.

MyISAM supports the following three types of indexes:

1. B-tree Index

The B-tree index, as the name implies, is that all the index nodes are stored according to the Balancetree data structure, and all the index nodes are in the leaf node.

2. R-tree Index

The R-tree index is stored in a different way from the B-tree index, and is primarily designed to index fields of storage space and multidimensional data, so the current MySQL version only supports the geometry type of fields for indexing.

3. Full-text Index

The Full-text index is the full-text index that we say, and his storage structure is b-tree. The main thing is to solve inefficient problems that we need to use like queries.

2 ) Innodb Storage Engine

1. Support Transaction Installation

2, data multi-version read

3, the improvement of locking mechanism

4. Implement FOREIGN key

3 ) Ndbcluster Storage Engine

The NDB storage engine, also known as the Ndbcluster storage engine, is used primarily for mysqlcluster distributed cluster environments, and cluster is a new feature that MySQL has only begun to provide from version 5.0.

4 ) Merge Storage Engine

The merge storage engine, also mentioned in the MySQL user manual, is also known as the Mrg_myisam engine. Why? Because the merge storage engine can be simply understood as its function is to implement the same structure of the MyISAM table, through some special packaging to provide a single access to the portal, to reduce the complexity of the application of the purpose. To create a merge table, not only the structure of the base table is exactly the same, including the order of the fields, and the index of the base table must be exactly the same.

5 ) Memory Storage Engine

Memory storage engine, by name it is easy to know that he is a storage engine that stores data in memory. The memory storage engine does not store any data on disk, only the. frm file that holds information about a table structure is on disk. So once the Mysqlcrash or the host crash, the memory table has only one structure left. The memory table supports indexes and supports both hash and b-tree two formats. Because it is stored in memory, it stores data in a fixed-length space and does not support fields of BLOB and text types. The memory storage engine implements page-level locking.

6 ) BDB Storage Engine

The BDB storage engine is called the BerkeleyDB storage engine, and like InnoDB, it is not a storage engine developed by MySQL itself, but is provided by Sleepycatsoftware and, of course, the open source storage engine, which also supports transactional security.

7 ) Federated Storage Engine

The functionality implemented by the Federated storage Engine is basically similar to Oracle's Dblink, which is primarily used to provide access to the data on the remote MySQL server. If we use source code compilation to install MySQL, then you must manually specify that the federated storage Engine is enabled, because MySQL defaults to the storage engine.

8 ) ARCHIVE Storage Engine

The archive storage engine is primarily used to store outdated, infrequently accessed historical data with small storage space. The archive table does not support indexing, a. frm structure definition file, and a. Arz data compression file also has a. ARM meta-information file. Due to the particularity of the data stored, the archive table does not support the deletion, modification

Only insert and query operations are supported. The locking mechanism is row-level locked.

9 ) blackhole Storage Engine

Blackhole Storage Engine is a very interesting storage engine, the function of the name, is a "black hole." Just like the "/dev/null" device under our UNIX system, no matter what information we write, there is no return.

Ten ) CSV Storage Engine

The CSV storage Engine actually operates a standard CSV file that does not support indexing. The main purpose is that you may sometimes need to export data from the database into a report file, and the CSV file is a more standard format supported by many software, so we can first create a CVS table in the database, and then insert the generated report information into the table, To get a copy of the CSV report file.

Iv. Impact MySQLServer factors related to performance

1 The impact of business requirements on performance

Typical needs: A forum post total statistics, requirements: real-time updates.

2 The impact of system architecture and implementation on performance

The following types of data are not suitable for storing in a database:

Binary Multimedia data

Flow queue Data

Very large text data

Improve system performance with cache technology:

System configuration and rules of the data;

Basic information data of active users;

Personalized and customized information data for active users;

Quasi-real-time statistical information data;

Some other frequently accessed but less-changed data;

3 Query The effect of statements on system performance

Requirements: Take out a group (assuming ID 1) under the user number (ID), user nickname (Nick_name), and follow the time to join the group (user_group.gmt_create) in reverse order, remove the first 20.

Solution One:

SELECT id,nick_name from User,user_group WHERE user_group.group_id=1 and User_group.user_id=user.id ORDER by User_ Group.gmt_create desc limit 100, 20;

Solution Two:

Select User.id,user.nick_name from (select User_idfrom user_groupwhere user_group.group_id=1order by gmt_create Desclimit 100,20) T,userwhere t.user_id=user.id;

By comparing the execution plans of the two solutions, we can see the number of records in the first solution that need to participate in the join with the user table. mysql is estimated by statistical data to be 31156, that is, all the satisfied group_id= returned by the User_group table 1 record count (actual data in the system is 20000). While the second solution's execution plan, the user table participates in the join the data to have only 20, the difference is very big, we think the second solution should be obviously superior to the first one solution.

4 Schema The performance impact of the design on the system

Minimize requests for database access.

Minimize query requests for useless data.

5 The effect of hardware environment on system performance

1. Typical OLTP application system

For all kinds of database system environment of the most common OLTP system, which is characterized by a large amount of concurrency, the overall data volume is more, but each access to less data, and access to the data is more discrete, active data accounted for the overall data is not too large. The database for this type of system is actually the most difficult to maintain, the most difficult to optimize, the overall performance requirements of the host is also the highest. Because not only the traffic is very high, the amount of data is not small.

For these characteristics and analysis, we can get an approximate direction of the OLTP.

Although the overall data volume of the system is large, but the proportion of the system active data in the total data is small, we can enlarge the memory capacity to cache the active data in memory as much as possible;

Although IO access is very frequent, but the amount of data accessed per visit is small and discrete, our requirement for disk storage is that IOPS perform well and throughput is a secondary factor;

Concurrency is very high, CPU per second to deal with a lot of requests, so CPU processing power needs to be relatively strong;

Although the amount of data per interaction with the client is not particularly large, the network interaction is very frequent, so the network devices that the host interacts with the client are not too weak for the traffic capacity.

2. Typical OLAP application system

The main feature of OLAP system used for data analysis is that the data volume is very large, concurrent access is not many, but the amount of data that need to be retrieved each time accesses is more, and the data access is relatively centralized, there is not too obvious active data concept.

Based on the various characteristics of OLAP system and corresponding analysis, the approximate strategy for the hardware optimization of OLAP system is as follows:

The amount of data is very large, so the unit capacity of disk storage system needs to be as large as possible;

Single Access data volume is large, and the access data is relatively centralized, then the performance requirements for the IO system is to have the largest IO throughput per second, so the disk with the largest throughput per second should be chosen;

Although the IO performance requirements are relatively high, but the concurrent requests are less, so the CPU processing power is more difficult to become a performance bottleneck, so the CPU processing power is not too stringent requirements;

Although the number of accesses per request is very large, but most of the data in the execution is not returned to the client, the amount of data returned to the client is small, so the network device requirements for interacting with the client are not too high;

In addition, because of the OLAP system because of its long time, can be very good parallelization, so the general OLAP system is composed of multiple hosts a cluster, and the host and host in the cluster data interaction is generally very large, so in the cluster between the host network equipment requirements are very high.

3, in addition to the above two typical applications, there is a special kind of application system, their data volume is not particularly large, but the access request and its frequent, and most of the read request. It may require tens or even tens of thousands of of requests per second, and each request is very simple, and most likely only one or a few smaller records, such as a database-based DNS service, is a type of service.

Although the amount of data is small, but the access is very frequent, so the large memory can be used to cache the majority of the data, which can guarantee very high hit rate, disk IO is smaller, so the disk does not need special high performance;

The concurrent requests are very frequent, which can be handled more than the CPU processing power.

Although the application and database interaction is very large, but each interaction data is small, the overall traffic will be larger, but generally the average gigabit NIC is enough.

Five, MySQL Introduction to locking mechanism

Row-level locking (Row-level)

Table-level Locking (Table-level)

Page-level locking (Page-level)

In the MySQL database, the use of table-level locking is mainly myisam,memory,csv and other non-transactional storage engines, while the use of row-level locking is mainly the InnoDB storage engine and Ndbcluster storage engine, Page-level locking is primarily the locking method of the BerkeleyDB storage engine.

Six, MySQL Query the optimization

The optimization ideas and principles of query statement are mainly mentioned in the following aspects:

1. Optimize the query that needs to be optimized;

2. Positioning the Optimization object performance bottleneck;

3. Clear objectives for optimization;

4. Starting from explain;

5. Multi-Use profile

6. Always drive large result sets with small result sets;

7. Complete the sorting in the index as much as possible;

8. Only take out the columns you need;

9. Use only the most effective filter conditions;

10. Avoid complicated joins and subqueries whenever possible;

Rational design and use of indexes

1) B-tree Index

In general, the physical files of the B-tree index in MySQL are mostly stored in the structure of Balancetree, that is, all the data that is actually needed is stored in the leafnode of the tree, and the shortest path to any leafnode is exactly the same length, So we all call it the B-tree Index. Of course, it is possible that various databases (or MySQL's various storage engines) will slightly transform the storage structure when storing their own b-tree indexes. such as the B-tree index of the InnoDB storage engine is actually the storage structure used B+tree, that is, the B-TREE data structure on the basis of a small transformation, in each of the Leafnode above the index key to store the relevant information, Also stores pointer information to the latter leafnode adjacent to the Leafnode, primarily to speed up the efficiency of retrieving multiple adjacent leafnode.

2) Hash Index

Hash index is not used much in MySQL, it is mainly used by memory storage engine, and the hash index is the default index type in memory storage engine. The so-called hash index, in fact, through a certain hash algorithm, will need to index the key value of the hash operation, and then the resulting hash value into a hash table. Then each time you need to retrieve, the search criteria will be the same hash of the algorithm, and then the hash table and the hash value to compare and obtain the corresponding information.

Hash index can only meet "=", "in" and "<=>" query, can not use range query;

Hash index can not be exploited to avoid the sorting operation of data;

The hash index cannot use partial index key query;

Hash index at any time can not avoid the table sweep surface;

When a hash index encounters a large number of equal hash values, the performance is not necessarily higher than the B-tree index.

3) Full-text Index

The Full-text index, which is what we often call full-text indexing, is supported only by the MyISAM storage engine in MySQL, and not all data types support full-text indexing. Currently, only columns of the three data types, Char,varchar and text, can be full-text indexed.

Index can greatly improve the efficiency of data retrieval, but also can improve the performance of sorting grouping operations, but we can not ignore the problem is that the index is completely independent of the underlying data part of the data, update the data will bring the amount of IO and adjust the index of the calculation of the resource consumption.

If you need to create an index, several principles: more frequent fields that are query criteria should be indexed; fields with poor uniqueness are not suitable for creating indexes individually, even if they are frequently used as query criteria; fields that are updated very frequently are not suitable for creating indexes;

Fields that do not appear in the WHERE clause should not create an index;

Join Optimization of statements

Minimize the total number of nestedloop in a join statement; "Always drive a large result set with a small result set."

Optimize the inner loop of nestedloop;

Ensure that the join condition field on the drive table in the JOIN statement has been indexed;

Do not be too stingy with the Joinbuffer setting when the Join condition field of the driver table is not guaranteed to be indexed and the memory resource is sufficient;

ORDER by , GROUP by and the DISTINCT Optimized

1) Implementation and optimization of ORDER by

When you optimize the order by in the query statement, the performance of the order by operation can be greatly improved by using existing indexes to avoid actual sort calculations.

Optimized sorting:

1. Increase the setting of max_length_for_sort_data parameters;

2. Remove unnecessary return fields;

3. Increase the sort_buffer_size parameter setting;

2) Implementation and optimization of GROUP by

Because group by actually also needs to be sorted, and the group by is mostly just a sort of grouping operation, compared to order by. Of course, if you use some other aggregate functions when grouping, you also need to calculate some aggregate functions. Therefore, in the implementation of group by, the index can be used as well as the order by.

3) Realization and optimization of distinct

Distinct is actually very similar to the group by operation, except that only one record is taken out of each group after group by. Therefore, the implementation of distinct and the implementation of group by is basically similar, there is not much difference. It can also be done by either a loose index scan or a compact index scan, and of course, MySQL can only be completed with a temporary table when the distinct can not be completed with just the index. However, a bit different from group by is that distinct does not need to be sorted. That is, in a query that is just a distinct operation, MySQL uses a temporary table to "cache" the data once it is not able to perform the operation just by using the index, but does not filesort the data in the staging table.

Seven, MySQL Database Schema performance optimization of the design

Efficient model Design

Moderate redundancy-let query do two fewer joins

Optimization of vertical partition-summary table for large-scale segment

Large table horizontal Split-type-based spin-off optimization

Statistics-quasi-real-time optimization

The right data type

The time storage format General class is not too many, we commonly used mainly is the datetime,date and the timestamp these three kinds. In terms of storage space, timestamp has a minimum of four bytes, while the other two data types are eight bytes, one more times. The disadvantage of timestamp is that he can only store time since 1970, while the other two types of time can be stored as early as 1001. If there is a need to store the time before 1970, we must discard the timestamp type, but as long as we do not need to use the time 1970 years ago, it is best to use timestamp to reduce storage space consumption.

Character Store Type

The char[(M)] type is a static length type, and the storage length is fully calculated with the number of characters, so the final memory length is character set based, such as latin1 the maximum storage length is 255 bytes, but if GBK is used, the maximum storage length is 510 bytes. The Char type storage feature is that no matter how long we actually store the data, it will be stored in the database m characters, not enough by the space to fill, m default is 1. Although Char will be filled with space, but when accessing the data, MySQL ignores the last of all the spaces, so if we actually need a space at the end of the actual data, we cannot use the char type to store it.

varchar[(M)] belongs to the dynamic storage length type and only occupies the length of the actual stored data. Tinytext,text,mediumtext and Longtext These four types belong to one kind of storage way, all is the dynamic storage length type, the difference is only the maximum length limit.

Transaction optimization

1. Dirty read: Dirty read refers to when a transaction is accessing the data, and the data has been modified, and this modification has not been committed to the database, then another transaction also accesses the data, and then used this data.

2. Non-repeatable reads: Refers to the same data that is read multiple times within a transaction. When this transaction is not finished, another transaction accesses the same data. Then, between the two read data in the first transaction, the data that the first transaction two reads may be different because of the modification of the second transaction. This occurs when the data that is read two times within a transaction is not the same and is therefore called non-repeatable read.

3. Phantom reads: A phenomenon that occurs when a transaction is not executed independently, for example, the first transaction modifies data in a table that involves all rows of data in the table. At the same time, the second transaction modifies the data in the table by inserting a new row of data into the table. Then the user who will be working on the first transaction in the future finds that there are no modified rows of data in the table, as if the illusion had occurred.

The information supported by the INNODB at the transaction isolation level is as follows:

1.READ UNCOMMITTED

Often become dirty Reads (dirty Read), can be said to be the lowest level of isolation on the transaction: in the normal non-locking mode of the execution of the select so that we see the data may not be the query at the point of origin data, and thus in this isolation is non-consistent Reads (consistent read);

2.READ COMMITTED

At this isolation level, Dirtyread does not appear, but non-repeatablereads (non-repeatable reads) and phantomreads (phantom reads) may occur.

3. Repeatable READ

The REPEATABLE read isolation level is the INNODB default transaction isolation level. Under the REPEATABLE Read isolation level, dirtyreads is not present and non-repeatable read is not present, but there is still the possibility of phantomreads.

4.SERIALIZABLE

The Serializable isolation level is the highest level in the standard transaction isolation level. After the isolation level is set to serializable, the data that is seen at any time in the transaction is the state of the start of the transaction, regardless of whether any other transaction has modified some data and committed it during this time. Therefore, Phantomreads does not appear under the SERIALIZABLE transaction isolation level.

Viii. Data Segmentation for scalable design

Vertical segmentation of data

Vertical segmentation of data can also be referred to as vertical segmentation. Think of the database as a large chunk of chunks of "data blocks" (tables), we cut these "chunks" vertically, and then scatter them across multiple database hosts. Such a segmentation method is a vertical (longitudinal) data segmentation.

Advantages of vertical slicing

The splitting of the database is simple and clear, and the splitting rule is explicit;

The application module is clear and easy to integrate.

Data maintenance convenient and easy to locate;

Disadvantages of vertical slicing

Some table associations cannot be completed at the database level and need to be completed in the program;

There is still a quiet performance for tables with extremely frequent access and large data volumes, and may not necessarily meet the requirements;

Transaction processing is relatively more complex;

When the segmentation reaches a certain extent, the extensibility will be limited;

Read-through segmentation can lead to complex system transitions and difficult maintenance.

Horizontal segmentation of data

The vertical segmentation of data can be easily understood as the partitioning of the data according to the table, and the horizontal segmentation is no longer based on the table or function module to slice. In general, simple horizontal slicing is the main way to spread an extremely mundane table into multiple tables, with a subset of the data in each table, according to some rule of a field.

Advantages of horizontal slicing

Table Association is basically able to complete on the database side;

There is no problem that some very large data volumes and high-load tables encounter bottlenecks;

Application-side Overall schema changes are relatively small;

Transaction processing is relatively simple;

As long as the segmentation rules can be defined, it is basically more difficult to meet the extensibility limit;

Disadvantages of horizontal slicing

The segmentation rules are relatively more complex, and it is difficult to abstract a segmentation rule that satisfies the whole database.

Later data maintenance difficulty has increased, manual positioning of data is more difficult;

The coupling degree of each module in the application system is high, which may cause some difficulties in the migration and splitting of the later data.

Possible problems in data segmentation and integration

1. Problems with the introduction of distributed transactions

It is possible to split a distributed transaction across multiple databases into multiple small transactions that are only on a single database, and to master small transactions through the application. Of course, the requirement for this is that our Russian application must be robust enough to provide some technical difficulties for the application as well.

2. Cross-node Join issues

It is recommended that the application be processed by taking the corresponding drive result set in the MySQLServer where the driver table is located, and then taking the corresponding data from the drive result set to the MySQL server where the driver table is located.

3. Cross-node merge sort paging problem

Fetch data in parallel from multiple data sources, and then the application summarizes the processing.

Nine, the scalability of the design Cache with the Search the Use

Reduce database access and increase performance by introducing cache (Redis, Memcached).

Through the introduction of search (Lucene, SOLR, ElasticSearch), the use of efficient full-text indexing and Word segmentation algorithm, as well as efficient data retrieval implementation, to solve the database and the traditional cache software completely can not solve the full-text fuzzy search, classified statistical query and other functions.

Database performance Optimization

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More