1. Introduction
In the Web application architecture, the data persistence layer (usually a relational database) is a key core component and has a very important impact on the system performance. MySQL is currently the most widely used open-source database, but the default setting of MySQL database has very poor performance, just a toy database. Therefore, you must optimize the MySQL database in the product.
Optimization is a complex task. This article describes MySQL-related database design and query optimization, server-side optimization, and storage engine optimization.
2. Database Design and Query Optimization
In MySQL Server performance tuning, the first thing to consider is the Database Schema design, which is very important. A bad Schema design, even running on the MySQL Server with excellent performance, will also show poor performance. Similar to the Schema, the design of query statements will also affect the performance of MySQL, inefficient SQL queries should be avoided. This section will discuss in detail the optimization of these two aspects.
2.1 Schema Design
Schema optimization depends on the query to be run. Different queries have different Schema optimization solutions. Section 2.2 describes the optimization of Query Design. The Schema design is also affected by the expected dataset size. Schema Design considerations include standardization, data types, and indexes.
2.1.1 Standardization
Standardization is the process of organizing data in databases. This includes creating tables based on design rules and establishing relationships between these tables. By canceling redundancy and inconsistent correlations, this design rule can simultaneously protect data and improve data flexibility. Generally, database standardization is to make the database design conform to a certain level of paradigm, and usually the third paradigm is enough. There are also the fourth paradigm (also known as Boyce Codd paradigm, BCNF) and the fifth paradigm, but it is rarely considered in actual design. Ignoring these rules may make the database design less perfect, but this should not affect the functionality.
Features of standardization:
1) All "objects" are in their own tables, with no redundancy.
2) databases are usually generated by E-R graphs.
3) Concise. To update attributes, only a few records need to be updated.
4) The Join Operation is time-consuming.
5) Select and sort have fewer optimization measures.
6) applicable to OLTP applications.
Non-Standardized features:
1) store a lot of data in a table and ensure data redundancy.
2) the overhead of updating data is very high. updating an attribute may update many tables and many records.
3) data may be lost when data is deleted.
4) Select and order have many optimization options.
5) applicable to DSS applications.
Both standardization and non-standardization have their own advantages and disadvantages. Generally, they can be used together in a database design. Some tables are standardized, while some tables retain some redundant data:
1) Standardized use of OLTP and non-standardized use of DSS
2) use materialized views. MySQL does not directly support this database feature, but it can be replaced by a MyISAM table.
3) redundant data is stored in tables. For example, ref_id and name are stored in the same table. But pay attention to the update issue.
4) for some simple objects, values are directly used for creation. For example, IP address
5) Reference by PRIMARY/unique key. MySQL can optimize this operation, for example:
Java code
Select city_name
From city, state
Where state_id = state. id and state. code = 'CA' "converted to" select city_name from city where state_id = 12
2.1.2 Data Type
One of the most basic optimizations is to minimize the disk space occupied by tables. This can greatly improve the performance, because the data is small, the disk reads faster, and the table content is processed during the Query Process, which consumes less memory. At the same time, if you create an index on a smaller column, the index also consumes less resources.
You can use the following technology to make the table better and minimize the storage space:
1) use the correct and appropriate type. Do not store numbers as strings.
2) use the most effective (minimum) data type as much as possible. MySQL has many specialized types that save disk space and memory.
3) try to use a smaller integer to make the table smaller. For example, MEDIUMINT is often better than INT, because the MEDIUMINT column uses 25% less space.
4) If possible, the declared column is not null. It makes everything faster and saves one bit for each column. NOTE: If NULL is required in the application, you should use it without a doubt, but avoid having it on all columns by default.
5) for MyISAM tables, if there are no variable-length columns (VARCHAR, TEXT, or BLOB columns), use a fixed-size record format. This is fast, but unfortunately it may waste some space. Even if you have used the CREATE option to make the VARCHAR column ROW_FORMAT = fixed, you can also be prompted to use a fixed Length Row.
6) use the sample character set, for example, latin1. Use UTF-8 as few as possible, because UTF-8 occupies three times the space of latin1. You can use latin1 on fields that do not require UTF-8, such as mail and url.
2.1.3 Index
All MySQL column types can be indexed. Using indexes for related columns is the best way to improve the performance of SELECT operations. Pay attention to the following points when using indexes:
1) MySQL only uses the prefix, such as key (a, B )... Where B = 5 will not use the index.
2) Select an index. Using indexes on columns with few changes is not very good, such as gender columns.
3) define the Unique index in the Unique column.
4) Avoid creating indexes that cannot be used.
5) in the Btree index (InnoDB uses Btree), you can create an index on the columns to be sorted.
6) Avoid duplicate indexes.
7) avoid creating an index on the prefix of an existing index. For example, if index (a, B) exists, remove index ().
8) control the length of a single index. Use key (name (8) to index the first few characters of the data.
9) The shorter the key value, the better. integer is recommended.
10) using indexes in queries (using explain for viewing) can reduce the number of disk reads and accelerate data reading.
11) Similar key values are better than random values. Auto_increment is better than uuid.
12) Optimize table can compress and sort indexes, so do not run them frequently.
13) Analyze table can update data.
2.2 Designing queries
The optimization of query statements is a Case by case problem. Different SQL statements have different optimization schemes. Here I will only list some general techniques.
1) When there is an index, make sure that the correct index is used for the query. You can use EXPLAIN select... View the results and analyze the query.
2) use the matching type during query. For example, select * from a where id = 5. If the id here is of the character type and has an index, the index cannot be used for this query, and the full table scan will be performed, resulting in a slow speed. Which of the following statements is true... Where id = "5", with quotation marks indicating the type is a character.
3) Use -- log-slow-queries-long-query-time = 2 to view slow queries. Use the explain statement to analyze the query and optimize the query.
3. Server Optimization
3.1 install MySQL
MySQL has many release versions. It is best to use the binary version released by MySQL AB. You can also download the source code for compilation and installation, but some bugs in the compiler and class library may cause potential problems in compiled MySQL.
If the MySQL server is installed on an Intel processor, you can use the intel c ++ compiled version. As mentioned in a PPT in Linux World2005, mySQL compiled using the intel C ++ compiler can query about 30% faster than the normal version. The compiled version of Intel c ++ can be downloaded from the official MySQL website.
3.2 server settings Optimization
The default MySQL settings have poor performance, so some parameter adjustments are required. This section describes some common parameter adjustments and does not involve specific storage engines (mainly MyISAM and InnoDB. Related optimizations are described in section 4 ).
-- Character-set: for a single language, use a simple character set, such as latin1. Use less Utf-8, UTF-8 occupies more space.
-- Memlock: locking MySQL can only run in the memory to avoid swapping. However, if the memory is insufficient, an error may occur.
-- Max_allowed_packet: It must be large enough to adapt to relatively large SQL queries, which has no significant impact on performance, mainly to avoid packet errors.
-- Max_connections: The maximum connection allowed by the server. If it is too large, the out of memory will appear.
-- Table_cache: the number of tables that MySQL keeps open at the same time. The overhead of opening a table is relatively large. Generally, the value is 512.
-- Query_cache_size: memory size used for cache query.
-- Datadir: the root directory for mysql to store data, which is separated from the installation file on a different disk to improve performance.
4. Storage Engine Optimization
MySQL supports different storage engines, including MyISAM and InnoDB.
4.1 MyISAM
MyISAM manages non-transaction tables. It provides high-speed storage and retrieval, as well as full-text search capabilities. MyISAM is supported in all MySQL configurations. It is the default storage engine, unless MySQL is configured to use another engine by default.
4.1.1 MyISAM features
4.1.1.1 MyISAM Properties
1) transactions are not supported and tables are damaged due to downtime
2) use smaller memory and disk space
3) Table-based locks may cause serious performance problems when concurrently updating data.
4) MySQL only caches indexes, and data is cached by OS
4.1.1.2 Typical MyISAM usages
1) Log System
2) read-only or mostly read-only applications
3) full table Scan
4) Batch import data
5) Low-concurrency read/write without transactions
4.1.2 Key Points of MyISAM Optimization
1) The declared column is not null, which can reduce disk storage.
2) use optimize table for fragment and free space recovery. Note that it only runs after a very large data change.
3) When Deleting/updating/adding has a large amount of data, index is forbidden. Use alter table t disable keys.
4) set myisam_max _ [extra] _ sort_file_size to be large enough to significantly increase the repair table speed.
4.1.3 MyISAM Table Locks
1) avoid concurrent insert and update.
2) insert delayed can be used, but data may be lost.
3) Optimize the query statement.
4) Horizontal partitioning.
5) vertical partitioning.
6) If none of them work, use InnoDB.
4.1.4 MyISAM Key Cache
1) set key_buffer_size variable. MyISAN's most important cache settings are used to cache index data in the MyISAM table. This parameter only affects MyISAM. Generally, the memory size is set to 25-33% in the Server that only uses MyISAM.
2) You can use several different Key Caches (for some hot data ).
A) set global test. key_buffer_size = 512*1024;
B) CACHE INDEX t1.i1, t2.i1, t3 IN test;
2) Preload index to Cache can increase the query speed. Because the preloading index is sequential, It is very fast.
A) load index into cache t1, t2 ignore leaves;
4.2 InnoDB
InnoDB provides MySQL with a transaction security (ACID-compatible) storage engine with the capabilities of commit, rollback, and crash recovery. InnoDB provides row level lock and an Oracle-like non-locked read in the SELECT statement. These features increase the deployment and performance of multiple users. There is no need to expand the lock in InnoDB, because row level lock in InnoDB is suitable for a very small space. InnoDB also supports the foreign key constraint. In SQL queries, You can freely mix InnoDB tables with other MySQL tables, or even in the same query.
InnoDB is designed to achieve maximum performance when processing a large amount of data. Its CPU usage efficiency is very high.
The InnoDB Storage engine has been fully integrated with the MySQL server. the InnoDB Storage engine maintains its own buffer pool to cache data and indexes in the memory. InnoDB stores its tables and indexes in a tablespace. The tablespace can contain several files (or original disk partitions ). This is different from the MyISAM table. For example, in the MyISAM table, each table is in a separate file. The InnoDB table can be any size, even on an operating system with a file size limited to 2 GB.
Many large database sites that require high performance use the InnoDB engine. The famous Internet news site Slashdot.org runs on InnoDB. Mytrix, Inc. Stores more than 1 TB of data on InnoDB, and some other sites process an average of 800 inserts/updates per second on InnoDB.
4.2.1 InnoDB features
4.2.1.1 InnoDB Properties
1) supports transactions, ACID, and foreign keys.
2) Row level locks.
3) different isolation levels are supported.
4) it requires more memory and disk space than MyISAM.
5) No key compression.
6) data and indexes are cached in the memory hash table.
4.2.1.2 InnoDB Good
1) applications requiring transactions.
2) highly concurrent applications.
3) Automatic Recovery.
4) Fast primary key-based operations.
4.2.2 InnoDB optimization key points
1) try to use the short and integer primary keys.
2) Load/Insert data is in the primary key order. If the data is not sorted by the primary key, sort the data first and then perform database operations.
3) SET UNIQUE_CHECKS = 0 and SET FOREIGN_KEY_CHECKS = 0 in the Load data to avoid overhead for checking Foreign keys and uniqueness constraints.
4) use prefix keys. Because InnoDB does not have the key compression function.
4.2.3 InnoDB server side settings
Innodb_buffer_pool_size: This is the most important setting of InnoDB and has a decisive impact on InnoDB performance. The default setting is only 8 M, so the InnoDB performance under the default database settings is poor. You can set a memory of 60-80% on the database server that only has the InnoDB Storage engine. More precisely, set a memory size of 10% larger than that of InnoDB tablespaces when the memory capacity permits.
Innodb_data_file_path: Specifies the table data and index storage space, which can be one or more files. The last data file must be automatically expanded, and only the last file can be automatically expanded. In this way, when the space is used up, the data file is automatically expanded (in 8 Mb) to accommodate additional data. For example, innodb_data_file_path =/disk1/ibdata1: 900 M;/disk2/ibdata2: 50 M: autoextend two data files are stored on different disks. Data is first placed in ibdata1. When the data reaches MB, it is placed in ibdata2. Once it reaches 50 MB, ibdata2 will automatically increase in 8 MB. If the disk is full, add a data file to another disk.
Innodb_autoextend_increment: The default value is 8 Mb. If there is a large volume of insert data at a time, you can add it as appropriate.
Innodb_data_home_dir: directory for storing tablespace data. It is in the mysql data directory by default. Setting it to a partition different from the MySQL installation file can improve performance.
Innodb_log_file_size: this parameter determines the recovery speed. If it is too large, the rediscovery process will be slow. If it is too small, the query performance will be affected. Generally, MB can take into account both the performance and the rediscovery speed.
.
Innodb_log_buffer_size: the disk speed is very slow. Writing logs directly to the disk affects InnoDB performance. This parameter sets the log buffer size, generally 4 MB. If there is a large blob operation, you can increase it as appropriate.
Innodb_flush_logs_at_trx_commit = 2: This parameter sets the processing of log information in the memory when the transaction is committed.
1) = 1. When each transaction is committed, the log buffer is written to the log file to refresh the disk operations of the log file. Truly ACID. Slow speed.
2) = 2. When each transaction is committed, the log buffer is written to the file, but the disk operation is not refreshed. Transactions in the last second will be deleted only when the operating system crashes or power is down. Otherwise, transactions will not be lost.
3) = 0, the log buffer is written to the log file once per second, and the log file is refreshed by disk operations. Any crash of the mysqld process will delete the last second of the transaction before the crash
Innodb_file_per_table: stores each InnoDB table and its indexes in its own files.
Transaction-isolation = READ-COMITTED: Doing This setting can improve performance if the application can run at the READ-COMMITED isolation level.
Innodb_flush_method: sets the InnoDB synchronous IO method:
1) Default-use fsync ().
2) O_SYNC is usually slow to open files in sync mode.
3) O_DIRECT: Use Direct IO on Linux. This can significantly increase the speed, especially on RAID systems. Avoid extra data replication and double buffering (mysql buffering and OS buffering ).
Innodb_thread_concurrency: Maximum number of threads in InnoDB kernel.
1) set to at least (num_disks + num_cpus) * 2.
2) You can disable this restriction by setting it to 1000.
5. Cache
There are many cache types. Adding appropriate cache policies to applications can significantly improve application performance. As application caching is a big topic, further research is needed.
6. Reference
1) http://www.mysqlperformanceblog.com/
2) Advanced MySQL Performance Optimization, Peter Zaitsev, Tobias Asplund, MySQL Users Conference 2005
3) Improving MySQL Server Performance with Intel C ++ Compiler, Peter Zaitsev, and Linux World 2005
4) MySQL Performance Optimization, Peter Zaitsev, Percona Ltd, open source database conference 2006
5) MySQL Server Settings Tuning, Peter Zaitsev, co-founder, Percona Ltd, 2007
6) MySQL Reference Manual