1. Introduction
In the Web application architecture, the data persistence layer (usually a relational database) is a key core part that has a very important impact on the performance of the system. MySQL is currently the most used open source database, but the MySQL database default settings performance is very poor, just a toy database. Therefore, the use of MySQL database in the product must be the necessary optimization.
Optimization is a complex task, this article describes MySQL-related database design and query optimization, server-side optimization, storage engine optimization.
2. Database design and query optimization
In MySQL server performance tuning, the first thing to consider is the database schema design, which is very important. A bad schema design can perform poor performance even when running on a performance-tuning MySQL server; similar to schema, the design of query statements can also affect MySQL performance and should avoid writing inefficient SQL queries. This section will discuss in detail these two aspects of optimization.
2.1 Schema Design
The optimization of schema depends on what query is going to run, and different query will have different schema optimization schemes. Section 2.2 describes the optimization of query design. Schema design is also affected by the size of the expected data set. Schema design is mainly considered: standardization, data type, index.
2.1.1 Standardization
Standardization is the process of organizing data in a database. This includes creating tables based on design rules and establishing relationships between them, and by eliminating redundancy and inconsistent dependencies, the design rule protects the data and increases the flexibility of the data. Generally, database standardization is to make database design conform to a certain level of paradigm, which usually satisfies the third normal form. There are also fourth paradigms (also known as Boyce Codd Paradigms, BCNF)) and fifth normal paradigms, but are rarely considered in practical design. Ignoring these rules may make the database design less perfect, but this should not affect functionality.
Characteristics of standardization:
1 All "objects" are in its own table, without redundancy.
2 The database is usually generated by E-R diagram.
3) Simplicity, updating properties usually requires only a few updates to the record.
4) The join operation is time-consuming.
5) Select,sort optimization measures are relatively few.
6) is suitable for OLTP applications.
Non-standardized features:
1 Store a lot of data in a table, data redundancy.
2 Update data is very expensive, update a property may update a lot of tables, many records.
3 It is possible to lose data when deleting data.
4 Select,order has a lot of optimization options.
5) applicable to DSS applications.
Both standardization and non-standardization have their own advantages and disadvantages, usually in a database design can be mixed, some tables standardized, some tables retain some redundant data:
1 Standardization of OLTP usage, non-standardized use of DSS
2 use materialized views. MySQL does not directly support this database feature, but it can be replaced with a MyISAM table.
3 Redundancy Some data in the table, for example, ref_id and name exist in the same table. But pay attention to updating the problem.
4 for some simple objects, use value directly as the building. such as IP address, etc.
5) Reference by Primary/unique KEY. MySQL can optimize this operation, for example:
Java code
Select City_name
From City,state
where State_id=state.id and state.code= ' CA ' "converted to" select City_name from city where state_id=12
2.1.2 Data type
One of the most basic optimizations is to make the table as small as possible on disk. This can lead to very high performance, because the data is small, the disk reads faster, and the table content is handled less memory during the query process. At the same time, indexing on smaller columns also takes up less resources.
You can use the following techniques to make the table more performance and minimize storage space:
1 Use the correct and appropriate type, do not store the number as a string.
2 Use the most effective (minimum) data type as much as possible. MySQL has a number of specialized types that save disk space and memory.
3 use smaller integer types as much as possible to make the table smaller. For example, Mediumint is often better than int because the Mediumint column uses 25% less space.
4) If possible, the declaration is listed as not NULL. It makes everything faster and saves one per column. Note If you do need null in your application, you should use it without question, just avoid having it on all columns by default.
5 for MyISAM tables, if there are no variable-length columns (VARCHAR, text, or BLOB columns), use a fixed-size record format. It's quicker but unfortunately it could be a waste of space. Even if you have already used the Create option to make the varchar column row_format=fixed, you can also hint that you want to use a fixed length line.
6 Use sample character set, such as Latin1. Use Utf-8 as little as possible, because the utf-8 occupies 3 times times as much space as the latin1. You can use latin1, such as Mail,url, on a field that does not need to use utf-8.
2.1.3 Index
All MySQL column types can be indexed. Using indexes on related columns is the best way to improve the performance of select operations. The following points should be noted in using the index:
1 MySQL will only use prefixes, such as key (A, B) ... where b=5 will not use the index.
2 to select the index of use. It is not good to use an index on a column with few changes, such as a sex column.
3 Define the unique index on the unique column.
4 Avoid establishing an index that cannot be used.
5 in Btree index (InnoDB uses btree), you can index the columns that need to be sorted.
6 Avoid duplicate indexing.
7 avoid indexing on the prefix of an existing index. For example, if Index (A,B) is present, the index (a) is removed.
8 control the length of a single index. Use Key (name (8)) to index the first few characters of the data.
9 The shorter the key value the better, the best use of integer.
10 in the query to use the index (using explain view), you can reduce the number of read disk, speed reading data.
11) Similar key values are better than random. Auto_increment is better than a UUID.
Optimize table can compress and sort index, and be careful not to run frequently.
Analyze table to update data.
2.2 Designing Queries
The optimization of query statements is a case by case problem, different SQL has different optimization scheme, here I only list some common skills.
1 in case of index, try to ensure that the query uses the correct index. You can use the explain select ... View the results and analyze the query.
2 The matching type is used when querying. For example, select * from a where id=5, if the ID is a character type, and there is index, this query will not use the index, do a full table scan, the speed will be very slow. The correct one should be ... where id= "5", with quotation marks indicating that the type is a character.
3 Use--log-slow-queries–long-query-time=2 to view the statements that are slow in the query. Then use the Explain analysis query to make the optimization.
3. Server-Side optimization
3.1 MySQL Installation
MySQL has many distributions, preferably using a binary version of the MySQL AB release. You can also download the source code to compile the installation, but some bugs in the compiler and class library may cause potential problems with the compiled MySQL.
If the server that is installing MySQL is using Intel Corporation's processor, you can use the Intel C + + compiled version, and in a PPT in Linux World2005, the MySQL query compiled using the Intel C + + compiler is faster than the normal version by about 30%. The Intel C + + compiled version can be downloaded at the MySQL official website.
3.2 Server Setup Optimization
MySQL default settings performance is poor, so to do some adjustment of parameters. This section describes some general tuning of parameters, and does not involve specific storage engines (mainly referring to MYISAM,INNODB, related optimizations are described in 4).
--character-set: If a single language uses a simple character set such as Latin1. As little as possible use utf-8,utf-8 occupy more space.
--memlock: Locking MySQL can only run in memory, avoiding swapping, but if there is not enough memory, there may be an error.
--max_allowed_packet: To be large enough to fit the larger SQL query, there is not much impact on performance, mainly to avoid packet errors.
Maximum allowable connections for--max_connections:server. Too much will appear out of memory.
--table_cache:mysql keeps the number of open table at the same time. Opening the table is expensive. The general setting is 512.
--query_cache_size: The amount of memory used to cache queries.
--datadir:mysql the root directory where the data is stored, and the installation files separate on different disks can improve performance.
4. Storage Engine Optimization
MySQL supports different storage engines, mainly using MyISAM and InnoDB.
4.1 MyISAM
MyISAM Manage non-transaction tables. It provides high-speed storage and retrieval, as well as full-text search capabilities. MyISAM is supported in all MySQL configurations and is the default storage engine, unless you configure MySQL by default to use another engine.
4.1.1 MyISAM Characteristics
4.1.1.1 MyISAM Properties
1 does not support transactions, downtime damage table
2 use of small memory and disk space
3 table-based locks, concurrent update data can cause serious performance problems
4 MySQL only cache index, data by the OS cache
4.1.1.2 Typical MyISAM usages
1) Log System
2) read-only or mostly read-operated applications
3) Full table scan
4) Batch Import data
5 low concurrent read/write with no transaction
Key points for optimization of 4.1.2 MyISAM
1 The declaration is listed as not NULL and can reduce disk storage.
2 Use Optimize table to defragment and reclaim free space. Note that you only run after very large data changes.
3 when deleting/updating/adding a large amount of data, it is forbidden to use index. Use alter TABLE t DISABLE KEYS.
4) Set myisam_max_[extra]_sort_file_size large enough to significantly improve the speed of repair table.
4.1.3 MyISAM Table Locks
1) Avoid concurrent insert,update.
2 can use insert delayed, but it is possible to lose data.
3 optimize the query statement.
4) Horizontal partition.
5) Vertical partition.
6 If none works, use InnoDB.
4.1.4 MyISAM Key Cache
1) Set key_buffer_size variable. Myisan The most important cache setting, which is used to cache the index data for the MyISAM table, which affects only MyISAM. You typically set the memory size of 25-33% in a server that uses only MyISAM.
2 can use several different key caches (for some hot data).
A) SET GLOBAL test.key_buffer_size=512*1024;
b) CACHE INDEX t1.i1, t2.i1, T3 in test;
2 preload index to cache can improve the speed of query. Because preloading index is in order, so very fast.
A) LOAD INDEX into the CACHE t1, T2 IGNORE LEAVES;
4.2 InnoDB
INNODB provides MySQL with transaction security (acid-compatible) storage engine with Commit, rollback, and crash recovery capabilities. InnoDB provides row level lock and also provides an oracle-consistent, unlocked read in the SELECT statement. These features add to multi-user deployment and performance. There is no need to enlarge the lock in the InnoDB because row level lock in InnoDB is suitable for very small spaces. InnoDB also supports foreign key constraints. In SQL queries, you can freely mix innodb types of tables with other types of MySQL tables, even in the same query.
InnoDB is designed to maximize performance when dealing with large amounts of data. Its CPU usage efficiency is very high.
The InnoDB storage engine is fully integrated with the MySQL server, and the InnoDB storage engine maintains its own buffer pool for caching data and indexes in memory. InnoDB stores its table & index in a tablespace, a tablespace can contain several files (or a raw disk partition). This is different from the MyISAM table, for example in the MyISAM table where each table is separated from the file. The InnoDB table can be of any size, even if the file size is limited to 2GB operating systems.
Many large database sites that require high performance use the InnoDB engine. The famous Internet news site slashdot.org runs on the InnoDB. Mytrix, Inc. stores more than 1TB of data on InnoDB, and some other sites handle an average of 800 inserts/updates per second on the InnoDB.
4.2.1 InnoDB characteristics
4.2.1.1 InnoDB Properties
1 support transaction, ACID, foreign key.
2 Row level locks.
3 supports different isolation levels.
4) requires more memory and disk space than MyISAM.
5) No key compression.
6 data and indexes are slow to exist in the memory hash table.
4.2.1.2 InnoDB Good for
1 requires the application of the transaction.
2) high concurrent applications.
3) Automatic recovery.
4 faster operation based on primary key.
Key points for optimization of 4.2.2 InnoDB
1) Try to use Short,integer primary key.
2) Load/insert data by primary key order. If the data is not sorted by primary key, first sort and then database operations.
3 The load data is for setting set Unique_checks=0,set foreign_key_checks=0, you can avoid the cost of foreign key and uniqueness constraint checking.
4 use prefix keys. Because InnoDB does not have a key compression function.
4.2.3 InnoDB server-side settings
Innodb_buffer_pool_size: This is the most important setting for InnoDB, which has a decisive effect on InnoDB performance. The default setting is only 8M, so the default database settings below InnoDB performance is poor. On the database server with only the InnoDB storage engine, you can set up 60-80% memory. To be more precise, set the memory size of 10% larger than the InnoDB tablespaces if memory capacity allows.
Innodb_data_file_path: Specifies the space for table data and index storage, which can be one or more files. The last data file must be automatically expanded, and only the last file will allow automatic expansion. This way, when the space is exhausted, the automatic expansion data file automatically grows (in 8MB) to accommodate additional data. For example: Innodb_data_file_path=/disk1/ibdata1:900m;/disk2/ibdata2:50m:autoextend two data files on different disks. The data is first placed in the Ibdata1, and when 900M is reached, the data is placed in the IBDATA2. Once the 50MB,IBDATA2 is reached, it will grow automatically in 8MB units. If the disk is full, you need to add a data file to the other disk.
Innodb_data_home_dir: Place table space Data directory, default in MySQL data directory, set to and MySQL installation files different partitions can improve performance.
Innodb_log_file_size: This parameter determines the recovery speed. If it's too big, recovery will be slow, too small. Impact query performance, generally 256M can take account of performance and recovery speed
。
Innodb_log_buffer_size: Disk speed is very slow, the log write directly to the disk will affect the performance of InnoDB, this parameter sets the size of log buffer, General 4M. If there is a large blob operation, you can increase it appropriately.
innodb_flush_logs_at_trx_commit=2: This parameter sets the processing of log information in memory when transaction commits.
1 = 1 o'clock, when each transaction commits, the log buffer is written to the log file, and the log file is flushed to disk operations. Truly ACID. Speed is slow.
2 = 2 o'clock, the log buffer is written to the file at each transaction commit, but the log file is not refreshed for disk operations. Only the operating system crashes or loses power will delete the last second of the transaction, otherwise the transaction will not be lost.
3 = 0 o'clock, log buffers are written to log files once per second, and disk operations are refreshed on the log files. Any mysqld process crash deletes the last second of the transaction before the crash
Innodb_file_per_table: You can store each InnoDB table and its index in its own file.
Transaction-isolation=read-comitted: If the application can run at the Read-commited isolation level, there is a certain performance boost for this setting.
Innodb_flush_method: How to set up InnoDB synchronous IO:
1) default– use Fsync ().
2) O_sync open files in SYNC mode, usually slower.
3) O_direct, use DIRECT IO on Linux. Can significantly improve speed, especially on RAID systems. Avoid additional data replication and double buffering (MySQL buffering and OS buffering).
Innodb_thread_concurrency:innodb kernel The maximum number of threads.
1) at least set to (Num_disks+num_cpus) *2.
2) can be set to 1000来 to prohibit this restriction
5. Caching
There are many types of caching, and adding an appropriate caching policy to your application can significantly improve the performance of your application. Because application caching is a big topic, this part needs further research.