"High Performance MySQL" is a classic good book, from the application layer to the database to the hardware platform, a variety of tuning skills, FAQs are all mentioned. The database of various conceptual skills usually have contact, such as index, partition, sharding, etc., but to really improve or so the system to learn a bit.
Chapter 1: MySQL Architecture and History
1.1 Transaction Isolation Levels
Transaction isolation levels are a commonplace topic, yet most materials leave dirty reads, non-repeatable reads, and phantom reads feeling foggy, so hands-on practice is the best way to internalize them. Recommended article: MySQL database transaction isolation level.
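To make this concrete, here is a minimal two-session sketch (the account table and its values are my own illustration, not from the book) showing a non-repeatable read under READ COMMITTED:

    -- Assumed table: CREATE TABLE account (id INT PRIMARY KEY, balance INT) ENGINE=InnoDB;

    -- Session A
    SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
    START TRANSACTION;
    SELECT balance FROM account WHERE id = 1;   -- say it returns 100

    -- Session B, while A's transaction is still open
    UPDATE account SET balance = 200 WHERE id = 1;
    COMMIT;

    -- Session A again
    SELECT balance FROM account WHERE id = 1;   -- now returns 200: a non-repeatable read
    COMMIT;

Under REPEATABLE READ (InnoDB's default) the second SELECT would still return 100.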
1.2 Implicit and Explicit Locking
By default InnoDB manages locks implicitly according to the transaction isolation level, while also supporting explicit locking (Explicit Locking) through statements that are not part of standard SQL (usage sketched after the list):
- SELECT ... LOCK IN SHARE MODE
- SELECT ... FOR UPDATE
- LOCK/UNLOCK TABLES
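Their use looks like this (a sketch against a hypothetical inventory table):

    -- Shared lock: other sessions may read the row but not modify it
    SELECT qty FROM inventory WHERE item_id = 3 LOCK IN SHARE MODE;

    -- Exclusive lock: read the row with the intention of updating it
    START TRANSACTION;
    SELECT qty FROM inventory WHERE item_id = 3 FOR UPDATE;
    UPDATE inventory SET qty = qty - 1 WHERE item_id = 3;
    COMMIT;   -- row locks are released here

    -- Table locks are handled by the server layer, not by InnoDB
    LOCK TABLES inventory READ;
    UNLOCK TABLES;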
InnoDB uses a two-phase locking protocol (not to be confused with two-phase commit in distributed/XA transactions). It can acquire locks at any time during a transaction, but releases them all together only at the final commit or rollback. For contrast, the two locking protocols are usually described as follows:

"One-phase locking protocol: at the start of a transaction, all locks are acquired at once, and no further locks may be requested. If any lock cannot be obtained, the whole request fails and the transaction does not execute; at the end of the transaction, all locks are released at once. One-phase locking cannot cause deadlocks, but it allows little transaction concurrency."

"Two-phase locking protocol: the transaction is divided into two phases, a locking (growing) phase followed by an unlocking (shrinking) phase. During the locking phase the transaction may acquire locks and operate on data, but may not release any lock. Once it releases its first lock it enters the unlocking phase, in which it may only release locks and operate on data, never acquire new ones."

Two-phase locking allows transactions a much higher degree of concurrency, because unlocking does not have to wait until the end of the transaction. It does not, however, solve the deadlock problem, because it imposes no ordering on lock acquisition: if two transactions lock A and B respectively and then each requests the other's lock, they deadlock. A sketch of exactly this scenario follows.
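A minimal reproduction (my own illustration, not from the book; a table t with rows id = 1 and id = 2 stands in for resources A and B):

    -- Session A
    START TRANSACTION;
    UPDATE t SET val = val + 1 WHERE id = 1;   -- A locks row 1

    -- Session B
    START TRANSACTION;
    UPDATE t SET val = val + 1 WHERE id = 2;   -- B locks row 2

    -- Session A
    UPDATE t SET val = val + 1 WHERE id = 2;   -- blocks, waiting for B

    -- Session B
    UPDATE t SET val = val + 1 WHERE id = 1;   -- deadlock: InnoDB detects the cycle
                                               -- and rolls one transaction back (ERROR 1213)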
1.3 Multiversion Concurrency Control
MVCC is similar to an optimistic locking mechanism, but several articles point out that InnoDB's implementation is not pure MVCC. Noting it here; worth going back to the source code for deeper study. Recommended articles: He Dengcheng's "A Brief Analysis of InnoDB's Multi-Version (MVCC) Implementation" and Lao Ma Nong's "MVCC in MySQL":

"InnoDB's implementation is really not MVCC, because it lacks the core feature of multiple versions coexisting; the contents of the undo log are only the serialized results of modifications, a record of the transactions' process, not coexisting versions. But ideal MVCC is hard to implement: when a transaction modifies only one row, the ideal MVCC pattern works fine and can roll back by comparing version numbers, but when a transaction touches multiple rows, ideal MVCC is powerless.

"The fundamental reason ideal MVCC is hard to achieve is that it attempts to replace two-phase commit with optimistic locking. Modifying two rows while guaranteeing consistency is essentially no different from modifying data across two distributed systems, and in that scenario two-phase commit is the only means of ensuring consistency. The essence of two-phase commit is locking; the essence of optimistic locking is to eliminate locks; the two contradict each other. So ideal MVCC is rarely applied in practice; InnoDB merely borrows the name MVCC and provides non-blocking reads (demonstrated below).
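The non-blocking read is easy to observe (reusing the hypothetical account table from section 1.1):

    -- Session A
    START TRANSACTION;
    UPDATE account SET balance = 200 WHERE id = 1;   -- holds an exclusive lock on the row

    -- Session B
    START TRANSACTION;
    SELECT balance FROM account WHERE id = 1;
        -- returns the old value immediately, reconstructed from undo
    SELECT balance FROM account WHERE id = 1 FOR UPDATE;
        -- a locking read: blocks until session A commits or rolls back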
Chapter 4: Optimizing Schema and Data Types
4.1 Choosing Optimal Data Types
This chapter gives an incisive overview of the general design principles for schemas and data types:
- Smaller is usually better: smaller types consume less disk space, memory, and CPU cache, so smaller usually means faster.
- Simple is good: character sets and collations make string comparison complex, so prefer built-in types, e.g. integers rather than strings, to store datetimes or IP addresses (see the DDL sketch after this list).
- Avoid NULL if possible: MySQL has special processing logic for NULL, so nullable columns make indexes, index statistics, and value comparisons more complicated.
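Translated into DDL, the first two principles look something like this (my sketch, not the book's code):

    -- Store an IPv4 address as a 4-byte integer rather than VARCHAR(15),
    -- and use a real temporal type rather than a string
    CREATE TABLE access_log (
      ip       INT UNSIGNED NOT NULL,
      hit_time DATETIME     NOT NULL
    ) ENGINE=InnoDB;

    INSERT INTO access_log (ip, hit_time) VALUES (INET_ATON('192.168.0.1'), NOW());
    SELECT INET_NTOA(ip), hit_time FROM access_log;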
4.2 Using ENUM Instead of a String Type
Internally, MySQL stores an enumeration as an integer, keeping the mapping between strings and integers in a lookup table. Enumerations are therefore very space-efficient (per principle 1, smaller is usually faster): depending on the total number of values, they consume only 1 or 2 bytes. The catch is that adding or removing an enumeration value requires an ALTER TABLE. There is also conversion overhead whenever the lookup table is consulted, especially when joining against CHAR or VARCHAR columns, though sometimes the space savings outweigh this cost.
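A sketch of the trade-off (table and values are my own illustration):

    CREATE TABLE orders (
      status ENUM('pending', 'paid', 'shipped') NOT NULL   -- stored in 1 byte here
    ) ENGINE=InnoDB;

    -- Written and read as strings, stored internally as integers
    INSERT INTO orders (status) VALUES ('paid');

    -- The hidden cost: changing the set of values requires an ALTER TABLE
    ALTER TABLE orders MODIFY status
      ENUM('pending', 'paid', 'shipped', 'refunded') NOT NULL;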
4.3 Cons of a Normalized Schema
A normalized schema not only increases the number of joins, it can also separate columns that would otherwise belong to one index into different tables. For example:

    SELECT ... FROM message
    INNER JOIN user USING (user_id)
    WHERE user.account_type = 'premium'
    ORDER BY message.published DESC LIMIT 10;

There are two possible execution plans:
- Scan the message table in reverse order of the published index, probing the user table for each row to check whether account_type is 'premium', until 10 matching rows are found.
- Scan the user table via the account_type index to find all 'premium' rows, then return 10 rows after a filesort.
The essence of the problem above is that the join prevents us from completing both the filtering and the sorting through a single index. If we denormalize:

    SELECT ... FROM user_message
    WHERE account_type = 'premium'
    ORDER BY published DESC LIMIT 10;

then an index on (account_type, published) can do the job efficiently, as the DDL sketch below shows.
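For completeness, the supporting index would be created like this (assuming the denormalized user_message table above):

    ALTER TABLE user_message ADD INDEX (account_type, published);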
4.4 Cache and Summary Tables
Sections 4.4 through 4.6 discuss the pros and cons of normalized versus denormalized schemas and present several common, practical techniques built on redundant data.
These techniques are all essentially about speeding up queries, at the cost of slower writes and greater development complexity.
A cache table contains data that can be derived from other tables in the schema, i.e. data that is logically redundant. A summary/roll-up table contains data produced by aggregate functions, for example a table whose rows are the result of a GROUP BY.
Why are they needed? The most common scenario is statistical work such as reports. Generating such statistics means scanning large amounts of data; computing them in real time is expensive and often unnecessary. Serving these queries directly would also call for many composite indexes (one per dimension of interest) which, while they would improve read performance, would slow down ordinary updates and inserts. The usual technique is to build intermediate tables, possibly on another engine (exploiting MyISAM's smaller indexes and full-text search), or even in another system (Lucene or Sphinx).
With an intermediate table acting as a cache, we need to refresh or rebuild it periodically. The shadow table is a neat trick here:

    mysql> DROP TABLE IF EXISTS my_summary_new, my_summary_old;
    mysql> CREATE TABLE my_summary_new LIKE my_summary;
    -- populate my_summary_new ...
    mysql> RENAME TABLE my_summary TO my_summary_old, my_summary_new TO my_summary;

With a single RENAME we swap in the shadow table atomically, and the previous table is kept around in case we need to roll back.
4.5 Materialized Views
A materialized view is a view that is precomputed and actually stored on disk (an ordinary view stores nothing; its SQL runs each time the view is accessed). MySQL has no materialized views, but there is an excellent open-source implementation, the Flexviews toolset. It has some very useful components, such as:
- A Change Data Capture (CDC) tool that reads the binary logs and extracts the relevant row changes
- A set of stored procedures that help define and manage views
- Tools to apply the captured changes to the materialized view data
Specifically, it relies on the row-based binary log, which contains each changed row's data before and after the change, so Flexviews can see the before and after values and recompute the materialized view incrementally without ever touching the source table. That is its advantage over cache or summary tables we maintain by hand.
4.6 Counter Tables
A common problem in web applications is a counter table under concurrent access; this section proposes schemes to increase concurrency. The overall idea is to add more slots to spread the concurrent writes out, much like the striped design of ConcurrentHashMap in Java's java.util.concurrent package.
The counter table and its access SQL can be simplified as:

    mysql> CREATE TABLE hit_counter (cnt INT UNSIGNED NOT NULL) ENGINE=InnoDB;
    mysql> UPDATE hit_counter SET cnt = cnt + 1;
As you can see, a single counter row is effectively a global lock: updates to it are serialized. So instead, create the table with a slot column and pre-populate it with 100 rows:

    CREATE TABLE hit_counter (
      slot TINYINT UNSIGNED NOT NULL PRIMARY KEY,
      cnt  INT UNSIGNED NOT NULL
    ) ENGINE=InnoDB;
Then change the update and query SQL to:

    mysql> UPDATE hit_counter SET cnt = cnt + 1 WHERE slot = RAND() * 100;
    mysql> SELECT SUM(cnt) FROM hit_counter;
PS: If you need to reset the counters every day, add a date column when creating the table:

    CREATE TABLE daily_hit_counter (
      day  DATE NOT NULL,
      slot TINYINT UNSIGNED NOT NULL,
      cnt  INT UNSIGNED NOT NULL,
      PRIMARY KEY (day, slot)
    ) ENGINE=InnoDB;
PPS: If you don't want to insert the initial rows every day, the following SQL does it on demand:

    mysql> INSERT INTO daily_hit_counter (day, slot, cnt)
           VALUES (CURRENT_DATE, RAND() * 100, 1)
           ON DUPLICATE KEY UPDATE cnt = cnt + 1;
PPPS: If you want to shrink the counter table to save space, run a periodic job that merges all rows into slot 0:
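The statement is roughly the following (reproduced from the book as best I recall, so treat it as a sketch): it folds each day's total into the row with the smallest slot, moves that row to slot 0, and zeroes the rest so they can be deleted:

    mysql> UPDATE daily_hit_counter AS c
           INNER JOIN (
             SELECT day, SUM(cnt) AS cnt, MIN(slot) AS mslot
             FROM daily_hit_counter
             GROUP BY day
           ) AS x USING (day)
           SET c.cnt  = IF(c.slot = x.mslot, x.cnt, 0),
               c.slot = IF(c.slot = x.mslot, 0, c.slot);
    mysql> DELETE FROM daily_hit_counter WHERE slot <> 0 AND cnt = 0;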
Chapter 5: Indexing for High Performance
5.1 B-Tree Family
Generally, when we talk about database indexes we actually mean B-tree indexes, and that is also what MySQL's CREATE TABLE and related statements refer to. In practice, though, a storage engine may use a different structure internally. NDB, for example, uses T-trees (different index types also come up in my other post introducing in-memory databases; the T-tree is well suited to in-memory storage), while InnoDB uses B+ trees.
So, to be precise, what we use are various variants from the B-tree family.
The variants differ mainly in the fan-out of internal nodes (compare the memory-oriented T-tree with the disk-oriented B-tree) and in how storage is used (B-tree versus B+ tree).
The biggest difference between a B-tree and structures like red-black trees or plain AVL trees is that a B-tree node has many children. The design goal is to reduce disk I/O: because of the high fan-out, the overall height of the tree is very low, so only a few disk pages need to be loaded to find the target data. What, then, distinguishes a B+ tree from a B-tree? Its internal nodes carry no data (no pointers to the data rows where the keys live); they store only keys. Benefits of the B+ tree:
- Because internal nodes hold no data, a disk page can hold more keys, the height of the tree drops further, and key lookups hit faster.
- When the whole tree must be traversed (for example a range query or even a full scan, both common and frequent operations), a B+ tree only needs a linear walk over its linked leaf nodes, whereas a B-tree needs a full tree traversal. The linear walk has a better hit rate, because adjacent data sits close together rather than being scattered across a node's left and right subtrees, so the probability of crossing pages is lower.
Advantages of the B-tree:
- A lookup in a B-tree may end at an internal node, while in a B+ tree it must always descend to a leaf node.
First, a classic worked example of a B-tree lookup (the accompanying disk-block diagram is in the original article):

"Below, let's simulate the process of looking up file 29:
- Following the root node pointer, locate disk block 1 (the root of the file directory) and read it into memory. (disk I/O #1)
- In memory there are two filenames, 17 and 35, and three pointers to other disk pages. Since 17 < 29 < 35, we follow pointer P2.
- Via P2 we locate disk block 3 and read it into memory. (disk I/O #2)
- In memory there are two filenames, 26 and 30, and three pointers to other disk pages. Since 26 < 29 < 30, we follow pointer P2.
- Via P2 we locate disk block 8 and read it into memory. (disk I/O #3)
- In memory there are two filenames, 28 and 29. We find filename 29 and obtain the disk address it points to.
Analyzing the process above, the lookup needed 3 disk I/O operations and 3 in-memory lookups. As for searching the filenames in memory, since they form an ordered list we can use binary search to speed it up. The I/O cost is the deciding factor in the efficiency of the whole B-tree lookup. By comparison, a balanced binary tree stored on disk would need 4, or up to 5, disk reads for the same search, and the more files there are, the fewer disk I/Os the B-tree needs relative to the balanced binary tree, and the more efficient it becomes."
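A quick back-of-the-envelope calculation (mine, not the article's) shows why the fan-out matters so much: a tree with fan-out m holding N keys has height about log_m(N). With m = 100 and N = 1,000,000 keys that is log_100(1,000,000) = 3 page reads, while a binary tree (m = 2) needs log_2(1,000,000) ≈ 20.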
The B+ tree version of the same structure is diagrammed in the original article; the difference is that internal nodes hold only keys, with all records living in the linked leaf nodes.
Having understood the physical structure of B-trees, let's summarize the query types that can take advantage of a B-tree index. This is very helpful for SQL optimization. Given:

    mysql> CREATE TABLE People (
             last_name     VARCHAR(50) NOT NULL,
             first_name    VARCHAR(50) NOT NULL,
             date_of_birth DATE        NOT NULL,
             gender        ENUM('m', 'f') NOT NULL,
             KEY (last_name, first_name, date_of_birth)
           );
- Full column match: equality on all three index columns.
- Partial column (leftmost prefix) match: use only the first column, or the first two.
- Column-prefix match: match on the leading part of the leftmost column's value.
- Range query: a range on one column, with all columns to its left fixed by equality.
- Index-only (covering) query: access only the index, without touching the full row data (concrete query sketches follow).
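Concretely, against the People table above, queries like these can use the index, while the last one cannot (the values are my own illustration):

    -- Full value match: equality on all three index columns
    SELECT * FROM People
    WHERE last_name = 'Allen' AND first_name = 'Cuba'
      AND date_of_birth = '1968-01-01';

    -- Leftmost prefix: only the first column (or the first two)
    SELECT * FROM People WHERE last_name = 'Allen';

    -- Range on a column, with the columns to its left fixed
    SELECT * FROM People
    WHERE last_name = 'Allen' AND first_name > 'C';

    -- Index-only (covering) query: answered from the index alone
    SELECT last_name, first_name FROM People WHERE last_name = 'Allen';

    -- NOT index-assisted: skips the leftmost column
    SELECT * FROM People WHERE first_name = 'Cuba';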
"High performance MySQL" Reading notes (top)