High Performance MySQL Reading Notes (Part 1)

High Performance MySQL is a classic. From the application layer down through the database to the hardware platform, it covers tuning techniques and common problems, and it treats the database concepts in everyday use, such as indexes, partitioning, and sharding. Even so, reading is no substitute for trying the techniques out in practice.
Chapter 1: MySQL Architecture and History


1.1 Transaction Isolation Level
Transaction isolation levels come up constantly, but many write-ups muddle the distinctions between dirty reads, phantom reads, and non-repeatable reads, so the most reliable way to learn them is to experiment yourself. Recommended article: MySQL database transaction isolation level.
1.2 Implicit and Explicit Locking
By default InnoDB manages locking implicitly, according to the transaction isolation level. It also supports explicit locking, which the SQL standard does not mention at all (see the sketch after this list):
  • SELECT... LOCK IN SHARE MODE
  • SELECT... FOR UPDATE
  • LOCK/UNLOCK TABLES
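A minimal sketch of the two row-lock statements, assuming a hypothetical accounts table with columns (id, balance):

mysql> -- Session A: take a shared (S) lock while reading
mysql> BEGIN;
mysql> SELECT balance FROM accounts WHERE id = 1 LOCK IN SHARE MODE;

mysql> -- Session B: FOR UPDATE requests an exclusive (X) lock on the same row;
mysql> -- it blocks until Session A commits or rolls back
mysql> BEGIN;
mysql> SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;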
InnoDB uses the two-phase locking protocol (not to be confused with the two-phase commit used by distributed XA transactions). It can acquire locks at any point during a transaction, and it releases all of them together when the transaction commits or rolls back.

It helps to contrast this with all-at-once locking: every lock the transaction needs is requested up front; if any one of them is unavailable, none are granted and the transaction does not execute; at the end of the transaction all locks are released at once. That scheme cannot deadlock, but it allows little concurrency.

Under the two-phase locking protocol, a transaction is divided into two phases: a growing (locking) phase followed by a shrinking (unlocking) phase. In the growing phase the transaction may acquire locks and operate on data, but may not release any lock; once it releases its first lock it enters the shrinking phase, in which it may release locks and operate on data, but may not acquire any new lock. Two-phase locking permits higher concurrency, because unlocking does not have to wait until the end of the transaction. It does not, however, prevent deadlock, since the growing phase imposes no ordering on lock acquisition: if two transactions lock A and B respectively and then each requests the other's lock, they deadlock (see the sketch below).
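A minimal sketch of that deadlock, reusing the hypothetical accounts table (the two sessions interleave):

mysql> -- Session A:
mysql> BEGIN;
mysql> UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- locks row 1

mysql> -- Session B:
mysql> BEGIN;
mysql> UPDATE accounts SET balance = balance - 10 WHERE id = 2;  -- locks row 2

mysql> -- Session A:
mysql> UPDATE accounts SET balance = balance + 10 WHERE id = 2;  -- blocks, waiting for B

mysql> -- Session B:
mysql> UPDATE accounts SET balance = balance + 10 WHERE id = 1;  -- closes the cycle;
mysql> -- InnoDB detects the deadlock and rolls one transaction back (ERROR 1213)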
1.3 Multiversion Concurrency Control
MVCC is similar in spirit to optimistic locking, but several articles argue that InnoDB's implementation is not pure MVCC; I am marking this to revisit in the source code. Bookmarked: He Dengcheng's brief analysis of InnoDB's multi-version (MVCC) implementation, and another write-up on MVCC in MySQL. "InnoDB's implementation is really not MVCC, because it does not implement the core idea of multiple coexisting versions. The content of the undo log is just a serialized result that records the history of multiple transactions; it is not multi-version coexistence. The ideal MVCC, however, is hard to implement. When a transaction modifies only one row, the ideal MVCC can commit or roll back by comparing version numbers; but when a transaction affects multiple rows, the ideal MVCC is powerless."
"The root cause of the difficulty in implementing the ideal MVCC is the attempt to replace the second commit with an optimistic lock. Two rows of data are modified, but to ensure consistency, there is no difference with modifying the data in the two distributed systems. Second commit is the only way to ensure consistency in this scenario. The essence of the second commit is locking. The essence of optimistic locks is to eliminate locks. The two are in conflict, so it is difficult to apply the ideal MVCC in practice, Innodb only uses the MVCC name to provide read non-blocking..

Chapter 4: Optimizing Schema and Data Types
4.1 Choosing Optimal Data Types
At the beginning of this chapter the book lays out the general principles for choosing data types: smaller is usually better, simple is good, and avoid NULL where possible.
4.2 Using ENUM Instead Of A String Type
MySQL stores an ENUM internally as an integer and keeps the mapping between enumeration values and integers in a lookup table. ENUM columns are therefore very compact (principle 1: smaller is better); depending on the number of enumeration values, one occupies only 1 or 2 bytes. There are drawbacks, though: adding or removing an enumeration value requires an ALTER TABLE, and the lookup-table conversion carries overhead, especially when joining an ENUM column against a CHAR or VARCHAR column. Sometimes that overhead is outweighed by the space savings.
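A quick sketch (hypothetical table) that exposes the integer behind an ENUM; evaluating the column in a numeric context returns its internal index:

mysql> CREATE TABLE enum_test (e ENUM('fish', 'apple', 'dog') NOT NULL);
mysql> INSERT INTO enum_test (e) VALUES ('fish'), ('dog'), ('apple');
mysql> SELECT e, e + 0 FROM enum_test;  -- e + 0 yields 1, 3, 2: the stored integers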
4.3 Cons of A Normalized Schema
A normalized schema not only increases the number of joins, it can also split columns that would otherwise share an index across different tables. For example:

mysql> SELECT ... FROM message INNER JOIN user USING (user_id) WHERE user.account_type = 'premium' ORDER BY message.published DESC LIMIT 10;

There are two possible execution plans: filter on user.account_type first and then sort the matching messages (a filesort), or scan messages in published order via an index and check each row's user.account_type.
The essential problem: the join makes it impossible to both filter and sort through a single index. Denormalize instead:

mysql> SELECT ... FROM user_message WHERE account_type = 'premium' ORDER BY published DESC LIMIT 10;

Now an index on (account_type, published) can do the whole job efficiently!
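A sketch of what that denormalized table might look like (the column list is abbreviated and assumed; the names follow the example above):

mysql> CREATE TABLE user_message (
    ->   message_id BIGINT UNSIGNED NOT NULL PRIMARY KEY,
    ->   user_id INT UNSIGNED NOT NULL,
    ->   account_type VARCHAR(10) NOT NULL,  -- copied from user: the denormalization
    ->   published DATETIME NOT NULL,
    ->   body TEXT NOT NULL,
    ->   KEY account_published (account_type, published)  -- filter and sort in one index
    -> ) ENGINE = InnoDB;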
4.4 Cache and Summary Tables
This section is followed by the previous discussions on the Pros and Cons of Normalized and Denormalized Schema. Several common and practical methods for redundant data are proposed from 4.4 to 4.6. These technologies are essentially designed to accelerate query operations, but the cost is to slow down write operations and increase development complexity.
A cache table is a table whose contents can easily be rederived from the rest of the schema; its data is logically redundant. A summary (or roll-up) table is one whose contents are produced by aggregate functions, for example via GROUP BY.
Why do we need them? The most common scenario is statistical work such as reporting. Producing those statistics requires scanning a large amount of data; computing them in real time is expensive and often unnecessary. The alternative, adding many composite indexes (one per reporting dimension) to speed up the queries, would hurt normal updates and inserts. So the usual technique is to add intermediate tables, possibly in a different engine (exploiting MyISAM's smaller indexes and full-text search) or even in a different system (Lucene or Sphinx), as sketched below.
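As a minimal sketch, a summary table over the message table from the example above (the rollup query and column names are assumptions):

mysql> CREATE TABLE msg_per_day (
    ->   day DATE NOT NULL PRIMARY KEY,
    ->   cnt INT UNSIGNED NOT NULL
    -> ) ENGINE = InnoDB;
mysql> INSERT INTO msg_per_day (day, cnt)
    ->   SELECT DATE(published), COUNT(*)
    ->   FROM message
    ->   GROUP BY DATE(published);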
With an intermediate table acting as a cache, we have to rebuild or refresh it periodically. The shadow table is a neat technique for this:

mysql> DROP TABLE IF EXISTS my_summary_new, my_summary_old;
mysql> CREATE TABLE my_summary_new LIKE my_summary;
mysql> -- ... populate my_summary_new ...
mysql> RENAME TABLE my_summary TO my_summary_old, my_summary_new TO my_summary;

A single RENAME swaps the shadow table in atomically, and the previous table is kept around in case we need to roll back.
4.5 Materialized Views
A materialized view is a view that is precomputed and actually stored on disk (an ordinary view stores nothing; its SQL is executed whenever the view is accessed). MySQL has no materialized views, but there is a good open-source implementation, the Flexviews tools. Its key trick: the row-based binary log contains the before and after images of each changed row, so Flexviews can learn what changed, and incrementally recompute the materialized view, without ever touching the source table. That is its advantage over a hand-maintained cache or summary table.
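For contrast, a minimal hand-maintained sketch (this is not Flexviews): a trigger keeps a hypothetical sales_per_region table current, but it taxes every write to the source table, which is exactly the overhead the binlog-based approach avoids:

mysql> -- assumes: sales(region, amount) and sales_per_region(region PRIMARY KEY, total)
mysql> CREATE TRIGGER sales_ai AFTER INSERT ON sales
    -> FOR EACH ROW
    ->   INSERT INTO sales_per_region (region, total)
    ->   VALUES (NEW.region, NEW.amount)
    ->   ON DUPLICATE KEY UPDATE total = total + NEW.amount;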
4.6 Counter Tables
A common problem in web applications is concurrent access to a counter table. The book proposes a solution that increases concurrency; the overall idea is to add more slots so that concurrent updates are spread out, much like the striped design of ConcurrentHashMap in Java's java.util.concurrent package.
The counter table and its access SQL can be reduced to:

mysql> CREATE TABLE hit_counter (cnt INT UNSIGNED NOT NULL) ENGINE = InnoDB;
mysql> UPDATE hit_counter SET cnt = cnt + 1;
With a single counter row, the table is in effect a global lock: every update is serialized on that row. So add a slot column when creating the table, and pre-populate it with 100 rows:

mysql> CREATE TABLE hit_counter (
    ->   slot TINYINT UNSIGNED NOT NULL PRIMARY KEY,
    ->   cnt INT UNSIGNED NOT NULL
    -> ) ENGINE = InnoDB;
Then change the update and query statements to:

mysql> UPDATE hit_counter SET cnt = cnt + 1 WHERE slot = RAND() * 100;
mysql> SELECT SUM(cnt) FROM hit_counter;
PS: If the counter must reset every day, add a date column (the primary key becomes (day, slot)):

mysql> CREATE TABLE daily_hit_counter (
    ->   day DATE NOT NULL,
    ->   slot TINYINT UNSIGNED NOT NULL,
    ->   cnt INT UNSIGNED NOT NULL,
    ->   PRIMARY KEY (day, slot)
    -> ) ENGINE = InnoDB;
PPS: To avoid inserting the initial rows each day, use:

mysql> INSERT INTO daily_hit_counter (day, slot, cnt)
    ->   VALUES (CURRENT_DATE, RAND() * 100, 1)
    ->   ON DUPLICATE KEY UPDATE cnt = cnt + 1;
PPPS: To reduce the number of counter rows and save space, run a periodic task that merges each day's records into slot 0, as sketched below.
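A sketch of the merge, along the lines of the book's example: fold each day's total into one row, relabel it slot 0, and delete the now-empty rows:

mysql> UPDATE daily_hit_counter AS c
    ->   INNER JOIN (
    ->     SELECT day, SUM(cnt) AS cnt, MIN(slot) AS mslot
    ->     FROM daily_hit_counter
    ->     GROUP BY day
    ->   ) AS x USING (day)
    ->   SET c.cnt  = IF(c.slot = x.mslot, x.cnt, 0),
    ->       c.slot = IF(c.slot = x.mslot, 0, c.slot);
mysql> DELETE FROM daily_hit_counter WHERE slot <> 0 AND cnt = 0;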

Chapter 5: Indexing for High Performance
5.1 B-Tree Family
When we talk about database indexes we usually mean B-tree indexes, and MySQL's CREATE TABLE and other statements use the BTREE keyword for them. The storage engine, however, may use a different underlying structure: NDB uses the T-tree (also mentioned in my introduction to in-memory databases; the T-tree suits memory-resident storage), while InnoDB uses the B+ tree.
To be precise, each engine uses some variant from the B-tree family. The variants differ in internal node fan-out (compare the memory-oriented T-tree with the disk-oriented B-tree), in how storage is used (B-tree versus B+ tree), and so on.
First, the biggest difference between the B-tree and structures such as the red-black tree or the AVL tree is that a B-tree node has many children. The design goal is to reduce disk I/O: with a high fan-out the tree is very shallow, so only a few disk pages must be loaded to find the target data. The difference between the B-tree and the B+ tree is that the internal nodes of a B+ tree store only keys, not data (i.e., no pointer to the row the key belongs to). Advantages of the B+ tree:
  • Because internal nodes hold no data, each disk page can fit more keys, which lowers the tree's height further and speeds up key lookups.
  • A full traversal (for example a range query, or a full scan of one column) only requires a linear walk along the B+ tree's leaf nodes, whereas a B-tree needs a full tree traversal. The linear walk also has better locality: adjacent entries sit next to each other instead of being split between a node's left and right subtrees, so it crosses page boundaries less often (see the sketch after the next list).
Advantages of the B-tree:
  • A B-tree lookup may terminate at an internal node, while a B+ tree lookup must always descend to a leaf.
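To tie this back to MySQL, a small sketch (hypothetical table; the expected EXPLAIN output is abridged): a range query on a secondary index descends the B+ tree once and then walks the leaf level linearly:

mysql> CREATE TABLE t (
    ->   id INT UNSIGNED NOT NULL PRIMARY KEY,
    ->   published DATETIME NOT NULL,
    ->   KEY (published)
    -> ) ENGINE = InnoDB;
mysql> EXPLAIN SELECT id FROM t
    ->   WHERE published >= '2012-01-01' AND published < '2012-02-01';
mysql> -- expected: type = range, key = published; rows are read in leaf order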

First, the classic worked example of a B-tree lookup. The original post walks through it with diagrams of a small B-tree of files stored on disk, simulating the search for file 29 (the figures are not reproduced here).

That search costs 3 disk I/O operations and 3 in-memory lookups. Within a loaded node the file names form an ordered list, so binary search keeps the in-memory part cheap; the disk I/Os are the deciding factor in overall B-tree search performance. If the same files were instead stored on disk as a balanced binary tree, the search would take 4, and up to 5, disk reads, and the more files there are, the fewer disk I/Os the B-tree performs relative to the binary tree. The original post closes with a diagram of the corresponding B+ tree.
