InnoDB's multi-versioning handling can be its Achilles' heel


Peter Zaitsev | December 17, 2014 | Posted in: InnoDB, Insight for DBAs, MySQL

I believe the InnoDB storage engine architecture is great for a lot of online workloads, but there are no silver bullets in technology and all design choices have their trade-offs. In this blog post I'm going to talk about one important InnoDB limitation that you should consider.

InnoDB is a multi-version concurrency control (MVCC) storage engine, which means many versions of a single row can exist at the same time. In fact, there can be a huge number of such row versions. Depending on the isolation level you have chosen, InnoDB might have to keep all row versions going back to the earliest active read view, but at the very least it has to keep all versions back to the start of the oldest currently running SELECT query. Under READ COMMITTED a read view is created at the start of each statement, while under REPEATABLE READ it is created at the start of the transaction; InnoDB then decides whether a record is visible based on the transaction IDs recorded in the read view and in the clustered index record. For more detail on how InnoDB determines visibility, refer to Ho Dengcheng's "InnoDB Multiple Versioning (MVCC) Implementation Profiling".
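The visibility rule described above can be sketched as a simplified model. This is not InnoDB's actual code, just an illustration of the idea: a read view remembers which transactions were active when it was created, and a row version is visible only if its creating transaction had already committed at that point.

```python
# Simplified model of an InnoDB read view (illustration only, not the
# real implementation). A read view records the transactions that were
# active when the view was created; a row version is visible only if
# its creating transaction had committed by then.

class ReadView:
    def __init__(self, active_ids, next_trx_id):
        self.active = set(active_ids)     # transactions running at view creation
        self.low_limit = next_trx_id      # ids >= this started after the view
        # ids below the smallest active id are guaranteed committed:
        self.up_limit = min(active_ids, default=next_trx_id)

    def is_visible(self, row_trx_id):
        if row_trx_id < self.up_limit:
            return True                   # committed before any active trx
        if row_trx_id >= self.low_limit:
            return False                  # started after the view was created
        return row_trx_id not in self.active  # visible unless still active

# Transactions 5 and 7 were running when the view was taken; next id is 9.
view = ReadView(active_ids=[5, 7], next_trx_id=9)
print(view.is_visible(3))   # True  - committed before the view
print(view.is_visible(5))   # False - was still active
print(view.is_visible(6))   # True  - committed before the view was taken
print(view.is_visible(9))   # False - started after the view
```

A REPEATABLE READ transaction builds one such view at its first read and keeps it, while READ COMMITTED builds a fresh one per statement, which is why the former can force InnoDB to retain much older row versions.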

In most cases this isn't a big deal. If you have many short transactions, you'll only have a few row versions to deal with. If you just use the system for reporting queries but don't modify data aggressively at the same time, you also won't have many row versions. However, if you mix heavy updates with slow reporting queries running at the same time, you can get into a lot of trouble.

Consider, for example, an application with a hot row (something like an actively updated counter) that gets 1,000 updates per second, together with some heavy batch job that takes 1,000 seconds to run. In such a case we'll have one million row versions of that row to deal with.
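The arithmetic behind that number is simple: versions accumulate at the update rate for as long as the oldest read view stays open.

```python
# Back-of-envelope calculation for the scenario above. The update rate
# comes from the article; the batch duration is the assumed runtime of
# the long-running query that pins the old read view.
updates_per_second = 1000
batch_job_seconds = 1000

versions_retained = updates_per_second * batch_job_seconds
print(versions_retained)  # 1000000 row versions InnoDB must keep around
```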

Let's now talk about how those old row versions are stored in InnoDB. They are stored in the undo space as what is essentially a linked list, where each row version points to the previous row version, together with transaction visibility information (transaction IDs) that helps decide which version is visible to a given query. Such a design favors short new queries, which typically need one of the newer versions and so don't have to go far down this linked list. That might not be the case with reporting queries, which may need to read rather old row versions corresponding to the time the query started, or with logical backups that use consistent reads (think mysqldump or mydumper), which often need to access such very old row versions. Those queries can be costly, since the chain must be walked until a visible version is found, and if the undo pages being accessed are not in memory the IO cost is significant.
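The chain walk can be sketched as follows. This is a toy model, not InnoDB internals: each version carries the ID of the transaction that wrote it and a pointer to the previous version, and a read has to walk backwards until it finds a version its read view can see.

```python
# Toy model of walking an undo chain (not real InnoDB code). The newest
# version lives in the table page; older versions are reached by
# following "roll pointers" into the undo space, modeled here as a
# simple linked list of dicts.

def read_row(newest, is_visible):
    """Walk back from the newest version until one is visible; count hops."""
    hops = 0
    version = newest
    while version is not None:
        if is_visible(version["trx_id"]):
            return version["value"], hops
        version = version["prev"]      # follow the roll pointer into undo
        hops += 1
    return None, hops                  # row did not exist for this view

# Build a chain of 5 versions: trx 10 wrote the newest, trx 6 the oldest.
chain = None
for trx_id in range(6, 11):
    chain = {"trx_id": trx_id, "value": f"v{trx_id}", "prev": chain}

# An old read view that can only see transactions below 8 must walk
# past three newer versions before it finds one it is allowed to read:
value, hops = read_row(chain, lambda trx_id: trx_id < 8)
print(value, hops)   # v7 3
```

With a hot counter row and a long-running query, that hop count is not 3 but potentially a million, which is exactly the scenario described above.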

So going through the linked list of versions is expensive, but how expensive can it get? A lot depends on whether the undo space fits in memory, in which case the list can be traversed efficiently, or whether it does not, in which case you might be looking at massive disk IO. Keep in mind that the undo space is not clustered by primary key, as normal data in InnoDB tables is, so if you're updating multiple rows at the same time (the typical case), the row-version chains end up stored across many pages, often with as little as one row version per page, requiring either massive IO or a large number of undo space pages to be present in the InnoDB buffer pool.

Where it can get even worse is index scans. This is because indexes in InnoDB are structured to include entries for all row versions corresponding to a key value, current and past. This means, for example, that the index entry for key=5 will contain pointers to all rows that either have the value 5 now, or had the value 5 at some point in the past and have not been purged yet. Where this can really bite is the following: InnoDB needs to know which of the values stored for the key are visible to the current transaction, and that might mean going through the entire long version chain for each of the keys.
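The effect on index lookups can be sketched like this. Again a toy model under stated assumptions: stale index entries survive until purge, so a lookup must visibility-check every pointer it finds, not just the ones whose rows still carry that key value.

```python
# Toy model (not InnoDB internals): a secondary index entry is not
# removed immediately when the row is updated - entries for old key
# values linger until the purge thread cleans them up. A lookup on
# key=5 therefore sees pointers to rows whose current value may no
# longer be 5, and must check each one.

index = {
    5: ["row1", "row2", "row3"],   # row2/row3 once had k=5, since updated
}
committed_value = {"row1": 5, "row2": 9, "row3": 7}

def lookup(key):
    """Return rows whose visible value really is `key`."""
    matches = []
    for row in index.get(key, []):
        # In InnoDB this check may require walking the row's whole undo
        # chain; here we just consult the committed value directly.
        if committed_value[row] == key:
            matches.append(row)
    return matches

print(lookup(5))   # ['row1'] - two stale entries examined and rejected
```

With a hot row, one key value can accumulate a huge number of such stale entries, and every one of them may trigger a version-chain walk.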

This is all theory, so let's simulate such a workload and see how bad things can really get in practice.

I have created a one-billion-row sysbench table, which takes some 270GB of space, and I will use a small buffer pool of 6GB. I'll run sysbench with a Pareto distribution (hot rows) while concurrently running a full table scan query: select avg(k) from sbtest1. This is the exact sysbench run, done after the prepare step:

sysbench --num-threads=<N> --report-interval=<seconds> --max-time=0 --max-requests=0 \
  --rand-type=pareto --oltp-table-size=1000000000 \
  --mysql-user=root --mysql-password=password \
  --test=/usr/share/doc/sysbench/tests/db/oltp.lua run

This is the EXPLAIN for the "reporting" query, which you would expect to be a rather efficient index scan query. With just 4 bytes per value, 1 billion values would be just 4GB (really more because of InnoDB overhead), not a big deal for modern systems:

mysql> explain select avg(k) from sbtest1 \G
