On MySQL's two-phase commit protocol

Source: Internet
Author: User
Tags prepare

Turn from:

Http://www.cnblogs.com/hustcat/p/3577584.html

5.3.1 Transaction Submission Process

The transaction commit logic for MySQL is mainly done in the function Ha_commit_trans. The commit of a transaction involves the transaction commit of the Binlog and the specific storage engine. So MySQL uses 2PC to ensure the integrity of the transaction. The 2PC process for MySQL is as follows:

(1) First call Binglog_hton and Innobase_hton prepare method to complete the first stage, Binlog_hton Papare method actually did nothing, InnoDB prepare the transaction state is set to Trx_ PREPARED, and will redo log brush disk (Innobase_xa_prepareàtrx_prepare_for_mysqlàtrx_prepare_off_kernel).

(2) If the prepare of all the storage engines involved in the transaction are successful, call Tc_log_binlog::log_xid to write the SQL statement to BINLOG, at which point the transaction is already committed. Otherwise, call Ha_rollback_trans to roll back the transaction, and the SQL statement will not actually write to Binlog.

(3) Finally, the commit of the calling engine completes the commit of the transaction. In fact Binlog_hton->commit does nothing (because (2) the Binlog is written to disk), Innobase_hton->commit clears the undo message, brushes the redo log, and sets the transaction to Trx_not_ Started status (Innobase_commitàinnobase_commit_lowàtrx_commit_for_mysqlàtrx_commit_off_kernel).

ha_innodb.cc

Static

Int

Innobase_commit (

/*============*/

/* out:0 */

thd* THD,/* in:mysql thread handle of the user for whom

The transaction should be committed */

BOOL All)/* In:true-commit transaction

False-the current SQL statement ended */

{

...

Trx->mysql_log_file_name = Mysql_bin_log.get_log_fname ();

Trx->mysql_log_offset =

(Ib_longlong) mysql_bin_log.get_log_file ()->pos_in_file;

...

}

The function Innobase_commit commits the transaction, first obtains the current Binlog position, and then writes the transaction system page (Trx_commit_off_kernelàtrx_sys_update_mysql_binlog_offset).

InnoDB log the location of MySQL Binlog to the TRX System header:

Trx0sys.h

/* The offset of the MySQL Binlog offset info in the TRX system header */

#define TRX_SYS_MYSQL_LOG_INFO (univ_page_size-1000)

#define TRX_SYS_MYSQL_LOG_MAGIC_N_FLD 0/* MAGIC number which shows

If we have valid data in the

MySQL Binlog Info; The value

Is ... _magic_n if yes */

#define TRX_SYS_MYSQL_LOG_OFFSET_HIGH 4/* High 4 bytes of the OFFSET

Within that file */

#define TRX_SYS_MYSQL_LOG_OFFSET_LOW 8/* Low 4 bytes of the OFFSET

Within that file */

#define TRX_SYS_MYSQL_LOG_NAME/* MYSQL LOG file NAME */

5.3.2 Transaction Recovery Process

InnoDB at the time of recovery, different state transactions will be handled differently (see trx_rollback_or_clean_all_without_sess function):

<1> for trx_committed_in_memory transactions, clear the rollback segment, and then set the transaction to trx_not_started;

<2> for trx_not_started transactions, indicates that the transaction has been committed, skipped;

<3> for trx_prepared Affairs, according to Binlog to determine the fate of the transaction, temporarily skipped;

<4> rollback for trx_active transactions.

When MySQL opens BINLOG, it checks the status of BINLOG (Tc_log_binlog::open). If the BINLOG does not shut down normally (Log_event_binlog_in_use_f is 1), then the basic flow of the recovery operation is as follows:

<1> scans the Binlog, reads the xid_event transaction, and gets the list of all the XA transactions that have been committed (in fact, the transaction may be in prepare or commit in InnoDB);

<2> for each XA transaction, call Handlerton::recover to check if the storage engine exists in the prepare state of the transaction (see Innobase_xa_recover), which is to check the status of the XA transaction in the storage engine;

<3> If there is an XA transaction in the prepare state, call Handlerton::commit_by_xid to commit the transaction;

<4> Otherwise, call Handlerton::rollback_by_xid to roll back the XA transaction.

5.3.3 several parameters discussion

(1) Sync_binlog

MySQL invokes Mysql_log::write to complete the write Binlog when committing the transaction, and decides whether to brush the disk according to Sync_binlog. The default value is 0, which is to not brush the disk, thereby giving control to the OS. If set to 1, a brush disk is performed each time a transaction is committed, which has an impact on performance (5.6 already supports Binlog group), so many people set it to 100.

BOOL Mysql_log::flush_and_sync ()

{

int err=0, fd=log_file.file;

Safe_mutex_assert_owner (&lock_log);

if (Flush_io_cache (&log_file))

return 1;

if (++sync_binlog_counter >= sync_binlog_period && sync_binlog_period)

{

Sync_binlog_counter= 0;

Err=my_sync (FD, MYF (MY_WME));

}

return err;

}

(2) Innodb_flush_log_at_trx_commit

This parameter controls the behavior of InnoDB to brush redo log when committing a transaction. The default value is 1, which is the brush disk operation each time a transaction is committed. In order to reduce the impact on performance, many production environments are set to 2, or even 0.

If The value of Innodb_flush_log_at_trx_commit is 0, the log buffer was written out to the log file once per second and The flush to disk operation was performed on the log file and nothing was done at a transaction commit. When the value was 1 (the default), the log buffer is written out to the log file at each transaction commit and the flush To disk operation was performed on the log file. When the value was 2, the log buffer is written the-the file at each commit, but the flush to disk operation is not perf Ormed on it. However, the flushing on the log file takes place once per second also when the value is 2. Note that the Once-per-second flushing isn't 100% guaranteed to happen every second, due to process scheduling issues.

The default value of 1 is required to full ACID compliance. You can achieve better performance by setting the value different from 1, and then you can lose up to one second worth of Transactions in a crash. With a value of 0, any mysqld process crash can erase the last second of transactions. With a value of 2, only the operating system crash or a power outage can erase the last second of transactions.

(3) Innodb_support_xa

The 2PC that controls whether the InnoDB supports XA transactions, which is true by default. If closed, InnoDB does nothing at the prepare stage, which may cause binlog order to be inconsistent with the order in which the InnoDB is submitted (for example, a transaction writes Binlog before the B transaction, but within InnoDB it is possible for a transaction to be committed after the B transaction). This causes different data to be generated in the recovery or slave.

Int

Innobase_xa_prepare (

/*================*/

/* out:0 or error number */

thd* THD,/* In:handle to the MySQL thread of the user

Whose XA transaction should be prepared */

bool All)/* In:true-commit transaction

False-the current SQL statement ended */

{

...

if (!THD->VARIABLES.INNODB_SUPPORT_XA) {

return (0);

}

5.3.4 Security/Performance Discussion

Different values for the above 3 parameters will result in different effects. All three are set to 1 (true) for the data to be truly secure. Sync_binlog is not 1, which can cause Binlog to be lost (OS hangs), thus inconsistent with data at the InnoDB level. Innodb_flush_log_at_trx_commit non-1 may cause data loss at the InnoDB level (OS hangs), thus inconsistent with binlog.

For performance analysis, you can refer to

http://www.mysqlperformanceblog.com/2011/03/02/what-is-innodb_support_xa/

http://www.mysqlperformanceblog.com/2009/01/21/beware-ext3-and-sync-binlog-do-not-play-well-together/

On MySQL's two-phase commit protocol

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.