1. Why is it best for an InnoDB table to have an auto-increment primary key?
An InnoDB table is an index-organized table (IOT) built on a B+ tree.
About B + Trees
(Image from the Internet)
Features of B+ trees:
a. All keys appear in the leaf nodes, which are linked into an ordered list (dense index); the keys in that list are in exact sorted order.
b. A search can never terminate at a non-leaf node.
c. The non-leaf nodes serve only as an index (sparse index) over the leaf nodes; the leaf nodes form the data layer that stores the keys and their data.
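The lookup behavior described above can be sketched in a few lines. This is a toy model for illustration only, not InnoDB's actual implementation: a sparse index over sorted leaf pages routes every search down to a leaf, where the dense, ordered key list decides hit or miss.

```python
import bisect

# Toy model: leaf level = dense, sorted pages of (key, row) pairs;
# non-leaf level = sparse index holding only the smallest key of each page.
leaf_pages = [
    [(1, "row1"), (3, "row3")],
    [(5, "row5"), (7, "row7")],
    [(9, "row9"), (11, "row11")],
]
sparse_index = [page[0][0] for page in leaf_pages]  # [1, 5, 9]

def lookup(key):
    # Step 1: the sparse (non-leaf) level only routes us to a leaf page;
    # it can never "hit" a record by itself (feature b above).
    page_no = bisect.bisect_right(sparse_index, key) - 1
    if page_no < 0:
        return None
    # Step 2: the search always ends in a leaf page (dense level),
    # whether the key is present or not.
    for k, row in leaf_pages[page_no]:
        if k == key:
            return row
    return None

print(lookup(7))   # -> row7
print(lookup(8))   # -> None (even a miss is decided at the leaf level)
```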
1. If we explicitly define a primary key (PRIMARY KEY), InnoDB uses it as the clustered index. If no primary key is defined, InnoDB picks the first UNIQUE index whose columns are all NOT NULL and uses it as the clustered index. If no such unique index exists either, InnoDB generates a hidden 6-byte ROWID as the clustered index; this ROWID increments as rows are written, just like an auto-increment primary key, but unlike Oracle's ROWID it cannot be referenced in queries (it is implicit).
2. The data records themselves are stored in the leaf nodes of the primary index (one B+ tree). This means the records in each leaf node (which is the size of one memory/disk page) must be kept in primary-key order, so whenever a new record is inserted, MySQL places it into the appropriate node at the appropriate position according to its primary key; once a page reaches its fill factor (15/16 by default in InnoDB), a new page (node) is opened.
3. If the table uses an auto-increment primary key, each new record is simply appended after the last entry of the current index node, and when a page fills up, a new page is opened automatically.
4. If the table uses a non-auto-increment primary key (such as a national ID number or student number), each inserted key value is roughly random, so each new record has to be inserted somewhere in the middle of an existing index page. MySQL must then move existing data to make room at the target position; the target page may even have already been flushed to disk and evicted from the cache, so it has to be read back from disk first. All of this adds significant overhead, and the frequent moves and page splits cause heavy fragmentation and a less compact index structure, eventually requiring OPTIMIZE TABLE to rebuild the table and refill the pages.
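The cost difference between sequential and random primary keys can be illustrated with a toy page model (hypothetical and greatly simplified; real InnoDB pages are 16 KB and the split logic is more sophisticated). Sequential keys always land at the tail of the last page; random keys keep landing mid-page, forcing existing entries to shift:

```python
import random

PAGE_CAPACITY = 4  # toy page size

def insert_all(keys):
    """Insert keys into sorted fixed-size pages; count mid-page inserts and splits."""
    pages = [[]]
    mid_inserts = 0   # inserts that had to shift existing data
    splits = 0
    for key in keys:
        # Target page: the last page whose first key <= key.
        i = 0
        for j, page in enumerate(pages):
            if page and page[0] <= key:
                i = j
        page = pages[i]
        pos = 0
        while pos < len(page) and page[pos] < key:
            pos += 1
        if pos < len(page):
            mid_inserts += 1          # not appended at the tail: rows must move
        page.insert(pos, key)
        if len(page) > PAGE_CAPACITY:  # page full: split it in two
            splits += 1
            half = len(page) // 2
            pages[i:i + 1] = [page[:half], page[half:]]
    return mid_inserts, splits

random.seed(42)  # fixed seed so the run is reproducible
sequential = list(range(100))            # auto-increment style keys
randomized = random.sample(range(100), 100)  # random (UUID-like) keys

print(insert_all(sequential))  # mid_inserts is 0: always appended at the tail
print(insert_all(randomized))  # many mid-page inserts -> data movement
```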
Summary: an InnoDB table is accessed most efficiently when the data write order matches the leaf-node order of the B+ tree index, which happens in the following cases:
a. The table uses an auto-increment (INT/BIGINT) primary key: the write order is ascending and matches the order in which B+ tree leaf nodes split.
b. The table specifies no primary key and has no unique index eligible to serve as one (per the conditions above): InnoDB falls back to the hidden ROWID, and the write order matches the ROWID growth order.
c. Conversely, if an InnoDB table defines no explicit primary key but does have a unique index eligible to serve as one, and that index is not monotonically increasing (e.g. strings, UUIDs, or multi-column composite unique indexes), the table's access efficiency will be poor.
The wording above is quoted from High Performance MySQL.
Citation Link: https://segmentfault.com/q/1010000003856705
2. Why is "double 1" needed to guarantee master-slave data consistency?
"Double 1" means innodb_flush_log_at_trx_commit=1 together with sync_binlog=1.
sync_binlog=N: after every N transaction commits, MySQL issues a disk-synchronization call such as fsync() to force the contents of the binlog cache to disk. With sync_binlog=0, MySQL issues no forced flush at all; performance is best but the risk is greatest, because if the system crashes, all binlog events still sitting in the binlog cache are lost.
innodb_flush_log_at_trx_commit=1 means the log is written and flushed to disk on every transaction commit; this is the safest setting, but it costs time unless a battery-backed write cache is used.
innodb_flush_log_at_trx_commit=2 means the log is written only to the OS cache at commit, not flushed to disk; it is still flushed to disk about once per second, so normally no more than 1-2 seconds of updates are lost, and data is lost only if the whole system goes down.
innodb_flush_log_at_trx_commit=0 is faster but even less safe: the log buffer is written and flushed roughly once per second, so recent transactions can be lost even if only the MySQL process crashes.
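For reference, the "double 1" settings can be written into my.cnf. This is a minimal sketch; the file location and any other options vary by installation:

```ini
# my.cnf -- "double 1" durability settings (sketch; adapt to your install)
[mysqld]
# flush and fsync the InnoDB redo log on every transaction commit
innodb_flush_log_at_trx_commit = 1
# fsync the binary log after every transaction commit
sync_binlog = 1
```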
3. How many binlog formats are there, and what are the differences?
ROW, STATEMENT, and MIXED (= ROW + STATEMENT).
1. ROW
The log records how each individual row of data was modified, and the slave then applies the same modification to the same rows.
Advantage: in ROW mode, the binlog does not need to record the execution context of the SQL statement; it only needs to record which rows were modified and how. The log therefore captures the details of every row change very clearly and is easy to understand, and it avoids the cases where calls to stored procedures, functions, or triggers cannot be replicated correctly.
Disadvantage: in ROW mode, every executed statement is logged as the modification of each affected row, which can produce a very large volume of log content.
2. STATEMENT
Every SQL statement that modifies data is recorded in the master's binlog. During replication, the slave's SQL thread parses and re-executes the same SQL that was originally run on the master.
Advantages: STATEMENT mode avoids the drawback of ROW mode: it does not need to record every row change, which reduces binlog volume, saves I/O and storage resources, and improves performance, because it only records the statements executed on the master plus the context in which they ran.
Disadvantages: because STATEMENT mode records the executed statements, it must also record some information about each statement's execution context so that every statement produces the same result on the slave as it did on the master. In addition, MySQL is evolving quickly and new features keep being added, which poses considerable challenges for replication: the more complex the content being replicated, the more bugs tend to appear. In STATEMENT mode a number of cases are already known to break replication, mainly particular functions or features used while modifying data. For example, the SLEEP() function cannot be replicated correctly in some versions, and using LAST_INSERT_ID() inside a stored procedure can produce inconsistent IDs on master and slave. Because ROW mode records changes row by row, it does not suffer from these problems.
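A few examples of statements that are unsafe under STATEMENT-format binlog (a sketch against a hypothetical table `t(id INT, token CHAR(36), score INT)`; recent MySQL versions emit an "Unsafe statement written to the binary log" warning for such cases):

```sql
-- UUID() is non-deterministic: the slave would generate different values.
UPDATE t SET token = UUID() WHERE score > 0;

-- RAND() evaluates differently on the slave.
DELETE FROM t ORDER BY RAND() LIMIT 10;

-- LIMIT without ORDER BY: which 10 rows get updated is undefined.
UPDATE t SET score = 0 LIMIT 10;
```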
3. MIXED
As the official documentation explains, MySQL supported only statement-based replication until version 5.1.5, which introduced row-based replication. Starting with 5.0, MySQL replication fixed many cases that older versions did not replicate correctly, but the arrival of stored procedures brought new challenges. The documentation also states that from version 5.1.8 MySQL provides a third replication mode besides STATEMENT and ROW: MIXED, which is effectively a combination of the first two. In MIXED mode, MySQL decides for each specific SQL statement which log form to use, choosing between STATEMENT and ROW. STATEMENT logging in the newer versions behaves as before, recording only the statements executed. ROW mode has also been optimized: not every change is logged as row events; for example, statements that change the table structure are logged in STATEMENT form, while DML statements that actually modify data, such as UPDATE or DELETE, are logged as the changes to all affected rows.
Attention:
Condition 1: when binlog_format is set to MIXED, normal replication works fine, but in special cases cascading replication loses binlog events.
Condition 2: a large amount of data (around 4 million rows) is scanned by UPDATE, DELETE, INSERT, or other non-deterministic DML statements (for example: DELETE FROM table WHERE data < 'N' LIMIT).
When condition 1 and condition 2 are met at the same time, master-slave replication loses data; the only remedy is to set binlog_format=ROW.
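Checking and switching the format can be done as follows (a sketch; SET GLOBAL requires the SUPER or SYSTEM_VARIABLES_ADMIN privilege, affects only new sessions, and a persistent change still belongs in my.cnf):

```sql
-- Inspect the current binlog format
SHOW VARIABLES LIKE 'binlog_format';

-- Switch the server to row-based logging
SET GLOBAL binlog_format = 'ROW';
```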
Reference: http://tshare365.com/archives/2054.html
Several MySQL interview questions