Java Interview 05| MySQL and InnoDB engine

Source: Internet
Author: User
Tags rollback

1. InnoDB Engine Index

InnoDB supports the following kinds of indexes:

(1) Hash index (the adaptive hash index, built and maintained internally by the engine)

(2) Full-text index

(3) B+ tree index, which can be divided into the clustered index and secondary indexes

An index can be created in the CREATE TABLE statement, or added to an existing table with a separate CREATE INDEX or ALTER TABLE statement. An index can be deleted with the ALTER TABLE or DROP INDEX statement.

(1) Create an index using the ALTER TABLE statement.
The syntax is as follows:

ALTER TABLE table_name ADD INDEX index_name (column_list);
ALTER TABLE table_name ADD UNIQUE (column_list);
ALTER TABLE table_name ADD PRIMARY KEY (column_list);

These three forms create a normal index, a unique index, and a primary key index respectively. table_name is the table to add the index to, and column_list names the columns to index, separated by commas. The index name index_name is optional; by default, MySQL assigns a name based on the first indexed column. In addition, ALTER TABLE allows several changes to one table in a single statement, so you can create multiple indexes at the same time.

An example of creating an index is as follows:

mysql> ALTER TABLE table_test ADD INDEX index_test1 (name);
Query OK, 2 rows affected (0.08 sec)

(2) Use the CREATE INDEX statement to add an index to the table.

CREATE INDEX can add both normal and unique indexes. The format is as follows:

CREATE INDEX index_name ON table_name (column_list);
CREATE UNIQUE INDEX index_name ON table_name (column_list);

An example of creating an index is as follows:

mysql> CREATE INDEX index_test2 ON table_test (age);
Query OK, 2 rows affected (0.08 sec)

Note: table_name, index_name, and column_list have the same meaning as in the ALTER TABLE statement, except that the index name is required here. In addition, a primary key index cannot be created with the CREATE INDEX statement.

(3) Deleting an index

An index can be deleted with the ALTER TABLE or DROP INDEX statement. DROP INDEX is processed internally as an ALTER TABLE statement. The formats are as follows:

DROP INDEX index_name ON table_name;
ALTER TABLE table_name DROP INDEX index_name;
ALTER TABLE table_name DROP PRIMARY KEY;

The first two statements are equivalent and remove the index index_name from table_name. The last statement is used only to delete the primary key index; because a table can have only one primary key, no index name is needed. Very old MySQL versions were documented to drop the first unique index when the table had no primary key but one or more unique indexes; current versions report an error in that case.

If a column is dropped from the table, its indexes are affected. For a multi-column index, dropping one of the columns also removes that column from the index. If all the columns that make up the index are dropped, the entire index is deleted.
An example of deleting an index:

mysql> DROP INDEX name ON table_test;
Query OK, 2 rows affected (0.08 sec)


2. Several principles for building indexes

1. The leftmost prefix matching principle. This is a very important principle: MySQL keeps matching index columns from left to right until it encounters a range condition (>, <, BETWEEN, LIKE), where matching stops. For example, for a = 1 AND b = 2 AND c > 3 AND d = 4, an index built in the order (a, b, c, d) cannot use d, whereas an index on (a, b, d, c) can use all four columns. The order of a, b, and d within the index can be adjusted freely.
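The rule above can be sketched as a small Python function. This is only a toy model of the matching rule as the text states it, not of the MySQL optimizer; the function name and the "eq"/"range" labels are made up for illustration.

```python
def usable_index_prefix(index_columns, predicates):
    """Return the index columns a lookup can use under the leftmost-prefix rule.

    predicates maps a column name to "eq" (equality) or "range" (>, <, BETWEEN, LIKE).
    """
    used = []
    for column in index_columns:
        kind = predicates.get(column)
        if kind is None:
            break  # gap in the prefix: later index columns cannot be used
        used.append(column)
        if kind == "range":
            break  # the range column itself is used, but matching stops after it
    return used


# WHERE a = 1 AND b = 2 AND c > 3 AND d = 4
where = {"a": "eq", "b": "eq", "c": "range", "d": "eq"}

print(usable_index_prefix(["a", "b", "c", "d"], where))  # ['a', 'b', 'c']
print(usable_index_prefix(["a", "b", "d", "c"], where))  # ['a', 'b', 'd', 'c']
```

With the index (a, b, c, d) the range condition on c ends the match, so d is left out; moving c to the end lets all four columns take part.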

2. Try to choose highly selective columns for indexes. Selectivity is calculated as count(distinct col)/count(*), the proportion of distinct values in the column. The larger the ratio, the fewer records have to be scanned. The selectivity of a unique key is 1, while status or gender fields may have a selectivity close to 0 on large data sets. One might ask what the empirical threshold for this ratio is. It depends on the scenario and is hard to fix, but for fields used in joins we generally require more than 0.1, that is, on average 10 records scanned per lookup.
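As a rough illustration of the formula, the sketch below runs the count(distinct col)/count(*) query from Python. It uses an in-memory SQLite database purely as a stand-in for MySQL, and the users table, its columns, and the sample data are invented for the example.

```python
import sqlite3


def selectivity(conn, table, column):
    # Identifiers are interpolated directly, so only pass trusted names.
    sql = f"SELECT COUNT(DISTINCT {column}) * 1.0 / COUNT(*) FROM {table}"
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, gender TEXT, city TEXT)")
rows = [(i, "M" if i % 2 else "F", f"city_{i % 25}") for i in range(1, 101)]
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)

for column in ("id", "gender", "city"):
    print(column, selectivity(conn, "users", column))
```

On this sample data the primary key has selectivity 1, gender falls far below the 0.1 guideline, and city sits above it, so city would be the more reasonable index candidate of the two.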

3. Index columns must not take part in calculations; keep the column "clean". For example, from_unixtime(create_time) = '2014-05-29' cannot use the index. The reason is simple: the B+ tree stores the raw field values from the table, so to evaluate the condition the function would have to be applied to every stored value before comparing, which is obviously too expensive. The statement should instead be written as create_time = unix_timestamp('2014-05-29');
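The cost difference can be sketched in Python with a sorted list standing in for the index. The helper names below only mimic FROM_UNIXTIME and UNIX_TIMESTAMP, UTC is assumed to keep the result deterministic, and the data is invented; it is a toy model, not how InnoDB evaluates the predicate.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"


def unix_timestamp(text):
    # Stand-in for UNIX_TIMESTAMP(), assuming UTC.
    return int(datetime.strptime(text, FMT).replace(tzinfo=timezone.utc).timestamp())


def from_unixtime(epoch):
    # Stand-in for FROM_UNIXTIME(), assuming UTC.
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime(FMT)


midnight = unix_timestamp("2014-05-29 00:00:00")
# Sorted raw create_time values play the role of the index.
create_time_index = sorted(midnight + d for d in (-86400, -60, 0, 0, 60, 86400))


def scan_with_function(target):
    # from_unixtime(create_time) = target: the function runs on every entry.
    calls, hits = 0, []
    for value in create_time_index:
        calls += 1
        if from_unixtime(value) == target:
            hits.append(value)
    return hits, calls


def seek_with_constant(target):
    # create_time = unix_timestamp(target): one conversion, then a binary search.
    key = unix_timestamp(target)
    lo = bisect_left(create_time_index, key)
    hi = bisect_right(create_time_index, key)
    return create_time_index[lo:hi]


target = "2014-05-29 00:00:00"
hits, calls = scan_with_function(target)
print(hits == seek_with_constant(target), calls)
```

Both lookups return the same rows, but the first one has to call the conversion once per stored value, while the second converts the constant once and then searches the sorted values directly.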

3. Several optimizations of MySQL

See chapter 11, "Database performance optimization", of "Building a High-Performance Web Site".

(1) Building an index

Design indexes based on the estimated data volume and the queries. When choosing indexes, pay attention to the following:

A. Decide which tables need indexes based on the data volume; a table with little data may need only the primary key.

B. Decide which fields to index based on how often they are used; fields that frequently serve as join conditions, filter conditions, in aggregate queries, or for sorting are candidates for indexing.

C. Combine fields that often appear together into a composite index. As with the primary key, the field order of a composite index matters: put the most frequently used fields and the fields with few repeated values first (note the order of the index).

D. Do not add too many indexes to one table, because indexes slow down inserts and updates: every index may have to be updated when a row is inserted or updated.

(2) Redundant table fields (denormalization)

(3) Table design: vertical and horizontal table splitting, vertical and horizontal database splitting

Vertical splitting of a table

You can refer to the article: master database vertical splitting in one minute Http://

As requirements grow, a table gains more and more columns. To control the width of the table, it can be split vertically.

Reason: the database stores data in pages. The wider the table, the larger each row and the fewer rows fit on one page. Splitting it into several narrow tables, none of which contains very long rows, improves I/O efficiency.

Principles for splitting a table vertically:

- Put frequently queried columns together in one small table to reduce joins
- Split TEXT and BLOB columns off into an attached table

Horizontal splitting of a table

When the original large table holds billions of rows, the amount of data per table has to be reduced. To control the size of the table, it can be split horizontally.

So how do you distribute the data of one large table across several small tables? One option is hashing:

Each table has a primary key. By hashing the primary key, for example taking the primary key value modulo the number of tables, the large table is distributed evenly across several small tables, which solves the problem of too much data in one table.
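A minimal Python sketch of the modulo routing described above. The table name orders, the shard count of 4, and the function name are assumptions chosen for the example.

```python
def shard_table(order_id, shard_count=4, base_name="orders"):
    # Route a row to one of the small tables by primary key modulo the table count.
    return f"{base_name}_{order_id % shard_count}"


counts = {}
for order_id in range(1, 1001):
    name = shard_table(order_id)
    counts[name] = counts.get(name, 0) + 1

print(shard_table(10))
print(counts)
```

With sequential primary keys the rows spread evenly over the four tables. Note that changing shard_count later moves most keys to a different table, which is the usual trade-off of plain modulo hashing.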

3. Common MySQL functions

CONVERT(): converts a value to another type or character set

CAST(): converts a value to another type

TRUNCATE(): truncates decimals

ROUND(): rounds

LOWER()/UPPER(): converts the argument to lower or upper case

LENGTH(): returns the length of the argument (in bytes)

CONCAT(parameter 1, parameter 2): concatenates parameter 1 and parameter 2

FLOOR(parameter): returns the largest integer less than or equal to the parameter

CEIL(parameter): returns the smallest integer greater than or equal to the parameter

ABS(parameter): returns the absolute value of the parameter

MOD(parameter 1, parameter 2): returns the remainder of parameter 1 divided by parameter 2

SUBSTR(x, start, [length]): takes a substring

IF(): returns one of two values depending on a condition

IFNULL(): returns a fallback value when the first argument is NULL

DATE_FORMAT(): formats a date according to a format string

The aggregate functions are:

COUNT(): counts the records for the field

MIN()/MAX(): returns the minimum or maximum value of the field

SUM(): returns the sum of the field

AVG(): returns the average

GROUP_CONCAT(): concatenates the values of each group into one string
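To see a few of these functions in action, the sketch below runs them from Python against an in-memory SQLite database. SQLite is only a stand-in here: it happens to share these particular function names with MySQL, but functions such as TRUNCATE() or DATE_FORMAT() are MySQL-specific and are left out. The emp table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Scalar functions that behave the same way for these ASCII inputs.
scalar = conn.execute(
    "SELECT ROUND(3.14159, 2), UPPER('mysql'), LOWER('InnoDB'), LENGTH('InnoDB'), "
    "ABS(-7), SUBSTR('database', 1, 4), IFNULL(NULL, 'n/a')"
).fetchone()
print(scalar)

# Aggregate functions, grouped by department.
conn.execute("CREATE TABLE emp (dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [("dev", 10), ("dev", 20), ("ops", 30)])
grouped = conn.execute(
    "SELECT dept, COUNT(*), MIN(salary), MAX(salary), SUM(salary), AVG(salary), "
    "GROUP_CONCAT(salary) FROM emp GROUP BY dept ORDER BY dept"
).fetchall()
for row in grouped:
    print(row)
```

Each output row of the second query carries one value per aggregate, and GROUP_CONCAT collapses the salaries of a department into a single comma-separated string.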

4. How are inserting and deleting a row executed at the bottom layer of MySQL?

InnoDB locks are placed on indexes: row-level locks are index-based. InnoDB locks only the matching rows when the data is retrieved through an index condition; otherwise it has to scan and lock every row, which in effect behaves like a table lock!

The main points involved in the analysis are:

(1) How the WHERE condition is split

(2) The combination of indexes and the transaction isolation level


(1) MySQL transaction and lock InnoDB


5. How are MySQL transactions implemented?

Transactions have the ACID properties, so the question is how each of these properties is guaranteed.

(1) Isolation is guaranteed by locks. Without isolation, a transaction could see the results of other transactions while it is running, for example phantom reads. Locks are the mechanism that provides isolation, and the transaction isolation levels are implemented through the locking mechanism.

(2) Consistency is guaranteed by the undo log, which is used for transaction rollback and for MVCC.

(3) Atomicity and durability are guaranteed by the redo log. When a transaction commits, all of its log records must be written to the redo log file and persisted.
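The rollback role of the undo log can be illustrated with a toy Python model: before each change the old value is recorded, and a rollback replays those records in reverse. The class and method names are invented, and InnoDB's real undo log is far more involved; this only shows the idea.

```python
_MISSING = object()  # marks "the key did not exist before this write"


class ToyTransaction:
    def __init__(self, table):
        self.table = table    # dict: primary key -> value
        self.undo_log = []    # (key, previous value) records, oldest first

    def write(self, key, value):
        # Record the old value before changing the row.
        self.undo_log.append((key, self.table.get(key, _MISSING)))
        self.table[key] = value

    def rollback(self):
        # Undo the changes in reverse order.
        while self.undo_log:
            key, old = self.undo_log.pop()
            if old is _MISSING:
                self.table.pop(key, None)
            else:
                self.table[key] = old

    def commit(self):
        # After commit the undo records are no longer needed for rollback.
        self.undo_log.clear()


tx_table = {1: 1, 2: 2, 3: 3}
txn = ToyTransaction(tx_table)
txn.write(1, 10)
txn.write(4, 4)
during = dict(tx_table)
txn.rollback()
print(during, tx_table)
```

After the rollback the table is back to its state before the transaction, including the removal of the row that the transaction had inserted.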

6. MySQL transactions and their isolation levels

The transaction isolation levels of the database are as follows (for multiple concurrent transactions). The sessions below set the tx_isolation variable, which MySQL 8.0 replaced with transaction_isolation.

1. Read uncommitted

# First, modify the isolation level
set tx_isolation='read-uncommitted';
select @@tx_isolation;
+------------------+
| @@tx_isolation   |
+------------------+
| READ-UNCOMMITTED |
+------------------+

# Transaction A: start a transaction
start transaction;
select * from tx;
+------+------+
| id   | num  |
+------+------+
|    1 |    1 |
|    2 |    2 |
|    3 |    3 |
+------+------+

# Transaction B: also start a transaction (the two transactions now interleave)
# Execute an UPDATE statement in transaction B without committing
start transaction;
update tx set num=10 where id=1;
select * from tx;
+------+------+
| id   | num  |
+------+------+
|    1 |   10 |
|    2 |    2 |
|    3 |    3 |
+------+------+

# Transaction A: can transaction A see the updated data at this point?
select * from tx;
+------+------+
| id   | num  |
+------+------+
|    1 |   10 |   ---> Yes! We have read data that transaction B has not yet committed
|    2 |    2 |
|    3 |    3 |
+------+------+

# Transaction B: transaction B rolls back, still without having committed
rollback;
select * from tx;
+------+------+
| id   | num  |
+------+------+
|    1 |    1 |
|    2 |    2 |
|    3 |    3 |
+------+------+

# Transaction A: what transaction A sees follows B's uncommitted data again
select * from tx;
+------+------+
| id   | num  |
+------+------+
|    1 |    1 |   ---> Dirty read: inside transaction A, every change made by transaction B is visible even though B never committed
|    2 |    2 |
|    3 |    3 |
+------+------+


2. Read committed

# First, modify the isolation level
set tx_isolation='read-committed';
select @@tx_isolation;
+----------------+
| @@tx_isolation |
+----------------+
| READ-COMMITTED |
+----------------+

# Transaction A: start a transaction
start transaction;
select * from tx;
+------+------+
| id   | num  |
+------+------+
|    1 |    1 |
|    2 |    2 |
|    3 |    3 |
+------+------+

# Transaction B: also start a transaction (the two transactions now interleave)
# Update the data in this transaction without committing
start transaction;
update tx set num=10 where id=1;
select * from tx;
+------+------+
| id   | num  |
+------+------+
|    1 |   10 |
|    2 |    2 |
|    3 |    3 |
+------+------+

# Transaction A: can transaction A see the data change at this time?
select * from tx;
+------+------+
| id   | num  |
+------+------+
|    1 |    1 |   ---> No, it cannot!
|    2 |    2 |
|    3 |    3 |
+------+------+

# Transaction B: what if transaction B commits?
commit;

# Transaction A: the same SELECT statement, but the result is different
select * from tx;
+------+------+
| id   | num  |
+------+------+
|    1 |   10 |   ---> Because transaction B has committed, transaction A now sees the change
|    2 |    2 |
|    3 |    3 |
+------+------+


3. Repeatable read

# First, modify the isolation level
set tx_isolation='repeatable-read';
select @@tx_isolation;
+-----------------+
| @@tx_isolation  |
+-----------------+
| REPEATABLE-READ |
+-----------------+

# Transaction A: start a transaction
start transaction;
select * from tx;
+------+------+
| id   | num  |
+------+------+
|    1 |    1 |
|    2 |    2 |
|    3 |    3 |
+------+------+

# Transaction B: start a new transaction (the two transactions now interleave)
# Update the data in transaction B and commit
start transaction;
update tx set num=10 where id=1;
select * from tx;
+------+------+
| id   | num  |
+------+------+
|    1 |   10 |
|    2 |    2 |
|    3 |    3 |
+------+------+
commit;

# Transaction A: transaction B has committed; can A see the data change now?
select * from tx;
+------+------+
| id   | num  |
+------+------+
|    1 |    1 |   ---> No! (This differs from level 2 and shows that level 3 solves the non-repeatable read problem)
|    2 |    2 |
|    3 |    3 |
+------+------+

# Transaction A: only after transaction A itself commits can it see the change
commit;
select * from tx;
+------+------+
| id   | num  |
+------+------+
|    1 |   10 |
|    2 |    2 |
|    3 |    3 |
+------+------+


4. Serializable

# First, modify the isolation level
set tx_isolation='serializable';
select @@tx_isolation;
+----------------+
| @@tx_isolation |
+----------------+
| SERIALIZABLE   |
+----------------+

# Transaction A: start a new transaction
start transaction;

# Transaction B: until A commits, this interleaved transaction cannot change the data
start transaction;
insert tx values('4','4');
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
update tx set num=10 where id=1;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction


To summarize:

√: may occur    x: does not occur

Dirty read: transaction T1 updates a record without committing, and transaction T2 reads the updated row; T1 then rolls back, so the data T2 read is invalid. This violates isolation. Preventing it requires row locks.

Non-repeatable read: transaction T1 reads a record, transaction T2 updates the record and commits, and when T1 reads again it sees T2's modification (an update or a delete). Preventing it requires row locks.

Phantom read: transaction T1 reads records, transaction T2 adds a record and commits, and when T1 reads again it sees T2's new record. Preventing it requires table locks. The InnoDB storage engine solves the problem with the multi-version concurrency control (MVCC, Multiversion Concurrency Control) mechanism.

Transaction isolation level   Dirty read   Non-repeatable read   Phantom read
Read uncommitted              √            √                     √
Read committed                x            √                     √
Repeatable read               x            x                     √
Serializable                  x            x                     x

Note: (1) The difference between a non-repeatable read and a phantom read is that the former concerns updated records, while the latter concerns newly inserted records. (2) Database storage engines do not always follow the standard strictly. For example, InnoDB at its default Repeatable Read isolation level can already avoid phantom reads (using the next-key lock algorithm). The InnoDB and Falcon storage engines address this issue through a multi-version concurrency control (MVCC, Multiversion Concurrency Control) mechanism.
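How multi-versioning makes Read Committed and Repeatable Read behave differently can be sketched with a toy Python model. It keeps every committed version of a row and lets a reader pick a snapshot either per statement or once per transaction. All class names are invented, only committed writes are modeled, and the snapshot is taken when the reader is created, which is a simplification of what InnoDB actually does.

```python
class VersionedStore:
    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), in commit order
        self.clock = 0      # logical commit counter

    def commit_write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read_as_of(self, key, ts):
        # Newest version committed at or before the snapshot timestamp.
        visible = [value for commit_ts, value in self.versions.get(key, []) if commit_ts <= ts]
        return visible[-1] if visible else None


class Reader:
    def __init__(self, store, level):
        self.store = store
        self.level = level
        self.snapshot_ts = store.clock  # snapshot taken when the transaction starts

    def read(self, key):
        if self.level == "READ COMMITTED":
            ts = self.store.clock       # fresh snapshot for every statement
        else:                           # "REPEATABLE READ"
            ts = self.snapshot_ts       # one snapshot for the whole transaction
        return self.store.read_as_of(key, ts)


store = VersionedStore()
store.commit_write(1, 1)                # row id=1, num=1
rc = Reader(store, "READ COMMITTED")
rr = Reader(store, "REPEATABLE READ")
before = (rc.read(1), rr.read(1))
store.commit_write(1, 10)               # another transaction commits num=10
after = (rc.read(1), rr.read(1))
print(before, after)  # (1, 1) (10, 1)
```

The Read Committed reader sees the new value as soon as it is committed, which is the non-repeatable read from the sessions above, while the Repeatable Read reader keeps seeing the value from its own snapshot.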

These correspond to the 5 transaction isolation levels in Spring (specified through the isolation property):

1. DEFAULT: the default transaction isolation level. Uses the default transaction isolation level of the database.

2. READ_UNCOMMITTED: read uncommitted. One transaction can read data that another transaction has not yet committed. It cannot avoid dirty reads, non-repeatable reads, or phantom reads. Lowest isolation level, highest concurrency.

3. READ_COMMITTED (prevents dirty reads): the default transaction isolation level of most databases. One transaction cannot read data that another transaction has not yet committed. It prevents dirty reads but cannot avoid non-repeatable reads or phantom reads.

4. REPEATABLE_READ (prevents non-repeatable reads): the default transaction isolation level of InnoDB. It avoids dirty reads and non-repeatable reads but cannot avoid phantom reads.

5. SERIALIZABLE (prevents phantom reads): the InnoDB storage engine can support distributed XA transactions at this level. Highest isolation level, with the highest resource consumption and cost. It prevents dirty reads, non-repeatable reads, and phantom reads.

7. Database normal forms and denormalization

1. Normal forms

Normalizing the logical design of a database is what we usually call applying the normal forms. They can be understood simply as follows:

(1) First normal form (ensure that each column is atomic)

The first normal form is the most basic one. If every field value in a database table is an atomic value that cannot be decomposed further, the table satisfies the first normal form.

How to apply the first normal form sensibly depends on the actual needs of the system. For example, some database systems need an "address" attribute, and it could simply be designed as one field of a table. However, if the system often accesses the "city" part of the address, the "address" attribute should be split into province, city, detailed address, and so on, so that operating on one part of the address is convenient. Only this design satisfies the first normal form.

(2) Second normal form (ensure that every column in the table depends on the whole primary key)

The second normal form builds on the first. It requires every column in a table to depend on the entire primary key, not just on one part of it (this mainly concerns composite primary keys). In other words, one table should store only one kind of data; several kinds of data should not be stored in the same table.

For example, when designing an order information table, an order may contain more than one product, so the order number and the product number together are used as the composite primary key of the table, as shown in the following table.

Order information table

This creates a problem: the table uses the order number and the product number as its composite primary key, but the product name, unit, price, and similar information do not depend on the whole primary key, only on the product number. This violates the second normal form.

If the order information table is split, with product information in one table and order items in another, the design is sound, as shown below.

(3) Third normal form (ensure that each column is directly related to the primary key column, not indirectly)

The third normal form requires every column in a table to be directly related to the primary key, not indirectly.

For example, when designing an order table, the customer number can be used as a foreign key that links the order table to the customer table, instead of adding other customer information (such as name, company, etc.) as fields of the order table. The design shown in the following two tables satisfies the third normal form.

In this way, when order information is queried, the customer number can be used to look up the record in the customer information table. The customer information does not have to be entered repeatedly in the order information table, which reduces data redundancy.
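A small sketch of that third-normal-form layout, run from Python against an in-memory SQLite database as a stand-in for MySQL. The table names, column names, and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT,
    company     TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    amount      INTEGER
);
INSERT INTO customer VALUES (1, 'Alice', 'Acme');
INSERT INTO orders VALUES (100, 1, 30), (101, 1, 45);
""")

# Customer details are stored once, so a change touches a single row.
conn.execute("UPDATE customer SET company = 'Acme Ltd' WHERE customer_id = 1")

rows = conn.execute("""
SELECT o.order_id, c.name, c.company
FROM orders o
JOIN customer c ON c.customer_id = o.customer_id
ORDER BY o.order_id
""").fetchall()
print(rows)  # [(100, 'Alice', 'Acme Ltd'), (101, 'Alice', 'Acme Ltd')]
```

Both orders pick up the new company name through the join, because the order table stores only the customer number and never a copy of the customer details.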

Higher normal forms are not introduced here. In my opinion, if all tables reach the second normal form and most reach the third, the system ends up with fewer columns per table and more tables, which reduces data redundancy and improves performance.

It is almost impossible to fully normalize a system unless it is particularly small; after a normalized design, it is usually necessary to add redundancy deliberately.

From a performance point of view, redundant databases spread the load on the database, redundant tables spread the concurrency pressure of large data volumes and speed up particular queries, and redundant fields effectively reduce joins between tables and improve efficiency.

2. Denormalization

Improve read efficiency with appropriate data redundancy.

How do we query order details?

SELECT b.user_name, b.phone, b.address, a.order_id, SUM(c.price * c.quantity) AS order_price, c.price, d.product_name
FROM `order` a
JOIN `user` b ON a.user_id = b.user_id
JOIN `order_product` c ON c.order_id = a.order_id
JOIN `product` d ON d.product_id = c.product_id
GROUP BY b.user_name, b.phone, b.address, a.order_id, c.price, d.product_name

The query has to join several tables and then aggregate the price with SUM, so it is inefficient. With a denormalized design that stores some of the data redundantly (for example the order price in the order table and the product name in the order product table), the query simplifies to:

SELECT b.user_name, b.phone, b.address, a.order_id, a.order_price, c.price, c.product_name
FROM `order` a
JOIN `user` b ON a.user_id = b.user_id
JOIN `order_product` c ON c.order_id = a.order_id

In Internet projects, the read-to-write ratio is about 3:1 or 4:1. Reads far outnumber writes, so adding data redundancy at write time to make reads more efficient is still worthwhile.

The goal of denormalization is to reduce the cost of reading data, at the price of more expensive writes, because several copies of the data have to be maintained.

Denormalization can also lead to inconsistent data, which can be repaired by an asynchronous job that reconciles the copies.
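The write-side cost of such redundancy can be sketched as follows, again from Python with an in-memory SQLite database as a stand-in. The add_item helper, the tables, and the sample rows are assumptions made for the example: every insert of an order item also updates the redundant order_price column in the same transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    order_price INTEGER NOT NULL DEFAULT 0   -- redundant copy of the item total
);
CREATE TABLE order_item (
    order_id     INTEGER,
    product_name TEXT,
    price        INTEGER,
    quantity     INTEGER
);
INSERT INTO orders (order_id) VALUES (1);
""")


def add_item(order_id, product_name, price, quantity):
    # Write path: insert the item and update the redundant total atomically.
    with conn:
        conn.execute(
            "INSERT INTO order_item VALUES (?, ?, ?, ?)",
            (order_id, product_name, price, quantity),
        )
        conn.execute(
            "UPDATE orders SET order_price = order_price + ? WHERE order_id = ?",
            (price * quantity, order_id),
        )


add_item(1, "keyboard", 50, 2)
add_item(1, "mouse", 20, 1)

# Read path: no join and no SUM needed.
redundant = conn.execute("SELECT order_price FROM orders WHERE order_id = 1").fetchone()[0]
# What a reconciliation job would recompute to detect drift.
recomputed = conn.execute(
    "SELECT SUM(price * quantity) FROM order_item WHERE order_id = 1"
).fetchone()[0]
print(redundant, recomputed)  # 120 120
```

Reads become a single-row lookup, while each write now touches two tables. Comparing the stored total with the recomputed one is the kind of check a reconciliation job could run to find and fix inconsistent copies.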

Must-see article: Talk about redundant table data Consistency (architect's path) Http://
