Then a study: http://www.jb51.net/article/70528.htm
Seven, MySQL database schema design performance optimization
high-efficiency model design
Moderate redundancy-let query do two less join
Optimization of-summary table with large segment vertical splitting
Large table horizontal splitting-based on type-splitting optimization
Statistics-quasi-real-time optimization
Appropriate type of data
Time storage format Total class is not too much, we commonly used mainly datetime,date and timestamp these three kinds of. In terms of storage space, timestamp is the least, four bytes, while the other two data types are eight bytes, one more times. The disadvantage of timestamp is that he can only store time from 1970 onwards, while the other two types of time can be stored as early as 1001. If there is a need to store the time before 1970, we must discard the timestamp type, but as long as we do not need to use the time 1970 years ago, it is best to use timestamp to reduce the footprint of storage space.
Character Storage type
The char[(M)] type belongs to the static length type, and the storage length is entirely in characters, so the final store length is based on the character set, such as Latin1, which has a maximum storage length of 255 bytes, but the maximum storage length is 510 bytes If the GBK is used. The storage characteristic of the char type is that no matter how long we actually hold the data, it will be stored in the database m characters, not enough through the space to fill up, M defaults to 1. Although Char will make up the space for storage through a space, when accessing the data, MySQL ignores all the last spaces, so if we actually need a space at the end of our actual data, we cannot use the char type to store it.
varchar[(M)] belongs to the dynamic storage-length type, saving only the length of the actual stored data. The four types of Tinytext,text,mediumtext and Longtext belong to one type of storage, all of which are dynamic storage length types, with the difference being only the maximum length limit.
Transaction optimization
1. Dirty reads: Dirty reading means that when a transaction is accessing the data and the data has been modified, and the modification has not been submitted to the database, another transaction accesses the data and then uses the data.
2. Non-repeatable reading: refers to the same data read multiple times within a transaction. When this transaction is not finished, another transaction accesses the same data. Then, between the two read data in the first transaction and the modification of the second transaction, the data read by the first transaction two times may be different. This makes it happen that the data read two times within a transaction is not the same, so it is called a non repeatable read.
3. Phantom reading: a phenomenon that occurs when a transaction is not executed independently, for example, the first transaction modifies the data in a table that involves all the rows of data in the table. At the same time, the second transaction modifies the data in this table, which is inserting a row of new data into the table. So, it's going to happen later. The user of the first transaction discovers that there are no modified data rows in the table, as if there were hallucinations.
The information that InnoDB supports at the transaction isolation level is as follows:
1.READ UNCOMMITTED
Often become dirty reads (dirty read), which can be said to be the lowest level of isolation on a transaction: The execution of a select in normal, unlocked mode allows us to see that the data may not be the data from which the query originated, and therefore is consistent reads (consistent read) under this isolation;
2.READ committed
Dirtyread is not present at this isolation level, but non-repeatablereads (not repeatable reads) and phantomreads (phantom reads) may occur.
3. Repeatable READ
The REPEATABLE read isolation level is the INNODB default transaction isolation level. Under the REPEATABLE READ isolation level, there will be no dirtyreads and non-repeatable read, but the possibility of phantomreads still exists.
4.SERIALIZABLE
The Serializable isolation level is the highest level in the standard transaction isolation level. When set to the Serializable isolation level, the data seen at any time in the transaction is the state of the transaction startup time, regardless of whether any other transactions have modified some data and submitted it during this period. Therefore, Phantomreads does not appear under the SERIALIZABLE transaction isolation level.
Viii. Data segmentation of scalable design
Vertical segmentation of data
Vertical segmentation of data can also be called vertical segmentation. Imagine the database as a large chunk of chunks of "data block" (table), we vertically cut these "chunks", and then spread them across multiple database hosts. This method of segmentation is a vertical (vertical) data segmentation.
The advantages of vertical segmentation
The splitting of the database is simple and clear, and the splitting rules are definite;
The application module is clear and easy to integrate;
Data maintenance is convenient and easy to locate;
The disadvantage of vertical segmentation
Some table associations cannot be completed at the database level and need to be completed in the program;
The table with extremely frequent access and large amount of data still has a quiet performance and does not necessarily meet the requirements.
Transaction processing is relatively more complex;
After the segmentation reaches a certain degree, extensibility will encounter the limitation;
Cross-reading segmentation may lead to complex system transitions and difficult maintenance.
Horizontal segmentation of data
The vertical segmentation of the data can be simply understood as the segmentation of the data according to the module, and the horizontal segmentation is no longer based on the table or functional modules. In general, a simple horizontal segmentation is mainly to distribute a table with a very trivial access to a number of tables, each of which contains part of the data.
The advantages of horizontal segmentation
Table Association can be completely completed on the database side;
There is no problem that some super large data and high load tables encounter bottlenecks;
Application-side Overall schema changes are relatively small;
Transaction processing is relatively simple;
As long as the segmentation rules can be defined well, it is more difficult to encounter extensibility constraints.
The disadvantage of horizontal segmentation
The segmentation rule is more complicated, it is difficult to abstract a segmentation rule that satisfies the whole database.
The maintenance difficulty of the data is increased, and it is more difficult to manually locate the data.
The coupling degree of each module in the application system is very high, which may cause some difficulties to the migration and splitting of the data later.
Possible problems in data segmentation and integration
1. The issue of introducing distributed services
It is possible to split a distributed transaction across multiple databases into small transactions that are only on a single database, and to control individual small transactions through the application. Of course, the requirement is that our Russian application must be robust enough, and of course it will bring some technical difficulties to the application.
2. Cross-node Join problem
It is recommended that the application be processed by first taking the corresponding drive result set in the MySQLServer where the driver table is located, and then taking the corresponding data out of the MySQL server where the driver table is located, based on the drive result set.
3. Merging sorted paging issues across nodes
Fetch data in parallel from multiple data sources, and then the application summarizes the processing.
Ix. the application of cache and search in scalable design
By introducing Cache (Redis, Memcached), reducing database access and increasing performance.
Through the introduction of search (Lucene, SOLR, Elasticsearch), the use of search engine efficient full-text indexing and word segmentation algorithm, as well as efficient data retrieval, to solve the database and traditional cache software completely unable to solve the full text of fuzzy search, classification and statistical query functions.
The above is the entire content of this article, I hope you can enjoy.