I. Database architecture design ideas at 58.com (58同城)
(1) Availability design
Solution: replication + redundancy
Side effect: replication and redundancy inevitably introduce consistency problems
The common way to make "reads" highly available: add slave libraries and keep redundant copies of the data, as shown in the figure below
Problem: master-slave inconsistency
Solution: see below
The common way to make "writes" highly available: dual-master mode, i.e. replicate the master library (many companies run a single master, which cannot guarantee write availability) and keep redundant data, as shown in the figure below
Problem: key conflicts during dual-master synchronization cause inconsistency
Solutions:
a) Option one: have the database or the business layer guarantee that keys never conflict between the two masters
b) Option two: see below
How 58.com guarantees "write" high availability: run the "dual masters" as master and slave, with no read/write splitting; when the "master" goes down, the "slave" (actually the other master) takes over, as shown in the figure below
Advantages: both reads and writes go to the master, which solves the consistency problem, while "dual masters used as master-slave" solves the availability problem
Problem: how do we then scale read performance? Solution: see below
(2) Read performance design: how to scale read performance
The most common approach is to build indexes
Building a great many indexes has side effects:
a) write performance drops
b) indexes take up more memory, so less data fits in memory, the hit rate falls, and the I/O count rises
But did you know that different libraries can build different indexes? As shown in the figure below
TIPS: different libraries can build different indexes
The master only serves writes and builds no indexes
Online-read slaves serve only online reads and build indexes for online reads
Offline-read slaves serve only offline reads and build indexes for offline reads
Common approach two for scaling reads: add slave libraries
As mentioned above, this causes master-slave inconsistency: the more slaves, the longer the replication delay and the worse the inconsistency
This approach is common, but 58.com does not use it
Approach three for scaling reads: add a cache
The traditional cache usage pattern is:
a) when a write request arrives, evict the cache first, then write the database
b) when a read request arrives, read the cache first; on a hit, return the value; on a miss, read the database and put the data into the cache (possibly putting stale data into the cache), as shown in the figure below
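The traditional pattern above is the classic cache-aside flow. A minimal sketch, with plain dicts standing in for Redis and MySQL (an illustration only, not 58.com's implementation):

```python
# Minimal cache-aside sketch: dicts stand in for the cache and the database.
cache = {}
db = {}

def write(key, value):
    cache.pop(key, None)    # a) evict the cache first...
    db[key] = value         # ...then write the database

def read(key):
    if key in cache:        # b) hit: return the cached value
        return cache[key]
    value = db.get(key)     # miss: read the database...
    if value is not None:
        cache[key] = value  # ...and populate the cache (may cache stale
                            # data if a lagging replica served the read)
    return value
```

Note that the eviction in `write` is what keeps readers from seeing a value the database no longer holds; the stale-fill race it cannot cover is exactly the problem discussed next.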
The problems this brings:
a) as mentioned above, data replication causes consistency problems; because of master-slave delay, the cache may become inconsistent with the database
b) every upstream business layer has to deal with the cache itself; the complexity of "master + slave + cache" cannot be hidden
58.com's cache usage: service layer + data + cache
The benefits are:
1) the service layer hides the "database + cache" complexity from callers
2) no read/write splitting: both reads and writes go to the master, so no inconsistency arises
(3) Consistency design
Solutions to master-slave inconsistency
Option one: introduce middleware
The middleware routes writes on a key to the master, and for a certain time window afterwards (the empirical time for master-slave sync to finish) routes reads on that key to the master as well
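The middleware's routing rule can be sketched as follows. The `SYNC_WINDOW` constant and the in-memory `last_write` map are assumptions for illustration; real middleware would track recent writes in shared state:

```python
import time

SYNC_WINDOW = 0.5  # assumed bound on master-slave replication lag, seconds

last_write = {}    # key -> timestamp of the most recent write on that key

def route_write(key):
    # All writes go to the master; remember when this key was last written.
    last_write[key] = time.time()
    return "master"

def route_read(key):
    # A read on a key written within the sync window goes to the master;
    # any other key can safely be read from a slave.
    if time.time() - last_write.get(key, 0.0) < SYNC_WINDOW:
        return "master"
    return "slave"
```

The idea is that only reads racing a recent write can observe replication lag, so only those are pinned to the master.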
Option two: send both reads and writes to the master
As mentioned above, 58.com takes this approach: with no read/write splitting, there is no inconsistency
Solutions to database-cache inconsistency
Double eviction
An unlucky read/write interleaving can still put stale data into the cache; evicting once is not enough, so evict twice:
a) when a write request arrives, evict the cache, then write the database, and additionally set a timer to evict the cache again after a delay (the empirical time for master-slave sync to finish)
b) when a read request arrives, read the cache first; on a hit, return the value; on a miss, read the database and cache the result (stale data may be cached here, but the second eviction will remove it, so no lasting inconsistency results)
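The double-eviction steps above can be sketched like this. Dicts stand in for the cache and the database, and `SYNC_DELAY` is an assumed stand-in for the empirical replication time:

```python
import threading

cache = {}
db = {}
SYNC_DELAY = 0.2  # assumed time for master-slave sync to complete, seconds

def write(key, value):
    cache.pop(key, None)                  # first eviction
    db[key] = value                       # then write the database
    # second eviction after replication has (empirically) finished, so any
    # stale value a concurrent read cached in the meantime is removed
    threading.Timer(SYNC_DELAY, lambda: cache.pop(key, None)).start()

def read(key):
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value                # may cache stale data; the delayed
                                          # second eviction will clear it
    return value
```

The delayed `Timer` is the whole trick: it bounds how long a stale fill can survive by the replication delay rather than by the cache TTL.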
(4) Extensibility design
(4.1) 58.com's second-level data expansion
Requirement: data originally split horizontally across N libraries must expand to 2N libraries without affecting service, completed within seconds
Initially the data is split across 2 libraries, library 0 and library 1, each running "dual masters used as master-slave" to guarantee availability
Next, promote the slaves and modify the routing configuration; the library expansion completes within seconds
Since 2 doubles to 4, no data migration is needed: the old library 0 becomes library 0 plus library 2, and the old library 1 becomes library 1 plus library 3
What is sacrificed at this moment is availability, not data: each library temporarily runs as a single master
Finally, tear down the old dual-master replication (libraries 0 and 2 no longer conflict on data), set up new dual-master pairs to restore availability, and delete the now-redundant data
This scheme expands N libraries to 2N libraries in seconds.
Problem: it only covers N-to-2N expansion (which needs no data migration); it is not a general expansion scheme (e.g. it cannot expand 3 libraries to 4)
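The reason doubling needs no migration can be seen directly from modulo routing. A small sketch (an illustration of the property, not 58.com's actual routing code):

```python
def shard_before(uid, n=2):
    return uid % n           # old routing: 2 libraries

def shard_after(uid, n=2):
    return uid % (2 * n)     # new routing: 4 libraries

# Old library 0 (even uids) splits into libraries 0 and 2; old library 1
# (odd uids) splits into libraries 1 and 3. Every uid's new library is a
# machine that already held its data, so no rows need to move:
for uid in range(1000):
    assert shard_after(uid) % 2 == shard_before(uid)
```

For a non-doubling step such as 3 to 4 libraries, `uid % 4` no longer agrees with `uid % 3` modulo the old shard count, which is why that case forces a real data migration.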
(4.2) Non-doubling expansion, adding database fields, and data migration
[These methods were described in the previous article and are not repeated here]
Option one: the catch-up-the-log scheme
Option two: the double-write scheme
(4.3) How to split horizontally
Four types of scenario cover 99% of sharding needs:
a) "single key" scenario — how the user library is split: user(uid, xxoo)
b) "one-to-many" scenario — how the post library is split: tiezi(tid, uid, xxoo)
c) "many-to-many" scenario — how the friend library is split: friend(uid, friend_uid, xxoo)
d) "multi-key" scenario — how the order library is split: order(oid, buyer_id, seller_id, xxoo)
[These splits were covered in the previous article and are not repeated here]
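To make the "multi-key" difficulty concrete: order(oid, buyer_id, seller_id) can only be partitioned on one key, so lookups by the other keys need extra machinery. A common workaround, sketched below under assumptions of my own (shard by buyer_id plus a mapping table for oid; this is an illustration, not a claim about 58.com's implementation):

```python
N_SHARDS = 4  # assumed shard count

def shard_of(key):
    # single-key routing: rows live on key % N (hash routing in practice)
    return key % N_SHARDS

orders_by_shard = {i: [] for i in range(N_SHARDS)}
oid_to_buyer = {}  # mapping table: oid -> buyer_id (hypothetical helper)

def insert_order(oid, buyer_id, seller_id):
    # the row is partitioned by buyer_id...
    orders_by_shard[shard_of(buyer_id)].append((oid, buyer_id, seller_id))
    # ...and the mapping table lets us find the shard from the oid alone
    oid_to_buyer[oid] = buyer_id

def find_order(oid):
    buyer_id = oid_to_buyer[oid]          # one lookup to locate the shard
    for row in orders_by_shard[shard_of(buyer_id)]:
        if row[0] == oid:
            return row
```

Queries by buyer_id hit exactly one shard; queries by oid cost one extra mapping lookup; queries by seller_id would need either another mapping table or a scatter to all shards.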
(5) How to use SQL on massive data
Do not use:
a) joins of all kinds
b) subqueries
c) triggers
d) user-defined functions
e) "transactions" (use them as little as possible)
Reason: they significantly hurt database performance
How to handle IN queries after sharding [covered in the previous article]
How to handle non-partition-key queries after sharding [covered in the previous article]
How to do cross-library paging after sharding?
The problem, stated abstractly: ORDER BY xxx OFFSET xxx LIMIT xxx
The single-machine version: ORDER BY time OFFSET 10000 LIMIT 100
The difficulty after sharding: how to determine the global offset
The legacy solution after sharding: query rewriting + in-memory sorting
a) run ORDER BY time OFFSET 0 LIMIT 10000+100 on each library
b) sort the 20,200 returned records in memory
c) return records 10,000 through 10,100
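The legacy steps a) through c) can be sketched as follows, with each shard represented by a plain list of sort-key values (a sketch only; a real service would issue the rewritten SQL to each library):

```python
import heapq

def naive_cross_shard_page(shards, offset, limit):
    # a) query rewrite: each library returns its first offset+limit rows, sorted
    per_shard = [sorted(s)[:offset + limit] for s in shards]
    # b) merge-sort all candidate rows in memory in the service layer
    merged = list(heapq.merge(*per_shard))
    # c) apply the global offset and limit to the merged result
    return merged[offset:offset + limit]
```

The cost is the problem: for page P each shard must return P full pages of rows, so both network traffic and the in-memory sort grow linearly with how deep the user pages.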
Optimization one: introduce a secondary id to reduce the amount of data queried
Optimization two: fuzzy paging
a) restrict the business: forbid querying beyond page XX
b) relax the business: allow fuzzy results — is pinpoint accuracy on page 100 really that important?
And now the big finale!!! (for reasons of time, shared only at DTCC 2015)
Optimization three: the ultimate solution — no business compromise, query rewriting plus a two-segment query
Requirement: ORDER BY x OFFSET 10000 LIMIT 4 — how to implement it across shards (assume 3 libraries)
Step one, query rewriting: run ORDER BY x OFFSET 3333 LIMIT 4 on each library
[4, 7, 9, 10] <= returned by library 1
[3, 5, 6, 7] <= returned by library 2
[6, 8, 9, 11] <= returned by library 3
Step two, take the min and max of everything returned in step one, i.e. 3 and 11
Step three, query each library again between min and max: ORDER BY x WHERE x BETWEEN 3 AND 11
[3, 4, 7, 9, 10] <= returned by library 1; 4 has offset 3333 in library 1, so 3 has offset 3332 there
[3, 5, 6, 7, 11] <= returned by library 2; 3 has offset 3333 in library 2
[3, 5, 6, 8, 9, 11] <= returned by library 3; 6 has offset 3333 in library 3, so 3 has offset 3331 there
Step four, compute the global offset of the min
3 has global offset 3332 + 3333 + 3331 = 9996
Ta-da: skipping past 3, 3, 3, 4 in the merged results then yields the global OFFSET 10000 LIMIT 4 answer: [5, 5, 6, 6]
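The four steps above can be put into one function. Shards are again plain lists of x values (a sketch under that assumption; real code would run the two rewritten queries against each library):

```python
def two_segment_page(shards, offset, limit):
    n = len(shards)
    per = offset // n                             # step one: rewritten per-shard offset
    sorted_shards = [sorted(s) for s in shards]   # stand-in for ORDER BY x per library
    first = [s[per:per + limit] for s in sorted_shards]
    x_min = min(p[0] for p in first)              # step two: global min...
    x_max = max(p[-1] for p in first)             # ...and max of the returned pages
    # step three: second query, BETWEEN x_min AND x_max on each library
    between = [[x for x in s if x_min <= x <= x_max] for s in sorted_shards]
    # step four: in each shard, x_min's offset is `per` minus the position of
    # that shard's first step-one element inside its BETWEEN result
    g = sum(per - b.index(p[0]) for b, p in zip(between, first))
    merged = sorted(x for b in between for x in b)
    skip = offset - g                             # x_min sits at global offset g
    return merged[skip:skip + limit]
```

Each shard is queried exactly twice with small result sets, so unlike the legacy scheme the cost no longer grows with the page depth.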
Summary: database architecture design ideas at 58.com
(1) Availability — the solution is redundancy (replication)
(1.1) Read availability: multiple slave libraries
(1.2) Write availability: dual-master mode, or dual masters used as master-slave (58.com's approach)
(2) Read performance — three ways to scale reads
(2.1) Add indexes: master and slaves can carry different indexes
(2.2) Add slave libraries
(2.3) Add a cache: service + cache + data (58.com's approach)
(3) Consistency
(3.1) Master-slave inconsistency: introduce a middle tier, or send both reads and writes to the master (58.com's approach)
(3.2) Database-cache inconsistency: double eviction
(4) Extensibility
(4.1) Data expansion: promote slaves, double the masters, second-level expansion
(4.2) Field extension: catch-up-the-log or double-write
(4.3) Horizontal splitting
(single key) how the user library is split: user(uid, xxoo)
(one-to-many) how the post library is split: tiezi(tid, uid, xxoo)
(many-to-many) how the friend library is split: friend(uid, friend_uid, xxoo)
(multi-key) how the order library is split: order(oid, buyer_id, seller_id, xxoo)
(5) SQL usage
(5.0) Do not use: joins, subqueries, triggers, user-defined functions, transactions
(5.1) IN queries: scatter to the shards MapReduce-style, or assemble per-shard SQL statements
(5.2) Non-partition-key queries: locate the library via a mapping, or scatter MapReduce-style
(5.3) Cross-library paging
(5.3.1) Rewrite the SQL statement and sort within the service
(5.3.2) Introduce a special id to reduce the amount of data returned
(5.3.3) Business-level optimization: allow fuzzy paging
(5.3.4) Query rewriting plus a two-segment query
That concludes the 58.com case study.
II. Codd's 12 rules, from the father of the relational database
In addition, let us review Codd's 12 rules as guidelines for database design:
- The information rule
All information in a relational database is represented in exactly one way: as values in tables.
- The guaranteed access rule
Every data item is guaranteed to be accessible through a combination of table name, primary key value, and column name.
- Systematic treatment of null values
Null values are supported and handled in a systematic way, independently of data type.
- Dynamic online catalog based on the relational model
The description of the database is self-describing: it is represented at the logical level in the same way as ordinary data, i.e. the database must contain system tables describing its own structure, or the database description must be available in tables that users can query.
- The comprehensive data sublanguage rule
A relational system may support several languages and various modes of terminal use, but there must be at least one language whose statements are expressible as strings with a well-defined syntax and that comprehensively supports all of the following: data definition, view definition, data manipulation, integrity constraints, authorization, and transactions. (That language is SQL.)
- The view updating rule
All views that are theoretically updatable are also updatable by the system.
- High-level insert, update, and delete
The ability to treat a base relation or a derived relation as a single operand applies not only to retrieving data but also to inserting, updating, and deleting it; that is, rows are treated as sets in insert, update, and delete operations.
- Physical data independence
Applications and terminal activities remain logically unimpaired regardless of changes in how the data is stored or accessed.
- Logical data independence
Applications and terminal activities remain logically unimpaired when information-preserving changes are made to the base tables.
- Integrity independence
Integrity constraints specific to a relational database must be definable in the relational data sublanguage and storable in the catalog, not in application programs.
- Distribution independence
The data manipulation sublanguage of an RDBMS must keep applications and terminal activities logically unimpaired whether the data is physically centralized or distributed, and whenever the distribution strategy changes.
- The nonsubversion rule
If a relational system supports a low-level (record-at-a-time) language, that low-level language must not be able to violate or bypass the integrity rules and constraints expressed in the higher-level (set-at-a-time) language; that is, users must not be able to violate database constraints by any means.
Some additional lessons from experience:
- Reduce reliance on database features
Functionality should be implemented in the application, not in the DB. If the functionality lives in the DB and a replacement DBMS turns out to be less capable than the previous one, some of that functionality cannot be carried over and the code has to be modified. To prevent this, let the application implement the logic and the database only store the data, achieving the loosest coupling.
- Principles for defining entity relationships
When defining a relationship between one entity and another, consider the following:
- Involvement: identify all entities that take part in the relationship.
- Ownership: consider whether one entity "owns" another.
- Cardinality: consider how many instances of one entity are associated with an instance of the other.
Collected and consolidated from material on the web; I hope it helps your software development. Other articles you might be interested in:
Evolution of an enterprise application performance measurement system
A few examples of cloud computing reference architectures
Smart mobile guide solution brief
Evolution of a human resource management system
For more on software, systems, IT, and enterprise informatization, please follow my subscription account:
Petter Liu
Source: http://www.cnblogs.com/wintersun/
This article is copyrighted by the author and the blog site. You are welcome to reprint it, but if you do so without the author's consent you must retain this notice and give a link to the original in a prominent position on the article page; otherwise the author reserves the right to pursue legal liability.
This article is also published on my independent blog, Petter Liu's blog.