Design ideas of Internet database architecture

Source: Internet
Author: User

I. Database architecture design ideas at 58.com (58 Tongcheng)

(1) Availability design

Solution: replication + redundancy

Side effect: replication and redundancy inevitably introduce consistency problems

A common way to make reads highly available: replicate to slave libraries, holding redundant copies of the data, as below:


Problem: master-slave inconsistency

Solution: See below

A common way to make writes highly available: dual-master mode, i.e. replicate the master library itself (many companies run a single master, which cannot guarantee write availability), holding redundant data, as below:


Problem: key conflicts during dual-master synchronization cause inconsistency

Solution:

a) Scheme 1: the database or the business layer guarantees that keys generated on the two masters never conflict

b) Scheme 2: see below
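As a minimal sketch of scheme 1, the usual trick (the same idea as MySQL's auto_increment_offset / auto_increment_increment settings) is to give each master a disjoint arithmetic ID sequence; the generator below is a hypothetical illustration, not 58.com's actual implementation:

```python
# Each master hands out ids from a disjoint arithmetic sequence, so the
# two masters can never generate conflicting primary keys.
def id_stream(offset, step):
    """Yield offset, offset+step, offset+2*step, ..."""
    n = offset
    while True:
        yield n
        n += step

master_a = id_stream(1, 2)  # generates 1, 3, 5, ...
master_b = id_stream(2, 2)  # generates 2, 4, 6, ...
```

With `offset` ranging over 1..step and a shared `step`, any number of masters get disjoint key spaces.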

58.com's way to guarantee write availability: run the "dual masters" as master and slave. No read/write splitting is done; if the "master" goes down, the "slave" (actually the other master) takes over, as below:


Advantages: all reads and writes go to one master, which avoids the consistency problem; using the "dual masters" as "master and slave" solves the availability problem

Problem: how is read performance scaled? See below

(2) Read-performance design: how to scale read performance

The most common approach is to build indexes.

Creating too many indexes has side effects:

a) write performance is reduced

b) indexes occupy more memory, leaving less room for data, so the in-memory hit rate drops and I/O increases

But did you know that different libraries (replicas) can carry different indexes? For example:


TIPS: different libraries can carry different indexes

The master library serves only writes and carries no secondary indexes

The online-read slave serves only online reads and carries the indexes online queries need

The offline-read slave serves only offline reads and carries the indexes offline (batch) queries need
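The idea can be sketched with three in-memory SQLite databases standing in for the master and the two kinds of slaves (the table and index names below are made up for illustration; in a real MySQL setup the per-replica indexes would be created directly on each replica):

```python
import sqlite3

# Three replicas of the same table; each carries only the indexes its
# own workload needs. In-memory databases stand in for the real servers.
master = sqlite3.connect(":memory:")   # serves writes only: no secondary indexes
online = sqlite3.connect(":memory:")   # serves online reads
offline = sqlite3.connect(":memory:")  # serves offline (batch) reads

ddl = "CREATE TABLE post (tid INTEGER PRIMARY KEY, uid INTEGER, created_at TEXT)"
for db in (master, online, offline):
    db.execute(ddl)

# only the read replicas pay the index cost, each for its own query pattern
online.execute("CREATE INDEX idx_uid ON post (uid)")              # e.g. a user's posts
offline.execute("CREATE INDEX idx_created ON post (created_at)")  # e.g. daily batch scans
```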

Common scheme two for improving read performance: add slave libraries

As noted above, this causes master-slave inconsistency: the more slaves, the longer the replication lag and the worse the inconsistency

This scheme is common, but 58.com does not use it

Scheme three for improving read performance: add a cache

The traditional cache usage is:

a) on a write request, evict the cache entry first, then write the database

b) on a read request, read the cache first; return on a hit, and on a miss read the database and populate the cache (possibly putting stale data into the cache), as below:
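Steps a) and b) can be sketched as follows, with plain dicts standing in for the database and the cache:

```python
db = {}     # stands in for the database
cache = {}  # stands in for the cache

def write(key, value):
    cache.pop(key, None)  # a) evict the cache entry first...
    db[key] = value       #    ...then write the database

def read(key):
    if key in cache:
        return cache[key]  # b) hit: serve from the cache
    value = db.get(key)    #    miss: read the database
    cache[key] = value     #    and populate the cache (possibly with
    return value           #    stale data, if read from a lagging slave)
```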


The problems this brings:

a) as noted above, data replication causes consistency issues; because of master-slave lag, the cache can become inconsistent with the database

b) every business tier of the app must deal with the cache itself; the complexity of "master + slave + cache" cannot be hidden from it

58.com's cache usage: service layer + data + cache


The benefits are:

1) the introduced service layer hides the complexity of "database + cache"

2) no read/write splitting: all reads and writes go to the master, so no inconsistency arises

(3) Consistency design

Solutions for master-slave inconsistency

Scheme one: introduce middleware


The middleware routes writes on a key to the master and, within a time window (the empirical time for master-slave sync to complete), also routes reads on that key to the master
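A sketch of that routing rule (the real middleware, e.g. a MySQL proxy, is far more involved; the window value here is an assumption):

```python
import time

SYNC_WINDOW = 1.0  # assumed empirical master-slave sync time, in seconds
last_write = {}    # key -> timestamp of its most recent write

def route(key, is_write):
    """Decide which node serves a request on `key`."""
    now = time.time()
    if is_write:
        last_write[key] = now
        return "master"  # writes always go to the master
    if now - last_write.get(key, 0.0) < SYNC_WINDOW:
        return "master"  # key written recently: slaves may still lag
    return "slave"       # safe to serve from a slave
```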

Scheme two: read and write the master only

As noted above, 58.com adopts this approach: with no read/write splitting, there is no inconsistency

Solutions for database-cache inconsistency

The double-eviction method

Abnormal read/write interleavings can still put stale data into the cache; one eviction is not enough, so evict twice:

a) on a write request, evict the cache entry first, then write the database; additionally, start a timer that evicts the entry again after a delay (the empirical time for master-slave sync to complete)

b) on a read request, read the cache first; return on a hit, and on a miss read the database and populate the cache (stale data may be cached at this point, but the second eviction removes it, so no lasting inconsistency results)

(4) Scalability design

(4.1) 58.com's expansion of data within seconds

Requirement: data originally sharded horizontally into N libraries must be expanded to 2N libraries, without affecting service and completed within seconds


Initially the data is split into 2 libraries, library 0 and library 1, each using the "dual masters as master-slave" mode to guarantee availability:


Next, promote the slaves and modify the routing configuration; the expansion completes within seconds:

Since 2 libraries double to 4, no data migration is needed: the old library 0 becomes library 0 + library 2, and the old library 1 becomes library 1 + library 3

What is lost at this moment is data availability: each promoted library temporarily has no redundant copy


Finally, tear down the old dual-master sync (library 0 and library 2 no longer generate conflicting data), set up new dual-master pairs to restore availability, and delete the now-redundant data

This scheme expands N libraries to 2N libraries within seconds.

Problem: it only handles N-to-2N doubling (with no data migration); it is not a general expansion scheme (e.g. 3 libraries cannot be expanded to 4 this way)
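Why the doubling needs no migration can be checked with a two-line routing function (`route` is a hypothetical modulo-sharding helper): a record routed to library k by `uid % N` is routed to either k or k + N by `uid % 2N`, and library k + N starts its life as a promoted replica of library k, so the record is already physically there.

```python
def route(uid, n_libs):
    """Map a uid to its library number by modulo sharding."""
    return uid % n_libs

# Before expansion: 2 libraries; after expansion: 4 libraries.
# Every uid lands either in its old library or in that library's
# former replica -- no row has to move.
for uid in range(1000):
    old = route(uid, 2)  # library before expansion
    new = route(uid, 4)  # library after expansion
    assert new == old or new == old + 2
```

Afterwards, the half of each library's rows that now route elsewhere can be deleted at leisure.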

(4.2) Non-doubling expansion, adding database fields, data migration

[These methods were covered in a previous article in this series and are not repeated here.]

Scheme one: the log-chasing scheme

Scheme two: the double-write scheme

(4.3) How to split horizontally

Four scenario types cover 99% of sharding needs:

a) "single key" scenario, how a user library is split: user(uid, XXOO)

b) "one-to-many" scenario, how a post library is split: tiezi(tid, uid, XXOO)

c) "many-to-many" scenario, how a friend library is split: friend(uid, friend_uid, XXOO)

d) "multi-key" scenario, how an order library is split: order(oid, buyer_id, seller_id, XXOO)

[These sharding schemes were covered in a previous article in this series and are not repeated here.]

(5) How to use SQL with massive data

Avoid the following:

a) all kinds of join queries

b) subqueries

c) triggers

d) user-defined functions

e) transactions (use them sparingly)

Reason: they have a significant impact on database performance

How to handle IN queries after sharding [covered in a previous article in this series]

How to handle non-partition-key queries after sharding [covered in a previous article in this series]

After sharding, how to paginate across libraries?

The problem, abstracted: ORDER BY xxx OFFSET xxx LIMIT xxx

Single-machine version: ORDER BY time OFFSET 10000 LIMIT 100

The difficulty after sharding: how to determine the global offset

Naive post-sharding solution: query rewrite + in-memory sort

a) rewrite the query for each library: ORDER BY time OFFSET 0 LIMIT 10000+100

b) sort the 20,200 returned records in memory (with 2 libraries, each returns 10,100 records)

c) return records 10,000 through 10,100
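A sketch of this naive approach; `libs` is a list of per-library row sets, standing in for the sharded tables:

```python
def naive_global_page(libs, offset, limit):
    """ORDER BY x OFFSET offset LIMIT limit across shards, the naive way."""
    # query rewrite: every library returns its first offset+limit rows,
    # since any globally-qualifying row must lie in its library's top slice
    partials = [sorted(lib)[:offset + limit] for lib in libs]
    merged = sorted(x for p in partials for x in p)  # in-memory sort
    return merged[offset:offset + limit]
```

The cost is obvious: each library ships offset+limit rows, so deep pages move and sort enormous amounts of data.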

Optimization scheme one: introduce an auxiliary ID to reduce the amount of data queried

Optimization scheme two: accept approximate results

a) on the business side: forbid querying beyond page XX

b) on the business side: allow approximate answers; is pinpoint accuracy on page 100 really that important?

And the final big trick (due to time limits, shared only at DTCC2015):

Optimization scheme three: the ultimate solution, lossless to the business: query rewrite + two-phase query

Requirement: ORDER BY x OFFSET 10000 LIMIT 4; how to implement this across shards (assume 3 libraries)

Step one, query rewrite: run ORDER BY x OFFSET 3333 LIMIT 4 on each library

[4,7,9,10] <= returned by library 1

[3,5,6,7] <= returned by library 2

[6,8,9,11] <= returned by library 3

Step two, find the min and max of the step-one results, i.e. 3 and 11

Step three, query each library again between min and max: ORDER BY x WHERE x BETWEEN 3 AND 11

[3,4,7,9,10] <= returned by library 1; 4 is at offset 3333 in library 1, so 3 is at offset 3332

[3,5,6,7,11] <= returned by library 2; 3 is at offset 3333 in library 2

[3,5,6,8,9,11] <= returned by library 3; 6 is at offset 3333 in library 3, so 3 is at offset 3331

Step four, derive the global offset

The global offset of 3 is 3332 + 3333 + 3331 = 9996

So, merging the step-three results and skipping 3, 3, 3, 4 (global offsets 9996 through 9999), OFFSET 10000 LIMIT 4 yields [5,5,6,6]
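The whole four-step procedure can be sketched end to end (a simplification assuming, as in the example, that the offset divides evenly among the libraries and that each page's first value is unique within its window):

```python
def two_phase_paginate(libs, offset, limit):
    """ORDER BY x OFFSET offset LIMIT limit across the shards in `libs`."""
    per = offset // len(libs)  # step 1: rewritten per-library offset
    pages = [sorted(lib)[per:per + limit] for lib in libs]
    lo = min(p[0] for p in pages)   # step 2: global min of the pages...
    hi = max(p[-1] for p in pages)  #         ...and global max
    merged, lo_global_offset = [], 0
    for page, lib in zip(pages, libs):
        # step 3: the BETWEEN query on each library
        between = [x for x in sorted(lib) if lo <= x <= hi]
        # page[0] sits at known local offset `per`; subtracting how many
        # records precede it in `between` gives lo's local offset
        lo_global_offset += per - between.index(page[0])
        merged.extend(between)
    merged.sort()
    # step 4: lo's global offset is known, so skip forward from it
    skip = offset - lo_global_offset
    return merged[skip:skip + limit]
```

Running it on a small data set shaped like the example reproduces the behavior; unlike the naive scheme, each library returns only `limit` rows in step one and the narrow BETWEEN range in step three.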

Summary: 58.com's database architecture design ideas

(1) Availability: the solution is redundancy (replication)

(1.1) read availability: multiple slave libraries

(1.2) write availability: dual-master mode, or dual masters used as master and slave (58.com's approach)

(2) Read performance: three ways to scale it

(2.1) add indexes: indexes on master and slaves can differ

(2.2) add slave libraries

(2.3) add a cache: service layer + cache + data (58.com's approach)

(3) Consistency

(3.1) master-slave inconsistency: introduce a middleware tier, or read and write the master only (58.com's approach)

(3.2) cache inconsistency: double eviction resolves cache inconsistency

(4) Scalability

(4.1) data expansion: promote slaves, double the masters, expand within seconds

(4.2) field expansion: the log-chasing or double-write scheme

(4.3) horizontal splitting

(single key) user library split: user(uid, XXOO)

(one-to-many) post library split: tiezi(tid, uid, XXOO)

(many-to-many) friend library split: friend(uid, friend_uid, XXOO)

(multi-key) order library split: order(oid, buyer_id, seller_id, XXOO)

(5) SQL usage

(5.0) avoid: joins, subqueries, triggers, user-defined functions, transactions

(5.1) IN queries: distribute to all libraries (map/reduce style), or assemble per-library SQL statements

(5.2) non-partition-key queries: locate the target library, or distribute to all libraries (map/reduce style)

(5.3) cross-library pagination

(5.3.1) rewrite the SQL and sort within the service layer

(5.3.2) introduce an auxiliary ID to reduce the data returned

(5.3.3) business compromise: allow approximate results

(5.3.4) query rewrite + two-phase query

That concludes the 58.com case study.

II. The 12 rules of Codd, the father of the database

In addition, let us review the 12 rules of E. F. Codd, the father of the relational database, as guidelines for database design:

  1. The information rule
    All information in a relational database is represented in exactly one way: as values in tables.
  2. The guaranteed access rule
    Every data item is guaranteed to be accessible through a combination of table name, primary key value, and column name.
  3. Systematic treatment of null values
    Null values are supported and handled in a systematic manner, independent of data type.
  4. Dynamic online catalog based on the relational model
    The database description is self-describing: it is represented at the logical level in the same way as ordinary data, i.e. the database must contain system tables describing its own structure, or that description must be available in tables the user can query.
  5. The comprehensive data sublanguage rule
    A relational system may support several languages and modes of terminal use, but there must be at least one language whose statements are expressible as strings in a well-defined syntax and that comprehensively supports data definition, view definition, data manipulation, integrity constraints, authorization, and transactions. (This language is SQL.)
  6. The view updating rule
    All views that are theoretically updatable are also updatable by the system.
  7. High-level insert, update, and delete
    The ability to treat a base or derived relation as a single operand applies not only to retrieval but also to insertion, update, and deletion; i.e. rows of data are treated as sets in insert, update, and delete operations.
  8. Physical data independence
    Applications and terminal activities remain logically unimpaired however the data's storage representation or access methods change.
  9. Logical data independence
    Applications and terminal activities remain logically unimpaired when information-preserving changes are made to the tables.
  10. Integrity independence
    Integrity constraints specific to a relational database must be definable in the relational data sublanguage and storable in the catalog, not in application programs.
  11. Distribution independence
    The RDBMS's data manipulation sublanguage must keep applications and terminal activities logically unimpaired whether the data is physically distributed or its distribution strategy changes at any time.
  12. The nonsubversion rule
    If the system supports a low-level (record-at-a-time) language, that language cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher-level (set-at-a-time) language; i.e. users cannot violate database constraints by any means.

Some additional lessons:

    • Reduce reliance on database-specific functionality
      Functionality should be implemented by the program, not by the DB. If functionality lives in the DB and a replacement DBMS turns out to be less capable than the previous system, the code has to be modified. To prevent this, let the program implement the functionality and let the database be responsible only for storing data, achieving the lowest coupling.
    • Principles for defining entity relationships
      When defining a relationship between one entity and another, consider the following:
      • Involvement: identify all the entities that participate in the relationship.
      • Ownership: consider whether one entity "owns" another.
      • Cardinality: consider how many instances of one entity are associated with an instance of the other.



Petter Liu
Source: http://www.cnblogs.com/wintersun/
Copyright of this article belongs to the author and cnblogs (Blog Garden). Reprinting is welcome, but unless the author consents otherwise, this statement must be retained and a prominent link to the original must be given on the article page; otherwise the author reserves the right to pursue legal liability.
The article is also published on my independent blog, the Petter Liu Blog.
