First, basic concepts
For MySQL at large data volumes, the following concepts need to be agreed on first
1) Library: not much to say, it is simply one database
2) Sharding: horizontal splitting to solve scalability problems, for example splitting a table by day
3) Replication and grouping (group) to solve availability problems
4) Sharding + grouping is what the architecture actually looks like at large data volumes
Second, common MySQL problems at large data volumes, and how to approach them
1) Common problems
How do we ensure availability?
How do we handle different read/write ratios?
How do we migrate data, add fields, and expand capacity smoothly?
How do we handle very large data volumes?
2) Solution ideas
2.1) Availability approach: replication
Read availability
Keep multiple read replicas, for example: 1 master, 2 replicas
Read from both the master and the replica, for example: 1 master, 1 replica
Write availability
Dual-master mode
Dual-master configured but used as master-slave (only one master takes writes at a time)
2.2) Read/write ratio: design for the workload
Read-heavy, write-light scenario: improve read performance, 3 common approaches (a routing sketch follows this list):
a) add indexes to improve read performance; what are the caveats?
b) read/write separation, scale out read replicas to increase read capacity
c) add a cache to extend read capacity
What problems do approaches a), b) and c) introduce?
How are those problems solved?
Read/write-balanced scenario: do not use a cache, consider horizontal splitting
Write-heavy, read-light scenario: do not use a cache, consider horizontal splitting
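A minimal sketch (an assumption, not from the original text) of service-layer read/write separation for approach b): one master, two read replicas, and a force_master flag so a read right after a write can hit the master and dodge replication lag.

    import random

    MASTER = "mysql-master:3306"
    REPLICAS = ["mysql-replica-1:3306", "mysql-replica-2:3306"]

    def route(sql: str, force_master: bool = False) -> str:
        """Send writes to the master and spread reads across replicas."""
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and not force_master:
            return random.choice(REPLICAS)   # read traffic scales with replica count
        return MASTER                        # writes (and forced reads) hit the master

    print(route("SELECT * FROM user WHERE uid = 1"))                      # a replica
    print(route("SELECT * FROM user WHERE uid = 1", force_master=True))   # the master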
2.3) Smooth migration [capacity expansion, adding fields, data migration]
Log-chasing scheme
a) record write logs
b) copy the data into the new library
c) copying complete
d) chase (replay) the logs
e) log chasing complete + data check
f) switch over to the new library
Dual-write scheme (see the sketch after this list)
a) the service writes both libraries
b) copy the data into the new library
c) copying + data check
d) switch over to the new library
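A minimal sketch of the dual-write scheme, with two dicts standing in for the old and new libraries; the structure (double-write, copy history, check, then switch) follows the steps above, everything else is illustrative.

    old_db, new_db = {}, {}

    def write(uid, row):
        # a) service double-write: every online write goes to both libraries
        old_db[uid] = row
        new_db[uid] = row

    def copy_history():
        # b) copy historical rows; setdefault never clobbers a row the
        #    double-write has already placed in the new library
        for uid, row in old_db.items():
            new_db.setdefault(uid, row)

    def check():
        # c) data check: both libraries must match before d) switching over
        return old_db == new_db

    old_db[1] = {"uname": "alice"}   # row that existed before the migration
    write(2, {"uname": "bob"})       # new traffic is double-written
    copy_history()
    assert check()                   # safe to switch reads/writes to new_db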
2.4) Large data volume: the solution is to split the library (sharding)
Third, database splitting in practice
Four types of scenarios cover 99% of splitting needs
a) "single key" scenario, how the user library is split: user (uid, xxoo)
b) "1-to-many" scenario, how the post library is split: tiezi (tid, uid, xxoo)
c) "many-to-many" scenario, how the friend library is split: friend (uid, friend_uid, xxoo)
d) "multi-key" scenario, how the order library is split: order (oid, buyer_id, seller_id, xxoo)
1) How the user library is split
User library, 1 billion rows
user (uid, uname, passwd, age, sex, create_time);
Business requirements are as follows
a) 1% are login requests: WHERE uname=xxx AND passwd=xxx
b) 99% are query requests: WHERE uid=xxx
Conclusion: the "single key" scenario splits the library by that single key (see the sketch below)
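A minimal sketch of splitting by the single key uid; the shard count and the uname-to-uid mapping used to keep the 1% login requests on a single shard are illustrative assumptions, not from the original text.

    N_SHARDS = 4

    def user_shard(uid: int) -> int:
        # 99% of requests carry uid, so uid alone picks the shard
        return uid % N_SHARDS

    def login_shard(uname: str, uname_to_uid: dict) -> int:
        # login requests only carry uname + passwd; a small uname -> uid
        # mapping lets them still land on exactly one shard
        return user_shard(uname_to_uid[uname])

    mapping = {"alice": 10001}
    print(user_shard(10001))               # shard 1
    print(login_shard("alice", mapping))   # the same shard 1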
2) How the post library is split
Post library, 1.5 billion rows
tiezi (tid, uid, title, content, time);
Business requirements are as follows
a) query post details (90% of requests)
SELECT * FROM tiezi WHERE tid=$tid
b) query all posts of a user (10% of requests)
SELECT * FROM tiezi WHERE uid=$uid
Conclusion: the "1-to-many" scenario splits by the "1" side; in the post library one uid owns many tids, so shard by uid and embed the library tag in the tid when it is generated (see the sketch below)
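A minimal sketch of embedding the library tag in the generated tid: the low bits of a new tid carry uid's shard number, so query a) by tid and query b) by uid both resolve to one shard. The bit widths and sequence source are illustrative assumptions.

    import itertools

    N_SHARDS = 8        # a power of two, so the low bits can act as the tag
    SHARD_BITS = 3
    _seq = itertools.count(1)

    def shard_by_uid(uid: int) -> int:
        return uid % N_SHARDS

    def gen_tid(uid: int) -> int:
        # high bits: sequence number; low SHARD_BITS bits: uid's shard tag
        return (next(_seq) << SHARD_BITS) | shard_by_uid(uid)

    def shard_by_tid(tid: int) -> int:
        # "post details" queries recover the shard from the tid itself
        return tid & (N_SHARDS - 1)

    tid = gen_tid(uid=10005)
    assert shard_by_tid(tid) == shard_by_uid(10005)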
3) How the friend library is split
Friend library, 100 million rows
friend (uid, friend_uid, nick, memo, xxoo);
Business requirements are as follows
a) query my friends (50% of requests), used for interface display
SELECT friend_uid FROM friend WHERE uid=$my_uid
b) query the users who added me as a friend (50% of requests), used for reverse notification
SELECT uid FROM friend WHERE friend_uid=$my_uid
Conclusion: the "many-to-many" scenario uses data redundancy: keep multiple copies of the data, each split by a different key (see the sketch below)
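A minimal sketch of the redundancy scheme: each friendship row is written twice, one copy sharded by uid for query a) and one by friend_uid for query b). The in-memory lists and shard count are illustrative; a real system would usually write the second copy asynchronously.

    N_SHARDS = 4
    friend_by_uid  = [[] for _ in range(N_SHARDS)]   # answers WHERE uid = ?
    friend_by_fuid = [[] for _ in range(N_SHARDS)]   # answers WHERE friend_uid = ?

    def add_friend(uid: int, friend_uid: int) -> None:
        row = {"uid": uid, "friend_uid": friend_uid}
        friend_by_uid[uid % N_SHARDS].append(row)            # redundant copy 1
        friend_by_fuid[friend_uid % N_SHARDS].append(row)    # redundant copy 2

    add_friend(1, 42)
    print(friend_by_uid[1 % N_SHARDS])     # query a) hits a single shard
    print(friend_by_fuid[42 % N_SHARDS])   # query b) also hits a single shard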
4) How the order library is split
Order library, 1 billion rows
order (oid, buyer_id, seller_id, order_info, xxoo);
Business requirements are as follows
a) query order information (80% of requests)
SELECT * FROM order WHERE oid=$oid
b) query what I bought (19% of requests)
SELECT * FROM order WHERE buyer_id=$my_uid
c) query what I sold (1% of requests)
SELECT * FROM order WHERE seller_id=$my_uid
Conclusion: the "multi-key" scenario generally has two schemes (see the sketch below)
a) Scheme one: combine the approaches of 2) and 3) above
b) Scheme two: serve the 1% of requests with multi-library (scatter-gather) queries
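A minimal sketch of one reading of Scheme one: shard by buyer_id, embed buyer_id's shard tag in the generated oid (the post-library idea), and keep a redundant copy keyed by seller_id (the friend-library idea); under Scheme two the seller queries would instead scatter to all shards. All names and widths are illustrative assumptions.

    import itertools

    N_SHARDS = 8
    SHARD_BITS = 3
    _seq = itertools.count(1)

    def gen_oid(buyer_id: int) -> int:
        # oid carries buyer_id's shard, so queries a) and b) each hit one shard
        return (next(_seq) << SHARD_BITS) | (buyer_id % N_SHARDS)

    def shard_by_oid(oid: int) -> int:
        return oid & (N_SHARDS - 1)

    def shard_by_seller(seller_id: int) -> int:
        # served from the redundant, seller-keyed copy of the order data
        return seller_id % N_SHARDS

    oid = gen_oid(buyer_id=777)
    assert shard_by_oid(oid) == 777 % N_SHARDS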
Fourth, working with the database after splitting, in practice
The problem after splitting: some of MySQL's SQL features no longer work across libraries
1) With massive data, how SQL should be used
Don't use these:
a) joins of every kind
b) subqueries
c) triggers
d) user-defined functions
e) and use "transactions" sparingly
Reason: they have a significant impact on database performance
2) After splitting, how to handle IN queries
How the user library handles IN queries on uid
user (uid, uname, passwd, age, sex, photo, create_time, ...);
Partition key: uid
Query requirement: IN query: WHERE uid IN (1,2,3,4,5,6)
Solution: the service layer does the distribution and merging
Scheme one: distribute the query to every library directly
Scheme two: assemble a different SQL statement for each library involved (see the sketch below)
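A minimal sketch of Scheme two: the service groups the uids by shard and assembles one IN query per shard, then merges the result sets; the shard count is an illustrative assumption.

    from collections import defaultdict

    N_SHARDS = 4

    def build_shard_queries(uids):
        by_shard = defaultdict(list)
        for uid in uids:
            by_shard[uid % N_SHARDS].append(uid)
        # one rewritten SQL per shard that actually holds some of the uids
        return {shard: "SELECT * FROM user WHERE uid IN (%s)"
                       % ",".join(str(u) for u in ids)
                for shard, ids in by_shard.items()}

    for shard, sql in build_shard_queries([1, 2, 3, 4, 5, 6]).items():
        print(shard, sql)   # the service runs each SQL on its shard and merges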
3) After splitting, how to handle queries that do not use the partition key
Scenario one: the business does not care which library the data comes from, so locating any single library is enough
For example: query some user who has an avatar
Scenario two: the result set has only one row; the business layer distributes the query and only one library returns a record (see the sketch below)
For example: at login, query by uname and passwd
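A minimal sketch of Scenario two: the login query is broadcast to every shard, and because uname is unique at most one shard returns a row. The fake shard contents are illustrative.

    N_SHARDS = 4
    shards = [
        [{"uid": 8, "uname": "alice", "passwd": "x"}],   # shard 0
        [], [], [],                                      # shards 1-3
    ]

    def login(uname: str, passwd: str):
        # service-layer distribution: ask every shard, keep the single hit
        for rows in shards:
            for row in rows:
                if row["uname"] == uname and row["passwd"] == passwd:
                    return row
        return None

    print(login("alice", "x"))   # exactly one shard can answer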
4) After splitting, how to paginate across libraries
The abstract problem: ORDER BY xxx OFFSET xxx LIMIT xxx
a) sort by time
b) 100 records per page
c) fetch the records of page 100
Single-machine solution
ORDER BY time OFFSET 10000 LIMIT 100
The difficulty after splitting: the global offset can no longer be determined
The traditional solution after splitting: query rewriting + in-memory sorting (see the sketch below)
a) each library runs ORDER BY time OFFSET 0 LIMIT 10000+100
b) sort the 20,200 returned records (assuming two libraries)
c) return records 10,000 to 10,100
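A minimal sketch of the traditional rewrite with two in-memory "shards" standing in for real libraries: each shard returns its first offset+limit rows, and the service merges, sorts and slices.

    import heapq

    def page_across_shards(shards, offset, limit):
        # a) each shard runs: ORDER BY time OFFSET 0 LIMIT offset+limit
        partials = [sorted(rows, key=lambda r: r["time"])[:offset + limit]
                    for rows in shards]
        # b) merge-sort everything returned (e.g. 2 x 10100 = 20200 rows)
        merged = list(heapq.merge(*partials, key=lambda r: r["time"]))
        # c) keep only the global records offset .. offset+limit
        return merged[offset:offset + limit]

    shard_a = [{"time": t} for t in range(0, 100, 2)]
    shard_b = [{"time": t} for t in range(1, 100, 2)]
    print(page_across_shards([shard_a, shard_b], offset=10, limit=5))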
Optimization scheme one: add an auxiliary id to reduce the amount queried
a) technically, introduce a special id as a query condition (or carry over the sort value of the previous page)
b) on the business side, forbid cross-page jumps as far as possible
Single-library case
a) first page: query directly
b) take max(id)=123 from the first page (usually the last record)
c) second page: query with id>123: WHERE id>123 LIMIT 100
Multi-library case (see the sketch below)
a) distribute WHERE id>xxx LIMIT 100 to every library
b) sort the 300 returned results (assuming three libraries)
c) return the first 100
Advantage: global sorting is avoided; only a small number of records are sorted
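A minimal sketch of the multi-library case, with plain integers standing in for ids: each shard returns up to 100 rows after the last id of the previous page, the few hundred candidates are sorted, and the first 100 become the next page.

    def next_page(shards, last_id, page_size=100):
        candidates = []
        for rows in shards:
            # a) distribute: WHERE id > last_id ORDER BY id LIMIT page_size
            candidates += sorted(r for r in rows if r > last_id)[:page_size]
        # b) sort the ~3 * page_size candidates, c) keep the first page_size
        return sorted(candidates)[:page_size]

    shards = [list(range(0, 300, 3)),
              list(range(1, 300, 3)),
              list(range(2, 300, 3))]
    page1 = next_page(shards, last_id=-1)
    page2 = next_page(shards, last_id=page1[-1])
    print(page2[:5])   # continues exactly where page1 stopped, no global offset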
Optimization scheme two: approximate paging
a) on the business side: forbid queries beyond page XX
b) on the business side: allow approximate results; is the exactness of page 100 really that important?
Optimization scheme three: the ultimate solution, query rewriting plus a two-step query
Schemes one and two both compromise on the business side: the former forbids cross-page jumps, the latter loses precision. The ultimate solution for paging across libraries is to rewrite the ORDER BY + OFFSET + LIMIT query and run it in two steps: roughly, a first round of rewritten per-library queries finds a global boundary value, and a second round around that boundary lets the service compute exact global offsets and return a precise page.
Fifth, summary
Concepts
Library, shard, replication, group
"Common problems and solution ideas"
1) Availability: the solution is redundancy (replication)
2) Read/write ratio
2.1) Read-heavy, write-light: improve read performance with replicas, caches and indexes
2.2) The business layer forces reads of the master to solve replica inconsistency
2.3) Double cache eviction to solve cache inconsistency
2.4) Read/write balanced, or write-heavy and read-light: do not use a cache, consider horizontal splitting
3) Smooth migration
3.1) Write logs and chase (replay) the data
3.2) Dual write
4) Large data volume: the solution is sharding (splitting the library)
Four ways of thinking about splitting
1) User library: the "single key" scenario splits by that single key
2) Post library: the "1-to-many" scenario splits by the "1" side; one uid owns many tids, so shard by uid and embed the library tag in the generated tid
3) Friend library: the "many-to-many" scenario uses data redundancy, with each copy split by a different key
4) Order library: the "multi-key" scenario generally has two schemes
4.1) Scheme one: combine the approaches of 2) and 3)
4.2) Scheme two: serve the 1% of requests with multi-library queries
"Post-demolition business"
1) Don't play like this: Federated queries, subqueries, triggers, user-defined functions, Kua-
2) in query how to play
2.1) Distribute Mr
2.2) Assemble into different SQL statements
3) non-partition key query how to play
3.1) Locate a library
3.2) Distribute Mr
4) How to play the library page
4.1) Modify the SQL statement, sort within the service
4.2) Introduce a special ID to reduce the number of returns
4.3) business optimization, prohibit cross-page query, allow fuzzy query
Teach you to be a great DBA (at large data volumes)