First, basic concepts
For MySQL at large data volumes, the following concepts need to be agreed on first
1) Library: not much to say, it is simply one database
2) Sharding: horizontal splitting to solve scalability problems, for example splitting a table by day
3) Replication and grouping (group) to solve availability problems
4) Sharding + grouping is what the architecture actually looks like at large data volumes
Second, common MySQL problems at large data volumes, and how to approach them
1) Common problems
How do we ensure availability?
How do we handle different read/write ratios?
How do we migrate data, add fields, and expand capacity smoothly?
How do we handle very large data volumes?
2) Solution ideas
2.1) Availability approach: replication
Read availability
Keep multiple read replicas, for example: 1 master, 2 replicas
Read from both the master and the replica, for example: 1 master, 1 replica
Write availability
Dual-master mode
Dual-master configured but used as master-slave (only one master takes writes at a time)
2.2) Read/write ratio: design for the workload
Read-heavy, write-light scenario: improve read performance, 3 common approaches (a routing sketch follows this list):
a) add indexes to improve read performance; what are the caveats?
b) read/write separation, scale out read replicas to increase read capacity
c) add a cache to extend read capacity
What problems do approaches a), b) and c) introduce?
How are those problems solved?
Read/write-balanced scenario: do not use a cache, consider horizontal splitting
Write-heavy, read-light scenario: do not use a cache, consider horizontal splitting
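A minimal sketch (an assumption, not from the original text) of service-layer read/write separation for approach b): one master, two read replicas, and a force_master flag so a read right after a write can hit the master and dodge replication lag.

    import random

    MASTER = "mysql-master:3306"
    REPLICAS = ["mysql-replica-1:3306", "mysql-replica-2:3306"]

    def route(sql: str, force_master: bool = False) -> str:
        """Send writes to the master and spread reads across replicas."""
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and not force_master:
            return random.choice(REPLICAS)   # read traffic scales with replica count
        return MASTER                        # writes (and forced reads) hit the master

    print(route("SELECT * FROM user WHERE uid = 1"))                      # a replica
    print(route("SELECT * FROM user WHERE uid = 1", force_master=True))   # the master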
2.3) Smooth migration [capacity expansion, adding fields, data migration]
Log-chasing scheme
a) record write logs
b) copy the data into the new library
c) copying complete
d) chase (replay) the logs
e) log chasing complete + data check
f) switch over to the new library
Dual-write scheme (see the sketch after this list)
a) the service writes both libraries
b) copy the data into the new library
c) copying + data check
d) switch over to the new library
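A minimal sketch of the dual-write scheme, with two dicts standing in for the old and new libraries; the structure (double-write, copy history, check, then switch) follows the steps above, everything else is illustrative.

    old_db, new_db = {}, {}

    def write(uid, row):
        # a) service double-write: every online write goes to both libraries
        old_db[uid] = row
        new_db[uid] = row

    def copy_history():
        # b) copy historical rows; setdefault never clobbers a row the
        #    double-write has already placed in the new library
        for uid, row in old_db.items():
            new_db.setdefault(uid, row)

    def check():
        # c) data check: both libraries must match before d) switching over
        return old_db == new_db

    old_db[1] = {"uname": "alice"}   # row that existed before the migration
    write(2, {"uname": "bob"})       # new traffic is double-written
    copy_history()
    assert check()                   # safe to switch reads/writes to new_db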
2.4) Large data volume: the solution is to split the library (sharding)
Third, database splitting in practice
Four types of scenarios cover 99% of splitting needs
a) "single key" scenario, how the user library is split: user (uid, xxoo)
b) "1-to-many" scenario, how the post library is split: tiezi (tid, uid, xxoo)
c) "many-to-many" scenario, how the friend library is split: friend (uid, friend_uid, xxoo)
d) "multi-key" scenario, how the order library is split: order (oid, buyer_id, seller_id, xxoo)
1) How the user library is split
User library, 1 billion rows
user (uid, uname, passwd, age, sex, create_time);
Business requirements are as follows
a) 1% are login requests: WHERE uname=xxx AND passwd=xxx
b) 99% are query requests: WHERE uid=xxx
Conclusion: the "single key" scenario splits the library by that single key (see the sketch below)
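A minimal sketch of splitting by the single key uid; the shard count and the uname-to-uid mapping used to keep the 1% login requests on a single shard are illustrative assumptions, not from the original text.

    N_SHARDS = 4

    def user_shard(uid: int) -> int:
        # 99% of requests carry uid, so uid alone picks the shard
        return uid % N_SHARDS

    def login_shard(uname: str, uname_to_uid: dict) -> int:
        # login requests only carry uname + passwd; a small uname -> uid
        # mapping lets them still land on exactly one shard
        return user_shard(uname_to_uid[uname])

    mapping = {"alice": 10001}
    print(user_shard(10001))               # shard 1
    print(login_shard("alice", mapping))   # the same shard 1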
2) How the post library is split
Post library, 1.5 billion rows
tiezi (tid, uid, title, content, time);
Business requirements are as follows
a) query post details (90% of requests)
SELECT * FROM tiezi WHERE tid=$tid
b) query all posts of a user (10% of requests)
SELECT * FROM tiezi WHERE uid=$uid
Conclusion: the "1-to-many" scenario splits by the "1" side; in the post library one uid owns many tids, so shard by uid and embed the library tag in the tid when it is generated (see the sketch below)
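A minimal sketch of embedding the library tag in the generated tid: the low bits of a new tid carry uid's shard number, so query a) by tid and query b) by uid both resolve to one shard. The bit widths and sequence source are illustrative assumptions.

    import itertools

    N_SHARDS = 8        # a power of two, so the low bits can act as the tag
    SHARD_BITS = 3
    _seq = itertools.count(1)

    def shard_by_uid(uid: int) -> int:
        return uid % N_SHARDS

    def gen_tid(uid: int) -> int:
        # high bits: sequence number; low SHARD_BITS bits: uid's shard tag
        return (next(_seq) << SHARD_BITS) | shard_by_uid(uid)

    def shard_by_tid(tid: int) -> int:
        # "post details" queries recover the shard from the tid itself
        return tid & (N_SHARDS - 1)

    tid = gen_tid(uid=10005)
    assert shard_by_tid(tid) == shard_by_uid(10005)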
3) How the friend library is split
Friend library, 100 million rows
friend (uid, friend_uid, nick, memo, xxoo);
Business requirements are as follows
a) query my friends (50% of requests), used for interface display
SELECT friend_uid FROM friend WHERE uid=$my_uid
b) query the users who added me as a friend (50% of requests), used for reverse notification
SELECT uid FROM friend WHERE friend_uid=$my_uid
Conclusion: the "many-to-many" scenario uses data redundancy: keep multiple copies of the data, each split by a different key (see the sketch below)
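A minimal sketch of the redundancy scheme: each friendship row is written twice, one copy sharded by uid for query a) and one by friend_uid for query b). The in-memory lists and shard count are illustrative; a real system would usually write the second copy asynchronously.

    N_SHARDS = 4
    friend_by_uid  = [[] for _ in range(N_SHARDS)]   # answers WHERE uid = ?
    friend_by_fuid = [[] for _ in range(N_SHARDS)]   # answers WHERE friend_uid = ?

    def add_friend(uid: int, friend_uid: int) -> None:
        row = {"uid": uid, "friend_uid": friend_uid}
        friend_by_uid[uid % N_SHARDS].append(row)            # redundant copy 1
        friend_by_fuid[friend_uid % N_SHARDS].append(row)    # redundant copy 2

    add_friend(1, 42)
    print(friend_by_uid[1 % N_SHARDS])     # query a) hits a single shard
    print(friend_by_fuid[42 % N_SHARDS])   # query b) also hits a single shard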
4) How the order library is split
Order library, 1 billion rows
order (oid, buyer_id, seller_id, order_info, xxoo);
Business requirements are as follows
a) query order information (80% of requests)
SELECT * FROM order WHERE oid=$oid
b) query what I bought (19% of requests)
SELECT * FROM order WHERE buyer_id=$my_uid
c) query what I sold (1% of requests)
SELECT * FROM order WHERE seller_id=$my_uid
Conclusion: the "multi-key" scenario generally has two schemes (see the sketch below)
a) Scheme one: combine the approaches of 2) and 3) above
b) Scheme two: serve the 1% of requests with multi-library (scatter-gather) queries
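A minimal sketch of one reading of Scheme one: shard by buyer_id, embed buyer_id's shard tag in the generated oid (the post-library idea), and keep a redundant copy keyed by seller_id (the friend-library idea); under Scheme two the seller queries would instead scatter to all shards. All names and widths are illustrative assumptions.

    import itertools

    N_SHARDS = 8
    SHARD_BITS = 3
    _seq = itertools.count(1)

    def gen_oid(buyer_id: int) -> int:
        # oid carries buyer_id's shard, so queries a) and b) each hit one shard
        return (next(_seq) << SHARD_BITS) | (buyer_id % N_SHARDS)

    def shard_by_oid(oid: int) -> int:
        return oid & (N_SHARDS - 1)

    def shard_by_seller(seller_id: int) -> int:
        # served from the redundant, seller-keyed copy of the order data
        return seller_id % N_SHARDS

    oid = gen_oid(buyer_id=777)
    assert shard_by_oid(oid) == 777 % N_SHARDS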
Fourth, working with the database after splitting, in practice
The problem after splitting: some of MySQL's SQL features no longer work across libraries
1) With massive data, how SQL should be used
Don't use these:
a) joins of every kind
b) subqueries
c) triggers
d) user-defined functions
e) and use "transactions" sparingly
Reason: they have a significant impact on database performance
2) After splitting, how to handle IN queries
How the user library handles IN queries on uid
user (uid, uname, passwd, age, sex, photo, create_time, ...);
Partition key: uid
Query requirement: IN query: WHERE uid IN (1,2,3,4,5,6)
Solution: the service layer does the distribution and merging
Scheme one: distribute the query to every library directly
Scheme two: assemble a different SQL statement for each library involved (see the sketch below)
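A minimal sketch of Scheme two: the service groups the uids by shard and assembles one IN query per shard, then merges the result sets; the shard count is an illustrative assumption.

    from collections import defaultdict

    N_SHARDS = 4

    def build_shard_queries(uids):
        by_shard = defaultdict(list)
        for uid in uids:
            by_shard[uid % N_SHARDS].append(uid)
        # one rewritten SQL per shard that actually holds some of the uids
        return {shard: "SELECT * FROM user WHERE uid IN (%s)"
                       % ",".join(str(u) for u in ids)
                for shard, ids in by_shard.items()}

    for shard, sql in build_shard_queries([1, 2, 3, 4, 5, 6]).items():
        print(shard, sql)   # the service runs each SQL on its shard and merges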
3) After splitting, how to handle queries that do not use the partition key
Scenario one: the business does not care which library the data comes from, so locating any single library is enough
For example: query some user who has an avatar
Scenario two: the result set has only one row; the business layer distributes the query and only one library returns a record (see the sketch below)
For example: at login, query by uname and passwd
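A minimal sketch of Scenario two: the login query is broadcast to every shard, and because uname is unique at most one shard returns a row. The fake shard contents are illustrative.

    N_SHARDS = 4
    shards = [
        [{"uid": 8, "uname": "alice", "passwd": "x"}],   # shard 0
        [], [], [],                                      # shards 1-3
    ]

    def login(uname: str, passwd: str):
        # service-layer distribution: ask every shard, keep the single hit
        for rows in shards:
            for row in rows:
                if row["uname"] == uname and row["passwd"] == passwd:
                    return row
        return None

    print(login("alice", "x"))   # exactly one shard can answer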
4) After splitting, how to paginate across libraries
The abstract problem: ORDER BY xxx OFFSET xxx LIMIT xxx
a) sort by time
b) 100 records per page
c) fetch the records of page 100
Single-machine solution
ORDER BY time OFFSET 10000 LIMIT 100
The difficulty after splitting: the global offset can no longer be determined
The traditional solution after splitting: query rewriting + in-memory sorting (see the sketch below)
a) each library runs ORDER BY time OFFSET 0 LIMIT 10000+100
b) sort the 20,200 returned records (assuming two libraries)
c) return records 10,000 to 10,100
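A minimal sketch of the traditional rewrite with two in-memory "shards" standing in for real libraries: each shard returns its first offset+limit rows, and the service merges, sorts and slices.

    import heapq

    def page_across_shards(shards, offset, limit):
        # a) each shard runs: ORDER BY time OFFSET 0 LIMIT offset+limit
        partials = [sorted(rows, key=lambda r: r["time"])[:offset + limit]
                    for rows in shards]
        # b) merge-sort everything returned (e.g. 2 x 10100 = 20200 rows)
        merged = list(heapq.merge(*partials, key=lambda r: r["time"]))
        # c) keep only the global records offset .. offset+limit
        return merged[offset:offset + limit]

    shard_a = [{"time": t} for t in range(0, 100, 2)]
    shard_b = [{"time": t} for t in range(1, 100, 2)]
    print(page_across_shards([shard_a, shard_b], offset=10, limit=5))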
Optimization scheme one: add an auxiliary id to reduce the amount queried
a) technically, introduce a special id as a query condition (or carry over the sort value of the previous page)
b) on the business side, forbid cross-page jumps as far as possible
Single-library case
a) first page: query directly
b) take max(id)=123 from the first page (usually the last record)
c) second page: query with id>123: WHERE id>123 LIMIT 100
Multi-library case (see the sketch below)
a) distribute WHERE id>xxx LIMIT 100 to every library
b) sort the 300 returned results (assuming three libraries)
c) return the first 100
Advantage: global sorting is avoided; only a small number of records are sorted
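A minimal sketch of the multi-library case, with plain integers standing in for ids: each shard returns up to 100 rows after the last id of the previous page, the few hundred candidates are sorted, and the first 100 become the next page.

    def next_page(shards, last_id, page_size=100):
        candidates = []
        for rows in shards:
            # a) distribute: WHERE id > last_id ORDER BY id LIMIT page_size
            candidates += sorted(r for r in rows if r > last_id)[:page_size]
        # b) sort the ~3 * page_size candidates, c) keep the first page_size
        return sorted(candidates)[:page_size]

    shards = [list(range(0, 300, 3)),
              list(range(1, 300, 3)),
              list(range(2, 300, 3))]
    page1 = next_page(shards, last_id=-1)
    page2 = next_page(shards, last_id=page1[-1])
    print(page2[:5])   # continues exactly where page1 stopped, no global offset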
Optimization scheme two: approximate paging
a) on the business side: forbid queries beyond page XX
b) on the business side: allow approximate results; is the exactness of page 100 really that important?
Optimization scheme three: the ultimate solution, query rewriting plus a two-step query
Schemes one and two both compromise on the business side: the former forbids cross-page jumps, the latter loses precision. The ultimate solution for paging across libraries is to rewrite the ORDER BY + OFFSET + LIMIT query and run it in two steps: roughly, a first round of rewritten per-library queries finds a global boundary value, and a second round around that boundary lets the service compute exact global offsets and return a precise page.
Fifth, summary
Concepts
Library, shard, replication, group
"Common problems and solution ideas"
1) Availability: the solution is redundancy (replication)
2) Read/write ratio
2.1) Read-heavy, write-light: improve read performance with replicas, caches and indexes
2.2) The business layer forces reads of the master to solve replica inconsistency
2.3) Double cache eviction to solve cache inconsistency
2.4) Read/write balanced, or write-heavy and read-light: do not use a cache, consider horizontal splitting
3) Smooth migration
3.1) Write logs and chase (replay) the data
3.2) Dual write
4) Large data volume: the solution is sharding (splitting the library)
Four ways of thinking about splitting
1) User library: the "single key" scenario splits by that single key
2) Post library: the "1-to-many" scenario splits by the "1" side; one uid owns many tids, so shard by uid and embed the library tag in the generated tid
3) Friend library: the "many-to-many" scenario uses data redundancy, with each copy split by a different key
4) Order library: the "multi-key" scenario generally has two schemes
4.1) Scheme one: combine the approaches of 2) and 3)
4.2) Scheme two: serve the 1% of requests with multi-library queries
"Post-demolition business"
1) Don't play like this: Federated queries, subqueries, triggers, user-defined functions, Kua-
2) in query how to play
2.1) Distribute Mr
2.2) Assemble into different SQL statements
3) non-partition key query how to play
3.1) Locate a library
3.2) Distribute Mr
4) How to play the library page
4.1) Modify the SQL statement, sort within the service
4.2) Introduce a special ID to reduce the number of returns
4.3) business optimization, prohibit cross-page query, allow fuzzy query
Teach you to be a great DBA (at large data volumes)