Distributed Database 2


First, the design of distributed database systems

1. Fragmentation (Shard) Design

In the design of a distributed database system, the most fundamental problem is data distribution: how to divide the global data both logically and physically. The logical division is called fragmentation (sharding); the mapping of fragments onto physical sites is called allocation. Two general design strategies are used: top-down and bottom-up. The top-down approach starts from the global schema at the highest level of abstraction and refines it step by step down to the smallest units; the bottom-up approach starts from existing local databases and integrates and optimizes them upward into a global schema. Which strategy to use depends on the characteristics and requirements of the particular system.

Data distribution design therefore consists of fragmentation design and fragment allocation: the former is the logical division of the global schema, and the latter maps each fragment to an appropriate physical site. Tuples with the "same properties" are grouped together by horizontal fragmentation, while attributes of the "same nature" are grouped together by vertical fragmentation; each such group forms a fragment. The advantages of fragmentation are: (1) reduced network traffic; (2) increased locality of transaction processing; (3) improved data availability and query efficiency; (4) better load balancing.

Fragments can be classified as horizontal, vertical, mixed (hybrid), and derived (induced) fragments. Fragmentation must satisfy three correctness rules: completeness, reconstructability, and disjointness. Completeness means every data item of the global relation must be assigned to some fragment; reconstructability means the global schema can be rebuilt from all of its fragments; disjointness means the intersection of any two fragments is empty. Horizontal fragmentation applies selections to a relation, grouping its tuples into local logical units; the global relation is reconstructed as the union of these groups, and semi-joins are commonly used to reduce the amount of information that must be communicated. Vertical fragmentation applies projections to the attributes of a relation; each fragment must retain the primary key so that the relation can be reconstructed by join. Vertical fragmentation reduces the cost of user queries that touch only some attributes and is usually derived with clustering algorithms. Mixed fragmentation applies horizontal and vertical fragmentation in combination. Derived (induced) fragmentation, also called relational fragmentation, is a form of horizontal fragmentation in which a relation is partitioned not on its own attributes but on the attributes of another, related relation; it is computed with a semi-join, that is, a projection of one relation followed by a natural join with the other. Fragmentation schemes can be represented graphically or with a fragmentation tree, both of which support visual analysis.
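As a compact restatement of these rules (the relation name R, predicates p_i, attribute groups A_j, and primary key K are notation introduced here for illustration, not from the original), the three basic fragment types and their reconstruction can be sketched in relational algebra:

```latex
% Horizontal fragments: selections whose predicates p_i partition R
R_i = \sigma_{p_i}(R), \qquad R = R_1 \cup R_2 \cup \dots \cup R_n
% Vertical fragments: projections that all keep the primary key K
R_j = \Pi_{K,\,A_j}(R), \qquad R = R_1 \bowtie R_2 \bowtie \dots \bowtie R_m
% Derived (induced) fragment of S, driven by a horizontal fragment R_i of R
S_i = S \ltimes R_i
```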

2. Allocation Design

Allocation design takes the fragments produced by the fragmentation design and maps them onto physical storage sites. Allocation can be non-replicated or replicated; the former keeps no redundant copies, the latter does, and replication may be partial or full. Replicated allocation increases the locality of read-only transactions and improves the reliability and availability of the system, but it raises operation and maintenance costs. Fully partitioned, non-replicated allocation can balance the system load and reduce operation and maintenance costs, but it lowers transaction locality, reliability, and availability. A practical design therefore has to strike a compromise between the two.

Allocation decisions should weigh the actual characteristics of the data, the application requirements, and the storage and processing costs at each site.

Second, distributed database query processing and optimization

1. Distributed Query Processing and Optimization

Query processing is a core function of database management. Compared with centralized query processing, distributed query processing has four stages: query decomposition, data localization, query access optimization, and local query optimization. The factors that determine query cost are network transmission delay, local I/O cost, and CPU cost; of these, network cost and local I/O cost usually dominate. A geographically remote (wide-area) distributed database mainly has to consider communication delay, while a nearby (local-area) distributed database mainly has to consider I/O cost.

At the processing level, query optimization can be divided into global control and local control. Global control covers query decomposition, data localization, and query access optimization; local control covers local query optimization and the choice of local execution strategy. Query decomposition transforms the query into a relational algebra expression over the global schema and eliminates redundant expressions; it includes four steps: query normalization, query analysis, query simplification, and query rewriting. Data localization then rewrites the global query into a query over fragments, based on the fragmentation and allocation schemas; its output, the fragment query, is the input of the next, cost-based stage. Query access optimization is the global optimization stage: using database statistics, it optimizes the fragment-level relational algebra query, mainly with respect to communication cost and the choice of execution sites, and produces a query plan annotated with inter-fragment communication operators. Finally, local query optimization runs at each site: the site chooses which copies to access and, as in a centralized DBMS, selects the optimal physical plan for the logical plan over its local schema so that local response time and local I/O are minimized. Compared with a centralized database, the data localization step is unique to the distributed case; it is driven by fragmentation and allocation and is therefore a key design point that does not exist in centralized systems.
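A small worked example of data localization and reduction may help (the relation R, the city attribute, and the constant 'Beijing' are assumptions made up for illustration, not from the original): the global relation is replaced by the union of its fragments, and fragments whose predicates contradict the query predicate are eliminated.

```latex
% Assumed fragmentation: R is split horizontally on attribute city
R_1 = \sigma_{city = 'Beijing'}(R), \qquad R_2 = \sigma_{city \neq 'Beijing'}(R), \qquad R = R_1 \cup R_2
% Localization, then reduction of the query \sigma_{city='Beijing'}(R)
\sigma_{city='Beijing'}(R) = \sigma_{city='Beijing'}(R_1 \cup R_2)
  = \sigma_{city='Beijing'}(R_1) \cup \sigma_{city='Beijing'}(R_2) = R_1
```

Only the site holding R_1 needs to be accessed; the branch over R_2 is empty and is dropped by the optimizer.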

As noted above, the factors that affect the efficiency of a distributed query are network communication cost, local I/O cost, and CPU cost. Distributed query optimization therefore targets communication cost first and local I/O cost second; the latter two are also the optimization targets of a centralized database system. On a high-speed LAN the performance constraints are essentially the same as in a centralized database, whereas over a wide-area network the dominant factor is communication cost.

Distributed queries can be classified into local queries, remote queries, and global queries, according to which sites are involved relative to the site that issues the query.


2. Access optimization for distributed queries

Because of the importance of distributed queries, this section looks more closely at access optimization. Distributed access optimization mainly generates a query strategy annotated with inter-fragment communication operators and decides where joins are executed and how data is transmitted. The main steps are: (1) a query command is issued from the query site; (2) data is obtained from the source sites; (3) the best execution site is determined; (4) the result is returned. The key decisions are the choice of source sites and the choice of execution site. Some strategies for this optimization are: (1) when choosing physical copies, prefer the site holding the fewest copies and the smallest amount of data, prefer the copy with the smallest network communication cost, and execute binary operations at the site that already holds the data, so as to reduce data transmission; (2) determine the best execution order of the operators in the fragment query expression, again so as to reduce the amount of intermediate data; (3) choose the best method for executing each operator. The two broad trends in optimizing distributed joins are: over a WAN, use semi-joins to reduce the amount of data shipped over the network and thus the transmission cost; over a LAN, use direct joins to reduce local processing cost.

In the query cost model, a centralized system counts CPU cost and I/O cost, while a distributed system adds communication cost, which is often the dominant component. The communication cost itself is expressed in terms of the number of messages initiated and the amount of data transmitted, each multiplied by a corresponding unit cost. This is the query cost model of a distributed database system.
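A commonly used textbook form of this cost model, consistent with the description above (the symbol names C_CPU, C_I/O, C_MSG, C_TR, and X are notation introduced here, not from the original), is:

```latex
\text{Total cost} = C_{CPU}\cdot \#insts \;+\; C_{I/O}\cdot \#I\!/\!Os
                    \;+\; C_{MSG}\cdot \#msgs \;+\; C_{TR}\cdot X
```

Here C_MSG is the fixed cost of initiating one message, C_TR is the cost of transmitting one unit of data, and X is the total amount of data transmitted; the last two terms together are the communication cost, and in a centralized system they are absent.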

Join optimization takes two forms: semi-join and direct join. A semi-join projects one relation onto the join attribute, ships only that projection, and uses it to filter the other relation before the natural join, reducing the amount of data selected and transmitted; this improves local processing but increases the number of messages exchanged. For two relations stored at different sites, the communication cost is given by an expression (missing in the original) in terms of the tuple length, the length of the projected attribute, and the cardinality (the number of tuples remaining after projection); the actual calculation is more complex. Direct join methods include the nested-loop join, the sort-merge join, the hash join, and index-based joins. In a nested-loop join, one relation is scanned in an outer loop and the other in an inner loop, and matching tuples are concatenated; the smaller relation should be chosen as the one scanned repeatedly. In a sort-merge join, both relations are sorted on the join attribute in the same order and the sorted sublists are then merged, which saves repeated reads and writes of the relations. In a hash join, the same hash function is applied to the join attribute of both relations; tuples with equal join-key values land in the same bucket, and tuples are joined within each bucket. Index-based joins use an index on the join attribute to speed up matching. In addition, artificial intelligence and computational intelligence methods such as genetic algorithms, simulated annealing, and dynamic programming are applied to join ordering and can significantly affect query speed.
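Since the expression itself is missing, here is a hedged sketch of the usual comparison the paragraph alludes to, for relations R and S at different sites with join attribute A (the notation follows the cost model above and is introduced here, not taken from the original):

```latex
% Direct approach: ship all of R to the site of S and join there
Cost_{direct} \approx C_{MSG} + C_{TR}\cdot size(R)
% Semi-join approach: ship \Pi_A(S) to R's site, ship back the reduced R \ltimes S
Cost_{semi} \approx 2\,C_{MSG} + C_{TR}\cdot\bigl(size(\Pi_A(S)) + size(R \ltimes S)\bigr)
% with size(R) = length(R)\cdot card(R), and similarly for the other operands
```

The semi-join pays off when the data it avoids shipping outweighs the extra messages, which is typical over a WAN. As one concrete instance of the direct-join methods listed above, a minimal hash-join sketch in Python follows (relation contents and attribute names are invented for illustration):

```python
from collections import defaultdict

def hash_join(r, s, r_key, s_key):
    """Hash join of two lists of dicts on r_key == s_key: build a hash
    table on the smaller relation, then probe it with the other one."""
    build, probe = (r, s) if len(r) <= len(s) else (s, r)
    build_key, probe_key = (r_key, s_key) if build is r else (s_key, r_key)

    # Build phase: bucket the smaller relation by its join-key value.
    buckets = defaultdict(list)
    for tup in build:
        buckets[tup[build_key]].append(tup)

    # Probe phase: tuples with equal join-key values fall in the same
    # bucket and are concatenated.
    for tup in probe:
        for match in buckets.get(tup[probe_key], []):
            yield {**match, **tup}

# Hypothetical fragments used only to exercise the sketch.
emp = [{"eno": 1, "name": "a"}, {"eno": 2, "name": "b"}]
asg = [{"eno": 1, "proj": "P1"}, {"eno": 1, "proj": "P2"}]
print(list(hash_join(emp, asg, "eno", "eno")))
```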


Third, distributed database transaction processing, concurrency control, and failure recovery

1. Centralized transaction processing, concurrency control, and failure recovery

A transaction is a sequence of operations that either all take effect or none do; it is an indivisible whole. A centralized database transaction has the ACID properties: atomicity, consistency, isolation, and durability, and concurrent execution must be serializable. Failures fall into four types: transaction failures, system failures, media failures, and communication failures. Transaction failures may be predictable or unpredictable and are recovered mainly with the log plus undo. System failures are recovered with redo and/or undo. Media failures are handled with backup copies plus the log: restore the backup, then redo from the log. Communication failures mainly include network partition and message loss. Transaction, system, and communication failures are collectively called soft failures, while media failures are called hard failures. Concurrency control deals mainly with the inconsistencies that concurrent execution can introduce into the database; the main technique is locking, which involves two kinds of locks on resources, and improper use of locks can cause deadlock. How to avoid or handle deadlock is the core problem studied in this area.

2. Distributed transaction management, concurrency control, and failure recovery

In distributed transaction management, the concept and properties of a transaction are the same as in the centralized case, with additional distributed characteristics. Beyond the ACID properties, the global database and the local databases must be kept consistent across global and local transactions. Execution characteristics: there is a control (coordinator) process that coordinates the execution of the sub-transactions. Operation characteristics: in addition to the sequence of data access operations, there are large numbers of communication primitives. Control messages: extra control messages are needed to coordinate the sub-transactions. The local transaction manager (LTM) of a distributed database system can therefore reuse the transaction management and recovery mechanisms provided by each local site to realize the atomicity of distributed transactions, which is equivalent to centralized transaction management plus transaction recovery at each site. Distributed systems have two implementation models, the process model and the server model, and several control models, such as master-slave, triangular, and hierarchical. The goals of distributed transaction management are high execution efficiency, reliability, and concurrency.

The commit protocol for distributed transactions is generally the two-phase commit (2PC) protocol; non-blocking commit protocols have also been proposed to address the blocking waits 2PC can cause. In 2PC there are a coordinator and participants: the coordinator is a designated agent among the transaction's agents that decides whether all sub-transactions commit or abort, and the participants are the other agents, each responsible for committing or aborting its own sub-transaction. The basic idea of 2PC is: (1) the voting (decision) phase, in which the coordinator sends a prepare-to-commit command to every participant and waits for replies; the transaction can commit only if every participant replies that it is ready to commit, and if any participant replies that it will abort, the transaction cannot be committed; (2) the execution phase, in which, if the decision is commit, the coordinator sends a commit command to every participant and they commit, and otherwise the coordinator sends an abort command to every participant and they abort. Whether committing or aborting, each participant sends an acknowledgement to the coordinator after it finishes. 2PC can be implemented in centralized, distributed, hierarchical (tree), and linear forms. The centralized form is the simplest and most common: messages flow only between the coordinator and the participants, and the number of messages grows linearly with the number of participants. In distributed 2PC all participants act as coordinators, a peer-to-peer (multi-point) scheme; it needs only one phase, the commit phase, because the participants learn each other's state by exchanging messages among themselves and jointly decide whether to commit or abort, so no separate coordinator command is needed, but the large number of messages exchanged makes it suitable only for small systems. The hierarchical (tree) implementation places the coordinator at the root and the participants at the interior and leaf nodes; commands flow down from the root and acknowledgements flow up, which limits the messages any one node handles, but the extra levels make it less efficient. The linear implementation arranges the participants in a linked list and passes the prepare and acknowledge steps along the chain, and commit and abort propagate the same way; linear 2PC sends fewer messages, but the chained iteration makes responses slow, so it suits systems where communication is expensive.
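A minimal sketch of the centralized 2PC message flow described above, assuming in-memory participants (class and method names are invented for illustration; a real implementation adds timeouts, persistent logs, and recovery):

```python
from enum import Enum

class Vote(Enum):
    READY = "ready"
    ABORT = "abort"

class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "initial"

    def prepare(self):
        # Phase 1: vote READY only if the local sub-transaction can commit.
        self.state = "ready" if self.can_commit else "aborted"
        return Vote.READY if self.can_commit else Vote.ABORT

    def commit(self):
        self.state = "committed"
        return "ack"

    def abort(self):
        self.state = "aborted"
        return "ack"

class Coordinator:
    def __init__(self, participants):
        self.participants = participants

    def run_2pc(self):
        # Phase 1 (voting): send prepare to everyone and collect votes.
        votes = [p.prepare() for p in self.participants]
        decision = all(v == Vote.READY for v in votes)
        # Phase 2 (execution): commit only if every vote was READY.
        for p in self.participants:
            p.commit() if decision else p.abort()
        return "commit" if decision else "abort"

# Hypothetical run: one participant cannot commit, so the global decision is abort.
coord = Coordinator([Participant("p1"), Participant("p2", can_commit=False)])
print(coord.run_2pc())  # -> "abort"
```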

Under 2PC, when one participant cannot commit its sub-transaction, all local sub-transactions must be aborted, which hurts efficiency; a participant that loses contact with the coordinator can only wait passively and, after waiting long enough, abort, and it cannot cooperate with the other participants to reach a common decision, so it is blocked. The three-phase commit (3PC) protocol was therefore proposed as a non-blocking distributed commit protocol. Compared with 2PC, 3PC inserts a pre-commit phase between the voting phase and the execution phase. After voting, during the pre-commit phase, if a participant loses contact with the coordinator it can use information from the remaining participants: by learning the other participants' states it can infer the coordinator's decision and act independently. A state compatibility matrix is used for this, as shown in Table 1: knowing which pairs of states can coexist, a participant can deduce the coordinator's command from the states it observes. For example, if one participant is in the pre-commit state, another is already in the commit state, and the rest are normal, the pre-commit participant only needs to move to the ready state; by the compatibility rule the decision must have been commit, because the ready and commit states are both incompatible with abort.

Refer to: http://blog.csdn.net/long636/article/details/51733358

Table 1. Participant state compatibility matrix (3PC)

Participant 1 \ Participant 2 | Pre-commit   | Ready        | Commit       | Abort
Pre-commit                    | Compatible   | Compatible   | Incompatible | Compatible
Ready                         | Compatible   | Compatible   | Compatible   | Incompatible
Commit                        | Incompatible | Compatible   | Compatible   | Incompatible
Abort                         | Compatible   | Incompatible | Incompatible | Compatible
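A hedged sketch of how a participant that has lost the coordinator might use the compatibility matrix in Table 1 to infer the global decision (the dictionary encoding and function name are my own; this is an illustration of the inference, not a full 3PC termination protocol):

```python
# Encode Table 1: which pairs of participant states can legally coexist.
COMPATIBLE = {
    ("pre-commit", "pre-commit"), ("pre-commit", "ready"), ("pre-commit", "abort"),
    ("ready", "ready"), ("ready", "commit"),
    ("commit", "commit"),
    ("abort", "abort"),
}

def compatible(s1, s2):
    return (s1, s2) in COMPATIBLE or (s2, s1) in COMPATIBLE

def infer_decision(observed_states):
    """Given the states of the reachable participants, infer whether the
    unreachable coordinator must have decided commit or abort, if that
    can be deduced from Table 1 at all."""
    # A state that is incompatible with "abort" rules out abort,
    # so the decision must have been commit.
    if any(not compatible(s, "abort") for s in observed_states):
        return "commit"
    # A state that is incompatible with "commit" rules out commit.
    if any(not compatible(s, "commit") for s in observed_states):
        return "abort"
    return "unknown"  # still blocked on this information alone

# Hypothetical scenario from the text: one participant ready, one committed.
print(infer_decision(["ready", "commit"]))  # -> "commit"
```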

3. Distributed Failure recovery

The recovery model is based mainly on log files and backup copies, combined with undo and redo. For soft failures, recovery first undoes (rolls back) uncommitted work and then redoes committed work; for hard failures, backup copies and log files are combined: restore the backup taken before the failure, then use the log to redo work up to the point of failure. For a centralized database the update policies are: (1) update location, either in-place update or out-of-place update; an in-place update directly overwrites the old values in the database buffer, while an out-of-place update does not replace the old value but stores the new value in another location; (2) buffer management, characterized by fix/no-fix and flush/no-flush; fix/no-fix determines whether a buffer page must wait for a command from the local recovery manager before its contents are written to the database on disk, and flush/no-flush determines whether the local recovery manager forces the modified buffers back to the database on disk when the transaction finishes executing. These combine into four strategies: fix/flush, fix/no-flush, no-fix/flush, and no-fix/no-flush. With fix/flush, committed transactions are written to disk and aborted transactions never are, so aborted transactions need no recovery. With fix/no-flush, aborted transactions are still never written, but committed ones may not yet be on disk, so committed transactions must be redone. With no-fix/flush, committed transactions are written to disk, but aborted transactions may have been partially written, so the aborted parts must be undone. With no-fix/no-flush, undo and redo must be combined.
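A tiny illustrative lookup of the four buffer-policy combinations just described (purely a sketch of the reasoning, not a DBMS interface):

```python
def recovery_actions(fix: bool, flush: bool):
    """Return which recovery steps a crash requires under a buffer policy,
    per the discussion above:
      fix   -> dirty pages of uncommitted transactions are never written
               to disk before commit, so no undo is needed;
      flush -> all updates of a committing transaction are forced to disk
               at commit, so no redo is needed."""
    return {"undo": not fix, "redo": not flush}

for fix in (True, False):
    for flush in (True, False):
        print(f"fix={fix}, flush={flush} -> {recovery_actions(fix, flush)}")
```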

Distributed transaction recovery relies on the two-phase and three-phase commit protocols. Failures fall mainly into two classes: site failures and communication failures. Site failures are further divided into participant-site failures and coordinator-site failures; communication failures include message loss and network partition. Under the two-phase and three-phase commit protocols, each of these cases is handled differently, according to the characteristics and causes of the failure.

Distributed reliability protocols aim at reliability and availability, which are a somewhat contradictory pair. Reliability is the probability that, under given environmental conditions and within a specified period of time, the database system does not fail. Availability is the probability that the database system is operational at a given moment. Reliability is thus defined over a period of time and availability at a point in time; reliability reflects the system's ability to operate correctly over that period and is harder to improve, whereas availability at a single point in time is easier to improve. Raising one usually comes at the expense of the other, so in practice a compromise has to be made for the situation at hand.
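A hedged formalization of the two definitions (standard reliability-engineering notation; the steady-state availability formula is an addition for context, not from the original):

```latex
R(t) = P\{\text{no failure occurs during } [0, t]\}, \qquad
A(t) = P\{\text{the system is operational at time } t\}
% A commonly used steady-state form of availability:
A = \frac{MTBF}{MTBF + MTTR}
```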

4. Distributed concurrency control

Concurrency control is one of the basic tasks of transaction management; its main purpose is to guarantee the data consistency of the distributed database. When distributed transactions execute concurrently, the concurrency control mechanism must make their schedule serializable while preserving good transaction concurrency and good system performance. Serializability means that the result of executing a set of transactions concurrently is the same as the result of executing them one after another in some serial order; such schedules are produced by the database's concurrency control mechanism, which guarantees consistency under concurrent execution, and a serializable schedule is a concurrent schedule equivalent to some serial execution. The usable techniques are distributed locking and distributed timestamp ordering. Locking works as in the centralized case, mainly with shared locks and exclusive locks: a process locks a resource before accessing it; if the lock is exclusive, any later process that needs the resource must wait until the holder finishes its access and releases the lock, while shared locks are less restrictive. But if two processes each hold one of two resources and each then needs the other's resource, they fall into deadlock. Deadlock can be handled by prevention or by detection and resolution. Prevention orders the resources, or requires a process to acquire everything at once or in a fixed sequence, so that deadlock cannot arise. Detection and resolution allow deadlock to occur: if a wait times out, the system can reschedule the resource, and deadlocks can be detected with a wait-for graph. Prevention wastes resources but guarantees there is no deadlock; detection uses resources more fully but needs more complex scheduling and algorithms. In a distributed database system, centralized-style scheduling can be applied at each site, or a scheme without a fixed primary site can be used.
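A minimal sketch of the wait-for-graph deadlock detection mentioned above (the graph encoding and transaction names are invented; a distributed detector would first assemble a global graph from the local ones):

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph given as {txn: set of txns it waits for}.
    A cycle means a deadlock."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wait_for}

    def visit(t):
        color[t] = GREY
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GREY:   # back edge -> cycle
                return True
            if color.get(u, WHITE) == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in wait_for)

# Hypothetical global wait-for graph: T1 waits for T2 and T2 waits for T1.
print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # -> True
print(has_deadlock({"T1": {"T2"}, "T2": set()}))   # -> False
```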

For concurrency control protocols, two-phase locking (2PL) can be used to prevent lost updates, unrepeatable reads, dirty reads, and similar anomalies, and strict 2PL holds locks until commit or abort. Concurrency control algorithms can be pessimistic or optimistic. Pessimistic methods include the locking methods mentioned above, timestamp ordering, and hybrid methods; locking can further be centralized, distributed, or primary-copy based, and timestamp ordering likewise has several variants. Optimistic methods are also built on locking or timestamps, but defer conflict checking until commit time instead of blocking in advance. Each has advantages and disadvantages, and the choice depends on the specific situation.


The above is only a personal summary written during my study of this topic.
