Distributed Database System

Source: Internet
Author: User
Tags dba mixed naming convention cpu usage

A distributed Database system

Distributed Database system: A physically dispersed and logically centralized database system.

Physical dispersion: Refers to the site scattered in different places, large can be different countries, and the same buildings in different locations.


Logically centralized: refers to each site is not unrelated to each other, they are a logical whole, and by a unified distributed database management system to manage.


Distributed Database system = hardware + software (Os,compiler,app.) +DB (Global DB, Local db) +dbms (global DBMS, local DBMS) +DBA (Global DBA, local DBA) + user (global user, Local user)
Homogeneous type: If the data model of a database at each site is the same data model (for example, it is relational), it is called a homogeneous distributed database system.
Homogeneous type: If it is the same DBMS (for example, ORACLE,DB2, etc.), it is called homogeneous, not heterogeneous.



What is the difference between a distributed database system and a decentralized database system?
If a user can either perform a local application to a database on the local server through a client, or perform a global application on a database in two or more nodes, such a system is Distributed Database System
Systems that do not support global applications are not known as distributed database systems, which are simply decentralized database systems.


Distributed feature site autonomy (Local Autonomy)
Non-centralized management (noreliance on central Site)
High reliability (Contiuous operation) (continuous operation)
Positional independence (location Transparency and independence)
Data segmentation Independence (fragmentation Independence)
Data replication Independence (Replication Independence)
Distributed query processing (distributed query processing)
Distributed transaction Management (distributed Transaction Management)
Hardware independence (Hardware Independence)
Operating system Independence (Operating system independence)
Network Independence (Independence)
Database management System Independence (DBMS Independence)


3 Data sharding

Data sharding(data fragmentation), also known as Split
------in the distributed database, the global database is composed by the logical combination of the local database, whereas the local database is separated by some logic of the global database.
Logical fragment: Each part of a tuple that is split and stored at the appropriate site
Units for------data storage
Shard (Cut) method: 1 Horizontal shard: The entire tuple of global relationships is divided into a subset of disjoint subsets, one logical fragment of the global relationship for each subset, by specific conditions. To apply a selection operation to a global relationship
.
2 Vertical sharding: Divides the attribute set of a global relationship into subsets. Applies a projection operation to a global relationship.
3 Mixed shards: The horizontal shards are then vertically fragmented, or the vertical shards are then horizontally fragmented.
Sharding principle: 1. Complete condition: Each data item of a global relationship must be in at least one shard
2. Conditions of Reconstruction: Global relationships can be generated from each shard
3. Disjoint condition: No public data for different shards (Control data redundancy)
Location transparency: After the global relationship is partitioned into shards, the shards are positioned on each node so that each shard has at least one node address. If the application does not have to include the node address of the data, known as the system's location transparency, the application only provides the relationship name or shard name to be accessed, and the node address is isolated from the data dictionary by the system. A system with positional transparency, where shards are moved between nodes, as long as the login data in the data dictionary is changed, the availability of the application is not affected.
When the location is transparent, the data dictionary should have the address information of the Shard, and the system will handle the data manipulation statement with multiple replicas: Modify the statement to automatically modify all the replicas, the query statement to select the most appropriate copy.
The vertical segmentation of global relationships is to group attributes, and then project the relationship into each group to get each shard. This method is applicable to the global relationship where each node has its own special attribute.
1 Complete Condition: The attribute that takes all the nodes to the global relationship is satisfied
2 the disjoint condition and the reconstructed condition of the vertical segmentation method are contradictory
------Because the global relationship is a natural join of each shard, the reconstructed condition is the non-destructive condition of the relational decomposition, that is, to satisfy the projection join dependency. For example, when decomposing into two shards, the global relationship is rebuilt only if the two shards intersect, and the intersection can be multivalued to determine any shard.
Workaround: Some systems have a system-maintained tuple identifier, such as Oracle's ROWID
(This tuple identifies all the attributes in the global relationship and is transparent to the user (which can be used without modification), so that the vertical splits are disjoint and rebuilt in terms of user-defined properties in these systems, as long as the tuple identity of the global relationship is used as an additional property of each shard. )
The advantage of data fragmentation is that data is stored by fragments rather than by relationship, which helps to better organize the distribution of data according to user needs, and also helps to control the redundancy of data.
4 Data distribution

The so-called data distribution is that the data in the distributed database is not stored on the computer storage device of a site, but instead divides the data into logical fragments according to the need, and in some way, the fragments are stored on each site.
Distribution transparency includes shard transparency, location transparency, and local data model transparency.
1 shard transparency refers to a shard in which a user or application operates on a global relationship only without having to consider the relationship. When the Shard pattern changes, the global schema is unchanged, and the application does not have to be rewritten because the global schema is mapped to the Shard pattern.
2 position transparency means that the user or application does not have to know the storage site of the fragment, and when the storage site has changed, the application does not have to change due to the mapping of the Shard pattern to the distribution pattern. At the same time, if the number of duplicate copies of the fragments changes, the redundancy of the data changes, and the user does not have to care how to maintain the consistency of the replicas, which is the transparency of the duplicate copies.
3 Local data model transparency means that the user or user program does not have to know what data model is used on the local site.
Why do these problems occur?
Because of the problem with the sharding policy and the problem of the database system, when we design the table we first shard it, select the horizontal or vertical or hybrid allocation policy, and then select site Storage. So there is the above phenomenon. However, this phenomenon should be solved according to the system dictionary. At design time, we set up the corresponding table. This allows programmers to create the highest distribution transparency when using tables.
------------------------------------
Second, the design of distributed database System 1 basic steps of database design database designing:
Demand analysis
Conceptual structure Design
Logical Structure Design
Physical Structure Design
Establishment and testing of databases
Database operation and Maintenance.
2 The naming convention does not use tab or TBL as the table prefix (it is a table, why is it explained)
The table name consists of one and more nouns that represent the contents of the table, separated by an underscore, and capitalized with the first letter of each noun.
Use table content classifications as prefixes for table names: for example, tables related to user information use the prefix user_, and content-related information uses the prefix content_.
After the table is prefixed, it is a description of the specific contents of the table. For example: The user login information table name is: User_login, the user in the Forum information table name is: User_bbs_info
Some tables that are many-to-many joins can use a prefix of two tables as the table name:
such as: User Login table User_login, user Group table Group_info, these two tables establish a many-to-many relationship table named: user_group_relation
When there are some small, recurring values in the system, use a dictionary table to conserve storage space and optimize queries. such as the name of the user type in the region and system. Such values do not change during the program's run time, but need to be stored in the database.
The field does not use any prefix (the table name represents a namespace, and the field is preceded by a prefix that looks verbose)
Dictionary names also avoid overly common and simplistic names: for example, the User name field in the user table is username better than name.
Boolean field, with some auxiliary verbs beginning, more directly vivid: for example, whether the user has a message hasmessage, whether the user through check ischecked and so on.
The field name is English phrases, adjectives + nouns or auxiliary verbs + verb tenses in the form of a mixed case, followed by the principle of "see the name of understanding."
3 SQL statement specification in some block form of SQL statements, even if there is only one line of code, also add the begin ... End block.
such as: If EXISTS (...)
SET @nVar = 100
Should be written as:
IF EXISTS (...)
BEGIN
SET @nVar = 100
END
All SQL keyword capitalization
4 Stored procedure encoding specification allows applications to access the database through stored procedures only
Encapsulating major business logic in stored procedures avoids writing large amounts of code at the application layer (SQL statements that insert too long in the application are inefficient and maintenance-intensive)
In the Web-site application system, SQL Injection vulnerability has been a very difficult to completely eliminate the vulnerability. This type of security can be greatly reduced if the database is accessed only through stored procedures.
Changes in demand, the structure of the table must be changed. With stored procedures, as long as the parameters are the same, we only need to modify the corresponding stored procedures without modifying the application's code.
Redundant tables (split tables) are used to improve efficiency and to make partial field redundancy;
Use stored procedures to centrally optimize system performance later in the project or in operation.
Problem one: After a stored procedure is compiled, it is saved as a global object of the database, and too many stored procedures consume a large amount of memory on the database server.
Problem two: Implementing a large amount of logic in a stored procedure will allow a large number of operations to be done on the database server, rather than on the application server. When the traffic is large, the CPU usage of the database server is greatly consumed.
The code inside each stored procedure must be preceded by set NOCOUNT on and set NOCOUNT OFF.


5 Database Design Specifications

5.1 Data Integrity Specification (encoding period)

1, in order to facilitate the coding period of the program to check the wrong, you can design the database as much as possible with the constraints (check). For example, the value range of the field of integer type, etc., is often field>0.
2. Similarly, use triggers as much as possible during development to verify the integrity of the data.
3. If there is redundancy between the fields, you should write a trigger to manage the redundant fields
3, in the development phase to save the full primary key, foreign key and unique index constraints.
4. Principle: During encoding, data integrity takes precedence over performance. As far as possible to improve the efficiency of the system to ensure the correct operation.


6 Database Optimization

6.1 Database Performance Optimization specification

1. Remove unnecessary constraints (check) at run time.
2. Try not to use triggers
3, try to retain the primary key constraints
4, appropriate deletion of foreign keys to improve performance
5, during operation, by analyzing the system's traffic, create indexes to optimize performance
6, analyze the possible data growth of each table, define the rules of automatic splitting table. The large table is split to improve performance.
7. Pre-consider data cleansing rules: in which case the old data in the database is deleted to improve performance.
8. Develop a database backup and disaster recovery plan.
9. For efficiency consideration, you can add redundant fields or redundant tables appropriately during the system testing phase.
10, paging records output must be implemented through a stored procedure, you can not use API cursors to page, so as to improve the efficiency of paging.
6.2 Example of a split representation

Case: The site has 2 million users, there are many modules around users to provide services.
To increase efficiency, each table holds a maximum of 100,000 records related to the user, and 2 million records are split into 20 tables. A user numbered 1.1 million saves records to table one, 100001-200000 numbered records to table two, and so on.
Create a split information table that holds which tables are split, how far apart, and what the split rules are.
When inserting a record, first determine the ID of the user who inserted the record. The stored procedure automatically inserts the table into the corresponding split table based on the scope of the ID.
When queried by criteria, the stored procedure automatically joins all the split tables, filtering out the records in the bundle. (In general: Queries of the same type are far greater than the whole of the conditions of the query)
6.3 Example of redundant field setup

Case: Message in this table, you want to save the user's ID as a foreign key. Usually, by connecting the message table and user table to know which user posted the message.
In order to improve the efficiency, in the message this table to build the user name field. When inserting records, save both the user ID and the user name. This way, when querying, you do not have to connect two tables, which makes the efficiency much more efficient.
However, when the user modifies the user name, do you want to update the user name in the other table, ignoring the effect of this inconsistent user name. How it is handled depends on how important the user name is in the module.


6.4 Example of redundant table building

Case: There are user tables and grouped tables, two tables are many-to-many relationships, and a user-group relationship table is established to achieve this relationship.
There are millions records in the user table, thousands of records in the group table. If each user belongs to more than one group, there will be millions of records in the associated table.
The user table and the associated table are now split, splitting the rule to the user's ID range. When querying a user's group, the efficiency is greatly improved. However, when querying a group of users, it is inefficient to correlate all the split tables.
For efficiency, create a redundant user and Group relationship table that holds the unified content from the first relational table, but the split rule is the range of the group ID. In this way, when querying a group of users, the cluster second relational table query, the efficiency greatly improved.
6.5 Paging scenarios in stored procedures

Scenario One: 1, first statistics to get the number of records matching the conditions
2. Define table variables: The first field of a table variable is a self-growing type, the second is a unique value field in the recordset (typically a primary key)
3. Use the Insert () SELECT statement to save the unique value field of a qualifying record in a table variable.
4. Use the Where ID in (the Select ID from table variable WHERE ...) method to read the required unique value fields from both sides of the table.
Scenario two: 1. First count the number of records that match the criteria and count the pages based on the page size
2, if read the first page, directly using the TOP clause read
3, if the number of pages in the first half:
Result set 1:select TOP curpage*pagesize fields from Table ORDER by ID ASC
Result set 2:select TOP PageSize * FROM (Result set 1) ORDER by ID DESC
Final Result: SELECT * FROM (Result set 2) ORDER by ID ASC
4, if the number of pages in the latter half:
Result set 1:select TOP (Pagecount-curpage) *pagesize fields from Table ORDER by ID DESC
End result: SELECT TOP PageSize * from Table ORDER by ID ASC





Statement:Blog Reprint address: blog.csdn.net/longronglin/article/details/5469463.

Distributed Database System

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.