Take out the Reading Notes of distributed databases in college

Last Update:2018-05-26 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger. Directory (?) [] Chapter II Design of Distributed Database System 1 Database Design overview Database Design refers to the construction of the optimal database mode for a given application environment, the establishment of databases and their application systems, to effectively store data. Basic Steps for Database Design (

Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.

Directory (?) [+]

Chapter 2 Design of Distributed Database System 1 Overview of Database Design

Database Design refers to constructing the optimal database mode for a given application environment, establishing a database and its application system, so that it can effectively store data.
Basic Steps for Database Design (2.1 ):

Requirement Analysis

Conceptual Structure Design

Logical Structure Design

Physical Structure Design

Database creation and Testing

Database operation and maintenance.

. 1

Database stage design description, 2.2

. 2

2 naming rules 2.1 General naming rules

1. All names have the following character ranges: A-Z, a-z, 0-9, and _ (underline ). Other characters are not allowed as names.

2. Use English words or phrases (including abbreviations) as names. You cannot use meaningless characters or Chinese pinyin.

3. The name should be clear and clear, and be able to accurately express the meaning of a thing. It is best to read it and follow the principle of "knowing what you mean by name.

2.2 naming conventions for tables

1. Do not use tab or tbl as the table prefix (originally a table, why)

2. The table name is composed of one or more nouns that represent the content of the table. The following lines are separated and the first letter of each noun is capitalized.

3. Use the table Content category as the table name prefix. For example, use User _ as the prefix for a table related to User information, and use Content _ as the prefix for Content-related information _.

4. After the table prefix, It is the description of the specific table content. For example, the user login information table is named User_Login, and the user's information table in the Forum is named User_BBS_Info.

5. For Multiple-to-multiple join tables, you can use the prefix of the two tables as the table name:

For example, the user logon table User_Login and the user grouping table Group_Info. the names of the two tables that establish many-to-many relationships are User_Group_Relation.

6. When there are a few duplicate values in the system, use a dictionary table to save storage space and optimize queries. For example, user code of the region or system. These values do not change during the running period of the program, but must be stored in the database.

For the region, if we want to query the records of a region, the database needs to query the records by string matching. If we change the region to the region code in the table, when you query by region code, the query efficiency will be greatly improved.

In the program, a large number of dictionary tables should be used to represent such values. The dictionary table stores the code of this type of value and the set of objects. The foreign key is associated with the table using this type of value. However, in the coding stage, programmers do not use dictionary tables because they first query the entity code in the dictionary table, which violates the original intention of improving the query efficiency. With the help of Data Dictionary, programmers directly use code to represent entities, thus improving efficiency.

Although dictionary tables are not actually used, they should still be kept in the database (at least during the development period ). A dictionary table appears as another form of "data dictionary document" to show which tables in the database use dictionary tables.

To improve the data integrity of the database, you can retain the foreign key constraints of the complete dictionary table and common table during the development phase. However, in the running stage of the database, the foreign keys of common tables and dictionary tables should be deleted to improve the running efficiency, especially when many dictionary tables are used in some tables.

Case: a database contains millions of user information, and the application system often needs to query user information by region. The user information table was previously saved according to the specific region name. Now, the specific name is changed to the region code in the dictionary table, which greatly improves the query efficiency.

The dictionary table is prefixed with dic.

2.3 Field naming rules

4. The field does not use any prefix (the table name represents a namespace, And the prefix added to the field is too long)

2. Avoid using too common and too simple names for dictionary names. For example, in the User table, the UserName field is better than the Name field.

5. boolean fields begin with some help verbs and are more vivid: for example, whether a user has a message HasMessage or whether the user has passed the IsChecked check.

6. The field name is expressed in the form of an English phrase, adjective + noun or auxiliary verb + verb tense. It is case-insensitive and follows the principle of "seeing the name and knowing the meaning.

3. SQL statement Specification

1. SELECT * FROM ......, The specific fields to be read must be specified.

2. You cannot directly write SQL statements in application code to access the database.

3. Avoid writing too long SQL statements in one row. It is clearer to divide SQL statements into multiple rows where SQL keywords exist.

For example, SELECT UserID, UserName, UserPwd FROM User_Login WHERE AreaID = 20

Modify:

SELECT UserID, UserName, UserPwd

FROM User_Login

WHERE AreaID = 20

More intuitive

4. In some block-form SQL statements, even if there is only one line of code, you must add BEGIN... END Block.

For example, if exists (...)

SET @ nvar= 100

It should be written:

If exists (...)

BEGIN

SET @ nvar= 100

END

5. the blank lines and indentation of SQL batch processing statements are consistent with the general structured programming language and should be in good code format.

6. All SQL keywords are capitalized.

4 stored procedure code specification 4.1 only allow applications to access the database through stored procedures

Only applications can access the database through stored procedures, rather than directly writing SQL statements in the code.

In database development projects, there are many advantages to using a large number of stored procedures. First, let's look at the information provided by Microsoft:

The advantages of using the stored procedures in SQL Server instead of using the locally stored Transact-SQL program on the client's computer are as follows:

Modular programming is allowed: you only need to create a process once and store it in the database, and then you can call the process any time in the program. Stored procedures can be created by people with expertise in Database Programming and can be modified independently of the program source code.

Allow faster execution: If an operation requires a large amount of Transact-SQL code or repeated execution, the stored procedure will be faster than the execution of the Transact-SQL batch code. The stored procedure will be analyzed and optimized when it is created, and the memory version of the process can be used after the first execution of the process. Each time you run a Transact-SQL statement, it must be repeatedly sent from the client and compiled and optimized every time the SQL Server executes these statements.

Reduce network traffic: an operation that requires hundreds of rows of Transact-SQL code is implemented by a separate statement that executes the Process Code without sending hundreds of lines of code in the network.

It can be used as a security mechanism: users who do not have the permission to directly execute statements in the stored procedure can also grant them the permission to execute the stored procedure.

In addition, the advantages of using stored procedures include:

1. Logically, stored procedures separate the application layer from the physical structure of the database. Stored Procedures form an interface between applications and databases. This interface abstracts complex database structures and conforms to the idea of "interface-based programming" in extreme programming.

2. Encapsulating the main business logic in the stored procedure can avoid writing a lot of code at the application layer (inserting too long SQL statements in the application program through strings affects the efficiency, and difficult to maintain ). This helps improve the development efficiency and debug the stored procedure directly in the query analyzer. It can detect logical problems in the system earlier and improve the code quality.

3. SQL Injection Vulnerabilities in application systems such as websites have been difficult to completely eliminate. If you access the database only through stored procedures, this type of security issues can be greatly reduced. (Therefore, even a simple SQL statement should be written as a stored procedure .)

4. Because stored procedures are used, the application layer can focus on the calling of Stored Procedure interfaces rather than the specific database structure. Therefore, the advantages of stored procedures are obvious in the following situations:

· Requirement change: The table structure must be changed. When using stored procedures, as long as the parameters remain unchanged, we only need to modify the corresponding stored procedures without modifying the application code. This design will reduce the impact of demand changes on the project.

· To improve efficiency and make some fields redundant: some fields that are frequently accessed can be stored in related tables. This not only improves the efficiency, but also shields redundant details through the storage process.

· To improve efficiency, use redundant tables (sharding tables): For large tables, you may need to save records to multiple tables to improve query efficiency. When using stored procedures, a stored procedure is used to determine which split tables to obtain or insert data. This improves the efficiency and eliminates the need to consider specific splitting rules at the application layer.

5. Use the stored procedure to optimize the system performance in the later or running stages of the project. During project development, efficient Code cannot be written due to various reasons. This problem is often manifested in the later stage of the project or during the runtime. Access to the database is encapsulated through the storage process. After Project Integration, you can test run to observe the system operation efficiency, so as to easily identify the bottleneck of the system, and can improve the system running efficiency by optimizing the stored procedure code. Such optimization is more effective and easier than Optimization in application programs.

At the same time, there are also some concerns about the excessive use of stored procedures:

Problem 1: After the stored procedure is compiled, it will be saved as the Global Object of the database. Too many stored procedures will occupy a large amount of memory on the database server.

Problem 2: implementing a large number of logic in the stored procedure will allow a large number of operations to be completed on the database server, rather than on the application server. When the traffic is high, the CPU usage of the database server is greatly consumed.

This case also exists here: there is a website with a huge access volume. Multiple WEB servers form a server cluster of Server Load balancer, but there is only one central database server. When the number of visits continues to increase, more WEB servers are connected to meet the needs of high-concurrency access, but the database servers cannot continue to increase. Therefore, you need to complete the business logic on the WEB server as much as possible to avoid consuming Database Server resources.

My thoughts on these two concerns are:

Problem 1: The stored procedure is a compiled SQL statement, and the binary code in the memory does not consume too much memory. In addition, stored procedures are much more efficient than SQL statements. In other words, this is a "space for Time" solution. It is worthwhile to consume more memory to improve efficiency.

Solution 2: first, it is easier to implement the business logic in the stored procedure than in the application. Second, in terms of development efficiency, the development of stored procedures is simpler than that of applications (for the same logic ). In a high-traffic system, the allocation of application server and database server resources should start from the perspective of cost: the cost in software development, labor costs are much higher than hardware costs. We can easily purchase better servers with money, but it is difficult for developers to greatly improve their programs.

Using Stored Procedures to encapsulate business logic saves a lot of development time and debugging time, and greatly improves the code quality. Therefore, the stored procedure should be used at the cost.

The simplest way to deal with large traffic volumes is to invest more hardware costs: faster hard disks, larger memory, more CPUs, and better NICs.

Second, at the application layer, a large number of static file caching methods can be used to reduce the pressure on the database. For example, you can read the infrequently changed information from the database server and save it as an XML static file on the application server.

If this is not the case, the system should be designed to be distributed based on the possible access volume at the beginning of the system design. This will fundamentally solve the problem of large traffic volumes.

4.2 naming rules

1. the prefix of the stored procedure is similar to the prefix of the table name. A series of tables are treated as an object, fields are the attributes of the object, and the stored procedure is the method of accessing the object. For example, the stored procedure of adding a user is named User_AddUser.

2. The stored procedure is named by the module prefix. For example, user-managed stored procedures use the prefix user _.

3. After the prefix of the stored procedure, it is the stored procedure name in the form of Verb + noun (or verb phrase ).

4.3 parameter naming of Stored Procedures

1. The parameter name adopts the Hungarian naming method and the type prefix.

2. Each stored procedure has two output parameters: @ errno int and @ errmsg varchar (255. The stored procedure execution can be obtained based on the two parameters in the application. (These two parameters use the default value, which can be ignored)

Error code with errno being an integer. If the error code is executed successfully, 0 is returned. The specific meaning of the Errno value is described through the errmsg parameter, or through comments or documents in the code.

Errmsg is a string description of the error message. This parameter is mainly used in the debugging period as a description to avoid using this value in the application. At the same time, pay attention to the impact of the language selection of information on the program in both the English and Chinese versions.

4.4 set of records returned by the Stored Procedure

1. Output record set of the stored procedure: It is recommended that only one record set be returned for the stored procedure because the program structure is clear. However, in some cases, multiple record sets can be output to improve performance.

2. In the record set, the alias of each output field is specified at the end, and the real field name information is lost to the client, thus increasing the possibility of hackers finding system vulnerabilities.

4.5 format conventions

1. All SQL keywords are capitalized

2. Use good variable naming rules

3. Maintain a good structure, including blank lines, indentation, and spaces.

4. For block statements, you must write BEGIN... END

5. Add detailed notes at the beginning of each stored procedure, including the stored procedure name, parameter description, function description, returned dataset description, and author and copyright statement.

6. set nocount on and set nocount off must be added before and after the code in each stored procedure.

7. Example of Stored Procedure format:

/*************************************** ******/

* Name: User_AddUser

* Purpose: Add a user to system

* Parameters: @ strUserName the user's login name

* @ StrPwd the user's password, encrypted by MD5

* @ Errno the error number, it's a "output" param

* @ Errmsg the error message, it's a "output" param

* Returns: the user's ID, as a result set.

* Author: ahfu, ah_fu126@hotmail.com

* Create At: 2005-03-18 Last Modify: 2005-03-19

/*************************************** ******/

Create procedure User_AddUser

(

@ StrUserName varchar (20 ),

@ StrPwd varchar (50 ),

@ Errno int = 0 OUTPUT,

@ Errmsg varchar (255) = NULL OUTPUT

)

BEGIN

SET NOCOUNT ON

/* The following is the Stored Procedure Code */

SET NOCOUNT OFF

END

4.6 transaction usage conventions

1. If you perform more than one change operation on the record, you need to write these operations in the transaction.

2. The transaction must be committed or canceled explicitly.

4.7 cursors

1. Do not use a cursor unless necessary

2. The performance of a stored procedure containing a cursor must be carefully tested.

5 Database Design Specifications 5.1 data integrity specifications (coding period)

1. You can add as many constraints (check) when designing the database to facilitate error checking during the coding period of the program ). For example, the value range of an integer field is usually field> 0.

2. Similarly, Use Triggers during development to verify data integrity.

3. If redundant fields exist, you should write a trigger to manage redundant fields.

3. Constraints for saving the complete primary key, foreign key, and unique index during the development phase.

4. Principle: Data integrity takes precedence over performance during encoding. Improve the efficiency as much as possible while ensuring the correct operation of the system.

5.2 design tools and version Division

3. Design the database with ER-WIN while designing the logical view and physical view (it can also be a tool such as PowerDesigner to express the relationship between data tables through ER graphics)

4. Make sure that the design document and physical table structure are consistent during the development process.

5. databases are available in two versions: the development phase and the runtime phase. The development phase mainly reflect the complete constraints and the runtime version reflects the efficiency optimization.

6 database optimization 6.1 database performance optimization specifications (runtime)

1. Delete unnecessary constraints (check) during the running stage ).

2. Do not use triggers whenever possible

3. retain primary key constraints as much as possible

4. Delete Foreign keys appropriately to improve performance

5. During operation, you can analyze the system access volume and create indexes to optimize the performance.

6. analyze the possible data growth volume of each table and define automatic table sharding rules. Split large tables to improve performance.

7. Pre-consider data clearing rules: When to delete old data in the database to improve performance.

8. Develop database backup and disaster recovery plans.

9. For efficiency considerations, you can add redundant fields or redundant tables during system testing.

10. Paging record output must be implemented through the stored procedure. API cursors cannot be used for paging, which improves paging efficiency.

6.2 split table example

Case: the website has 2 million users, and many modules provide services for users.

To improve efficiency, each table can only store up to 0.1 million user-related records, and 2 million records can be split into 20 tables. If the number is 1-, the record is saved to table 1, and the number between-is saved to table 2, and so on.

Create a split information table, which tables are saved in the table, to what extent, and the sharding rules.

When a record is inserted, first determine the user ID of the record to be inserted. The stored procedure automatically inserts the table into the corresponding split table according to the ID range.

When the query is performed based on conditions, the stored procedure automatically connects to all split tables and filters records from the tables. (Generally, queries of the same type are much larger than all queries according to the condition)

6.3 redundant field creation example

Case: In this message table, the user ID is saved as the foreign key. Generally, you can connect the message table to the user table to determine which user posted the message.

To improve efficiency, add the username field in the message table. When you insert a record, you can save both the user ID and user name. In this way, when querying, you do not have to connect two tables, greatly improving the efficiency.

However, when a user modifies a user name, do you want to update the user name in other tables? Do you want to ignore the impact of such user name inconsistency. How the user name is processed depends on the importance of the user name in the module.

6.4 redundant table creation example

Case: The user table and group table are used. The two tables have many-to-many relationships. This relationship is achieved by establishing a user-group relationship table.

The User table contains millions of records and thousands of records in the group table. If each user belongs to multiple groups, millions of records will exist in the associated table.

The User table and associated table are split to the user ID range. When querying a user's group, the efficiency is greatly improved. However, when you query users in a group, you need to associate all the sharding tables with low efficiency.

To improve efficiency, create a redundant user-group relationship table, which stores the unified content of the first relational table, but the sharding rule is the range of group IDs. In this way, when users in the query group are queried in the second relational table, the efficiency is greatly improved.

6.5 paging solution in Stored Procedures

Solution 1:

1. Calculate the number of records that meet the conditions.

2. Define table variables: the first field of the table variable is the auto-increment type, and the second field is the unique value field in the record set (usually the primary key)

3. Use the insert () select statement to save the unique value fields of the matching records in the table variables.

4. Use where ID in (select ID From Table variable WHERE ......) To read the required unique value fields from both sides of the table.

Solution 2:

1. Calculate the number of qualified records and calculate the number of pages based on the page size.

2. If you read the first page, use the TOP clause to directly read

3. If the first half of the page number is:

Result set 1: select top CurPage * PageSize Fields FROM Table ORDER BY ID ASC

Result set 2: select top PageSize * FROM (result set 1) ORDER BY ID DESC

Final Result: SELECT * FROM (result set 2) ORDER BY ID ASC

4. If the page number is in the last half:

Result set 1: select top (PageCount-CurPage) * PageSize Fields FROM Table ORDER BY ID DESC

Final Result: select top PageSize * FROM Table ORDER BY ID ASC

7. Distributed Design Example

Hainan population management information system:

Based on the data distribution characteristics of the Population Management Information System, the Oralce database system is used as the basis of the distributed system based on the distributed system technology of the customer/server architecture, A Distributed Database Processing Solution for the Population Management Information System is designed and implemented.

1. database data features and Distributed Processing Analysis

Generally, the Population Management Information System database stores and manages the province's population data. The data volume is large, the region coverage is wide, data entry and access points are scattered, and the data processing regional distribution is strong, traditional central database management methods are difficult to manage large databases across regions. Therefore, distributed databases can be used to store and manage population information data.

According to the business characteristics of Hainan Special Economic Zones, the population is small, and municipal branches do not participate in population information management. Therefore, the provincial/municipal/police station level-3 model can be used for population information data management in Hainan. In addition, because of its scattered business offices, the Population Management Information System needs to run in a distributed manner. According to the principle of venue autonomy designed by the distributed database system, at least one database should be created on each independent node where data is stored, and multiple databases can be created in the same location, to meet different application needs. Based on this, multiple physical databases are established in the provincial, municipal, and provincial departments, and they are linked through the network and database links to form a unified global database.

2 Distributed Database Design

Database Design Principles

The main task of the system is to obtain the data required by population management. The data includes basic population information, ID card information, image information, code information, and tail code distribution information. There are many demographic information data items, many data tables and dictionary tables are involved. Therefore, when designing a database, you must establish corresponding data tables and Constraints Based on the relationship between the information. A good data organization structure and database should be established to store and manage a large amount of data, so that the entire system can quickly, conveniently, securely and accurately call and manage data. In the specific design process of PMIS databases, the following design principles are adopted:

(1) adopt an open multi-layer Distributed Data access structure;

(2) 3NF meeting the relational database theory, while appropriately retaining data redundancy for both convenience of programming, thus improving the efficiency of the entire system;

(3) Make full use of the data dictionary function.

Distributed Database System Structure

The entire system uses the customer/server architecture and standard SQL for data access to achieve Distributed Data Processing. Each node is responsible for maintaining the integrity of the local database. Through the distributed database management system, allows you to dynamically connect to and distribute databases on each node. The population information data of each city is stored in the database of the municipal Bureau, and the database links of the provincial departments and departments are established to the databases of each Municipal Bureau respectively. The provincial departments and departments query the data through view creation or snapshots, city Bureau and police station users directly access the data in the city Bureau database server. In addition, different police stations cannot directly perform data operations on each other, and cross-Office information query is completed through views or snapshots on the municipal Bureau database, for cross-city population information query, you can create a table name or view name in the provincial/municipal bureau and create a synonym for the table name or view name. In this way, the table name or view name of the remote database and other corresponding synonyms can be used to achieve transparent access to data, and regular copying through snapshots to achieve non-real-time access to remote data. The topology of the distributed database system is shown in Figure 1.

The system adopts a centralized-distributed database storage solution. The data is shared between different cities and Bureaus through a leased line, which forms the population of the province.

Distributed Data solution. The database operation of the police station can directly access the local municipal bureau database or download local data related to the exchange through remote code pulling. After local processing, it is uploaded to the corresponding Municipal Bureau server, the Municipal Bureau reports the additions, deletions, and changes made by the police stations to the municipal Bureau database through the data update module in the background. For remote data access, you can use the distributed database technology provided by Oracle through the Public Security computer network to achieve transparent access to remote databases. Police stations, municipal bureaus, and provincial departments share population information across the province through distributed databases between municipal departments.

Data Replication Technology

The Population Management Information System of Hainan Province is a distributed data system. database servers are distributed across provincial departments and municipalities. The provincial/municipal database servers store and manage the population information of the province, and the servers of various municipal bureaus store and manage the population information of the city. database servers distributed all over the city need to exchange information in a timely and effective manner, provincial/Municipal database servers are connected to municipal servers through leased lines for real-time communication. Because the storage and management of distributed databases are distributed at multiple points, it is critical to ensure data integrity of each database server. The system uses Oracle Data replication technology. By establishing a database link, configuring the main group, the system regularly copies the population data of the databases of different municipalities to the provincial/municipal departments database server. In addition, the replication technology is also used to distribute and process loads in a distributed system, so that flow staff can contact the primary data site.

Distributed Data Query and Data Update Processing

Access to tables or views of a remote database is performed using SELECT, INSERT, UPDATE, DELETE, and other statements on the database link. The database link defines a path pointing to a remote database, which is created by DBA, application developer, or network service. It is transparent to users of distributed databases.

When accessing data, if the data to be accessed comes from the same table, you can directly access the table or view or snapshot Based on the table as needed; if the data to be accessed comes from different data tables, you can use a connection operation or a corresponding view. The source of each data item in the view is as follows: one or more tables from the same database; two or more tables from different databases on the same node; two or more tables from databases on different nodes. To query Provincial/Municipal Data, you must first create a data snapshot for each city and Bureau, and then query the snapshot.

For data updates at the same node, the local data table is directly accessed by the municipal Bureau and the police station. For data updates at different nodes, the local data table can be directly accessed, you can also create read/write views for remote tables. You can directly access these views to achieve some complex data access.

8. This chapter describes the database design skills and an application example. The use of underlines in naming rules is no longer recommended by Microsoft. Please use it as needed. An example of the Population Management System is from a paper. There are other instances on the Internet, but this is a good one I have seen. Many standards in this article are practical. For example, select * cannot appear because the query volume is too large and maintenance is not good. For example, add new fields in the future. For images and videos, the query results can be imagined. You cannot directly write SQL statements in the program, so as to better maintain the code in the future. Instead of viewing the source code, you only need to modify the set variables. The use of stored procedures is now more widely used, and the famous PetShop program has used the example of stored procedures.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Take out the Reading Notes of distributed databases in college

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Take out the Reading Notes of distributed databases in college

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support