Distributed database at university Reading notes take out to share

Source: Internet
Author: User
Tags one table cpu usage


Chapter Two
I. Design of distributed database System 1 Database Design Overview

Database design is to construct the optimal database schema for a given application environment, establish the database and its application system so that it can store the data effectively.
Basic Steps for Database design (2.1):

Demand analysis

Conceptual structure Design

Logical Structure Design

Physical Structure Design

Establishment and testing of databases

Database operation and Maintenance.

Figure 2.1

Design description for each stage of the database, 2.2

Figure 2.2

2 Naming Conventions 2.1 naming general rules

1. All names have a range of characters: A-Z, A-Z, 0-9 and _ (underscore). Other characters are not allowed to be used as names.

2, using English words or English phrases (including abbreviations) as the name, you can not use meaningless characters or Hanyu Pinyin.

3, the name should be clear, can accurately express the meaning of things, it is best to read, follow the "see the name of the" principle.

2.2 Table Naming conventions

1, do not use Tab or TBL as a table prefix (originally a table, why also explain)

2. The table name consists of one and more nouns representing the contents of the table, separated by an underscore, and capitalized with the first letter of each noun.

3. Use the table's content classification as the prefix for the table name: For example, the table related to user information uses the prefix user_, and the content-related information uses the prefix content_.

4. After the table prefix, it is the description of the specific contents of the table. For example: The user login information table name is: User_login, the user in the Forum information table name is: User_bbs_info

5, some as a multi-to-many connection table, you can use the prefix of two tables as the table name:

such as: User Login table User_login, user Group table Group_info, these two tables establish a many-to-many relationship table named: user_group_relation

6, when there are some small, repeating values in the system, use the dictionary table to save storage space and optimize queries. such as the name of the user type in the region and system. Such values do not change during the program's run time, but need to be stored in the database.

As far as the region is concerned, if we want to query a region's records, the database needs to be queried by string matching, and if the code for a region is saved in a table and queried by the code of the region, the efficiency of the query will be greatly improved.

It is advisable to use a dictionary table to represent such values in a program. The dictionary table holds the names of such values and the collection of entities, in a way that is associated to a table that uses such values. However, in the coding stage, the programmer does not use the dictionary table, because first querying the dictionary table entity's code name, violates the original intention of improving the query efficiency. The programmer, with the help of a data dictionary, directly uses the code name to represent the entity, thus improving efficiency.

Although the dictionary tables are not actually used in practice, they should remain in the database (at least during the development period). The dictionary table appears as another form of "Data dictionary document" to illustrate which of the tables in the database are using the dictionary table.

In order to improve the data integrity of the database, the complete dictionary table and the foreign key constraints of the normal table can be preserved during the development phase. However, in the run phase of the database, the foreign keys of the normal table and the dictionary table should be deleted to improve the efficiency of operation, especially when some tables use many dictionary tables.

Case: There is millions user information in a database, the application system often needs to query the user's information according to the region. The User information table was previously saved according to the specific region name, and now the specific name is changed to the region code in the dictionary table, the query efficiency greatly improved.

The dictionary table is uniformly prefixed with dic_.

2.3 Field Naming conventions

4, the field does not use any prefix (the table name represents a namespace, the field preceded by a prefix appears wordy)

2, the dictionary name also avoids too common too simple name: For example, user table, User Name field username is better than name.

5, the Boolean field, with some auxiliary verbs to start, more directly vivid: for example, whether the user has a message hasmessage, whether the user through check ischecked and so on.

6, the field name is English phrases, adjectives + nouns or auxiliary verbs + verb tenses in the form of expression, mixed case, follow the "see the name of understanding" principle.

3 SQL statement Specification

1. Do not allow to write select * from ..., you must indicate the specific fields that need to be read.

2. Do not allow direct write SQL statements in application code to access the database.

3. Avoid writing SQL statements that are too long in a row, and dividing the SQL statements into multiple rows is clearer where the SQL keyword is.

such as: SELECT userid,username,userpwd from User_login WHERE areaid=20

Modified to:

SELECT userid,username,userpwd

From User_login

WHERE areaid=20

More intuitive

4, in some block form of SQL statements, even if there is only one line of code, but also to add the begin ... End block.

such as: If EXISTS (...)

SET @nVar = 100

Should be written as:

IF EXISTS (...)

BEGIN

SET @nVar = 100

END

5. The empty lines and indents of SQL batch statements are consistent with the general structured programming language and should be kept in good code format.

6. All SQL keyword capitalization

4 Stored Procedure Encoding specification 4.1 allows applications to access the database through stored procedures only

Allows the application to access the database only through stored procedures, without allowing direct SQL statements to be accessed directly from the code.

In a database development project, there are many benefits of using stored procedures in large numbers, first looking at Microsoft for information:

The advantages of using a stored procedure in SQL Server without using a Transact-SQL program stored locally on the client computer are:

Allows for modular programming: Simply create the procedure once and store it in the database, and you can call the procedure any time in the program. Stored procedures can be created by people with expertise in database programming and can be individually modified independently of the program source code.

Allow faster execution: If an operation requires a large amount of Transact-SQL code or needs to be repeated, the stored procedure will be faster than the Transact-SQL batch code execution. It will be parsed and optimized when the stored procedure is created, and the in-memory version of the procedure can be used after the first execution of the procedure. Each time a Transact-SQL statement is run, it is sent repeatedly from the client and compiled and optimized every time SQL Server executes the statements.

Reduced network traffic: an operation that requires hundreds of lines of Transact-SQL code is implemented by a separate statement that executes the procedure code without the need to send hundreds of lines of code across the network.

Can be used as a security mechanism: even for users who do not have permission to directly execute statements in the stored procedure, they can be granted permission to execute the stored procedure.

In addition, the benefits of using stored procedures include:

1. Logically, the stored procedure separates the application layer from the physical structure of the database. A stored procedure forms an interface between an application and a database. Such interfaces abstract complex database structures and conform to the idea of "interface-based programming" in extreme programming.

2. Encapsulating the main business logic in a stored procedure avoids writing large amounts of code at the application layer (SQL statements that insert too long into an application are inefficient and difficult to maintain). Helps improve the efficiency of development and debug stored procedures directly in Query Analyzer, enabling earlier discovery of logical problems in the system, thus improving the quality of the code.

3, in the site-class application system, SQL injection type of vulnerability has been difficult to completely eliminate the vulnerability. This type of security can be greatly reduced if the database is accessed only through stored procedures. (therefore, even a simple SQL statement should be written as a stored procedure.) )

4, because of the use of stored procedures, the level of the application can not be concerned about the specific database structure, but only concerned about the stored procedure interface calls. Therefore, the advantages of a stored procedure are obvious in the following situations:

• Changes in requirements, the structure of the table must be changed. With stored procedures, as long as the parameters are the same, we only need to modify the corresponding stored procedures without modifying the application's code. Such a design would reduce the impact of changes in requirements on the project.

• For increased efficiency, partial field redundancy: Some frequently accessed fields, we can make redundant storage in the related tables. This improves efficiency and masks redundant details through stored procedures.

• To improve efficiency, use redundant tables (split tables): Some large tables, in order to improve query efficiency, you may need to save records to more than one table. Using stored procedures, there are stored procedures that determine which of the split tables to get or insert data from. This improves efficiency without having to care about specific split rules at the application level.

5, the use of stored procedures, to facilitate the project later or in the operation of centralized optimization of system performance. In the project development process, due to various reasons, often can not write efficient code, this problem is often in the late phase of the project or during the run time to reflect. Through the storage procedure to encapsulate the access to the database, after the project integration, through the trial operation to observe the efficiency of the system, so it is easy to identify the bottleneck of the system, and can optimize the code of the stored procedure to improve the system's operational efficiency. This optimization is more efficient and easier than optimizing in the application process.

At the same time, excessive use of stored procedures, there are some of the following concerns:

Problem one: After a stored procedure is compiled, it is saved as a global object of the database, and too many stored procedures consume a large amount of memory on the database server.

Problem two: Implementing a large amount of logic in a stored procedure will allow a large number of operations to be done on the database server, rather than on the application server. When the traffic is large, the CPU usage of the database server is greatly consumed.

There is also this case: there is a site with a huge amount of traffic, and multiple Web servers make up a load-balanced server cluster, but only one central database server. When the traffic continues to increase, access to more Web servers to meet the high concurrency of access, but the database server has not been able to increase. Therefore, it is necessary to complete the business logic on the Web server as far as possible to avoid consuming the resources of the database server.

For these two worries, my idea is:

Problem one: Stored procedures are compiled SQL statements that are binary code in memory and do not consume too much memory. Also, stored procedures are significantly more efficient than directly using SQL statements. In other words, this is a "space-for-time" scheme, it is worthwhile to consume a bit more memory for efficiency improvement.

Problem two: First, it is easier to implement the business logic in the stored procedure than in the application, and secondly, from the development efficiency, the development of the stored procedure is simpler than the application (in the same logic). In high-traffic system, the problem of resource allocation of application server and database server should be open from the angle of cost: The cost of software development, the cost of labor expenditure is much higher than that of hardware expenditure. We can easily afford to buy better servers, but it's hard to spend money on developers making the program a lot more expensive.

Using stored procedures to encapsulate business logic first saves a lot of development time and debugging time, and can greatly improve the quality of your code. Therefore, in terms of cost, stored procedures should be used.

For large traffic, the simplest way is to invest more hardware: faster hard drives, larger memory and more CPUs, better NICs, and more.

Second, at the application level, you can use a large number of static file caching methods to alleviate the pressure on the database. such as: Infrequently changing information, can be read from the database server, save as an XML static file on the application server, and so on.

If not, the system should be designed in the first place, considering the possible traffic, and the system should be distributed. This can fundamentally solve the problem of large traffic.

4.2 Naming conventions

1. The prefix of the stored procedure is similar to the prefix of the table name: Consider a series of tables as an object, a field as an object, and a stored procedure as a method of accessing an object. For example: Add a user's stored procedure named: User_adduser

2. The stored procedure is named using the prefix of the module. For example, user-managed stored procedures use the prefix user_.

3, the stored procedure prefix, is the verb + noun form of the stored procedure name (also can be a verb phrase).

4.3 parameter naming of stored procedures

1, the parameter name uses the Hungarian name law, uses the type prefix

2, each stored procedure has: @errno int and @errmsg varchar (255) Two output parameters. In the application, the stored procedure can be executed according to these two parameters. (These two parameters use default values, which can be ignored)

errno is an integer error message code that executes successfully returns 0. The specific meaning of the errno value is explained by the ErrMsg parameter, or through comments or documents in the code.

ErrMsg is a string description of the error message, which is used primarily as a description for the debug period and avoids using the value in the application. At the same time, pay attention to the English version of the system and the Chinese version of the system, the language selection of information on the impact of the program.

4.4 Record sets returned by stored procedures

1, the output recordset of the stored procedure: for the structure of the program is clear, the stored procedure is best to return only one recordset. However, in some cases to improve performance, it is possible to output multiple recordsets

2, the record set, each output field finally specifies the alias of the field, to the real field name information loss to the client, thereby increasing the potential for hackers to find a system vulnerability.

4.5 Format Conventions

1. Capitalize all SQL keywords

2, the use of good variable naming norms

3, maintain a good structure, including blank lines, indentation and space, and so on.

4, the block statement, must be written on the begin ... END

5. Add a detailed comment at the beginning of each stored procedure: including the stored procedure name, parameter description, feature description, return data set description, and author and copyright notice.

6, the code inside each stored procedure must add set NOCOUNT on and set NOCOUNT OFF before and after.

7. Examples of stored procedure formats are as follows:

/*********************************************/

* Name:user_adduser

* Purpose:add a user to System

* Parameters: @strUserName The user ' s login name

* @strPwd The user ' s password, encrypted by MD5

* @errno the error number, it ' s a "output" param

* @errmsg the error message, it ' s a "output" param

* returns:the user ' s ID, as a result set.

* Author:ahfu, [email protected]

* Copyright:copyrights by Synet 2005.All rights Reserved.

* Create at:2005-03-18 last modify:2005-03-19

/*********************************************/

CREATE PROCEDURE User_adduser

(

@strUserName varchar (20),

@strPwd varchar (50),

@errno int = 0 OUTPUT,

@errmsg varchar (255) =null OUTPUT

)

As

BEGIN

SET NOCOUNT on

/* The following is the code for the stored procedure */

SET NOCOUNT OFF

END

4.6 Conventions on the use of things

1. If you change the record more than once, you need to write these actions in the transaction.

2. The transaction must be explicitly committed or canceled.

4.7 Cursor Usage Conventions

1, if not necessary, do not use cursors

2. A stored procedure containing cursors must be carefully tested for performance.

5 Database Design Specification 5.1 Data integrity Specification (encoding period)

1, in order to facilitate the coding period of the program to check the wrong, you can design the database as much as possible with the constraints (check). For example, the value range of the field of integer type, etc., is often field>0.

2. Similarly, use triggers as much as possible during development to verify the integrity of the data.

3. If there is redundancy between the fields, you should write a trigger to manage the redundant fields

3, in the development phase to save the full primary key, foreign key and unique index constraints.

4. Principle: During encoding, data integrity takes precedence over performance. As far as possible to improve the efficiency of the system to ensure the correct operation.

5.2 Design tools and version partitioning

3, design the database with Er-win, design the logical view and physical view (also can be a tool such as PowerDesigner, show the relationship between data table through ER graph)

4, the development process as far as possible to ensure that the design documents and physical table structure consistent changes

5, the database development period and the running period of two versions: the development period version mainly embodies the complete constraint, the operation period version embodies the efficiency optimization.

6 Database Optimization 6.1 Database performance Optimization specification (run time)

1. Remove unnecessary constraints (check) at run time.

2. Try not to use triggers

3, try to retain the primary key constraints

4, appropriate deletion of foreign keys to improve performance

5, during operation, by analyzing the system's traffic, create indexes to optimize performance

6, analyze the possible data growth of each table, define the rules of automatic splitting table. The large table is split to improve performance.

7. Pre-consider data cleansing rules: in which case the old data in the database is deleted to improve performance.

8. Develop a database backup and disaster recovery plan.

9. For efficiency consideration, you can add redundant fields or redundant tables appropriately during the system testing phase.

10, paging records output must be implemented through a stored procedure, you can not use API cursors to page, so as to improve the efficiency of paging.

6.2 Example of a split representation

Case: The site has 2 million users, there are many modules around users to provide services.

To increase efficiency, each table holds a maximum of 100,000 records related to the user, and 2 million records are split into 20 tables. A user numbered 1.1 million saves records to table one, 100001-200000 numbered records to table two, and so on.

Create a split information table that holds which tables are split, how far apart, and what the split rules are.

When inserting a record, first determine the ID of the user who inserted the record. The stored procedure automatically inserts the table into the corresponding split table based on the scope of the ID.

When queried by criteria, the stored procedure automatically joins all the split tables, filtering out the records in the bundle. (In general: Queries of the same type are far greater than the whole of the conditions of the query)

6.3 Example of redundant field setup

Case: Message in this table, you want to save the user's ID as a foreign key. Usually, by connecting the message table and user table to know which user posted the message.

In order to improve the efficiency, in the message this table to build the user name field. When inserting records, save both the user ID and the user name. This way, when querying, you do not have to connect two tables, which makes the efficiency much more efficient.

However, when the user modifies the user name, do you want to update the user name in the other table, ignoring the effect of this inconsistent user name. How it is handled depends on how important the user name is in the module.

6.4 Example of redundant table building

Case: There are user tables and grouped tables, two tables are many-to-many relationships, and a user-group relationship table is established to achieve this relationship.

There are millions records in the user table, thousands of records in the group table. If each user belongs to more than one group, there will be millions of records in the associated table.

The user table and the associated table are now split, splitting the rule to the user's ID range. When querying a user's group, the efficiency is greatly improved. However, when querying a group of users, it is inefficient to correlate all the split tables.

For efficiency, create a redundant user and Group relationship table that holds the unified content from the first relational table, but the split rule is the range of the group ID. In this way, when querying a group of users, the cluster second relational table query, the efficiency greatly improved.

6.5 Paging scenarios in stored procedures

Programme One :

1, first statistics to get the number of records matching the conditions

2. Define table variables: The first field of a table variable is a self-growing type, the second is a unique value field in the recordset (typically a primary key)

3. Use the Insert () SELECT statement to save the unique value field of a qualifying record in a table variable.

4. Use the Where ID in (the Select ID from table variable WHERE ...) method to read the required unique value fields from both sides of the table.

Scenario Two :

1. First count the number of records that match the criteria and count the pages based on the page size

2, if read the first page, directly using the TOP clause read

3, if the number of pages in the first half:

Result set 1:select TOP curpage*pagesize fields from Table ORDER by ID ASC

Result set 2:select TOP PageSize * FROM (Result set 1) ORDER by ID DESC

Final Result: SELECT * FROM (Result set 2) ORDER by ID ASC

4, if the number of pages in the latter half:

Result set 1:select TOP (Pagecount-curpage) *pagesize fields from Table ORDER by ID DESC

End result: SELECT TOP PageSize * from Table ORDER by ID ASC

7 Examples of distributed design

Hainan Province Population Management Information System:

According to the characteristics of data distribution of population management Information System, distributed system technology based on client/server architecture and Oralce database system as the basis of distributed system, the distributed processing scheme of population management Information system database is designed and implemented.

Data characteristics and distributed processing analysis of 1 database

In general, the population Management Information system database on the province's population data storage and management, its large data volume, geographical coverage, data entry and access points scattered, the geographical distribution of processing, the traditional central database management method is difficult to handle large-scale database management across the region, Therefore, the distributed database can be used to store and manage population information data.

Combined with the business characteristics of Hainan Special Economic Zone, the population size is small, the municipal branch offices do not participate in population information management, so the Hainan Province population Information Data management can use the provincial Hall-municipal Bureau-police station level three model; Moreover, because of the decentralized business location, the population management information system needs to adopt a distributed operation mode. According to the principle of distributed database system design site autonomy, at least one database should be established on each independent node where data is stored, and multiple databases can be set up in the same location to meet different application needs. In this way, a number of physical databases are established in the provincial and municipal offices, which are linked together through network and database links to form a unified global database.

2 Design of distributed database

Database design Principles

The main task of the system is to obtain the data needed for population management, which includes population basic information, identity card information, image information, code information, tail code distribution information, population information data items, and many data tables and dictionary tables, so in the design of the database, according to the relationship between the information, Set up the corresponding data table and the constraints between each other. Due to the storage and management of large amounts of data, a good data structure and database should be established so that the whole system can invoke and manage data quickly, conveniently, safely and accurately. In the PMIs database design process, the following design principles are adopted:

(1) Adopt an open multi-layer distributed data access structure;

(2) satisfying the 3NF of relational database theory, and preserving the data redundancy appropriately for the convenience of programming, thus improving the efficiency of the whole system;

(3) Make full use of the data dictionary function.

Distributed Database System architecture

The whole system uses the client/server architecture, data access with standard SQL, data distributed processing, each node is responsible for maintaining the integrity of the local database, through the Distributed database management system, to achieve the dynamic connection and distribution of each node database. The city population information data is stored in the City Bureau database, in the provincial Office database is set up to the municipal Bureau of the database link, the provincial office through the view creation or snapshot query these data, the city Bureau and police station users directly access the data of the Municipal Bureau database server. In addition, the police can not directly between the data interoperability, cross-information query through the City Council database on the view or snapshot completed, cross-city population information query through the City Bureau to establish the database table name or view name and other corresponding synonyms completed. This enables transparent access to data using the appropriate synonyms, such as the table name or view name of the remote database, which is periodically replicated through the snapshot for non-real-time access to remote data. The distributed database system topology is shown in 1.

The system adopts the centralized-distributed database storage scheme, which is shared by the Municipal bureau through leased line data, and eventually forms the province's population

Distributed solutions for data. Police station database operation through the remote dial can directly access the local city Council database or the relevant data download local, in the local processing, and then uploaded to the corresponding municipal council server, by the city bureau through the background of the data Update module to the police stations to do the increase, delete, change, etc. reflected in the City Council database. For off-site data access, the distributed database technology provided by Oracle can be used to realize transparent access to remote database through public security computer networks. The police station, the city bureau, the provincial hall through the city bureau's distributed database realizes to the province population information to share.

Data replication Technology

Hainan Province Population Management Information System for distributed Data Systems, database server multipoint distribution in the provincial office and the Municipal Councils. Provincial Office Database Server Storage Management population information of the province, the city's server storage management of the city's population information, all over the database server needs to exchange information in a timely and effective manner, the Provincial Office database server through the dedicated line and the municipal servers connected to the real-time communication. Because the distributed database storage and management distributed in multipoint, how to ensure the integrity of data information of each database server is a key issue. The system uses Oracle's data replication technology, through the establishment of database links, configuration of the main group, the population data information of the Municipal bureau database is regularly copied to the provincial Office Database server. In addition, replication technology is used to distribute processing load in distributed systems and to enable mobile workers to connect with the master data site.

Distributed data query and Data update processing

Access to the table or view of the remote database, through the database link using SELECT, INSERT, UPDATE, delete and other statements to the data. Where a database link defines a path to a remote database created by a DBA or application developer or network service, it is transparent to the users of the distributed database.

In the case of data access, if the data to be accessed comes from the same table, direct access to the table or view or snapshot based on that table is required, and if the data to be accessed comes from a different data table, it can be achieved through a connection operation or a corresponding view. The data items in the view are sourced from one or more tables in the same database, from two or more tables in different databases from the same node, and from two or more tables in the database on different nodes. To query the data for the province hall, first set up a snapshot of the data for each city board and then query the snapshot.

For the data update of the same node, because the municipal bureau and police station can access directly, so directly access the local data table, for the data update of different nodes, can directly access the local data table, also can create a read-write view of the remote table, the user directly access these views to achieve some complex data access.

8 Summary This chapter mainly introduces the technique of database design and an application example. The use of underscores in the naming conventions is now deprecated by Microsoft. Please use it according to actual usage. An example of a population management system comes from an essay. There are other examples on the Internet, but this is one of the better things I've seen. Many of the standards in this paper are practical. For example, select * may not appear because the query volume is too large and is not well maintained. For example, add a new field later, if it is an image, video, the query effect is conceivable. In the program can not directly write SQL statements, in order to better maintain the code later. Instead of having to look at the source code, you just need to modify the set of variables. The use of stored procedures is now much more, and the famous PetShop program already uses examples of stored procedures.

Distributed database at university Reading notes take out to share

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.