Distributed database in university reading notes take out share __ database

Source: Internet
Author: User
Tags one table sql injection cpu usage server memory


Chapter Two
I. Design of distributed Database System 1 Database design Overview

Database design refers to the construction of an optimal database schema for a given application environment, the establishment of a database and its application system, so that it can effectively store data.
Basic Steps for database design (Figure 2.1):

Demand analysis

Conceptual structure Design

Logical Structure Design

Physical Structure Design

Establishment and testing of databases

Database operation and Maintenance.

Figure 2.1

Design descriptions for each phase of the database, as shown in Figure 2.2

Figure 2.2 2 naming conventions 2.1 name Total rule

1, all the names of the range of characters: A-Z, A-Z, 0-9 and _ (underline). You are not allowed to use other characters as names.

2, the use of English words or English phrases (including abbreviations) as a name, not using meaningless characters or Hanyu Pinyin.

3, the name should be clear, can accurately express the meaning of things, it is best to read, follow the "see known meaning" principle. 2.2 Table naming conventions

1, do not use Tab or TBL as a table prefix (is a table, why also want to explain)

2. The table name consists of one and more nouns representing the contents of the table, separated by an underscore, with the first letter of each noun capitalized.

3. Use the table's content classification as the prefix for the table name: for example, tables related to user information use prefix user_, and content-related information uses prefix content_.

4, after the prefix of the table, is the description of the specific content of the table. For example: The user login information table name is: User_login, the user in the Forum information table name is: User_bbs_info

5, some as a many-to-many connection table, you can use the prefix of two tables as the table name:

such as: User Login table User_login, user Group table Group_info, the two tables to establish a many-to-many relationship table name: user_group_relation

6, when the system has a number of small, recurring values, the use of dictionary tables to save space and optimize the query. such as the region, the system of user type Code, and so on. Such values do not change during the run-time of the program, but need to be stored in the database.

In the case of a region, if we want to query the records of a certain region, the database needs to be queried by string matching, and if the region is changed to the code of a region to be stored in the table, the query through the region's code to query, the query efficiency will be greatly improved.

The program should use a large number of dictionary tables to represent such values. In a dictionary table, a collection of code names and entities that hold such values is associated with a table that uses such values. However, in the coding phase, programmers do not use dictionary tables, because first of all, query the dictionary List of entity code, violates the purpose of improving query efficiency. With the help of a data dictionary, programmers use code names directly to represent entities, thereby improving efficiency.

Although the dictionary table is not actually used, it should remain in the database (at least for the duration of the development period). The dictionary table appears as another form of "Data dictionary document" to show which of the tables in the database use the dictionary table.

In order to improve the data integrity of the database, the complete dictionary table and the foreign KEY constraint of ordinary table can be kept in the development phase. However, in the runtime of the database, the foreign key of the normal table and dictionary table should be deleted to improve the efficiency of the operation, especially when some tables use many dictionary tables.

Case: A database has millions user information, the application system often needs to query the user's information according to the region. The User information table was previously saved according to the specific region name, and now the specific name is changed to the area code in the dictionary table, and the query efficiency is greatly improved.

The dictionary table is uniformly prefixed with dic_. 2.3 field naming conventions

4, the field does not use any prefix (the table name represents a namespace, the field preceded by a prefix appears verbose)

2, dictionary names also avoid the use of too common too simple name: For example, in the user table, the User Name field username better than name.

5, the Boolean field, with some auxiliary verb beginning, more direct vivid: for example, the user whether has the message hasmessage, whether the user passes the examination ischecked and so on.

6, the field name is the English phrase, the adjective + noun or the auxiliary verb + verbs tense form expression, the case mixes, follows "sees the name to know the meaning" the principle. 3 SQL Statement specification

1. Do not allow the write select * from ..., you must indicate the specific fields you want to read.

2, not allowed in the application code directly write SQL statements to access the database.

3. Avoid writing too long SQL statements in one line, and dividing the SQL statements into multiple rows in the SQL keyword places is clearer.

For example, SELECT userid,username,userpwd from User_login where areaid=20

Modified into:

SELECT userid,username,userpwd

From User_login

WHERE areaid=20

More intuitive

4, in some block form of SQL statements, even if only one line of code, but also to add the begin ... End block.

such as: If EXISTS (...)

SET @nVar = 100

should be written:

IF EXISTS (...)

BEGIN

SET @nVar = 100

End

5. The blank lines and indents of the SQL batch statements are consistent with the general structured programming language, and should be kept in good code format.

6. All SQL keywords capital 4 Stored Procedure Code specification 4.1 only allows applications to access the database through stored procedures

Only allow applications to access the database through stored procedures, rather than writing SQL statements directly in code to access the database.

In a database development project, there are many benefits to using stored procedures, first of all, to see Microsoft Provide information:

The advantages of using stored procedures in SQL SERVER without using Transact-SQL programs that are stored locally on a client computer are:

Allows modular programming: simply create the process once and store it in the database, and then call the procedure any time in the program. Stored procedures can be created by people who specialize in database programming and can be modified independently of the program source code.

Allow faster execution: If an operation requires a large number of Transact-SQL code or is repeated, the stored procedure will execute faster than Transact-SQL batch code. It is parsed and optimized when the stored procedure is created, and can be used in an in-memory version of the procedure after the process is first executed. Each time a Transact-SQL statement is run, it is sent repeatedly from the client and is compiled and optimized every time SQL Server executes the statements.

Reduce network traffic: an operation that requires hundreds of lines of Transact-SQL code can be implemented by a separate statement that executes the process code, without the need to send hundreds of lines of code across the network.

Can be used as a security mechanism: even for users who do not have permission to execute statements directly in the stored procedure, they can be granted permission to execute the stored procedure.

In addition, the benefits of using stored procedures are:

1. Logically, the stored procedure separates the application layer from the physical structure of the database. A stored procedure forms an interface between an application and a database. Such interfaces abstract complex database structures and conform to the idea of "interfacing programming" in extreme programming.

2, the main business logic encapsulated in the stored procedures, to avoid the application layer to write a large number of code (in the application of the string inserted too long SQL statements affect efficiency, and maintenance difficulties). It can improve the development efficiency, and debug the stored procedure directly in Query Analyzer, and find the logic problem in the system earlier, thus improving the quality of the code.

3, in the Web site type of application system, SQL injection vulnerability has been difficult to completely eliminate the loophole. This type of security problem can be greatly reduced if the database is accessed only through stored procedures. (therefore, even a simple SQL statement should be written as a stored procedure.) )

4, as a result of the use of stored procedures, the level of the application can not care about the specific database structure, but only concerned about the interface calls stored procedures. Therefore, the advantages of stored procedures are obvious in the following situations:

• Requirements change, the structure of the table must be changed. Using stored procedures, as long as the parameters are unchanged, we just need to modify the corresponding stored procedures, without the need to modify the application code. Such a design would reduce the impact of the requirements change on the project.

• To improve efficiency, make part of the field redundant: some of the frequently accessed fields, we can make redundant storage in related tables. This improves efficiency and masks redundant details through stored procedures.

• To improve efficiency, use redundant tables (split tables): Some large tables, in order to improve query efficiency, you may need to save records to multiple tables. Using stored procedures, there are stored procedures that determine which split tables to get or insert data from. This improves efficiency without having to care about specific split rules at the application level.

5, the use of stored procedures to facilitate the project later or in operation to centrally optimize system performance. In the process of project development, because of various reasons, often unable to write efficient code, this problem is often in the late project or in the runtime reflected. Through stored procedures to encapsulate access to the database, after the project integration, through the trial operation to observe the efficiency of the system, so it is easy to find the bottleneck of the system, and can optimize the code of stored procedures to improve the efficiency of the system. This optimization is more efficient and easier than optimizing in the application process.

At the same time, excessive use of stored procedures, there are some of the following concerns:

Problem one: After the stored procedure is compiled, it will be saved as a global object for the database, and too many stored procedures will consume a large amount of database server memory.

Problem two: Implementing a large amount of logic in a stored procedure will allow a large number of operations to be performed on the database server rather than on the application server. When the traffic is very large, the database server's CPU usage is greatly consumed.

This is also a case in point: There is a massive web site with multiple Web servers that form a load-balanced server cluster, but only a single central database server. When the traffic continues to increase, access to more Web servers to meet the high concurrency of access, but the database server has not been able to increase. Therefore, it is necessary to complete the business logic on the Web server as far as possible and avoid consuming the resources of the database server.

For both of these concerns, my idea is:

One solution to the problem: stored procedures are compiled SQL statements that are binary code in memory and do not consume too much memory. Also, stored procedures are significantly more efficient than using SQL statements directly. In other words, this is a "space for Time" scheme, a little more memory to exchange for efficiency improvement, is worthwhile.

Solution of problem Two: first of all, in the implementation of business logic, the implementation of the stored procedures is easier than in the application, and secondly, from the development efficiency, the development of the stored procedure is simpler than the application (to complete the same logic). In the high traffic system, the problem of resource allocation of application server and database server should be opened from the cost angle: The cost of software development is much higher than the cost of hardware expenditure. We can easily spend money to buy better servers, but it's hard to spend money on developers making the program a big improvement.

Use stored procedures to encapsulate business logic, first saving a lot of development time and debugging time, and can greatly improve the quality of the code. Therefore, in terms of cost, you should use stored procedures.

For large traffic, the easiest way is to put more hardware costs: faster hard drives, larger memory and more CPUs, better network cards, and so on.

Second, at the application level, you can use a lot of static file caching methods to reduce the pressure on the database. For example: Infrequently changing information can be read from the database server, saved as an XML static file on the application server, and so on.

If you can't do it, you should design the system as a distributed system at the beginning of the design, considering possible traffic. This will fundamentally solve the problem of large traffic. 4.2 naming conventions

1. The prefix of the stored procedure is similar to the prefix of the table name: a series of tables is considered an object, the field is the object's property, and the stored procedure is the method of accessing the object. For example: Add a user's stored procedure named: User_adduser

2, stored procedures using the prefix of the module to name. For example, a user-managed stored procedure uses a prefix user_.

3, after the prefix of the stored procedure, is the verb + noun form of stored procedure name (also can be a verb phrase). 4.3 parameter naming of stored procedures

1, the parameter name uses the Hungarian naming method, uses the type prefix

2, each stored procedure has: @errno int and @errmsg varchar (255) Two output parameters. In your application, you can get the execution of the stored procedure based on these two parameters. (These two parameters use the default values and can be ignored)

errno is an integer error message code, and execution successfully returns 0. The specific meaning of the errno value is explained by the errmsg parameter, or by a comment or document in the code.

ErrMsg is a string description of the error message, which is used primarily for debugging periods as a description to avoid using the value in the application. At the same time, pay attention to the English version system and the Chinese version system, the information language choice to the program influence. 4.4 recordset returned by stored procedures

1. The output recordset of the stored procedure: for the structure of the program is clear, the stored procedure is best to return only one recordset. But in some cases to improve performance, it is possible to output multiple recordsets

2, the recordset, each output field finally specifies the alias of the field, with the face real field name information loss to the client, thereby increasing the hacker to find the possibility of system vulnerabilities. 4.5 format conventions

1, all SQL keywords Capital Letter

2, the use of a good variable naming norms

3, to maintain a good structure, including blank lines, indentation and space and so on.

4, block of statements, must be written on the begin ... End

5. Add detailed comments at the beginning of each stored procedure: including stored procedure name, parameter description, function description, return data set description, and author and copyright notices.

6, the code within each stored procedure must be preceded by set NOCOUNT on and set NOCOUNT off.

7. Examples of stored procedure formats are as follows:

/*********************************************/

* Name:user_adduser

* Purpose:add a user to System

* Parameters: @strUserName The user ' s login name

* @strPwd The user ' s password, encrypted by MD5

* @errno the error number, it ' s a ' output ' param

* @errmsg the error message, it ' s a ' output ' param

* returns:the user ' s ID, as a result set.

* Author:ahfu, ah_fu126@hotmail.com

* Copyright:copyrights by Synet 2005.All Rights Reserved.

* Create at:2005-03-18 last modify:2005-03-19

/*********************************************/

CREATE PROCEDURE User_adduser

(

@strUserName varchar (20),

@strPwd varchar (50),

@errno int = 0 OUTPUT,

@errmsg varchar (255) =null OUTPUT

)

As

BEGIN

SET NOCOUNT on

/* The following is the code for the stored procedure * *

SET NOCOUNT off

End 4.6 Convention for the use of things

1. If you change the record more than once, you need to write these actions in a transaction.

2, the transaction must be explicitly submitted or canceled. 4.7 cursor Usage Conventions

1, if not necessary, do not use cursors

2. The stored procedure that contains the cursor must be carefully tested for performance. 5 Database Design specifications 5.1 Data Integrity Specification (coding period)

1, in order to facilitate the coding of the program during the error, you can design the database as much as possible with constraints (check). For example, the value range of the field of integral type, etc., is often field>0.

2. Similarly, use triggers to verify the integrity of the data as much as possible during development.

3, if there is redundancy between the fields, you should write triggers to manage the redundant fields

3, in the development phase to save the complete primary key, foreign key and unique index constraints.

4. Principle: Data integrity takes precedence over performance during coding. Improve the efficiency as far as possible under the premise of guaranteeing the system running correctly. 5.2 Design Tools and version partitioning

3, the use of Er-win design database, while the design of logical and physical views (can also be powerdesigner and other tools, through the ER graph to express the relationship between the data table)

4, the development process as far as possible to ensure that the design documents and physical table structure consistent changes

5, the database development period and the operating period of two versions: the development period version mainly embodies the complete constraints, the runtime version of the efficiency of the optimization. 6 database optimization 6.1 Database Performance Optimization specification (running period)

1. Remove unnecessary constraints (check) at run time.

2, try not to use triggers

3, try to retain the primary key constraints

4, the appropriate removal of foreign keys to improve performance

5, during the run, through the analysis of system access to create indexes to optimize performance

6. Analyze the possible data growth of each table and define the automatic split table rule. Split the large table to improve performance.

7. Consider the data cleansing rule in advance: Under what circumstances can you improve performance by deleting old data from a database?

8, the development of database backup and disaster recovery plan.

9. For efficiency consideration, you can add redundant fields or redundant tables in the system testing phase appropriately.

10. Paging record output must be implemented through stored procedures and cannot be paged using API cursors, which can increase paging efficiency. 6.2 Split Representation example

Case: The website has 2 million users, there are many modules around the user to provide services.

For efficiency, each table holds a maximum of 100,000 records that are related to the user, and 2 million records are split into 20 tables. A user with number 1.1 million saves records to table one, 100001-200000 numbered records to table two, and so on.

Create a split information table that holds which tables are split, split to what extent, and what rules are split.

When you insert a record, you first determine the ID of the user who inserted the record. The stored procedure automatically inserts the table into the corresponding splitter table, depending on the scope of the ID.

When you follow the criteria query, the stored procedure automatically connects all the split tables, and the records are filtered in the bundle. (generally: the same type of query is much larger than the query according to the criteria) 6.3 Redundant fields to establish the example

Case: Message in this table, you want to save the user's ID as a foreign key. Usually, you know which user posted the message by connecting to the message list and the user table.

In order to improve the efficiency, in the message this table added the User name field. When you insert a record, both the user ID and the user name are saved. In this way, when querying, you do not have to connect two tables, so that the efficiency is greatly improved.

However, when the user modifies the user name, do you want to update the user name in the other tables to ignore the effect of this user name inconsistency. How you handle it depends on how important the user name is in the module. 6.4 redundant table Building Example

Case: There are user table and Group table, two tables are many-to-many relationship, establish a user and Group relationship table to achieve this relationship.

There are millions records in the user table, thousands of records in the group table. If each user belongs to more than one group, there will be millions of records in the association table.

Now split the user table and the associated table, splitting the rule into the user's ID range. When querying a user's group, the efficiency is greatly improved. However, when querying users under a group, it is inefficient to correlate all of the split tables.

To improve efficiency, create a redundant user and Group relational table that holds the unified content in the first relational table, but the split rule is the scope of the group ID. Thus, when querying a user in a group, the query in the second relational table of the cluster increases the efficiency greatly. 6.5 Paging scheme in stored procedure

programme I :

1, first statistics to meet the conditions of the number of records

2. Define table variables: The first field of the table variable is from the growth type, and the second is the unique value field in the recordset (typically the primary key)

3. Use the Insert () SELECT statement to save the unique value field of a qualifying record in a table variable.

4. Use the Where ID in (select ID from table variable WHERE ...) to read out the required unique value field from both sides of the table.

Programme two :

1, first statistics meet the conditions of the number of records, and based on the page size calculation pages

2, if read the first page, read directly using the TOP clause

3, if the number of pages in the first half:

Result set 1:select top curpage*pagesize Fields from Table order by ID ASC

Result set 2:select top PageSize * FROM (Result set 1) Order by ID DESC

End result: SELECT * FROM (Result set 2) Order by ID ASC

4, if the number of pages in the latter half:

Result set 1:select Top (pagecount-curpage) *pagesize Fields from Table order by ID DESC

End result: SELECT top PageSize * from Table ORDER by ID ASC 7 Distributed design Instance

Hainan Province Population Management Information System:

According to the data distribution characteristics of the population management Information system, based on the client/server architecture Distributed System technology, the ORALCE database system is used as the basis of distributed system, and the distributed processing scheme of the population Management Information system database is designed and implemented.

Data characteristic and distributed processing analysis of 1 database

Generally, the population management Information System database stores and manages the population data information of the province, its data quantity is huge, the geographical coverage is wide, the data entry and access point are dispersed, and the regional distribution is strong, the traditional central database management method is difficult to be competent for the large database management across the region, Therefore, distributed databases can be used to store and manage population information data.

Combined with the characteristics of Hainan Special Economic Zone, the population is small and the municipal branches do not participate in the management of population information, so the Hainan Province population Information Data management can use the provincial Hall-municipal Bureau-police station level three model; In addition, because of its business location is also very dispersed, so the population management information system needs According to the principle of design site autonomy of distributed database system, at least one database should be set up on each independent node storing data, and multiple databases could be established in the same place to meet different application needs. Accordingly, a number of physical databases are established in the Department of Provincial and Municipal councils, which are linked together through network and database links to form a unified global database.

2 Design of distributed database

Database design Principles

The main task of the system is to obtain data needed for population management, these data include population basic information, ID card information, image information, code information, tail code distribution information and so on, the population information data item is many, and involves many data table and the dictionary table, therefore in the design database, according to this information relations, Establish the corresponding data table and the constraints between each other. Because of storing and managing a large amount of data, it is necessary to establish a good data organization structure and database so that the whole system can call and manage data quickly, conveniently, safely and accurately. In the specific design process of PMIs database, the following design principles are adopted:

(1) using the open multilayer distributed data access structure;

(2) satisfying the 3NF of the theory of relational database, and preserving the data redundancy appropriately for the convenience of programming, thus improving the efficiency of the whole system;

(3) Make full use of the data dictionary function.

Structure of distributed Database system

The whole system adopts the client/server architecture, uses the standard SQL data access, realizes the data distributed processing, each node is responsible for maintaining the local database integrity, through the distributed database management system, realizes each node database dynamic connection and the distribution operation. The city population information data is stored in the city bureau's database, in the provincial department's database, respectively establishes to each city bureau's database link, the province Hall through the view creation or the snapshot inquires these data, the municipal bureau and the police station user directly accesses the Municipal Bureau database server data. In addition, the police stations can not directly between the data to operate, across the information query through the City Bureau database of the view or snapshot completed, cross-city population information query through the city bureau to establish a database table name or view name and other corresponding synonyms completed. In this way, the corresponding synonyms such as the table name or view name of the remote database enable transparent access to the data, which can be replicated periodically through snapshots to achieve non-real-time access to remote data. The topology structure of the distributed database system is shown in Figure 1 below.

The system adopts the centralized-distributed database storage scheme, which is shared by the Municipal bureau through the special Line data, and finally forms the province population

Distributed scheme of data. Local police station database operation through the remote dial can directly access the local municipal bureau database or the relevant data information downloaded locally, in the local processing, and then uploaded to the corresponding City bureau server, by the city bureau through the background of the data Update module to the police station to do the increase, delete, change, etc. reflected to the Municipal Bureau database. For remote data access, the distributed database technology provided by Oracle can be used in the public security computer to realize the transparent access to the database. The provincial police station, the city bureau, the province hall through the city bureau's distributed database realizes to the province population information sharing.

Data replication Technology

Hainan Province Population Management Information System is distributed data systems, the database server is distributed in the provincial department and the Municipal Bureau. Provincial Hall database Server storage Management population information of the province, the municipal bureau of Server storage and management of the city's population information, all over the database server needs timely and effective exchange of information, provincial Office database server through the dedicated line and the municipal servers connected to real-time communication. As the storage and management of distributed database are distributed in many aspects, how to ensure the integrity of data information of each database server is a key problem. The system uses Oracle data replication technology, through the establishment of database links, the configuration of the main group, the municipal Bureau of the database regularly copied the population data to the Provincial Hall database server. In addition, replication technology is used to distribute processing loads in distributed systems and to enable mobile workers to connect to the master data site.

Distributed data query and Data update processing

Access to a table or view of a remote database, using a database link to select, INSERT, UPDATE, delete, and other statements to the data. The database link defines a path to a remote database that is created by a DBA or application developer or network service and is transparent to the users of the distributed database.

In the case of data access, if the data to be accessed comes from the same table, access the table directly or the view or snapshot based on the table; If the data you want to access comes from a different datasheet, it can be done by a connection operation or a corresponding view. The data items in a view are derived from one table or tables from the same database, two or more tables from different databases in the same node, and two or more tables from a database at different nodes. To query the department's data, you first set up a data snapshot of each city bureau, and then query the snapshot.

for data updates for the same node. Because the city Bureau and police station can directly access, so direct access to local data tables, data updates for different nodes can directly access the local data table, you can create a remote table read-write view, users directly access these views to achieve some complex data access. 8 summary This chapter mainly introduces the techniques of database design and an application example. The use of underscores in the naming conventions is now no longer recommended by Microsoft. Please use according to the actual. An example of a population management system comes from a paper. There are other examples on the web, but this is the better one I've seen. Many of the standards in this article are practical. For example, select * Can not appear because the query volume is too large, while not maintenance. For example, add new fields in the future, if it is an image, video, the query effect imaginable. You cannot write SQL statements directly in your program, in order to better maintain your code later. Instead of having to look at the source code and just modify the set variables, and so on. Using stored procedures is now a lot more, and the famous PetShop program has used examples of stored procedures.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.