A Free Trial That Lets You Build Big!
Start building with 50+ products and up to 12 months usage for Elastic Compute Service
I. Design of distributed Database System 1 Database design Overview
Database design refers to the construction of an optimal database schema for a given application environment, the establishment of a database and its application system, so that it can effectively store data.
Basic Steps for database design (Figure 2.1):
4, after the prefix of the table, is the description of the specific content of the table. For example: The user login information table name is: User_login, the user in the Forum information table name is: User_bbs_info
6, when the system has a number of small, recurring values, the use of dictionary tables to save space and optimize the query. such as the region, the system of user type Code, and so on. Such values do not change during the run-time of the program, but need to be stored in the database.
In the case of a region, if we want to query the records of a certain region, the database needs to be queried by string matching, and if the region is changed to the code of a region to be stored in the table, the query through the region's code to query, the query efficiency will be greatly improved.
The program should use a large number of dictionary tables to represent such values. In a dictionary table, a collection of code names and entities that hold such values is associated with a table that uses such values. However, in the coding phase, programmers do not use dictionary tables, because first of all, query the dictionary List of entity code, violates the purpose of improving query efficiency. With the help of a data dictionary, programmers use code names directly to represent entities, thereby improving efficiency.
Although the dictionary table is not actually used, it should remain in the database (at least for the duration of the development period). The dictionary table appears as another form of "Data dictionary document" to show which of the tables in the database use the dictionary table.
In order to improve the data integrity of the database, the complete dictionary table and the foreign KEY constraint of ordinary table can be kept in the development phase. However, in the runtime of the database, the foreign key of the normal table and dictionary table should be deleted to improve the efficiency of the operation, especially when some tables use many dictionary tables.
Case: A database has millions user information, the application system often needs to query the user's information according to the region. The User information table was previously saved according to the specific region name, and now the specific name is changed to the area code in the dictionary table, and the query efficiency is greatly improved.
The dictionary table is uniformly prefixed with dic_. 2.3 field naming conventions
6, the field name is the English phrase, the adjective + noun or the auxiliary verb + verbs tense form expression, the case mixes, follows "sees the name to know the meaning" the principle. 3 SQL Statement specification
6. All SQL keywords capital 4 Stored Procedure Code specification 4.1 only allows applications to access the database through stored procedures
Allows modular programming: simply create the process once and store it in the database, and then call the procedure any time in the program. Stored procedures can be created by people who specialize in database programming and can be modified independently of the program source code.
Allow faster execution: If an operation requires a large number of Transact-SQL code or is repeated, the stored procedure will execute faster than Transact-SQL batch code. It is parsed and optimized when the stored procedure is created, and can be used in an in-memory version of the procedure after the process is first executed. Each time a Transact-SQL statement is run, it is sent repeatedly from the client and is compiled and optimized every time SQL Server executes the statements.
Reduce network traffic: an operation that requires hundreds of lines of Transact-SQL code can be implemented by a separate statement that executes the process code, without the need to send hundreds of lines of code across the network.
1. Logically, the stored procedure separates the application layer from the physical structure of the database. A stored procedure forms an interface between an application and a database. Such interfaces abstract complex database structures and conform to the idea of "interfacing programming" in extreme programming.
2, the main business logic encapsulated in the stored procedures, to avoid the application layer to write a large number of code (in the application of the string inserted too long SQL statements affect efficiency, and maintenance difficulties). It can improve the development efficiency, and debug the stored procedure directly in Query Analyzer, and find the logic problem in the system earlier, thus improving the quality of the code.
3, in the Web site type of application system, SQL injection vulnerability has been difficult to completely eliminate the loophole. This type of security problem can be greatly reduced if the database is accessed only through stored procedures. (therefore, even a simple SQL statement should be written as a stored procedure.) ）
4, as a result of the use of stored procedures, the level of the application can not care about the specific database structure, but only concerned about the interface calls stored procedures. Therefore, the advantages of stored procedures are obvious in the following situations:
• Requirements change, the structure of the table must be changed. Using stored procedures, as long as the parameters are unchanged, we just need to modify the corresponding stored procedures, without the need to modify the application code. Such a design would reduce the impact of the requirements change on the project.
• To improve efficiency, make part of the field redundant: some of the frequently accessed fields, we can make redundant storage in related tables. This improves efficiency and masks redundant details through stored procedures.
• To improve efficiency, use redundant tables (split tables): Some large tables, in order to improve query efficiency, you may need to save records to multiple tables. Using stored procedures, there are stored procedures that determine which split tables to get or insert data from. This improves efficiency without having to care about specific split rules at the application level.
5, the use of stored procedures to facilitate the project later or in operation to centrally optimize system performance. In the process of project development, because of various reasons, often unable to write efficient code, this problem is often in the late project or in the runtime reflected. Through stored procedures to encapsulate access to the database, after the project integration, through the trial operation to observe the efficiency of the system, so it is easy to find the bottleneck of the system, and can optimize the code of stored procedures to improve the efficiency of the system. This optimization is more efficient and easier than optimizing in the application process.
Problem two: Implementing a large amount of logic in a stored procedure will allow a large number of operations to be performed on the database server rather than on the application server. When the traffic is very large, the database server's CPU usage is greatly consumed.
This is also a case in point: There is a massive web site with multiple Web servers that form a load-balanced server cluster, but only a single central database server. When the traffic continues to increase, access to more Web servers to meet the high concurrency of access, but the database server has not been able to increase. Therefore, it is necessary to complete the business logic on the Web server as far as possible and avoid consuming the resources of the database server.
One solution to the problem: stored procedures are compiled SQL statements that are binary code in memory and do not consume too much memory. Also, stored procedures are significantly more efficient than using SQL statements directly. In other words, this is a "space for Time" scheme, a little more memory to exchange for efficiency improvement, is worthwhile.
Solution of problem Two: first of all, in the implementation of business logic, the implementation of the stored procedures is easier than in the application, and secondly, from the development efficiency, the development of the stored procedure is simpler than the application (to complete the same logic). In the high traffic system, the problem of resource allocation of application server and database server should be opened from the cost angle: The cost of software development is much higher than the cost of hardware expenditure. We can easily spend money to buy better servers, but it's hard to spend money on developers making the program a big improvement.
Use stored procedures to encapsulate business logic, first saving a lot of development time and debugging time, and can greatly improve the quality of the code. Therefore, in terms of cost, you should use stored procedures.
Second, at the application level, you can use a lot of static file caching methods to reduce the pressure on the database. For example: Infrequently changing information can be read from the database server, saved as an XML static file on the application server, and so on.
If you can't do it, you should design the system as a distributed system at the beginning of the design, considering possible traffic. This will fundamentally solve the problem of large traffic. 4.2 naming conventions
1. The prefix of the stored procedure is similar to the prefix of the table name: a series of tables is considered an object, the field is the object's property, and the stored procedure is the method of accessing the object. For example: Add a user's stored procedure named: User_adduser
2, each stored procedure has: @errno int and @errmsg varchar (255) Two output parameters. In your application, you can get the execution of the stored procedure based on these two parameters. (These two parameters use the default values and can be ignored)
ErrMsg is a string description of the error message, which is used primarily for debugging periods as a description to avoid using the value in the application. At the same time, pay attention to the English version system and the Chinese version system, the information language choice to the program influence. 4.4 recordset returned by stored procedures
1. The output recordset of the stored procedure: for the structure of the program is clear, the stored procedure is best to return only one recordset. But in some cases to improve performance, it is possible to output multiple recordsets
2, the recordset, each output field finally specifies the alias of the field, with the face real field name information loss to the client, thereby increasing the hacker to find the possibility of system vulnerabilities. 4.5 format conventions
5. Add detailed comments at the beginning of each stored procedure: including stored procedure name, parameter description, function description, return data set description, and author and copyright notices.
End 4.6 Convention for the use of things
2, the transaction must be explicitly submitted or canceled. 4.7 cursor Usage Conventions
2. The stored procedure that contains the cursor must be carefully tested for performance. 5 Database Design specifications 5.1 Data Integrity Specification (coding period)
1, in order to facilitate the coding of the program during the error, you can design the database as much as possible with constraints (check). For example, the value range of the field of integral type, etc., is often field>0.
4. Principle: Data integrity takes precedence over performance during coding. Improve the efficiency as far as possible under the premise of guaranteeing the system running correctly. 5.2 Design Tools and version partitioning
3, the use of Er-win design database, while the design of logical and physical views (can also be powerdesigner and other tools, through the ER graph to express the relationship between the data table)
5, the database development period and the operating period of two versions: the development period version mainly embodies the complete constraints, the runtime version of the efficiency of the optimization. 6 database optimization 6.1 Database Performance Optimization specification (running period)
For efficiency, each table holds a maximum of 100,000 records that are related to the user, and 2 million records are split into 20 tables. A user with number 1.1 million saves records to table one, 100001-200000 numbered records to table two, and so on.
When you insert a record, you first determine the ID of the user who inserted the record. The stored procedure automatically inserts the table into the corresponding splitter table, depending on the scope of the ID.
When you follow the criteria query, the stored procedure automatically connects all the split tables, and the records are filtered in the bundle. (generally: the same type of query is much larger than the query according to the criteria) 6.3 Redundant fields to establish the example
In order to improve the efficiency, in the message this table added the User name field. When you insert a record, both the user ID and the user name are saved. In this way, when querying, you do not have to connect two tables, so that the efficiency is greatly improved.
However, when the user modifies the user name, do you want to update the user name in the other tables to ignore the effect of this user name inconsistency. How you handle it depends on how important the user name is in the module. 6.4 redundant table Building Example
Now split the user table and the associated table, splitting the rule into the user's ID range. When querying a user's group, the efficiency is greatly improved. However, when querying users under a group, it is inefficient to correlate all of the split tables.
To improve efficiency, create a redundant user and Group relational table that holds the unified content in the first relational table, but the split rule is the scope of the group ID. Thus, when querying a user in a group, the query in the second relational table of the cluster increases the efficiency greatly. 6.5 Paging scheme in stored procedure
End result: SELECT top PageSize * from Table ORDER by ID ASC 7 Distributed design Instance
According to the data distribution characteristics of the population management Information system, based on the client/server architecture Distributed System technology, the ORALCE database system is used as the basis of distributed system, and the distributed processing scheme of the population Management Information system database is designed and implemented.
Generally, the population management Information System database stores and manages the population data information of the province, its data quantity is huge, the geographical coverage is wide, the data entry and access point are dispersed, and the regional distribution is strong, the traditional central database management method is difficult to be competent for the large database management across the region, Therefore, distributed databases can be used to store and manage population information data.
Combined with the characteristics of Hainan Special Economic Zone, the population is small and the municipal branches do not participate in the management of population information, so the Hainan Province population Information Data management can use the provincial Hall-municipal Bureau-police station level three model; In addition, because of its business location is also very dispersed, so the population management information system needs According to the principle of design site autonomy of distributed database system, at least one database should be set up on each independent node storing data, and multiple databases could be established in the same place to meet different application needs. Accordingly, a number of physical databases are established in the Department of Provincial and Municipal councils, which are linked together through network and database links to form a unified global database.
The main task of the system is to obtain data needed for population management, these data include population basic information, ID card information, image information, code information, tail code distribution information and so on, the population information data item is many, and involves many data table and the dictionary table, therefore in the design database, according to this information relations, Establish the corresponding data table and the constraints between each other. Because of storing and managing a large amount of data, it is necessary to establish a good data organization structure and database so that the whole system can call and manage data quickly, conveniently, safely and accurately. In the specific design process of PMIs database, the following design principles are adopted:
The whole system adopts the client/server architecture, uses the standard SQL data access, realizes the data distributed processing, each node is responsible for maintaining the local database integrity, through the distributed database management system, realizes each node database dynamic connection and the distribution operation. The city population information data is stored in the city bureau's database, in the provincial department's database, respectively establishes to each city bureau's database link, the province Hall through the view creation or the snapshot inquires these data, the municipal bureau and the police station user directly accesses the Municipal Bureau database server data. In addition, the police stations can not directly between the data to operate, across the information query through the City Bureau database of the view or snapshot completed, cross-city population information query through the city bureau to establish a database table name or view name and other corresponding synonyms completed. In this way, the corresponding synonyms such as the table name or view name of the remote database enable transparent access to the data, which can be replicated periodically through snapshots to achieve non-real-time access to remote data. The topology structure of the distributed database system is shown in Figure 1 below.
Distributed scheme of data. Local police station database operation through the remote dial can directly access the local municipal bureau database or the relevant data information downloaded locally, in the local processing, and then uploaded to the corresponding City bureau server, by the city bureau through the background of the data Update module to the police station to do the increase, delete, change, etc. reflected to the Municipal Bureau database. For remote data access, the distributed database technology provided by Oracle can be used in the public security computer to realize the transparent access to the database. The provincial police station, the city bureau, the province hall through the city bureau's distributed database realizes to the province population information sharing.
Hainan Province Population Management Information System is distributed data systems, the database server is distributed in the provincial department and the Municipal Bureau. Provincial Hall database Server storage Management population information of the province, the municipal bureau of Server storage and management of the city's population information, all over the database server needs timely and effective exchange of information, provincial Office database server through the dedicated line and the municipal servers connected to real-time communication. As the storage and management of distributed database are distributed in many aspects, how to ensure the integrity of data information of each database server is a key problem. The system uses Oracle data replication technology, through the establishment of database links, the configuration of the main group, the municipal Bureau of the database regularly copied the population data to the Provincial Hall database server. In addition, replication technology is used to distribute processing loads in distributed systems and to enable mobile workers to connect to the master data site.
Access to a table or view of a remote database, using a database link to select, INSERT, UPDATE, delete, and other statements to the data. The database link defines a path to a remote database that is created by a DBA or application developer or network service and is transparent to the users of the distributed database.
In the case of data access, if the data to be accessed comes from the same table, access the table directly or the view or snapshot based on the table; If the data you want to access comes from a different datasheet, it can be done by a connection operation or a corresponding view. The data items in a view are derived from one table or tables from the same database, two or more tables from different databases in the same node, and two or more tables from a database at different nodes. To query the department's data, you first set up a data snapshot of each city bureau, and then query the snapshot.
for data updates for the same node. Because the city Bureau and police station can directly access, so direct access to local data tables, data updates for different nodes can directly access the local data table, you can create a remote table read-write view, users directly access these views to achieve some complex data access. 8 summary This chapter mainly introduces the techniques of database design and an application example. The use of underscores in the naming conventions is now no longer recommended by Microsoft. Please use according to the actual. An example of a population management system comes from a paper. There are other examples on the web, but this is the better one I've seen. Many of the standards in this article are practical. For example, select * Can not appear because the query volume is too large, while not maintenance. For example, add new fields in the future, if it is an image, video, the query effect imaginable. You cannot write SQL statements directly in your program, in order to better maintain your code later. Instead of having to look at the source code and just modify the set variables, and so on. Using stored procedures is now a lot more, and the famous PetShop program has used examples of stored procedures.
Start building with 50+ products and up to 12 months usage for Elastic Compute Service