Database design principles for large databases
Author: Nanhai
A good database product does not by itself guarantee a good application system. Without a well-designed database model, programming and maintaining the client and server programs becomes harder, and the real-world performance of the system suffers. In a typical MIS project, the analysis, design, test, and trial-run stages work with small volumes of data, so designers and testers tend to notice only whether the functions work and overlook performance weaknesses. Only after the system has been in production for some time does the performance degradation become apparent, and by then improving it costs far more manpower and material resources, while the whole system inevitably turns into a patchwork project. Drawing on many years of experience designing and using databases, the author offers the following design guidelines for colleagues' reference.
Naming conventions
Different database products impose different rules on object naming. Therefore, name every database object, and write the back-end program code, in a case-sensitive form, and keep every object name to no more than 30 characters, so that the application system can adapt to different databases.
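As a minimal sketch of this rule (the table and column names here are hypothetical, not from the original text), portable DDL keeps every identifier under 30 characters and uses one consistent casing throughout:

/* hypothetical sketch: consistent casing, every name no more than 30 characters */
create table cust_account              /* 12 characters, well under the limit */
(
    count_no    char(10)  not null,
    open_date   datetime  not null,
    balance     money     not null
)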
Use cursors (Cursor) cautiously
A cursor provides a means of scanning a result set row by row: the program traverses the data one row at a time and performs different operations depending on conditions in the fetched row. A cursor defined over a join of several tables or over a large table (a large result set) can easily make the program slow or even appear to hang. In the day-end interest roll-over routine of a city's "Housing Provident Fund Management System", cursor processing of roughly 100,000 accounts left the program running indefinitely (it was abandoned after 48 hours; hardware environment: Alpha/4000, 128M RAM, SCO UNIX, Sybase 11.0). Rewritten as a set of UPDATE statements, the same job completed within 20 minutes.
Examples are as follows:
declare mycursor cursor for select count_no from count
open mycursor
fetch mycursor into @vcount_no
while (@@sqlstatus = 0)    /* Sybase: loop while the fetch succeeds */
begin
    if @vcount_no = <condition 1>
        <operation 1>
    if @vcount_no = <condition 2>
        <operation 2>
    ...
    fetch mycursor into @vcount_no
end
...

Changed to:

update count set <operation 1> where <condition 1>
update count set <operation 2> where <condition 2>
...
In some cases a cursor really is necessary; then consider inserting the qualifying rows into a temporary table first and defining the cursor over the temporary table, which can improve performance markedly. The author used this approach in the database back-end of a city's "Telecommunication Toll System" for a cursor operation on a table of 30,000 rows of which only about 30 qualified (hardware environment: PC server, PII 266, 64M RAM, NT 4.0, SQL Server 6.5). An example follows:
create table #tmp    /* define temporary table */
(
    field1 ...,
    field2 ...,
    ...
)
insert into #tmp select * from total where <condition>    /* of the 30,000 rows in total, only about 30 qualify */
declare mycursor cursor for select * from #tmp    /* define the cursor on the temporary table */
...
Guidelines for using indexes (Index)
Creating an index generally serves two purposes: to enforce the uniqueness of the indexed column and to provide fast access to the data in the table. Large databases offer two kinds of index, clustered and non-clustered. A table without a clustered index stores its data as a heap, with all new rows appended at the end of the table; in a table with a clustered index, the data is physically stored in the order of the clustering key, and only one clustered index is allowed per table. From the B-tree structure it follows that adding any index speeds up queries on the indexed column but reduces the performance of INSERT, UPDATE, and DELETE operations, and the penalty grows as the fill factor (Fill Factor) grows. Therefore, for heavily indexed tables with frequent inserts, updates, and deletes, set a smaller fill factor on the tables and indexes, so that more free space is left in the data pages and page splitting and reorganization work are reduced.
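As a sketch of this advice (the table, column, and index names are hypothetical), a table that takes heavy insert and update traffic could be given indexes with a small fill factor, leaving free space in each page:

/* hypothetical sketch: a low fill factor leaves free space in each data page,
   reducing page splits on a frequently modified table */
create clustered index idx_count_no
    on count (count_no)
    with fillfactor = 60

create nonclustered index idx_open_date
    on count (open_date)
    with fillfactor = 60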
Consistency and integrity of data
To guarantee the consistency and integrity of the database, designers tend to create many table associations (Relation) to reduce data redundancy as far as possible. A table association is an enforcement mechanism: once it is established, every insert, update, and delete on the parent table (parent table) and child table (child table) incurs extra system overhead, and it is best not to use an Identity column as the primary key that a child table references. The lower the data redundancy, the easier data integrity is to guarantee, but joins between tables multiply, which hurts the system's response time. Using rules (Rule) and constraints (Check) to prevent operators from accidentally entering erroneous data is another tool designers commonly use; however, unnecessary rules and constraints also consume system overhead, and note that a constraint validates data faster than a rule does. All of this should be weighed in the design stage against the types of operations the system performs and their frequency.
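As a minimal sketch of the rule-versus-constraint point (the table, column, and object names are hypothetical), both of the following reject a negative balance, but the CHECK constraint is validated faster than the bound rule:

/* a rule, created and then bound to a column (slower to validate) */
create rule balance_rule as @balance >= 0
go
exec sp_bindrule 'balance_rule', 'account.balance'
go

/* the equivalent CHECK constraint (faster to validate) */
alter table account
    add constraint ck_balance check (balance >= 0)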
The trap of transactions
A transaction is a group of operations completed as a single unit of work. Although each member of the group is an individual operation, SQL Server guarantees that the group either completes in its entirety or does nothing at all. It is this property of large databases that gives data integrity its strong guarantee.
As is well known, SQL Server provides implicit transaction control for every individual SQL statement, so the data changes of each DML statement are either fully committed or fully rolled back. SQL Server also provides explicit transaction control statements:
BEGIN TRANSACTION      /* start a transaction */
COMMIT TRANSACTION     /* commit a transaction */
ROLLBACK TRANSACTION   /* roll back a transaction */
Transactions can be nested, and the transaction nesting level of the current connection can be retrieved through the global variable @@trancount.
What requires special attention, and what programmers easily get wrong, is this: every explicit or implicit transaction start adds 1 to the variable, every transaction commit subtracts 1 from it, and every transaction rollback resets it to 0. Only when the variable reaches 0 (that is, at the last COMMIT statement) is the transaction actually committed and the physical data written to disk.
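A minimal sketch of this behavior (the update targets are placeholders, not from the original): the inner COMMIT only decrements @@trancount, and nothing is physically written until the outermost COMMIT brings it back to 0:

begin transaction          /* @@trancount = 1 */
    update account set balance = balance + 100 where count_no = '0001'
    begin transaction      /* @@trancount = 2 */
        update account set balance = balance - 100 where count_no = '0002'
    commit transaction     /* @@trancount = 1: nothing is written to disk yet */
commit transaction         /* @@trancount = 0: the whole transaction now commits */

A ROLLBACK issued anywhere inside this nest would set @@trancount to 0 and undo all of the work at once.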
Database Performance Tuning
With the computer hardware configuration and network design fixed, the factors that affect application performance are database performance and client programming. Most database designers work in two steps: logical design first, then physical design. The logical design eliminates all redundant data, which improves data throughput, guarantees data integrity, and clearly expresses the relationships between data elements. But join queries across many tables (especially large ones) degrade performance and make client programming harder, so the physical design has to compromise: based on the business rules, determine the size of the associated tables and the access frequency of the data items, and for tables that are frequently joined in queries, deliberately increase data redundancy.
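As a hypothetical sketch of such a redundancy compromise (the orders and customer tables and their columns are illustrative, not from the original), the customer's name is copied into the orders table so that routine order queries no longer need a join:

/* copy the frequently read cust_name into orders to avoid a join with customer */
alter table orders add cust_name varchar(40) null
go
update orders
set cust_name = customer.cust_name
from orders, customer
where orders.cust_no = customer.cust_no

The price of this design is that the copied column must be kept in step whenever the customer's name changes, which is why it should be reserved for data that is queried far more often than it is updated.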
Selection of data types
A sensible choice of data types has a great influence on database performance and operation. Many books cover this subject in detail; here the author mainly offers some points from experience.
An Identity column should not be used as the table's primary key in associations with other tables, because doing so complicates data migration for the table.
Text and Image fields are pointer-type data, used mainly to store binary large objects (BLOBs). Operations on this kind of data are slower than on other data types, so avoid them where possible.
The advantage of date-type fields is the many date functions they support, which makes comparing and subtracting dates very simple. However, a query that applies a function to a date column in its condition becomes much slower than with other data types, because when a function appears in the query condition the server cannot use its advanced optimization strategies and can only perform a table scan, traversing every row.
For example, to query all records for the year 1998 from DATA_TAB1, which has a date field named DATE:
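The article's example is cut off at this point; a sketch of the contrast it sets up might look like the following (the exact queries are reconstructed, not from the original). The first form wraps the column in a function and forces a table scan; the second uses a plain range condition the optimizer can work with:

/* slow: the function on the DATE column defeats the optimizer, forcing a table scan */
select * from data_tab1 where datepart(yy, date) = 1998

/* faster: a plain range condition on the column itself */
select * from data_tab1 where date >= '19980101' and date < '19990101'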