Common issues in Databases


1. Do not use cursors. Nothing damages the performance of an entire system faster: cursors consume large amounts of memory and lock tables in surprisingly aggressive ways, undoing whatever performance tuning has been done elsewhere. Each FETCH is equivalent to executing a SELECT, so a cursor that walks over 1,000 records effectively executes 1,000 SELECTs.
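As a hedged illustration of item 1 (the orders table, its columns, and the status values are hypothetical), the row-by-row cursor below can usually be replaced by a single set-based statement:

-- Row-by-row processing with a cursor (slow)
DECLARE @id INT
DECLARE order_cursor CURSOR FOR SELECT order_id FROM orders WHERE status = 'new'
OPEN order_cursor
FETCH NEXT FROM order_cursor INTO @id
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE orders SET status = 'processed' WHERE order_id = @id
    FETCH NEXT FROM order_cursor INTO @id
END
CLOSE order_cursor
DEALLOCATE order_cursor

-- Equivalent set-based statement: one pass, no cursor
UPDATE orders SET status = 'processed' WHERE status = 'new'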

2. Design the database according to normalization requirements.

3. Do not use SELECT *. Specifying only the columns you need in the SELECT list brings the following benefits: 1. It reduces memory consumption and network bandwidth. 2. It makes for a more secure design. 3. It gives the query optimizer a chance to read all required columns directly from an index.
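A minimal sketch of item 3 (the customers table and its columns are assumptions):

-- Avoid: pulls every column across the network
SELECT * FROM customers WHERE country = 'DE'

-- Prefer: may be satisfied entirely from an index on (country, customer_id, name)
SELECT customer_id, name FROM customers WHERE country = 'DE'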

4. When an index is added to a table, SELECT statements become faster, but INSERT and DELETE statements become considerably slower, because maintaining the index requires extra work. This trade-off is often overlooked, especially for DELETE and UPDATE, because those statements usually contain a SELECT-like lookup in their WHERE clause, which the index can still help.
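A hedged example for item 4 (the orders table and column names are hypothetical): the index speeds up the lookup in the WHERE clause, but it must also be maintained on every write.

CREATE INDEX ix_orders_customer ON orders (customer_id)

-- The index helps locate the rows to delete...
DELETE FROM orders WHERE customer_id = 42
-- ...but every INSERT into orders now also has to update ix_orders_customer.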

5. Use transactions, especially for time-consuming operations that span several statements, so that the work can be rolled back if something fails part-way through.

6. Do not use row-by-row INSERT statements to import large amounts of data; use a bulk-loading mechanism (for example, BULK INSERT or bcp in SQL Server) instead.

7. Do not use the text data type unless you really need to store very large values. It is awkward to query, slow, and wastes space when misused; varchar usually handles your data better.
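A minimal sketch of item 5 in T-SQL (the account-transfer table and column names are assumptions):

BEGIN TRY
    BEGIN TRANSACTION
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 2
    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION   -- undo both updates if either one failed
END CATCH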

8. Avoid temporary tables; in most cases a subquery can replace them. Temporary tables incur system overhead, and if you program with COM+ they cause extra trouble: because COM+ uses a database connection pool, a temporary table created on a pooled connection can live on long after your logical session has ended.
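A hedged sketch of item 8 (the orders and customers tables are hypothetical): a derived table in the FROM clause replaces an intermediate temporary table.

-- Instead of SELECT ... INTO #recent_orders and then joining against #recent_orders,
-- do the aggregation inline as a derived table:
SELECT c.name, r.order_count
FROM customers AS c
INNER JOIN (SELECT customer_id, COUNT(*) AS order_count
            FROM orders
            WHERE order_date >= '2024-01-01'
            GROUP BY customer_id) AS r
    ON c.customer_id = r.customer_id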

9. In general, an entity should not lack both a primary key and a foreign key. An entity at a leaf position of the E-R diagram may or may not define a primary key (it has no descendants), but it must have a foreign key (it has a parent). A primary key is a high-level abstraction of an entity; the pairing of a primary key with a foreign key expresses the relationship between entities.

10. The basic tables and their fields should satisfy the third normal form as far as possible. However, a design that satisfies the third normal form is not always the best design. To improve the efficiency of database operation, we often need to lower the normalization standard: add redundancy deliberately in order to trade space for time.

11. Understand data redundancy correctly. The repeated occurrence of non-key fields is data redundancy, and a low-level kind: repetitive redundancy. High-level redundancy does not repeat a field; it stores a value derived from other fields.

12. The way to prevent patchwork database design is the "three less" principle:

1. The fewer tables in the database, the better. 2. The fewer fields in a table's composite primary key, the better. 3. The fewer fields in a table, the better.

1. Primary keys and foreign keys

In general, an entity should not lack both a primary key and a foreign key, and there is no entity without a primary key.

2. Treat different kinds of tables differently

Unlike statistical tables, intermediate tables, and temporary tables, basic tables have the following features:

A. Primitiveness. The records in a basic table are records of the raw (basic) data.

B. Derivability. The data in statistical tables and temporary tables can be derived from the basic tables according to certain business rules.

C. Stability. The structure of a basic table is relatively stable, and its records are kept for a long time.

Therefore, when designing a database, distinguish the basic tables from the intermediate and statistical tables. Basic tables should satisfy the third normal form as far as possible; the other tables may be less normalized. In any case, a design that satisfies the third normal form is not always the best design: to improve the efficiency of database operation we often need to lower the normalization standard and add redundancy deliberately in order to trade space for time.

3. Understanding normal forms and classifying redundancy

First normal form: 1NF is an atomicity constraint on attributes; every attribute must be atomic and indivisible. Second normal form: 2NF is a uniqueness constraint on records; every record must be uniquely identifiable, that is, every entity must have a unique identity. Third normal form: 3NF is a constraint on field redundancy; no field may be derivable from other fields, so there must be no redundant fields.

Sometimes, to improve operational efficiency, we must lower the normalization standard and deliberately keep redundant data. The practical approach is to follow the third normal form in the conceptual data model and to consider lowering it only in the physical data model. Lowering the normal form means adding fields and allowing redundancy.

But not all redundancy is equal. The repeated appearance of primary keys and foreign keys across multiple tables is not data redundancy. This concept must be kept clear; in practice, many people still confuse it.

A. A repeated non-key field is data redundancy, and a low-level kind: repetitive redundancy.

B. High-level redundancy is not a repeated field but a field derived from other fields. [Example 4]: the unit price, quantity, and amount fields of a product table. The amount is derived from the unit price multiplied by the quantity, so it is redundant, but it is high-level redundancy. Redundancy exists to speed up processing. Only low-level redundancy increases the risk of data inconsistency, because the same data may be entered multiple times at different times, in different places, and by different roles. Therefore we advocate high-level (derived) redundancy and oppose low-level (repetitive) redundancy.

Data designed against normalization requires extra work to maintain integrity. The redundant copies of the same data can generally be maintained in one of the following ways (a trigger-based sketch follows this list):

A. Application transactions: each application transaction updates every copy of the data. This is relatively hard to manage, because the same maintenance logic tends to appear in several applications and is easily omitted.

B. Batch processing: a batch program maintains all the data involved in the denormalized relationships. It usually runs on a schedule determined by the business, and a job can run it automatically. This is suitable when the real-time requirements for the redundant data are low.

C. Database triggers: a trigger on the original data immediately propagates changes to the redundant columns. This suits environments with high real-time requirements for the data, but it also slows down inserts and updates.
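A minimal sketch of method C in T-SQL, using the unit price times quantity example from above (the order_items table and its columns are assumptions):

CREATE TRIGGER trg_order_items_amount
ON order_items
AFTER INSERT, UPDATE
AS
BEGIN
    -- Recompute the redundant amount column for the rows that changed
    UPDATE oi
    SET oi.amount = oi.unit_price * oi.quantity
    FROM order_items AS oi
    INNER JOIN inserted AS i ON oi.item_id = i.item_id
END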

4. Pay attention to the use of views and materialized views in the database

A. Simplifying queries

B. Hiding the database structure; permission management and security

C. Materialized views for data preparation and performance (a hedged sketch follows)
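A brief sketch (the tables are hypothetical): a view that both simplifies a common query and hides the underlying structure. In SQL Server the closest equivalent of a materialized view is an indexed view; in Oracle it is a materialized view proper.

CREATE VIEW v_customer_order_totals
AS
SELECT c.customer_id, c.name, SUM(oi.unit_price * oi.quantity) AS total_amount
FROM customers AS c
INNER JOIN orders AS o ON o.customer_id = c.customer_id
INNER JOIN order_items AS oi ON oi.order_id = o.order_id
GROUP BY c.customer_id, c.name

Callers can then be granted SELECT on the view only, without any access to the base tables.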

5. Data Integrity

We recommend enforcing data integrity with declarative column-level and table-level constraints wherever possible. Complex business rules that constraints cannot express can be implemented with triggers and stored procedures.
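A hedged sketch of declarative constraints (the tables and columns are assumptions): primary key, unique, check, default, and foreign key constraints enforced by the database itself.

CREATE TABLE products (
    product_id  INT           NOT NULL PRIMARY KEY,
    sku         VARCHAR(20)   NOT NULL UNIQUE,
    unit_price  DECIMAL(10,2) NOT NULL CHECK (unit_price >= 0),
    created_at  DATETIME      NOT NULL DEFAULT GETDATE()
)

CREATE TABLE order_items (
    order_id   INT NOT NULL,
    product_id INT NOT NULL REFERENCES products (product_id),  -- foreign key constraint
    quantity   INT NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
)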

6. Be good at identifying and correctly handling many-to-many relationships. If two entities have a many-to-many relationship, the relationship should be resolved by adding a third entity between them; the original many-to-many relationship then becomes two one-to-many relationships, and the attributes of the original two entities should be allocated sensibly among the three. The third entity is essentially a complex relationship, and it maps to a basic table of its own. In general, database design tools cannot identify many-to-many relationships, but they can process them.

[Example 3]: In a library information system, "books" is one entity and "readers" is another. The relationship between them is a typical many-to-many relationship: a book can be borrowed by many readers at different times, and a reader can borrow many books. You therefore add a third entity between them, named "borrow/return", with attributes such as the borrow/return time and a borrow/return flag (0 for a borrow, 1 for a return). It also needs two foreign keys (the primary key of "books" and the primary key of "readers") so that it can connect to both "books" and "readers".
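A minimal sketch of that junction table (the names and key choices are illustrative only):

CREATE TABLE borrow_return (
    book_id     INT      NOT NULL REFERENCES books (book_id),
    reader_id   INT      NOT NULL REFERENCES readers (reader_id),
    event_time  DATETIME NOT NULL,
    return_flag BIT      NOT NULL,   -- 0 = borrow, 1 = return
    PRIMARY KEY (book_id, reader_id, event_time)
)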

==========================================================

Essentials of SQL query statements: union queries and join queries

2. Union queries

The UNION operator combines the results of two or more SELECT statements into a single result set; running such a statement performs a union query. The syntax of UNION is:

select_statement UNION [ALL] select_statement [UNION [ALL] select_statement] [... n]

where select_statement is a SELECT statement to be combined. The ALL option keeps every row in the result set; if it is omitted, duplicate rows in the union result are reduced to a single row. In a union query, the column headings of the result come from the first SELECT statement, so any column heading you want must be defined in the first SELECT; likewise, sorting the union result must use column names, column headings, or column numbers from the first SELECT. Every SELECT in the union must have the same number of expressions in its select list, and corresponding expressions must have the same data type or be implicitly convertible to a common type (during implicit conversion the system converts the lower-precision type to the higher-precision one). In a statement containing several UNIONs, execution proceeds from left to right; parentheses can change the order. Example: query1 UNION (query2 UNION query3)
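A hedged illustration (the tables current_customers and archived_customers, with compatible columns, are assumptions):

-- UNION removes duplicate rows; UNION ALL keeps them
SELECT customer_id, name FROM current_customers
UNION
SELECT customer_id, name FROM archived_customers
ORDER BY name   -- sort keys must come from the first SELECT's column list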

3. Join queries

Multiple tables can be queried through join operators. Joins are the main feature of the relational database model and a hallmark that distinguishes it from other kinds of database management systems. In a relational DBMS, relationships between data do not have to be fixed when tables are created, and all the information about one entity is typically stored in one table; when retrieving data, join operations let you query related information about different entities across multiple tables. Joins give users great flexibility: new kinds of data can be added at any time by creating new tables for the new entities and then querying them through joins.

A join can be specified in either the FROM clause or the WHERE clause of a SELECT statement. Specifying the join in the FROM clause helps distinguish the join condition from the search conditions in the WHERE clause, so this method is recommended in Transact-SQL. The join syntax for the FROM clause defined by the SQL-92 standard is:

FROM join_table join_type join_table [ON (join_condition)]

where join_table specifies a table involved in the join. A join can operate on the same table (a self-join) or on multiple tables. join_type indicates the join type, which can be an inner join, an outer join, or a cross join.

An inner join uses comparison operators to compare data in columns of the tables and lists the rows that match the join condition. According to the comparison used, inner joins are classified into equijoins, natural joins, and non-equijoins. Outer joins come in three kinds: left outer join (LEFT OUTER JOIN or LEFT JOIN), right outer join (RIGHT OUTER JOIN or RIGHT JOIN), and full outer join (FULL OUTER JOIN or FULL JOIN). Unlike an inner join, an outer join lists not only the rows that match the join condition but also all rows of the left table (for a left outer join), all rows of the right table (for a right outer join), or all rows of both tables (for a full outer join) that satisfy the search conditions. A cross join without a WHERE clause returns the Cartesian product of all rows of the joined tables, so the number of rows in the result set equals the number of qualifying rows in the first table multiplied by the number of qualifying rows in the second table. The ON (join_condition) clause specifies the join condition, which consists of columns of the joined tables, comparison operators, and logical operators. No join can compare columns of the text, ntext, or image data types directly, although such columns can be joined indirectly.

Example:

SELECT p1.pub_id, p2.pub_id, p1.pr_info
FROM pub_info AS p1
INNER JOIN pub_info AS p2
ON DATALENGTH(p1.pr_info) = DATALENGTH(p2.pr_info)

(1) Inner joins. An inner join lists the rows that match the join condition, using comparison operators to compare the values of the joined columns. Inner joins come in three kinds:

1. Equijoin: the join condition uses the equals (=) operator to compare the joined columns. The query result lists all columns of the joined tables, including duplicate columns.

2. Non-equijoin: the join condition uses a comparison operator other than equals, such as >, >=, <=, <, !>, !<, or <>.

3. Natural join: the join condition uses the equals (=) operator, as in an equijoin, but the select list is used to specify the columns of the result set and to remove the duplicate columns of the joined tables.

For example, the following equijoin lists the authors and publishers located in the same city in the authors and publishers tables:

SELECT * FROM authors AS a INNER JOIN publishers AS p ON a.city = p.city

With a natural join, the duplicate columns (city and state) are removed from the select list:

SELECT a.*, p.pub_id, p.pub_name, p.country FROM authors AS a INNER JOIN publishers AS p ON a.city = p.city

(2) Outer joins. With an inner join, only rows that satisfy both the query conditions (the WHERE or HAVING conditions) and the join condition are returned. With an outer join, the result set contains not only the rows that satisfy the join condition but also all rows of the left table (left outer join), the right table (right outer join), or both tables (full outer join). For example, a left outer join connects forum content with author information:

SELECT a.*, b.* FROM luntan AS a LEFT JOIN usertable AS b ON a.username = b.username

The following full outer join lists all authors in the city table, all users in the user table, and the cities they live in:

SELECT a.*, b.* FROM city AS a FULL OUTER JOIN user AS b ON a.username = b.username

(3) Cross joins. A cross join without a WHERE clause returns the Cartesian product of all rows of the two joined tables; the number of rows in the result set equals the number of qualifying rows in the first table multiplied by the number of qualifying rows in the second table. For example, if there are 6 kinds of books in the titles table and 8 publishers in the publishers table, the following cross join retrieves 6 * 8 = 48 rows:

SELECT type, pub_name FROM titles CROSS JOIN publishers ORDER BY type

Core SQL statements (very useful tips)

Inserting data

To add a new record to a table, you must use the SQL INSERT statement. Here is an example of how to use this statement:

INSERT mytable (mycolumn) VALUES ('some data')

This statement inserts the string 'some data' into the mycolumn field of the table mytable. The name of the field to insert into is given in the first set of parentheses, and the actual data in the second.

The complete syntax of the insert statement is as follows:

INSERT [INTO] {table_name | view_name} [(column_list)]
{DEFAULT VALUES | values_list | select_statement}

If a table has multiple fields, you can insert data into all of them by separating the field names and the field values with commas. Assume mytable has three fields: first_column, second_column, and third_column. The following INSERT statement adds a complete record with values for all three fields:

INSERT mytable (first_column, second_column, third_column)
VALUES ('some data', 'some more data', 'yet more data')

Note:

You can use the INSERT statement to insert data into text fields. However, if you need to insert a very long string, you should use the WRITETEXT statement. That topic is too advanced for this book, so it is not discussed here; for more information, see the Microsoft SQL Server documentation.

What if you specify only two of the fields and their data in an INSERT statement? In other words, you insert a new record but provide no data for one of the fields. In that case there are four possibilities:

If the field has a default value, that value is used. For example, suppose you provide no data for third_column when inserting a new record, and that field has a default value of 'some value'. In that case, 'some value' is inserted when the new record is created.

If this field can accept null values and has no default value, a null value is inserted.

If this field cannot accept null values and there is no default value, an error will occur. You will receive the error message:

The column in table mytable may not be null.

Finally, if the field is an identity field, it automatically generates a new value. When you insert a new record into a table with an identity field, just ignore that field; the identity field assigns a new value for you.

Note:

After inserting a new record into a table with an identity field, you can use the SQL variable @@identity to access the new record's ID.

Consider the following SQL statements:

INSERT mytable (first_column) VALUES ('some value')

INSERT anothertable (another_first, another_second)
VALUES (@@identity, 'some value')

If the table mytable has an identity field, its value is inserted into the another_first field of the table anothertable, because the variable @@identity always holds the value of the most recently inserted identity field.

The field another_first should have the same data type as first_column, but another_first must not itself be an identity field; it is simply used to store the value of first_column.

Delete record

To delete one or more records from a table, use the SQL DELETE statement. You can give the DELETE statement a WHERE clause, which selects the records to delete. For example, the following DELETE statement deletes only the records whose first_column value is 'Delete Me':

DELETE mytable WHERE first_column = 'Delete Me'

The complete syntax of the delete statement is as follows:

DELETE [FROM] {table_name | view_name} [WHERE clause]

Any condition that can be used in a SQL SELECT statement can be used in the WHERE clause of a DELETE statement. For example, the following DELETE statement deletes only those records whose first_column value is 'goodbye' or whose second_column value is 'so long':

DELETE mytable WHERE first_column = 'goodbye' OR second_column = 'so long'

If you do not give the DELETE statement a WHERE clause, all records in the table are deleted; do not do this by accident. If you really want to delete every record in a table, use the TRUNCATE TABLE statement described in Chapter 10.

Note:

Why use the TRUNCATE TABLE statement instead of DELETE? When you use TRUNCATE TABLE, the individual record deletions are not logged, which makes TRUNCATE TABLE much faster than DELETE.

Update record

To modify one or more existing records in a table, use the SQL UPDATE statement. Like the DELETE statement, UPDATE can take a WHERE clause to select the specific records to update. See this example:

UPDATE mytable SET first_column = 'Updated!' WHERE second_column = 'Update Me!'

This UPDATE statement selects every record whose second_column field has the value 'Update Me!' and, for all of those records, sets the first_column field to 'Updated!'.

The complete syntax of the update statement is as follows:

UPDATE {table_name | view_name}
SET [{table_name | view_name}.]
{column_list | variable_list | variable_and_column_list}
[, {column_list2 | variable_list2 | variable_and_column_list2} ...
[, {column_listN | variable_listN | variable_and_column_listN}]]
[WHERE clause]

Note:

You can use the UPDATE statement on text fields. However, if you need to update a very long string, use the UPDATETEXT statement. That topic is too advanced for this book, so it is not discussed here; for more information, see the Microsoft SQL Server documentation.

If you do not provide a WHERE clause, all records in the table are updated. Sometimes that is exactly what you want. For example, to double the price of every book in the titles table, you could use the following UPDATE statement:
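A minimal form of that statement might look like this (assuming titles has a numeric price column, as in the standard pubs sample database):

UPDATE titles SET price = price * 2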

You can also update several fields at once. For example, the following UPDATE statement updates the fields first_column, second_column, and third_column at the same time:

UPDATE mytable SET first_column = 'Updated!',
second_column = 'Updated!',
third_column = 'Updated!'
WHERE first_column = 'Update Me!'

Tips

SQL ignores extra whitespace in statements, so you can format your SQL in whatever way is easiest to read.

Use select to create records and tables

You may have noticed that the INSERT statement differs slightly from the DELETE and UPDATE statements: it operates on only one record at a time. However, there is a way to make INSERT add many records at once: combine the INSERT statement with a SELECT statement, like this:

INSERT mytable (first_column, second_column)
SELECT another_first, another_second
FROM anothertable
WHERE another_first = 'Copy Me!'

This statement copies records from anothertable into mytable; only the records whose another_first value in anothertable is 'Copy Me!' are copied.

This form of the INSERT statement is useful for creating a backup of records in a table: before deleting records from a table, you can copy them into another table this way.

If you need to copy the entire table, you can use the select into statement. For example, the following statement creates a new table named newtable, which contains all the data in mytable:

SELECT * INTO newtable FROM mytable

You can also specify that only particular fields are used to create the new table; simply list the fields you want to copy in the select list. In addition, you can use a WHERE clause to restrict which records are copied into the new table. The following example copies only the first_column field of the records whose second_column value is 'Copy Me!':

SELECT first_column INTO newtable
FROM mytable
WHERE second_column = 'Copy Me!'

Modifying a table after it has been created is awkward with SQL alone. For example, once you add a field to a table, there is no easy way to remove it; and if you accidentally give a field the wrong data type, you cannot simply change it. However, you can work around both problems with the SQL statements described in this section.

For example, suppose you want to remove a field from a table. With the SELECT INTO statement you can create a copy of the table that omits the field you want to drop. This lets you get rid of the field while keeping the data you want to keep.

If you want to change the data type of a field, you can create a new table whose field has the correct data type. After creating the table, you can use an INSERT statement combined with a SELECT statement to copy all the data from the original table into the new one. In this way you can change the table structure while preserving the original data.

1. The way to prevent patchwork database design is the "three less" principle. (1) The fewer tables in a database, the better. Only when the number of tables is small can the system's E-R diagram be small, refined, and free of redundant entities, forming a high-level abstraction of the objective world, achieving data integration, and preventing patchwork design.

(2) The fewer fields in a table's composite primary key, the better. The primary key serves two purposes: building the primary key index and acting as the foreign key of child tables. Fewer fields in the composite primary key therefore save not only processing time but also index storage space.

(3) The fewer fields in a table, the better. Only when the number of fields is small can we be sure that data is not duplicated in the system and that there is little data redundancy. More importantly, this pushes the designer to "turn columns into rows", which prevents the fields of child tables from being pulled into the master table and leaving many empty fields there. "Turning columns into rows" means pulling part of the master table out into a separate child table. The method is very simple; some people simply are not used to it and therefore do not adopt or apply it.

The practical principle of database design is to strike a proper balance between data redundancy and processing speed. "Three less" is an overall guideline, not an isolated rule to be applied blindly; it is relative, not absolute, and a "three more" principle would certainly be wrong. Consider: with the same system functionality, an E-R diagram covered by one hundred entities (one thousand attributes in total) is surely much better than one covered by two hundred entities (two thousand attributes in total).

We advocate the "Three shao" principle, which allows readers to learn to use Database Design technology for system data integration. The data integration step is to integrate the file system into an application database, integrate the application database into a topic database, and integrate the topic database into a global integrated database. The higher the degree of integration, the stronger the data sharing, and the fewer information islands, the number of entities, the number of primary keys, and the number of attributes in the global E-R diagram of the entire enterprise information system will be less.

The purpose of advocating the "three less" principle is to prevent readers from patching the database by endlessly adding, deleting, and modifying tables, turning the enterprise database into a "garbage dump" of randomly designed tables, or a jumble of miscellaneous tables, until the basic tables, code tables, intermediate tables, and temporary tables are in such countless disorder that the organization's information system can no longer be maintained.

2. Ways to improve database operating efficiency. Given fixed system hardware and system software, the ways to improve the operating efficiency of a database system are:

(1) In the physical design of the database, lower the normalization level, increase redundancy, use fewer triggers, and use more stored procedures.

(2) When the computation is very complex and the number of records is very large (for example, ten million records), perform the complex computation outside the database first, processing it at the file-system level (for example, in C++), and then append the results to the database tables. This is the experience of designers of telecom billing systems.

(3) If a table has too many records (for example, more than ten million), split the table horizontally (Oracle's partitioned tables are powerful and can replace this operation). Horizontal splitting uses a value of the table's primary key as the boundary and divides the table's records into two tables. If a table has too many fields (for example, more than eighty), split it vertically into two tables (see the sketch after this list).

(4) Tune the database management system (DBMS) itself, that is, tune the various system parameters, such as the number of buffers.

(5) When programming in the set-oriented SQL language, try to adopt optimized algorithms.

In short, to improve the operating efficiency of a database you must work at all three levels at once: database system-level tuning, database design-level optimization, and program-level optimization.
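A minimal sketch of the splits in point (3); the table, its columns, the boundary value, and the column grouping are illustrative assumptions only.

-- Horizontal split: divide call_records at a primary-key boundary
SELECT * INTO call_records_old FROM call_records WHERE record_id <  50000000
SELECT * INTO call_records_new FROM call_records WHERE record_id >= 50000000

-- Vertical split: keep the frequently used columns in the master table
-- and move the rarely used ones to a child table sharing the same primary key
SELECT record_id, caller, callee, start_time, duration
INTO call_records_core FROM call_records

SELECT record_id, billing_notes, raw_switch_data
INTO call_records_detail FROM call_records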
