Tip:
No matter how smart a person is, only correct operations against the database will make it run well.
Much of the content in this chapter has already been mentioned, discussed, and even analyzed in earlier chapters of this book. This chapter takes all the theory introduced so far and puts it into practice, describing the various factors that affect database performance in different database models. Where content clearly repeats material from earlier chapters, the repetition signals that it is especially important to database modeling. Performance is the most critical concern for any database or database model: if performance is inadequate, the database model cannot serve end users properly. As far as the database is concerned, end users are your customers, the people your livelihood depends on.
Tip:
A customer may be direct or indirect. Indirect customers are the customers of your customers.
The previous chapters explained the theory of database modeling. This chapter builds a bridge between database modeling and the theoretical concepts described in those chapters, and a large practical case study follows later in the book, walking through the whole process of thinking about, analyzing, designing, and building a database model in a real situation and examining the practical side of database modeling.
After studying this chapter, your understanding of database modeling will no longer be limited to theory; you will also have gained some practical knowledge.
Main content of this chapter:
● Factors affecting the tuning of different types of database models
● The details of writing efficient queries
● Using indexes correctly to improve database performance
● Views
● Application caching
8.1 The requirements of different database models
How the performance of a database model is tuned depends on what the database serves, that is, the applications connected to it. The previous chapters discussed all the relevant theory of database model tuning, but the pieces now need to be tied together. The following sections revisit the theory presented so far and first explain why and where it is used.
Different types of database models should be tuned in different ways. In general, a database model is tuned according to the requirements of the applications that depend on the database, which in turn reflect the needs of end users. The two extremes of this division are the OLTP database model and the data warehouse database model. The following sections distinguish database types by the performance requirements of each type of database model.
8.1.1 Factors affecting OLTP database model tuning
An OLTP database serves the Internet. The main characteristics of an OLTP database are as follows:
● Large user base - an OLTP database has an incalculably large user base, many of whom want the same information at the same time.
● Extremely high concurrency - concurrency is the degree to which the same information is shared, and in an OLTP database it is extremely high.
● Large databases - OLTP databases range from large to small depending on the type of application and the user base. A large online retailer serving the whole world may need many servers, whereas a website that only advertises the local nightclubs of one city in one country receives little traffic and therefore holds far less information.
● Response time - immediate, real-time responses to database changes and database activity are required. If you withdraw cash from a bank ATM and check your account online an hour later, you expect to see that transaction. Similarly, if you shop online, you want to see the transaction on your credit card account within a few minutes (preferably within a few seconds).
● Small transactions - users retrieve single records or perform very small joins.
● Granularity - many OLTP database models are highly normalized structures, and this is often a mistake. OLTP databases allow access to small pieces of data; the problem is that, because of over-normalization, a small piece of data can in reality require a large multiple-table join. If tables are normalized so that every business rule is built into the table structure, performance problems can arise even when a user pulls only 10 to 20 records onto a screen. The classic example is a user who logs on to a bank account and requests a bank statement: if all the information on a single sheet of paper (or a single web page) is spread across many tables, users become impatient whenever assembling the data takes more than about 7 seconds, and thousands of other users may be hitting the same data at the same moment. A small denormalization sketch follows this list.
● Manageability - managing an OLTP database is usually feasible but often very difficult. The user base is typically spread around the world, with users accessing the database 24 hours a day, 365 days a year, which makes OLTP database administration complicated and troublesome.
● Service window - as mentioned above, an OLTP database must be permanently awake, alert, and available at any time. That is the ideal; many service providers advertise reliability slightly below 100% based on what they can actually deliver. Anything below 100% means that a small amount of service-window downtime is allowed.
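To make the granularity point above concrete, here is a minimal sketch of the difference between an over-normalized bank-statement query and a denormalized alternative. The table and column names (account, txn, txn_line, statement_line) are invented for illustration only and are not taken from any particular schema.

-- Over-normalized: every statement page costs a multi-table join.
SELECT a.account_no, t.txn_date, l.description, l.amount
FROM account a
JOIN txn t ON t.account_id = a.account_id
JOIN txn_line l ON l.txn_id = t.txn_id
WHERE a.account_no = :account_no
ORDER BY t.txn_date;

-- Denormalized alternative: the same page is served by a single pre-joined
-- table, kept up to date by the application or by a batch job.
SELECT account_no, txn_date, description, amount
FROM statement_line
WHERE account_no = :account_no
ORDER BY txn_date;

The point is not that denormalization is always right, only that a screen read thousands of times per second should not depend on a join across many tables.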
8.1.2 Factors affecting client-server database model tuning
Many client-server environments serve a small number of users, usually dozens or even fewer. The main characteristics of a client-server database are as follows:
● Small user base - a company may use a local area network or a wide area network; either way, in-house usage is far easier to predict and estimate than the capacity requirements of an OLTP database.
● Low concurrency - a company-wide client-server database has a user base that can be estimated. That user base may be very small or relatively large, but because its size can be estimated, the service demands on the database can be estimated too, and they are for the most part predictable. An OLTP database, by contrast, has a user base too large to count, its usage often surges (or falls away) suddenly, and occasional large peaks occur (new end users). Client-server concurrency is therefore much easier to predict than OLTP concurrency, and predictability makes it easier to prepare the database for its applications and to meet their requirements.
● Database size - a client-server database is usually small. If the database grows too large, a client-server architecture simply cannot cope; pushing a client-server architecture beyond its limits requires extremely expensive hardware, at which point an OLTP or data warehouse architecture becomes the cheaper option.
● Response time - user interface operations act on single records, so client-server response times can be regarded as real time; reports may take a few minutes.
● Small and large transactions - a client-server environment has both small transactions, from data accessed through the user interface, and larger ones for reporting, and reports should be kept small enough to manage. This mix is workable because the demands on user numbers and concurrency are low.
● Granularity - data items are generally fairly small and table structures can be more exacting. A client-server database can even absorb a large number of business rules into its table structure by normalizing to a higher level, such as 3NF and beyond.
Tip:
To restate my point of view: the higher normal forms are mathematically satisfying, but not necessarily practical. The application should do the number crunching so that the database is responsible only for storing data; too much processing should not be pushed into the database. High-level normalization is feasible, but it can become complicated and difficult to manage, modify, and control. Modern application SDKs have powerful processing and mathematical capabilities that are far better suited to this work. Relational databases are intended to store data in a structured way. An object database can manage processing inside its database objects well; a relational database cannot!
● Manageability - because everything is small and can be estimated, and everyone goes home at night, the data is easy to manage; large windows of time are left free for maintenance.
● Service window - see the corresponding item in the previous section, "Factors affecting OLTP database model tuning".
8.1.3 Factors affecting data warehouse database model tuning
A data warehouse involves an extremely large amount of data and a small number of application environments, and those environments are technically demanding. Its main characteristics are as follows:
● Minimal user base - generally, only administrators, developers, and analytical end users access a data warehouse. The analytical end users are usually knowledgeable executives or middle managers. The main purpose of keeping large amounts of historical data in a data warehouse is to forecast the future, and it is these analytical users who need that forecasting information.
● Ultra-low concurrency - data sharing in a data warehouse is rare. Most activity is read-only, or consists of batch updates to fact tables while the database is not being used for reporting and analysis, so concurrency is not really an issue.
● Frightening database size - a data warehouse can be unbelievably large. Administrators and developers must decide how much detail to keep, when to delete data, and when and how far to summarize it. Many of these decisions can only be made once the data warehouse is in use, because future requirements are hard to predict during the design and development stages. When a data warehouse is very large, ad hoc queries can cause serious problems, so it is important to train end users and show them how to write join code correctly. Planning for efficiency also helps, for example by providing pre-built join structures and materialized-view aggregations.
Tip:
Materialized views copy data, allowing access to a physical copy of the data and avoiding access to the underlying tables along with their costly joins and aggregation. Relational databases allow query rewriting against materialized views: a query's access to a table may be transparently replaced by access to a smaller, more efficient materialized view. The effect is to reduce I/O and processing activity, which can dramatically improve query performance.
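As a rough sketch of the idea, assuming Oracle-style materialized view syntax and an invented sales fact table, a pre-aggregated copy might be created as follows; with query rewrite enabled, a query that sums sales by product can be answered from the much smaller copy instead of the underlying table.

-- Hypothetical example: pre-aggregate the sales fact table by product.
CREATE MATERIALIZED VIEW mv_sales_by_product
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT product_id,
       SUM(amount) AS total_amount,
       COUNT(*)    AS txn_count
FROM   sales
GROUP  BY product_id;

A query such as SELECT product_id, SUM(amount) FROM sales GROUP BY product_id can then be rewritten by the optimizer to read mv_sales_by_product, reducing both I/O and processing.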
● Response time - data warehouse response times can be hours, or even longer. Response time depends on many factors, such as the physical size of the data warehouse, the complexity of end-user reporting and analysis requirements, the granularity of the data, and how well end users appreciate the size of the data warehouse.
● Astonishingly large transactions - users retrieve huge amounts of data, whether through simple reports or highly complex analysis techniques, so the fewer tables involved, the better. Updates are best applied as periodic mass batch operations.
● Ultra-low granularity - a star schema is ideal for a data warehouse because it minimizes the number of tables in any join. A star schema contains a single fact table joined to one layer of small, descriptive, static dimension tables, so one very large table is joined efficiently to many small tables. Joins involving more than one very large table can cause serious performance problems. A small star-join sketch follows this list.
● High manageability requirements - a data warehouse is so large that managing it is difficult: the bigger the database, the more time and hardware resources it takes to use and change the data. These manageability requirements are gradually being met by more mature large-database technologies, such as very expensive hardware and specialized techniques (clustering, partitioning, parallel processing, and materialized views). Because a data warehouse is usually a large read-only structure, there is more flexibility for coping with its enormous physical size.
● Service window - the service window is usually not a problem for a data warehouse, because end-user activity tends to be occasional, arriving as large bursts of I/O rather than the continuous use typical of an OLTP database, and most of that I/O is read-only. This does, of course, depend on how real time the data warehouse is: if real-time reports require continuous real-time updates, everything becomes much more troublesome.
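The following is a small sketch of the kind of star join described under granularity above, using an invented sales fact table and customer, product, and time_dim dimension tables; the single large fact table joins only to small, static dimension tables, never to another large table.

-- Star join: one large fact table, several small dimension tables.
SELECT c.region,
       p.category,
       t.calendar_year,
       SUM(s.amount) AS total_sales
FROM   sales s
JOIN   customer c ON c.customer_id = s.customer_id
JOIN   product  p ON p.product_id  = s.product_id
JOIN   time_dim t ON t.time_id     = s.time_id
GROUP  BY c.region, p.category, t.calendar_year;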
Tip:
One way to ease data warehouse performance problems is to use data marts. A data mart is a subsection of a data warehouse. A large data warehouse may contain several large fact tables linked to the same dimensions; a data mart can be thought of as a single large fact table (or a star schema of one or two fact tables) plus the dimensions linked to it.
8.1.4 Database model tuning
The biggest problem with tuning the database model is that it must happen in the design phase, before development work is complete, because it concerns the tables and the relationships between them. Most data warehouses are read-only in production and are not changed there, so a data warehouse is essentially a read-only environment. A read-only environment can use specialized database structures that overlay, copy, and summarize the data in tables. Materialized views, for example, are used extensively in data warehouses and even in some OLTP databases. A materialized view copies table data, either from a single table or from a join, producing a physical copy of the data. Once a materialized view has been created to suit a query or a group of queries, those queries read the copy, giving better performance.
Tuning the database model is the most difficult and expensive kind of tuning because SQL code depends on the structure of the underlying database model; changes to the model can therefore force a large number of changes to application code. The database model underpins everything else and is the foundation everything else is built on, so changing it means changing the main applications, especially once the application code has been written. The problem is that tuning changes to the database model (changes to the underlying tables, for example) ripple outward: because everything else depends on the database model, modifying it can become a project so large that every application has to be modified. That is exactly why a correct database model should be established before application development begins. Unfortunately, the world is not ideal; we can only do our best. Substantial changes to the table structure of a database model often force a complete rewrite of the application software. There are, however, tuning options that remain available after the database model has been built, such as creating alternate indexes. Stored procedures can also help to divide up, speed up, and organize what already exists.
In database model tuning, the worst and most expensive changes are normalization, denormalization, changes to referential integrity and table structure, and other alterations to the underlying tables. It is best to improve performance with techniques that least disturb the existing tables and their relationships, such as alternate indexes, materialized views, and clusters, rather than interfering with the critical underlying table structure. Database objects such as materialized views and clusters create copies or overlays of existing table structures without touching the existing tables, so the tables need not be modified and, more significantly, neither do the applications, whose code may already have been completed, tested, debugged, and put into production. The drawback of overlays and copies is that only a limited number of such objects (materialized views, for example) can usefully be created; creating too many of them does not help performance.
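As an example of this kind of non-intrusive tuning, the following sketch adds an alternate (secondary) index to an invented customer table; the table structure and the application code are untouched, but an existing query that filters on last_name can now be resolved through the index. Table, column, and index names are hypothetical.

-- Alternate index added after development; no table or application changes.
CREATE INDEX ix_customer_last_name ON customer (last_name);

-- An existing query like this one can now use the index instead of a full scan:
SELECT customer_id, first_name, last_name
FROM   customer
WHERE  last_name = 'Smith';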
This is why OLTP databases call for fine granularity, a degree of denormalization, and small amounts of data per transaction. At the other extreme, a data warehouse also calls for low granularity, but in the form of a highly denormalized (simple) table structure that minimizes the number of tables in join queries, so that joins do not seriously impede the warehouse's reporting performance.