How to build a bank data warehouse
Song Yuchang, No. 11 Xinhua East Road, Dengzhou, Henan, China.
Data warehouse technology is a relatively new technology in the data management field. In essence, it offers a comprehensive solution for online analytical processing (OLAP). Unlike many earlier technologies, it is mainly a concept, and a system is built under the guidance of that concept. No ready-made product can simply be purchased, and there are no established analysis specifications or implementation methods. In other words, there is not yet a mature, reliable, widely accepted data warehouse standard. Relational database design is different: it rests on detailed theory and countless worked examples, and whichever vendor's database products and development tools you use, solutions to the same business requirement will look very similar as long as you follow the methodology. In the data warehouse field, by contrast, MOLAP and ROLAP schemes diverge, many modeling tools and front-end presentation tools compete, and the designer's personal experience and skill play an important role.
Implementation approaches of data warehouse technology
In current practice, data warehouses are implemented mainly in the following ways.
1. Build the data warehouse on a relational database (ROLAP).
2. Build the data warehouse on a multidimensional database (MOLAP).
The MOLAP scheme organizes and stores data in multidimensional form. The ROLAP scheme expresses multidimensional concepts with two-dimensional relational tables at its core. It splits the multidimensional structure into two kinds of tables, dimension tables and fact tables, and this lets the relational structure represent and store multidimensional data well.
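Here is a minimal sketch of the ROLAP idea: a star schema with one fact table, which holds foreign keys and a numeric measure, and two dimension tables. The table and column names (`dim_branch`, `dim_date`, `fact_deposit`, `balance`) and the sample rows are hypothetical and serve only as an illustration. The sketch uses Python's built-in `sqlite3` module with an in-memory database.

```python
import sqlite3

def build_star_schema() -> sqlite3.Connection:
    """Create a tiny in-memory star schema (hypothetical tables and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables: one row per member, plus descriptive attributes.
        CREATE TABLE dim_branch (branch_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT);

        -- Fact table: a foreign key to each dimension plus a numeric measure.
        CREATE TABLE fact_deposit (
            branch_id INTEGER REFERENCES dim_branch(branch_id),
            date_id   INTEGER REFERENCES dim_date(date_id),
            balance   REAL
        );

        INSERT INTO dim_branch VALUES (1, 'North'), (2, 'South'), (3, 'North');
        INSERT INTO dim_date VALUES (1, '2024-01'), (2, '2024-02');
        INSERT INTO fact_deposit VALUES
            (1, 1, 100.0), (2, 1, 250.0), (3, 1, 50.0),
            (1, 2, 120.0), (2, 2, 260.0), (3, 2, 40.0);
        """
    )
    return conn

def balance_by_region_and_month(conn: sqlite3.Connection) -> list[tuple[str, str, float]]:
    """A multidimensional query expressed as joins from the fact table to its dimensions."""
    return conn.execute(
        """
        SELECT b.region, d.month, SUM(f.balance)
        FROM fact_deposit AS f
        JOIN dim_branch AS b ON f.branch_id = b.branch_id
        JOIN dim_date   AS d ON f.date_id   = d.date_id
        GROUP BY b.region, d.month
        ORDER BY b.region, d.month
        """
    ).fetchall()
```

Every analytical question is answered through joins between the fact table and its dimension tables. This is the join cost that the text attributes to ROLAP. A MOLAP engine would instead keep the same totals in a precomputed multidimensional array indexed by region and month.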
For expressing a multidimensional data model, a multidimensional array is clearer and takes less storage than relational tables. The biggest problem of a ROLAP system is that queries must retrieve data through joins between relational tables. MOLAP schemes are simpler than ROLAP schemes, and indexing and data aggregation can be managed automatically, but some flexibility is lost. ROLAP is more complex to implement but more flexible: users can define statistics and calculation methods dynamically, and the existing investment in relational databases is preserved. Because each scheme has strengths and weaknesses, practical systems often combine MOLAP and ROLAP in a so-called hybrid model (HOLAP). Historical data, detail data, and non-numeric data are kept in the relational database to exploit mature relational technology and reduce cost. Current data and frequently used statistics are stored in a multidimensional database to improve performance.
3. Build a logical data warehouse on the existing relational database.
The OLTP systems now in operation have accumulated large amounts of data, and extracting the information needed for decision-making from them has become users' most urgent need. A newly built data warehouse can provide a complete solution in both function and performance. However, it requires substantial manpower and resources, and building the warehouse and accumulating analysis data takes time, so it cannot meet users' urgent need for timely information analysis. Therefore, in the early stage of a data warehouse project, suitable front-end presentation tools can be used to build a logical data warehouse on top of the existing OLTP system.
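One way to read approach 3 is to define views over the existing OLTP tables. Analysts then see dimension-like and fact-like structures without any data being copied. The sketch below assumes two hypothetical OLTP tables (`account` and `txn`) and uses SQLite views. In practice, a front-end tool's own semantic layer would usually play this role, and this sketch does not model such a layer.

```python
import sqlite3

def build_logical_warehouse() -> sqlite3.Connection:
    """Hypothetical OLTP tables plus views that act as a virtual star schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Existing OLTP tables, designed for transaction processing.
        CREATE TABLE account (acct_no TEXT PRIMARY KEY, branch TEXT, product TEXT);
        CREATE TABLE txn (
            txn_id   INTEGER PRIMARY KEY,
            acct_no  TEXT REFERENCES account(acct_no),
            txn_date TEXT,   -- ISO date string
            amount   REAL    -- positive = deposit, negative = withdrawal
        );
        INSERT INTO account VALUES
            ('A1', 'North', 'savings'), ('A2', 'North', 'current'), ('A3', 'South', 'savings');
        INSERT INTO txn (acct_no, txn_date, amount) VALUES
            ('A1', '2024-01-05', 500.0), ('A2', '2024-01-20', 300.0),
            ('A3', '2024-01-11', 200.0), ('A1', '2024-02-02', -100.0);

        -- Virtual dimension: no data is copied; the view reads the OLTP table.
        CREATE VIEW v_dim_account AS
            SELECT acct_no, branch, product FROM account;

        -- Virtual fact: derives a month key so analysts can group by period.
        CREATE VIEW v_fact_txn AS
            SELECT acct_no, substr(txn_date, 1, 7) AS month, amount FROM txn;
        """
    )
    return conn

def net_flow_by_branch_month(conn: sqlite3.Connection) -> list[tuple[str, str, float]]:
    """Query the views exactly as if they were warehouse tables."""
    return conn.execute(
        """
        SELECT a.branch, f.month, SUM(f.amount)
        FROM v_fact_txn AS f
        JOIN v_dim_account AS a ON f.acct_no = a.acct_no
        GROUP BY a.branch, f.month
        ORDER BY a.branch, f.month
        """
    ).fetchall()
```

The views carry no data of their own, so they cost little to build. Every query still runs against the OLTP tables, though. When requirements stabilize, the view definitions can serve as a specification for physical warehouse tables.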
A logical data warehouse of this kind is constrained by the original OLTP design and may not support many analysis functions. On the other hand, its data structures are fixed and its analysis requirements are relatively stable and mature, so modeling and implementation are comparatively easy. Such a system can also serve later as the prototype for a real data warehouse.
The relationship between information systems and the data warehouse
When a commercial bank builds a management information system, the large data volume and the diversity of data sources inevitably raise two questions: how to manage this mass of data, and how to extract useful information from it. The greatest advantage of a data warehouse is that it can gather business data from the different information islands on the enterprise network into a single integrated database and provide various means of statistics and analysis. A bank building a management information system therefore has both the need and the data foundation to use a data warehouse, and the link between the two is natural and hard to avoid.
In a commercial bank, the data warehouse can be applied to deposit analysis, loan analysis, customer and market analysis, analysis and decision support for related financial business (securities and foreign exchange trading), risk forecasting, profitability analysis, and more.
Depending on its history and actual needs, a bank can build its information system in one of two ways.
1. Build a new system. Domestic commercial banks currently lack a good data collection mechanism for supervising internal operations. The management information system can therefore be designed in two parts: data collection and entry, and data summarization and analysis.
Such a system does not have to process large amounts of historical data, and its collection process may involve several data sources. A data warehouse can therefore be built together with the system, and all collected data can be integrated into it.
2. Improve the existing system. An existing OLTP system has accumulated a large amount of historical data. A logical data warehouse can first be built on the original system: a data analysis and presentation tool is used to construct a virtual multidimensional model on top of the relational model. Once the system's requirements have stabilized, the physical data warehouse is built. This saves investment and shortens the development cycle.
Problems to note during implementation
First, model design
Model design, both logical and physical, is the foundation of the system and the key to its success or failure. In practice, the following issues deserve attention, depending on the implementation approach.
1. Building the data warehouse directly
When the data warehouse is built directly, the OLTP data must be reorganized around the requirements of business analysis and grouped by subject so that it is easy to use.
* Subject division
A subject is a logical concept. It should fully and consistently describe the objects involved in the analysis and the relationships among them. Subjects are identified mainly from two sources: analysis of the existing fixed reports and interviews with business staff. The existing fixed reports reflect the data analysis needs of past work well, and the meaning and format of their data are relatively mature and stable, so model design should draw on them heavily.
However, merely reproducing the current manual reports falls far short of the goal of a management information system. Business interviews are also needed to uncover broader and deeper analysis needs latent in daily work. Only then can the subject division needed for the data warehouse model be properly understood.
* Refining the analysis content
How the subjects are refined directly determines the scope of the analysis. Once the subjects are clearly divided, the next step is to refine the specific analysis content and, based on the nature of each item, decide where it belongs in the data warehouse. Usually, dimension members correspond to the angles of analysis, and measures correspond to the specific indicators of interest. Whether an indicator becomes a dimension member, a measure, or a dimension attribute depends on the specific business requirements. Practice does suggest a rule of thumb: dimension members and dimension attributes are usually discrete data with a limited set of values, whereas measures are continuous data with an unbounded range of values. If continuous data must serve as a dimension, it has to be divided into value ranges, and those ranges become the actual dimension members. When deciding whether an indicator should be a dimension member or a dimension attribute, weigh both the storage it occupies and how often related queries use it. Ambiguity in indicator definitions must be resolved while the analysis content is being refined.
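The rule of thumb about continuous data can be illustrated by turning a continuous amount into a small set of range-based dimension members. The band boundaries and labels below, for a hypothetical loan-amount dimension, are assumptions chosen for illustration. In a real model they would come from the business requirements. The sketch uses Python's standard `bisect` module.

```python
from bisect import bisect_right

# Hypothetical band boundaries for a loan-amount dimension.
# Each boundary is the exclusive upper bound of the band before it.
BOUNDARIES = [10_000, 100_000, 1_000_000]
LABELS = ["<10K", "10K-100K", "100K-1M", ">=1M"]  # one more label than boundaries

def amount_band(amount: float) -> str:
    """Map a continuous amount to a discrete dimension member."""
    # bisect_right counts the boundaries <= amount, which is the band index.
    return LABELS[bisect_right(BOUNDARIES, amount)]

def count_by_band(amounts: list[float]) -> dict[str, int]:
    """Use the band as a dimension: count loans per band, including empty bands."""
    counts = {label: 0 for label in LABELS}
    for amount in amounts:
        counts[amount_band(amount)] += 1
    return counts
```

The band label is discrete and takes only a few values, so it can act as a dimension member. The raw amount stays in the fact table as a measure.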
Indicators with the same name may appear in different reports and business interviews. Analysts familiar with the business must state accurately and clearly whether they are extracted or calculated by the same method under the same conditions, and how they relate to one another. Otherwise, model design, data extraction, data presentation, and many other areas will suffer.
* Granularity design
The granularity of the data stored in the warehouse model affects the information system in many ways. The finest level of each dimension kept in the fact table determines whether the stored data can meet the functional requirements of information analysis. The granularity levels chosen for the aggregate tables directly affect query response time. If the same information system must serve several organizational levels at once, for example department level and enterprise level, consider using a different granularity for each level of the data warehouse.
* Techniques for composite indicators
In model design, composite indicators, especially ratio-type indicators, require care over the order of operations: whether components are added and subtracted before multiplying and dividing, or the other way round. For example, a regional loan-to-deposit ratio must be computed as total loans divided by total deposits, not as the sum or simple average of the branch ratios. Indicators such as the number of accounts and the number of transactions often appear in analyses and reports. They do not need to exist physically in the database as separate indicators, but their definitions must be prepared in the analysis model.
* Time characteristics of measures
Analysis indicators can be divided by how they behave along the time dimension into additive, semi-additive, and non-additive indicators. For example, transaction amounts can be summed along every dimension. A deposit balance can be summed across branches but not across days (semi-additive). A ratio cannot be summed at all (non-additive).
2. Building a logical data warehouse on the existing data
Using data from an OLTP system directly for analysis and processing runs into many difficulties and is sometimes impossible.
This is not a shortcoming of relational databases as such. Their design philosophy simply does not suit large-scale data analysis. This approach therefore requires attention to the following issues.
* Different time units
This is the problem met most often during implementation, and it is frequently the hardest to solve. The time units stored in an OLTP system follow the actual business: accounting data is recorded by day, for example, while financial statements are produced monthly or semi-annually. For analysis, data in different time units must be unified into a single unit, so an appropriate conversion mechanism is needed.
* Redundant information
Redundant information means fields with the same meaning that exist in different relational tables. "Same meaning" covers not only how the fields are obtained or calculated but also the conditions under which they hold, for example the balance of the same loan in the same region at the same point in time. OLTP systems often include such fields for performance reasons. When the model is designed for analysis, however, only one of these sources may be used to produce analysis results, so that the results are unique and accurate.
* Joins between tables
OLTP tables are designed for business processing and must guarantee data integrity, consistency, and response time, so the tables are both independent and interdependent. When designing the logical model of the data warehouse, the joins between tables must be chosen carefully. The joins must allow the analysis data to be obtained or computed, but they must not form loops, because loops make the analysis data ambiguous.
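The time-unit problem can be sketched by rolling daily balances up to months. The sketch assumes a hypothetical feed of daily end-of-day balances per branch. A balance is semi-additive: it can be summed across branches on the same day but not across days. The monthly figure is therefore taken either as the average daily balance or as the month-end balance, and which one applies is a business decision. The loan-to-deposit ratio is computed from the aggregated totals rather than by averaging daily ratios, in line with the text's caution about ratio-type indicators.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily feed: (day, branch, deposit_balance, loan_balance).
Daily = tuple[date, str, float, float]

def monthly_summary(rows: list[Daily]) -> dict[str, dict[str, float]]:
    """Convert day-unit balances into month-unit figures for the whole bank."""
    # Step 1: add balances across branches for each day (valid for a semi-additive measure).
    per_day: dict[date, list[float]] = defaultdict(lambda: [0.0, 0.0])
    for day, _branch, deposit, loan in rows:
        per_day[day][0] += deposit
        per_day[day][1] += loan

    # Step 2: group the daily totals by calendar month, in date order.
    per_month: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for day in sorted(per_day):
        per_month[day.strftime("%Y-%m")].append((per_day[day][0], per_day[day][1]))

    result: dict[str, dict[str, float]] = {}
    for month, totals in per_month.items():
        n = len(totals)
        avg_deposit = sum(dep for dep, _ in totals) / n
        avg_loan = sum(loan for _, loan in totals) / n
        # Step 3: never sum balances over days; use the average or the last day instead.
        result[month] = {
            "avg_deposit": avg_deposit,
            "month_end_deposit": totals[-1][0],  # last day present in the feed
            # Ratio from the aggregated totals, not an average of daily ratios.
            "loan_to_deposit": avg_loan / avg_deposit,
        }
    return result
```

If the balances were summed over the days of a month, the result would grow with the number of days and would not correspond to any real balance. This is why the conversion needs its own explicit rule.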
Different join paths can also differ in query speed, which affects how quickly data analysis responds.
* Designing statistical tables
If the problems above cannot be solved well on the original database, a workaround is to build statistical tables, that is, a simplified data warehouse. These tables resemble data warehouse fact tables. Statistics are loaded into them periodically, which sidesteps the time-unit, redundancy, and join problems, and the tables support simple analysis.
Second, data extraction
Data extraction is not technically demanding, but it is very tedious, and a specific person must be made responsible for it. The following points deserve attention in design.
1. Extraction rules should be standardized and managed as metadata. For each extraction, the source tables, source fields, target tables, target fields, conversion rules, and conversion conditions must be fully documented. This makes the rules easy for programmers to implement and easy to modify when the extraction rules or the logical model change.
2. Capturing changes in the business database is an important part of extraction. The warehouse stores data by time, so the differences between points in time become a key factor. A common approach is to use the facility provided by the database management system to generate a change log at the database level, and then to determine from the log what has changed in order to complete the extraction. Weighing performance, ease of operation, impact on the original business system, and other factors, this is a comparatively good method.
3. When data in one warehouse table comes from different tables, or even different databases, in the original system, the data must use consistent units and satisfy the same time condition.
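Points 1 to 3 can be sketched together: extraction rules are kept as data (metadata) rather than buried in code, and they are applied to entries from a change log. Everything here is hypothetical. The rule fields mirror the items the text says must be documented: source table and field, target table and field, conversion rule, and condition. The `ChangeLogEntry` format is a simplified stand-in for whatever the DBMS's own change-capture facility produces, not a real log format.

```python
from dataclasses import dataclass
from typing import Callable

def _always(row: dict) -> bool:
    """Default condition: the rule applies to every row."""
    return True

@dataclass(frozen=True)
class ExtractionRule:
    """One documented source-to-target mapping, kept as metadata."""
    source_table: str
    source_field: str
    target_table: str
    target_field: str
    convert: Callable[[object], object]          # conversion rule
    condition: Callable[[dict], bool] = _always  # conversion condition

@dataclass
class ChangeLogEntry:
    """Simplified change record (an assumed format, not a real DBMS log)."""
    table: str
    op: str    # 'INSERT', 'UPDATE' or 'DELETE'
    row: dict  # row image carried by the log entry (assumption)

# Hypothetical rule set; the cents-to-units conversion keeps units consistent (point 3).
RULES = [
    ExtractionRule("loan", "loan_no", "fact_loan", "loan_id", str),
    ExtractionRule("loan", "balance_cents", "fact_loan", "balance", lambda v: v / 100),
    ExtractionRule("loan", "status", "fact_loan", "status", str.upper,
                   condition=lambda row: row["status"] != "void"),
]

def apply_rules(log: list[ChangeLogEntry],
                rules: list[ExtractionRule]) -> list[tuple[str, str, dict]]:
    """Turn changed source rows into (target_table, op, target_row) records."""
    out = []
    for entry in log:
        targets: dict[str, dict] = {}
        for rule in rules:
            # Skip rules for other tables and rules whose condition is not met.
            if rule.source_table != entry.table or not rule.condition(entry.row):
                continue
            value = rule.convert(entry.row[rule.source_field])
            targets.setdefault(rule.target_table, {})[rule.target_field] = value
        for table, target_row in targets.items():
            out.append((table, entry.op, target_row))
    return out
```

The rules live in one list, so a change to the logical model only requires editing `RULES`, and the list itself doubles as documentation of the extraction.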
4. Extraction planning must cover not only which data is extracted but also when and how extraction runs. Only a complete extraction scheme ensures that the extracted data is accurate and usable.
Third, later maintenance and optimization
Building a data warehouse is long-term work. Like other systems, it must be adjusted and refined during operation. This involves two areas.
1. Performance. A data warehouse queries massive amounts of data and writes and reads large volumes. This places very high demands on the database system, and those demands differ from an OLTP system's. Performance therefore cannot be neglected during design, implementation, or maintenance. During operation in particular, monitor resource consumption closely and tune the system to the characteristics of the application. Tuning can include adjusting database parameters, changing where data partitions are placed, creating special indexes, and even upgrading the system configuration.
2. The model. Applications and requirements drive each other and keep evolving. After the information system goes into operation, users understand it better over time and will ask for updates and higher capabilities. How to meet users' needs with the least additional investment is a problem that deserves attention and careful study.
The usual order is as follows. First, tap the potential of the existing system. Next, consider adding subjects or a small number of indicators to the existing system and adjusting it to meet the new needs. Only last, consider rebuilding the system. This keeps investment in system construction as low as possible.
Fourth, applying the data warehouse
Used in the ways described above, the data warehouse mainly produces reports and supports routine business analysis. This brings no real benefit to the enterprise and falls far short of the value a data warehouse can deliver. As use deepens, technical and business staff can work together to design application models of real value to the enterprise and adjust the model parameters as the business develops. The aim is to discover patterns in the enterprise's operations, that is, to perform data mining on the data warehouse. Only this fully realizes the purpose of building a data warehouse and ultimately benefits the enterprise.
Data warehouse technology still needs to develop and mature. However, as long as enterprises recognize the importance of information analysis and business and technical staff truly cooperate, more practical results can be expected in the near future.