Original address
I. Definition of metadata
By the traditional definition, metadata is data about data. In a data warehouse system, metadata helps administrators and developers find the data they care about. It describes the structure of the data in the warehouse and the way that data is built, and it falls into two categories by use: technical metadata and business metadata.
Technical metadata stores the technical details of the data warehouse system and is used to develop and manage the warehouse. It mainly includes the following information: a description of the data warehouse structure, including definitions of the warehouse schema, views, dimensions, hierarchies, and derived data, as well as the location and contents of the data marts; the architecture and schemas of the business systems, the data warehouse, and the data marts; the algorithms used for aggregation, including measure and dimension definition algorithms, data granularity, subject areas, aggregation, summarization, and predefined queries and reports; the mapping from the operational environment to the data warehouse environment, including the source data and its contents, data partitioning, data extraction, cleansing and conversion rules, and data refresh rules; and security (user authorization and access control).
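As an illustration of the kind of information listed above, here is a minimal Python sketch of a technical metadata record that keeps a few descriptive fields and the source-to-target mappings with their conversion rules. The class names, field names, and sample values (`sales_dw`, `fact_sales`, and so on) are all hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ColumnMapping:
    """One source-to-target mapping together with its conversion rule."""
    source_system: str
    source_column: str
    target_table: str
    target_column: str
    conversion_rule: str  # human-readable description of the transformation


@dataclass
class TechnicalMetadata:
    """Hypothetical container for a few items of technical metadata."""
    schema_name: str
    granularity: str
    refresh_rule: str
    mappings: List[ColumnMapping] = field(default_factory=list)

    def sources_of(self, target_table: str, target_column: str) -> List[ColumnMapping]:
        """Return every mapping that feeds the given warehouse column."""
        return [
            m for m in self.mappings
            if m.target_table == target_table and m.target_column == target_column
        ]


# Sample values are invented for illustration only.
sales_meta = TechnicalMetadata(
    schema_name="sales_dw",
    granularity="one row per order line per day",
    refresh_rule="full reload nightly",
    mappings=[
        ColumnMapping("crm", "cust.cust_nm", "dim_customer", "customer_name",
                      "trim and upper-case"),
        ColumnMapping("erp", "orders.amt", "fact_sales", "amount",
                      "convert to a single currency"),
    ],
)
```

With a record like this, `sales_meta.sources_of("fact_sales", "amount")` answers the question "which source column feeds this warehouse column, and under which rule".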
Business metadata describes the data in the warehouse from a business perspective. It provides a semantic layer between users and the actual system, so that business people without a computing background can also "read" the data in the warehouse. Business metadata mainly includes the data model, object names, and attribute names expressed in the users' business terms; the rules for accessing data and the sources of the data; and the analysis methods, formulas, and reports that the system provides. Specifically, it includes the following information:
Enterprise conceptual model: this is important information that business metadata should provide. It represents high-level information about the enterprise data model, the business concepts of the whole enterprise, and the relationships among them. With this enterprise model, business people who do not understand database technology or SQL statements can still know what data is in the warehouse.
Multidimensional data model: this is an important part of the enterprise conceptual model. It tells business analysts which dimensions, dimension categories, data cubes, and aggregation rules exist in a data mart. A data cube here is the multidimensional organization of the fact table and the dimension tables of one subject area.
Dependencies between the business conceptual model and the physical data: the business metadata described above represents only the business view of the data. The correspondence between these business views and the actual data warehouse or databases, and the tables, fields, dimensions, and hierarchies of the multidimensional database, should also be recorded in the metadata knowledge base.
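The correspondence between business views and physical data can be sketched as a simple lookup from business terms to tables and columns. This is only a minimal Python illustration; the terms, table names, and the `resolve` and `describe` helpers are hypothetical.

```python
from typing import Dict, NamedTuple


class PhysicalColumn(NamedTuple):
    table: str
    column: str


# Business term -> physical location. All names are invented for illustration.
SEMANTIC_LAYER: Dict[str, PhysicalColumn] = {
    "Customer name": PhysicalColumn("dim_customer", "customer_name"),
    "Sales amount": PhysicalColumn("fact_sales", "amount"),
}


def resolve(term: str) -> PhysicalColumn:
    """Translate a business term into the table and column that hold it."""
    try:
        return SEMANTIC_LAYER[term]
    except KeyError:
        raise KeyError(f"no physical mapping recorded for business term {term!r}") from None


def describe(term: str) -> str:
    """Explain in plain words where a business term is stored."""
    col = resolve(term)
    return f"{term} is stored in {col.table}.{col.column}"
```

A reporting tool built on such a mapping could let a business user ask for "Sales amount" without knowing that it lives in `fact_sales.amount`.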
II. The role of metadata
A data warehouse is less a software development project than a system integration project, because its main task is to integrate the tools needed for data extraction, transformation, and loading, OLAP analysis, and data mining. As shown in the figure below, its typical structure consists of an operational environment layer, a data warehouse layer, and a business layer.
The first layer (the operational environment layer) comprises the enterprise's OLTP systems and some external data sources. The second layer (the data warehouse layer) holds the data extracted from the first layer into a central area. The third layer (the business layer) consists of the various tools used to analyze the business data. The left part of the figure is metadata management, which links these layers together. Its role shows in the following aspects:
1. Metadata is required for data integration
The most important feature of a data warehouse is its integration. This shows not only in the data it holds but also in the process of implementing a data warehouse project. On the one hand, the data extracted from each data source must be stored in the warehouse according to a defined schema, and the correspondence between the data sources and the warehouse data, together with the conversion rules, is stored in the metadata knowledge base. On the other hand, building a data warehouse directly is often time-consuming and laborious, so in practice people may first build data marts according to a unified data model and then build the data warehouse on top of them. As the number of data marts grows, however, a "spider web" easily forms, and metadata management is the key to untangling it. If metadata management is taken seriously while the data marts are built, integrating them into a data warehouse goes relatively smoothly. If it is neglected, the final integration can be difficult or even impossible.
2. The semantic layer defined by metadata helps users understand the data in the data warehouse
End users cannot be as familiar with database technology as data warehouse administrators or developers, so they badly need a "translator" that makes the meaning of the data in the warehouse clear to them. Metadata maps the business model to the data model, so the data can be "translated" into the form users need, which helps end users understand and use it.
3. Metadata is the key to ensuring data quality
Once a data warehouse or data mart is in place, users often have doubts about the data they are using. These doubts usually arise because the underlying data is not "transparent" to them, so they naturally question the results. With a metadata management system, end users can easily find out where each piece of data came from and which extraction and conversion rules were applied to it. They then have more confidence in the data and can also spot data quality problems more easily. Some researchers outside China have even introduced a quality dimension into the metadata model in order to address this problem at a higher level.
4. Metadata can support changing requirements
As information technology develops and the functions of an enterprise change, the enterprise's requirements change constantly. How to build a software system that adapts smoothly to changing requirements is an important problem in software engineering. Traditional information systems often rely on documentation to cope with changing requirements, but documentation alone is not enough. A successful metadata management system can effectively manage the whole business workflow, data flow, and information flow, so that the system does not depend on particular developers, which improves its scalability.
III. Current status of metadata management
The sections above show that metadata can almost be called the "soul" of a data warehouse, or even of a business intelligence (BI) system. Because metadata plays an important part throughout the life cycle of a data warehouse, every vendor's data warehousing solution mentions metadata management. Unfortunately, none of these solutions explicitly proposes a complete management model for metadata; each manages only a specific, local part of it. The main metadata-related tools currently on the market are shown in the figure below:
As the figure shows, the metadata-related data warehouse tools fall roughly into the following categories:
1. Data extraction tools:
These tools extract the data in the business systems, transform it, and integrate it into the data warehouse. Examples are Ardent DataStage, Pentaho's open source ETL product Kettle, and ETI. They provide only technical metadata and offer little support for business metadata.
2. Front-end display tools:
These include OLAP analysis, reporting, and business intelligence tools, such as Cognos PowerPlay, Business Objects BO, and FineBI/FineReport from the Chinese vendor FanRuan. They support multidimensional business views by mapping relational tables to business-related facts and dimensions, and then perform multidimensional analysis on the data in the warehouse. These tools provide a semantic layer that maps business metadata to technical metadata.
3. Modeling tools:
These are business modeling tools for non-technical people and provide a higher level of business-specific semantics. Examples are CA's ERwin, Sybase's PowerDesigner, and Rational's Rose.
4. Metadata storage tools:
Metadata is typically stored in each tool's own dedicated database, which is like a "black box": from outside there is no way to know how the metadata the tool uses and produces is stored. There is also a class of tools called metadata repositories, which provide a centralized storage space for metadata that is independent of other tools. These include Microsoft's Repository, Ardent MetaStage, and Sybase's WCC.
5. Metadata management tools:
At present there are three kinds of metadata management tools in China. The first kind are dedicated tools from vendors such as IBM and CA, for example MetaStage, which IBM obtained by acquiring Ascential, and CA's DecisionBase. The second kind are third-party metadata management tools that do not depend on any particular BI product, such as DAG MetaCenter and the open source Pentaho Metadata. The third kind come from integrators such as Primeton and Carnation, which have their own metadata management tools: Primeton MetaCube, the New Torch Network metadata management system, Carnation MetaOne, and so on.
The vendors' dedicated metadata management tools are well compatible with the vendor's own products, but they are unsatisfactory once cross-system management is involved. In practical use in China, DAG MetaCenter is the most widely used; the metadata management projects I have seen so far in telecommunications and finance basically all use this product.
I searched the internet for almost all of the metadata vendors. Pentaho's open source metadata product offers its source code for download and trial and can be used for integrated development. Primeton MetaCube can be downloaded, but it is troublesome to configure and I have not yet got it running. The other companies' products are not available for trial download.
IV. Metadata management standards
Nothing can be accomplished without rules. One important reason why metadata management is difficult is the lack of uniform standards, so each company's metadata management solution is different. In recent years, two standards have gradually matured: the Open Information Model (OIM) of the Meta Data Coalition (MDC) and the Common Warehouse Metamodel (CWM) of the OMG. Together with the merger of the MDC and the OMG, they give data warehouse vendors a unified standard and pave the way for metadata management.
The history of metadata management shows that there are two main approaches. For relatively simple environments, a centralized metadata knowledge base is built according to a common metadata management standard. For complex environments, each part builds its own metadata management system, forming a distributed metadata knowledge base, and integrated management of the metadata is then achieved by establishing a standard metadata interchange format.
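The second approach, in which separate repositories are integrated through a shared interchange format, can be sketched in a few lines of Python. JSON is used here only as a stand-in for a standard interchange format, and the function names, record layout, and sample repositories are all hypothetical.

```python
import json
from typing import Dict, List


def export_repository(owner: str, tables: Dict[str, List[str]]) -> str:
    """Serialize one local repository into the shared interchange document."""
    records = [
        {"owner": owner, "table": table, "columns": sorted(columns)}
        for table, columns in sorted(tables.items())
    ]
    return json.dumps(records)


def integrate(documents: List[str]) -> Dict[str, dict]:
    """Merge interchange documents into one view keyed by 'owner.table'."""
    merged: Dict[str, dict] = {}
    for doc in documents:
        for record in json.loads(doc):
            merged[f"{record['owner']}.{record['table']}"] = record
    return merged


# Two invented local repositories, each exporting in the same format.
etl_doc = export_repository("etl", {"stg_orders": ["order_id", "amt"]})
olap_doc = export_repository("olap", {"cube_sales": ["region", "amount"]})
catalog = integrate([etl_doc, olap_doc])
```

The point of the design is that each tool keeps its own store and only has to agree on the exported format, so a new tool can join the integrated view without changes to the others.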
At present, the OMG's CWM (Common Warehouse Metamodel) standard has become the unified standard in the metadata management field:
The OMG is an international standards organization with more than 500 members, and the well-known CORBA standard comes from it. The main purpose of the Common Warehouse Metamodel is to help different data warehouse tools, platforms, and metadata repositories exchange metadata in a heterogeneous environment. In March 2001 the OMG issued the CWM 1.0 standard. The CWM covers both metadata storage and metadata exchange, and it is built on the following three industry standards:
UML: it is used to model the CWM.
MOF (Meta Object Facility): it is the OMG's standard for metamodels and metadata storage, and it provides the interface to metadata repositories in a heterogeneous environment.
XMI (XML Metadata Interchange): it enables metadata to be exchanged as a stream of XML files.
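To make the idea of exchanging metadata as XML concrete, the following Python sketch writes a table description to XML and reads it back with the standard library. The element and attribute names (`Metadata`, `Table`, `Column`) are invented for this illustration and are not the real XMI or CWM schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def to_xml(table: str, columns: Dict[str, str]) -> str:
    """Serialize a table name and its column types into a small XML document."""
    root = ET.Element("Metadata")
    table_el = ET.SubElement(root, "Table", name=table)
    for name, dtype in columns.items():
        ET.SubElement(table_el, "Column", name=name, type=dtype)
    return ET.tostring(root, encoding="unicode")


def from_xml(document: str) -> Tuple[str, Dict[str, str]]:
    """Parse the document produced by to_xml back into a name and column types."""
    root = ET.fromstring(document)
    table_el = root.find("Table")
    if table_el is None:
        raise ValueError("document contains no Table element")
    columns = {c.get("name"): c.get("type") for c in table_el.findall("Column")}
    return table_el.get("name"), columns
```

A producing tool would call `to_xml`, and a consuming tool that knows the same element names could rebuild the description with `from_xml` without access to the producer's internal storage.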