Data Warehouse Experience Summary

Source: Internet
Author: User
Planning the DW by Subject Domain

A subject domain contains the things that interest decision makers in a particular area. A subject domain usually spans multiple business departments. For example, a product subject domain involves sales, finance, logistics, procurement, and other departments.

A subject domain may contain a single subject, such as a product subject.

It may also contain several subjects, such as cost, shipping, and inventory.



The subject domain model is an abstraction of the business model, and it needs to reflect the enterprise business model from the perspective of decision makers and managers. A decision maker does not need to understand the detailed business of every department. The manager of the sales department needs to know the product inventory and the procurement plan in order to arrange sales, but does not need to know the business processes of the logistics and procurement departments. Therefore, while integrating data from multiple business departments, the model should hide the specific business logic of the OLTP databases, so that the delivered data is easier to understand and more useful.
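The idea of integrating departmental data while hiding OLTP details can be sketched with a small in-memory example. This is only an illustration under stated assumptions: the table names (`sales_order_line`, `warehouse_stock`), their columns, and the `product_subject` view are hypothetical and do not come from any real system.

```python
import sqlite3

def build_product_subject(conn: sqlite3.Connection) -> list:
    """Integrate two departmental OLTP tables into one product-subject view."""
    cur = conn.cursor()
    cur.executescript("""
        -- Hypothetical OLTP tables owned by two different departments.
        CREATE TABLE sales_order_line (order_id INTEGER, product_id TEXT,
                                       qty INTEGER, status TEXT);
        CREATE TABLE warehouse_stock  (bin_code TEXT, product_id TEXT,
                                       on_hand INTEGER);
        INSERT INTO sales_order_line VALUES
            (1, 'P1', 5, 'CONFIRMED'),
            (2, 'P1', 3, 'CANCELLED'),
            (3, 'P2', 7, 'CONFIRMED');
        INSERT INTO warehouse_stock VALUES
            ('A-01', 'P1', 10), ('B-07', 'P1', 4), ('A-02', 'P2', 2);

        -- The subject view hides order statuses and bin codes:
        -- the reader only sees one row per product.
        CREATE VIEW product_subject AS
        SELECT p.product_id,
               COALESCE(s.qty_sold, 0)    AS qty_sold,
               COALESCE(w.qty_on_hand, 0) AS qty_on_hand
        FROM (SELECT product_id FROM sales_order_line
              UNION
              SELECT product_id FROM warehouse_stock) AS p
        LEFT JOIN (SELECT product_id, SUM(qty) AS qty_sold
                   FROM sales_order_line
                   WHERE status = 'CONFIRMED'
                   GROUP BY product_id) AS s ON s.product_id = p.product_id
        LEFT JOIN (SELECT product_id, SUM(on_hand) AS qty_on_hand
                   FROM warehouse_stock
                   GROUP BY product_id) AS w ON w.product_id = p.product_id;
    """)
    return cur.execute(
        "SELECT product_id, qty_sold, qty_on_hand "
        "FROM product_subject ORDER BY product_id"
    ).fetchall()

if __name__ == "__main__":
    print(build_product_subject(sqlite3.connect(":memory:")))
```

A sales manager querying `product_subject` sees sold and on-hand quantities per product without needing to know how orders are confirmed or how stock is spread across bins.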

EDW Development

There are several development modes. One is to develop multiple data marts one by one and then combine them into a data warehouse. The advantage of this approach is high efficiency and quick results in the early stage. However, because these data marts are built and run independently, problems may arise in later management and integration. In this hub-style arrangement, multiple data marts feed a central data warehouse.

Another development mode is to build a unified data warehouse first and then let the data warehouse support multiple data marts. However, this approach is difficult, or even impossible, to implement in large enterprises.

What is actually more feasible is parallel development. When starting a new data mart, you adjust the data warehouse and add the new content to it. This mode requires some experience and an understanding of the entire enterprise, so that you leave room and flexibility for the next adjustment and expansion of the data warehouse.

 

Get Familiar with Business Applications

Enterprises usually run a variety of business applications (BA), such as ERP, OA, CRM, eHR, e-business, PDM, and so on. These BAs become the data sources of the data warehouse. Although the business models and processes of different companies differ, the basic concepts and data models of BAs are similar. Although many EAI (Enterprise Application Integration) tools already exist, ETL developers still need to understand the BAs. If you have the opportunity, get exposure to commercial applications from SAP, Oracle, and other vendors.

 

ETL: Balance between performance and quality

The two most important aspects of the ETL process are efficiency and quality.

The same ETL step can be done like this:


You can also do this:

 

 

The former is a method that emphasizes ETL performance, while the latter is a method that focuses on data quality. Different ETL processes have different requirements. For example, in the telecom industry the data is mostly collected by machines, so the data volume is huge but the quality is good. In this case, a performance-oriented method should be adopted. In contrast, some enterprises have multiple information systems, and business staff in different departments or external customers often enter nonstandard data into these systems. To ensure data quality, a large number of validation steps are added to the ETL process, which makes the process more complicated and also lowers performance.

ETL should use different strategies in different environments and applications to balance performance and quality.
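The trade-off can be illustrated with two hypothetical loaders for the same step. This is a sketch, not the approach from the original figures: the function names, the row fields (`customer_id`, `order_date`, `amount`), and the validation rules are all assumptions made for the example.

```python
from datetime import datetime

def load_fast(rows, target):
    """Performance-oriented: append the whole batch with no per-row checks."""
    target.extend(rows)
    return len(rows)

def validate(row):
    """Return None if the row looks clean, otherwise a short reason string."""
    if not row.get("customer_id"):
        return "missing customer_id"
    try:
        datetime.strptime(row.get("order_date"), "%Y-%m-%d")
    except (TypeError, ValueError):
        return "bad order_date"
    amount = row.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        return "bad amount"
    return None

def load_checked(rows, target, rejects):
    """Quality-oriented: validate every row and divert bad rows to a reject list."""
    loaded = 0
    for row in rows:
        problem = validate(row)
        if problem is None:
            target.append(row)
            loaded += 1
        else:
            rejects.append((row, problem))
    return loaded

if __name__ == "__main__":
    batch = [
        {"customer_id": "C1", "order_date": "2024-01-05", "amount": 10.0},
        {"customer_id": "", "order_date": "2024-01-05", "amount": 5},
    ]
    good, bad = [], []
    print(load_checked(batch, good, bad), [reason for _, reason in bad])
```

`load_fast` does one bulk operation per batch, which suits clean machine-generated data. `load_checked` does extra work for every row, which costs time but keeps nonstandard manual entries out of the warehouse and records why each one was rejected.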

 

Tips for Data Warehouse Optimization

1. Multiple granularity. The data warehouse needs to store the most fine-grained data, while a data mart can store coarse-grained data. As a result, queries against the data mart perform better than queries against the data warehouse.

2. Partition. Partitioning can exploit the hardware performance of the storage devices, and partitions can be managed and backed up independently, which reduces risk.

3. Index. Oracle index-organized tables and SQL Server clustered indexes determine the physical storage order of the data. Unlike an OLTP database, a data warehouse is loaded on a daily basis and its historical data is generally not rewritten. You can therefore use the date field to determine the physical storage order, which improves write performance.
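Tips 1 and 2 can be sketched in a few lines of plain Python. The row layout `(sale_date, product_id, amount)` and both function names are assumptions for illustration; a real warehouse would do this in the database engine rather than in application code.

```python
from collections import defaultdict

def rollup_to_monthly(fact_rows):
    """Aggregate day-level warehouse rows into month-level data mart rows."""
    totals = defaultdict(float)
    for sale_date, product_id, amount in fact_rows:
        month = sale_date[:7]  # 'YYYY-MM-DD' -> 'YYYY-MM'
        totals[(month, product_id)] += amount
    return sorted((month, product, total)
                  for (month, product), total in totals.items())

def partition_by_month(fact_rows):
    """Split rows into one partition per month, keyed by 'YYYY-MM'."""
    partitions = defaultdict(list)
    for row in fact_rows:
        partitions[row[0][:7]].append(row)
    return dict(partitions)

if __name__ == "__main__":
    facts = [
        ("2024-01-03", "P1", 10.0),
        ("2024-01-20", "P1", 5.0),
        ("2024-02-01", "P1", 2.5),
    ]
    print(rollup_to_monthly(facts))
    print(sorted(partition_by_month(facts)))
```

The monthly rollup has fewer rows than the day-level facts, which is why mart queries are cheaper. Each monthly partition can be backed up or archived on its own without touching the others.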

 

Metadata Management

The purpose of metadata is to make data easier to understand. It has two functions.

On the one hand, when IT staff deliver the DW they have developed to users, metadata lets the users know what the data means. For example, a user who sees a product's sales amount may not know whether it is the amount on the sales orders placed, the amount for products already shipped, or the receivable amount due for collection. On the other hand, when IT developers revisit a DW some time after building it, they may have forgotten the data sources and the ETL process by which the data reaches the DW. Or developer A completes part of the work and hands the project over to developer B, and B may not understand how A processed and transformed the data.

Metadata can be implemented by writing documents or by using tools. For example, the metadata management module in an ETL tool helps developers understand the data, and you can add a friendly name and a description to each field in the OLAP database. When a BI client connects to the OLAP database, users can understand the meaning of the data without asking IT staff.

Metadata makes data easier to understand, provided that the metadata itself is easy to understand. A project contains multiple types of metadata, such as ETL metadata, DW metadata, OLAP metadata, report metadata, and so on. It is best to manage all of this metadata in one unified document.
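One possible shape for such a unified document is sketched below. The `FieldMetadata` class, its attributes, and the sample entry are hypothetical and chosen only to show friendly names, descriptions, and lineage kept together in one place.

```python
from dataclasses import dataclass

@dataclass
class FieldMetadata:
    layer: str          # which metadata type: "ETL", "DW", "OLAP", "Report"
    name: str           # physical field name
    friendly_name: str  # label shown to business users
    description: str    # what the value means
    source: str         # where the data comes from
    transform: str      # how the ETL process derives it

def render_catalog(entries):
    """Render all entries as one plain-text document, grouped by layer."""
    lines = []
    for layer in sorted({e.layer for e in entries}):
        lines.append(f"[{layer}]")
        in_layer = sorted((e for e in entries if e.layer == layer),
                          key=lambda e: e.name)
        for e in in_layer:
            lines.append(f"{e.name} ({e.friendly_name}): {e.description}")
            lines.append(f"  source: {e.source}; transform: {e.transform}")
    return "\n".join(lines)

if __name__ == "__main__":
    catalog = [
        FieldMetadata("DW", "sales_amt", "Shipped sales amount",
                      "Amount of products already shipped",
                      "erp.shipment_line", "sum(qty * unit_price)"),
    ]
    print(render_catalog(catalog))
```

Keeping the description next to the source and transform answers both questions raised above: users learn which "sales amount" they are looking at, and a developer taking over the project can see how the field was produced.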
