The architecture of the Data Warehouse

Source: Internet
Author: User

Drawing on the author's experience as a data warehouse ETL intern at Teradata and at Main Street network, this article describes how the two architecture designs relate to each other and where they differ.

Teradata is generally used as an enterprise-class data warehouse. A Teradata data warehouse architecture generally has three layers: the buffer layer, the model layer, and the mart layer. As shown in the following illustration:

The source systems are Oracle databases, and there are typically several of them. Their data is loaded into the buffer layer of the data warehouse with tools such as FastLoad, TPump, or MultiLoad. The buffer layer is designed mainly from the standpoint of technical implementation.

The base layer (the model layer) is divided into subject areas by business and is modeled in third normal form. In the author's experience, the mart layer consists of the data tables that are used for reports.

The entire ETL process can in fact be described as ELT: extract, load, transform. This process is controlled by the ETL Automation tool developed by Teradata itself.
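The ELT order described above can be sketched in a few lines. This is a minimal illustration and not Teradata's ETL Automation: it uses Python's built-in sqlite3 module as a stand-in database, and the table names buffer_orders and model_orders and the sample rows are assumptions made up for the example. The point is only that rows are loaded into the buffer layer unchanged, and the conversion happens afterwards, inside the database.

```python
import sqlite3


def run_elt(source_rows):
    """Load raw rows unchanged, then transform them inside the database."""
    conn = sqlite3.connect(":memory:")
    # Buffer layer: a raw copy of the source, every column kept as text.
    conn.execute(
        "CREATE TABLE buffer_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    # Model layer: typed and cleaned columns.
    conn.execute(
        "CREATE TABLE model_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    # Load: no conversion is applied on the way in.
    conn.executemany("INSERT INTO buffer_orders VALUES (?, ?, ?)", source_rows)
    # Transform: done with SQL after loading, which is what makes this ELT.
    conn.execute(
        """
        INSERT INTO model_orders (order_id, customer, amount)
        SELECT CAST(order_id AS INTEGER),
               UPPER(TRIM(customer)),
               CAST(amount AS REAL)
        FROM buffer_orders
        """
    )
    conn.commit()
    return conn


# Extract: hypothetical rows standing in for data pulled from a source system.
source_rows = [("1", " alice ", "10.50"), ("2", "bob", "4.25")]
conn = run_elt(source_rows)
model_rows = conn.execute(
    "SELECT order_id, customer, amount FROM model_orders ORDER BY order_id"
).fetchall()
print(model_rows)
```

Because the buffer table keeps the raw text, the transform step can be rerun or changed without going back to the source system.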

In general, data is loaded on a daily basis; for example, today's data is loaded after 24:00 today. Some business requirements also call for data to be queryable in the data warehouse in real time, which can be implemented by means of message queuing.
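The daily loading rule can be written down as a small helper. This is a sketch under assumptions: the function names and the created_at field are hypothetical, and "after 24:00" is read as "once the following midnight has passed".

```python
from datetime import date, datetime, time, timedelta


def load_window(business_date):
    """Return the half-open interval [start, end) covering one business day."""
    start = datetime.combine(business_date, time.min)
    end = start + timedelta(days=1)
    return start, end


def is_ready_to_load(business_date, now):
    """A day's data may be loaded only once that day has ended (after 24:00)."""
    _, end = load_window(business_date)
    return now >= end


def rows_for_date(rows, business_date):
    """Select the source rows whose timestamp falls inside the business day."""
    start, end = load_window(business_date)
    return [r for r in rows if start <= r["created_at"] < end]


# Hypothetical sample rows; only the first belongs to 2024-01-01.
rows = [
    {"id": 1, "created_at": datetime(2024, 1, 1, 23, 59, 59)},
    {"id": 2, "created_at": datetime(2024, 1, 2, 0, 0, 0)},
]
day = date(2024, 1, 1)
print(is_ready_to_load(day, datetime(2024, 1, 1, 23, 0)))
print([r["id"] for r in rows_for_date(rows, day)])
```

Using a half-open interval means a row stamped exactly at midnight belongs to the next day, so no row is loaded twice.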

At an Internet company such as Main Street network, the data warehouse is structured in four layers: STG, ODS, DWD, and APP.

STG: The temporary layer, which exists for technical implementation reasons.

ODS: The source layer, which stores the detail data of the source systems.

DWD: The model layer.

APP: The application layer, which often also holds the report data tables.

As shown in the following illustration:

The difference between an Internet company and a traditional enterprise is that clickstream logs are a major source of data for analysis.
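To make the four layers concrete, the sketch below pushes a few clickstream log lines through STG, ODS, DWD, and APP in plain Python. It is an illustration only: the log format, the function names, and the page_type attribute are assumptions invented for the example, and a real warehouse would keep each layer as tables rather than in-memory lists.

```python
from collections import Counter

# Hypothetical clickstream lines: timestamp, user id, URL.
RAW_LOG = [
    "2024-01-01T10:00:00 u1 /home",
    "2024-01-01T10:00:05 u1 /item/42",
    "malformed line",
    "2024-01-01T10:01:00 u2 /home",
]


def to_stg(lines):
    """STG: land the raw lines untouched."""
    return list(lines)


def to_ods(stg):
    """ODS: keep every parseable record at detail level."""
    records = []
    for line in stg:
        parts = line.split()
        if len(parts) == 3:  # skip lines that do not match the assumed format
            ts, user, url = parts
            records.append({"ts": ts, "user": user, "url": url})
    return records


def to_dwd(ods):
    """DWD: add a modeled attribute, here a page type derived from the URL."""
    def page_type(url):
        return "item" if url.startswith("/item/") else "other"

    return [dict(r, page_type=page_type(r["url"])) for r in ods]


def to_app(dwd):
    """APP: aggregate into a report table, page views per page type."""
    return dict(Counter(r["page_type"] for r in dwd))


report = to_app(to_dwd(to_ods(to_stg(RAW_LOG))))
print(report)
```

Each function reads only from the layer directly beneath it, so a change in the report logic touches the APP step alone.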

The main difference in model design lies in the ODS layer. The main reasons for this design are:

1. Technical implementation: the data comes from a number of heterogeneous databases. With this layer, the work can focus on extraction and loading alone, which reduces the cost of conversion.

2. The model layer serves data according to the model, but requirements change a great deal and are sometimes very urgent. The model layer may not be able to support such demands, so a stable set of detail data is needed to support them.

For more information, refer to http://www.alibuybuy.com/posts/23198.html.

At Internet companies, Hive is used as the database for implementing the data warehouse.
