Based on my experience as a data warehouse ETL intern at Teradata and at Main Street network, this post looks at how their architecture designs relate to each other and where they differ.
Teradata is generally used for enterprise-class data warehouses. A Teradata data warehouse architecture usually has three layers: a buffer layer, a base (model) layer, and a mart layer. As shown in the following illustration:
The source systems are Oracle databases, and there are typically several of them. Their data is loaded into the data warehouse buffer layer with tools such as FastLoad, TPump, or MultiLoad. The buffer layer is designed mainly from the point of view of technical implementation.
The base layer is divided by business subject area and modeled in third normal form (3NF). In my experience, the mart layer holds the data tables that are used for reports.
The entire ETL process can in fact be described as ELT: extract, load, then transform. This process is controlled by the ETL Automation tool developed by Teradata itself.
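The ELT order described above can be sketched in a few lines. This is only an illustration under stated assumptions: Python's built-in `sqlite3` stands in for the warehouse, and the table names (`buf_orders`, `base_orders`, `mart_daily_sales`) and the sample rows are hypothetical, not taken from any Teradata project.

```python
import sqlite3

# Hypothetical rows standing in for an extract from an Oracle source system.
source_rows = [
    (1, "alice", "2024-01-01", "19.90"),
    (2, "bob", "2024-01-01", "5.00"),
    (3, "alice", "2024-01-02", "7.10"),
]

conn = sqlite3.connect(":memory:")  # sqlite3 is only a stand-in for the warehouse

# Load: copy the extract into the buffer layer unchanged.
conn.execute("CREATE TABLE buf_orders (order_id, customer, order_date, amount)")
conn.executemany("INSERT INTO buf_orders VALUES (?, ?, ?, ?)", source_rows)

# Transform: done inside the database, after loading, with SQL.
conn.execute("""
    CREATE TABLE base_orders AS
    SELECT order_id, customer, order_date, CAST(amount AS REAL) AS amount
    FROM buf_orders
""")
conn.execute("""
    CREATE TABLE mart_daily_sales AS
    SELECT order_date, COUNT(*) AS orders, ROUND(SUM(amount), 2) AS revenue
    FROM base_orders
    GROUP BY order_date
""")

# The mart table is what a report would read.
report = conn.execute(
    "SELECT order_date, orders, revenue FROM mart_daily_sales ORDER BY order_date"
).fetchall()
print(report)
```

The point of the sketch is the ordering: nothing is converted on the way in, and all transformation happens as SQL inside the database once the data has landed.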
In general, data is loaded on a daily basis: for example, today's data is loaded after 24:00 today. Some business requirements also call for data to be queryable in the data warehouse in real time, which can be implemented with a message queue.
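As a rough sketch of the message queue approach, the example below uses Python's in-process `queue.Queue` as a stand-in for a real message queue product. The names `realtime_table` and `consumer`, and the message fields, are hypothetical.

```python
import queue
import threading

events = queue.Queue()  # in-process stand-in for a real message queue
STOP = object()         # sentinel that tells the consumer to finish
realtime_table = []     # stand-in for a warehouse table that is queried in real time

def consumer():
    """Load each message into the table as soon as it arrives."""
    while True:
        message = events.get()
        if message is STOP:
            break
        realtime_table.append(message)

worker = threading.Thread(target=consumer)
worker.start()

# The source system publishes each change as it happens,
# instead of waiting for the nightly batch load.
for order_id in (101, 102, 103):
    events.put({"order_id": order_id, "status": "paid"})
events.put(STOP)
worker.join()

print(len(realtime_table))
```

Compared with the daily load, the consumer here writes row by row as messages arrive, so a query against the table sees new data without waiting for the next batch window.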
At an internet company such as Main Street network, the data warehouse is structured as four layers: STG, ODS, DWD, and APP.
STG: The temporary layer, which exists for technical implementation reasons.
ODS: The source layer, which saves the detail data of the source systems.
DWD: The model layer.
APP: The application layer, which often also holds the report data tables.
As shown in the following illustration:
One difference between an internet company and a traditional enterprise is that clickstream logs are a major data source for analysis.
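To make the four layers concrete, here is a minimal sketch that pushes a few clickstream log lines through STG, ODS, DWD, and APP. The log format, the `parse` helper, and the field names are all assumptions made for illustration; plain Python lists stand in for the tables.

```python
from collections import Counter

# Hypothetical clickstream log lines; the format is assumed for illustration.
raw_log = [
    "2024-01-01T10:00:00 user=42 url=/jobs/1",
    "2024-01-01T10:00:05 user=42 url=/jobs/2",
    "2024-01-01T10:01:00 user=7 url=/jobs/1",
    "malformed line",
]

def parse(line):
    """Parse one log line into a dict, or return None if it does not match."""
    parts = line.split(" ")
    if (
        len(parts) != 3
        or not parts[1].startswith("user=")
        or not parts[2].startswith("url=")
    ):
        return None
    return {"ts": parts[0], "user_id": int(parts[1][5:]), "url": parts[2][4:]}

# STG: land the raw lines as they arrive.
stg = list(raw_log)
# ODS: keep the parsed detail records, one per valid line.
ods = [rec for rec in map(parse, stg) if rec is not None]
# DWD: shape the detail into the model, here by deriving a date column.
dwd = [{**rec, "dt": rec["ts"][:10]} for rec in ods]
# APP: aggregate into a report table, here page views per URL per day.
app = Counter((rec["dt"], rec["url"]) for rec in dwd)

print(app[("2024-01-01", "/jobs/1")])
```

Each layer only reads from the one before it, which is what keeps the technical concerns (STG), the source detail (ODS), the model (DWD), and the reports (APP) separate.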
The main difference in model design is the ODS layer. The main reasons for this design are:
1. Technical implementation. The data comes from a number of heterogeneous databases. With this layer, loading can skip the cost of conversion and focus only on extraction and loading.
2. Data is usually obtained through the model layer, but requirements change a great deal and are sometimes very urgent. The model layer may not be able to support a given requirement, so a stable store of detail data is needed to support it.
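Both reasons can be illustrated with a small sketch. Everything in it is hypothetical: the two source systems, their field names, and the `to_model` helper are invented for the example, and plain Python lists stand in for the ODS and DWD tables.

```python
# Hypothetical rows from two heterogeneous source systems; field names are assumed.
mysql_users = [{"id": 1, "name": "alice", "referrer": "ad_campaign"}]
oracle_users = [{"USER_ID": 2, "USER_NAME": "bob", "REFERRER": "search"}]

# ODS: store the source detail unchanged, tagged only with where it came from.
# No conversion happens here, so loading stays cheap (reason 1).
ods_users = (
    [{"source": "mysql", "row": r} for r in mysql_users]
    + [{"source": "oracle", "row": r} for r in oracle_users]
)

def to_model(entry):
    """DWD: unify the schemas; this model happens to keep only id and name."""
    row = entry["row"]
    if entry["source"] == "mysql":
        return {"user_id": row["id"], "name": row["name"]}
    return {"user_id": row["USER_ID"], "name": row["USER_NAME"]}

dwd_users = [to_model(e) for e in ods_users]

# An urgent request asks for referrers, which the model does not carry.
# The stable ODS detail can still answer it without remodeling first (reason 2).
referrers = sorted(
    e["row"].get("referrer") or e["row"].get("REFERRER") for e in ods_users
)
print(referrers)
```

The model layer in this sketch drops the referrer field, so only the ODS detail can serve the new requirement until the model is extended.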
For more information, refer to http://www.alibuybuy.com/posts/23198.html.
In internet companies, we use Hive as the database for implementing the data warehouse.