Use Hive to build a data warehouse

Source: Internet
Author: User

Data Warehouse

Building a real data warehouse can be a huge project. There are many different tools, methodologies, and theories. What is the greatest common value? What are the facts, and what are the subjects related to those facts? And how do you mix, match, merge, and integrate systems that may have existed for decades with systems that were implemented only a few months ago? And that was before big data and Hadoop. Add unstructured data, NoSQL, and Hadoop to the mix, and you soon have a huge data integration project.

The simplest way to describe a data warehouse is to recognize that it comes down to star schemas, facts, and dimensions. How you create these elements, whether through staging databases, on-the-fly extract, transform, and load (ETL) processes, or integrated secondary indexes, is up to you. You can certainly build a data warehouse with star schemas, facts, and dimensions using Hive as the core technology, but it is not easy. Outside the Hadoop world, it becomes an even greater challenge. Hive is less a legitimate data warehouse than an integration, transformation, and quick lookup tool. The schema may look like a data warehouse, but its suitability shows that it is not an RDBMS. So why use it?

What is a star schema?

Imagine a star with one center and multiple "arms" pointing in different directions. The center is the power source, or the fact table. Each arm points to a different dimension. Many data warehouses have one fact table and multiple dimensions.

Fact tables contain any data that you can measure or calculate. In this example, the facts are baseball statistics, such as runs, hits, and batting average. You can calculate with these columns: add, subtract, or multiply them.
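To make the idea of calculable fact columns concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Hive, and the table name fact_batting, the at_bats column, and all sample rows are hypothetical, invented only for illustration.

```python
import sqlite3

# Hypothetical fact table; column names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_batting ("
    "playerID TEXT, yearID INTEGER, at_bats INTEGER, runs INTEGER, hits INTEGER)"
)
conn.executemany(
    "INSERT INTO fact_batting VALUES (?, ?, ?, ?, ?)",
    [
        ("player01", 2010, 500, 80, 150),
        ("player01", 2011, 400, 60, 100),
        ("player02", 2010, 300, 30, 90),
    ],
)

# Runs and hits are additive measures, so they can be summed directly.
# The batting average is derived from the summed hits and at-bats
# instead of being averaged row by row.
career_totals = conn.execute(
    "SELECT playerID, SUM(runs), SUM(hits), "
    "ROUND(SUM(hits) * 1.0 / SUM(at_bats), 3) "
    "FROM fact_batting GROUP BY playerID ORDER BY playerID"
).fetchall()
conn.close()

for row in career_totals:
    print(row)
```

Counts such as runs and hits add up cleanly across rows, while a ratio such as batting average is usually recomputed from its summed components.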

Dimensions are based on subjects. In this example, there is a player information dimension, a time and date dimension, and so on. Generally, the columns in a dimension are not calculated or measured.

In this example, the key that joins the dimension tables to the fact table is playerID.
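The join between a dimension and the fact table can be sketched as follows. This is a minimal illustration that again uses Python's built-in sqlite3 module in place of Hive. Only playerID comes from the text; the table names dim_player and fact_batting, the other columns, and the sample rows are assumptions. A comparable SELECT with JOIN and GROUP BY should be expressible in HiveQL, but the Hive table definitions would differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One dimension table (descriptive attributes) and one fact table (measures).
# All names except playerID are hypothetical.
conn.executescript(
    """
    CREATE TABLE dim_player (
        playerID   TEXT PRIMARY KEY,
        name       TEXT,
        birth_year INTEGER
    );
    CREATE TABLE fact_batting (
        playerID TEXT,
        yearID   INTEGER,
        runs     INTEGER,
        hits     INTEGER
    );
    INSERT INTO dim_player VALUES
        ('player01', 'Player One', 1985),
        ('player02', 'Player Two', 1990);
    INSERT INTO fact_batting VALUES
        ('player01', 2010, 80, 150),
        ('player01', 2011, 60, 100),
        ('player02', 2010, 30, 90);
    """
)

# Star join: the fact table is joined to the dimension on playerID,
# then the measures are aggregated per descriptive attribute.
hits_by_player = conn.execute(
    """
    SELECT d.name, SUM(f.hits) AS total_hits
    FROM fact_batting AS f
    JOIN dim_player AS d ON f.playerID = d.playerID
    GROUP BY d.name
    ORDER BY d.name
    """
).fetchall()
conn.close()

for name, total_hits in hits_by_player:
    print(name, total_hits)
```

The descriptive columns live only in the dimension table, and the fact table carries just the key and the measures, which is what gives the schema its star shape.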
