Over the past three years, the Hadoop ecosystem has expanded dramatically, with many major IT vendors introducing Hadoop connectors that complement Hadoop's upper layers or the particular Hadoop distribution the vendor ships. Given the exponential growth in Hadoop deployments and the increasing depth and breadth of its ecosystem, it is natural to wonder whether the rise of Hadoop will spell the end of traditional data warehousing solutions.
We can also place this question in a larger context: to what extent does big data change the landscape of traditional data analysis?
A data warehouse is a suite of technologies and software that collects data from operational systems, consolidates and unifies it in a central database, and then supports analysis, visualization, and metric tracking through dashboards.
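To make that pattern concrete, here is a minimal extract-transform-load sketch in Python. It is illustrative only: the in-memory SQLite database stands in for both the operational system and the warehouse, and the table and column names (orders_raw, sales_summary, region, amount) are invented for this example, not taken from any product described in this article.

```python
import sqlite3

# Illustrative only: one in-memory SQLite database plays the role of both the
# operational source system and the central warehouse.
conn = sqlite3.connect(":memory:")

# "Operational" source table, as an order-entry system might produce it.
conn.execute("CREATE TABLE orders_raw (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders_raw VALUES (?, ?, ?)",
    [(1, "east", 120.0), (2, "west", 75.5), (3, "east", 42.0)],
)

# Consolidated, analysis-ready warehouse table built by a transform step.
conn.execute("CREATE TABLE sales_summary (region TEXT, total_amount REAL, order_count INTEGER)")
conn.execute(
    "INSERT INTO sales_summary "
    "SELECT region, SUM(amount), COUNT(*) FROM orders_raw GROUP BY region"
)

# A dashboard or BI tool would read from the summary table.
for row in conn.execute("SELECT * FROM sales_summary ORDER BY region"):
    print(row)
```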
The main difference between a data warehouse and Hadoop is that a data warehouse is typically deployed on a single relational database that serves as the central repository. Hadoop, by contrast, spreads data across many machines via the Hadoop Distributed File System (HDFS), handling volumes of data that no single machine could manage.
In addition, the Hadoop ecosystem includes a data warehouse layer/service built on top of the Hadoop core, and these upper-layer services span SQL (Presto), SQL-like (Hive), and NoSQL (HBase) styles of data storage and access. In contrast, over the past decade, large data warehouses have shifted to custom multiprocessor appliances to scale data volumes, such as Netezza (which IBM acquired) and the appliances offered by Teradata. However, these appliances are very expensive, and most small and medium-sized enterprises cannot afford them.
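The sketch below shows what SQL over Hadoop-style data looks like in practice, using Spark SQL in local mode as a stand-in; the table name sales_fact and its columns are hypothetical, and in a real deployment the same query could be submitted through Hive or Presto against tables stored in HDFS rather than a temporary view.

```python
from pyspark.sql import SparkSession

# Illustrative only: local Spark session; in a real cluster the data would live
# in HDFS and the table would be registered in a Hive metastore.
spark = SparkSession.builder.appName("warehouse-on-hadoop-sketch").getOrCreate()

# Tiny stand-in dataset; column names are invented for this example.
rows = [("east", 120.0), ("west", 75.5), ("east", 42.0)]
sales = spark.createDataFrame(rows, ["region", "amount"])
sales.createOrReplaceTempView("sales_fact")

# The same aggregation could be expressed in HiveQL or Presto SQL.
result = spark.sql(
    "SELECT region, SUM(amount) AS total_amount FROM sales_fact GROUP BY region"
)
result.show()
```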
In this context, it is natural to ask: is Hadoop a data warehouse terminator?
To answer this question, we need to separate data warehouse technology from data warehouse deployment. Hadoop (together with the rise of NoSQL databases) does herald the demise of single-database deployments, that is, of data warehouse appliances and traditional data warehouses.
There are already examples of this. The Hadoop vendor Cloudera positions its platform as an "enterprise data hub," which essentially folds the requirements traditionally served by data management solutions into its offering. Readwrite.com expressed a similar view in a recently published article, "Why Proprietary Big Data Technologies Don't Want to Compete with Hadoop." Similarly, a recent Wall Street Journal article describes how Hadoop challenges Oracle and Teradata.
The Hadoop and NoSQL ecosystems will continue to evolve. Many big data environments are starting to adopt hybrid approaches that mix NoSQL, SQL, and even NewSQL data warehouses. In addition, the MapReduce parallel processing engine is itself being revised and improved, as with the Apache Spark project. Although the story is far from over, it is fair to say that the traditional single-server relational database or database appliance is not the future of big data or of data warehousing.
On the other hand, data warehousing technologies (including extract-transform-load, dimensional modeling, and business intelligence) will carry over to the new Hadoop/NoSQL environments. Moreover, these technologies will be adapted to support increasingly mixed environments. The guiding principle is that not all data are equal, so IT managers should choose data storage and access mechanisms that fit how the data will actually be used. Such mixed environments will include key-value stores, relational databases, graph stores, document stores, columnar stores, XML databases, metadata catalogs, and so on.
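As a small illustration of carrying dimensional modeling over to a Hadoop-style engine, here is a minimal star schema (one fact table, one dimension table) expressed as Spark temporary views. The table and column names (fact_sales, dim_product, product_key, and so on) are invented for this sketch; in a real environment these would be Hive or Parquet tables over HDFS.

```python
from pyspark.sql import SparkSession

# Illustrative only: a tiny star schema built in memory for demonstration.
spark = SparkSession.builder.appName("dimensional-model-sketch").getOrCreate()

fact_sales = spark.createDataFrame(
    [(1, 101, 120.0), (2, 102, 75.5), (3, 101, 42.0)],
    ["order_id", "product_key", "amount"],
)
dim_product = spark.createDataFrame(
    [(101, "widget", "hardware"), (102, "gizmo", "electronics")],
    ["product_key", "product_name", "category"],
)
fact_sales.createOrReplaceTempView("fact_sales")
dim_product.createOrReplaceTempView("dim_product")

# Classic dimensional query: join the fact table to a dimension and aggregate.
spark.sql("""
    SELECT d.category, SUM(f.amount) AS total_amount
    FROM fact_sales f
    JOIN dim_product d ON f.product_key = d.product_key
    GROUP BY d.category
""").show()
```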
As you can see, this is not a simple question, and there is no simple answer. In general, though, while big data will change how data warehouses are deployed over the next five years, it will not make data warehousing concepts and practices obsolete.
What does this mean for the federal government, which has invested heavily in data warehouses?
First, when an existing data warehouse runs out of capacity, it will be migrated to a Hadoop-based, multi-machine, or cloud-managed solution. Second, rather than adopting a one-size-fits-all approach, enterprises will look for hybrid storage methods that fit the data held inside their organizations.