How Hadoop Complements the Data Warehouse to Create a Powerful Aggregation Platform

Source: Internet
Author: User
Keywords: data warehousing, Hadoop, aggregation platform

Apache Hadoop is the foundation of a new generation of data warehouses. Companies already use Hadoop in strategic roles within their current warehousing architectures, such as extract/transform/load (ETL), data staging, and preprocessing of unstructured content. I also see Hadoop as a key technology in a new generation of massively parallel data warehouses in the cloud, complementing today's warehousing techniques and low-latency streaming platforms.

At IBM, we expect Hadoop and data warehousing technologies to integrate more fully and converge into a new platform paradigm over the next few years: the Hadoop data warehouse. Hadoop will not make the traditional warehousing architecture obsolete; instead, it will complement and extend the data warehouse to support a single version of the truth, data governance, and master data management for multistructured data, that is, data that exists in at least two of the following formats: structured (such as relational or tabular), semi-structured (such as XML-tagged free-text files), and/or unstructured (for example, ASCII and other free-form text formats).

Data warehouses and Hadoop are already converging in many ways, sharing a common architectural approach, both in IBM's architecture and across most of the industry. The main features of this shared approach are massively parallel processing, in-database analytics, mixed workload management, and a flexible storage layer.

Hadoop already is, and clearly will remain, central to how users and vendors develop big data approaches. The reasons for Hadoop's strong growth include:

 A massively scalable, vendor-independent framework for advanced analytics on multistructured information
 An extensible framework on which to build advanced analytics and data management capabilities
 Rapid evolution in new directions
 Rapid commercialization and adoption by enterprises
 Support from a vibrant open source community and from the industry

However, the evolution toward a converged Hadoop data warehouse platform will not happen overnight. Nor does any of this mean that Hadoop, in its current form, will marginalize other big data approaches, whether legacy or new. Hadoop will not advance at the expense of in-memory, columnar, or graph databases. All of these approaches will coexist in the Hadoop data warehouse, which will become mainstream in the near future.

In its key deployment role within the data warehouse ecosystem, Hadoop (often together with NoSQL technology) will establish its place in the staging, preprocessing, and ETL layers. In this role, Hadoop addresses the three Vs of new data sources such as social, sensor, event, clickstream, and RFID data: volume, velocity, and variety. In addition, Hadoop will become the preferred "sandbox" platform where data scientists explore large, complex datasets and develop sophisticated statistical models for leading-edge big data applications.
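To make the staging, preprocessing, and ETL role more concrete, here is a minimal sketch in plain Python of the map-and-reduce pattern that Hadoop applies at scale. It does not use any Hadoop API. The sample clickstream lines, the field names (`user`, `page`, `ts`), and the function names are all hypothetical and chosen only for illustration.

```python
import json
from collections import defaultdict

# Hypothetical raw clickstream lines as they might land in a staging area.
# Some are well-formed JSON, one is unstructured text, one lacks a field.
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ts": 1}',
    '{"user": "u2", "page": "/cart", "ts": 2}',
    'corrupted line without structure',
    '{"user": "u1", "page": "/cart", "ts": 3}',
    '{"user": "u3", "ts": 4}',
]


def preprocess(line):
    """Parse one raw line; return a cleaned record, or None if unusable."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or "page" not in record:
        return None
    return {"user": record.get("user"), "page": record["page"]}


def map_phase(records):
    """Emit (page, 1) pairs, mirroring the role of a MapReduce mapper."""
    for record in records:
        yield record["page"], 1


def reduce_phase(pairs):
    """Group pairs by key and sum the counts, mirroring a reducer."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_pipeline(lines):
    """Stage, clean, and aggregate raw lines into a small structured table."""
    cleaned = [r for r in (preprocess(line) for line in lines) if r is not None]
    return reduce_phase(map_phase(cleaned))


if __name__ == "__main__":
    print(run_pipeline(RAW_EVENTS))
```

In an actual deployment, the cleaned and aggregated result would typically be loaded into the warehouse for structured querying, while the raw lines stay in the staging layer for later exploration.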

An exciting prospect for the emerging Hadoop data warehouse is that, as the necessary governance, security, and management tools mature, it will serve applications that require a comprehensive, 360-degree view of the truth about structured (transactional) and unstructured (social) customer data, in order to drive digital channel strategy, targeting, experience optimization, and other functions. This capability will become a killer application for the new generation of Hadoop data warehouses, and no single component technology is optimized to support it on its own.

Your Hadoop data warehouse will become a powerful aggregation platform. However, Hadoop does not necessarily deliver direct business value unless your vendor can supply out-of-the-box solution accelerators (such as social media analytics and real-time infrastructure monitoring) that fit the particular big data application you are deploying. When evaluating commercial big data and Hadoop solutions, you should also consider whether they bundle key solution accelerator elements, especially sample applications, user-defined and standard development kits, and industry data models for the banking, insurance, telecommunications, healthcare, and retail sectors.

At IBM, we will continue to address these and other Hadoop data warehouse requirements across a wide range of information management solution areas. Look for further details in the coming months.
