Hadoop cluster enables data analysis platform

Source: Internet
Author: User
Keywords nbsp can they very analysis platform

Eckerson Wayne, a consultant, says Hadoop provides a platform for easier control of individual data analysis and Spreadmart (report marts) built by business users, while giving them a place to perform self-service analysis.

Spreadmart is the abbreviation of ToolStrip Data mart, in the field of business intelligence, the different spreadsheets that multiple individuals and teams create. Because the data is inconsistent, it brings a lot of trouble to the business.

For decades, all data analysts have used self-service analysis tools to access and manipulate data, identify trends and anomalies, and showcase business intelligence insights. Although the types of tools have changed over the years, the results are almost always the same: Spreadmart or data shadow systems are built on unique rules, metrics, and definitions.

Most large enterprises have tens of thousands of spreadmart, each of which is designed to deal with significant or localized problems at some point in time. Although valuable to individual business units, Spreadmart the CEO and CFO. They ask a simple question, such as, "How many customers do we have?" "They got conflicting answers from Spreadmart because the data were inconsistent with data analysts and business unit heads. The Spreadmart phenomenon has enabled thousands of IT managers and business executives to establish data warehouse rules to restore data consistency and business order.

This does not prevent people from using the data in a variety of spreadmart tools, from Microsoft Excel and access to service BI software, and at the high-end level, use SAS and SPSS software for statistical analysis and data mining. But there is a new technology that can help companies improve Spreadmart side effects: Hadoop clusters.

The Open-source software is free, and the hardware needed to run it is cheap, and analysts don't have to understand SQL or data modeling technology to use it. They can dump data to Hadoop and then access, process, and analyze data using high-level languages such as hive or pig, or with a compatible BI and data integration tool on Hadoop. While there are many reasons to implement Hadoop, one of the main reasons is to nurture data analysis of self service without it intervention, and Hadoop is rapidly becoming the preferred Spreadmart platform for mature analysts and department heads.

Free Management in Hadoop

Until now, there has been a way to implement data management in a Hadoop environment with minimal communication. Data quality, data consistency, appropriate size, and metadata management these terms have not yet entered the dictionary of Hadoop. Because Hadoop is still new, most companies are still assessing their ability to support production facilities. This is also because its main users, business analysts, have never been overly focused on enterprise data governance and consistency, and they can assess and analyze trends without the need for high-quality data.

So, if Hadoop is a free system for all self-service, analysts and business users can implement dumps and access data without laborious management, what is the guarantee that the hyped Hadoop data pool will not turn into a ripple, in other words, Will hadoop further increase the number of spreadmart in the future or contribute to the consolidation of Spreadmart?

The answer to the question is: both will.

Companies can indeed use Hadoop as a low-cost repository for all their data, that is, data pools. As a result, the Hadoop system provides a one-stop service for every analyst and business unit in the enterprise, rather than searching for data in multiple applications and systems, and analysts can get everything they need by digging into the data pool. This makes it easier to create spreadmart.

However, this is not a way to add a lot of free-control spreadmart on a variety of PCs and file servers, but Hadoop provides a possibility to enhance the ability to analyze data in a single location: a huge analytical sandbox that offers a larger economy and considerable cost savings. It enables it and business managers to actually see what analysts are doing. One way to consider Spreadmart is to consider Spreadmart as an instantiation of business requirements. Hidden Spreadmart makes it difficult for IT managers to discern what is important to the business, and it can be difficult to find data from the Data warehouse to meet the requirements of enterprise reporting. By concentrating data analysis in the data lake, Hadoop makes these issues easy for it and business partners, and also proactively meets their needs.

Data Analysis Nova

However, Hadoop is not just a container for keeping spreadmart collections. He is an extensible, flexible data-processing platform that can meet the needs of most enterprises in the analysis of information. It's like the Swiss Army Knife in data processing: It's a general-purpose tool that can do almost anything, though not the best (at least not yet).

Hadoop can store all of the data for the enterprise, not just a subset, just like a data warehouse. With yarn resource management, part of the Hadoop 2, launched last fall, it has been able to support a variety of data and analysis processing applications, from real-time SQL query systems, graphics to memory calculations and streaming analysis engines. While 2 of Hadoop takes time to mature, the future is clear: businesses can store their data in a Hadoop cluster and process it there.

This is revolutionary. Savvy it and data Warehouse managers will soon be aware of the impact. With the advent of the Hadoop 2 system, the Future Analysis architecture will revolve around Hadoop rather than the previous relational database. Further, existing analysis systems will become specialized databases and eventually disappear, as Hadoop matures and merges their functionality.

At least, that's the vision. A great deal of development and experimentation is needed before most businesses change their current analytical ecosystems into HADOOP2 data pools. The existing analysis system has a long life: even if their value has been completely devalued, the embedded nature and the inertia of the enterprise make it difficult for companies to abandon them. Hadoop may never fulfill its promise, or another technology will replace its analytical status in the future.

But in the world of Hadoop, such things happen at all times. Today, Hadoop is rapidly becoming the de facto enterprise data repository, taking precedence over the Spreadmart platform (or analysis sandbox). Soon it may be the main platform for building analytical applications and most analytical ecosystems.

"TechTarget China original content, all rights reserved, authorized China Big Data release"

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.