If you have a lot of data on your hands, what you need to do is choose the right Hadoop distribution.
Hadoop, once a rarity that served only Internet empires such as Google and Yahoo, has built a solid reputation and is steadily moving into ordinary enterprise environments. There are two reasons for this. First, the volume of data that businesses need to manage keeps growing, and Hadoop is well suited to the task, especially when traditional structured data is mixed with newer unstructured data. Second, many vendors have joined in developing Hadoop-related support and services, and their efforts have made the platform more accessible and easier for ordinary businesses to adopt.
Most enterprises estimate that they can analyze only 12% of the data available to them; the remaining 88% merely occupies storage space and delivers no value.
- Forrester's software survey report, fourth quarter of 2013.
"Hadoop has become an unstoppable open source force, and its roots are spreading widely and deeply into enterprise data management architectures," Forrester analysts Mike Gualtieri and Noel Yuhanna wrote in a recent Wave report on the Hadoop market. "Forrester believes that Hadoop has become a necessary data platform for large enterprises and plays a key role in any future data management platform. If you have a lot of structured, unstructured, and/or binary data, Hadoop is the ideal place to start your data management."
So what do we need to do first? Forrester points out that there are plenty of options at the starting stage, and that nine Hadoop vendors in the market deserve careful consideration and comparison. Their solutions have different strengths and weaknesses. Forrester found that no clear market leader has emerged yet; both the major technology giants and a number of relatively young vendors have come up with commendable offerings of their own.
Let's start with a little background: Hadoop is an open source Apache project, and anyone can download its core components for free, including Hadoop Common, the Hadoop Distributed File System (HDFS), Hadoop YARN, and Hadoop MapReduce. Companies such as IBM, Amazon Web Services, Microsoft, and Teradata are all packaging Hadoop into distributions or services that are easier to use. Each vendor takes a slightly different approach, but the core strength they all build on is Hadoop's ability to distribute workloads across thousands of servers, which is the key to turning big data into manageable data.
Note: This list is based on the vendors covered in the Forrester Wave report and does not include every Hadoop and big data management platform. Vendors are listed in alphabetical order.
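To make the idea of distributing a workload more concrete, here is a minimal sketch of the map, shuffle, and reduce steps that MapReduce is named after, written in plain Python. It is only an illustration and does not use Hadoop's own API: the function names (split_input, map_phase, shuffle, reduce_phase, word_count) and the num_workers parameter are assumptions made up for this example, and everything runs in a single process, whereas a real cluster would hand each chunk to a different server.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_input(lines: List[str], num_workers: int) -> List[List[str]]:
    """Partition the input into one chunk per (hypothetical) worker node."""
    chunks: List[List[str]] = [[] for _ in range(num_workers)]
    for i, line in enumerate(lines):
        chunks[i % num_workers].append(line)
    return chunks


def map_phase(chunk: Iterable[str]) -> List[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle step: group the emitted values by key across all workers."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: sum the values collected for each key."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines: List[str], num_workers: int = 3) -> Dict[str, int]:
    """Count words by running map on each chunk, then shuffling and reducing."""
    chunks = split_input(lines, num_workers)
    # On a real cluster these map calls would run in parallel on separate nodes.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))
```

Because each chunk is mapped independently, the result does not depend on how many workers the input is split across; that independence is what lets the same job scale from one machine to many.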
Amazon Web Services
For customers looking for a Hadoop platform hosted in the public cloud, Amazon Web Services, which Forrester calls the "king of the cloud," is the obvious destination, and such customers have little need to look at other products. Amazon's Hadoop product is known as Elastic MapReduce (EMR). According to AWS, the product is a big data management service built using Hadoop. It is not pure open source Hadoop, however; it has been tweaked and modified to run specifically in the AWS cloud environment.
According to Forrester, EMR is currently the most popular Hadoop platform in the market. It already has a large and broad group of partners offering various EMR-based services, including query, modeling, integration, and management. AWS is not slacking on innovation either: according to Forrester's survey, it wants to give EMR the ability to scale automatically according to workload requirements. The company plans to progressively strengthen EMR's support for its other products and services, including the Redshift data warehouse and the recently released Kinesis real-time processing engine. In addition, AWS intends to add support for other NoSQL databases and business intelligence tools. As for weaknesses, the biggest one is that AWS does not let users run its Hadoop distribution on premises. But don't worry: the next two companies are best at meeting exactly this kind of demand.
Cloudera
Cloudera has created a Hadoop distribution that borrows heavily from the work of the Apache project but also makes significant improvements to it. Cloudera has developed a substantial amount of functionality for its products, including a management and monitoring tool called Cloudera Manager and a SQL engine named Impala for working with traditional data on Hadoop. Cloudera uses open source Hadoop as the foundation of its distribution, but the end result is not a purely open source product. If Cloudera customers need features that open source Hadoop lacks, they can develop them in-house or find a partner that offers such a solution. "Cloudera's innovations stay closely centered on the core Hadoop technology stack, but it also innovates aggressively to meet customer needs quickly and to build solutions that set it apart from other vendors," Forrester explains. As a result, the popularity of the Cloudera platform has been rising steadily: it currently has more than 200 paying customers, some of which run deployments of more than 1,000 nodes holding over 1 PB of data, Forrester concluded.
Hortonworks
Like Cloudera, Hortonworks is a technology company that specializes in developing Hadoop products. Unlike Cloudera, Hortonworks is perhaps the most enthusiastic of all vendors about contributing to the open source Hadoop code. Its goal is to improve the open source code while building up the Hadoop ecosystem and user base, and its platform stays as close as possible to the open source code. Hortonworks management says this approach is good for users because it avoids vendor lock-in (if a Hortonworks customer wants or needs to switch to another platform, it can easily migrate existing applications onto the open source platform). This is not to say, of course, that Hortonworks has given up on innovating on top of the open source code. Instead, it invests all of its research and development effort in the platform's open source community. For example, Hortonworks developed a tool called Ambari to fill the project's gap in cluster management. Hortonworks's approach has helped it build a solid partner base within the vendor community, including Teradata, Microsoft, Red Hat, and SAP.
IBM
When enterprise users think of large IT projects, the first name that comes to mind is probably IBM. It is therefore no surprise that IBM has become one of the major players in the Hadoop field. Forrester points out that IBM's Hadoop offering is now deployed in the business environments of more than 100 customers, many of which process data at petabyte scale. The company draws on its extensive experience in grid computing, global data centers, and enterprise-level implementations to build its own big data projects. "The IBM roadmap includes further integration of the BigInsights Hadoop solution with related IBM assets, such as SPSS advanced analytics, high-performance computing workload management, business intelligence tools, and data management and modeling tools," Forrester explains.
Intel
Like Amazon Web Services, Intel has optimized its own version of Hadoop, in Intel's case to run on its own hardware architecture, especially Xeon chips. For customers who want to get the most out of a Hadoop system and find the best fit between software and hardware, Intel's Hadoop distribution is probably the ideal choice. Forrester noted that because Intel launched the product only recently, the company still hopes to innovate further, building on what comparable solutions in the market already offer. Intel and Microsoft both received a "strong performer" rating in the Hadoop market, while the other seven vendors were rated "market leaders."
MapR Technologies
Many readers may never have heard of this company, but MapR Technologies is arguably one of the best current vendors of Hadoop distributions. In the Forrester Wave report's evaluation, MapR received the highest score for its current offering, along with top marks in two individual categories: distribution architecture and data processing capabilities. The company's secret weapon is that MapR has added many unique features to its version of Hadoop. For example, the MapR distribution supports the Network File System (NFS), and it also provides disaster recovery and high-availability features. Forrester points out that MapR's biggest problem is simply that it has not built brand awareness comparable to that of Cloudera and Hortonworks. According to the consulting firm's recommendations, MapR is fully capable of becoming a mainstream Hadoop provider as long as it strengthens its partnerships and steps up its marketing efforts.
Microsoft
Historically, Microsoft has certainly not been a company eager to embrace open source software, but in the face of the surging tide of big data it has taken the leap, not only enabling Hadoop to run on Windows but also contributing code to the wider Hadoop ecosystem. That effort has borne fruit: the HDInsight service in Microsoft's public cloud, Windows Azure, is the best proof. It is a service based on the Hortonworks distribution but designed specifically for Azure.
Microsoft has also launched a number of exciting projects, including the PolyBase feature, now ready for use, which is designed to let users query data across SQL Server and Hadoop. "Microsoft has launched outstanding solutions in the database, data warehousing, cloud, OLAP, business intelligence, spreadsheet (PowerPivot), collaboration, and development tools markets, which enable it to bring a more powerful and fast-growing Hadoop product stack to Microsoft customers," Forrester says. Like Intel, Microsoft is listed as a "strong performer" rather than a "market leader."
Pivotal Software
Last year, EMC and VMware pooled some of their best assets and jointly created Pivotal, a young company formed largely through spin-offs and restructuring. One of Pivotal's main directions is developing a Hadoop distribution; the other is the Cloud Foundry PaaS. To make its product stand out, Pivotal has added a series of tools on top of the open source code, most notably a SQL engine named HAWQ and a Hadoop application for running this big data platform. Forrester says the leading advantage of the Pivotal Hadoop platform is that it can be integrated with the extensive product lines of Pivotal, EMC, and VMware. Strong technical support from EMC and VMware is also Pivotal's main backing in the market. For now, however, the company has fewer than 100 customers, most of which are small and medium-sized enterprises, Forrester notes.
Teradata
For companies like Teradata, Hadoop can be both a deadly threat and a valuable opportunity. The company has long specialized in data management, especially SQL and relational databases, so one would expect the rapid rise of NoSQL platforms such as Hadoop to be its biggest headache. On the contrary, Teradata has taken a positive attitude toward Hadoop. Through its partnership with Hortonworks, Teradata now lets customers integrate the Hadoop platform with its SQL products. Its plug-ins and off-the-shelf Hadoop platform also help existing Teradata customers work seamlessly with data stored in Teradata data warehouses.