Big data and Hadoop are steadily reshaping enterprise data management architecture. This is a gold rush in which pure-play Hadoop vendors, enterprise software vendors, and cloud service vendors each want to build a new empire on virgin land. The open-source Apache Hadoop project already contains a set of core modules, such as Hadoop Common, the Hadoop Distributed File System (HDFS), Hadoop YARN, and Hadoop MapReduce. Without commercial vendor support and packaged solutions, however, it is not something most customers can adopt directly as a product. How do today's leading commercial distributions stay compatible with Apache Hadoop, and how do they differentiate themselves from one another? Below, drawing on Forrester's analysis, we look at how nine commercial Hadoop distributions are setting themselves apart.
Amazon Web Services Elastic MapReduce: the largest market share
When you think of Hadoop, Amazon may not be the first vendor that comes to mind, but AWS's Elastic MapReduce (EMR) was one of the first commercial Hadoop products on the market. It also leads in global market share, said Forrester principal analyst Mike Gualtieri. EMR is a Hadoop distribution that runs in the cloud, using Amazon EC2 for compute and Amazon S3 for storage, and integrating with a number of other AWS services.
"The AWS roadmap includes integrating Amazon EMR with Amazon Kinesis for stream processing, deepening its integration with the Amazon Redshift data warehouse and other data sources, automating cluster sizing through policies, supporting additional NoSQL databases on top of Hadoop, and integrating with more business intelligence solutions from third-party vendors," Gualtieri wrote.
Cloudera: Hadoop innovation driven by enterprise customer needs
AWS may lead in market share, but Cloudera, a pure-play Hadoop vendor, is close behind. The company currently has more than 200 customers, some of which have deployed more than 1,000 nodes and manage data at petabyte scale.
"Enterprise customers wanted Hadoop management and monitoring tools, so Cloudera created Cloudera Manager," Gualtieri wrote. "Enterprise customers wanted a faster SQL engine for Hadoop, so Cloudera created Impala, which uses the same massively parallel processing (MPP) architecture as enterprise data warehouses. Cloudera's approach to innovation is to stay true to the core Hadoop project while differentiating itself from other vendors through rapid innovation and responsiveness to customer demand. Cloudera's revenue comes mainly from software subscriptions, but it also provides technical support services."
Hortonworks: driving open-source Hadoop innovation
Among the major vendors, Hortonworks, a pure-play Hadoop vendor, stays closest to Apache open source with its Hortonworks Data Platform (HDP). It is also actively pursuing deep engineering partnerships with companies including Microsoft, Teradata, SAP, and Red Hat.
"Hortonworks's strategy is to drive innovation through the open-source community and build an ecosystem with partners to accelerate Hadoop adoption among enterprise customers," Gualtieri wrote. "If the open-source community is not moving at the right pace in some area, Hortonworks will start new projects and commit its own resources to help them build strong momentum."
Apache Ambari, a project designed to provide a management console for Hadoop clusters, is a typical example.
IBM InfoSphere BigInsights: Big Blue supports enterprise projects
IBM does not have the depth of Hadoop community involvement that some of its rivals can boast, but its strong track record in distributed computing and data management has helped it assemble a fairly comprehensive Hadoop solution. IBM has completed more than 100 Hadoop deployments, some of which hold petabytes of data.
"In addition, IBM has advanced analytics tools, a global presence, and implementation services, which allow it to attract a large number of enterprise customers with a comprehensive big data solution," Gualtieri wrote. "IBM's roadmap includes continuing to integrate the BigInsights Hadoop solution with related IBM assets, such as SPSS advanced analytics, high-performance computing workload management, business intelligence tools, and data management and modeling tools."
MapR Technologies: NFS support and other innovations
MapR Technologies ranks third among the pure-play vendors, with market share behind Cloudera and Hortonworks. From the early days, MapR was less conservative than other vendors about how strictly a distribution should adhere to Apache Hadoop, and it focused instead on implementing enterprise-grade features.
"MapR Technologies has a number of unique innovations in its Hadoop distribution, including support for the Network File System (NFS), running binaries in the cluster, performance enhancements for HBase, high availability, and disaster recovery," Gualtieri wrote. Gualtieri also pointed out that MapR's competitors have begun building similar enterprise-grade features, so MapR must step up its marketing and build out its own partnerships and distribution channels.
Pivotal Software: making the most of its Greenplum engine
A new company standing on the shoulders of giants EMC and VMware, Pivotal is led by former VMware CEO Paul Maritz and backed by EMC's strong technical consulting and data science teams. In addition to the Greenplum database technology it inherited from EMC, Pivotal's Hadoop distribution delivers MPP-class SQL performance through an MPP Hadoop SQL engine named HAWQ.
"Pivotal is the first enterprise data warehouse vendor to offer a full-featured, enterprise-grade Hadoop appliance, and the first to integrate its own Hadoop distribution, enterprise data warehouse, and data management in a single rack as an appliance family," Gualtieri wrote. "Pivotal's roadmap aims to give its Hadoop solution a competitive lead; its innovation focus is on improving the HAWQ SQL engine and integrating it further with other Pivotal products."
Teradata: building Hadoop appliances on deep expertise
Teradata is a specialist vendor of enterprise data warehouse appliances. Building on that foundation, it has established a solid technical partnership with Hortonworks to bring Hadoop to market as an appliance.
"Teradata's Hadoop distribution includes integration with Teradata's management tools and SQL-H, a federated SQL engine that lets customers query data across their data warehouse and Hadoop," Gualtieri wrote. "The solution also uses Aster to perform analytics on Hadoop."
Teradata's Hadoop appliance currently has fewer than 100 customers, but Gualtieri points out that the company's financial strength and deep technical and management resources are enough to produce a distinctive high-performance appliance that other vendors will find hard to compete with head-on.
Intel: hardware-based performance and security enhancements for Hadoop
Intel entered the Hadoop distribution market relatively late, but that has not stopped it from becoming a strong competitor on the strength of its Xeon chips.
"Intel is the first vendor to deliver hardware-based performance and security enhancements for Hadoop," Gualtieri wrote. "Over the next few years, Intel's roadmap calls for building closer partnerships with other players in the Hadoop solutions market. In addition, Intel will continue to focus on hardware-enhanced performance and security, native task optimization, Lustre, and graph analytics, all of which should help its distribution win wider attention and appreciation."
Microsoft Windows Azure HDInsight: thriving on cloud and Windows
As a product of Microsoft's engineering partnership with Hortonworks, the Microsoft Windows Azure HDInsight service is designed for the Windows Azure cloud. HDInsight and Hadoop for Windows (a variant of the Hortonworks Data Platform) are currently the only Hadoop distributions that run in Windows environments.
"Microsoft also provides PolyBase to help SQL Server customers query data stored in Hadoop," Gualtieri wrote. "Microsoft has also contributed actively to other open-source Hadoop community projects, including the next generation of Hive. Through a series of Hadoop stack initiatives, Microsoft has brought significant improvements to its customers across databases, data warehousing, cloud, OLAP, business intelligence, spreadsheets (PowerPivot), Reid, and development tools."