1. Apache Hadoop 2.0 consists of the following modules: Hadoop Common, the common utilities that support the other Hadoop modules;
Hadoop Distributed File System (HDFS), a distributed file system that supports high-throughput access to application data;
Hadoop YARN, a framework for job scheduling and cluster resource management;
Hadoop MapReduce, a YARN-based system for parallel processing of large data sets.
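To make the MapReduce processing model more concrete, the following is a minimal single-process sketch of its map, shuffle, and reduce phases, using word counting as the example. It does not use any Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration, and a real Hadoop job would distribute these phases across the nodes of a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key before reduction."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order, in memory, on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop YARN", "hadoop HDFS"]))
```

In this sketch every phase runs in one process. In the model it illustrates, map and reduce tasks run in parallel on many machines, and the grouping step moves data between them.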
In addition to the community version, Hadoop is currently available in distributions from numerous vendors.
2. Cloudera: The most popular distribution, with the most deployments; it provides powerful deployment, management, and monitoring tools. Cloudera developed and contributed the Impala project, which can process big data in real time.
3. Hortonworks: The only provider offering a 100% open source Apache Hadoop distribution. Hortonworks was the first provider to use the metadata service features of Apache HCatalog. In addition, its Stinger initiative greatly optimized the Hive project. Hortonworks offers a convenient, easy-to-use sandbox. Hortonworks has also developed a number of enhancements and contributed them to the core trunk, which enable Apache Hadoop to run natively on Microsoft Windows platforms, including Windows Server and Windows Azure.
4. MapR: Compared with its competitors, MapR uses a number of different concepts, most notably support for a native UNIX file system instead of HDFS (using non-open-source components) for better performance and ease of use. Native UNIX commands can be used instead of Hadoop commands. In addition, MapR differentiates itself from its competitors with high-availability features such as snapshots, mirroring, and stateful failover. The company also leads the Apache Drill project, an open source re-implementation of Google's Dremel that runs SQL-like queries on Hadoop data to provide real-time processing.
5. Amazon Elastic MapReduce (EMR): Unlike the other providers, this is a hosted solution that runs on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3). In addition to Amazon's own distribution, MapR can also be used on EMR. The primary use case is temporary clusters: if you need one-time or infrequent big data processing, EMR can save you a lot of money. However, it also has disadvantages. By default it includes only the Pig and Hive projects from the Hadoop ecosystem and omits many others. Also, EMR is highly optimized to work with data in S3, which has higher latency and does not keep the data on your compute nodes. As a result, file I/O on EMR is much slower and has higher latency than on your own Hadoop cluster or a private EC2 cluster.