Commercial distributions mainly exist to provide more professional technical support, which matters most to large enterprises. Each distribution has its own characteristics, and this article gives a brief comparison of them. Distributions compared: Dkhadoop, Cloudera, Hortonworks, MapR, and Huawei Hadoop. Hadoop is a software framework that enables distributed processing of large amounts of
engines than leading commercial data warehousing applications. For open source projects, the best health metric is the size of the active developer community. As shown in Figure 3 below, Hive and Presto have the largest contributor bases. (Spark SQL data is not available.) In 2016, Cloudera, Hortonworks, Kognitio, and Teradata were caught up in the benchmark battle that Tony Baer summed up, and it was striking that each vendor's favored SQL engine defeated o
there) Source: Open Hub https://www.openhub.net/ In 2016, Cloudera, Hortonworks, Kognitio, and Teradata were caught up in the benchmark battle that Tony Baer summed up, and it was striking that each vendor's favored SQL engine defeated the other options in every study. This raises a question: does benchmarking make sense? AtScale's twice-yearly benchmark is not unfounded. As a BI startup, AtScale sells software that connects the BI front end and SQL ba
increased by nearly six times.
The usage of big data has gradually increased
The major big data analytics vendors entering the Magic Quadrant this time are Cloudera, Hortonworks, and MapR, and existing customers of these big data platform vendors say they will increase their investment.
For example, Gartner mentioned that most existing Cloudera customers intend to purchase more licenses, products, and functions in the next 12 months.
Incremental index updates became the new standard for text retrieval, and Spanner and F1 showed us the possibility of a cross-datacenter database. In Google's second wave of technology, building on Hive and Dremel, the emerging big data company Cloudera open-sourced the big data query and analysis engine Impala, Hortonworks open-sourced Stinger, and Facebook open-sourced Presto. Similarly to Pregel, UC Berkeley's AMPLab developed the Spark graph computing framework, an
most companies treat whether a distribution is charged for as an important indicator.
Currently, there are three major free Hadoop versions (all from foreign vendors): Apache (the original version; all other releases are improvements on it), Cloudera (Cloudera's Distribution Including Apache Hadoop, "CDH" for short), and Hortonworks (Hortonworks Data Platform, "HDP" for short). 2.2 Introduction to the Apache Hadoop release vers
In today's enterprises, 80% of data is unstructured, and it grows by 60% every year. Big data will challenge enterprises' storage architecture and data center infrastructure. It will also trigger a chain reaction in applications such as data warehousing, data mining, business intelligence, and cloud computing. In the future, enterprises will use more TB-level (1 TB = 1024 GB) data sets for business intelligence and business analysis. By 2020, global data usage is expected to surge by 4
Reference document: http://cloudera.github.io/hue/docs-3.7.0/index.html
System: CentOS release 6.5 (Final)
Install git (if you download hue-3.7.1.tgz directly, you don't need this step)
yum install -y git-core
My base environment and its configuration are as follows:
CentOS release 6.5 (Final)
java version "1.7.0_75"
apache-maven-3.2.5
git version 1.7.1
Python 2.6.6
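Before building Hue, it can help to confirm that a machine matches the environment listed above. The following is a minimal sketch, not part of the original instructions: the names `REQUIRED`, `parse_version`, and `meets_minimum` are hypothetical, and treating the listed versions as minimums rather than exact requirements is an assumption.

```python
import re

# Versions copied from the environment list above. Treating them as
# minimum versions is an assumption of this sketch.
REQUIRED = {
    "java": (1, 7, 0),
    "maven": (3, 2, 5),
    "git": (1, 7, 1),
    "python": (2, 6, 6),
}


def parse_version(banner):
    """Extract the first dotted version number from a version banner."""
    match = re.search(r"(\d+(?:\.\d+)+)", banner)
    if match is None:
        raise ValueError("no version number found in %r" % banner)
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(tool, banner):
    """Return True if the version found in `banner` is at least REQUIRED[tool]."""
    return parse_version(banner) >= REQUIRED[tool]


if __name__ == "__main__":
    # Banners as they appear in the list above.
    banners = {
        "java": 'java version "1.7.0_75"',
        "maven": "apache-maven-3.2.5",
        "git": "git version 1.7.1",
        "python": "Python 2.6.6",
    }
    for tool, banner in sorted(banners.items()):
        print(tool, parse_version(banner), meets_minimum(tool, banner))
```

In practice the banner strings would come from running each tool's version command; they are hard-coded here so the sketch stays self-contained.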
Download the "3.7.1 tarball" version from http://gethue.com/hue-3-7-with-sentry-app-and-new-search-widgets-are-out/
I. Introduction to the Hadoop release
There are many Hadoop distributions available, including the Intel, Huawei, Cloudera (CDH), and Hortonworks distributions, all of which are based on Apache Hadoop. So many versions exist because of Apache Hadoop's open source license: anyone can modify it and publish or sell it as an open source or commercial product. Currently, there are three main versions of Hadoop that ar
designed to efficiently transfer bulk data between Apache Hadoop and structured data stores such as relational databases.
Flume: A distributed, reliable, and available service for efficiently collecting, aggregating, and moving large volumes of log data.
ZooKeeper: A centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
Cloudera: The most mature distribution of Hadoop, with
From http://projects.apache.org/indexes/quick.html [Now, Future], updated 2015-02-06.
Apache Accumulo
The Apache Accumulo sorted, distributed key/value store is based on Google's BigTable design. It is built on top of Apache Hadoop, ZooKeeper, and Thrift. It features a few novel improvements on the BigTable design in the form of cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. Ca
This article mainly introduces the Hadoop family of products. Commonly used projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, Zookeeper, Avro, Ambari, and Chukwa; newer additions include YARN, HCatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, Hue, and others. Since 2011, China has entered the surging era of big data, and the family of software represented by Hadoop occupies a vast expanse of data processing. Open source industry and vendors, all da
Original address: http://blog.fens.me/hadoop-family-roadmap/
Sep 6, Tags: hadoop, hadoop family, roadmap
Hadoop Family Learning Roadmap
The Hadoop family series of articles mainly introduces the Hadoop family of products. Commonly used projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, Zookeeper, Avro, Ambari, and Chukwa; newer additions include YARN, HCatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, Hue, and others. Since 20
tasks. The Pig language layer currently contains a native language, Pig Latin, which was originally designed to be easy to program and to ensure scalability.
7. Chukwa
Apache Chukwa is an open source data collection system for monitoring large distributed systems. Built on the HDFS and Map/Reduce frameworks, it inherits the scalability and stability of Hadoop. Chukwa also includes a flexible and powerful toolkit for displaying, monitoring, and analyzing results to make the best use of the collected data.
8.
Read Catalogue
Cause
Virtual machines
Linux
System Installation
Series Index
The copyright of this article is shared by Mephisto and Blog Park. Reprinting is welcome, but this statement must be retained and a link to the original must be given. Thank you for your cooperation.
The article is written by Mephisto. Source link
Cause
We have a rudimentary understanding of Hadoop and its NameNode and DataNode. The NameNode and DataNode can be on one machine, but that doesn't wor
hive installation, and the other files in the list need to be copied from hbase/lib. Note: if the version of a file that comes with the Hive installation is inconsistent with the one in hbase/lib, delete the file under hive/lib and copy it from hbase/lib.)
guava-14.0.1.jar
zookeeper-3.4.6.2.4.2.0-258.jar
htrace-core-3.1.0-incubating.jar
hbase-common-1.1.2.2.4.2.0-258.jar
hbase-common-1.1.2.2.4.2.0-258-tests.jar
hbase-client-1.1.2.2.4.2.0-258.jar
hbase-server-1.1.2.2.4.2.0-258.jar
hbase-protocol-1.1.2.2.4.2.0-258.jar
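The jar replacement described in the note above can be sketched as a small script. This is an illustration under stated assumptions, not the original author's procedure: the function name `sync_jars`, the prefix list, and any directory paths passed to it are hypothetical, and matching jars by file-name prefix is a simplification.

```python
import os
import shutil


def sync_jars(hbase_lib, hive_lib, prefixes):
    """Copy jars matching `prefixes` from hbase_lib into hive_lib.

    A jar already in hive_lib that has one of the prefixes but a different
    file name (a different version) is deleted first, but only when
    hbase_lib actually provides a jar with that prefix.
    Returns the sorted list of copied file names.
    """
    prefixes = tuple(prefixes)

    def matching(directory):
        return sorted(
            name for name in os.listdir(directory)
            if name.endswith(".jar") and name.startswith(prefixes)
        )

    wanted = matching(hbase_lib)
    # Prefixes for which hbase_lib really has a replacement jar.
    covered = tuple(
        p for p in prefixes if any(name.startswith(p) for name in wanted)
    )

    # Remove mismatched versions from hive_lib.
    for name in matching(hive_lib):
        if name not in wanted and name.startswith(covered):
            os.remove(os.path.join(hive_lib, name))

    # Copy the HBase versions over.
    for name in wanted:
        shutil.copy2(os.path.join(hbase_lib, name), os.path.join(hive_lib, name))
    return wanted
```

A call might pass prefixes such as "guava-", "zookeeper-", "htrace-core-", "hbase-common-", "hbase-client-", "hbase-server-", and "hbase-protocol-", taken from the file names in the list above. Backing up hive/lib before running anything like this is advisable.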
currently contains a native language, Pig Latin, which was originally developed to be easy to program and to ensure scalability.
7. Chukwa
Apache Chukwa is an open source data collection system for monitoring large distributed systems. Built on the HDFS and Map/Reduce frameworks, it inherits the scalability and stability of Hadoop. Chukwa also includes a flexible and powerful toolkit for displaying, monitoring, and analyzing results to make the best use of the collected data.
8.
-normalization and materialized views, and powerful built-in caches, the Cassandra data model provides convenient secondary indexes (column indexes).
Chukwa:
Apache Chukwa is an open source data collection system for monitoring large distributed systems. Built on the HDFS and Map/Reduce frameworks, it inherits the scalability and stability of Hadoop. Chukwa also includes a flexible and powerful toolkit for displaying, monitoring, and analyzing results to make the best use of the collected data.