Talking about the Hadoop ecosystem

Source: Internet
Author: User
Keywords: Data Warehouse, Hadoop, MapReduce, ZooKeeper

Big data took off in 2014, and more and more companies are discovering its uses, not only for managing daily business processes but also for solving complex business problems. Big data quickly became a buzzword and established itself as a reliable technology capable of solving problems for businesses large and small.

Big data, as the name suggests, is the huge amount of data that exists all around us. It is generated by a range of sources such as smart devices, the Internet, social media, chat rooms, mobile apps, phone calls, and commodity purchases. Big data technology is the technology used to collect, store, and analyze data at this scale, often measured in petabytes.

Big data technologies radically change the way people look at data and database storage, and upend how data is used. In the military, big data can be used to defend against foreign invaders. In the NBA, big data technologies can capture and analyze thousands of individual player movements. In medicine, big data is used to fight cancer and heart disease. Car companies use big data technology to enable autonomous driving and vehicle communication.

Big data is changing the world. So what software system supports all of this behind the scenes? How did big data technology become so popular so quickly, and how does it keep up?

The answer is Hadoop.

Many people think Hadoop is big data. Not really. Big data existed before Hadoop and can continue without it. But at the moment Hadoop is a powerful partner for big data, and the two are closely related. Because of this, many people use Hadoop, and it is now hard to find a big data company that doesn't use Hadoop software. So what is Hadoop?

Hadoop is a "software library" that allows users to harness clusters of computers to process large datasets with a simple programming model. In other words, it gives an enterprise the ability to collect, store, and analyze large datasets.
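That "simple programming model" is MapReduce: a map phase emits key-value pairs, the framework groups them by key, and a reduce phase combines each group. The sketch below illustrates the model in plain Python with the classic word-count example; it is a conceptual illustration only, with no Hadoop involved. On a real cluster, Hadoop would run the map and reduce functions in parallel across many machines.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the grouped values for each key into a final result."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data with hadoop", "hadoop stores big data"]
result = reduce_phase(shuffle(map_phase(docs)))
print(result)  # {'big': 2, 'data': 2, 'with': 1, 'hadoop': 2, 'stores': 1}
```

The user writes only the map and reduce functions; the framework handles splitting the input, shuffling intermediate pairs, and scheduling the work across the cluster.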

In addition, an important aspect of understanding Hadoop is that it is a software library. Hadoop contains a large number of libraries that complement the underlying Hadoop framework, giving organizations the right tools to get the Hadoop results they want.

Next, let's take a look at the ecosystem of Hadoop. For more information, see the Hadoop website.

The Hadoop project includes several core components: Hadoop Common, the Hadoop Distributed File System (HDFS), Hadoop YARN, and Hadoop MapReduce. These components work together to support the additional tools of the Hadoop ecosystem, giving users the ability to process large datasets while Hadoop automatically schedules tasks and manages cluster resources.

Some Hadoop ecosystem components are listed below; each provides a specific service.

Apache Hive: A data warehouse infrastructure that provides data aggregation and ad hoc queries. It enables users to run effective queries against data in Hadoop and get results back quickly.

Apache Spark: Apache Spark is a computing engine that provides fast data analysis on large datasets. It is built on top of HDFS, but bypasses MapReduce in favor of its own data-processing framework. Spark is often used for real-time queries, stream processing, iterative algorithms, complex operations, and machine learning.
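Part of what makes Spark fast for iterative work is its style of lazy, chained transformations: a pipeline of map and filter steps is only described, and nothing is computed until a result is actually requested. The sketch below illustrates that idea in plain Python using generators; it is not the PySpark API, and the `Pipeline` class here is purely hypothetical.

```python
# Conceptual illustration of Spark-style deferred transformation chaining.
# NOT the real PySpark API; a hypothetical Pipeline class for illustration.

class Pipeline:
    def __init__(self, data):
        self._data = data          # an iterable; evaluation is deferred

    def map(self, fn):
        return Pipeline(fn(x) for x in self._data)

    def filter(self, pred):
        return Pipeline(x for x in self._data if pred(x))

    def collect(self):
        return list(self._data)    # only here does any work happen

events = Pipeline(range(10))
result = (events
          .filter(lambda x: x % 2 == 0)   # keep even numbers
          .map(lambda x: x * x)           # square them
          .collect())                     # triggers the whole chain
print(result)  # [0, 4, 16, 36, 64]
```

Real Spark adds what this sketch omits: the transformations run in parallel across a cluster, and intermediate results can be cached in memory, which is why iterative algorithms that revisit the same data repeatedly run much faster than they would as a chain of MapReduce jobs.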

Apache Ambari: Ambari is used to help manage Hadoop. It supports many of the tools in the Hadoop ecosystem, including Hive, HBase, Pig, Sqoop, and ZooKeeper. The tool provides a cluster management dashboard that tracks cluster status and helps diagnose performance issues.

Apache Pig: Pig is a platform that provides a high-level query language for processing large datasets.

Apache HBase: HBase is a non-relational database management system that runs on top of HDFS. It is used to handle sparse datasets in big data projects.
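"Sparse" here means that most rows fill in only a few of the table's many possible columns. A column-oriented store like HBase handles this cheaply because each row stores only the cells that actually have values. The sketch below illustrates the idea with plain Python dictionaries; the `put`/`get` helpers are hypothetical stand-ins, not the HBase API, and real HBase additionally groups columns into column families and versions each cell by timestamp.

```python
# Illustration of sparse, column-oriented storage (not the HBase API).
# Each row key maps to a dict holding only the columns that were written;
# absent cells cost nothing, no matter how many columns the table allows.

table = {}  # row key -> {"family:qualifier": value}

def put(row, column, value):
    """Write one cell; creates the row on first write."""
    table.setdefault(row, {})[column] = value

def get(row, column, default=None):
    """Read one cell, returning default if it was never stored."""
    return table.get(row, {}).get(column, default)

put("user1", "info:name", "Alice")
put("user1", "stats:logins", 42)
put("user2", "info:name", "Bob")   # user2 has no stats:logins cell at all

print(get("user1", "stats:logins"))      # 42
print(get("user2", "stats:logins", 0))   # 0: the cell is simply absent
```

In a relational table the `stats:logins` column would exist (as NULL) for every row; in this layout a cell that was never written occupies no space at all, which is what makes very wide, mostly empty tables practical.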

Other common Hadoop projects include Avro, Cassandra, Chukwa, Mahout, and ZooKeeper.

With Hadoop, users can draw on many tools and resources to adapt real big data technologies to different business needs.
