Basics: An Introduction to Hadoop Components and the Hadoop Ecosystem

Source: Internet
Author: User
Keywords: big data, ecosystem

Big data has grown steadily since 2014. More and more companies are using it, for everything from day-to-day transaction management to complex business solutions. For companies large and small, big data has quickly shifted from an overhyped buzzword to a viable technology.

Big data, in short, is the mass of data generated all around us: by smart devices, web apps, social media, chat rooms, mobile apps, call records, payment histories, and every other activity that involves data. Big data technology consolidates, stores, and analyzes this information, typically at petabyte (PB) scale.

This technology has completely changed how people view and use data and databases. In the military, big data is used primarily to prevent harm. In the NBA, it is used mainly to capture and analyze the millions of individual movements in a game. The healthcare industry uses it in treating cancer and heart disease. Car makers use it to develop vehicles that drive themselves and communicate with one another.

Big data is changing the world. But what software lies behind it? What makes big data technology evolve and run?

The answer is Hadoop.

Many people think Hadoop is big data, but it is not. Big data existed before Hadoop appeared, and it would continue to exist without Hadoop. Even so, Hadoop now dominates the big data industry. The two are conflated for a simple reason: so many people use Hadoop. It is hard to find a big data company that does not use it.

What exactly is Hadoop?

It is a "software library" that lets users "process large volumes of data on a cluster of computers using a simple programming model." In other words, it gives a company the ability to aggregate, store, and analyze large amounts of data.
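The "simple programming model" mentioned above is, in Hadoop's case, MapReduce. The following is a minimal sketch of that model in plain Python, using a word count as the example. It runs in a single process and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. On a real cluster, the framework would distribute the map and reduce steps across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key (done by the framework on a cluster)."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "hadoop processes big data"]
    print(word_count(sample))
```

The user writes only the map and reduce functions; grouping, distribution, and fault handling are left to the framework, which is what makes the model simple to program against.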

It also matters that Hadoop is a "software library." A large collection of additional libraries complements the basic Hadoop framework, giving companies the specific tools they need to get the results they want from Hadoop.

Let's take a look at the Hadoop ecosystem. For related resources and more information, see the Hadoop website.

The Hadoop project itself consists of several modules: Hadoop Common (the general-purpose libraries), the Hadoop Distributed File System (HDFS), Hadoop YARN, and Hadoop MapReduce. Together, these modules underpin the other Hadoop projects described below and provide automated job scheduling and cluster resource management for big data processing.

Other projects offer specific services that improve the entire Hadoop ecosystem.

Apache Hive: "A data warehouse that provides data aggregation and ad hoc queries." Hive is a powerful query tool; depending on the execution engine, it can return results in near real time.
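To make "data aggregation and ad hoc queries" concrete, here is a small sketch of an aggregation query. It deliberately uses Python's built-in `sqlite3` module as a local stand-in, not Hive itself; Hive's query language is SQL-like, so a `GROUP BY` query of this general shape is the kind of thing it is used for. The `sales` table, its columns, and the `summarize_sales` function are hypothetical and exist only for this illustration.

```python
import sqlite3


def summarize_sales(rows):
    """Aggregate (region, amount) rows with a SQL GROUP BY query.

    SQLite is used here only as a stand-in for a data warehouse.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) AS total "
            "FROM sales GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    data = [("east", 10.0), ("west", 5.0), ("east", 2.5)]
    for region, total in summarize_sales(data):
        print(region, total)
```

The difference in practice is scale: a warehouse such as Hive would run a comparable query over data stored across a cluster rather than in one in-memory database.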

Apache Spark: Spark is a general-purpose computing engine for fast analysis of large-scale data. It can run on top of HDFS but does not rely on MapReduce; it uses its own data-processing framework. Typical use cases include real-time queries, event stream processing, iterative computation, complex operations, and machine learning.

Apache Ambari: Ambari is designed to help manage Hadoop. It supports many tools in the Hadoop ecosystem, including Hive, HBase, Pig, Sqoop, and ZooKeeper. It features a dashboard for tracking cluster health and helping diagnose performance problems.

Apache Pig: Pig is a platform whose query language, Pig Latin, is designed for processing large datasets.

Apache HBase: HBase is a non-relational database management system that runs on top of HDFS. It was built to handle sparse datasets in big data projects.
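A "sparse dataset" is one where most rows have values for only a few of the possible columns. The sketch below is a toy in-memory model of that idea in plain Python, assuming a layout of row key to column to value in which missing cells are simply never stored. It is not the HBase client API; the `SparseTable` class and the `family:qualifier` column names are hypothetical and used only for illustration.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory model of a sparse table: only cells that exist are stored."""

    def __init__(self):
        # row key -> {"family:qualifier": value}; absent cells take no space
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Store one cell value under the given row key and column."""
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        """Return one cell value, or the default if the cell was never written."""
        return self._rows.get(row_key, {}).get(column, default)

    def stored_cells(self):
        """Count the cells actually stored across all rows."""
        return sum(len(columns) for columns in self._rows.values())


if __name__ == "__main__":
    table = SparseTable()
    table.put("user1", "info:name", "Ann")
    table.put("user2", "info:email", "bob@example.com")
    print(table.get("user1", "info:name"))
    print(table.get("user1", "info:email"))
    print(table.stored_cells())
```

A relational table with fixed columns would reserve a slot (often a NULL) for every column in every row; a layout like this one pays only for the cells that are present, which is why it suits sparse data.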

Other common Hadoop projects include Avro, Cassandra, Chukwa, Mahout, and ZooKeeper.

By building on Hadoop, users gain access to a large number of tools and resources for tailoring a big data platform to their business needs.

TechTarget China original content, all rights reserved. Published by China Big Data with authorization; reprinting by other websites is not permitted.
