Some big data tools: notes on terms

Source: Internet
Author: User

I often see these terms mentioned together, so I'm summarizing them here to come back to later.

All descriptions are taken from the official Apache docs.

1/Apache Kafka

What is Kafka?

Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.

Simply put, it is a log-based messaging system. It reminds me of RabbitMQ, which is also a messaging system.

So, it is worth looking up how the two differ.

TL;DR: Reference: http://www.quora.com/what-are-the-differences-between-apache-kafka-and-rabbitmq

Also, Kafka depends on ZooKeeper (newer Kafka releases can instead run without it in KRaft mode).
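To make "distributed, partitioned, replicated commit log" a bit more concrete, here is a minimal single-process sketch of just the partitioned, append-only part. The `PartitionedLog` class and its methods are hypothetical and are not Kafka's client API; brokers, replication and persistence are left out, and CRC32 is an arbitrary stand-in for the hash Kafka's own partitioner uses.

```python
import zlib


class PartitionedLog:
    """Hypothetical, single-process model of a partitioned commit log.

    Not Kafka's API: no brokers, replication, persistence, or networking.
    """

    def __init__(self, num_partitions: int):
        # Each partition is an independent, ordered, append-only list.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str):
        # Hashing the key picks the partition, so records with the same key
        # always go to the same partition and keep their relative order.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition: int, offset: int, max_records: int = 10):
        # Reading never removes records; each consumer remembers its own
        # offset, so many consumers can read the same log independently.
        return self.partitions[partition][offset:offset + max_records]


log = PartitionedLog(num_partitions=3)
for i in range(6):
    p, offset = log.append("user-42", f"click-{i}")
print(offset)                  # 5
print(log.read(p, offset=4))   # [('user-42', 'click-4'), ('user-42', 'click-5')]
```

The point the sketch tries to show: ordering is only guaranteed within one partition, and consumers read by offset instead of deleting messages, which is a commonly cited difference from a traditional message queue.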


2/Apache ZooKeeper

What is ZooKeeper?

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented, a lot of work goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.

What is its aim?

ZooKeeper aims at distilling the essence of these different services into a very simple interface to a centralized coordination service. The service itself is distributed and highly reliable. Consensus, group management, and presence protocols will be implemented by the service so that applications do not need to implement them on their own. Application-specific uses of these will consist of a mixture of specific components of ZooKeeper and application-specific conventions. ZooKeeper Recipes shows how this simple service can be used to build much more powerful abstractions.
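The "group services" part can be illustrated with a toy, in-memory model of ZooKeeper's tree of nodes (znodes). This is a hypothetical sketch, not the ZooKeeper client API: there is no replication, no watches and no real sessions. It only mimics two node flavors ZooKeeper offers: ephemeral nodes, which vanish when their creator's session ends, and sequential nodes, which get an increasing, zero-padded suffix handed out by their parent.

```python
import itertools


class ToyZNodeTree:
    """Hypothetical in-memory model of a ZooKeeper-style namespace.

    Not the ZooKeeper client API: no replication, watches, or real sessions.
    """

    def __init__(self):
        self.nodes = {"/": b""}   # path -> data
        self.owner = {}           # ephemeral path -> owning session id
        self.seq = {}             # parent path -> counter for sequential children

    def create(self, path, data=b"", session=None, sequential=False):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent!r} does not exist")
        if sequential:
            # The parent hands out increasing, zero-padded suffixes.
            counter = self.seq.setdefault(parent, itertools.count())
            path = f"{path}{next(counter):010d}"
        if path in self.nodes:
            raise KeyError(f"{path!r} already exists")
        self.nodes[path] = data
        if session is not None:   # passing a session makes the node ephemeral
            self.owner[path] = session
        return path

    def children(self, path):
        # Direct children only, returned as names relative to `path`.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p.startswith(prefix) and p != path and "/" not in p[len(prefix):]
        )

    def close_session(self, session):
        # Ephemeral nodes are deleted when the session that created them ends.
        for p, s in list(self.owner.items()):
            if s == session:
                del self.nodes[p]
                del self.owner[p]


zk = ToyZNodeTree()
zk.create("/workers")
a = zk.create("/workers/w-", b"host-a", session="s1", sequential=True)
b = zk.create("/workers/w-", b"host-b", session="s2", sequential=True)
print(zk.children("/workers"))   # ['w-0000000000', 'w-0000000001']
zk.close_session("s1")           # host-a "crashes"
print(zk.children("/workers"))   # ['w-0000000001']
```

Group membership falls out of ephemeral nodes: when a member's session ends, its node disappears. The ZooKeeper recipes build on the same primitives; for example, the leader-election recipe has each candidate create an ephemeral sequential node and treats the one with the lowest sequence number as leader.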

3/Apache Storm

What is Storm?

Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

Where to use it?

Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
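Storm structures a computation as a topology of spouts (sources of tuples) and bolts (steps that consume tuples and emit new ones). The sketch below imitates that shape in a single Python process using generators; the function names are hypothetical, and none of Storm's real API, parallelism or delivery guarantees are modeled. A short finite list stands in for the unbounded stream.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def sentence_spout(lines: Iterable[str]) -> Iterator[Tuple[str]]:
    # A "spout" emits tuples from a source; here a finite list stands in
    # for an unbounded stream.
    for line in lines:
        yield (line,)


def split_bolt(stream: Iterator[Tuple[str]]) -> Iterator[Tuple[str]]:
    # A "bolt" consumes tuples and emits new ones: one tuple per word.
    for (line,) in stream:
        for word in line.lower().split():
            yield (word,)


def count_bolt(stream: Iterator[Tuple[str]]) -> Iterator[Tuple[str, int]]:
    # Keeps running state and emits an updated count after every tuple,
    # instead of waiting for the input to end as a batch job would.
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


source = ["the cow jumped", "over the moon"]
for word, count in count_bolt(split_bolt(sentence_spout(source))):
    print(word, count)
```

Each count is emitted as soon as its tuple arrives rather than after all input has been read; that incremental output is the realtime-versus-batch contrast drawn above.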

4/Apache Spark

What is Spark?

    Apache Spark™ is a fast and general engine for large-scale data processing.

5/Apache Hive

What is Hive?

The Apache Hive™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time, this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.

So, Hive is queried with a SQL-like language. IBM also has a page on it: http://www-01.ibm.com/software/data/infosphere/hadoop/hive/ (their docs are always good).
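"Projecting structure onto data" means the files stay as plain text and a table schema is applied only when the data is read. Here is a minimal Python sketch of that idea, with a hypothetical three-column sample standing in for files in distributed storage. The query in the comment is the kind of thing one would write in HiveQL against a hypothetical `visits_log` table; the function computes the same result by hand and is not how Hive executes it.

```python
import csv
import io
from collections import Counter

# Raw, untyped text as it might sit in distributed storage (hypothetical sample).
RAW = "alice,US,3\nbob,DE,5\ncarol,US,2\n"

# The "table definition": column names and types projected onto the raw text.
SCHEMA = [("user", str), ("country", str), ("visits", int)]


def read_table(raw: str, schema):
    # Structure is applied when the data is read, not when it is written.
    for row in csv.reader(io.StringIO(raw)):
        yield {name: typ(value) for (name, typ), value in zip(schema, row)}


def visits_by_country(rows):
    # Same result as:
    #   SELECT country, SUM(visits) FROM visits_log GROUP BY country;
    totals = Counter()
    for row in rows:
        totals[row["country"]] += row["visits"]
    return dict(totals)


print(visits_by_country(read_table(RAW, SCHEMA)))  # {'US': 5, 'DE': 5}
```

Because the schema lives outside the data, the same raw files could be read with a different schema later without rewriting them.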

6/Apache Pig

What is Pig?

    Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
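Why a Pig program parallelizes well: each step operates on a whole data set, so the work can be split by partition and the partial results combined. The Python sketch below mimics a small Pig Latin script (shown only as a comment, reading a hypothetical `logs` input) by counting per partition and then merging. It runs sequentially, but `count_partition` never looks at another partition, so each call could in principle run on a different machine.

```python
from collections import Counter
from functools import reduce

# Pig Latin-style dataflow being imitated (illustrative, not executed here):
#   logs    = LOAD 'logs' AS (user, url);
#   grouped = GROUP logs BY user;
#   counts  = FOREACH grouped GENERATE group, COUNT(logs);

# Hypothetical input, already split into partitions as a cluster would store it.
PARTITIONS = [
    [("alice", "/a"), ("bob", "/b")],
    [("alice", "/c")],
    [("carol", "/a"), ("bob", "/d")],
]


def count_partition(records):
    # Works on one partition only: no shared state with other partitions.
    return Counter(user for user, _url in records)


def merge(a: Counter, b: Counter) -> Counter:
    # Adding partial counts is associative, so merge order does not matter.
    return a + b


partials = map(count_partition, PARTITIONS)
counts = reduce(merge, partials, Counter())
print(dict(counts))  # {'alice': 2, 'bob': 2, 'carol': 1}
```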

Conclusion:

1. Most messaging systems (Kafka and RabbitMQ included) are based on the producer-consumer pattern.

2. Pig and Hive both offer high-level languages for working with large data sets, similar in spirit to SQL.
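A minimal Python illustration of the producer-consumer pattern, using the standard library's thread-safe `queue.Queue`: one thread produces items, another consumes them, and the bounded queue between them makes a fast producer wait for a slow consumer. Real messaging systems put a networked service where this in-process queue sits; the `None` end-of-stream sentinel is just a convention chosen for this sketch.

```python
import queue
import threading

SENTINEL = None  # signals "no more items" (a convention for this sketch)


def producer(q: queue.Queue, items) -> None:
    for item in items:
        q.put(item)      # blocks while the bounded queue is full
    q.put(SENTINEL)


def consumer(q: queue.Queue, out: list) -> None:
    while True:
        item = q.get()   # blocks until an item is available
        if item is SENTINEL:
            break
        out.append(item * 2)


q = queue.Queue(maxsize=2)   # small bound so the producer has to wait
results = []
t_prod = threading.Thread(target=producer, args=(q, range(5)))
t_cons = threading.Thread(target=consumer, args=(q, results))
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()
print(results)  # [0, 2, 4, 6, 8]
```

With a single consumer and a FIFO queue, items are processed in the order they were produced; adding more consumer threads would increase throughput but give up that ordering.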
