I often see these words appearing together, so I am summarizing them today to look at again later.
All descriptions are taken from the official Apache docs.
1/Apache Kafka
What is Kafka?
Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.
Put simply, it is a log-based messaging system. It reminds me of RabbitMQ, which is also a messaging system.
So I looked up the differences between the two.
TL;DR: Reference: http://www.quora.com/what-are-the-differences-between-apache-kafka-and-rabbitmq
Also, Kafka depends on ZooKeeper.
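To make "partitioned commit log" a bit more concrete, here is a toy in-memory sketch in Python. It is not Kafka's API. The class and method names are made up, there is no network, and replication is left out entirely. It only shows three ideas: records with the same key go to the same partition, each partition is append-only, and consumers read from an offset without removing anything.

```python
import zlib
from typing import List, Tuple


class ToyTopic:
    """Hypothetical in-memory model of a partitioned, append-only commit log (no replication)."""

    def __init__(self, num_partitions: int) -> None:
        # Each partition is its own append-only list of (key, value) records.
        self.partitions: List[List[Tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Stable hash, so the same key always maps to the same partition.
        # crc32 is an arbitrary choice for this sketch, not Kafka's partitioner.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record to the end of its partition; return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, offset: int) -> List[Tuple[str, str]]:
        """Return records from `offset` onward. Reading never removes anything from the log."""
        return self.partitions[partition][offset:]


if __name__ == "__main__":
    topic = ToyTopic(num_partitions=3)
    p, _ = topic.append("user-42", "login")
    topic.append("user-42", "logout")
    # Both records share a key, so they sit in the same partition, in order.
    print(topic.read(p, 0))  # [('user-42', 'login'), ('user-42', 'logout')]
```

Because reading does not delete, several consumers can read the same partition independently, each tracking its own offset. This is one of the ways a log differs from a classic queue.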
2/Apache ZooKeeper
What is ZooKeeper?
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented, a lot of work goes into fixing the bugs and race conditions that are inevitable. Because these services are difficult to implement, applications usually skimp on them at first, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.
What is its aim?
ZooKeeper aims at distilling the essence of these different services into a very simple interface to a centralized coordination service. The service itself is distributed and highly reliable. Consensus, group management, and presence protocols are implemented by the service, so applications do not need to implement them on their own. Application-specific uses of these consist of a mixture of specific components of ZooKeeper and application-specific conventions. ZooKeeper Recipes shows how this simple service can be used to build much more powerful abstractions.
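The "recipes" idea can be sketched with a toy in-memory tree of nodes (ZooKeeper calls them znodes). This is a hypothetical Python model, not the ZooKeeper client API. It has no sessions, watches, or ephemeral nodes. It only illustrates the leader-election recipe: every candidate creates a sequential child under one parent, and the child with the lowest sequence number is the leader. The 10-digit zero-padded suffix mimics the one ZooKeeper adds to sequential znodes. In real ZooKeeper these children would be ephemeral, so a crashed leader's node disappears on its own. Below, that is simulated with an explicit delete.

```python
from typing import Dict, List


class ToyZNodeTree:
    """Hypothetical in-memory model of a ZooKeeper-style namespace (no sessions, watches, or ephemerals)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, bytes] = {"/": b""}
        self.seq: Dict[str, int] = {}  # next sequence number per parent path

    @staticmethod
    def _parent(path: str) -> str:
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path: str, data: bytes = b"", sequential: bool = False) -> str:
        parent = self._parent(path)
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if sequential:
            # Append a 10-digit, zero-padded counter kept per parent.
            n = self.seq.get(parent, 0)
            self.seq[parent] = n + 1
            path = f"{path}{n:010d}"
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = data
        return path

    def children(self, parent: str) -> List[str]:
        """Names of the direct children of `parent`, sorted."""
        return sorted(
            p.rsplit("/", 1)[1] for p in self.nodes if p != "/" and self._parent(p) == parent
        )

    def delete(self, path: str) -> None:
        if self.children(path):
            raise KeyError(f"{path} has children")
        del self.nodes[path]


def elect_leader(tree: ToyZNodeTree, election_path: str) -> str:
    """Leader-election recipe: the candidate with the lowest sequence number wins."""
    candidates = tree.children(election_path)
    return f"{election_path}/{candidates[0]}"


if __name__ == "__main__":
    zk = ToyZNodeTree()
    zk.create("/election")
    a = zk.create("/election/n_", b"host-a", sequential=True)
    zk.create("/election/n_", b"host-b", sequential=True)
    print(elect_leader(zk, "/election"))  # /election/n_0000000000 (host-a)
    zk.delete(a)  # stands in for host-a's session ending
    print(elect_leader(zk, "/election"))  # /election/n_0000000001 (host-b)
```

Each application then only has to follow the convention ("lowest number leads"). It does not need to implement consensus itself, which is the point the paragraph above makes.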
3/Apache Storm
What is Storm?
Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
Where to use it?
Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable and fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
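What "processing an unbounded stream" means can be shown with a tiny single-process Python pipeline. This is not Storm's API. Storm wires spouts and bolts into a topology that runs across a cluster, and none of that is shown here, and the function names below are made up. A never-ending source feeds a splitting step, which feeds a stateful counting step. That step emits an updated result for every tuple as it arrives, instead of waiting for the input to end as a batch job would.

```python
import itertools
from collections import Counter
from typing import Iterable, Iterator, Tuple


def sentence_source() -> Iterator[str]:
    """Stand-in for an unbounded input stream (the role a Storm spout plays). Never ends."""
    return itertools.cycle(["the cow jumped", "the moon"])


def split_words(sentences: Iterable[str]) -> Iterator[str]:
    """First processing step: split each sentence into words."""
    for sentence in sentences:
        yield from sentence.split()


def running_counts(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Second step: keep per-word state and emit an updated count for every word seen."""
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


if __name__ == "__main__":
    stream = running_counts(split_words(sentence_source()))
    # The stream is infinite, so only look at the first few results.
    for word, count in itertools.islice(stream, 6):
        print(word, count)
    # the 1 / cow 1 / jumped 1 / the 2 / moon 1 / the 3
```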
4/Apache Spark
What is Spark?
Apache Spark™ is a fast and general engine for large-scale data processing.
5/Apache Hive
What is Hive?
The Apache Hive™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time, this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
So, it is a SQL-like language. IBM also has a page on it: http://www-01.ibm.com/software/data/infosphere/hadoop/hive/ (their docs are always good).
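The "plug in custom mappers" part refers to Hive's TRANSFORM ... USING clause. Hive streams each row to an external script's stdin as one tab-separated line, then reads tab-separated lines back from the script's stdout as output rows. Below is a minimal mapper sketch in Python. It assumes a hypothetical table whose first column holds free text, and it emits one (word, 1) row per word. The file and function names are made up.

```python
import sys
from typing import Iterable, Iterator


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Turn each input row into one tab-separated (word, 1) output row per word."""
    for line in lines:
        # Each row arrives as one line; columns are separated by tabs. Use the first one.
        text = line.rstrip("\n").split("\t")[0]
        for word in text.lower().split():
            yield f"{word}\t1"


if __name__ == "__main__":
    # Hive writes rows to stdin and reads our stdout back as new rows.
    for out in map_lines(sys.stdin):
        print(out)
```

Saved as, say, mapper.py, it could be wired in with roughly `ADD FILE mapper.py;` and then `SELECT TRANSFORM(line) USING 'python3 mapper.py' AS (word, cnt) FROM docs;`. Here `docs` and `line` are placeholder names. A GROUP BY in HiveQL, or a reducer script, would then sum the counts.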
6/Apache Pig
What is Pig?
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
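Why this kind of program parallelizes well can be shown with a group-and-count. That is roughly what a Pig script like `words = LOAD 'input' AS (word:chararray); grouped = GROUP words BY word; counts = FOREACH grouped GENERATE group, COUNT(words);` computes. Each chunk of the input can be counted independently, and the partial counts can then be merged into the same answer as one big sequential pass. Pig's infrastructure can exploit this kind of structure when it turns a script into parallel jobs. The Python below is only an analogy. The function names are made up, and it runs on threads in one process rather than on a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence


def count_partition(words: Sequence[str]) -> Counter:
    """Count one chunk on its own; no chunk needs to see any other."""
    return Counter(words)


def parallel_group_count(words: Sequence[str], num_parts: int = 4) -> Counter:
    """Split the input, count the chunks in parallel, then merge the partial counts."""
    parts = [words[i::num_parts] for i in range(num_parts)]
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        partials = list(pool.map(count_partition, parts))
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # merging is just adding counts per key
    return total


if __name__ == "__main__":
    data = ["pig", "hive", "pig", "storm", "pig", "hive"]
    print(parallel_group_count(data, num_parts=3))
    # Counter({'pig': 3, 'hive': 2, 'storm': 1})
```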
Conclusion:
1. Most messaging systems are based on the producer-consumer pattern.
2. Pig and Hive are alike: both give you a high-level, SQL-like way to query large data sets instead of writing raw map/reduce code.
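Here is a minimal producer-consumer sketch using Python's standard, thread-safe `queue.Queue`. A producer thread puts items, a consumer thread takes them, and a sentinel object tells the consumer to stop. Because the queue is bounded (`maxsize`), the producer blocks whenever the consumer falls behind. Systems like Kafka or RabbitMQ do this across processes and machines. This sketch only shows the pattern, and the function names are made up.

```python
import queue
import threading
from typing import List

_STOP = object()  # sentinel telling the consumer that nothing more is coming


def producer(q: queue.Queue, items: List[int]) -> None:
    for item in items:
        q.put(item)  # blocks while the bounded queue is full
    q.put(_STOP)


def consumer(q: queue.Queue, results: List[int]) -> None:
    while True:
        item = q.get()  # blocks until something is available
        if item is _STOP:
            break
        results.append(item * 2)  # stand-in for real work


def run(items: List[int], maxsize: int = 2) -> List[int]:
    q: queue.Queue = queue.Queue(maxsize=maxsize)
    results: List[int] = []
    workers = [
        threading.Thread(target=consumer, args=(q, results)),
        threading.Thread(target=producer, args=(q, items)),
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return results


if __name__ == "__main__":
    print(run([1, 2, 3, 4, 5]))  # [2, 4, 6, 8, 10]
```

With a single consumer, items come out in the order they went in. With several consumers, you would need one sentinel per consumer, and ordering across consumers is no longer guaranteed.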
These are some big data tools; this post is just a record of their names.