一些大資料工具，名詞的記錄

最後更新：2015-08-31 來源：互聯網

上載者：User

創建阿里雲帳戶，並獲得超過 40 款產品的免費試用版；而企業帳戶則可以享有總值 $1200 的免費試用版。立即註冊！

標籤：

經常看到一些詞一起出現，今天總結下。日後再看

All from Apache Offical Docs

1/apache kafka

what is kafka?

　　　　kafka is a distributed, partipationed, replicated commit log service,. It provides the functionlity of a messaging system, but with a unique design.

　Simply, it is a log messaging system. It reminds of RabbitMQ which also a message system.

　　So, google its differences.

　　TL;DR; Reference: http://www.quora.com/What-are-the-differences-between-Apache-Kafka-and-RabbitMQ

　　And, kafka is dependent on zookeeper.

2/apache zookeeper

　　what is zookeeper?

　　　　ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them ,which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.

　　what‘s his aim?

　　　　ZooKeeper aims at distilling the essence of these different services into a very simple interface to a centralized coordination service. The service itself is distributed and highly reliable. Consensus, group management, and presence protocols will be implemented by the service so that the applications do not need to implement them on their own. Application specific uses of these will consist of a mixture of specific components of Zoo Keeper and application specific conventions. ZooKeeper Recipes shows how this simple service can be used to build much more powerful abstractions.

3/apache storm

　　what is storm?

　　　　Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

　　where to use it?

　　　　Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.

4/apache spark

　　what is spark?

　　　　Apache Spark™ is a fast and general engine for large-scale data processing.

5/apache hive

　　what is hive?

　　　　The Apache Hive ™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.

　　　　So, it is a sql-like language. Find it on IBM: http://www-01.ibm.com/software/data/infosphere/hadoop/hive/ Their docs are always good.

6/apache pig

　　what is pig?

　　　　Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turns enables them to handle very large data sets.　　　　

Conclusion:

　　1. most of messaging system based on producer-consumer pattern.　

　　2.pig and hive are like language, sql-language.

一些大資料工具，名詞的記錄

本文章原先以中文撰寫並發佈於 aliyun.com，亦設英文版本，僅作資訊用途。本網站不對文章的準確性，完整性或可靠性或其任何翻譯作出任何明示或暗示的陳述或保證。如對該文章有任何疑慮或投訴，請傳送電郵至 info-contact@alibabacloud.com 並提供相關疑慮或投訴的詳細說明。職員會於 5 個工作天內與您聯絡，一經驗證之後，即會刪除該侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

一些大資料工具，名詞的記錄

聯繫我們

熱門內容

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

一些大資料工具，名詞的記錄

聯繫我們

熱門內容

熱門主題

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support