Notes on some big data tools and terms


I often see these terms show up together, so today I'm summarizing them to look back on later.

All from the official Apache docs.

 

1/apache kafka

    what is kafka?

    Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.

 Simply put, it is a log-based messaging system. It reminds me of RabbitMQ, which is also a messaging system.

  So, I googled the differences.

  TL;DR reference: http://www.quora.com/What-are-the-differences-between-Apache-Kafka-and-RabbitMQ

  Also, Kafka depends on ZooKeeper (newer Kafka releases can instead run without it, in KRaft mode).
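To make "partitioned commit log" a bit more concrete, here is a toy in-memory sketch in Python. It is not Kafka's API: the class `ToyCommitLog`, its methods, and the group names are all made up for illustration, and replication is left out entirely. It only shows the idea that records are appended to a partition chosen by key, and that each consumer group keeps its own read offset, so reading does not delete anything.

```python
import zlib
from collections import defaultdict


class ToyCommitLog:
    """In-memory stand-in for a partitioned, append-only log (hypothetical, not Kafka's API)."""

    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]
        # Each consumer group tracks its own read offset per partition.
        self.offsets = defaultdict(lambda: [0] * num_partitions)

    def append(self, key, value):
        """Append a record; records with the same key land in the same partition."""
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def poll(self, group, partition):
        """Return the records this group has not read yet and advance its offset."""
        start = self.offsets[group][partition]
        records = self.partitions[partition][start:]
        self.offsets[group][partition] = len(self.partitions[partition])
        return records


log = ToyCommitLog(num_partitions=2)
p, _ = log.append("user-1", "login")
log.append("user-1", "logout")

print(log.poll("analytics", p))  # both records, in append order
print(log.poll("analytics", p))  # nothing new for this group
print(log.poll("billing", p))    # a different group still sees both records
```

The last call is the part that differs from a classic queue: the log keeps the records, and each reader just remembers how far it has read.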

 

2/apache zookeeper

  what is zookeeper?

    ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.

  what's its aim?

    ZooKeeper aims at distilling the essence of these different services into a very simple interface to a centralized coordination service. The service itself is distributed and highly reliable. Consensus, group management, and presence protocols will be implemented by the service so that the applications do not need to implement them on their own. Application specific uses of these will consist of a mixture of specific components of ZooKeeper and application specific conventions. ZooKeeper Recipes shows how this simple service can be used to build much more powerful abstractions.
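As a rough picture of what "configuration information", "group management" and "presence" could look like behind one simple interface, here is a toy single-process sketch in Python. It is not the ZooKeeper client API: `ToyCoordinator`, its method names, the paths, and the session ids are all invented for this note. The one real ZooKeeper idea it borrows is the ephemeral node, which disappears when the session that created it ends.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service (hypothetical, not ZooKeeper's API)."""

    def __init__(self):
        self.nodes = {}  # path -> (data, owning session id or None)

    def create(self, path, data=b"", session=None):
        """Create a node; passing a session id makes the node ephemeral."""
        if path in self.nodes:
            raise KeyError(f"node already exists: {path}")
        self.nodes[path] = (data, session)

    def get(self, path):
        """Return the data stored at a path."""
        return self.nodes[path][0]

    def children(self, parent):
        """List the names of the direct children of a path."""
        prefix = parent.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def close_session(self, session):
        """Drop every ephemeral node owned by the session, ending its presence."""
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session}


zk = ToyCoordinator()
zk.create("/config/db_url", b"db.example.internal:5432")  # shared configuration
zk.create("/workers/w1", session="s1")                    # group membership
zk.create("/workers/w2", session="s2")

zk.close_session("s1")        # worker 1 disconnects or crashes
print(zk.children("/workers"))  # only the live worker remains
print(zk.get("/config/db_url"))
```

The real service does this across several machines with consensus underneath, which is exactly the hard part the docs say applications should not reimplement.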

 

3/apache storm

  what is storm?

    Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

  where to use it?

    Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
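A small Python sketch of what "processing an unbounded stream of tuples" means, using plain generators instead of Storm itself. Storm calls stream sources "spouts" and processing steps "bolts"; those terms are not in the notes above, and the function names and sample sentences here are made up. Nothing in this sketch is distributed or fault-tolerant, it only shows the shape of the computation: data keeps arriving, and results are emitted continuously rather than at the end of a batch.

```python
import itertools
from collections import Counter


def sentence_spout():
    """Source of an unbounded stream: emits sentences forever."""
    for sentence in itertools.cycle(["the cow jumped", "the moon rose"]):
        yield sentence


def split_bolt(sentences):
    """Turn each incoming sentence into one item per word."""
    for sentence in sentences:
        for word in sentence.split():
            yield word


def count_bolt(words):
    """Emit a running (word, count) pair after every word seen."""
    counts = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


# Wire the steps together: spout -> split -> count.
stream = count_bolt(split_bolt(sentence_spout()))

# The stream never ends, so only look at the first few results.
first_results = list(itertools.islice(stream, 7))
for word, count in first_results:
    print(word, count)
```

Compare this with a batch job, which would read all the input first and print the final counts once.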

4/apache spark

  what is spark?

    Apache Spark™ is a fast and general engine for large-scale data processing.

5/apache hive

  what is hive?

    The Apache Hive ™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.

    So, HiveQL is a SQL-like language. See also IBM's page: http://www-01.ibm.com/software/data/infosphere/hadoop/hive/ (their docs are always good).
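To see why a SQL-like language is convenient compared with writing mappers and reducers by hand, here is a small Python comparison. It does not use Hive or HiveQL: Python's built-in `sqlite3` stands in for "a SQL-like query engine", and the `events` table, its columns, and the sample rows are all hypothetical. Both halves compute the same per-user event count.

```python
import sqlite3
from collections import defaultdict

rows = [("alice", "click"), ("bob", "view"), ("alice", "view"), ("alice", "click")]

# SQL-like version: describe the result you want.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, action TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
sql_counts = dict(conn.execute("SELECT name, COUNT(*) FROM events GROUP BY name"))
conn.close()


# Hand-written map/reduce version: describe how to compute it.
def mapper(row):
    """Emit (name, 1) for every event row."""
    name, _action = row
    yield name, 1


def reducer(name, values):
    """Sum all the 1s collected for a name."""
    return name, sum(values)


grouped = defaultdict(list)
for row in rows:
    for key, value in mapper(row):
        grouped[key].append(value)  # the "shuffle" step: group values by key
mr_counts = dict(reducer(k, v) for k, v in grouped.items())

print(sql_counts)
print(mr_counts)
```

The query is one line, while the map/reduce version spells out every step; per the Hive docs quoted above, the custom mapper/reducer route is still available when the query language is not a good fit.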

6/apache pig

  what is pig?

    Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.

 

 

Conclusion:

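  1. Most messaging systems are based on the producer-consumer pattern.

  2. Pig and Hive are both language-based tools: Hive offers the SQL-like HiveQL, and Pig offers its own high-level language for data analysis programs.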
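For point 1, a minimal sketch of the producer-consumer pattern using only Python's standard library. This is a single-process illustration, not how Kafka or RabbitMQ are implemented; the message names and the `STOP` sentinel are my own choices. One thread puts messages on a shared queue and another takes them off, so the two sides never call each other directly.

```python
import queue
import threading

messages = queue.Queue()  # the "broker" sitting between both sides
received = []
STOP = object()           # sentinel telling the consumer to finish


def producer(n):
    """Put n messages on the queue, then signal the end."""
    for i in range(n):
        messages.put(f"event-{i}")
    messages.put(STOP)


def consumer():
    """Take messages off the queue until the sentinel arrives."""
    while True:
        item = messages.get()  # blocks until a message is available
        if item is STOP:
            break
        received.append(item)


threads = [
    threading.Thread(target=producer, args=(3,)),
    threading.Thread(target=consumer),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(received)
```

The producer does not need to know who consumes or how fast; the queue absorbs the difference, which is the property messaging systems build on.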

  

 

 

  
