Learn about the Twitter Storm architecture, and batch and streaming solutions
Hadoop, the undisputed king of big data analytics, focuses on batch processing. This model is sufficient for many scenarios, such as indexing the web, but other usage models require real-time information from highly dynamic sources. Solving this problem led to Storm, introduced by Nathan Marz (now at Twitter by way of BackType). Storm does not operate on static data; it processes streaming data that is expected to be continuous. Given that Twitter users generate 140 million tweets a day, it's easy to see how useful this technology is.
But Storm is more than a traditional big data analytics system: It is an example of a complex event processing (CEP) system. CEP systems are usually classified as computation-oriented or detection-oriented, and either kind can be implemented in Storm through user-defined algorithms. For example, CEP can be used to identify meaningful events in a flood of events and then act on them in real time.
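As a rough illustration of a detection-oriented rule, the sketch below watches a stream of (timestamp, event type) pairs and raises an alert when one event type occurs a threshold number of times within a sliding time window. The function name, the event format, and the "login_failed" example are hypothetical; in a Storm topology, logic like this would live inside a user-defined bolt (Storm's term for a processing step) rather than a standalone function.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

# Hypothetical event format: (timestamp in seconds, event type).
Event = Tuple[float, str]


def detect_bursts(events: Iterable[Event], kind: str, threshold: int,
                  window: float) -> Iterator[float]:
    """Yield the timestamp at which `threshold` events of `kind` have been
    seen within the last `window` seconds (a simple detection-oriented rule)."""
    recent: Deque[float] = deque()  # timestamps of matching events in the window
    for ts, etype in events:
        if etype != kind:
            continue  # ignore events the rule does not care about
        recent.append(ts)
        # Drop timestamps that have slid out of the time window.
        while recent and ts - recent[0] > window:
            recent.popleft()
        if len(recent) >= threshold:
            yield ts        # the "meaningful event" detected in the stream
            recent.clear()  # reset so one burst produces one alert


if __name__ == "__main__":
    stream = [(0.0, "login_failed"), (1.0, "login_ok"), (2.0, "login_failed"),
              (2.5, "login_failed"), (10.0, "login_failed")]
    print(list(detect_bursts(stream, "login_failed", 3, 5.0)))
```

Because the function is a generator, it never needs the whole input up front, which matches the continuous-stream setting Storm targets.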
Nathan Marz offers a number of examples of how Storm is used at Twitter. One of the most interesting is the generation of trend information. Twitter extracts emerging trends from the torrent of tweets and maintains them at both the local and national levels. This means that when a story begins to emerge, Twitter's trending-topics algorithm identifies the topic in real time. This real-time algorithm is implemented in Storm as a continuous analysis of Twitter data.
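Twitter's actual trend algorithm is not described here, so the following is only a minimal sketch of one common approach, assuming trends can be approximated by hashtag counts over a rolling window of time slots. The class `RollingTopicCounter` and its methods are hypothetical names chosen for illustration.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class RollingTopicCounter:
    """Count hashtags over the most recent `num_slots` time slots and report
    the top N. A simplified, hypothetical stand-in for a trend-counting step."""

    def __init__(self, num_slots: int) -> None:
        # deque(maxlen=...) discards the oldest slot automatically on append.
        self.slots: Deque[Counter] = deque([Counter()], maxlen=num_slots)

    def observe(self, tweet: str) -> None:
        # Count every hashtag in the tweet in the current (newest) slot.
        for word in tweet.split():
            if word.startswith("#") and len(word) > 1:
                self.slots[-1][word.lower()] += 1

    def advance(self) -> None:
        # Start a new time slot; counts older than the window age out.
        self.slots.append(Counter())

    def top(self, n: int) -> List[Tuple[str, int]]:
        # Sum the counts across all slots still inside the window.
        total: Counter = Counter()
        for slot in self.slots:
            total.update(slot)
        return total.most_common(n)
```

Calling `advance()` on a timer (for example, once a minute) makes old counts age out after `num_slots` slots, which is what lets a newly emerging topic overtake older ones.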
Storm and traditional big data
Storm differs from other big data solutions in its processing model. Hadoop is fundamentally a batch system: data is introduced into the Hadoop Distributed File System (HDFS) and distributed across nodes for processing. When processing is complete, the resulting data is returned to HDFS for use by the originator. Storm instead supports the construction of topologies that transform unterminated streams of data. Unlike Hadoop jobs, these transformations never stop; they continue to process data as it arrives.
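The difference between the two models can be sketched in a few lines. The example below is a plain-Python illustration, not Hadoop or Storm code: the batch function needs its whole (finite) input before it returns a single result, while the streaming function consumes a source that may never end and emits an updated count for every word as it arrives. The function names and the `endless_source` feed are hypothetical.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def batch_word_count(lines: List[str]) -> Counter:
    """Batch style: the whole input exists up front; one result at the end."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def streaming_word_count(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Streaming style: consume a possibly unbounded source and emit an
    updated count per word; the loop never has to terminate."""
    counts: Counter = Counter()
    for line in lines:
        for word in line.split():
            counts[word] += 1
            yield word, counts[word]


def endless_source() -> Iterator[str]:
    # Hypothetical stand-in for a never-ending feed such as a tweet stream.
    while True:
        yield "storm processes streams"


if __name__ == "__main__":
    print(batch_word_count(["a b", "b"]))
    # Only take the first few results; the stream itself is infinite.
    print(list(islice(streaming_word_count(endless_source()), 4)))
```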
Big data implementations
The core of Hadoop is written in the Java™ language, but it supports data analytics applications written in a variety of languages. More recent implementations have taken a more esoteric route to exploit modern languages and their features. For example, Spark from the University of California (UC) at Berkeley is implemented in Scala, and Twitter Storm is implemented in Clojure (pronounced "closure").
Clojure is a modern dialect of Lisp. Like Lisp, Clojure supports a functional programming style, but Clojure also introduces features that simplify multithreaded programming (a useful property for building Storm). Clojure is a virtual machine (VM)-based language that runs on the Java Virtual Machine. Although Storm was developed in Clojure, you can write Storm applications in almost any language. All that is required is an adapter that connects to Storm's architecture. Adapters already exist for Scala, JRuby, Perl, and PHP, and there is a Structured Query Language adapter that supports streaming into a Storm topology.
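To give a feel for what such an adapter does, here is a minimal sketch modeled on Storm's multilang protocol, in which non-JVM components exchange JSON messages with Storm over stdin/stdout and each message is terminated by a line containing only `end`. The field names follow our reading of the multilang documentation and should be checked against the official spec; the sketch omits the initial handshake and heartbeat handling, and the uppercasing component and function names are hypothetical.

```python
import io
import json
from typing import Optional, TextIO


def read_message(stream: TextIO) -> Optional[dict]:
    """Read one JSON message terminated by a line containing only 'end'."""
    lines = []
    for line in stream:
        line = line.rstrip("\n")
        if line == "end":
            return json.loads("\n".join(lines))
        lines.append(line)
    return None  # input closed before a complete message arrived


def write_message(stream: TextIO, msg: dict) -> None:
    """Write one JSON message followed by the 'end' terminator line."""
    stream.write(json.dumps(msg) + "\nend\n")
    stream.flush()


def process_one_tuple(inp: TextIO, out: TextIO) -> None:
    """Hypothetical component logic: uppercase the first field of an incoming
    tuple, emit it anchored to the input, then ack the input tuple."""
    msg = read_message(inp)
    if msg is None:
        return
    word = msg["tuple"][0]
    write_message(out, {"command": "emit", "anchors": [msg["id"]],
                        "tuple": [word.upper()]})
    write_message(out, {"command": "ack", "id": msg["id"]})


if __name__ == "__main__":
    # In-memory streams stand in for the stdin/stdout pipes Storm would use.
    inp = io.StringIO('{"id": "1", "comp": "words", "stream": "default", '
                      '"task": 3, "tuple": ["storm"]}\nend\n')
    out = io.StringIO()
    process_one_tuple(inp, out)
    print(out.getvalue())
```

Using `TextIO` parameters rather than hard-coding `sys.stdin` and `sys.stdout` keeps the framing logic testable with in-memory streams.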
Key attributes of Storm
Several characteristics of Storm's implementation determine its performance and reliability. Storm uses ZeroMQ for message passing, which removes intermediate queuing and allows messages to flow directly between the tasks themselves. Underneath the messaging is an automated and efficient mechanism for serializing and deserializing Storm's primitive types.
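The idea behind automatic serialization of primitive types can be shown with a small type-tagged binary format. This is not Storm's wire format (Storm's own serializer is based on Kryo); it is a hypothetical encoding that writes a field count, then a one-byte type tag and a fixed- or length-prefixed payload for each value, so a receiving task can rebuild the tuple without any schema negotiation.

```python
import struct
from typing import List, Union

Value = Union[int, float, str]

# One-byte type tags for the supported primitive types (hypothetical format).
_INT, _FLOAT, _STR = 0, 1, 2


def serialize(values: List[Value]) -> bytes:
    """Encode a tuple of primitive values as big-endian, type-tagged bytes."""
    out = bytearray(struct.pack(">I", len(values)))  # field count
    for v in values:
        if isinstance(v, bool):  # bool is a subclass of int; reject it explicitly
            raise TypeError("bool is not supported in this sketch")
        if isinstance(v, int):
            out += struct.pack(">Bq", _INT, v)  # tag + signed 64-bit integer
        elif isinstance(v, float):
            out += struct.pack(">Bd", _FLOAT, v)  # tag + 64-bit double
        elif isinstance(v, str):
            data = v.encode("utf-8")
            out += struct.pack(">BI", _STR, len(data)) + data  # tag + length + bytes
        else:
            raise TypeError(f"unsupported type: {type(v).__name__}")
    return bytes(out)


def deserialize(buf: bytes) -> List[Value]:
    """Decode bytes produced by serialize() back into a list of values."""
    (count,) = struct.unpack_from(">I", buf, 0)
    offset = 4
    values: List[Value] = []
    for _ in range(count):
        (tag,) = struct.unpack_from(">B", buf, offset)
        offset += 1
        if tag == _INT:
            (i,) = struct.unpack_from(">q", buf, offset)
            values.append(i)
            offset += 8
        elif tag == _FLOAT:
            (f,) = struct.unpack_from(">d", buf, offset)
            values.append(f)
            offset += 8
        elif tag == _STR:
            (length,) = struct.unpack_from(">I", buf, offset)
            offset += 4
            values.append(buf[offset:offset + length].decode("utf-8"))
            offset += length
        else:
            raise ValueError(f"unknown type tag {tag}")
    return values
```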
What is most interesting about Storm is its focus on fault tolerance and management. Storm implements guaranteed message processing so that each tuple is fully processed through the topology; if a tuple is found not to have been processed, it is automatically replayed from the spout. Storm also implements task-level fault detection: when a task fails, its messages are automatically reassigned so that processing restarts quickly. Storm includes more intelligent process management than Hadoop; processes are managed by supervisors to ensure that resources are used adequately.
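Storm's documentation describes guaranteed processing in terms of "acker" tasks that track the tree of tuples derived from each spout tuple (the spout being the source of a stream) using an XOR of random 64-bit tuple ids. The sketch below is a single-process model of that bookkeeping, assuming no timeouts or network; the class and method names are hypothetical. Each id is XORed into the root's value once when the tuple is created and once when it is acked, so the value returns to zero exactly when every tuple in the tree has been acked (barring a vanishingly unlikely collision of random ids).

```python
import random
from typing import Dict, List, Sequence


class AckTracker:
    """Hypothetical model of XOR-based tuple-tree tracking."""

    def __init__(self) -> None:
        # spout message id -> XOR of ids of tuples created but not yet acked
        self.pending: Dict[int, int] = {}

    @staticmethod
    def new_id() -> int:
        return random.getrandbits(64)

    def spout_emit(self, root: int) -> int:
        """Register a new spout tuple and return the id of its root tuple."""
        tid = self.new_id()
        self.pending[root] = tid
        return tid

    def bolt_ack(self, root: int, tid: int, children: Sequence[int] = ()) -> bool:
        """Ack tuple `tid` and anchor its `children` in one XOR update, so the
        value cannot hit zero between the ack and the new anchors.
        Returns True when the whole tree rooted at `root` is complete."""
        update = tid
        for child in children:
            update ^= child
        self.pending[root] ^= update
        if self.pending[root] == 0:
            del self.pending[root]
            return True
        return False

    def replay_candidates(self) -> List[int]:
        """Roots whose trees are still incomplete; after a timeout, these are
        the spout tuples that would be replayed."""
        return list(self.pending)
```

Combining the ack of an input tuple with the anchoring of its outputs in a single update mirrors the design choice that makes the XOR scheme safe: tracking a tree of any size costs a constant amount of memory per spout tuple.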