Jstorm is a distributed real-time computing engine

Source: Internet
Author: User
Tags: hadoop, mapreduce

Jstorm (alibaba/jstorm) is a distributed real-time computing engine.

Jstorm is a system similar to Hadoop MapReduce: the user implements a task against a specified interface and submits it to Jstorm, which starts the task and keeps it running around the clock (7 * 24 hours). If a worker fails unexpectedly along the way, the scheduler immediately assigns a new worker to replace the failed one.
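The replacement behavior described above can be pictured with a toy model. This is only a sketch in plain Python, not Jstorm code: `ToyScheduler`, `on_worker_failed`, and the integer worker ids are hypothetical names chosen for illustration, and a real scheduler also has to detect the failure and place workers on machines.

```python
from typing import Dict, List


class ToyScheduler:
    """Keeps every task assigned to a live worker (illustrative model only)."""

    def __init__(self, tasks: List[str]) -> None:
        self._next_worker_id = 0
        # task name -> id of the worker currently running it
        self.assignment: Dict[str, int] = {}
        for task in tasks:
            self.assignment[task] = self._start_worker()

    def _start_worker(self) -> int:
        """Pretend to launch a worker process and return its id."""
        worker_id = self._next_worker_id
        self._next_worker_id += 1
        return worker_id

    def on_worker_failed(self, failed_worker: int) -> List[str]:
        """Move every task of the failed worker to one fresh worker."""
        moved = [t for t, w in self.assignment.items() if w == failed_worker]
        if moved:
            replacement = self._start_worker()
            for task in moved:
                self.assignment[task] = replacement
        return moved
```

Calling `on_worker_failed(1)` on a scheduler built from `["spout", "bolt"]` reassigns only the task that was running on worker 1; the other task keeps its worker.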

Therefore, from an application perspective, a Jstorm application is a distributed application that follows a certain programming specification. From a system perspective, Jstorm is a scheduling system similar to MapReduce. From a data perspective, it is a pipeline-based message processing mechanism.
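To make the "pipeline-based message processing" view concrete, here is a minimal sketch in plain Python. It does not use the Jstorm API: `spout`, `split_bolt`, `upper_bolt`, and `run_pipeline` are hypothetical stand-ins for a message source and two processing stages, and everything runs in one process instead of being distributed across workers.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical stand-ins for the roles in the article; not Jstorm types.
Message = dict
Stage = Callable[[Message], Iterable[Message]]


def spout(lines: Iterable[str]) -> Iterator[Message]:
    """Source stage: turn raw input lines into messages."""
    for line in lines:
        yield {"line": line}


def split_bolt(msg: Message) -> Iterable[Message]:
    """Processing stage: emit one message per word in the line."""
    for word in msg["line"].split():
        yield {"word": word}


def upper_bolt(msg: Message) -> Iterable[Message]:
    """Processing stage: normalise the word to upper case."""
    yield {"word": msg["word"].upper()}


def _apply(stage: Stage, stream: Iterable[Message]) -> Iterator[Message]:
    """Feed every message of the stream through one stage."""
    for msg in stream:
        yield from stage(msg)


def run_pipeline(source: Iterable[Message], stages: List[Stage]) -> List[Message]:
    """Push every message from the source through the stages in order."""
    stream: Iterable[Message] = source
    for stage in stages:
        stream = _apply(stage, stream)
    return list(stream)
```

For example, `run_pipeline(spout(["hello jstorm"]), [split_bolt, upper_bolt])` yields one upper-cased message per word. Each stage sees only the message it is handed, which is the property the pipeline view relies on.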

Real-time computing is now one of the hottest directions in big data: demands on data keep growing, results are needed ever closer to real time, and traditional Hadoop MapReduce increasingly cannot meet those needs, so demand in this area keeps rising.

Advantages

Before Storm and Jstorm appeared, there were many real-time computing engines on the market, but since then Storm and Jstorm have come to dominate the field. Their advantages are:

    • Rapid development: the interface is simple and easy to use. As long as you follow the topology, spout, and bolt programming specifications, you can build a highly scalable application without having to think about the underlying RPC, redundancy between workers, data distribution, and similar concerns.
    • Excellent scalability: when one stage of processing becomes the bottleneck, you can simply raise its configured parallelism to scale performance linearly.
    • Robustness: when a worker or a machine fails, new workers are automatically assigned to replace the failed ones.
    • Data accuracy: the Acker mechanism can be used to ensure that data is not lost. If stricter accuracy is required, the transaction mechanism can be used to guarantee exact results.
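The Acker mechanism mentioned in the last point can be illustrated with a small model. Apache Storm's documentation describes its acker as keeping, for each source message, a single value that is the XOR of the ids of all tuples emitted and acknowledged for it; the value returns to zero once every tuple has been acknowledged. The sketch below assumes Jstorm's Acker follows the same idea. It is plain Python, not Jstorm code, and `AckTracker` and its methods are hypothetical names. Real ids are random 64-bit numbers; small integers are used here for readability.

```python
from typing import Dict


class AckTracker:
    """Toy model of XOR-based ack tracking (not the Jstorm implementation)."""

    def __init__(self) -> None:
        # root id -> XOR of every tuple id emitted or acked so far
        self._pending: Dict[int, int] = {}

    def emit(self, root_id: int, tuple_id: int) -> None:
        """Record that a tuple in root_id's tree was emitted."""
        self._pending[root_id] = self._pending.get(root_id, 0) ^ tuple_id

    def ack(self, root_id: int, tuple_id: int) -> bool:
        """Record an ack; return True once the whole tree is processed."""
        value = self._pending[root_id] ^ tuple_id
        if value == 0:
            # Every emitted id has been XOR-ed in twice, so nothing is pending.
            del self._pending[root_id]
            return True
        self._pending[root_id] = value
        return False

    def is_pending(self, root_id: int) -> bool:
        """True while some tuple of the tree has not been acked yet."""
        return root_id in self._pending
```

In this model a stage must register the tuples it emits before acknowledging its input; otherwise the value could reach zero while work is still outstanding. A source message that stays pending past a timeout would then be replayed, which is how loss is avoided.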
Application Scenarios

Jstorm processes data as a pipeline of messages, so it is particularly well suited to stateless computing, where all the data a computing unit depends on can be found in the message it receives, and ideally one data stream does not depend on another.

Therefore, it is often used for:

    • Log analysis: extract specific data from logs and store the results in external storage such as a database. Mainstream log analysis currently uses Jstorm or Storm.
    • Data pipelines: transfer data from one system to another, for example synchronizing a database to Hadoop.
    • Message conversion: convert received messages into a given format and store them in another system, such as message middleware.
    • Statistical analysis: extract a field from logs or messages, compute a count or sum, and store the resulting statistic in external storage. The intermediate processing can of course be more complex.
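The statistical-analysis case above can be sketched as follows. This is plain Python, not Jstorm code: the log layout (`METHOD PATH STATUS BYTES`), `parse_line`, and `aggregate` are assumptions made for the example, and the returned dictionary stands in for the external storage a real job would write to.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def parse_line(line: str) -> Optional[Tuple[str, int]]:
    """Extract (path, bytes) from a line such as 'GET /index.html 200 512'.

    The four-field layout is an assumption made for this example.
    """
    parts = line.split()
    if len(parts) != 4 or not parts[3].isdigit():
        return None  # skip malformed lines instead of failing the stream
    return parts[1], int(parts[3])


def aggregate(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Compute a request count and a byte sum per path."""
    stats: Dict[str, Dict[str, int]] = defaultdict(lambda: {"count": 0, "sum": 0})
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        path, size = parsed
        stats[path]["count"] += 1
        stats[path]["sum"] += size
    return dict(stats)
```

The extraction step is stateless, since each line carries everything `parse_line` needs. Only the running totals hold state, which is why the article has them stored externally.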
