LinkedIn Open-Sources Samza, a Real-Time Data Processing System


LinkedIn recently open-sourced Samza, a distributed stream processing framework dedicated to real-time data processing and very similar to Storm, Twitter's stream processing system. The difference is that Samza is built on Hadoop and uses Kafka, LinkedIn's own distributed messaging system.

Storm and Samza are very similar. As LinkedIn's Chris Riccomini wrote in a blog post: "[Samza] can help you build applications that process message queues: updating databases, computing counts and other aggregations, transforming messages, and so on." These are all classic Storm use cases that could simply be moved to Samza, and the Samza documentation includes a comparison of the two systems.
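To make the kind of application described in that quote more concrete, here is a minimal sketch in plain Python. It is not the Samza or Storm API; the class name `PageViewCounter`, the `process` method, and the message fields are all hypothetical and chosen only for illustration. The task receives messages one at a time, keeps a running count (an aggregation), and emits a transformed message.

```python
from collections import Counter


class PageViewCounter:
    """Hypothetical stream task: counts views per page and emits a transformed message."""

    def __init__(self):
        # Running aggregation, kept in memory for this sketch.
        self.counts = Counter()

    def process(self, message):
        """Handle one incoming message and return the message to emit downstream."""
        page = message["page"]
        self.counts[page] += 1
        # Transform: drop the user field and attach the running count.
        return {"page": page, "views_so_far": self.counts[page]}


def run(task, feed):
    """Feed messages to the task one at a time, in order, collecting its output."""
    return [task.process(m) for m in feed]


# Assumed example input; a real job would read from a message queue instead.
feed = [
    {"user": "a", "page": "/home"},
    {"user": "b", "page": "/jobs"},
    {"user": "c", "page": "/home"},
]
task = PageViewCounter()
output = run(task, feed)
```

In a real deployment the framework, not a `run` helper, would deliver the messages and route the output to another stream or a database.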

Last month, Samza was widely discussed in various forums and communities, with comments pointing to its potential benefits:

"Like many people, we use Storm to process data from Kafka, and then send that data to Hadoop for offline analysis. It would be a great win if we could integrate these three environments."

On the surface, this seems like a very good idea. The project's home page at the Apache Software Foundation describes the features and advantages of pairing Samza with Kafka and YARN.

High fault tolerance: If a server or processor fails, Samza works with YARN to restart the stream processor.

High reliability: Samza uses Kafka to ensure that all messages are processed in the order in which they were written to a partition, and that no messages are lost.

Scalability: Samza is partitioned and distributed at every level. Kafka provides ordered, partitioned, replayable, and highly fault-tolerant streams; YARN provides a distributed environment for Samza containers to run in.
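The three properties above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `PartitionedLog`, `Task`, and `run_with_restart` are hypothetical stand-ins written for this example and are not the Kafka, Samza, or YARN APIs. The log keeps messages with the same key in one partition in write order, the task checkpoints its offset after each message, and a supervisor restarts the task after a simulated crash so that it resumes from the checkpoint.

```python
import zlib


class PartitionedLog:
    """Toy stand-in for a partitioned, replayable log (not the Kafka API)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # The same key always maps to the same partition, so the relative
        # order of that key's messages is preserved.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p

    def read(self, partition, offset):
        """Return the messages of one partition starting at the given offset."""
        return self.partitions[partition][offset:]


class Task:
    """Processes one partition in order, checkpointing the offset after each message."""

    def __init__(self, log, partition, checkpoints, fail_at=None):
        self.log = log
        self.partition = partition
        self.checkpoints = checkpoints  # lives outside the task, so it survives a crash
        self.fail_at = fail_at          # offset at which to simulate a crash

    def run(self, sink):
        offset = self.checkpoints.get(self.partition, 0)
        for key, value in self.log.read(self.partition, offset):
            if self.fail_at is not None and offset == self.fail_at:
                raise RuntimeError("simulated crash")
            sink.append((key, value))
            offset += 1
            self.checkpoints[self.partition] = offset


def run_with_restart(log, partition, checkpoints, sink, fail_at):
    """Hypothetical supervisor: restart the task once after a crash."""
    try:
        Task(log, partition, checkpoints, fail_at=fail_at).run(sink)
    except RuntimeError:
        # The new task instance resumes from the last checkpointed offset.
        Task(log, partition, checkpoints).run(sink)


log = PartitionedLog(num_partitions=4)
partition = None
for i in range(5):
    partition = log.append("member-42", i)

checkpoints, sink = {}, []
run_with_restart(log, partition, checkpoints, sink, fail_at=2)
values = [v for _, v in sink]
```

In this sketch the crash happens before a message is handled, so nothing is processed twice. If a crash fell between handling a message and writing the checkpoint, the restarted task would process that message again.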

The future of Samza

It remains to be seen whether Samza can attract as large a community of users and contributors as Storm has. But LinkedIn will certainly back Samza's development the way Twitter has backed Storm, and Samza may have the edge in usability: because it runs on a framework such as YARN or Mesos, it has more flexibility.

If Samza succeeds, it will also show that YARN deserves the hype the Hadoop community has recently given it: it can run Storm, it can run Samza, and it can run many other things as well. This matters because many software vendors have staked their big data future (and in some cases their entire future) on Hadoop, and they want the platform to come out the winner.

Hadoop's earlier reliance on MapReduce limited its applicability, but YARN has opened it up to workloads such as large-scale stream processing, interactive SQL queries, machine learning, and image processing. As the technology evolves, the idea of Hadoop supporting all big data applications becomes more realistic.
