Demo Run for Summingbird (Storm + Hadoop)

Source: Internet
Author: User

Preface

To get the Summingbird demo running, the author took many detours. There is almost no information about it available within China, so it took a long time to get the demo working, and it was a truly painful experience. Readers interested in studying Summingbird can follow along as the author walks through it step by step. As a rough approximation, Summingbird can be understood as Storm + Hadoop.

First, a quick overview of big data processing

With the arrival of the big data era, large-scale processing has split into two directions: batch processing and real-time processing. Batch processing has good fault tolerance: because the data is kept in local or distributed storage, a computation can be rerun over the same data. Its disadvantage is latency, since processing can only start after all the data has been stored. Real-time processing is fast, computing results as the data arrives, but its fault tolerance is poor: data flows into memory and out again, only the useful data is filtered out and kept, and the full input is not written to disk, so earlier data cannot be replayed and reprocessed. Neither approach alone can meet increasingly diverse requirements, so the two need to be combined, keeping the fault tolerance of batch processing together with the low latency of real-time processing. This brings us to the subject of this article, Summingbird, which seamlessly integrates batch computing and real-time computing.
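The idea of combining the two modes can be sketched in a few lines of plain Scala. This is only a toy illustration, not Summingbird's API: the object name `HybridCountSketch` and both helper functions are made up for this example, and ordinary in-memory collections stand in for Hadoop and Storm. The point it tries to show is that when partial results are combined with an associative merge, a result computed over stored data and a result computed over recent data can be added together into one answer.

```scala
// Toy sketch only: plain Scala collections, no Summingbird, Storm or Hadoop.
// All names here are hypothetical and chosen for illustration.
object HybridCountSketch {
  type Counts = Map[String, Long]

  // The job logic, written once: count the words in a collection of sentences.
  def wordCounts(sentences: Seq[String]): Counts =
    sentences
      .flatMap(_.toLowerCase.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size.toLong }

  // Associative merge of two partial results (per-key addition).
  def mergeCounts(a: Counts, b: Counts): Counts =
    b.foldLeft(a) { case (acc, (word, n)) =>
      acc.updated(word, acc.getOrElse(word, 0L) + n)
    }

  def main(args: Array[String]): Unit = {
    // Stands in for data already written to storage (the batch side).
    val stored = Seq("storm and hadoop", "hadoop batch job")
    // Stands in for data that has just arrived (the real-time side).
    val recent = Seq("storm stream", "storm again")

    val batchView    = wordCounts(stored)
    val realtimeView = wordCounts(recent)
    val merged       = mergeCounts(batchView, realtimeView)

    println(merged("storm"))  // 3
    println(merged("hadoop")) // 2
  }
}
```

Because the merge is per-key addition, it does not matter which side produced a partial count, which is what lets the same logic serve both the slow, replayable path and the fast path.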

Second, the environment needed to learn Summingbird

The author's machine runs Linux. To run Summingbird, the following need to be set up on the machine:

1. ZooKeeper

2. Kafka

3. memcached

Second, the skills needed to learn Summingbird

1. Some understanding of SBT

2. Familiarity with the Scala language

3. A reasonable familiarity with how Storm and Hadoop work

Third, exploring how to run the demo

Interested readers can search for Summingbird on GitHub to get a general picture of it. You can of course follow the official GitHub tutorial to run the demo, but even if it starts successfully it will produce no results: because of the GFW, the program cannot access the Twitter stream that the official tutorial relies on, so it is certain not to work. The author tried this at the beginning and failed every time, then kept searching on Google and kept asking the project's initiators on Twitter, tried again, and failed again. The author then found an example on GitHub that combines Storm and Hadoop, was delighted, and followed it step by step, but in the end that failed too. The error showed that some jar packages could not be obtained: again because of the GFW, the corresponding jars could not be fetched from the Twitter Maven repository. Having never studied Maven or SBT, the author started learning both, not in any depth, just enough to master the basic usage and be able to read SBT and Maven build files. Opening the project's SBT file showed that the repositories it depended on were blocked, so the author changed them to the Maven repository hosted by OSChina.
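Pointing an SBT build at a different repository usually comes down to adding a resolver. The fragment below is a sketch only: the repository name and URL are placeholders rather than the values used in the author's project, and should be replaced with the address of whichever mirror is reachable from your network.

```scala
// build.sbt (fragment)
// "reachable-mirror" and the URL are hypothetical placeholders, not taken from
// the author's repository. Substitute the address of a mirror you can access.
resolvers += "reachable-mirror" at "https://mirror.example.com/maven/public/"
```

With a resolver like this in place, SBT also consults the mirror when resolving dependencies, so artifacts that are unreachable at their original repository can still be downloaded if the mirror carries them.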

Fourth, a successful run at last

The project code is hosted on GitHub (the path is given below). Just follow the steps there and you should get the correct results. Suggestions are very welcome. The author's next step is to start importing data from a local database for processing.

Fifth, reflections

Learning big data involves a really broad range of knowledge and there is a lot to master, so it has to be studied solidly. I have to say that China's firewall, the GFW, truly lives up to its reputation, for better or worse. While it protects the network, it also causes developers some unnecessary trouble. In the end, though, the demo ran successfully.

The GitHub path is as follows: https://github.com/leesf/summingbird-hybrid-example-china

Readers are also welcome to fork the project and give it a star.
