(topkonjstorm) Second week work report: 2014-07-14~2014-07-20

Last Update:2014-07-22 Source: Internet

Author: User

Tags ack emit zookeeper


This week's work is divided into two parts.

One: Build the JStorm environment (three-machine cluster)

Since I have not yet obtained Microsoft Azure virtual machines, I built the environment on lab machines.

1. Build the ZooKeeper cluster

a) Download ZooKeeper version 3.4.5 and unzip it to /xxx/xxx/zookeeper-3.4.5

b) Configure environment variables (in ~/.bashrc)

export ZOOKEEPER_HOME=/xxx/xxx/zookeeper-3.4.5

export PATH=$PATH:$HOME/bin:$ZOOKEEPER_HOME/bin

export CLASSPATH=$CLASSPATH:$ZOOKEEPER_HOME/lib

c) Configure $ZOOKEEPER_HOME/conf/zoo.cfg, mainly

dataDir=/home/yangrenkai/data/zookeeper/data

clientPort=5181

server.1=blade5:2881:3881

server.2=blade7:2881:3881

server.3=blade8:2881:3881

d) On each machine, create a myid file under dataDir whose content is 1, 2, or 3, matching the x of that machine's server.x entry.
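The mapping between a machine and its myid value can be sketched as follows. This is only an illustration under stated assumptions: the class MyIdResolver and the method resolveMyId are hypothetical names, not part of ZooKeeper, and the sketch merely reads the server.x lines from the zoo.cfg text shown above to find the number that belongs in a given host's myid file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper (not a ZooKeeper API): finds the x of the server.x
// entry whose host matches the given hostname.
public class MyIdResolver {

    /** Returns x for the matching server.x entry, or -1 if no entry matches. */
    static int resolveMyId(String zooCfg, String hostname) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(zooCfg));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith("server.")) {
                continue;
            }
            // Value format: host:quorumPort:electionPort
            String host = props.getProperty(key).split(":")[0].trim();
            if (host.equals(hostname)) {
                return Integer.parseInt(key.substring("server.".length()));
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String cfg = String.join("\n",
                "dataDir=/home/yangrenkai/data/zookeeper/data",
                "clientPort=5181",
                "server.1=blade5:2881:3881",
                "server.2=blade7:2881:3881",
                "server.3=blade8:2881:3881");
        // On blade7 the myid file should contain this number.
        System.out.println(resolveMyId(cfg, "blade7")); // prints 2
    }
}
```

In practice the myid file is usually written by hand once per machine; the sketch only makes the "x of server.x" rule explicit.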

2. Install Java 1.7 and Python 2.6, because much of JStorm is written in Java and Python.

3. Install JStorm 0.9.3.1

a) Download JStorm version 0.9.3.1 and unzip it to /xxx/xxx/jstorm-0.9.3.1

b) Configure environment variables (in ~/.bashrc)

export JSTORM_HOME=/xxx/xxx/jstorm-0.9.3.1

export PATH=$PATH:$JSTORM_HOME/bin

c) Configure $JSTORM_HOME/conf/storm.yaml

storm.zookeeper.servers: the addresses of the ZooKeeper servers

storm.zookeeper.port: the ZooKeeper port

nimbus.host: the address of Nimbus

storm.zookeeper.root: the root directory JStorm uses in ZooKeeper; this must be set when several JStorm clusters share one ZooKeeper. The default is "/jstorm"

storm.local.dir: the directory where JStorm stores temporary data; the JStorm process must have write permission to it

java.library.path: the installation directories of ZeroMQ and the Java ZeroMQ library; the default is "/usr/local/lib:/opt/local/lib:/usr/lib"

supervisor.slots.ports: the list of port slots provided by the supervisor; make sure they do not conflict with other ports. The default is 68xx, whereas Storm uses 67xx

supervisor.disk.slot: a data directory; when a machine has more than one disk, this can provide disk read/write slots, which helps applications with heavy I/O

topology.enable.classloader: false by default, which turns the classloader off. If the application jar conflicts with a jar that JStorm depends on (for example, the application uses Thrift 9 but JStorm uses Thrift 7), the classloader must be turned on

nimbus.groupfile.path: if resource isolation is needed (for example, how many resources the data warehouse, the technology department, and the wireless department may each use), the grouping function must be enabled by setting this to the absolute path of a configuration file, modeled on group_file.ini in the source code

storm.local.dir: the local temporary directory used by JStorm. If a machine runs Storm and JStorm at the same time, they must not share this directory; keep the two separate

d) Run the following commands on the node that submits the topology

#mkdir ~/.jstorm

#cp -f $JSTORM_HOME/conf/storm.yaml ~/.jstorm

e) Start ZooKeeper first, then start Nimbus and the supervisors. Nimbus and a supervisor should preferably not run on the same node. My setup is 1 Nimbus and 2 supervisors, with four ports configured per supervisor.

4. JStorm needs Tomcat to display its UI, so Tomcat must be installed

a) Download Tomcat 8.0.9 and unzip it to /xxx/xxx/tomcat-8.0.9

b) Run the commands:

cd /xxx/xxx/tomcat-8.0.9/webapps/

cp $JSTORM_HOME/jstorm-ui-0.9.3.war ./

mv ROOT ROOT.old

ln -s jstorm-ui-0.9.3 ROOT

c) Start Tomcat: /xxx/xxx/tomcat-8.0.9/bin/startup.sh

Two: Finish writing the first version of Topk_on_jstorm (project address)

1. Create the JSTORM-TOPK project

2. The project implements a simple TopK computation pipeline: ScoreProduceSpout (parallelism 1) emits random (id, score) data, ComputeBolt (parallelism 4) performs the TopK computation, and PrintAndStoreBolt (parallelism 1) aggregates the results and prints them.

3. Build TopKServerTopology

TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("spout", new ScoreProduceSpout(), 1);

builder.setBolt("compute", new ComputeBolt(), 4).shuffleGrouping("spout");

builder.setBolt("print", new PrintAndStoreBolt(), 1).shuffleGrouping("compute");

4. Build ScoreProduceSpout, which implements IRichSpout (details in the next weekly report)

_collector.emit(new Values(tupleId, id, score), tupleId);

Here tupleId is a long that increments from 0, id is a four-character string drawn from [0-9a-zA-Z], and score is a random number below 1000000. The second argument of emit, tupleId, is the message ID used to communicate with the acker, which ensures that no record is lost. (The ack mechanism is described in other posts.)
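The tuple fields described above can be sketched without any Storm dependency. This is a minimal illustration, not the project's spout: the class ScoreSource and its methods are hypothetical, the score is assumed to be an int in [0, 1000000), and the pending map is an assumed way for a spout to remember un-acked tuples so that they could be replayed on failure.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in for the data side of a spout: produces
// (tupleId, id, score) and tracks tuples that have not been acked yet.
public class ScoreSource {
    static final String ALPHABET =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static final int MAX_SCORE = 1_000_000; // assumed exclusive upper bound

    private final Random random;
    private long nextTupleId = 0L; // increments from 0
    private final Map<Long, Object[]> pending = new HashMap<>();

    ScoreSource(long seed) {
        this.random = new Random(seed);
    }

    /** Builds the next tuple and remembers it until it is acked. */
    Object[] next() {
        long tupleId = nextTupleId++;
        StringBuilder id = new StringBuilder(4);
        for (int i = 0; i < 4; i++) {
            id.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        int score = random.nextInt(MAX_SCORE);
        Object[] tuple = {tupleId, id.toString(), score};
        pending.put(tupleId, tuple);
        return tuple;
    }

    /** Called when the tuple was fully processed: forget it. */
    void ack(long tupleId) {
        pending.remove(tupleId);
    }

    /** Called when processing failed: return the tuple so it can be re-emitted. */
    Object[] fail(long tupleId) {
        return pending.get(tupleId);
    }

    int pendingCount() {
        return pending.size();
    }
}
```

In a real spout, next() would correspond to nextTuple() calling emit with tupleId as the message ID, and ack/fail would be the callbacks invoked with that same ID.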

5. Build ComputeBolt, which implements IRichBolt (details in the next weekly report)

The original data set is split into 4 parts that are processed in parallel, and each task computes the TopK of its own stream. If a task goes down or a tuple fails, the accumulated result is recomputed. The execute() method implements the TopK algorithm, which is fairly complex; see the source code at the project address.

By implementing IRichBolt, the bolt can control whether a tuple is acked right away or passed on to the next bolt (in which case the next bolt controls the ack).
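As an illustration of the per-task TopK computation mentioned above, here is a minimal sketch using a bounded min-heap. It is not the project's algorithm (that lives in the project source); the class TopKSketch, its Entry type, and its methods are hypothetical names, and each (id, score) pair is assumed to be an independent record.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: keeps the k highest-scoring entries seen so far.
public class TopKSketch {

    static final class Entry {
        final String id;
        final int score;

        Entry(String id, int score) {
            this.id = id;
            this.score = score;
        }
    }

    private final int k;
    // Min-heap on score: the head is the weakest of the current top k.
    private final PriorityQueue<Entry> heap;

    TopKSketch(int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        this.k = k;
        this.heap = new PriorityQueue<>(Comparator.comparingInt((Entry e) -> e.score));
    }

    /** Offers one record; it is kept only if it belongs to the current top k. */
    void offer(String id, int score) {
        if (heap.size() < k) {
            heap.add(new Entry(id, score));
        } else if (score > heap.peek().score) {
            heap.poll();
            heap.add(new Entry(id, score));
        }
    }

    /** Returns the current top k, highest score first. */
    List<Entry> snapshot() {
        List<Entry> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingInt((Entry e) -> e.score).reversed());
        return out;
    }
}
```

Each offer costs O(log k), so a task can process an unbounded stream while holding only k entries in memory.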

6. Build PrintAndStoreBolt, which implements IRichBolt (details in the next weekly report)

All results are aggregated here. Its execute() method implements an algorithm similar to ComputeBolt's; the differences are that the data volume is smaller (it has already been filtered by ComputeBolt) and that the result is printed, leaving an interface for later persistence/output.

7. Run on the JStorm cluster: jstorm jar Topk.jar com.msopentech.jstorm.topk.topology.TopKServerTopology

This covers the basic TopK requirements.
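The aggregation step can be sketched as merging the partial TopK lists from the upstream tasks and keeping the k best overall. This is an assumption about how such a roll-up could work, not the project's code; TopKMerge, Scored, and merge are hypothetical names. The merge is correct as long as every record reaches exactly one upstream task, because any record in the global top k must also be in the top k of the task that saw it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the final roll-up over per-task TopK results.
public class TopKMerge {

    static final class Scored {
        final String id;
        final int score;

        Scored(String id, int score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Merges partial top-k lists and returns the k highest scores, best first. */
    static List<Scored> merge(List<List<Scored>> partials, int k) {
        List<Scored> all = new ArrayList<>();
        for (List<Scored> partial : partials) {
            all.addAll(partial);
        }
        // Sort by score, descending.
        all.sort((a, b) -> Integer.compare(b.score, a.score));
        return new ArrayList<>(all.subList(0, Math.min(Math.max(k, 0), all.size())));
    }
}
```

With 4 upstream tasks each reporting at most k entries, the merge handles at most 4k records, which is why the data volume at this stage is small.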

Next week's plan

1. Build a cluster on Microsoft Azure and run the TopK algorithm (I do not have an account for now and am asking my mentor for help).

2. Continue improving the TopK algorithm.

3. Implement REST API input.

Thanks to the CSDN open-source summer camp and to Teacher Shang for their guidance and support!


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
