(topkonjstorm) Second week work report: 2014-07-14~2014-07-20

Source: Internet
Author: User
Tags: ack, emit, zookeeper

This week's work is divided into two parts.

One: Build the JStorm environment (three-machine cluster)

Since I have not yet obtained Microsoft Azure virtual machines, I built the environment on lab machines.

1. Build the ZooKeeper cluster

a) Download ZooKeeper 3.4.5 and unzip it to /xxx/xxx/zookeeper-3.4.5

b) Configure the environment variables (in ~/.bashrc)

export ZOOKEEPER_HOME=/xxx/xxx/zookeeper-3.4.5

export PATH=$PATH:$HOME/bin:$ZOOKEEPER_HOME/bin

export CLASSPATH=$CLASSPATH:$ZOOKEEPER_HOME/lib

c) Configure $ZOOKEEPER_HOME/conf/zoo.cfg, mainly:

dataDir=/home/yangrenkai/data/zookeeper/data

clientPort=5181

server.1=blade5:2881:3881

server.2=blade7:2881:3881

server.3=blade8:2881:3881

d) Create a myid file under dataDir on each machine. Its content is 1, 2, or 3, matching the x of that machine's server.x entry.
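The mapping between a machine and its myid value can be sketched in code. The helper below is hypothetical (MyIdResolver is not part of ZooKeeper); it only parses the server.x lines shown above and returns the x for a given host name, which is the number that belongs in that host's myid file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not a ZooKeeper API: looks up the myid value for a host
// by scanning the server.N entries of a zoo.cfg-style configuration.
final class MyIdResolver {

    // Returns N for the entry "server.N=host:peerPort:electionPort" whose host
    // matches, or -1 if the host is not listed.
    static int myIdFor(String zooCfg, String host) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(zooCfg));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith("server.")) {
                continue;
            }
            String entryHost = props.getProperty(key).split(":")[0].trim();
            if (entryHost.equals(host)) {
                return Integer.parseInt(key.substring("server.".length()));
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String zooCfg = String.join("\n",
                "dataDir=/home/yangrenkai/data/zookeeper/data",
                "clientPort=5181",
                "server.1=blade5:2881:3881",
                "server.2=blade7:2881:3881",
                "server.3=blade8:2881:3881");
        // The printed number is what goes into dataDir/myid on blade7.
        System.out.println(myIdFor(zooCfg, "blade7"));
    }
}
```

With the configuration above, blade5 gets 1, blade7 gets 2, and blade8 gets 3.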

2. Install Java 1.7 and Python 2.6, because large parts of JStorm are written in Java and Python.

3. Install JStorm 0.9.3.1

a) Download JStorm 0.9.3.1 and unzip it to /xxx/xxx/jstorm-0.9.3.1

b) Configure the environment variables (in ~/.bashrc)

export JSTORM_HOME=/xxx/xxx/jstorm-0.9.3.1

export PATH=$PATH:$JSTORM_HOME/bin

c) Configure $JSTORM_HOME/conf/storm.yaml

storm.zookeeper.servers: the addresses of the ZooKeeper servers

storm.zookeeper.port: the ZooKeeper port

nimbus.host: the address of Nimbus

storm.zookeeper.root: the root directory JStorm uses in ZooKeeper. Set this option when several JStorm clusters share one ZooKeeper; the default is "/jstorm"

storm.local.dir: the local directory where JStorm stores temporary data. The JStorm process must have write permission to it. If a machine runs Storm and JStorm at the same time, the two must not share this directory; keep them separate

java.library.path: the installation directories of ZeroMQ and the Java ZeroMQ library; the default is "/usr/local/lib:/opt/local/lib:/usr/lib"

supervisor.slots.ports: the list of port slots the supervisor provides. Make sure they do not conflict with other ports; the defaults are 68xx, whereas Storm uses 67xx

supervisor.disk.slot: the data directories. When a machine has more than one disk, this provides disk read/write slots, which helps applications with heavy IO

topology.enable.classloader: false by default, which turns the classloader off. If the application's jars conflict with the jars JStorm depends on (for example, the application uses Thrift 9 but JStorm uses Thrift 7), turn the classloader on

nimbus.groupfile.path: needed for resource isolation, for example to limit how many resources the data warehouse, the technology department, and the wireless department may each use. To enable grouping, set this to the absolute path of a configuration file; for the file format, see group_file.ini in the source code

d) On the node that submits topologies, run:

mkdir ~/.jstorm

cp -f $JSTORM_HOME/conf/storm.yaml ~/.jstorm

e) Start ZooKeeper first, then start Nimbus and the supervisors. Nimbus and the supervisors should preferably not run on the same node. I run 1 Nimbus and 2 supervisors, with four ports configured on each supervisor.

4. JStorm needs Tomcat to serve its web UI, so install Tomcat

a) Download Tomcat 8.0.9 and unzip it to /xxx/xxx/tomcat-8.0.9

b) Run the commands:

cd /xxx/xxx/tomcat-8.0.9/webapps/

cp $JSTORM_HOME/jstorm-ui-0.9.3.war ./

mv ROOT ROOT.old

ln -s jstorm-ui-0.9.3 ROOT

c) Start Tomcat: /xxx/xxx/tomcat-8.0.9/bin/startup.sh

Two: Finish the first version of Topk_on_jstorm (project address)

1. Create the JSTORM-TOPK project

2. The project implements a simple top-K computation: ScoreProduceSpout (parallelism 1) produces random (id, score) records, ComputeBolt (parallelism 4) computes the top K, and PrintAndStoreBolt (parallelism 1) aggregates the results and prints them.

3. Create TopKServerTopology

TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("spout", new ScoreProduceSpout(), 1);

builder.setBolt("Compute", new ComputeBolt(), 4).shuffleGrouping("spout");

builder.setBolt("Print", new PrintAndStoreBolt(), 1).shuffleGrouping("Compute");

4. Create ScoreProduceSpout, which implements IRichSpout (details in next week's report)

_collector.emit(new Values(tupleId, id, score), tupleId);

Here tupleId is a long that increments from 0, id is a four-character string over [0-9a-zA-Z], and score is a random number below 1000000. The tupleId passed as the second argument to emit is the message ID used to communicate with the acker, which guarantees that no record is lost. (The ack mechanism is described in other posts.)
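To make the record format concrete, here is a minimal sketch of the data generation in plain Java, without the Storm API. ScoreSource and its methods are hypothetical names, not the project's code. The pending map is an assumption about how a spout might use the message ID: it keeps each emitted record until it is acked, so a failed record can be replayed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in for the spout's data generation; it models only the
// (tupleId, id, score) records and the bookkeeping a message ID makes possible.
final class ScoreSource {
    private static final String ALPHABET =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    static final class Record {
        final long tupleId;
        final String id;
        final int score;

        Record(long tupleId, String id, int score) {
            this.tupleId = tupleId;
            this.id = id;
            this.score = score;
        }
    }

    private final Random random;
    private long nextTupleId = 0;
    // Records emitted but not yet acknowledged, keyed by tupleId (assumption).
    private final Map<Long, Record> pending = new HashMap<>();

    ScoreSource(long seed) {
        this.random = new Random(seed);
    }

    // Produces the next record: incrementing tupleId, 4-character id, score < 1000000.
    Record next() {
        StringBuilder id = new StringBuilder(4);
        for (int i = 0; i < 4; i++) {
            id.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        Record record = new Record(nextTupleId++, id.toString(), random.nextInt(1_000_000));
        pending.put(record.tupleId, record);
        return record;
    }

    // The record was fully processed downstream: forget it.
    void ack(long tupleId) {
        pending.remove(tupleId);
    }

    // Processing failed: return the same record so the caller can emit it again.
    Record fail(long tupleId) {
        return pending.get(tupleId);
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        ScoreSource source = new ScoreSource(System.nanoTime());
        Record record = source.next();
        System.out.println(record.tupleId + " " + record.id + " " + record.score);
    }
}
```

In a real spout, ack and fail would be the callbacks Storm invokes with the message ID that was passed to emit.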

5. Create ComputeBolt, which implements IRichBolt (details in next week's report)

The original data set is split into 4 parts that are processed in parallel; each task computes the top K of its own stream. If a task goes down or a tuple fails, the accumulated result is computed again. The execute() method implements the top-K algorithm, which is fairly complex; see the source code at the project address.

Implementing IRichBolt lets you control whether a tuple is acked in this bolt or passed on to the next bolt (in which case the next bolt controls the ack).
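The per-task computation in ComputeBolt can be illustrated with a bounded min-heap, a common way to keep a running top K over a stream. This is a sketch and not the project's algorithm, which lives in the source code at the project address. TopKTracker is a hypothetical name, and the sketch assumes each record is ranked independently by its score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of a streaming top-K: keeps only the K highest
// scores seen so far, using a min-heap whose root is the weakest survivor.
final class TopKTracker {
    static final class Scored {
        final String id;
        final int score;

        Scored(String id, int score) {
            this.id = id;
            this.score = score;
        }
    }

    private final int k;
    private final PriorityQueue<Scored> heap;

    TopKTracker(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
        this.heap = new PriorityQueue<>(k, Comparator.comparingInt((Scored s) -> s.score));
    }

    // Offers one record; returns true if it entered the current top K.
    boolean offer(String id, int score) {
        if (heap.size() < k) {
            heap.add(new Scored(id, score));
            return true;
        }
        if (score > heap.peek().score) {
            heap.poll(); // evict the weakest of the current top K
            heap.add(new Scored(id, score));
            return true;
        }
        return false;
    }

    // Snapshot of the current top K, highest score first.
    List<Scored> snapshot() {
        List<Scored> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingInt((Scored s) -> s.score).reversed());
        return out;
    }

    public static void main(String[] args) {
        TopKTracker tracker = new TopKTracker(3);
        int[] scores = {5, 1, 9, 7, 3};
        for (int i = 0; i < scores.length; i++) {
            tracker.offer("id" + i, scores[i]);
        }
        for (Scored s : tracker.snapshot()) {
            System.out.println(s.id + " " + s.score);
        }
    }
}
```

A task that restarts would begin with an empty tracker and accumulate again from the replayed tuples. The return value of offer tells the caller whether a record is worth forwarding downstream.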

6. Create PrintAndStoreBolt, which implements IRichBolt (details in next week's report)

All results are aggregated here. execute() again implements an algorithm similar to ComputeBolt's; the differences are that the data volume is smaller (it has already been filtered by ComputeBolt) and that the result is printed, leaving an interface for later persistence/output.
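The aggregation step works because the global top K is always contained in the union of the per-task top-K results, so the final bolt only needs to rank a small candidate set. The sketch below shows that merge in plain Java. TopKMerger is a hypothetical name, and the code assumes each ComputeBolt task hands over its partial result as a list, which may differ from how the project streams candidates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of the final aggregation: merges the partial
// top-K lists of several tasks into one global top-K list.
final class TopKMerger {
    static final class Scored {
        final String id;
        final int score;

        Scored(String id, int score) {
            this.id = id;
            this.score = score;
        }
    }

    // Returns the k highest-scoring entries across all partial lists, highest first.
    static List<Scored> merge(List<List<Scored>> partials, int k) {
        List<Scored> all = new ArrayList<>();
        for (List<Scored> partial : partials) {
            all.addAll(partial);
        }
        // At most (number of tasks) * k candidates, so a full sort stays cheap.
        all.sort(Comparator.comparingInt((Scored s) -> s.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    public static void main(String[] args) {
        List<Scored> task1 = Arrays.asList(new Scored("a", 90), new Scored("b", 40));
        List<Scored> task2 = Arrays.asList(new Scored("c", 70), new Scored("d", 60));
        for (Scored s : merge(Arrays.asList(task1, task2), 3)) {
            System.out.println(s.id + " " + s.score);
        }
    }
}
```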

7. Run on the JStorm cluster: jstorm jar Topk.jar com.msopentech.jstorm.topk.topology.TopKServerTopology

This covers the basic top-K requirements.

Next week's plan

1. Build a cluster on Microsoft Azure and run the top-K algorithm there (I do not have an account yet and am asking my mentor for help).

2. Continue improving the top-K algorithm.

3. Implement input through a REST API.


Thanks to the CSDN Open Source Summer Camp and Teacher Shang for their guidance and support!
