How to install and configure Apache Samza on Linux

Source: Internet
Author: User

Samza is a distributed stream processing framework. It implements real-time stream data processing on top of Kafka message queues. (To be precise, Samza uses Kafka through a pluggable interface, so it can be built on other message queue frameworks, but the starting point and default implementation are based on Kafka.)

Samza uses Apache Kafka for messaging.

Apache Hadoop YARN provides fault tolerance, processor isolation, security, and resource management.

This article describes how to install Samza on a 32-bit Ubuntu 14.04 system.


Installation preparation:

To install and configure Apache Samza, you need the following:

JDK 1.7
Maven 3
Kafka
YARN
ZooKeeper

# apt-get install curl git
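
Before going further, it can help to confirm that the command-line tools used in the later steps are actually on the PATH. The following is a minimal sketch, not part of the original walkthrough: the function name missing_tools is hypothetical, and the list of tools checked is an assumption based on the commands that appear in this article.

```shell
#!/bin/sh
# Print each named command that cannot be found on PATH, one per line.
# (missing_tools is a hypothetical helper name used only for this sketch.)
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the name does not resolve
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed tool list, taken from the commands used in this article.
missing=$(missing_tools curl git java tar)
if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing"
else
    echo "All required tools found."
fi
```

Run it again after installing the JDK and Maven; anything still listed needs to be installed or added to PATH.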


Download and set the JDK path:

We need to install the JDK and set its environment variables.

# mkdir -p /usr/java
# cd /usr/java
# wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-i586.tar.gz"


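Decompress the archive and set the JAVA_HOME path:

# tar -zxvf jdk-7u79-linux-i586.tar.gz
# JAVA_HOME=/usr/java/jdk1.7.0_79
# export JAVA_HOME
# PATH=$JAVA_HOME/bin:$PATH
# export PATH

Add the JAVA_HOME and PATH lines above to ~/.bashrc and /etc/bashrc (on Ubuntu the system-wide file is /etc/bash.bashrc) so that they persist across sessions.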
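
Appending to a shell profile by hand tends to produce duplicate lines when a guide is followed more than once. The sketch below shows one way to add the two export lines idempotently. The helper name append_once is hypothetical, and the demo writes to a temporary file rather than a real profile; pointing it at ~/.bashrc is left to the reader.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
# (append_once is a hypothetical helper name used only for this sketch.)
append_once() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo against a temporary file so that no real profile is modified.
demo=$(mktemp)
append_once "$demo" 'export JAVA_HOME=/usr/java/jdk1.7.0_79'
append_once "$demo" 'export JAVA_HOME=/usr/java/jdk1.7.0_79'   # second call is a no-op
append_once "$demo" 'export PATH=$JAVA_HOME/bin:$PATH'
cat "$demo"
rm -f "$demo"
```

The single quotes keep $JAVA_HOME and $PATH unexpanded, so the profile stores the variable references rather than their current values.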

Install Maven:

Next, download and install Maven:

# wget https://launchpad.net/~bneijt/+archive/ubuntu/ppa/+build/2139203/+files/maven3_3.0.1-0~ppa2_all.deb


# dpkg -i maven3_3.0.1-0~ppa2_all.deb


Check the Maven version:

# mvn3 -version
Apache Maven 3.0.1 (r1038046; 16:28:32+0530)
Java version: 1.7.0_79
Java home: /usr/java/jdk1.7.0_79/jre
Default locale: en_IN, platform encoding: UTF-8
OS name: "linux", version: "3.8.0-29-generic", arch: "i386", family: "unix"

Install hello-samza:


We will work in the /usr/local folder:

# cd /usr/local


Clone hello-samza into it:

# git clone git://git.apache.org/samza-hello-samza.git hello-samza


The hello-samza project includes a script named "grid" that sets up everything the examples need. You can use it to install Kafka, YARN, and ZooKeeper.

Run the following commands:

# cd /usr/local/hello-samza


root@dev:/usr/local/hello-samza# bin/grid install kafka
EXECUTING: install kafka
Downloading kafka_2.10-0.8.2.1.tgz...
% Total % Received % Xferd Average Speed Time Current
Dload Upload Total Spent Left Speed
15 15.4M 15 2406k 0 304k 0 0:00:51 0:00:07 443k

root@dev:/usr/local/hello-samza# bin/grid install yarn
EXECUTING: install yarn
Downloading hadoop-2.6.1.tar.gz...
% Total % Received % Xferd Average Speed Time Current
Dload Upload Total Spent Left Speed
77 187M 77 145M 0 0 239k 0 0:13:23 0:10:22 204k

root@dev:/usr/local/hello-samza# bin/grid install zookeeper
EXECUTING: install zookeeper
Downloading zookeeper-3.4.3.tar.gz...
% Total % Received % Xferd Average Speed Time Current
Dload Upload Total Spent Left Speed
8 15.4M 8 1324k 0 212k 0 0:01:14 0:00:06 266k

All of the packages are now in the "deploy" folder under the root directory of hello-samza:

root@dev:/usr/local/hello-samza# cd deploy
root@dev:/usr/local/hello-samza/deploy# ls
kafka yarn zookeeper
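
The same check can be scripted so that a missing component is reported by name. This is a small sketch rather than anything the grid script provides: check_deploy is a hypothetical helper, and the three directory names are the ones shown in the listing above.

```shell
#!/bin/sh
# Report which expected component directories are absent from a deploy folder.
# Prints one missing name per line and returns non-zero if any are missing.
# (check_deploy is a hypothetical helper name used only for this sketch.)
check_deploy() {
    deploy_dir=$1
    rc=0
    for component in kafka yarn zookeeper; do
        if [ ! -d "$deploy_dir/$component" ]; then
            printf '%s\n' "$component"
            rc=1
        fi
    done
    return "$rc"
}

# The path below is the one used in this article.
if check_deploy /usr/local/hello-samza/deploy; then
    echo "kafka, yarn and zookeeper are all present"
fi
```

If a name is printed, rerun the matching "bin/grid install" command for that component.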

Run the bin/grid bootstrap command:

root@dev:/usr/local/hello-samza# bin/grid bootstrap 
Download http://repo1.maven.org/maven2/org/fusesource/scalate/scalate-util_2.10/1.6.1/scalate-util_2.10-1.6.1.jar
:samza-yarn_2.10:processResources
:samza-yarn_2.10:classes
:samza-yarn_2.10:lesscss
....
....
BUILD SUCCESSFUL

Total time: 20 mins 32.855 secs
/usr/local/hello-samza
EXECUTING: install zookeeper
Using previously downloaded file /root/.samza/download/zookeeper-3.4.3.tar.gz
EXECUTING: install yarn
Using previously downloaded file /root/.samza/download/hadoop-2.6.1.tar.gz
EXECUTING: install kafka
Using previously downloaded file /root/.samza/download/kafka_2.10-0.8.2.1.tgz
EXECUTING: start zookeeper
JMX enabled by default
Using config: /usr/local/hello-samza/deploy/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
EXECUTING: start yarn
Starting resourcemanager, logging to /usr/local/hello-samza/deploy/yarn/logs/yarn-root-resourcemanager-dev.out
Starting nodemanager, logging to /usr/local/hello-samza/deploy/yarn/logs/yarn-root-nodemanager-dev.out
EXECUTING: start kafka
 
After the bootstrap command completes, you can verify that YARN is installed and running by opening http://localhost:8088, which displays the YARN UI.
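
The resource manager can take a little while to start answering requests, so a scripted check usually needs to retry. The sketch below is one way to do that under stated assumptions: wait_for is a hypothetical helper, and the attempt count and delay in the example are arbitrary values, not recommendations from the Samza project.

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempt budget is used up.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# (wait_for is a hypothetical helper name used only for this sketch.)
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Run the command quietly and stop as soon as it succeeds
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Do not sleep after the final failed attempt
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Example (commented out so the sketch makes no network requests):
# wait_for 30 2 curl -sf -o /dev/null http://localhost:8088 && echo "YARN UI is up"
```

Here curl -s suppresses progress output, -f makes HTTP errors return a non-zero status, and -o /dev/null discards the page body.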

Build a Samza job package:

Before you can run a Samza job, you need to build a package for it. YARN uses this package to deploy the job on the grid.

Note: If you are building from the latest branch of the hello-samza project, run the following command first.

root@dev:/usr/local/hello-samza# ./gradlew publishToMavenLocal


Then run these commands in the hello-samza project:

root@dev:/usr/local/hello-samza# mvn clean package
root@dev:/usr/local/hello-samza# mkdir -p deploy/samza
root@dev:/usr/local/hello-samza# tar -xvf ./target/hello-samza-0.10.0-dist.tar.gz -C deploy/samza
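
The tarball name contains the project version, so the tar command above fails if the checkout builds a different version. One way around that is to locate the file by pattern, as in the sketch below. The helper name find_dist_tarball is hypothetical, and the assumption that the build leaves a single file matching hello-samza-*-dist.tar.gz in ./target is inferred from the file name used in this article.

```shell
#!/bin/sh
# Print the path of the hello-samza distribution tarball in a target directory.
# Returns non-zero unless exactly one matching file exists.
# (find_dist_tarball is a hypothetical helper name used only for this sketch.)
find_dist_tarball() {
    target_dir=$1
    # Let the glob expand into the function's positional parameters
    set -- "$target_dir"/hello-samza-*-dist.tar.gz
    # An unmatched glob stays literal, so also confirm the file exists
    if [ "$#" -eq 1 ] && [ -f "$1" ]; then
        printf '%s\n' "$1"
    else
        return 1
    fi
}

# Unpack whichever version was just built (paths as in this article).
if tarball=$(find_dist_tarball ./target); then
    mkdir -p deploy/samza
    tar -xf "$tarball" -C deploy/samza
else
    echo "No single hello-samza dist tarball found in ./target" >&2
fi
```

Refusing to continue when several tarballs match avoids silently unpacking a stale build.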


Run the Samza job:

After the Samza package is built, you can use the run-job.sh script to start a job on the grid:

root@dev:/usr/local/hello-samza# deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties
 

This job consumes a feed of real-time edits from Wikipedia and writes them to a Kafka topic named "wikipedia-raw".

After the job has run for a few minutes, you can read the latest messages from that topic:

root@dev:/usr/local/hello-samza# deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wikipedia-raw


Visit the YARN UI again (http://localhost:8088). You should see the Samza job running normally rather than an error message.

Stop Samza:

Once everything is done, you can use the grid script to shut down all related services:

root@dev:/usr/local/hello-samza# bin/grid stop all

Output example:
EXECUTING: stop all
EXECUTING: stop kafka
EXECUTING: stop yarn
stopping resourcemanager
stopping nodemanager
EXECUTING: stop zookeeper
JMX enabled by default
Using config: /usr/local/hello-samza/deploy/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED

Start Samza:

Likewise, you can use the grid script to start all services:

root@dev:/usr/local/hello-samza# bin/grid start all

Output example:
EXECUTING: start all
EXECUTING: start zookeeper
JMX enabled by default
Using config: /usr/local/hello-samza/deploy/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
EXECUTING: start yarn
....
EXECUTING: start kafka
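
The stop and start commands can be combined into a single restart step. The following is only a sketch: restart_grid is a hypothetical wrapper, not something the grid script offers, and it assumes bin/grid is run from the hello-samza directory as in the examples above.

```shell
#!/bin/sh
# Stop and then start every grid service; skip the start if the stop fails.
# Usage: restart_grid HELLO_SAMZA_HOME
# (restart_grid is a hypothetical helper name used only for this sketch.)
restart_grid() {
    # Run in a subshell so the caller's working directory is left untouched
    (
        cd "$1" || exit 1
        bin/grid stop all && bin/grid start all
    )
}

# Example (commented out because it acts on real services):
# restart_grid /usr/local/hello-samza
```

The function returns the exit status of the last grid command it ran, so a caller can test whether the restart succeeded.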
 
