How to install and configure Apache Samza on Linux
Samza is a distributed stream processing framework. It provides real-time stream processing on top of Kafka message queues. (To be precise, Samza uses Kafka through a pluggable interface, so it can be built on other message queue systems, but Kafka is the starting point and the default implementation.)
Apache Kafka handles messaging.
Apache Hadoop YARN provides fault tolerance, processor isolation, security, and resource management.
This article describes how to install Samza on a 32-bit Ubuntu 14.04 system.
Installation preparation:
To install and configure Apache-Samza, you need the following:
JDK 1.7
Maven
Kafka
YARN
ZooKeeper
# apt-get install curl gem
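Before going further, it can help to confirm that the command-line tools used in this article are actually on the PATH. The following is a minimal sketch; the function name missing_tools is hypothetical, and the tool list in the usage comment is only an assumption based on the commands that appear in this walkthrough.

```shell
# Print every command name given as an argument that cannot be found on PATH.
# Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Usage (assumed tool list, adjust as needed):
# missing_tools curl wget git tar
```

Any name printed by the function still needs to be installed before the later steps will work.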
Download the JDK and set its path:
We need to install the JDK and set its environment variables.
# mkdir -p /usr/java
# cd /usr/java
# wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-i586.tar.gz"
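Extract the archive and set the JAVA_HOME path:
# tar -zxvf jdk-7u79-linux-i586.tar.gz
# JAVA_HOME=/usr/java/jdk1.7.0_79
# export JAVA_HOME
# PATH=$JAVA_HOME/bin:$PATH
# export PATH
Add the JAVA_HOME and PATH lines above to your ~/.bashrc and to the system-wide /etc/bashrc file (named /etc/bash.bashrc on Ubuntu) so that they persist across sessions.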
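Persisting the JAVA_HOME settings can be sketched as a small helper that appends the export lines to a profile file only once. This is an illustration, not part of the original procedure: the function name add_java_env is hypothetical, and the JDK path in the usage comment is the one used earlier in this article.

```shell
# Append JAVA_HOME and PATH exports to a profile file, unless the file
# already contains a JAVA_HOME export (so repeated runs do not duplicate lines).
# $1: profile file, $2: JDK installation directory
add_java_env() {
    profile=$1
    java_home=$2
    if grep -qs '^export JAVA_HOME=' "$profile"; then
        return 0  # already configured; leave the file untouched
    fi
    {
        printf 'export JAVA_HOME=%s\n' "$java_home"
        printf 'export PATH=$JAVA_HOME/bin:$PATH\n'
    } >> "$profile"
}

# Usage:
# add_java_env ~/.bashrc /usr/java/jdk1.7.0_79
```

Open a new shell (or source the file) afterwards so the variables take effect.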
Install Maven:
Next, download and install Maven:
# wget https://launchpad.net/~bneijt/+archive/ubuntu/ppa/+build/2139203/+files/maven3_3.0.1-0~ppa2_all.deb
# dpkg -i maven3_3.0.1-0~ppa2_all.deb
Check the Maven version:
# mvn3 -version
Apache Maven 3.0.1 (r1038046; 16:28:32 +0530)
Java version: 1.7.0_79
Java home: /usr/java/jdk1.7.0_79/jre
Default locale: en_IN, platform encoding: UTF-8
OS name: "linux", version: "3.8.0-29-generic", arch: "i386", family: "unix"
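If the version check needs to be scripted, the version number can be pulled out of the first line of that output. The sketch below assumes the output begins with a line of the form "Apache Maven X.Y.Z", as in the sample above; the function name maven_version is hypothetical.

```shell
# Read "mvn -version" style output on stdin and print only the Maven
# version number taken from the "Apache Maven X.Y.Z ..." line.
maven_version() {
    sed -n 's/^Apache Maven \([0-9][0-9.]*\).*/\1/p'
}

# Usage:
# mvn3 -version | maven_version
```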
Install hello-samza:
We will work under the /usr/local folder.
# cd /usr/local
Clone the hello-samza project into it:
# git clone git://git.apache.org/samza-hello-samza.git hello-samza
The hello-samza project ships with a script called "grid" that sets up everything you need to run Samza jobs. You can use it to install Kafka, YARN, and ZooKeeper.
Run the following commands:
# cd /usr/local/hello-samza
root@dev:/usr/local/hello-samza# bin/grid install kafka
EXECUTING: install kafka
Downloading kafka_2.10-0.8.2.1.tgz...
% Total % Received % Xferd Average Speed Time Current
Dload Upload Total Spent Left Speed
15 15.4 M 15 2406 k 0 304 k 0 0:00:51 0:00:07 443 k
root@dev:/usr/local/hello-samza# bin/grid install yarn
EXECUTING: install yarn
Downloading hadoop-2.6.1.tar.gz...
% Total % Received % Xferd Average Speed Time Current
Dload Upload Total Spent Left Speed
77 187 M 77 145 M 0 0 239 k 0 0:13:23 0:10:22 204 k
root@dev:/usr/local/hello-samza# bin/grid install zookeeper
EXECUTING: install zookeeper
Downloading zookeeper-3.4.3.tar.gz...
% Total % Received % Xferd Average Speed Time Current
Dload Upload Total Spent Left Speed
8 15.4 M 8 1324 k 0 212 k 0 0:01:14 0:00:06 266 k
Now you will find that all the packages are in the "deploy" folder under the root directory of hello-samza.
root@dev:/usr/local/hello-samza# cd deploy
root@dev:/usr/local/hello-samza/deploy# ls
kafka yarn zookeeper
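The three install commands shown earlier can also be run as a loop that stops at the first failure. This is a minimal sketch; the function name install_components is hypothetical, and it relies only on the "bin/grid install <component>" form used in this article.

```shell
# Run "<grid> install <component>" for each listed component,
# stopping as soon as one installation fails.
# $1: path to the grid script, remaining arguments: component names
install_components() {
    grid=$1
    shift
    for component in "$@"; do
        "$grid" install "$component" || return 1
    done
}

# Usage (from /usr/local/hello-samza):
# install_components bin/grid kafka yarn zookeeper
```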
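Run the bin/grid bootstrap command, which builds Samza, installs ZooKeeper, YARN, and Kafka (reusing any previously downloaded files), and starts all three: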
root@dev:/usr/local/hello-samza# bin/grid bootstrap
Download http://repo1.maven.org/maven2/org/fusesource/scalate/scalate-util_2.10/1.6.1/scalate-util_2.10-1.6.1.jar
:samza-yarn_2.10:processResources
:samza-yarn_2.10:classes
:samza-yarn_2.10:lesscss
....
....
BUILD SUCCESSFUL
Total time: 20 mins 32.855 secs
/usr/local/hello-samza
EXECUTING: install zookeeper
Using previously downloaded file /root/.samza/download/zookeeper-3.4.3.tar.gz
EXECUTING: install yarn
Using previously downloaded file /root/.samza/download/hadoop-2.6.1.tar.gz
EXECUTING: install kafka
Using previously downloaded file /root/.samza/download/kafka_2.10-0.8.2.1.tgz
EXECUTING: start zookeeper
JMX enabled by default
Using config: /usr/local/hello-samza/deploy/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
EXECUTING: start yarn
starting resourcemanager, logging to /usr/local/hello-samza/deploy/yarn/logs/yarn-root-resourcemanager-dev.out
starting nodemanager, logging to /usr/local/hello-samza/deploy/yarn/logs/yarn-root-nodemanager-dev.out
EXECUTING: start kafka
After the grid script finishes, you can verify that YARN is installed and running.
Open http://localhost:8088 in a browser and the YARN UI is displayed.
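The YARN UI may take a few seconds to come up after the grid starts, so a scripted check usually needs to retry. The sketch below is a generic retry helper; the name wait_until is hypothetical, and the curl invocation in the usage comment (silent, fail on HTTP errors, discard the body) is one possible probe, not something the grid script provides.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# $1: maximum attempts, $2: seconds to sleep between attempts,
# remaining arguments: the command to run
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage: poll the YARN UI for up to about a minute.
# wait_until 30 2 curl -sf -o /dev/null http://localhost:8088
```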
Build a Samza job package:
Before you can run a Samza job, you need to build a package for it. YARN uses this package to deploy the job on the grid.
Note: If you are building from the latest branch of the hello-samza project, run the following command first.
root@dev:/usr/local/hello-samza# ./gradlew publishToMavenLocal
Then build and extract the package from the hello-samza project:
root@dev:/usr/local/hello-samza# mvn clean package
root@dev:/usr/local/hello-samza# mkdir -p deploy/samza
root@dev:/usr/local/hello-samza# tar -xvf ./target/hello-samza-0.10.0-dist.tar.gz -C deploy/samza
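The mkdir and tar steps can be wrapped in a helper that refuses to continue when the build did not produce the tarball. This is only a sketch; the function name extract_package is hypothetical, and it assumes the package is a gzip-compressed tar archive, as the .tar.gz name above suggests.

```shell
# Extract a job package tarball into a destination directory,
# failing early if the tarball does not exist.
# $1: path to the .tar.gz package, $2: destination directory
extract_package() {
    tarball=$1
    dest=$2
    if [ ! -f "$tarball" ]; then
        echo "package not found: $tarball" >&2
        return 1
    fi
    mkdir -p "$dest" && tar -xzf "$tarball" -C "$dest"
}

# Usage (from /usr/local/hello-samza, after "mvn clean package"):
# extract_package ./target/hello-samza-0.10.0-dist.tar.gz deploy/samza
```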
Run the Samza job:
After the Samza package is built, you can use the run-job.sh script to start a job:
root@dev:/usr/local/hello-samza# deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties
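Because the run-job.sh command line is long, it can be convenient to wrap it in a function that takes only the job name. The following is a sketch under stated assumptions: the function name run_samza_job is hypothetical, and it assumes each job has a <job>.properties file in the config directory of the extracted package, as wikipedia-feed does above. The two options passed to run-job.sh are exactly the ones used in the command above.

```shell
# Start a Samza job from an extracted job package.
# $1: absolute path of the package directory (for example $PWD/deploy/samza)
# $2: job name, matching config/<job>.properties inside that directory
run_samza_job() {
    samza_dir=$1
    job=$2
    props="$samza_dir/config/$job.properties"
    if [ ! -f "$props" ]; then
        echo "missing job config: $props" >&2
        return 1
    fi
    "$samza_dir/bin/run-job.sh" \
        --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory \
        --config-path="file://$props"
}

# Usage (from /usr/local/hello-samza):
# run_samza_job "$PWD/deploy/samza" wikipedia-feed
```

The package directory is passed as an absolute path because it is used to build the file:// URI for --config-path.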
The job above consumes a feed of real-time edits from Wikipedia and writes them to a Kafka topic named "thelinuxfaq-raw".
After the job has been running for a few minutes, you can inspect the messages in that topic with the Kafka console consumer:
root@dev:/usr/local/hello-samza# deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic thelinuxfaq-raw
Visit the YARN UI again (http://localhost:8088). You should see the Samza job running normally rather than an error message.
Stop Samza:
Once you are done, you can use the grid script to stop all related services.
root@dev:/usr/local/hello-samza# bin/grid stop all
Output example:
EXECUTING: stop all
EXECUTING: stop kafka
EXECUTING: stop yarn
stopping resourcemanager
stopping nodemanager
EXECUTING: stop zookeeper
JMX enabled by default
Using config: /usr/local/hello-samza/deploy/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
Start Samza:
You can also use the grid script to start all services:
root@dev:/usr/local/hello-samza# bin/grid start all
Output example:
EXECUTING: start all
EXECUTING: start zookeeper
JMX enabled by default
Using config: /usr/local/hello-samza/deploy/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
EXECUTING: start yarn
....
EXECUTING: start kafka
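The stop and start commands can be combined into a simple restart. This is a small sketch; the function name restart_grid is hypothetical, and it uses only the "stop all" and "start all" subcommands shown in this article.

```shell
# Stop all grid services, then start them again.
# The start step runs only if the stop step succeeded.
# $1: path to the grid script
restart_grid() {
    grid=$1
    "$grid" stop all && "$grid" start all
}

# Usage (from /usr/local/hello-samza):
# restart_grid bin/grid
```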