Apache Kafka Cluster Environment Building

Tags: config, garbage collection, zookeeper

http://bigcat2013.iteye.com/blog/2175880


Apache Kafka is a high-throughput distributed messaging system, open-sourced by LinkedIn. The official website introduces it as: "Apache Kafka is publish-subscribe messaging rethought as a distributed commit log." Publish-subscribe is the core idea of Kafka's design and also its most distinctive feature. In Kafka, the publisher is the producer role and the subscriber is the consumer. Just as in real life, where manufacturers produce products but consumers generally cannot buy directly from the factory and need a dealer in between, the Kafka ecosystem has a broker role. So the Kafka ecosystem can be broadly described as follows:

"Producer-->broker<--consumer"

That is the general introduction; for details, see the official website: http://kafka.apache.org/

Then there are the commonplace questions: why use Kafka, and what scenarios does Kafka apply to? I will first share the conclusions from my own project; readers with other ideas are welcome to add to them. Reasons for using Kafka:

1. Distributed, high-throughput, and fast (Kafka stores data directly on disk with linear reads and writes, which is fast: it avoids copying data between JVM memory and system memory, and reduces the performance cost of object creation and garbage collection).

2. Supports both real-time and offline solutions (I believe many projects have similar needs; this is LinkedIn's official architecture: part of the data goes through Storm for real-time computation, and part goes to Hadoop for offline analysis).

3. Open source (who doesn't like open source?).

4. The source code is written in Scala and runs on the JVM (the author is very fond of Scala; functional languages have always been very elegant, and Spark is also written in Scala, so it seems it will be worth brushing up on Scala). Usage scenarios:

The author mainly uses it for a log analysis system, which is in fact how LinkedIn uses it, perhaps because the reliability requirements for logs are not especially high. Besides logs, a site's page-view data should also be applicable (as long as the raw data does not need to be stored directly in a DB).

The following is a brief introduction to the Kafka cluster setup process:

Preparation: at least 3 Linux servers (the author used 5 Red Hat cloud servers).

Step One: Install the JDK/JRE
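On a RHEL-style server, installation can be as simple as the sketch below (the OpenJDK package name is an assumption for the author's Red Hat environment; any recent JDK works):

    # install OpenJDK 7 (package name assumed for RHEL/CentOS-style systems)
    sudo yum install -y java-1.7.0-openjdk
    # verify the installation
    java -version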

Step Two: Install ZooKeeper (Kafka ships with a ZooKeeper service, but it is recommended to build a separate ZooKeeper cluster, which can be shared with other applications and is easier to manage)

For ZooKeeper installation, you can refer to my other blog post: http://bigcat2013.iteye.com/blog/2175538
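For reference, a minimal zoo.cfg for a three-node ensemble might look like the sketch below (the hostnames server1-server3 and the data directory are placeholders for your own environment):

    # basic timing settings (milliseconds per tick, ticks allowed for init/sync)
    tickTime=2000
    initLimit=10
    syncLimit=5
    # where ZooKeeper stores its data, and the port clients connect to
    dataDir=/var/zookeeper
    clientPort=2181
    # ensemble members: server.<id>=<host>:<peer-port>:<election-port>
    server.1=server1:2888:3888
    server.2=server2:2888:3888
    server.3=server3:2888:3888

Each node also needs a myid file under dataDir containing its own ID (1, 2, or 3).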

Step Three: Download Kafka from http://kafka.apache.org/downloads.html (it's a good idea to download a Scala pre-compiled package; for example, I downloaded kafka_2.10-0.8.1.1.tgz, which means Kafka 0.8.1.1 pre-compiled with Scala 2.10)

Step Four: Upload the installation package to the server (e.g., via WinSCP)

Step Five: Unzip the installation package with "tar -xzvf kafka_2.10-0.8.1.1.tgz":

The directory structure after decompression:
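For a 0.8.1.1 release, the unpacked directory looks roughly like this:

    kafka_2.10-0.8.1.1/
        bin/        startup scripts and command-line tools
        config/     server.properties, zookeeper.properties, etc.
        libs/       Kafka and dependency jars
        LICENSE
        NOTICE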


Step Six: Modify the configuration file

For a simple setup, you only need to change config/server.properties.

The properties that need to be configured are: broker.id (the current server's ID in the cluster, starting at 0), port, host.name (the current server's hostname), zookeeper.connect (the ZooKeeper cluster to connect to), and log.dirs (the log storage directory; remember to create this directory). The other settings are explained by the comments in the file:
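For example, the relevant part of config/server.properties on the first server might look like the sketch below (hostnames and paths are placeholders for your own environment):

    # unique ID of this broker in the cluster, starting at 0
    broker.id=0
    # port the broker listens on
    port=9092
    # hostname of this server
    host.name=kafkaserver1
    # ZooKeeper cluster to connect to
    zookeeper.connect=server1:2181,server2:2181,server3:2181
    # where Kafka stores its log segments (create this directory first)
    log.dirs=/data/kafka-logs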

Step Seven: Copy the configured Kafka directory to the other servers via "scp -r":
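For instance (the install path and user name are placeholders):

    scp -r /opt/kafka_2.10-0.8.1.1 user@kafkaserver2:/opt/
    scp -r /opt/kafka_2.10-0.8.1.1 user@kafkaserver3:/opt/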

Step Eight: Modify the configuration file on each server, mainly the broker.id and host.name properties:

broker.id increments from 0 and must be unique on each server.
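Continuing the sketch above, the second and third servers would differ only in these two lines:

    # on kafkaserver2
    broker.id=1
    host.name=kafkaserver2

    # on kafkaserver3
    broker.id=2
    host.name=kafkaserver3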

Step Nine: Start the ZooKeeper cluster first, then start the Kafka cluster

Kafka start command: sudo nohup ./bin/kafka-server-start.sh config/server.properties &
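On each machine, the full startup sequence might look like the sketch below (zkServer.sh assumes the standalone ZooKeeper installation from step two; all paths are placeholders):

    # 1. start ZooKeeper on every ZooKeeper node
    /opt/zookeeper/bin/zkServer.sh start
    # 2. then start the Kafka broker on every Kafka node
    cd /opt/kafka_2.10-0.8.1.1
    sudo nohup ./bin/kafka-server-start.sh config/server.properties &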

Step Ten: After the cluster starts successfully, you can try creating a topic, then create a producer on one server and a consumer on another, and send a message from the producer to see whether the consumer receives it, to verify that the cluster works.

Create topic: sudo ./bin/kafka-topics.sh --create --zookeeper server1:2181,server2:2181,server3:2181 --replication-factor 2 --partitions 5 --topic test

View topics: sudo ./bin/kafka-topics.sh --zookeeper server1:2181,server2:2181,server3:2181 --list

Create producer: sudo ./bin/kafka-console-producer.sh --broker-list kafkaserver1:9092,kafkaserver2:9092,kafkaserver3:9092 --topic test

Create consumer: sudo ./bin/kafka-console-consumer.sh --zookeeper server1:2181,server2:2181,server3:2181 --from-beginning --topic test

By entering messages in the producer console and checking the output in the consumer console, you can verify the setup: if the messages arrive at the consumer, the simple Kafka cluster is up, and it can be further configured according to the actual needs of the project.
