4, Kafka and its characteristics
Apache Kafka was originally created by LinkedIn and is now a top-level open source project under the Apache Software Foundation. The primary goal of the Apache Kafka design was to handle the vast number of user-action and page-browsing records generated on the LinkedIn site, and subsequent Apache Kafka versions have kept "high data throughput" as the primary optimization goal. To achieve this goal, Apache Kafka even made sacrifices in other functional areas, such as message transactionality. If your system needs to ingest a large amount of data per unit of time, consider adding Apache Kafka to your system design.

4-1, Kafka cluster installation

4-1-1, Installation environment introduction
The Apache Kafka installation process is simple. To save space, I am not going to walk through its single-machine (single service node) installation or the simplest producer and consumer coding process the way I did for Apache ActiveMQ. Instead, we take a different approach:
This article goes straight to the installation of an Apache Kafka multi-node cluster, creates a new topic with multiple partitions in that cluster, and uses it to demonstrate Apache Kafka's message load-balancing principle. Along the way I may use terms you are not yet familiar with (or do things you do not yet understand); that does not matter, you just have to follow the steps as given, since these terms and actions are explained later in the text.
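To make that plan a little more concrete, here is a rough sketch of the kind of command later used to create a multi-partition topic on this cluster. The topic name my_topic is only an illustrative placeholder, and the exact steps and their output are covered in the text that follows:

# create a topic with 2 partitions, replicated across both brokers (topic name "my_topic" is hypothetical)
kafka-topics.sh --create --zookeeper 192.168.61.140:2181 --topic my_topic --replication-factor 2 --partitions 2

# list the topics known to the cluster to confirm the creation
kafka-topics.sh --list --zookeeper 192.168.61.140:2181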
First we list the service nodes required in the Kafka cluster that will be installed, and the role each service node plays in it:
Node Location | Node Role
192.168.61.139 | Apache Kafka Broker 1
192.168.61.138 | Apache Kafka Broker 2
192.168.61.140 | Zookeeper Server
In this demonstration of the Apache Kafka cluster installation, we prepare two Apache Kafka Broker service nodes and use another node as the node that runs Zookeeper.
The Apache Kafka cluster relies on the Zookeeper service for coordination, so Zookeeper must be installed and running before Apache Kafka is installed. Because this article mainly introduces the working principles of Apache Kafka, how to install and use Zookeeper is not repeated here; readers who are unclear about it can refer to my other article: "Hadoop series: Zookeeper (1) -- Zookeeper single point and cluster installation". Here we run Zookeeper in single-node mode only; if you need to run an Apache Kafka cluster in a real production environment, the Zookeeper ensemble should contain at least 3 service nodes (each on a different physical machine).

4-1-2, Kafka cluster installation process

First, after installing Zookeeper on the 192.168.61.140 server, we can start it directly:
zkServer.sh start
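If you want to confirm that Zookeeper really started before continuing, a simple check along these lines should be enough (a sketch; the status subcommand reports the mode of the local Zookeeper process):

# in single-node mode this should report something like "Mode: standalone"
zkServer.sh status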
You can download the v0.8.x installation package from the official Apache Kafka website (http://kafka.apache.org/downloads.html). Do not download the v0.9.x installation package, because the consumer-side configuration properties changed quite a lot in v0.9.x. The explanations in this section are based on version 0.8.1.1, and all configuration attributes described are compatible with the v0.8.x versions (https://www.apache.org/dyn/closer.cgi?path=/kafka/0.8.1.1/kafka_2.10-0.8.1.1.tgz).
You can use the wget command directly, or download it via a browser (or third-party software):
wget https://www.apache.org/dyn/closer.cgi?path=/kafka/0.8.1.1/kafka_2.10-0.8.1.1.tgz
After downloading, run the following command to extract the compressed file:
tar -xvf ./kafka_2.10-0.8.1.1.tgz
The author is accustomed to placing running software under the /usr directory; you can place the unpacked directory according to your own habits or your team's conventions (in a formal environment it is not recommended to run Kafka under the root account):
mv /root/kafka_2.10-0.8.1.1 /usr/kafka_2.10-0.8.1.1/
All Apache Kafka management commands are stored in the ./bin directory under the installation path. So, for convenience in the management steps that follow, you can set the environment variable:
export PATH=/usr/kafka_2.10-0.8.1.1/bin:$PATH
# remember to add the same setting at the end of the /etc/profile file
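As a quick sanity check of the PATH setting (a sketch, assuming the installation path used above), reload the profile and confirm that the Kafka scripts can be found:

source /etc/profile
# should print /usr/kafka_2.10-0.8.1.1/bin/kafka-server-start.sh if the PATH is set correctly
which kafka-server-start.sh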
The Apache Kafka configuration files are stored under the ./config directory of the installation path, as shown below:
-rw-rw-r--. 1 root root 1202 April 2014 consumer.properties
-rw-rw-r--. 1 root root 3828 April 2014 log4j.properties
-rw-rw-r--. 1 root root 2217 April 2014 producer.properties
-rw-rw-r--. 1 root root 5322 April 23:32 server.properties
-rw-rw-r--. 1 root root 3326 April 2014 test-log4j.properties
-rw-rw-r--. 1 root root  995 April 2014 tools-log4j.properties
-rw-rw-r--. 1 root root 1023 April 2014 zookeeper.properties
For the Apache Kafka cluster installation, you only need to care about the "server.properties" configuration file (the roles of the other configuration files will be discussed later).
There is also a zookeeper.properties file in the directory, which I do not recommend using. This configuration file exists because Kafka ships with a built-in Zookeeper running environment; it is only used if you start that environment with the zookeeper-server-start.sh command that comes with Kafka.

Now start editing the server.properties configuration file. There are a number of default configuration entries in this file, but you do not have to change all of them. Below are the changes made to the configuration file; the attributes that need your primary attention carry additional comments (and of course the original annotations are retained):
# The id of the broker. This must be set to a unique integer for each broker.
# A very important attribute: the id of each broker in the Kafka cluster must be different,
# otherwise the broker will report an error at startup.
broker.id=2

# The port the socket server listens on
port=9092

# Hostname the broker will bind to. If not set, the server will bind to all interfaces
#host.name=localhost

# The number of threads handling network requests
num.network.threads=2

# The number of threads doing disk I/O
# As the name suggests, this is how many threads perform disk I/O at the same time.
# A larger value does not automatically mean better performance: as discussed in the later
# "storage" topic, if the file system you give Kafka has only one disk head working at the
# physical layer, this value becomes meaningless.
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=1048576

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=1048576

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600

# A comma separated list of directories under which to store log files
# Many developers pay no attention to this attribute when using Kafka; in fact, most of
# Kafka's performance depends on the file system you provide here.
log.dirs=/tmp/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across the brokers.
num.partitions=2

# The number of messages to accept before forcing a flush of data to disk
# Threshold for formally writing messages from the page cache to disk, based on the number
# of buffered messages.
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
# Threshold for formally writing messages from the page cache to disk, based on the
# buffering interval.
#log.flush.interval.ms=1000

# The minimum age of a log file to be eligible for deletion
# How long log (message) data is kept; the default is 168 hours.
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log as long as the
# remaining segments don't drop below log.retention.bytes.
# Default is 1GB; below this size the log file does not trigger the delete policy.
# In a real environment disk space is not a problem at all and memory space is large enough,
# so the author sets this value much larger, for example 100GB.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When it is reached a new log segment will be created.
# The default is 512MB; when a segment reaches this size, Kafka creates a new segment file
# for the partition.
log.segment.bytes=536870912

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
# How often (in milliseconds) the deletion/retention policy is checked.
# In a real production environment, a check every 6-12 hours is enough.
log.retention.check.interval.ms=60000

# By default the log cleaner is disabled and the log retention policy will default to just
# delete segments after their retention expires.
# If log.cleaner.enable=true is set the cleaner will be enabled and individual logs can
# then be marked for log compaction.
log.cleaner.enable=false

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is the root directory for all kafka znodes.
# Connection information for Zookeeper; if there are multiple Zookeeper service nodes,
# separate them with ',', for example: 127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002
zookeeper.connect=192.168.61.140:2181

# Timeout in ms for connecting to zookeeper
# Zookeeper connection timeout
zookeeper.connection.timeout.ms=1000000
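With server.properties adjusted on each node (remember that every broker must have its own broker.id), the brokers can be started. The lines below are only a sketch under the assumptions of this article (installation under /usr/kafka_2.10-0.8.1.1 and the PATH setting shown earlier); run them on 192.168.61.139 and 192.168.61.138 respectively:

cd /usr/kafka_2.10-0.8.1.1
# start the broker in the background and keep its output in a log file
nohup kafka-server-start.sh ./config/server.properties > /tmp/kafka-broker.log 2>&1 &
# watch the log to confirm the broker registered itself in Zookeeper without errors
tail -f /tmp/kafka-broker.log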