High-Throughput Distributed Publish/Subscribe Messaging System: Kafka
I. Overview of Kafka
Kafka is a high-throughput distributed publish/subscribe messaging system that can handle all the activity stream data of a consumer-scale website. Such activity (web browsing, searches, and other user actions) is a key ingredient of many social features on the modern web. Because of throughput requirements, this data is usually handled through log processing and log aggregation. For log data and offline analysis systems such as Hadoop, which are constrained when real-time processing is required, this is a feasible solution. Kafka aims to unify online and offline message processing through Hadoop's parallel loading mechanism, and to provide real-time consumption across a cluster of machines.
II. Kafka Terminology
- Broker: a Kafka cluster contains one or more servers, which are called brokers.
- Topic: every message published to a Kafka cluster belongs to a category called a Topic. (Messages of different Topics are stored separately on disk; logically, a Topic's messages may live on one or more brokers, but producers and consumers only need to specify the Topic; they do not need to care where the data is stored.)
- Partition: a physical concept; each Topic contains one or more Partitions.
- Producer: publishes messages to a Kafka broker.
- Consumer: the client that reads messages from a Kafka broker.
- Consumer Group: each Consumer belongs to a specific Consumer Group (a group name can be specified for each Consumer; if none is given, the Consumer belongs to the default group).
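To make the Consumer Group idea concrete: two console consumers started with the same group.id split a topic's partitions between them, while consumers in different groups each receive every message. A minimal sketch, assuming the cluster and testTopic created later in this article; the group name groupA and the config file path are hypothetical:

```shell
# Write a consumer config file that pins the group id
# (the name "groupA" is a hypothetical example)
cat > /tmp/groupA.properties <<'EOF'
group.id=groupA
EOF
# Start a consumer in that group (requires the running cluster from the
# sections below; run it in two terminals to see partition sharing):
# bin/kafka-console-consumer.sh --zookeeper 192.168.1.237:2181 \
#   --topic testTopic --consumer.config /tmp/groupA.properties
```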
III. Download and Install Kafka
1. Download
```shell
wget http://apache.fayea.com/kafka/0.9.0.1/kafka_2.11-0.9.0.1.tgz
```
2. Installation
```shell
tar zxvf kafka_2.11-0.9.0.1.tgz
cd kafka_2.11-0.9.0.1
```
3. Cluster configuration
The cluster uses two servers, 192.168.1.237 and 192.168.1.238. Each server runs ZooKeeper on port 2181 (ZooKeeper setup is not covered here), and each server is configured with three Kafka brokers.
3.1 Configure server.properties
```
broker.id=10
port=9090
host.name=192.168.1.237
advertised.host.name=192.168.1.237
log.dirs=/tmp/kafka-logs/server0
zookeeper.connect=192.168.1.237:2181,192.168.1.238:2181
```
Note: the host.name and advertised.host.name parameters must be set to IP addresses; otherwise various connection problems may occur.
3.2 Configure server1.properties
```shell
cp config/server.properties config/server1.properties
vim config/server1.properties
```
```
broker.id=11
port=9091
host.name=192.168.1.237
advertised.host.name=192.168.1.237
log.dirs=/tmp/kafka-logs/server1
zookeeper.connect=192.168.1.237:2181,192.168.1.238:2181
```
3.3 Configure server2.properties
```shell
cp config/server.properties config/server2.properties
vim config/server2.properties
```
```
broker.id=12
port=9092
host.name=192.168.1.237
advertised.host.name=192.168.1.237
log.dirs=/tmp/kafka-logs/server2
zookeeper.connect=192.168.1.237:2181,192.168.1.238:2181
```
Note: on the same server, each broker must use a different port and a different log.dirs; within a cluster, every broker.id must be unique.
3.4 Similarly, on the other server (192.168.1.238), configure server.properties, server1.properties, and server2.properties with broker.id set to 20, 21, and 22, ports 9090, 9091, and 9092, and host.name = advertised.host.name = 192.168.1.238.
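For example, the first broker's server.properties on 192.168.1.238 would then look roughly like this (a sketch assembled from the values above; the log.dirs path mirrors the layout used on 192.168.1.237):

```
broker.id=20
port=9090
host.name=192.168.1.238
advertised.host.name=192.168.1.238
log.dirs=/tmp/kafka-logs/server0
zookeeper.connect=192.168.1.237:2181,192.168.1.238:2181
```

The other two files on that server differ only in broker.id (21, 22), port (9091, 9092), and log.dirs (server1, server2).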
3.5 Start the brokers
```shell
bin/kafka-server-start.sh config/server.properties &
bin/kafka-server-start.sh config/server1.properties &
bin/kafka-server-start.sh config/server2.properties &
```
3.6 Check the listening ports
```shell
netstat -tunpl | grep 2181
netstat -tunpl | grep 9090
netstat -tunpl | grep 9091
netstat -tunpl | grep 9092
```
Verify that all four ports are listening. Also check whether iptables (or another firewall) is blocking these ports; if it is, Java clients will fail to connect.
IV. Testing
4.1 Create a Topic
```shell
bin/kafka-topics.sh --create --zookeeper 192.168.1.237:2181 --replication-factor 3 --partitions 1 --topic testTopic
```
4.2 View the topic's status
```shell
bin/kafka-topics.sh --describe --zookeeper 192.168.1.237:2181 --topic testTopic
```
4.3 Producer sends messages
```shell
bin/kafka-console-producer.sh --broker-list 192.168.1.237:9090 --topic testTopic
```
4.4 Consumer receives messages
```shell
bin/kafka-console-consumer.sh --zookeeper 192.168.1.237:2181 --from-beginning --topic testTopic
```
4.5 Check the consumer offset
```shell
bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zkconnect 192.168.1.237:2181 --group testTopic
```
V. Problems Encountered
1. An error is reported after the brokers have been running for a while:
```
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 986513408 bytes for committing reserved memory.
# An error report file with more information is saved as:
# //hs_err_pid6500.log
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000bad30000, 986513408, 0) failed; error='Cannot allocate memory' (errno=12)
```
Solution:
You can adjust the JVM heap size by editing kafka-server-start.sh, zookeeper-server-start.sh, and so on:

```shell
export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
```
The -Xms parameter specifies the minimum heap size. To get your server to at least start up, try changing it to use less memory. Given that you only have 512M, you should change the maximum heap size (-Xmx) too:

```shell
export KAFKA_HEAP_OPTS="-Xmx256M -Xms128M"
```