Kafka Performance Tuning

Main Optimization Principles and Ideas

Kafka is a high-throughput distributed messaging system that also provides persistence. Its high performance rests on two key properties:

    • Sequential disk reads and writes are much faster than random reads and writes.
    • Concurrency: splitting a topic into multiple partitions.

To get the full performance out of Kafka, both of these conditions must be met.

Kafka reads and writes at the granularity of a partition, so splitting a topic into multiple partitions can improve throughput. There is a prerequisite, however: the partitions need to be located on different disks (they can be on the same machine). If several partitions share one disk, multiple processes are reading and writing multiple files on that disk at the same time, forcing the operating system to schedule disk access frequently and destroying the continuity of sequential reads and writes.

In LinkedIn's tests, each machine was fitted with 6 disks without RAID, precisely to make full use of concurrent reads and writes across multiple disks while preserving sequential access on each individual disk.

Concretely, you configure directories on different disks in the broker's log.dirs, for example:
log.dirs=/disk1/kafka-logs,/disk2/kafka-logs,/disk3/kafka-logs
When creating a new partition, Kafka places it in the directory that currently contains the fewest partitions, so you should generally not list multiple directories on the same disk in log.dirs.

Within a consumer group, consumers and partitions must maintain a one-to-one consumption relationship at any given moment.

Any partition can be consumed by only one consumer in a consumer group at a time (conversely, a single consumer can consume multiple partitions simultaneously).

JVM Parameter Configuration

It is recommended to use the newer G1 garbage collector instead of CMS.
The recommended minimum JDK version is 1.7u51. The following are the JVM memory parameters used for the broker in this experiment:

-Xms30g -Xmx30g -XX:PermSize=48m -XX:MaxPermSize=48m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35

Advantages of G1 over CMS:

    • G1 is a server-style garbage collector that balances throughput and responsiveness.
    • The Eden, Survivor, and old areas are no longer fixed in size, so memory is used more efficiently; by dividing the heap into regions, G1 effectively avoids memory fragmentation.
    • G1 lets you specify a target pause time for GC (not strictly guaranteed to be met), whereas CMS provides no such control.
    • CMS compacts memory only after a full GC, while G1 performs collection and compaction together.
    • CMS can only be used on the old area and is generally paired with ParNew for cleaning the young area, whereas G1 applies a unified collection algorithm to both kinds of regions.

G1 Application Scenarios:

    • The JVM uses a large heap (at least 4 GB).
    • The application itself frequently allocates and frees memory, generating large amounts of memory fragmentation.
    • The application is sensitive to GC pause times.

JVM parameters in detail: http://blog.csdn.net/lizhitao/article/details/44677659

Broker Parameter Configuration

Configuration optimization means modifying parameter values in the server.properties file.

1. Network and IO operation thread Configuration optimization

# Maximum number of threads the broker uses to process network messages
num.network.threads=xxx
# Number of threads the broker uses for disk IO
num.io.threads=xxx

Recommended configuration:

num.network.threads is the number of threads used to receive and process network requests; the default is 3. Internally it is implemented with the selector model: one thread is started as an acceptor responsible for establishing connections, and then num.network.threads threads take turns reading requests from the sockets. It generally does not need changing, unless upstream and downstream concurrency is very high. Since these threads mainly handle network IO, reading and writing buffered data with essentially no IO wait, a good setting is the number of CPU cores plus 1.

num.io.threads mainly performs disk IO and may experience some IO wait at peak times, so it needs a larger value: twice the number of CPU cores as a baseline, and no more than three times at most.
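For example, on a hypothetical 8-core broker (the core count here is an assumption for illustration, not from the source), the sizing rules above work out to:

```properties
# network threads: CPU cores + 1 -> 8 + 1 = 9
num.network.threads=9
# io threads: 2x cores as a baseline (16), never more than 3x (24)
num.io.threads=16
```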

2. Log data file flush policy
To significantly increase producer write throughput, data files need to be flushed in batches at regular intervals rather than on every write.
Recommended configuration:

# Flush data to disk every time the producer has written 10,000 messages
log.flush.interval.messages=10000
# Flush data to disk every 1 second
log.flush.interval.ms=1000

3. Log Retention policy configuration

When the Kafka server is receiving a large volume of messages, it generates many data files that occupy a lot of disk space; if they are not cleaned up in time, disk space may run out. Kafka's default retention is 7 days.
Recommended configuration:

# Retain for three days; this can also be shorter
log.retention.hours=72
# A segment file size of 1 GB helps reclaim disk space quickly and speeds up loading when Kafka restarts
# (if the files are too small there will be many of them, and at startup Kafka scans all data files
# under the log directory (log.dir) in a single thread)
log.segment.bytes=1073741824

Tips

    • Kafka officially does not recommend using the broker-side log.flush.interval.messages and log.flush.interval.ms to force flushing to disk; its position is that data reliability should be guaranteed by replicas, and that forcibly flushing data to disk affects overall performance.
    • You can tune performance by adjusting /proc/sys/vm/dirty_background_ratio and /proc/sys/vm/dirty_ratio.

      1. When the dirty page ratio exceeds the first threshold, pdflush starts flushing the dirty page cache.
      2. When the dirty page ratio exceeds the second threshold, all write operations block until the flush completes.
      3. Depending on business requirements, it can be appropriate to lower dirty_background_ratio and raise dirty_ratio.
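That tuning can be sketched as follows; the specific values are illustrative assumptions, not recommendations from the source, and applying them requires root:

```shell
# Inspect the current thresholds (percent of memory allowed to hold dirty pages)
cat /proc/sys/vm/dirty_background_ratio
cat /proc/sys/vm/dirty_ratio

# Illustrative change: start background writeback earlier, block writers later
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=60
```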

If a topic carries a small volume of data, consider lowering log.flush.interval.ms and log.flush.interval.messages to force flushing, reducing the inconsistency that could result from cached data not yet having been written to disk.

4. Configuring the JMX service
The Kafka server does not open a JMX port by default; the user needs to configure it:

[lizhitao@root kafka_2.10-0.8.1]$ vim bin/kafka-run-class.sh
# add this line at the very top
JMX_PORT=8060

5. Replica-related configuration:

replica.lag.time.max.ms=10000
replica.lag.max.messages=4000
num.replica.fetchers=1
# Replicas start a number of fetch threads to sync the corresponding data locally;
# num.replica.fetchers controls the number of these fetch threads.
# The multiple fetchers started per partition share the offset, which both preserves the
# one-to-one relationship between consumer and partition at any given moment and lets us
# raise efficiency by adding fetch threads.
default.replication.factor=1
# The default number of replicas when a new topic is created.
# Too few replicas hurt data availability; too many waste storage. 2-3 is generally recommended.

6. Purgatory

fetch.purgatory.purge.interval.requests=1000
producer.purgatory.purge.interval.requests=1000

Let's start by introducing what this "purgatory" is for. One of the broker's main tasks is to receive and process requests sent over the network. Some requests can be answered immediately, and naturally those are answered directly. Others cannot be answered right away, or are deliberately given a delayed response (for example, batched sends and receives); the broker puts such requests into the purgatory, and every request added to the purgatory is also added to two monitoring queues:

    • Watchers queue: used to check whether a request has been satisfied.
    • DelayedQueue: used to detect whether a request has timed out.

A request has only one final state: complete. Both fulfilled requests and timed-out requests are ultimately treated uniformly as complete.

The purgatory design in the current version has a flaw: when a request's state transitions to complete, it is not immediately removed from the purgatory but continues to occupy resources, so the accumulating memory usage can eventually cause an OOM. This is typically triggered only when topic traffic is low. More detailed information can be found in the extended reading; we will not expand on it here.

In actual use I have stepped into this pit myself. The situation was a new topic on a newly deployed cluster with very little initial data (a low-volume topic); as a result, during that period brokers would randomly die of OOM at around 3 or 4 in the morning. After locating the cause, lowering the *.purgatory.purge.interval.requests configuration to 100 solved the problem.
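Expressed as server.properties entries, the adjustment described above was:

```properties
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100
```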

Kafka's development team has begun redesigning the purgatory so that requests are removed from it immediately upon completion.

7. Other
num.partitions=1
# Default number of partitions
queued.max.requests=500
# Maximum capacity of the queue used to buffer network requests; once the queue reaches
# this limit, no new requests are accepted. It rarely becomes a bottleneck unless I/O
# performance is very poor, in which case tune it together with num.io.threads and
# related settings.
compression.codec=none
# Whether, and with which algorithm, messages are compressed when persisted. Usually
# messages sent by the producer are stored as-is, without changing the compression.
min.insync.replicas=1
# This parameter can only be configured at the topic level. It specifies the minimum
# number of ISR replicas that must acknowledge each producer write, and is generally
# used together with request.required.acks. Note that setting it too high can sharply
# reduce throughput.
Producer Optimization

buffer.memory=33554432 (32 MB)
# Size of the buffer on the producer side that holds messages not yet sent.
# After the buffer is full, sends either block or throw an exception, as determined
# by the block.on.buffer.full configuration.
compression.type=none
# No compression is applied by default when sending; configuring a suitable compression
# algorithm is recommended, as it can greatly reduce network pressure and the broker's
# storage pressure.
linger.ms=0
# By default the producer aggregates all requests collected between two sends into a
# single send, which increases throughput; linger.ms goes a step further, aggregating
# more messages by adding a small delay to each send.
batch.size=16384
# The producer tries to merge multiple requests destined for the same partition;
# batch.size is the upper limit on the total size of a merged batch. If this value is
# set too small, requests may not be batched at all.
acks=1
# Whether the broker must return an acknowledgement after a message is sent.
# 0: no acknowledgement needed; fastest, with a risk of data loss.
# 1: only the leader must acknowledge, with no ISR confirmation; a compromise between
#    efficiency and safety.
# all: all replicas in the ISR must acknowledge; slowest and safest, but since the ISR
#    may shrink to a single replica, setting acks=all does not necessarily prevent
#    data loss.
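Putting the producer parameters above together, a throughput-oriented configuration might look like this sketch (the specific values are illustrative assumptions, not recommendations from the source):

```properties
# trade a little latency for larger batches
linger.ms=5
batch.size=65536
# compression relieves network and broker storage pressure at modest CPU cost
compression.type=snappy
# leader-only acknowledgement: the compromise setting described above
acks=1
```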
Consumer Optimization

num.consumer.fetchers=1
# Number of consumer fetcher threads to start; increasing it appropriately can raise concurrency.
fetch.min.bytes=1
# Minimum number of bytes a fetch request must obtain before it may return.
fetch.wait.max.ms=100
# Maximum time to wait for the fetched data to reach fetch.min.bytes; this corresponds
# to the request timeout in the purgatory described above.
    • With consumer groups, Kafka supports both publish/subscribe and queue access modes.
    • The consumer API comes in two flavors, high-level and low-level. The first depends heavily on ZooKeeper, so its performance is poorer and it offers little freedom, but it is extremely low-maintenance. The second does not depend on the ZooKeeper service and is better in both freedom and performance, but all exceptional conditions (leader migration, offset out of range, broker downtime, and so on) and the maintenance of offsets must be handled by the application itself.
    • Watch for the upcoming 0.9 release: the developers have rewritten the consumer in Java, merging the two APIs and removing the dependency on ZooKeeper, reportedly with greatly improved performance.
List of All Parameter Configurations

Broker default parameters and a complete list of configurable parameters:
http://blog.csdn.net/lizhitao/article/details/25667831

Kafka principles, basic concepts, and the full parameter lists for broker, producer, consumer, and topic:
http://blog.csdn.net/suifeng3051/article/details/48053965

Reference

http://bbs.umeng.com/thread-12479-1-1.html
http://www.jasongj.com/2015/01/02/Kafka%E6%B7%B1%E5%BA%A6%E8%A7%A3%E6%9E%90/

Official documents:
http://kafka.apache.org/documentation.html#configuration
