Kafka: Architecture Design of a Distributed Publish-Subscribe Messaging System


Why are we building this system?

Kafka is a messaging system originally developed at LinkedIn to serve as the basis for LinkedIn's activity stream and its operational data processing pipeline. It is now used by many kinds of companies for a variety of data pipelines and messaging workloads.

Activity stream data is among the most common data that all sites use to report on site usage. Activity data includes things such as page views, information about the content being viewed, and search queries. This kind of data is usually handled by logging the various activities to files and then periodically analyzing those files. Operational data refers to server performance metrics (CPU and I/O utilization, request times, service logs, and so on), and there is a wide variety of ways to process it statistically.

In recent years, activity and operational data processing has become a critical part of a website's product features, which requires a somewhat more sophisticated infrastructure to support it.


Several use cases for activity stream and operational data
    • News feeds: broadcasting information about your friends' various activities to you.

    • Relevance and ranking: determining the most relevant items in a given set by using counts, votes, or click-through rates (CTR).

    • Security: websites need to block misbehaving web crawlers, rate-limit API usage, detect attempts to spread spam, and support other detection and prevention systems that cut off unusual site activity.

    • Operational monitoring: most websites need some form of real-time, flexible monitoring of a site's operational efficiency that can trigger alerts when problems occur.

    • Reporting and batch processing: it is common to load data into a data warehouse or a Hadoop system for offline analysis and then produce business reports from it.

Characteristics of activity stream data

This high-throughput stream of immutable activity data represents a real computational challenge, because its volume can easily be 10 to 100 times larger than that of the next-largest data source on a site.

Traditional log-file analysis is a fine, scalable approach for reporting and batch processing, but it is too high-latency for real-time processing and carries a high degree of operational complexity. Existing messaging and queuing systems, on the other hand, work well for real-time and near-real-time use cases, but handle large unconsumed message backlogs poorly and often treat data persistence as an afterthought. This creates problems when a large amount of data must be fed to offline systems such as Hadoop, which may only consume a data source once per hour or once per day. Kafka's goal is to be a single queuing platform that can support both offline and online use cases.

Kafka supports fairly general messaging semantics. Although this article is written primarily with activity processing in mind, nothing restricts Kafka to that purpose.


Deployment


The following is an example of the topology formed by the various systems as deployed at LinkedIn.


It is important to note that a single Kafka cluster handles all activity data from all the different sources. It provides a single pipeline of data for both online and offline consumers, acting as a buffer layer between live activity and asynchronous processing. We also use Kafka to replicate all data to a different data center for offline processing.

Rather than having a single Kafka cluster span multiple data centers, we want Kafka to support multi-data-center data flow topologies. This is done by mirroring, or "syncing," between clusters. The feature is very simple: the mirror cluster just runs as a consumer of the source cluster. This means that a single cluster can centralize data from many data centers into one location. An example of a multi-data-center topology used to support batch loads is shown below:
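As a rough illustration of the mirroring idea (a toy sketch with hypothetical classes, not Kafka's actual implementation), the mirror simply consumes everything past its last offset from the source cluster's log and republishes it locally:

```python
class Cluster:
    """Toy stand-in for a Kafka cluster: an append-only message log."""
    def __init__(self):
        self.log = []

    def append(self, msg):
        self.log.append(msg)

    def read_from(self, offset):
        return self.log[offset:]

def mirror(source, target, offset=0):
    """The mirror cluster is just a consumer of the source cluster:
    read new messages past the last mirrored offset, re-publish locally."""
    new_msgs = source.read_from(offset)
    for msg in new_msgs:
        target.append(msg)
    return offset + len(new_msgs)

src, dst = Cluster(), Cluster()
for m in ["pageview:1", "pageview:2"]:
    src.append(m)
pos = mirror(src, dst)        # dst now holds both messages
src.append("pageview:3")
pos = mirror(src, dst, pos)   # only the new message is copied over
```

Because the mirror is an ordinary consumer, it needs no special coordination with the source cluster beyond remembering its own offset.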

Note that there is no communication link between the two source clusters in the upper part of the diagram; they may be of different sizes and have different numbers of nodes. The single cluster in the lower part of the diagram can mirror any number of source clusters.


The main design elements

Kafka differs from most other messaging systems because of a few important design decisions:

    1. Kafka is designed to treat message persistence as the common case.

    2. The primary design constraint is throughput rather than features.

    3. State about what has been consumed is kept by the consumer, not the server.

    4. Kafka is explicitly distributed. It assumes that producers, brokers, and consumers are spread over multiple machines.
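Design decision 3 can be sketched with a toy model (hypothetical classes, not Kafka's API): the broker is just an addressable log, and the consumer itself carries the offset recording how far it has read:

```python
class Broker:
    """Toy broker: an append-only log with no per-consumer bookkeeping."""
    def __init__(self):
        self.log = []

    def publish(self, msg):
        self.log.append(msg)

    def fetch(self, offset, max_msgs=2):
        # The broker serves whatever range is requested; it does not
        # track which consumer has read what.
        return self.log[offset:offset + max_msgs]

class Consumer:
    """The consumer, not the server, remembers how far it has read."""
    def __init__(self, broker):
        self.broker, self.offset = broker, 0

    def poll(self):
        batch = self.broker.fetch(self.offset)
        self.offset += len(batch)   # consumed-state lives client-side
        return batch

b = Broker()
for m in ("m1", "m2", "m3"):
    b.publish(m)
c = Consumer(b)
first, second = c.poll(), c.poll()
```

Keeping the offset on the consumer side frees the server from per-consumer state and lets a consumer rewind and re-read data at will.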

Each of these design decisions is discussed in detail below.


Basic knowledge


First look at some basic terminology and concepts.

A message is the basic unit of communication. Messages are published to a topic by a producer, which means that a message is physically sent to a server acting as a broker (probably a different machine). A number of consumers subscribe to a topic, and every message published to that topic is then delivered to all of those consumers.

Kafka is explicitly distributed: producers, consumers, and brokers can all run on different machines in a cluster that cooperates as a single logical unit. For brokers and producers this comes naturally, but consumers require some particular support. Each consumer process belongs to a consumer group, and each message is delivered to exactly one process within every consumer group. A consumer group therefore lets many processes or machines logically act as a single consumer. The consumer group concept is quite powerful and can be used to support both the queue and topic semantics found in JMS. To get queue semantics, all consumers are placed in a single consumer group, in which case each message goes to exactly one consumer. To get topic semantics, each consumer is placed in its own consumer group, and then every consumer receives every message. In our usage, a more common pattern is to have multiple logical consumer groups, each consisting of a cluster of machines that act as a logical whole. At large data volumes Kafka has an additional benefit: no matter how many consumers subscribe to a topic, each message is stored only once.
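The queue-versus-topic distinction can be shown with a minimal simulation (not real Kafka client code): each message is delivered to exactly one member of every group, so the grouping alone determines the semantics.

```python
import itertools

def deliver(message, consumer_groups):
    """Deliver a message to exactly one consumer in every group."""
    deliveries = []
    for group_name, members in consumer_groups.items():
        consumer = next(members)   # round-robin within the group
        deliveries.append((group_name, consumer))
    return deliveries

# Queue semantics: all consumers share one group, so each message
# reaches exactly one of them.
queue = {"workers": itertools.cycle(["w1", "w2"])}

# Topic semantics: every consumer is its own group, so each message
# reaches all of them.
topic = {"g1": itertools.cycle(["c1"]), "g2": itertools.cycle(["c2"])}

queue_delivery = deliver("pageview", queue)   # one worker gets it
topic_delivery = deliver("pageview", topic)   # both c1 and c2 get it
```

The same delivery rule covers both cases; only the way consumers are grouped changes.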

Message persistence and caching: don't fear the file system!


Kafka relies heavily on the file system for storing and caching messages. There is a general perception that "disks are slow," which makes people skeptical that a persistent structure can offer competitive performance. In fact, disks are both much slower and much faster than people expect, depending on how they are used. A properly designed on-disk structure can often be as fast as the network.

One key fact about disk performance is that over the past decade or so, the throughput of hard drives has been diverging from their seek latency. As a result, on a RAID-5 array of six 7200 rpm SATA drives, linear writes run at about 300 MB/sec, while random writes run at only about 50 KB/sec, a difference of roughly 6,000 times. Linear reads and writes are the most predictable of all usage patterns, so operating systems detect and optimize them using read-ahead and write-behind techniques: read-ahead fetches large disk blocks into memory in advance, and write-behind groups small logical writes into larger physical writes. In some cases, sequential disk access can even be faster than random memory access!
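The access-pattern difference can be sketched with a small micro-benchmark (illustrative only: on a machine whose page cache absorbs a small test file, the gap will be far narrower than on raw spinning disks, where random writes pay a seek per block):

```python
import os
import random
import tempfile
import time

BLOCK = 4096   # write in 4 KB blocks
COUNT = 2000   # 2000 blocks = an 8 MB test file

def timed_writes(sequential):
    """Write COUNT blocks either linearly or at random offsets,
    and return the elapsed time in seconds."""
    data = os.urandom(BLOCK)
    with tempfile.NamedTemporaryFile() as f:
        f.truncate(BLOCK * COUNT)   # preallocate the file
        start = time.perf_counter()
        for i in range(COUNT):
            pos = i if sequential else random.randrange(COUNT)
            f.seek(pos * BLOCK)
            f.write(data)
        f.flush()
        os.fsync(f.fileno())        # force the data out to the device
        return time.perf_counter() - start

seq_time = timed_writes(True)
rnd_time = timed_writes(False)
print(f"sequential: {seq_time:.4f}s  random: {rnd_time:.4f}s")
```

On rotational media with a file larger than memory, the sequential pattern is the one the OS can stream and batch; the random pattern degenerates toward one seek per write.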

To compensate for this performance divergence, modern operating systems have become increasingly aggressive about using main memory as a disk cache. A modern OS will happily divert all free memory to disk caching, at little performance cost when that memory needs to be reclaimed. All disk reads and writes go through this unified cache, and the feature cannot easily be bypassed without using direct I/O. So even if a process keeps its own in-process copy of the data, that data will likely also be duplicated in the OS page cache, effectively storing everything twice.


Furthermore, we are building on top of the JVM, and anyone who has spent time on Java memory usage knows two things:

    1. The memory overhead of Java objects is very high, often doubling (or worse) the size of the data actually stored.

    2. Java garbage collection becomes increasingly slow and expensive as the amount of in-heap data grows.

As a result of these factors, using the file system and relying on the page cache is superior to maintaining an in-memory cache or other structure: we at least double the available cache simply by gaining automatic access to all free memory, and likely double it again by storing a compact byte representation rather than individual objects. Doing so yields a cache of up to 28 to 30 GB on a 32 GB machine, without any GC penalty. Moreover, this cache stays warm across a service restart, whereas an in-process cache would need to be rebuilt in memory (which could take 10 minutes for a 10 GB cache) or else start completely cold (which likely means terrible initial performance). This also greatly simplifies the code, because all the logic for maintaining coherency between the cache and the file system now lives in the OS, which tends to do it more efficiently and more correctly than one-off in-process attempts. If your disk usage favors linear reads, then read-ahead very efficiently pre-populates this cache with useful data on each disk read.
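The object-overhead effect is easy to demonstrate in any managed runtime. Here is a small Python analogy (Python's per-object bookkeeping stands in for the Java object headers the text describes): a thousand integers stored as individual objects versus the same values packed into a flat byte buffer.

```python
import sys

values = list(range(1000))

# Size as a list of boxed objects: the list's own storage plus the
# per-object cost of every integer it references.
obj_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# Size as a compact byte representation: 4 bytes per value, no headers.
packed = b"".join(v.to_bytes(4, "little") for v in values)

print(f"boxed objects: {obj_bytes} bytes, packed: {len(packed)} bytes")
```

The boxed representation is several times larger than the raw payload, which is exactly why keeping compressed bytes in the page cache stretches the same RAM much further than keeping deserialized objects on the heap.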

