Kafka Series Tutorial 2 (Design and Principles, Part 1)

Source: Internet
Author: User

Kafka uses a number of unconventional yet practical designs to make it efficient and scalable. In practice, Kafka outperforms popular messaging systems and can process hundreds of gigabytes of new data every day.

Applications such as search, recommendations, and advertising rely on activity data collected in real time. They need to compute large numbers of fine-grained click-through rates, which means recording not only the clicks but also the pages that were not clicked. Every day, Facebook gathers about 6TB of user behavior events and China Mobile generates about 5-8TB of call records.

Early systems processed this data by scraping log files off the production servers for offline analysis. More recently, several distributed log collectors have been built, such as Facebook's Scribe, Yahoo's Data Highway, and Cloudera's Flume. These systems are primarily designed to collect log data and load it into a data warehouse or Hadoop for offline consumption. At LinkedIn (a social network), however, offline analysis is not enough: we also have many real-time applications that can tolerate a delay of only a few seconds.

So we built Kafka, which combines the advantages of traditional log collectors and messaging systems. On the one hand, Kafka is distributed and scalable and provides high throughput. On the other hand, Kafka provides an API similar to that of a messaging system and lets applications consume data in real time. Kafka has been running successfully in the LinkedIn production environment for more than 6 months.

Related Work

Traditional messaging systems are used to process data flows asynchronously. However, they are not a good fit for log processing, for the following reasons:

1. Their features are mismatched. Traditional enterprise messaging systems focus on delivery guarantees. For example, WebSphere MQ supports transactions that allow an application to insert messages into multiple queues atomically, and the JMS specification allows each individual message to be acknowledged after consumption, potentially out of order. These guarantees are often overkill for log collection, where the occasional loss of an event does not matter.
2. Most systems do not focus on high throughput. For example, JMS has no API that lets a producer explicitly batch multiple messages into a single request.
3. These systems have weak distributed support. There is no easy way to partition and store messages across multiple machines.
4. Most systems assume that messages are consumed almost immediately, so the queue of unconsumed messages is usually very small. Once messages accumulate, performance drops significantly.

A number of dedicated log collectors have also been available for some time, such as Facebook's Scribe and Yahoo's Data Highway project. Cloudera's Flume is a relatively new log collector that supports extensible pipes and sinks, which makes streaming log data very flexible, and it also supports distributed deployment. However, most of these systems are built for offline consumption and expose unnecessary implementation details to the consumer (such as "minute files"). In addition, most of them use a "push" model in which the broker pushes messages to consumers. At LinkedIn the "pull" model works better: each consumer retrieves messages at the highest rate it can sustain and is not flooded by messages pushed faster than it can process them. The pull model also makes it easy to "rewind" a consumer, that is, to fetch the data of the same period more than once.

Recently, Yahoo developed a pub/sub system called Hedwig, which is scalable and reliable and offers strong durability guarantees. However, it is mainly intended for storing the commit log of a data store.

Kafka Structure and Principles

We use several techniques to make the system efficient.

1. Simple storage

Kafka has a very simple storage layout. Each partition corresponds to a logical log. Physically, a log is made up of multiple segment files of approximately the same size (for example, 1GB). Each time a producer publishes a message to a partition, the broker simply appends the message to the last segment file. For better performance, a segment file is flushed to disk only after a configurable number of messages has been published or a certain amount of time has elapsed. Only flushed messages are visible to consumers.

Unlike typical messaging systems, a message stored in Kafka has no explicit message ID. Instead, each message is addressed by its logical offset in the log. This avoids the overhead of maintaining a seek-intensive, random-access index structure that maps message IDs to actual message locations. Message IDs are increasing but not consecutive: to get the ID of the next message, add the length of the current message to its ID. Message IDs and offsets can therefore be used interchangeably.

A consumer always consumes a partition sequentially. If the consumer acknowledges an offset in a partition, it has received all messages before that offset. Under the covers, the consumer issues asynchronous pull requests to the broker so that a buffer of data is always ready for the application to consume. Each pull request contains the offset of the first message to fetch and the maximum number of bytes to return (note: bytes, not messages). Each broker keeps a sorted list of offsets in memory, containing the offset of the first message in every segment file. The broker searches this list to find the segment file that holds the requested message, and then sends the data back to the consumer.
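
The offset lookup described above can be illustrated with a small sketch. It is not Kafka's actual code: the class name PartitionLog, the function next_offset, and the sample offsets are all hypothetical, and the sorted list is searched with a plain binary search from Python's standard library.

```python
import bisect


class PartitionLog:
    """Toy model of the in-memory segment index of one partition (illustrative only)."""

    def __init__(self, segment_base_offsets):
        # Sorted offsets of the first message in each segment file.
        self.base_offsets = sorted(segment_base_offsets)

    def find_segment(self, offset):
        """Return the base offset of the segment file that contains `offset`."""
        # bisect_right returns the index of the first base offset greater than
        # `offset`, so the segment we want is the one just before that index.
        i = bisect.bisect_right(self.base_offsets, offset) - 1
        if i < 0:
            raise ValueError("offset precedes the oldest retained segment")
        return self.base_offsets[i]


def next_offset(current_offset, message_length):
    """Offsets increase but are not consecutive: next id = current id + length."""
    return current_offset + message_length


log = PartitionLog([0, 1_000, 2_500])   # three hypothetical segment files
print(log.find_segment(1_200))          # 1000
print(next_offset(1_200, 37))           # 1237
```

Because the list holds only one entry per segment file, it stays small enough to keep in memory even for a very large log.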
After the consumer receives a message, it computes the offset of the next message to consume and uses it as the parameter of the next pull request.

[Figure: layout of a Kafka log and its in-memory index]

2. Efficient transfer

We are very careful about how data moves in and out of Kafka. As mentioned, a producer can submit a batch of messages in a single request. Although the consumer API iterates over messages one at a time, under the covers each pull request also fetches a batch of messages.

Another unconventional choice is that we avoid explicitly caching messages in the Kafka layer. Instead, we rely on the page cache of the underlying file system. This avoids double buffering: a message is cached only once, in the page cache. It has the added advantage that the cache stays warm even after a Kafka broker restarts. Because Kafka does not cache messages in process at all, its garbage collection overhead is very small, which makes an efficient implementation in a VM-based language feasible. Finally, since both producers and consumers access the segment files sequentially, with consumers usually lagging only a little behind producers, the normal caching heuristics of the operating system (write-through caching and read-ahead) work very well. We found that both production and consumption performance stay linear in the data size, up to many terabytes of data.

In addition, we optimize network access for consumers. Kafka is designed so that multiple consumers can read the same messages. Sending data from a local file to a remote socket typically involves the following steps:

1. The OS reads data from disk into the page cache.
2. The data is copied from the page cache to an application buffer.
3. The application buffer is copied to another kernel buffer.
4. The kernel buffer is sent to the socket.

This involves four data copies and two system calls. Many operating systems provide a sendfile API that can transfer data directly from a file channel to a socket channel.
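
As a rough illustration of the sendfile idea, the sketch below sends a byte range of a file over a socket with Python's socket.sendfile(), which uses the operating system's sendfile call where it is available and otherwise falls back to ordinary send(). The helper send_segment, the temporary file, and the socket pair are stand-ins invented for this example; Kafka itself is not written in Python.

```python
import os
import socket
import tempfile


def send_segment(sock, path, offset, count):
    """Send `count` bytes of the file at `path`, starting at `offset`, over `sock`."""
    with open(path, "rb") as f:
        # No read() into an application buffer: the bytes are handed from the
        # file to the socket by the OS when os.sendfile is available.
        return sock.sendfile(f, offset=offset, count=count)


# A hypothetical segment file holding three 6-byte messages.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"msg-0|msg-1|msg-2|")
    segment_path = tmp.name

# A connected socket pair stands in for the broker-to-consumer connection.
broker_side, consumer_side = socket.socketpair()
sent = send_segment(broker_side, segment_path, offset=6, count=12)
broker_side.close()

# Read until the broker side signals end of stream.
chunks = []
while True:
    chunk = consumer_side.recv(1024)
    if not chunk:
        break
    chunks.append(chunk)
received = b"".join(chunks)
consumer_side.close()
os.remove(segment_path)

print(sent, received)
```

The request shape mirrors a pull request: a starting offset and a byte count, not a message count.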
Using sendfile avoids two of the copies and one system call (those of steps 2 and 3). Kafka uses the sendfile API to deliver data efficiently from segment files to consumers.

3. Stateless broker

Unlike other messaging systems, the Kafka broker does not keep track of how much each consumer has consumed. Consumers maintain that state themselves. This design removes a lot of complexity and overhead from the broker, but it makes deleting messages tricky, because the broker does not know whether all subscribers have consumed a message. Kafka therefore uses a simple time-based retention policy (a time-based SLA) enforced on the server side: a message is automatically deleted once it has been retained longer than a certain period, typically 7 days. This works well in practice, because Kafka's performance does not degrade as the volume of stored data grows.

A side benefit of this design is that a consumer can deliberately go back and re-consume data that it has already consumed. This violates the usual contract of a queue, but it has proven to be an essential feature for many consumers. For example, when a logic error occurs on the consumer side, the application can replay the affected messages once the error is fixed. This is particularly important when loading ETL data into a data warehouse or a Hadoop system. As another example, a consumer may flush consumed data to persistent storage only periodically. If the consumer crashes, the unflushed data is lost. To handle this, the consumer can checkpoint the smallest offset of the unflushed messages and resume consuming from that checkpoint after it restarts. Note that rewinding is much easier to support with the pull model than with the push model.

Distributed Coordination

Kafka has the concept of a consumer group. Each consumer group consists of one or more consumers that jointly consume a set of topics.
That is, each message is delivered to only one consumer within a group, while different consumer groups each receive all the messages, and no coordination is needed across groups. Consumers in the same group can run in different processes or even on different machines. Our goal is to divide the messages stored on the brokers evenly among these consumers without much coordination overhead.

The first decision is that, at any given time, each partition of a topic is consumed by only one consumer within a consumer group. If multiple consumers consumed the same partition concurrently, they would incur extra overhead (coordinating who consumes which message, plus the cost of locking and state maintenance). In our design, consuming processes need to coordinate only when the load is rebalanced, that is, when partitions are reassigned to consumers. This happens infrequently, so the overhead is negligible. For load balancing to be effective, we require a topic to have more partitions than there are consumers in a group.

The second decision is not to have a central master node, but to let consumers coordinate among themselves in a decentralized fashion. A master node would complicate the system, because we would also have to handle failures of the master. To support this coordination we use ZooKeeper. Kafka uses ZooKeeper to:

1. Detect the addition and removal of brokers and consumers.
2. Trigger a rebalance in each consumer process when such an event occurs.
3. Maintain the consumption relationship and track the consumed offset of each partition.

Specifically, when a broker or consumer starts up, it registers its information in ZooKeeper: for a broker, its host name and port and the topics and partitions it stores; for a consumer, the consumer group it belongs to and the topics it subscribes to.
Each consumer group is associated with an ownership registry and an offset registry in ZooKeeper. The ownership registry has an entry for every subscribed partition, recording which consumer is currently consuming it (that consumer is said to own the partition). The offset registry stores, for each subscribed partition, the offset that has been consumed so far. The broker, consumer, and ownership registrations use ephemeral paths, while the offset registration uses a persistent path. If a broker fails, all of its partitions are automatically removed from the registry. If a consumer fails, it loses its consumer registration and the partitions it owns. Each consumer watches both the broker registrations and the consumer registrations, and is notified whenever the set of brokers or the consumer group changes.

When a consumer starts up, or is notified of a change in the brokers or in the other consumers, it starts a rebalance process to determine which partitions it should now consume. The algorithm works as follows. The consumer reads the broker and consumer registrations from ZooKeeper and computes PT, the set of available partitions of the topic, and CT, the set of consumers currently subscribed to it. It then divides PT into |CT| chunks of |PT|/|CT| partitions each. The consumer's position in CT determines which chunk it takes, and it writes itself as the owner of those partitions into the ownership registry. Finally, the consumer starts a thread for each assigned partition to pull data from it, beginning at the offset stored in the offset registry. As messages are pulled, the consumer periodically updates the latest consumed offset in the offset registry.

When a consumer group contains several consumers, each of them is notified of a change in the brokers or consumers, but the notification may reach them at slightly different times. As a result, one consumer may try to take ownership of a partition that is still owned by another consumer.
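
The assignment step of the rebalance algorithm can be sketched as a pure function. This is only an illustration under stated assumptions: the function name assign_partitions and the sample identifiers are hypothetical, both lists are sorted so that every consumer computes the same ordering, and when |PT| is not a multiple of |CT| the leftover partitions go to the first consumers (the text above does not say how remainders are handled). The ZooKeeper reads and writes are left out.

```python
def assign_partitions(partitions, consumers, me):
    """Return the partitions of one topic that consumer `me` should own."""
    pt = sorted(partitions)   # PT: available partitions of the topic
    ct = sorted(consumers)    # CT: consumers in the group subscribed to it
    j = ct.index(me)          # position of this consumer in CT
    n, extra = divmod(len(pt), len(ct))
    # Assumption: the first `extra` consumers take one additional partition.
    start = j * n + min(j, extra)
    end = start + n + (1 if j < extra else 0)
    return pt[start:end]


# Hypothetical cluster: 2 brokers with 4 partitions each, 3 consumers in a group.
partitions = [f"broker{b}-part{p}" for b in (1, 2) for p in range(4)]
group = ["consumer-a", "consumer-b", "consumer-c"]
for consumer in group:
    print(consumer, assign_partitions(partitions, group, consumer))
```

Since every consumer runs the same deterministic computation on the same registry contents, the chunks do not overlap once all consumers have seen the same state; the transient conflicts described above arise only while their views differ.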
If a consumer finds that a partition it wants is still owned by another consumer, it simply releases all the partitions it owns, waits a while, and retries the rebalance. In practice, the rebalance usually stabilizes after a few retries.

When a new consumer group is created, the offset registry contains no data. In that case the consumers start reading from either the smallest or the largest offset available on each partition (depending on configuration), which they obtain by calling an API that the brokers provide.

Delivery Guarantee

In general, Kafka guarantees only at-least-once delivery. Exactly-once delivery usually requires two-phase commit, which our applications do not need. In most cases a message is delivered exactly once to each consumer group. However, if a consumer process crashes without shutting down cleanly, the consumer that takes over its partitions starts reading from the last offset successfully committed to ZooKeeper, and may therefore receive again the messages that the failed consumer had processed but whose offsets it had not yet committed. If an application cares about such duplicates, it must add its own deduplication logic, using either the offsets that Kafka returns or a unique key inside the message. This is usually more cost-effective than two-phase commit.

Kafka guarantees that messages from a single partition are delivered to a consumer in order. However, the order of messages from different partitions is not guaranteed.

To avoid log corruption, Kafka stores a CRC for each message in the log. If an I/O error occurs on a broker, Kafka runs a recovery process that removes messages whose CRC does not match. The per-message CRC also lets us detect network errors during production and consumption.

If a broker goes down, all the unconsumed messages stored on it become unavailable. If the broker's persistent storage is permanently damaged, its unconsumed messages are lost forever.
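
The per-message CRC check mentioned in this section can be sketched as follows. The frame layout used here (a 4-byte CRC32 followed by the payload) and the names encode, is_valid, and recover are assumptions made for the example; they are not Kafka's actual on-disk format or API.

```python
import struct
import zlib


def encode(payload: bytes) -> bytes:
    """Hypothetical frame: 4-byte big-endian CRC32 of the payload, then the payload."""
    return struct.pack(">I", zlib.crc32(payload)) + payload


def is_valid(frame: bytes) -> bool:
    """Recompute the CRC32 of the payload and compare it with the stored value."""
    if len(frame) < 4:
        return False
    (stored,) = struct.unpack(">I", frame[:4])
    return zlib.crc32(frame[4:]) == stored


def recover(frames):
    """Keep only the frames whose CRC matches, as a recovery pass would."""
    return [frame for frame in frames if is_valid(frame)]


good = encode(b"page_view user=42")
corrupted = good[:-1] + bytes([good[-1] ^ 0xFF])   # damage the last payload byte

print(is_valid(good), is_valid(corrupted))          # True False
print(len(recover([good, corrupted, good])))        # 2
```

The same check works on the consumer side after a fetch, which is how a per-message CRC can also expose errors introduced on the network.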
In the future we plan to add built-in replication to Kafka so that each message is stored redundantly on multiple brokers (in fact, Kafka now supports replicas).

Kafka at LinkedIn

The deployment described here is a simplified version of our setup. We have a Kafka cluster running in the data center that hosts our user-facing services. The frontend services generate large volumes of log data and publish it to the brokers in batches. We rely on a hardware load balancer to distribute the publish requests. The online consumers of Kafka data run in the same data center.

We also deploy a Kafka cluster for offline analysis in a separate data center, close to our Hadoop cluster and other data warehouse infrastructure. This Kafka instance runs embedded consumers that pull data from the Kafka cluster in the live data center. We then run data load jobs that move the data from this replica cluster into Hadoop and the data warehouse, where we run many reporting and analysis jobs. We also use this Kafka cluster for prototyping, and run simple scripts against the raw event streams for ad hoc queries. Without much tuning, the end-to-end latency is about 10 seconds, which meets our needs.

Comparison Tests

We ran experiments comparing the performance of ActiveMQ (using KahaDB persistence), RabbitMQ, and Kafka. We tuned each system to the best configuration we could. Test machines: 2 Linux machines, each with 8 2GHz cores, 16GB of memory, and 6 disks in RAID 10, connected by a 1Gb network. One machine acted as the broker, and the other as the producer or consumer.

Producer test: Kafka's throughput was at least twice that of RabbitMQ and an order of magnitude higher than that of ActiveMQ. One reason is that the Kafka producer does not wait for acknowledgements. Another is that the Kafka message header is small, at 9 bytes, whereas the JMS protocol makes the ActiveMQ header 144 bytes.

Consumer test: We used a single consumer to consume a total of 10 million messages. Both RabbitMQ and ActiveMQ were set to acknowledge automatically. On average, Kafka's consumption rate was about 4 times that of ActiveMQ and RabbitMQ. One reason is that Kafka's storage format is more efficient, so less data is transferred. The other is that both ActiveMQ and RabbitMQ have to maintain the delivery state of every message.
