December 2010, written in Scala, with a push/pull architecture that is better suited to transferring data between heterogeneous clusters.
Kafka Features
Persistent messages: no information is lost, with stable storage for terabytes of messages
High throughput: Kafka is designed to run on commodity hardware and deliver millions of messages per second
Distributed architecture, capable of partitioning messages
Real
250,000 messages per second (in megabytes), processing 550,000 messages per second (in megabytes).
Persistence is supported. Messages are persisted to disk, so they can be used for bulk consumption such as ETL as well as for real-time applications. Persisting data to disk and replicating it prevents data loss.
Distributed system, easy to scale out. Producers, brokers, and consumers can each run as multiple distributed instances. Machines can be added without
Kafka Learning Road (II): improving the message sending process
Because Kafka is inherently distributed, a Kafka cluster typically consists of multiple brokers. To balance the load, a topic is divided into multiple partitions, and each broker stores one or more partitions. Multiple producers and consumers can produce and fetch messages at the same time. Process: 1.
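The way a topic's partitions are spread over brokers can be illustrated with a minimal sketch. It assumes a plain round-robin placement; the function name assign_partitions and the broker ids are hypothetical, and this is not Kafka's actual assignment code.

```python
from collections import defaultdict


def assign_partitions(num_partitions, broker_ids):
    """Spread partition numbers over brokers in round-robin order (illustrative only)."""
    assignment = defaultdict(list)
    for partition in range(num_partitions):
        broker = broker_ids[partition % len(broker_ids)]
        assignment[broker].append(partition)
    return dict(assignment)


# Hypothetical cluster: one topic with 6 partitions on brokers 0, 1 and 2.
layout = assign_partitions(6, [0, 1, 2])
for broker, partitions in sorted(layout.items()):
    print(f"broker {broker} stores partitions {partitions}")
```

Each broker ends up holding two partitions, so producers and consumers working on different partitions can talk to different machines at the same time.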
Kafka Foundation
Kafka has four core APIs:
An application uses the Producer API to publish messages to one or more topics.
An application uses the Consumer API to subscribe to one or more topics and process the resulting messages.
An application uses the Streams API to act as a stream processor, consuming input streams from one or more topics and producing an output stream to one or more output topics, e
there is only one server, there is no redundant backup, it is a single machine instead of a cluster
If you have more than one server
Messages are stored sequentially: a message can only be appended, never inserted. Each message has an offset, which serves as the message ID and is unique within a partition. The offset is saved and managed by the consumer, so the reading order is determined entirely by the consumer and is not necessarily linear. Messages have a
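The append-only log and the consumer-managed offset described above can be sketched as follows. This is a toy in-memory model under stated assumptions; the class names PartitionLog and Consumer and their methods are hypothetical and are not the Kafka client API.

```python
class PartitionLog:
    """Append-only list of messages; the list index plays the role of the offset."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        # Messages can only be added at the end, never inserted in the middle.
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the new message

    def read(self, offset):
        return self._messages[offset]

    def end_offset(self):
        return len(self._messages)


class Consumer:
    """Keeps its own offset, so it alone decides what is read next."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset

    def poll(self):
        if self.offset >= self.log.end_offset():
            return None  # nothing new to read
        message = self.log.read(self.offset)
        self.offset += 1
        return message

    def seek(self, offset):
        # Moving the offset back replays old messages: reading need not be linear.
        self.offset = offset


log = PartitionLog()
offsets = [log.append(m) for m in ["a", "b", "c"]]

consumer = Consumer(log)
first = consumer.poll()
second = consumer.poll()
consumer.seek(0)
replayed = consumer.poll()
print(offsets, first, second, replayed)
```

Because the log itself never tracks who has read what, several consumers can read the same partition at different positions without affecting each other.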
publishing and subscriptions. Kafka can reportedly produce about 250,000 messages per second (in megabytes) and process 550,000 messages per second (in megabytes).
2. Persistence is supported. Messages are persisted to disk, so they can be used for bulk consumption such as ETL as well as for real-time applications. Persisting data to disk and replicating it prevents data loss.
3. Distributed system, easy to scale out, can be combined with
/zookeeper.properties & (the & makes it possible to exit the command line)
2. Start the Kafka server: bin/kafka-server-start.sh ./config/server.properties
3. Kafka provides a console for connectivity testing. First we run the producer:
bin/kafka-console-producer.sh --zookeeper localhost:2181 --topic Test
This is equivalent to open
task 0.0 in stage 483.0 (TID 362)
2018-10-22 11:28:16 INFO ShuffleBlockFetcherIterator:54 - Getting 0 non-empty blocks out of 1 blocks
2018-10-22 11:28:16 INFO ShuffleBlockFetcherIterator:54 - Started 0 remote fetches in 0 ms
2018-10-22 11:28:16 INFO Executor:54 - Finished task 0.0 in stage 483.0 (TID 362). 1091 bytes result sent to driver
2018-10-22 11:28:16 INFO TaskSetManager:54 - Finished task 0.0 in stage 483.0 (TID 3 4 ms on localhost (executor driver) (1/1)
2018-10-22 11:28:16 INFO TaskSchedulerI
interface implementation of the TupleToKafkaMapper.
Kafka version 0.10 uses the new producer API; the 0.11 version of the old producer
For Old Producer
If key == null, Kafka picks a partition at random and writes the data to it; then, as long as there is no restart,
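The null-key behaviour described for the old producer can be sketched roughly as below. This is a toy model, not Kafka's partitioner: the class SketchPartitioner is hypothetical, the "stick to the first random choice" rule is an assumption drawn from the sentence above, and the CRC32 hash for non-null keys is an added assumption for completeness.

```python
import random
import zlib


class SketchPartitioner:
    """Toy partition chooser: sticky random partition for null keys, hash otherwise."""

    def __init__(self, num_partitions, seed=None):
        self.num_partitions = num_partitions
        self._rng = random.Random(seed)
        self._sticky = None  # chosen once, kept for the lifetime of this object

    def partition(self, key):
        if key is None:
            # Assumption: one random pick that stays fixed until a "restart",
            # modelled here as creating a new SketchPartitioner.
            if self._sticky is None:
                self._sticky = self._rng.randrange(self.num_partitions)
            return self._sticky
        # Assumption: any stable hash works for the illustration; CRC32 is used here.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


partitioner = SketchPartitioner(num_partitions=4, seed=1)
null_key_choices = {partitioner.partition(None) for _ in range(10)}
print("partitions used for null keys:", null_key_choices)
print("partition for key 'user-1':", partitioner.partition("user-1"))
```

In this model every message without a key lands in the same partition until the object is recreated, which is why keyless traffic can look unbalanced across partitions.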
broker.
Topic: Every message published to a Kafka cluster has a category, called a topic. (Physically, messages of different topics are stored separately. Logically, a topic's messages may be stored on one or more brokers, but users only need to specify the topic to produce or consume data, without worrying about where the data is stored.)
Partition: A partition is a physical concept; each topic contains one or more partitions.
In the previous section we set up the Kafka cluster. In this section we introduce the new API in version 0.9 and test the high availability of the Kafka cluster.
1. Use Kafka's producer API to push messages
1) Kafka 0.9.0.1 Java client dependency:
2) Write a Kafkautil tool class
Versions of Kafka before 0.8 did not provide a high availability mechanism: when one or more brokers went down, all partitions on them could no longer serve requests. If a broker could never be restored, or if a disk failed, the data on it was lost. Yet one of Kafka's design goals is data persistence, and in a distributed system, especially once the cluster grows to a certain size, one or more machines down
Http://www.haokoo.com/internet/2877400.html
Versions of Kafka prior to 0.8 provided no high availability mechanism: once one or more brokers went down, all partitions on them were unable to continue serving. If a broker could never recover, or a disk failed, the data on it was lost. One of Kafka's design goals is to provide data persistence, and in distributed systems, especially once the cluster grows to a certain scale, the likelihood of one or more machines going down
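The effect of a broker outage on partition availability can be sketched with a small model. The replica layouts below are hypothetical, the function unavailable_partitions is illustrative, and the sketch ignores leader election and recovery entirely.

```python
def unavailable_partitions(replicas, failed_brokers):
    """Return the partitions whose every replica lives on a failed broker."""
    failed = set(failed_brokers)
    return sorted(p for p, brokers in replicas.items() if set(brokers) <= failed)


# Hypothetical layout with no replication: each partition lives on exactly one broker.
single_copy = {0: [0], 1: [1], 2: [2], 3: [0]}

# Hypothetical layout with two replicas per partition on different brokers.
replicated = {0: [0, 1], 1: [1, 2], 2: [2, 0], 3: [0, 1]}

lost_without_replicas = unavailable_partitions(single_copy, failed_brokers=[0])
lost_with_replicas = unavailable_partitions(replicated, failed_brokers=[0])
print("no replication, broker 0 down:", lost_without_replicas)
print("two replicas, broker 0 down:", lost_with_replicas)
```

With a single copy, every partition on the failed broker stops serving; with a second copy elsewhere, the same outage leaves all partitions reachable in this model.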
the number of threads consuming messages in parallel cannot be greater than the number of partitions.
(4) The number of partitions also limits which partition a producer can target when sending a message. If the partition count is set to 1 when the topic is created and the producer's custom partitioning method specifies partition 2 or higher, that partition does not exist; in that case the number of partitions can be increased with alter --partitions.
replic
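Both limits can be shown with a minimal sketch. It assumes round-robin distribution of partitions over consumer threads and zero-based partition numbers; the helper names assign_to_threads, idle_threads, and validate_target are hypothetical and do not belong to any Kafka client.

```python
def assign_to_threads(num_partitions, num_threads):
    """Give each partition to exactly one consumer thread, round-robin."""
    assignment = {thread: [] for thread in range(num_threads)}
    for partition in range(num_partitions):
        assignment[partition % num_threads].append(partition)
    return assignment


def idle_threads(assignment):
    """Threads that received no partition and therefore have nothing to consume."""
    return [thread for thread, partitions in assignment.items() if not partitions]


def validate_target(partition, num_partitions):
    """Reject a producer-chosen partition that the topic does not have."""
    if not 0 <= partition < num_partitions:
        raise ValueError(
            f"partition {partition} does not exist; topic has {num_partitions} partition(s)"
        )
    return partition


# Three partitions, five consumer threads: two threads stay idle.
threads = assign_to_threads(num_partitions=3, num_threads=5)
print("assignment:", threads)
print("idle threads:", idle_threads(threads))

# A topic created with one partition cannot accept a send to partition 2.
try:
    validate_target(2, num_partitions=1)
except ValueError as error:
    print("send rejected:", error)
```

Adding threads beyond the partition count gains nothing in this model, which is why the partition count is usually chosen with the intended consumer parallelism in mind.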
into sequential writes, which, combined with zero-copy, greatly improves I/O performance. However, this is only one aspect; after all, single-machine optimization has a ceiling. How can throughput be increased further through horizontal, even linear, scaling? Kafka uses partitioning, which enables high-throughput message processing (for both producers and consumers)
sequential writes, which, combined with zero-copy, greatly improves I/O performance. However, this is only one aspect; after all, single-machine optimization has a ceiling. How can throughput be increased further through horizontal, even linear, scaling? Kafka uses partitioning, which enables high-throughput message processing (for both producers and consumers) by bre
:2182,127.0.0.1:2183
Modify server2.properties as follows:
broker.id=2
listeners=PLAINTEXT://127.0.0.1:9094
port=9094
host.name=127.0.0.1
log.dirs=/opt/kafka/kafkalogs2
zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183
Start the Kafka cluster and test
1. Start the service
# Start the Kafka cluster in the background (all three brokers need to be started)
# enter the
on, analyzes its reliability step by step, and finally uses benchmarks to deepen the understanding of Kafka's high reliability.
2 Kafka Architecture
As shown in the figure above, a typical Kafka architecture consists of several producers (which can be server logs, business data, page views generated at the f