Learn Apache Kafka

Last Update: 2018-07-26 Source: Internet

Author: User


1. Apache Kafka

The challenge: ① collecting massive amounts of data; ② analyzing it.

The data to be analyzed includes user behavior data, application performance traces, activity data recorded in logs, event messages, and so on.

Kafka can process information in real time and quickly route it to multiple consumers. It integrates information seamlessly between producers and consumers without blocking the producers, and a producer does not need to know who its consumers are.
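To make the producer/consumer decoupling concrete, here is a toy, in-memory sketch. It is not the Kafka client API: the names `Topic`, `Subscriber`, `publish`, and `poll` are hypothetical. It only assumes the core idea that a topic is an append-only log and each consumer tracks its own read position (offset), so reading never removes messages and never blocks the producer.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical in-memory stand-in for a topic: an append-only list of messages."""
    messages: list = field(default_factory=list)

    def publish(self, message):
        # The producer only appends; it never learns who will read the message.
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the new message


@dataclass
class Subscriber:
    """Hypothetical consumer that remembers its own read position (offset)."""
    topic: Topic
    offset: int = 0

    def poll(self):
        # Return everything published since the last poll. Reading does not
        # delete messages, so other subscribers still see them.
        batch = self.topic.messages[self.offset:]
        self.offset = len(self.topic.messages)
        return batch


clicks = Topic()
dashboard = Subscriber(clicks)  # e.g. a real-time consumer
archiver = Subscriber(clicks)   # e.g. an offline consumer

clicks.publish("page_view:/home")
clicks.publish("click:buy")
print(dashboard.poll())  # ['page_view:/home', 'click:buy']
clicks.publish("page_view:/cart")
print(dashboard.poll())  # ['page_view:/cart']
print(archiver.poll())   # ['page_view:/home', 'click:buy', 'page_view:/cart']
```

The archiver, polling late, still receives every message: each consumer's progress is independent, which is what lets one stream feed both real-time and offline consumers.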

Kafka is an open-source, distributed, partitioned, and replicated publish-subscribe messaging system based on a commit log.

① Persistent messaging: no information may be lost, so Kafka uses an O(1) disk structure that provides constant-time performance even with very large volumes of stored messages (terabytes). Messages are persisted to disk and replicated within the cluster to prevent data loss;
② High throughput: Kafka handles hundreds of megabytes of reads and writes per second;
③ Distributed: with its cluster-centric design, Kafka partitions messages across Kafka servers and distributes consumption across a cluster of consumer machines, while maintaining ordering semantics within each partition. The cluster can grow elastically and transparently without downtime;
④ Multiple client support: clients on different platforms (Java, .NET, PHP, Ruby, Python) can be integrated easily;
⑤ Real time: messages produced by producer threads are immediately visible to consumer threads (this is critical for event-based systems, e.g. complex event processing (CEP) systems).
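The partitioning and per-partition ordering described in this list can be sketched as follows. This is a simplified model under stated assumptions, not Kafka's implementation: the class `PartitionedLog` and its methods are hypothetical, and CRC32 is swapped in for the murmur2 hash that Kafka's Java client uses in its default partitioner. The point it shows is that messages with the same key always land in the same partition, so their relative order is preserved there, while no global order exists across partitions.

```python
import zlib


class PartitionedLog:
    """Hypothetical toy topic split into partitions; each partition is an append-only log."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stable hash of the key, so the same key always maps to the same partition.
        # CRC32 is a stand-in here; Kafka's default partitioner uses murmur2.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        # Appending to the end is (amortized) O(1), mirroring append-only log writes.
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition, offset):
        # Consumers read sequentially from an offset within a single partition.
        return self.partitions[partition][offset:]


log = PartitionedLog(num_partitions=3)
for event in ["login", "view", "logout"]:
    log.append("user-42", event)
log.append("user-7", "login")

p = log.partition_for("user-42")
print([value for key, value in log.read(p, 0) if key == "user-42"])
# ['login', 'view', 'logout']
```

Choosing a key such as a user ID is what buys ordering: all of one user's events are read back in the order they were written, even though different users may be spread over several partitions.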

Kafka provides a real-time publish-subscribe solution that also supports parallel data loading into Hadoop.

On the production side, there are different types of producers, for example:
① front-end web applications that generate application logs;
② producer proxies that generate web analytics logs;
③ producer adapters that generate transformation logs;
④ producer services that generate invocation trace logs.

On the consumption side, there are different types of consumers, for example:
① offline consumers that consume messages and store them in Hadoop or a traditional data warehouse for offline analysis;
② near real-time consumers that consume messages and store them in a NoSQL datastore (e.g. HBase or Cassandra) for near real-time analysis;
③ real-time consumers, such as Spark or Storm, that filter messages in memory and trigger alert events for related groups.

2. Why do we need Kafka?

The data generated by web-based applications typically includes user activity events such as logins, page visits, and clicks; social networking activities such as likes, shares, and comments; and operational and system metrics. Because of its high throughput (millions of messages per second), this data is typically handled by logging and traditional log aggregation solutions. These work well for feeding offline analysis systems such as Hadoop, but they are very limited for building real-time processing systems.

Real-time analytics include:
① relevance-based search;
② recommendations based on popularity, co-occurrence, or sentiment analysis;
③ delivering advertisements to a mass audience;
④ protecting applications from spam or unauthorized data scraping;
⑤ device sensors sending high-temperature alerts;
⑥ detecting abnormal user behavior or application hacking.

Using these multiple data sets collected from production systems in real time is challenging because of the sheer volume of data being collected and processed.

Kafka aims to unify offline and online processing by providing a mechanism for parallel loading into Hadoop systems, together with the ability to partition real-time consumption across a cluster of machines (which is useful for processing streaming data).
From an architectural point of view, it is closer to traditional messaging systems such as ActiveMQ or RabbitMQ.
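Spreading consumption of a topic's partitions over a set of machines can be sketched with a range-style assignment, loosely modeled on the idea behind Kafka's range assignor: sort the partitions and the consumers, then give each consumer a contiguous block. The function `range_assign` is hypothetical and simplified (single topic, no rebalancing protocol), not the client's actual code.

```python
def range_assign(partitions, consumers):
    """Split partitions into contiguous ranges, one range per consumer."""
    if not consumers:
        return {}
    consumers = sorted(consumers)
    partitions = sorted(partitions)
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment = {}
    start = 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers take one additional partition each.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


print(range_assign(range(7), ["consumer-b", "consumer-a", "consumer-c"]))
# {'consumer-a': [0, 1, 2], 'consumer-b': [3, 4], 'consumer-c': [5, 6]}
```

Each partition goes to exactly one consumer in the group, which keeps per-partition ordering intact while letting consumption scale out; with more consumers than partitions, the extra consumers simply receive nothing.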

Reference: Learning Apache Kafka, Second Edition

This article is an English version of an article originally written in Chinese on aliyun.com.
