Kafka Project: Application Overview of Real-Time Statistics for User Log Reporting


1. Overview

  Video tutorial address for this course: Application Overview of the Kafka hands-on project

This course takes a user's real-time log reporting as its case study. It introduces Kafka's business and application scenarios, and builds the development environment for the Kafka project together with you. Let's take a look at the outline of this course, as shown in the figure:

Next, we begin the first lesson: "Kafka Review".

2. Content

2.1 Kafka Review

This lesson briefly outlines the considerations for deploying a Kafka platform, as well as Kafka's business and application scenarios in the enterprise, so that everyone understands how Kafka is used in practice.

This lesson mainly covers the following knowledge points, as shown in the figure:

First, let's take a look at Kafka's business scenarios, which cover the following knowledge points, as shown in the figure:

    • First: Decoupling

At the start of a project, it is extremely difficult to anticipate the requirements it will encounter later. A message system inserts an implicit, data-centric interface layer into the middle of the processing pipeline, and both sides of the pipeline implement this interface. This allows you to extend or modify the processing on either side independently, as long as both sides adhere to the same interface constraints.

    • Second: Increase redundancy

In some cases, the processing of data will fail, and unless the data has been persisted, it is lost. Message queues persist data until it has been fully processed, avoiding the risk of data loss. In the put-get-delete paradigm used by many message queues, your processing system must explicitly indicate that a message has been processed before it is removed from the queue, ensuring that your data is kept safely until you are finished with it.
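
To make explicit acknowledgment concrete, here is a minimal sketch using the Kafka Java consumer, assuming the kafka-clients 2.0+ poll(Duration) overload; the broker address, group id, and topic name are placeholders. Offsets are committed only after the records have been processed, so an unprocessed message is redelivered rather than lost:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class AckingConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka1:9092");  // placeholder broker
            props.put("group.id", "log-stats");             // placeholder group
            props.put("enable.auto.commit", "false");       // we commit manually
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("user_logs"));
                while (true) {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofMillis(100));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.println("processed: " + record.value()); // real work here
                    }
                    consumer.commitSync(); // only now mark the batch as processed
                }
            }
        }
    }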

    • Third: Improve scalability

Because the message queue decouples your processing stages, it is easy to increase the rate at which messages are enqueued and processed: simply add more processing capacity. There is no need to change code or adjust parameters. Scaling out is as simple as flipping a switch.

    • Fourth: Buffering

In any significant system, there will be components that require different processing times. For example, loading an image takes less time than applying a filter to it. A message queue adds a buffer layer that helps tasks execute as efficiently as possible: writes to the queue proceed as fast as they can. This buffering helps control and optimize the speed at which data flows through the system.

    • Fifth: Asynchronous communication

Often, users do not want or need to process messages immediately. A message queue provides an asynchronous processing mechanism that lets you put a message on a queue without processing it right away. Put as many messages on the queue as you like, then process them when you need to.
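
As a minimal sketch of this asynchronous model with the Kafka Java producer (the broker address, topic, and message contents are placeholders): send() returns immediately, and a callback fires once the broker acknowledges the record.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class AsyncLogProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka1:9092");  // placeholder broker
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // send() is asynchronous: it queues the record and returns at once
                producer.send(new ProducerRecord<>("user_logs", "uid_1001", "login"),
                        (metadata, exception) -> {
                            if (exception != null) {
                                exception.printStackTrace();  // e.g. retry or log
                            } else {
                                System.out.printf("stored at %s-%d@%d%n",
                                        metadata.topic(), metadata.partition(),
                                        metadata.offset());
                            }
                        });
            } // close() flushes any records still buffered
        }
    }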

That concludes the introduction to Kafka's business scenarios; next, I will introduce Kafka's application scenarios.

The main application scenarios are as follows:

    • First: Message push

Kafka can be used as a message system, for example in the popular message-push services. Such push systems can use Kafka as the core middleware that handles both the production and the consumption of messages.

    • Second: Website activity tracking

We can send an enterprise portal's user operation records and other information to Kafka and, depending on actual business needs, either monitor the data in real time or process it offline.

    • Third: Log collection center

This is a log collection system similar to the Flume suite, but Kafka's design is built on a push/pull architecture, which suits heterogeneous clusters. Kafka can submit messages in batches, so on the producer side there is essentially no performance cost, and on the consumer side we can hand the data to storage and analysis systems such as Hadoop or Storm.

After we have mastered Kafka's business and application scenarios, let's look at a few things to keep in mind when building a real-time statistics platform. The figure shows the architecture of the Kafka cluster: a high-availability cluster composed of three Kafka nodes, coordinated through a ZooKeeper cluster, which provides fast, highly available, fault-tolerant distributed coordination services.

Platform considerations include the following knowledge points:

    • HA characteristics of Kafka
    • Configuration of the platform's core files (a sample configuration is sketched after this list)
    • Cluster startup steps
    • Cluster demo
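
As a sketch of the core file, each broker's server.properties in a three-node, ZooKeeper-coordinated cluster might look like the following; the hostnames, paths, and values are placeholders:

    # server.properties on node 1 (repeat on each node with a unique broker.id)
    broker.id=1                                   # 1, 2, 3 across the three nodes
    port=9092
    log.dirs=/data/kafka-logs                     # where partition data lives
    zookeeper.connect=zk1:2181,zk2:2181,zk3:2181  # the coordinating ZooKeeper quorum
    num.partitions=3
    default.replication.factor=2                  # tolerate the loss of one broker

Startup then follows the usual order: bring up the ZooKeeper quorum first (zkServer.sh start on each ZooKeeper node), then start every broker, for example with bin/kafka-server-start.sh -daemon config/server.properties.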

For the detailed procedure and demonstration steps, you can watch the video; I will not repeat them here. "View Address"

2.2 Project Brief

This lesson explains how to plan the overall flow of the project, how to obtain the data sources the project needs, and how to consume that data.

The goal is for everyone to master the project development process, in preparation for the subsequent project analysis and design phases.

Its primary knowledge points include the following, as shown in the figure:

Next, we start by outlining the overall flow of the project, as shown in the figure:

This is the overall flow of the project, which takes a user log reporting system as its case study. As the flow shows, we divide the project into four modules: data collection, data access, real-time streaming computation, and data output.

The advantages of this division are as follows:

    • Modular business logic
    • Componentized functionality

We believe Kafka's role in the whole pipeline should be a single one: throughout the project, it acts as middleware. The entire project flow is as shown in the figure; dividing it this way makes each piece of business logic modular and its function clearer.

    • First, the data collection module: we use Apache Flume NG, which is responsible for collecting user-reported log data from each node in real time.
    • Next, the data access module: because the rate at which data is collected and the rate at which it is processed are not necessarily consistent, we add a piece of middleware in between, using Apache Kafka. In addition, part of the data stream is written to the HDFS distributed file system, to provide a data source for offline statistics.
    • Then the real-time streaming module: after we have collected the data, we need to process it in real time, using Apache Storm. Installing a Storm cluster is relatively simple; for the specific installation details, you can watch the video. "Watch address."
    • Finally, after the streaming computation module comes the data output module: once Storm has processed the data, we need to persist the results. Because fast response times are required, we use Redis and MySQL for persistence; a sketch of a Redis-persisting bolt follows this list.
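
Here is that sketch: a minimal Storm bolt that persists per-user counts to Redis through the Jedis client. The Redis host, key layout, and log format (uid as the first |-delimited field of each raw line) are assumptions for illustration, and the package names follow Storm 1.x (older releases used backtype.storm):

    import java.util.Map;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.tuple.Tuple;
    import redis.clients.jedis.Jedis;

    public class RedisPersistBolt extends BaseBasicBolt {
        private transient Jedis jedis; // not serializable, so create it in prepare()

        @Override
        public void prepare(Map stormConf, TopologyContext context) {
            jedis = new Jedis("redis-host", 6379);   // placeholder Redis node
        }

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String line = tuple.getStringByField("str"); // raw line from the KafkaSpout
            String uid = line.split("\\|")[0];           // assume uid|action|timestamp
            jedis.hincrBy("user_log_stats", uid, 1);     // increment this user's counter
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt: nothing is emitted downstream
        }
    }

A similar bolt writing batched inserts to MySQL would complete the persistence layer.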

This is the architecture diagram for the entire process. Having described the flowchart of the overall architecture, let's take a look at how the data source is produced, as shown in the figure:

From the figure, we can see that the log collection cluster, formed from Flume NG nodes, gathers the logs and then sends the data to the designated Kafka middleware through the Flume sink component, so that Kafka's producer side has a data source. From Flume to Kafka, the production phase of the data source is accomplished through the sink.
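
As a sketch of that sink, a Flume agent configuration along these lines tails a log file into a Kafka topic; the agent name, file path, topic, and broker list are placeholders, and the property names follow the Flume 1.6-era KafkaSink:

    # flume-kafka.properties: tail a log file into a Kafka topic
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/app/user.log
    a1.sources.r1.channels = c1

    a1.channels.c1.type = memory

    a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
    a1.sinks.k1.brokerList = kafka1:9092,kafka2:9092,kafka3:9092
    a1.sinks.k1.topic = user_logs
    a1.sinks.k1.channel = c1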

Now that the data source is being produced, let's look at how to consume the data. For the consumption of the data source, take a look at the following figure:

From this, we can see that the data resides in Kafka and flows into Storm through a KafkaSpout. Storm then consumes and processes the Kafka data: the Storm computation module applies whatever processing the business requires, completing the consumption of the data, and finally the statistical results are persisted to the database.
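
A minimal sketch of wiring this together with the storm-kafka KafkaSpout, reusing the RedisPersistBolt sketched earlier; the ZooKeeper quorum, topic, ZooKeeper root path, and consumer id are placeholders:

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.kafka.BrokerHosts;
    import org.apache.storm.kafka.KafkaSpout;
    import org.apache.storm.kafka.SpoutConfig;
    import org.apache.storm.kafka.StringScheme;
    import org.apache.storm.kafka.ZkHosts;
    import org.apache.storm.spout.SchemeAsMultiScheme;
    import org.apache.storm.topology.TopologyBuilder;

    public class UserLogTopology {
        public static void main(String[] args) throws Exception {
            // the spout discovers Kafka partitions through ZooKeeper
            BrokerHosts hosts = new ZkHosts("zk1:2181,zk2:2181,zk3:2181");
            SpoutConfig spoutConfig =
                    new SpoutConfig(hosts, "user_logs", "/kafka-spout", "user-log-stats");
            spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme()); // emits "str"

            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 3);
            builder.setBolt("persist-bolt", new RedisPersistBolt(), 2)
                   .shuffleGrouping("kafka-spout");

            StormSubmitter.submitTopology("user-log-topology", new Config(),
                    builder.createTopology());
        }
    }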

For more details, you can watch the video tutorial, "Watch Address".

2.3 Kafka Engineering Preparation

This lesson explains the work of creating the project and the basic environment the project needs, including the Kafka monitoring tool and the preparation of the Storm cluster. It lays a solid foundation for completing the coding practice.

Its primary knowledge points include the following, as shown in the figure:

Let's take a look at what needs to be prepared in the underlying environment. The contents are as follows:

    • Overview.
    • Basic software Introduction.
    • Tips for use.
    • Preview and demo.

In the Kafka project, we use Storm for computation on the consumer side, so the development language is Java, and Java code is written in an IDE, which makes coding easier for developers. The IDE used in this course is JBoss Studio 8, an IDE developed by Red Hat. It is essentially Eclipse, but it ships with a rich set of integrated plug-ins; if you need it, you can download it from the Red Hat website. In addition, during development we also need one more important tool: a Kafka monitoring tool, which makes it convenient to observe the production and consumption of Kafka data. Another is the Storm UI management interface, through which we can observe how Storm jobs are running.

Once we are familiar with the basic software and plug-ins, we create the Kafka project. Note the following points when creating it:

    • Prepare the underlying environment, such as Java's basic development kit, the JDK. This must be ready; otherwise, we cannot compile and execute the relevant Java code.

In addition, prepare the Maven environment to make packaging our project easy.

    • On top of this existing base environment, we create the Kafka project itself. The specific operating procedure is demonstrated in the video.

For the detailed demonstration, you can watch the video; I will not repeat it here. "View Address".

3. Summary

In this course we reviewed the relevant Kafka content and briefly described the overall flow of the project and the use of the basic software. You should grasp the following knowledge points, as shown in the figure:

4. Concluding remarks

That is the main content of this course. It mainly covered preparation for the Kafka project, laying a solid foundation for the subsequent hands-on Kafka project content.

If this tutorial helps you, I hope you will watch the videos. Thank you for your support!

  Please credit the source when reproducing this article. Thank you for your cooperation!
