Distributed messaging systems


http://dongxicheng.org/search-engine/log-systems/

Including Facebook's Scribe, Apache's Chukwa, LinkedIn's Kafka, and Cloudera's Flume.

 

Kafka

http://www.cnblogs.com/fxjwind/archive/2013/03/22/2975573.html

http://www.cnblogs.com/fxjwind/archive/2013/03/19/2969655.html

 

Flume

Flume User Guide: http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html

1.1. Architecture

Flume's architecture is simple, robust, and flexible.

The graph above shows a typical deployment of flume that collects log data from a set of application servers. The deployment consists of a number of logical nodes, arranged into three tiers. The first tier is the agent tier. Agent nodes are typically installed on the machines that generate the logs and are your data's initial point of contact with flume. They forward data to the next tier of collector nodes, which aggregate the separate data flows and forward them to the final storage tier.

 

Logical nodes are a very flexible abstraction. Every logical node has just two components: a source and a sink.

Both the source and the sink can additionally be configured with decorators, which perform some simple processing on data as it passes through.

The source tells a logical node where to collect data.

The sink tells it where to send the data.

The only difference between any two logical nodes is how their source and sink are configured.

The source, sink, and optional decorators are a powerful set of primitives.
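As a rough sketch of how these primitives combine (the syntax below follows Flume's dataflow specification language; the node name, file path, and collector host are illustrative), a logical node configuration names a source, optional decorators, and a sink:

```
agentNode : tail("/var/log/app.log") | { batch(100) => agentSink("collector-host", 35853) } ;
```

Here tail is the source, batch(100) is a decorator that groups events before they are sent, and agentSink forwards them to a collector listening on port 35853.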

 

Logical and physical nodes

It's important to make the distinction between logical nodes and physical nodes. A physical node corresponds to a single Java process running in a single JVM instance on one machine. Usually there is just one physical node per machine.

Physical nodes act as containers for logical nodes, which are wired together to form data flows. Each physical node can play host to many logical nodes, and takes care of arbitrating the assignment of machine resources between them.

So, although the agents and the collectors in the preceding example are logically separate processes, they could be running on the same physical node.

The master assigns a configuration to each logical node at run-time. All components of a node's configuration are instantiated dynamically at run-time, so deployments can be changed many times throughout the lifetime of a flume service without restarting any Java processes or logging into the machines themselves. In fact, logical nodes themselves can be created and deleted dynamically.

 

1.2. Reliability

Flume can guarantee that all data received by an agent node will eventually make it to the collector at the end of its flow, as long as the agent node keeps running. That is, data can be reliably delivered to its eventual destination.

This seems better than what Kafka offers, and it is customizable: reliability is divided into the following levels, and you can choose whichever one you need:

However, reliable delivery can be very resource intensive and is often a stronger guarantee than some data sources require. Therefore, flume allows the user to specify, on a per-flow basis, the level of reliability required. There are three supported reliability levels:

The end-to-end reliability level.

The first thing the agent does in this setting is write the event to disk in a write-ahead log (WAL) so that, if the agent crashes and restarts, knowledge of the event is not lost.

After the event has successfully made its way to the end of its flow, an acknowledgment is sent back to the originating agent so that it knows it no longer needs to store the event on disk.

This reliability level can withstand any number of failures downstream of the initial agent.

 

The store-on-failure reliability level requires only an acknowledgement from the node one hop downstream.

If the sending node detects a failure, it will store data on its local disk until the downstream node is repaired, or an alternate downstream destination can be selected.

Data can be lost if a compound or silent failure occurs.

 

The best-effort reliability level sends data to the next hop with no attempt to confirm or retry delivery. If nodes fail, any data that they were in the process of transmitting or receiving can be lost. This is the weakest reliability level, but also the most lightweight.
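If I recall the Flume sink catalog correctly, these three levels correspond to three agent sink variants in the dataflow spec (the file path, collector host, and port below are placeholders):

```
agent1 : tail("/var/log/app.log") | agentE2ESink("collector", 35853) ;
agent2 : tail("/var/log/app.log") | agentDFOSink("collector", 35853) ;
agent3 : tail("/var/log/app.log") | agentBESink("collector", 35853) ;
```

agentE2ESink gives end-to-end delivery (WAL plus acknowledgement), agentDFOSink gives store-on-failure (disk failover), and agentBESink gives best effort.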

 

1.3. Scalability

Scalability is the ability to increase system performance linearly, or better, by adding more resources to the system. Flume's goal is horizontal scalability: the ability to incrementally add more machines to the system to increase throughput.

 

1.4. Manageability

Manageability is the ability to control data flows, monitor nodes, modify settings, and control outputs of a large system.

The flume master is the point where global state, such as the data flows, can be managed, either through a web interface or through the scriptable flume command shell.

Via the flume master, users can monitor flows on the fly and react to conditions such as load imbalances, partial failures, or newly provisioned hardware.

You can dynamically reconfigure nodes by using the flume master: small scripts written in a flexible dataflow specification language can be submitted via the flume master interface.
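For example, such a reconfiguration might be scripted through the flume command shell roughly like this (the master host, node name, and flow are illustrative):

```
$ flume shell -c masterhost
exec config agent1 'tail("/var/log/app.log")' 'agentBESink("collector",35853)'
```

The exec config command maps a logical node name to a source and a sink; affected nodes pick up the change at their next heartbeat.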

 

1.5. Extensibility

Extensibility is the ability to add new functionality to a system. For example, you can extend flume by adding connectors to existing storage layers or data platforms.

Some general sources include files from the file system, Syslog and syslog-ng emulation, or the standard output of a process. More specific sources such as IRC channels and Twitter streams can also be added.

Similarly, there are many output destinations for events. Although HDFS is the primary output destination, events can also be sent to local files, to monitoring and alerting applications such as Ganglia, or to communication channels such as IRC.

 

3. Pseudo-distributed mode

Flume is intended to be run as a distributed system with processes spread out across many machines. It can also be run as several processes on a single machine, which is called "pseudo-distributed" mode.

3.1. Starting pseudo-distributed flume daemons

There are two kinds of processes in the system: the flume master and the flume node.

The flume master is the central management point and controls the data flows of the nodes. It is the single logical entity that holds global state data and controls the flume node data flows and monitors flume nodes.

Flume nodes serve as the data path for streams of events. They can be the sources, conduits, and consumers of event data. The nodes periodically contact the master to transmit a heartbeat and to get their data flow configuration.

3.1.1. The master

The master can be manually started by executing the following command:

$ flume master

After the master is started, you can access it by pointing a web browser to http://localhost:35871/. This web page displays the status of all flume nodes that have contacted the master, and shows each node's currently assigned configuration. When you start this up without flume nodes running, the status and configuration tables will be empty.

Yes, it provides web-UI-based monitoring...

3.1.2. The flume Node

To start a flume node, invoke the following command in another terminal.

$ flume node_nowatch

To check whether a flume node is up, point your browser to the flume node status page at http://localhost:35862/.

3.2. Refreshing a node via the master

Requiring nodes to contact the master to get their configuration enables you to dynamically change the configuration of nodes without having to log in to the remote machine to restart the daemon. You can quickly change the node's previous data flow configuration to a new one.

The following describes how to "wire" nodes using the master's Web interface.

On the master's web page, click on the config link. You are presented with two forms; these are web interfaces for setting the node's data flows. When flume nodes contact the master, they will notice that the data flow version has changed, then instantiate and activate the new configuration.

This is really convenient: through the web UI you can configure the name, source, sink, and so on for each node, and at the node's next heartbeat its configuration is automatically updated.

If you enter:

Node name: host

Source: text("/etc/services")

Sink: console("avrojson")

You get the contents of the file displayed to the console, with each record in JSON format.
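The same mapping can presumably also be submitted without the web UI, via the flume command shell (assuming the master runs on localhost):

```
$ flume shell -c localhost
exec config host 'text("/etc/services")' 'console("avrojson")'
```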

 

3.5. Tiering flume nodes: agents and collectors

A simple network connection is abstractly just another sink. It would be great if sending events over the network were easy, efficient, and reliable. In reality, collecting data from a distributed set of machines and relying on network connectivity greatly increases the likelihood and kinds of failures that can occur. The bottom line is that providing reliability guarantees introduces complexity and other tradeoffs.

Why do we need to tier flume nodes rather than writing directly to the storage layer? Why divide them into agents and collectors?

First, the machines the agents run on are often not very stable and can fail in various ways; second, it is more efficient if data is pre-processed and aggregated before storage.

4. Fully-distributed mode

Steps to deploy flume on a cluster

  • Install flume on each machine.
  • Select one or more nodes to be the master.
  • Modify a static configuration file to use site specific properties.
  • Start the flume master node on at least one machine.
  • Start a flume node on each machine.
4.2. Multiple collectors

4.2.1. Partitioning agents across multiple collectors

The preceding graph and dataflow spec show a typical topology for flume nodes. For reliable delivery, in the event that the collector stops operating or disconnects from the agents, the agents would need to store their events on their respective local disks. The agents would then periodically attempt to recontact a collector. Because the collector is down, any analysis or processing downstream is blocked.

When a collector fails, the agent can cache the data locally until the collector recovers and continues sending the data.

This is obviously a bit wasteful: aren't there other collectors available? Rather than waiting for a bad one, it is better to switch to another. This can be adjusted by hand, by manually specifying failover chains.

Of course, automatic failover chains would be even better, but they currently do not work when using multiple masters.
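A manually specified failover chain might look roughly like the following in the dataflow spec (hostnames and the file path are placeholders; the `< primary ? backup >` form sends events to the backup sink when the primary fails):

```
agentA : tail("/var/log/app.log") | < agentDFOSink("collectorA", 35853) ? agentDFOSink("collectorB", 35853) > ;
```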

 

4.4. Multiple masters

The master has two main jobs to perform. The first is to keep track of all the nodes in a flume deployment and to keep them informed of any changes to their configuration. The second is to track acknowledgements from the end of a flume flow that is operating in reliable mode, so that the source at the top of that flow knows when to stop transmitting an event.

This obviously has a single point of failure...

4.4.3. Running in distributed mode

Running the flume master in distributed mode provides better fault tolerance than in standalone mode, and scalability for hundreds of nodes.

Configuring machines to run as part of a distributed flume master is nearly as simple as standalone mode. As before, flume.master.servers needs to be set, this time to a list of machines:

<property>
  <name>flume.master.servers</name>
  <value>masterA,masterB,masterC</value>
</property>
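In addition, if memory serves, each master machine must also be told which entry in that list it is, via flume.master.serverid (the value differs per machine):

```xml
<!-- on masterB, the second entry in flume.master.servers (indices start at 0) -->
<property>
  <name>flume.master.serverid</name>
  <value>1</value>
</property>
```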

How many machines do I need? The distributed flume master will continue to work correctly as long as more than half of the physical machines running it are still working and haven't crashed. Therefore, if you want to survive one fault, you need three machines (because 3 - 1 = 2 > 3/2).

Why do we need more than half? Even if only one master is left, isn't that the same as running in standalone mode? I didn't quite understand this at first... (The reason is that requiring a majority prevents split-brain: if two disconnected halves of the ensemble could each keep operating, they could accept conflicting configurations.)

Each master process will initially try to contact all other nodes in the ensemble. Until more than half (in this case, two) of the nodes are alive and contactable, the configuration store will be unable to start, and the flume master will not be able to read or write configuration data.

With half or fewer alive, it does not work...

 

4.4.4. Configuration stores

The flume master stores all its data in a configuration store. Flume has a pluggable configuration store architecture, and supports two implementations.

  • The memory-backed config store (MBCS) stores configurations temporarily in memory. If the master node fails and reboots, all the configuration data will be lost. The MBCS is incompatible with distributed masters. However, it is very easy to administer, computationally lightweight, and good for testing and experimentation.
  • The ZooKeeper-backed config store (ZBCS) stores configurations persistently and takes care of synchronizing them between multiple masters.

Flume and Apache ZooKeeper. Flume relies on the Apache ZooKeeper coordination platform to provide reliable, consistent, and persistent storage for node configuration data. A ZooKeeper ensemble is made up of two or more nodes which communicate regularly with each other to make sure each is up to date. Flume embeds a ZooKeeper server inside the master process, so starting and maintaining the service is taken care of. However, if you have an existing ZooKeeper service running, flume supports using that external cluster as well.

So it still relies on ZooKeeper, which really is used everywhere...
