NSQ message middleware in depth and in practice

1. Introduction

Recently I have been studying some message middleware; commonly used MQs include RabbitMQ, ActiveMQ, and Kafka. NSQ is a distributed real-time messaging platform written in Go, released under the MIT open-source license and developed by Bitly as an easy-to-use message middleware.
The official team and third parties have developed numerous client libraries for NSQ, such as the official Go client go-nsq, the Python client pynsq, the Node.js JavaScript client nsqjs, the asynchronous C client libnsq, and the Java client nsq-java, plus many third-party client libraries for a variety of other languages, in addition to nsqd's built-in HTTP API.

1.1 Features

1). Distributed
NSQ provides a distributed, decentralized topology with no single point of failure, stable message delivery guarantees, high fault tolerance, and high availability (HA).
2). Scalable, easy to expand
NSQ scales horizontally with no centralized brokers. The built-in discovery service simplifies adding nodes to the cluster. It supports both pub-sub and load-balanced message distribution.
3). Ops friendly
NSQ is very easy to configure and deploy, and ships with a built-in management interface. The binaries have no runtime dependencies, and an official Docker image is available.
4). Highly integrated
Official Go and Python client libraries are available, and libraries exist for most other languages.

1.2 Components

    • Topic: a topic is the logical key under which a program publishes messages; a topic is created the first time a message is published to it.
    • Channel: a channel is tied to consumers and provides load balancing among them; in a sense, a channel is a "queue". Whenever a publisher sends a message to a topic, the message is copied to every channel connected to that topic, and consumers read messages through their channel. In effect, a channel is created the first time a consumer subscribes to it. Channels queue messages: if no consumer reads them, they are held in memory first and spilled to disk once the queue grows too large.
    • Messages: messages form the backbone of the data stream. A consumer can choose to finish a message, indicating it was processed successfully, or re-queue it to be processed later. Each message carries its number of delivery attempts; when a message exceeds a configured threshold of attempts, it should be discarded or handled as a special case.
    • nsqd: nsqd is the daemon that receives, queues, and delivers messages to clients. It can run standalone, but it is normally configured with the nsqlookupd instances of the cluster it belongs to (this is how it announces its topics and channels so that everyone can find them).
    • nsqlookupd: nsqlookupd is the daemon that manages topology information. Clients discover the producers of a given topic by querying nsqlookupd, while nsqd nodes broadcast their topic and channel information to it. It has two interfaces: a TCP interface, which nsqd uses for broadcasts, and an HTTP interface, which clients use for discovery and administration.
    • nsqadmin: nsqadmin is a web UI that aggregates real-time statistics for a cluster and performs various administrative tasks.

Common tools:

    • nsq_to_file: consumes the specified topic/channel and writes the messages to a file, optionally rolling and/or compressing the files.
    • nsq_to_http: consumes the specified topic/channel and performs HTTP requests (GET/POST) against the specified endpoint.
    • nsq_to_nsq: consumes the specified topic/channel and re-publishes the messages to a destination nsqd over TCP.

1.3 Topological structure

NSQ recommends co-locating publishers with their corresponding nsqd instances, which means that even in the face of a network partition, messages are kept locally until they can be read by a consumer. More importantly, publishers do not need to discover other nsqd nodes; they can always publish to the local instance.

First, a publisher sends a message to its local nsqd. It does this by opening a connection and sending a publish command containing the topic and the message body; in this case we publish the message to the event topic so that it can be spread across our different workers.
The event topic copies these messages into the queue of every channel connected to the topic; in our case there are three channels, one of which serves as the file channel. Consumers pick up these messages and upload them to S3.

Messages in each channel are queued until a worker consumes them; if the queue exceeds the memory limit, messages are written to disk. The nsqd nodes first broadcast their location information to nsqlookupd, and once they have registered successfully, workers discover all nsqd nodes carrying the event topic from the nsqlookupd server nodes.

Each worker then subscribes to each nsqd host, indicating that it is ready to receive messages. We do not need a fully connected graph here, but we do have to make sure that each individual nsqd instance has enough consumers to drain its messages; otherwise the channels will pile up.
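As a rough illustration of the publishing step, here is a minimal sketch using the official go-nsq client. The local address (nsqd's default TCP port 4150) and the message body are assumptions; the event topic comes from the walkthrough above.

package main

import (
	"log"

	"github.com/nsqio/go-nsq"
)

func main() {
	cfg := nsq.NewConfig()

	// Publish to the nsqd co-located on this host (address assumed;
	// 4150 is nsqd's default TCP port).
	producer, err := nsq.NewProducer("127.0.0.1:4150", cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Stop()

	// Send a publish command with the topic and the message body. The topic
	// (and any channels consumers attach to it) is created on first use.
	if err := producer.Publish("event", []byte("hello world")); err != nil {
		log.Fatal(err)
	}
}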

2. Internals

2.1 Message Delivery Guarantee

NSQ guarantees that a message will be delivered at least once, though messages may be duplicated. Consumers should expect this and either deduplicate or perform idempotent operations.
This guarantee is part of the protocol and the workflow, and works as follows (assuming the client has successfully connected and subscribed to a topic):
1) The client indicates that it is ready to receive messages.
2) NSQ sends a message and temporarily stores the data locally (pending re-queue or timeout).
3) The client replies FIN (finish) or REQ (re-queue) to indicate success or failure, respectively. If the client does not reply, NSQ times out after the configured interval and automatically re-queues the message.
This ensures that the only edge case that can lose messages is an unclean shutdown of the nsqd process. In that case, any messages held in memory (and any buffered writes not yet flushed to disk) are lost.
Preventing message loss is the most important thing, and even this unexpected case can be mitigated. One solution is to run redundant nsqd pairs (on different hosts) that receive copies of the same portion of messages. Because the consumers you implement are idempotent, processing a message twice has no downstream effect, and this allows the system to withstand any single node failure without losing messages.
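To make this flow concrete, below is a minimal sketch of a consumer built with the official go-nsq library; the topic/channel names and the lookupd address are assumptions. Returning nil from the handler finishes the message (FIN), while returning an error causes the library to re-queue it (REQ); MaxAttempts caps the delivery attempts mentioned earlier.

package main

import (
	"errors"
	"log"

	"github.com/nsqio/go-nsq"
)

// process stands in for idempotent, application-specific handling.
func process(body []byte) error {
	if len(body) == 0 {
		return errors.New("empty message")
	}
	return nil
}

func main() {
	cfg := nsq.NewConfig()
	cfg.MaxAttempts = 5 // stop re-queuing a message after 5 delivery attempts

	consumer, err := nsq.NewConsumer("event", "archive", cfg)
	if err != nil {
		log.Fatal(err)
	}

	consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
		if err := process(m.Body); err != nil {
			return err // error -> the library re-queues the message (REQ)
		}
		return nil // nil -> the library finishes the message (FIN)
	}))

	// Discover the nsqd producers of the topic via nsqlookupd (address assumed).
	if err := consumer.ConnectToNSQLookupd("127.0.0.1:4161"); err != nil {
		log.Fatal(err)
	}
	<-consumer.StopChan
}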

2.2 Simplified configuration and management

A single nsqd instance is designed to handle multiple data streams at once. Streams are called "topics", and a topic has one or more "channels". Each channel receives a copy of all messages for its topic. In practice, a channel maps to a downstream service consuming the topic.
Neither topics nor channels are configured in advance. A topic is created the first time a message is published to it or the first time it is subscribed to. A channel is created on the first subscription to that channel. All buffered data for topics and channels is kept independently, preventing a slow consumer from causing a backlog in other channels (the same applies at the topic level).
A channel usually has multiple client connections. Assuming all connected clients are ready to receive messages, each message is delivered to a random client. nsqlookupd provides the directory service through which consumers find the nsqd addresses offering the topics they are interested in. In terms of configuration, this decouples consumers from producers (each side only needs to know where to reach the shared nsqlookupd instances, not the other side), reducing complexity and maintenance.
At a lower level, each nsqd maintains a long-lived TCP connection to nsqlookupd over which it periodically pushes its state. nsqlookupd uses this data to inform consumers of the nsqd addresses. On the consumer side, an exposed HTTP /lookup endpoint is used for polling. To introduce a new consumer of a topic, simply launch an NSQ client configured with the addresses of the nsqlookupd instances. No configuration changes are needed to add new consumers or producers, greatly reducing overhead and complexity.
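For illustration, the following sketch does by hand what the client libraries do behind the scenes: poll nsqlookupd's HTTP /lookup endpoint for the producers of a topic. The address and topic name are assumptions, and the exact JSON layout varies by NSQ version, so the response is simply printed.

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Ask nsqlookupd (default HTTP port 4161) which nsqd nodes currently
	// carry the "event" topic.
	resp, err := http.Get("http://127.0.0.1:4161/lookup?topic=event")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	// The JSON response lists the producers (broadcast address, TCP/HTTP
	// ports) that a consumer would then connect to directly.
	fmt.Println(string(body))
}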

2.3 Eliminate single point of failure

NSQ is designed to be used in a distributed fashion. An NSQ client connects (over TCP) to every nsqd instance producing the specified topic. There are no intermediaries, no message brokers, and no single point of failure.
This topology eliminates chaining, aggregation, and feedback loops; instead, your consumers have direct access to all producers. Technically it does not matter which client is connected to which nsqd: as long as there are enough consumers connected to all producers to keep up with the volume of messages, everything is guaranteed to be processed eventually. For nsqlookupd, high availability is achieved by running multiple instances. They do not communicate with each other directly, and the data is considered eventually consistent. Consumers poll all of their configured nsqlookupd instances and merge the responses. Stale, inaccessible, or otherwise failed nodes do not bring the system to a halt.
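On the discovery side, high availability amounts to handing the consumer every nsqlookupd address. A rough sketch with go-nsq follows; the first address matches the experiment later in this article, the second is hypothetical, and the handler is a stand-in.

package main

import (
	"log"

	"github.com/nsqio/go-nsq"
)

func main() {
	cfg := nsq.NewConfig()
	consumer, err := nsq.NewConsumer("event", "archive", cfg)
	if err != nil {
		log.Fatal(err)
	}
	consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
		log.Printf("got %s", m.Body) // real processing would go here
		return nil
	}))

	// The consumer polls every configured nsqlookupd, merges the responses,
	// and connects directly to all discovered nsqd producers of the topic.
	// A dead lookupd does not stop discovery while another remains reachable.
	lookupds := []string{"172.16.30.254:4161", "172.16.30.253:4161"}
	if err := consumer.ConnectToNSQLookupds(lookupds); err != nil {
		log.Fatal(err)
	}
	<-consumer.StopChan
}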

2.4 Efficiency

For the data protocol, NSQ maximizes performance and throughput by pushing data to the client rather than waiting for the client to pull it. This concept, called RDY state, is essentially a form of client-side flow control.
When a client connects to nsqd and subscribes to a channel, it is placed in a RDY state of 0, meaning no messages are sent to it. When the client is ready to receive, it sends a command updating its RDY state to the number of messages it is prepared to process, say 100. Without any further commands, 100 messages are pushed to the client as they become available (the server decrements the RDY count for that client each time). Client libraries are designed to send a command topping the RDY count back up when it falls to roughly 25% of the configured max-in-flight value (and to apportion it appropriately across connections to multiple nsqd instances).
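In go-nsq the RDY bookkeeping is handled by the library; the knob an application sets is max-in-flight. A brief sketch (the value is illustrative):

package nsqexample

import "github.com/nsqio/go-nsq"

// newFlowControlledConfig returns a consumer config that allows up to 100
// messages in flight across all nsqd connections; the library translates
// this into per-connection RDY counts and tops them up as messages are
// finished or re-queued.
func newFlowControlledConfig() *nsq.Config {
	cfg := nsq.NewConfig()
	cfg.MaxInFlight = 100
	return cfg
}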

2.5 Heartbeat and timeouts

NSQ's TCP protocol is push-oriented. After establishing a connection, completing the handshake, and subscribing, the consumer is placed in a RDY state of 0. When the consumer is ready to receive messages, it updates its RDY state to the number of messages it is prepared to receive. NSQ client libraries manage this continuously behind the scenes, resulting in a flow-controlled stream of messages. Periodically, nsqd sends a heartbeat over the connection. The client can configure the interval between heartbeats, but nsqd expects a response before it sends the next one.
The combination of application-level heartbeats and RDY state avoids head-of-line blocking, which could otherwise render the heartbeats useless (that is, if a consumer falls behind in processing, the messages already in flight fill the operating system's receive buffer and block the heartbeats). To guarantee progress, all network I/O deadlines are bound to the configured heartbeat interval. This means you can literally unplug the network between nsqd and a consumer, and the failure will be detected and handled correctly. When a fatal error is detected, the client connection is forcibly closed, in-flight messages time out and are re-queued for delivery to another consumer, and the errors are logged and accumulated into various internal metrics.
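For completeness, here is a sketch of the corresponding go-nsq settings; the intervals are illustrative, and the read timeout is kept larger than the heartbeat interval so a missed heartbeat is detected rather than masked.

package nsqexample

import (
	"time"

	"github.com/nsqio/go-nsq"
)

// newHeartbeatConfig returns a consumer config with an explicit heartbeat
// interval; nsqd expects a response to each heartbeat before sending the
// next, and network I/O deadlines are derived from this interval.
func newHeartbeatConfig() *nsq.Config {
	cfg := nsq.NewConfig()
	cfg.HeartbeatInterval = 15 * time.Second
	cfg.ReadTimeout = 30 * time.Second
	return cfg
}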

2.6 Distributed

Because NSQ shares nothing between daemons, it is built for distributed operation from the start. Individual machines can come and go without affecting the rest of the system, and publishers can publish locally even in the face of a network partition.
This "distributed-first" design means NSQ can essentially scale out indefinitely. Need more throughput? Add more nsqd nodes. The only shared state lives on the lookup nodes, and even they do not need a global view; it is perfectly simple to configure certain nsqd instances to register with certain lookup nodes, the only requirement being that consumers can obtain the complete set of nodes through the lookup nodes they query. Clear failure semantics: NSQ builds into its components a set of explicit trade-offs around the failures that can occur, which makes sense for message delivery and recovery. While it may not provide guarantees as strict as a system like Kafka, NSQ's operational simplicity makes failure situations very obvious.

2.7 No replication

Unlike some other queue components, NSQ provides no form of replication or clustering, which is exactly what makes it so simple to run, but it does not offer strong enough guarantees for high-assurance, high-reliability message publishing. We can partially mitigate this by reducing the file-sync interval (configured with a single nsqd flag) and by backing our queues with EBS. Even so, there remains the case where a node dies immediately after a message is published, before the write becomes durable, and the message is lost.

2.8 Not in strict order

While Kafka is built around an ordered log, NSQ is not: messages can enter the queue in any order. In our use cases this is usually not a problem, since all the data carries a timestamp, but NSQ is not suitable for situations that require strict ordering.

2.9 No data deduplication function

NSQ is a timeout-based system: it uses heartbeats to detect whether a consumer is alive or dead. There are many reasons why a consumer might fail its heartbeat check, so there must be a separate step in the consumer to guarantee idempotency.
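One simplistic way to approach this in a consumer, sketched with go-nsq, is to drop redeliveries whose message ID has already been seen. This is only an illustration: it covers redeliveries from a single nsqd held in memory, whereas deduplicating across redundant nsqd copies or restarts would need an application-level key and a persistent store.

package nsqexample

import (
	"sync"

	"github.com/nsqio/go-nsq"
)

// dedupHandler finishes messages it has already processed without
// reprocessing them, keyed by the nsqd-assigned message ID.
type dedupHandler struct {
	mu   sync.Mutex
	seen map[nsq.MessageID]bool
}

func newDedupHandler() *dedupHandler {
	return &dedupHandler{seen: make(map[nsq.MessageID]bool)}
}

func (h *dedupHandler) HandleMessage(m *nsq.Message) error {
	h.mu.Lock()
	already := h.seen[m.ID]
	h.seen[m.ID] = true
	h.mu.Unlock()
	if already {
		return nil // duplicate delivery: finish it without side effects
	}
	// ... idempotent, application-specific processing goes here ...
	return nil
}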

3. Practice the installation process

This article omits the detailed installation steps for an NSQ cluster; you can refer to the official website, it is fairly straightforward. This section describes the topology used in the author's experiment and the related nsqadmin information.

3.1 Topological structure

The experiment uses 3 nsqd services and 2 lookupd services.
Following the officially recommended topology, messages are published by a service and an nsqd running on the same host. Five machines in total.
NSQ has essentially no configuration files; configuration parameters are specified on the command line.
The main commands are as follows:
nsqlookupd command

bin/nsqlookupd

nsqd command

bin/nsqd --lookupd-tcp-address=172.16.30.254:4160 --broadcast-address=172.16.30.254

nsqadmin command

bin/nsqadmin --lookupd-http-address=172.16.30.254:4161

Tool class that consumes the specified topic/channel and writes the messages to a local file:

bin/nsq_to_file --topic=newtest --channel=test --output-dir=/tmp --lookupd-http-address=172.16.30.254:4161

Publish a message

curl -d 'hello world 5' 'http://172.16.30.254:4151/put?topic=test'

3.2 nsqadmin

The details of the streams can be viewed in nsqadmin, including the nsqd nodes, the specific channels, the number of messages in the queue, the number of connections, and other information.

The nsqadmin screenshots (not included here) show the list of all nsqd nodes, statistics for the messages, and the list of lookupd hosts.

4. Summary

The core of NSQ is simplicity: it is a simple queue, which means it is easy to reason about failures and easy to find bugs. Consumers can handle failures on their own without affecting the rest of the system.

In fact, simplicity was the primary factor in our decision to use NSQ. It is easy to maintain alongside much of our other software, and introducing a queue gave us excellent performance, even letting us add a few orders of magnitude of throughput. However, more and more consumers require a strict set of reliability and ordering guarantees, which exceeds the simple functionality NSQ offers.

In the context of our business system, the messages piling up on the forwarding nodes are relatively sensitive: we cannot tolerate an nsqd outage or an unusable disk. This is the main reason we did not choose this message middleware there; simplicity and reliability do not seem to be fully satisfied at the same time. Kafka, by contrast, places more responsibility on ops, but in return its replicated, ordered log can provide us with better service. For the other consumers that suit NSQ, though, it serves us quite well, and we look forward to continuing to build on its solid foundation.

PS: This article was first published on the author's CSDN blog and later added to my personal blog.

Reference

    1. NSQ: A distributed real-time messaging platform
    2. Nsq-nyc Golang Meetup
    3. NSQ Docs