HQueue: HBase-based message queue

Last Update:2014-07-24 Source: Internet

Author: User



Ling Bai

1. HQueue introduction

HQueue is a distributed, persistent message queue built on HBase by the offline system team of Taobao's search web-crawling group. It stores message data in an HTable, uses an HBase coprocessor to wrap the raw KeyValue data in the message format before it is stored, and provides an HQueue client API for message access that wraps the HBase client API.

HQueue works well wherever time-series data needs to be stored, for example as the input and output of MapReduce jobs and iStream, so that upstream and downstream systems can share data.

2. HQueue features

HQueue relies on HBase for message access, so it stands on the shoulders of HDFS and HBase and inherits the following features:

(1) Supports multiple partitions. The queue size can be set as needed, and high-concurrency access is supported (multiple HBase regions);

(2) Supports automatic failover. If a machine goes down, its partitions are automatically migrated to other machines (HBase's failover mechanism);

(3) Supports dynamic load balancing. Partitions can be dynamically scheduled onto the most suitable machines (HBase's load-balancing mechanism, which can be adjusted dynamically);

(4) Uses HBase for persistent message storage without data loss (HBase HLog and HDFS append);

(5) The queue's read/write pattern matches HBase's storage characteristics and gives good concurrent read/write performance: the latest messages are held in the MemStore and writes go straight to the MemStore, so most operations run at memory speed;

(6) Supports accessing messages by topic (the qualifier in HBase);

(7) Supports message TTL with automatic removal of expired messages (HBase supports KeyValue-level TTL);

(8) HQueue = HTable schema design + HQueue coprocessor + HBase client wrapper. It is built entirely on HBase's extension points without any hacks, so it can be upgraded along with HBase;

(9) The HQueue client API is a thin wrapper over the HBase client. HBase's ThriftServer supports multi-language APIs, so HQueue can easily offer multi-language APIs as well;

(10) The HQueue client API naturally fits the InputFormat mechanism of Hadoop MapReduce jobs and iStream, and uses data locality to schedule computation onto the machine nearest to the storage;

(11) Supports message subscription (HQueue 0.3 and later).

3. HQueue system design and processing flow

3.1. HQueue system structure

The HQueue system structure is shown in Figure (1):

Figure (1): HQueue system structure

Where:

(1) Each queue corresponds to one HTable. A queue can be created as a pre-sharded table, which helps with load balancing.

(2) Each queue can have multiple partitions (HBase regions), which are evenly distributed across the region servers of the HBase cluster.

(3) Each partition can be migrated dynamically between the region servers of the HBase cluster. If a region server fails, the HQueue partitions running on it are automatically migrated to other region servers without data loss. When the cluster load is unbalanced, the HMaster automatically migrates HQueue partitions to region servers with a lower load.

(4) Each message corresponds to one HBase KeyValue pair and is stored in the HBase region in chronological order by message ID. A message ID consists of a timestamp and a sequence ID that auto-increments within the same timestamp. For details, see the message storage structure in section 3.2.

3.2. Message storage structure

The message storage structure is shown in Figure (2):

Figure (2): Message storage structure

Where:

(1) Row key: consists of the partition ID and the message ID.

Partition ID: a single queue can have multiple partitions. Currently at most Short.MAX_VALUE partitions are supported. The partition ID does not have to be set when the Message object is created; it can be specified when the message is sent, and if it is not specified, a random partition ID is used.
Message ID: consists of a timestamp and a sequence ID. The timestamp is the time, in milliseconds, at which the message was written into HQueue. The sequence ID is the sequence number of the message among those sharing the same timestamp. Currently at most Short.MAX_VALUE messages are supported under the same timestamp.

(2) Column: consists of the column family and the message topic.

Column family: the HBase column family, which has the fixed value "message".
Message topic: the HBase column qualifier, which holds the message topic name. You can store messages under different topics as needed, or read from the queue only the topics you are interested in.

(3) Value: the message content.
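The row key layout described above can be illustrated with a small sketch. The source does not give the exact byte layout, so the field widths and ordering used here (a 2-byte partition ID, an 8-byte timestamp, a 2-byte sequence ID, all big-endian) and the class name RowKeySketch are assumptions for illustration only. The point is that with such a layout, byte-wise ordering of row keys keeps the messages of one partition in chronological order.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical row-key codec: 2-byte partition ID, 8-byte timestamp, 2-byte sequence ID (assumed layout). */
final class RowKeySketch {
    static final int LENGTH = Short.BYTES + Long.BYTES + Short.BYTES; // 12 bytes

    /** Encodes the fields big-endian, so byte order matches numeric order for non-negative values. */
    static byte[] encode(short partitionId, long timestamp, short sequenceId) {
        return ByteBuffer.allocate(LENGTH)
                .putShort(partitionId)
                .putLong(timestamp)
                .putShort(sequenceId)
                .array();
    }

    static short partitionOf(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey).getShort(0);
    }

    static long timestampOf(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey).getLong(Short.BYTES);
    }

    static short sequenceOf(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey).getShort(Short.BYTES + Long.BYTES);
    }

    public static void main(String[] args) {
        byte[] first = encode((short) 3, 1406160000000L, (short) 0);
        byte[] second = encode((short) 3, 1406160000000L, (short) 1);
        // Unsigned lexicographic comparison, the order in which HBase sorts row keys.
        System.out.println(Arrays.compareUnsigned(first, second) < 0);
    }
}
```

Because the partition ID comes first, all messages of one partition are contiguous in the table, and a scan bounded by two message IDs only touches that partition's key range.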

3.3. HQueue message writing and coprocessor processing flow

Message data is written to HQueue through the HQueue client API. To keep messages unique and ordered, HQueue uses a coprocessor to adjust the message ID of each message written by the user, puts the message into the HBase MemStore immediately so that it can be read, and persists it to the HLog last. The processing logic is shown in Figure (3):

Figure (3): Data writing and coprocessor processing flow

Where:

(1) HQueue wraps the HQueue client API. You can use put or other methods to write messages to HQueue.

(2) The HQueue client uses Message.makeKeyValueRow() to convert the message data structure into an HBase row key. The row key format required by HQueue is described in section 3.2.
(3) After building the row key, the HQueue client calls the HTable put method to write the message through the standard HBase write path.
(4) HQueueCoprocessor, which extends BaseRegionObserver, is registered on the HQueue table. Before writing the message data, HRegion calls the preBatchMutate method of HQueueCoprocessor. This method mainly adjusts the message ID so that it is unique and ordered.
(5) The preBatchMutate method of HQueueCoprocessor also sets the durability to SKIP_WAL, so that HBase does not persist the message data to the HLog on its own.
(6) After HRegion has written the message data, it calls the postBatchMutate method of HQueueCoprocessor. This method mainly persists the message data to the HLog.
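The message ID adjustment in step (4) can be sketched independently of HBase. The following is a minimal illustration, not the HQueue implementation: the class name MessageIdAllocator is hypothetical, and the policy for what happens when the sequence space under one timestamp is exhausted (advance to the next millisecond) is an assumption, since the source only states that the IDs must end up unique and ordered.

```java
/** Hypothetical stand-in for the message-ID adjustment performed before a batch is written. */
final class MessageIdAllocator {
    private long lastTimestamp = -1L;
    private short lastSequence = 0;

    /**
     * Returns {timestamp, sequenceId}. Each returned pair is strictly greater
     * than every pair returned before, even if the clock stalls or goes backwards.
     */
    synchronized long[] next(long nowMillis) {
        if (nowMillis > lastTimestamp) {
            // New millisecond: restart the sequence.
            lastTimestamp = nowMillis;
            lastSequence = 0;
        } else if (lastSequence < Short.MAX_VALUE) {
            // Same (or older) millisecond: keep the timestamp, bump the sequence.
            lastSequence++;
        } else {
            // Sequence space exhausted: move to the next millisecond (assumed policy).
            lastTimestamp++;
            lastSequence = 0;
        }
        return new long[] {lastTimestamp, lastSequence};
    }
}
```

Doing this on the region server rather than in the client means that concurrent writers to the same partition cannot produce colliding row keys.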

3.4. HQueue scan processing flow

To make it easy to scan data from a queue, HQueue wraps the HBase client scanner and provides three scanners for different scenarios: QueueScanner, PartitionScanner, and CombinedPartitionScanner. The HQueue scan flow is shown in Figure (4):

Figure (4): HQueue scan processing flow

Where:

(1) Obtain the scanner you need from the HQueue client. Three scanner types are currently provided:

QueueScanner: scans all partitions in the queue;

PartitionScanner: scans a single specified partition in the queue;

CombinedPartitionScanner: scans several specified partitions in the queue.

(2) After obtaining the scanner, call its next method in a loop to retrieve messages until no more data is returned, which ends the scan. When the scan is complete, close the scanner promptly to release its resources.
(3) When a previously created queue object is no longer needed, close the queue promptly to release its resources.
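The scan contract described above (call next until it returns nothing, then close) can be sketched with in-memory stand-ins. None of the types below are HQueue classes: MessageScannerSketch, ListScanner, ChainedScanner and ScanLoopSketch are hypothetical names, and visiting partitions one after another is an assumption about how a multi-partition scan could behave.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical scanner contract: next() returns null once the scan is exhausted. */
interface MessageScannerSketch extends AutoCloseable {
    String next();

    @Override
    void close();
}

/** In-memory stand-in for a scanner over a single partition. */
final class ListScanner implements MessageScannerSketch {
    private final Iterator<String> it;
    boolean closed = false;

    ListScanner(List<String> messages) {
        this.it = messages.iterator();
    }

    @Override
    public String next() {
        return it.hasNext() ? it.next() : null;
    }

    @Override
    public void close() {
        closed = true;
    }
}

/** Scans several partition scanners one after another (assumed behaviour). */
final class ChainedScanner implements MessageScannerSketch {
    private final List<? extends MessageScannerSketch> parts;
    private int current = 0;

    ChainedScanner(List<? extends MessageScannerSketch> parts) {
        this.parts = parts;
    }

    @Override
    public String next() {
        while (current < parts.size()) {
            String message = parts.get(current).next();
            if (message != null) {
                return message;
            }
            current++; // this partition is exhausted, move on to the next one
        }
        return null;
    }

    @Override
    public void close() {
        for (MessageScannerSketch part : parts) {
            part.close();
        }
    }
}

final class ScanLoopSketch {
    /** Drains a scanner and always closes it, even if next() throws. */
    static List<String> drain(MessageScannerSketch scanner) {
        List<String> out = new ArrayList<>();
        try (MessageScannerSketch s = scanner) {
            String message;
            while ((message = s.next()) != null) {
                out.add(message);
            }
        }
        return out;
    }
}
```

Using try-with-resources makes the "close the scanner promptly" rule hard to forget, because the scanner is released on every exit path from the loop.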

3.5. HQueue subscription flow

3.5.1. Overall flow

HQueue has supported subscription since version 0.3. A subscriber can subscribe to multiple partitions and topics of a queue. Compared with actively scanning message data through a scanner, subscription has the following characteristics: (1) as soon as message data is written into the queue it is pushed to the subscriber, so delivery is more timely; (2) the subscriber receives new messages passively, which avoids unnecessary scans when HQueue has no new message data and reduces system overhead.

The processing logic of the HQueue subscription flow is shown in Figure (5):

Figure (5): HQueue subscription flow processing logic

Where:

(1) HQueue subscription involves three parts: the subscriber, ZooKeeper, and the coprocessor. Where:

Subscriber: the subscriber. It mainly writes subscription information to ZooKeeper, starts listening, receives new messages, and calls back its MessageListeners.
ZooKeeper: stores the subscription information submitted by subscribers. This mainly includes the queue, partitions, and topics the subscriber has subscribed to, plus the subscriber's address and checkpoint. Details are given in the following sections.
Coprocessor: mainly obtains subscription information from ZooKeeper, uses an internal scanner to scan the latest messages from the queue, sends new messages to subscribers, and updates the current checkpoint in ZooKeeper.

(2) The main subscription flow is as follows:
Step 1: Create a subscriber, add the subscription information and message-handling functions, write the subscription information to ZooKeeper, and start the listener to wait for new messages. The subscription information written to ZooKeeper mainly includes:

The name of the queue the subscriber subscribes to;
The queue partitions the subscriber subscribes to and the starting message ID on each partition. A subscriber can subscribe to multiple partitions. If none are specified, all partitions of the queue are subscribed.
The topics the subscriber subscribes to. A subscriber can subscribe to multiple topics. If none are specified, all topics on the queue are subscribed.
The address/hostname and listening port of the subscriber. A listening port can be specified when the subscriber is created. If none is specified, a currently available port is chosen at random.

Step 2: The coprocessor obtains the subscription information from ZooKeeper and registers a watcher, so that ZooKeeper can notify the coprocessor promptly when the subscription information changes. After obtaining the subscription information, the coprocessor creates SubscriptionWorker and other worker threads as needed to scan messages from the HQueue partition and send them to the subscriber.
Step 3: The coprocessor scans new messages from the HQueue partition.
Step 4: The coprocessor sends the new messages to the subscriber.
Step 5: When the subscriber receives a new message, it invokes the callback functions registered on it.
Step 6: After a new message has been sent successfully, the coprocessor updates the message checkpoint in ZooKeeper for later use.
Step 7: The subscriber cancels the subscription and deletes the relevant subscription information from ZooKeeper.
Step 8: Through the watcher registered earlier, ZooKeeper notifies the coprocessor that the subscription information has changed. The coprocessor stops worker threads such as SubscriptionWorker according to the change.
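Steps 2 and 8 amount to keeping the set of running workers in line with the set of subscriptions that ZooKeeper currently reports. A minimal sketch of that reconciliation, without any ZooKeeper dependency, might look as follows. The class WorkerRegistrySketch and its method names are hypothetical, and real worker threads are replaced by entries in a set.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical bookkeeping for per-subscriber workers inside the coprocessor. */
final class WorkerRegistrySketch {
    private final Set<String> running = new TreeSet<>();

    /** Called whenever the watcher reports the current set of subscriber names. */
    synchronized void onSubscriptionsChanged(Set<String> subscribers) {
        Set<String> toStop = new HashSet<>(running);
        toStop.removeAll(subscribers);    // subscriptions that disappeared

        Set<String> toStart = new HashSet<>(subscribers);
        toStart.removeAll(running);       // subscriptions that are new

        running.removeAll(toStop);        // a real worker thread would be stopped here
        running.addAll(toStart);          // a real worker thread would be started here
    }

    /** Returns a sorted snapshot of the subscribers that currently have a worker. */
    synchronized Set<String> running() {
        return new TreeSet<>(running);
    }
}
```

Note that ZooKeeper watches fire only once, so a real implementation would have to re-register the watcher each time it reads the subscription nodes.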

3.5.2. HQueue subscriber

The HQueue subscriber structure and main processing logic are shown in Figure (6):

Figure (6): HQueue subscriber structure and main processing logic

Where:

(1) The subscriber consists of SubscriberZooKeeper and a Thrift server. SubscriberZooKeeper handles the ZooKeeper-related operations, including writing and deleting subscription information. The coprocessor and the subscriber communicate through Thrift: the subscriber starts a Thrift server that listens on the specified port and waits for new messages sent by the coprocessor.

(2) After the subscriber receives a new message through the Thrift server, it calls back the MessageListeners registered on it and returns a status code to the coprocessor.
(3) Multiple MessageListeners can be registered on one subscriber. They are called in sequence.
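The dispatch behaviour in points (2) and (3) can be sketched as follows. This is an illustration under stated assumptions rather than the HQueue code: the names ListenerSketch and SubscriberDispatchSketch, the status code values, and the choice to stop at the first failing listener are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listener callback, standing in for MessageListener. */
interface ListenerSketch {
    void onMessages(List<String> messages) throws Exception;
}

final class SubscriberDispatchSketch {
    static final int OK = 0;      // hypothetical status codes
    static final int FAILED = 1;

    private final List<ListenerSketch> listeners = new ArrayList<>();

    void register(ListenerSketch listener) {
        listeners.add(listener);
    }

    /**
     * Plays the role of the consumeMessages service call: runs the listeners
     * in registration order and reports a status code to the caller.
     */
    int consumeMessages(List<String> messages) {
        try {
            for (ListenerSketch listener : listeners) {
                listener.onMessages(messages);
            }
            return OK;
        } catch (Exception e) {
            // Assumed policy: the first failure aborts the remaining listeners.
            return FAILED;
        }
    }
}
```

Returning a status code instead of throwing lets the sender decide whether to retry the batch or advance its checkpoint.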

3.5.3. HQueue coprocessor

The HQueue coprocessor structure and main processing logic are shown in Figure (7):

Figure (7): HQueue coprocessor structure and main processing logic

Where:

(1) The coprocessor consists of SubscriptionZooKeeper and SubscriptionWorker.

SubscriptionZooKeeper: handles the ZooKeeper-related work, including obtaining subscription information from ZooKeeper, registering the related watchers, and letting SubscriptionWorker update the checkpoint in ZooKeeper.
SubscriptionWorker: consists mainly of MessageProducer and MessageSender. It scans new messages, sends them to the subscriber, and updates checkpoints.

(2) MessageProducer mainly creates an internal scanner, scans new messages from the queue partition, and puts them into the buffer queue.

When the buffer queue has no free space, MessageProducer waits until MessageSender has consumed messages from the buffer queue and freed up space.
When the queue partition has no new messages, MessageProducer goes to sleep. When a new message is written, the coprocessor wakes MessageProducer up through SubscriptionWorker, and a new scan starts.

(3) MessageSender takes new messages from the buffer queue, sends them to the subscriber, and waits for the subscriber's response. When the buffer queue has no new messages, MessageSender waits until new messages arrive.
(4) The CheckpointUpdater in MessageSender periodically writes the current checkpoint to the corresponding subscription node in ZooKeeper for later use.
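The interaction between the producer, the bounded buffer queue and the sender is a classic producer/consumer arrangement. A minimal sketch with java.util.concurrent is shown below. The class name SubscriptionWorkerSketch, the END sentinel, and the use of plain long message IDs are assumptions made for the example; the Thrift call and the ZooKeeper update are replaced by in-memory stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

final class SubscriptionWorkerSketch {
    private static final long END = -1L; // sentinel that tells the sender to stop (sketch only)

    final List<Long> delivered = new ArrayList<>();    // what the "subscriber" received
    final AtomicLong checkpoint = new AtomicLong(-1L); // last message ID sent successfully

    /** Runs one producer thread and one sender thread over the given message IDs. */
    void run(long[] messageIds, int bufferSize) {
        BlockingQueue<Long> buffer = new ArrayBlockingQueue<>(bufferSize);

        Thread producer = new Thread(() -> {
            try {
                for (long id : messageIds) {
                    buffer.put(id); // blocks while the buffer is full
                }
                buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread sender = new Thread(() -> {
            try {
                while (true) {
                    long id = buffer.take(); // blocks while the buffer is empty
                    if (id == END) {
                        return;
                    }
                    delivered.add(id);  // stands in for the Thrift call to the subscriber
                    checkpoint.set(id); // stands in for the periodic checkpoint update
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        sender.start();
        try {
            producer.join();
            sender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
    }
}
```

The bounded queue provides the back-pressure described in the text: a slow subscriber slows the sender, which in turn blocks the producer instead of letting unsent messages pile up in memory.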

3.5.4. Subscription information hierarchy

HQueue subscription information is stored in ZooKeeper. The hierarchy of the subscription information in ZooKeeper is shown in Figure (8):

Figure (8): Subscription information hierarchy

Where:

(1) The subscriber node (subscriber_x) records the subscriber's checkpoint on the queue partition. The checkpoint is written by the subscriber when it initiates the subscription and is updated by the CheckpointUpdater in the SubscriptionWorker's MessageSender.
(2) There are two ephemeral nodes under the subscriber node, address and topics, which store the subscriber's IP address/hostname:port and the subscribed topics respectively. When the subscriber cancels the subscription, both ephemeral nodes are deleted. When the subscriber exits unexpectedly and its session expires, ZooKeeper deletes the ephemeral nodes.

3.5.5. Subscriber Thrift service

The HQueue subscription function uses Thrift to simplify multi-language client support. The subscriber starts a Thrift server, listens on the specified port, receives messages, and calls back its MessageListeners to process them. The interface describing the service provided by the HQueue subscriber is defined as follows:

    namespace java com.etao.hadoop.hbase.queue.thrift.generated

    /**
     * HQueue MessageID
     */
    struct TMessageID {
      1: i64 timestamp,
      2: i16 sequenceID
    }

    /**
     * HQueue Message
     */
    struct TMessage {
      1: optional TMessageID id,
      2: optional i16 partitionID,
      3: binary topic,
      4: binary value
    }

    /**
     * HQueue Subscriber Service
     */
    service HQueueSubscriberService {
      i32 consumeMessages(1:list<TMessage> messages)
    }

4. Using HQueue

4.1. HQueue toolkit

For ease of use, HQueue provides the HQueue client API for accessing message data. In HQueue 0.3, the HQueue operations and maintenance (O&M) tools are integrated into the HQueue shell to form the HQueue toolkit, which gives users a one-stop service for managing queues and queue subscribers.

As with the HBase shell, you can start the HQueue shell command-line tool with ${HBASE_HOME}/bin/hqueue shell. Note that the HQueue toolkit must be deployed before it can be used.

The HQueue toolkit includes commands to create, disable, enable, delete, and clear a queue. Examples:

(1) Create a queue

Usage: create 'queue_name', partition_count, ttl, [configuration dictionary]

Descriptions:

queue_name: name of the queue to be created. Required.

partition_count: number of partitions of the queue to be created. Required.

ttl: the expiration time. Required.

configuration dictionary: optional configuration parameters. Currently the following parameters are supported: (1) hbase.hqueue.partitionsperregion; (2) hbase.hregion.memstore.flush.size; (3) hbase.hregion.majorcompaction; (4) hbase.hstore.compaction.min; (5) hbase.hstore.compaction.max; (6) hbase.hqueue.compression; (7) hbase.hstore.blockingStoreFiles.

Examples:

hqueue> create 'q1', 32, 86400

hqueue> create 'q1', 32, 86400, {'hbase.hqueue.partitionsperregion' => '4', 'hbase.hstore.compaction.min' => '16', 'hbase.hstore.compaction.max' => '32'}

(2) Clear a queue

Usage: truncate_queue 'queue_name'

Descriptions:

queue_name: name of the queue to be cleared. Required.

Examples:

hqueue(main):013:0> truncate_queue 'replication_dev_2_test_queue'

Note that this command differs from the truncate command in the HBase shell: it only deletes the data in the queue and keeps the queue's pre-sharding information.

For more information, see: http://searchwiki.taobao.ali.com/index.php/HQueue_Toolkit#Queue.E7.AE.A1.E7.90.86

(3) Add a subscriber

Usage: add_subscriber 'queue_name', 'subscriber_name'

Descriptions:

queue_name: queue name. Required.

subscriber_name: subscriber name. Required.

Examples:

add_subscriber 'replication_dev_2_test_queue', 'subscriber_1'

(4) Delete a subscriber

Usage: delete_subscriber 'queue_name', 'subscriber_name'

Descriptions:

queue_name: name of the queue the subscriber subscribes to. Required.

subscriber_name: subscriber name. Required.

Examples:

hqueue(main):040:0> delete_subscriber 'replication_dev_2_test_queue', 'subscriber_1'

For more information, see: http://searchwiki.taobao.ali.com/index.php/HQueue_Toolkit#.E8.AE.A2.E9.98.85.E8.80.85.E7.AE.A1.E7.90.86

4.2. Put

The put operation in the HQueue client API writes user message data to HQueue. Put supports batch operations. A usage example follows:

    HQueue queue = new HQueue(queueName);
    String topic1 = "crawler";
    String value1 = "http://www.360test.com";
    // Write a single message without specifying a partition ID.
    // One of the queue's partitions is then chosen at random.
    Message message1 = new Message(Bytes.toBytes(topic1), Bytes.toBytes(value1));
    queue.put(message1);
    // Write a single message and specify the partition ID explicitly.
    short partitionID = 10;
    queue.put(partitionID, message1);
    List<Message> messages = new ArrayList<Message>();
    messages.add(message1);
    String topic2 = "dump";
    String value2 = "http://www.jd.com";
    Message message2 = new Message(Bytes.toBytes(topic2), Bytes.toBytes(value2));
    messages.add(message2);
    // Write multiple messages without specifying a partition ID.
    queue.put(messages);
    // Write multiple messages and specify the partition ID.
    queue.put(partitionID, messages);
    queue.close();

4.3. Scan

To make it easy to scan message data from a queue, the HQueue client API provides three scanners: QueueScanner, PartitionScanner, and CombinedPartitionScanner. A usage example follows:

    String queueName = "subscription_queue";
    HQueue queue = new HQueue(queueName);
    // Start and stop timestamps: scan the last 6 seconds.
    long currentTimestamp = System.currentTimeMillis();
    MessageID startMessageID = new MessageID(currentTimestamp - 6000);
    MessageID stopMessageID = new MessageID(currentTimestamp);
    Scan scan = new Scan(startMessageID, stopMessageID);
    // Add the topics to scan.
    scan.addTopic(Bytes.toBytes("topic1"));
    scan.addTopic(Bytes.toBytes("topic2"));
    Message message = null;

    // Use QueueScanner to scan all partitions in the queue.
    QueueScanner queueScanner = queue.getQueueScanner(scan);
    while ((message = queueScanner.next()) != null) {
        // no-op
    }
    queueScanner.close();

    // Use PartitionScanner to scan a single specified partition.
    short partitionID1 = 1;
    PartitionScanner partitionScanner = queue.getPartitionScanner(partitionID1, scan);
    while ((message = partitionScanner.next()) != null) {
        // no-op
    }
    partitionScanner.close();

    // Use CombinedPartitionScanner to scan several specified partitions.
    short partitionID2 = 2;
    Map<Short, Scan> partitions = new HashMap<Short, Scan>();
    partitions.put(partitionID1, scan);
    partitions.put(partitionID2, scan);
    CombinedPartitionScanner combinedScanner = queue.getCombinedPartitionScanner(partitions);
    while ((message = combinedScanner.next()) != null) {
        // no-op
    }
    combinedScanner.close();

    queue.close();

4.4. Subscribe to messages

HQueue has provided the subscription function since version 0.3. A usage example follows:

    HQueue queue = null;
    HQueueSubscriber subscriber = null;
    try {
        String queueName = "subscription_queue";
        queue = new HQueue(queueName);

        // Add the subscribed partitions; a null start message ID is passed for each.
        Set<Pair<Short, MessageID>> partitions = new HashSet<Pair<Short, MessageID>>();
        Pair<Short, MessageID> partition1 = new Pair<Short, MessageID>((short) 0, null);
        partitions.add(partition1);
        Pair<Short, MessageID> partition2 = new Pair<Short, MessageID>((short) 1, null);
        partitions.add(partition2);
        Pair<Short, MessageID> partition3 = new Pair<Short, MessageID>((short) 2, null);
        partitions.add(partition3);

        // Add the subscribed topics.
        Set<String> topics = new HashSet<String>();
        topics.add("topic_1");
        topics.add("topic_2");
        topics.add("topic_3");

        // Subscriber name.
        String subscriberName = "subscriber_1";
        // Subscription: class name assumed, it is garbled in the source.
        Subscription subscription = new Subscription(subscriberName, topics);
        subscription.addPartitions(partitions);

        // Add the callback functions.
        List<MessageListener> listeners = new LinkedList<MessageListener>();
        // BlackHoleListener: class name assumed, it is garbled in the source.
        MessageListener blackHoleListener = new BlackHoleListener(subscriberName);
        listeners.add(blackHoleListener);

        // Create and start the subscriber.
        subscriber = queue.createSubscriber(subscription, listeners);
        subscriber.start();
        Thread.sleep(600000L);
        subscriber.stop("Time out, request to stop subscriber: " + subscriberName);
    } catch (Exception ex) {
        LOG.error("Caught unexpected exception when testing subscription.", ex);
    } finally {
        if (queue != null) {
            try {
                queue.close();
                queue = null;
            } catch (IOException ex) {
                // ignore the exception
            }
        }
    }

4.5. ThriftServer API

HBase's own ThriftServer provides multi-language API support for HTable. HQueue extends the HBase ThriftServer with support for HQueue, so that HQueue can be accessed conveniently from C++, Python, PHP, and other languages.

HQueue currently provides the following Thrift API:

No.	Method	Description
1	ScannerID messageScannerOpen(1:Text queueName, 2:i16 partitionID, 3:TMessageScan messageScan)	Open a scanner on one partition of the queue, based on the given scan
2	TMessage messageScannerGet(1:ScannerID id)	Get messages one by one
3	list<TMessage> messageScannerGetList(1:ScannerID id, 2:i32 nbMessages)	Get messages in batches
4	void messageScannerClose(1:ScannerID id)	Close the scanner
5	void putMessage(1:Text queueName, 2:TMessage tMessage)	Write a message to the queue, using a random partition ID
6	void putMessages(1:Text queueName, 2:list<TMessage> tMessages)	Write a batch of messages to the queue, using a random partition ID
7	void putMessageWithPid(1:Text queueName, 2:i16 partitionID, 3:TMessage tMessage)	Write a message to the queue, using the specified partition ID
8	void putMessagesWithPid(1:Text queueName, 2:i16 partitionID, 3:list<TMessage> tMessages)	Write a batch of messages to the queue, using the specified partition ID
9	list<Text> getQueueLocations(1:Text queueName)	Get the addresses of the hosts that hold the queue's partitions

5. Summary

This article has briefly described the concept, features, system design, processing flow, and usage of HQueue. I hope it is helpful.
