Consumer, Cluster Push Mode: Introduction
0. Background
A consumer requests messages generated by producers from the broker and consumes them. For RocketMQ, the natural questions are how distributed consumption is achieved, how consumption order is guaranteed, and how duplicate consumption is avoided.
Open problems:
1. Adding or removing consumers in a group in cluster mode may lead to duplicate consumption, because:
Suppose that before the new consumer joins, ConsumerA is consuming MessageQueue-M and has consumed up to offset 3.
ConsumerB then joins. Under the cluster-mode AllocateMessageQueue strategy, MessageQueue-M may be reassigned to ConsumerB. Because ConsumerA's consumption offset is not reported back in real time, ConsumerB may start from an older offset and overlap with what ConsumerA has already consumed.
2. What to do if consumption fails.
3. Exception handling.
4. Use of threads and atomic variables.
I. Terminology
Topic: the finest-grained subscription unit; a group can subscribe to messages from multiple topics
Group: a consumer group; a group can subscribe to multiple topics
ClientId: identifies one service instance (IP/machine). One machine can host multiple groups, and the clientIds that belong to the same group consume messages together
MessageQueue: a message queue; one topic on a broker has multiple MessageQueues
Offset: each message queue has two different offsets (the commit offset and the pull offset). Why two? One tracks what has been consumed, the other what has been pulled.
Cluster consumption:
Broadcast consumption:
Immediate consumption:
Sequential consumption:
Consumption position:
OffsetStore: commitOffset, the offset up to which messages have been consumed
PullRequest: offset, the position of the next pull
II. Overall Framework
III. Data Structures
The data structures fall into two parts:
One part is managed centrally in MQClientInstance: whatever can be shared between consumers and producers lives there.
The other part is managed separately by each consumer or producer, such as the MessageQueues each one subscribes to. The two parts are described in turn below.
-----------------------------------------------Part I: MQClientInstance---------------------------
1. TopicRouteData: holds all queue information, for consumers and producers alike
private String orderTopicConf; // brokerName:num count
private List<QueueData> queueDatas;
private List<BrokerData> brokerDatas;
private HashMap<String/* brokerAddr */, List<String>/* Filter Server */> filterServerTable;
2. QueueData: uses separate read and write queue counts to distinguish the queues that serve consumers (read) from those that serve producers (write)
private String brokerName;
private int readQueueNums;
private int writeQueueNums;
private int perm;
private int topicSynFlag;
3. BrokerData: address information for a broker
private String brokerName;
private HashMap<Long/* brokerId */, String/* broker address */> brokerAddrs;
4. PullRequest: the pull request, including the group, the offset to pull from, the queue, and the consumption progress
private String consumerGroup;
private MessageQueue messageQueue;
private ProcessQueue processQueue;
private long nextOffset;
5. PullMessageService: the message-pulling service; it continuously takes each PullRequest in turn and fetches messages for it
private final LinkedBlockingQueue<PullRequest> pullRequestQueue = new LinkedBlockingQueue<PullRequest>();
private final MQClientInstance mQClientFactory;
-----------------------------------------------Part II: Consumer- and Producer-Specific---------------------------
1. TopicPublishInfo: the data structure the producer uses to hold its MessageQueues
private boolean orderTopic = false;
private boolean haveTopicRouterInfo = false;
private List<MessageQueue> messageQueueList = new ArrayList<MessageQueue>();
private AtomicInteger sendWhichQueue = new AtomicInteger(0);
2. SubscriptionData: wraps the consumer's subscription information, such as the topic and the subscribed tags
public final static String SUB_ALL = "*";
private boolean classFilterMode = false;
private String topic;
private String subString;
private Set<String> tagsSet = new HashSet<String>();
private Set<Integer> codeSet = new HashSet<Integer>();
private long subVersion = System.currentTimeMillis();
3. RebalanceImpl
ConcurrentHashMap<String/* topic */, Set<MessageQueue>> topicSubscribeInfoTable
ConcurrentHashMap<String/* topic */, SubscriptionData> subscriptionInner
ConcurrentHashMap<MessageQueue, ProcessQueue> processQueueTable
4. MessageQueue
private String topic;
private String brokerName;
private int queueId;
5. ProcessQueue
private final TreeMap<Long, MessageExt> msgTreeMap = new TreeMap<Long, MessageExt>();
private volatile long queueOffsetMax = 0L;
private final AtomicLong msgCount = new AtomicLong();
6. RemoteBrokerOffsetStore
private final MQClientInstance mQClientFactory;
private final String groupName;
private final AtomicLong storeTimesTotal = new AtomicLong(0);
private ConcurrentHashMap<MessageQueue, AtomicLong> offsetTable =
new ConcurrentHashMap<MessageQueue, AtomicLong>();
IV. Main Classes (group, instance, topic)
4.1 DefaultMQPushConsumer (group): sets the main parameters, including group name, consumption mode, consumption offset, number of threads, and pull batch size
4.2 DefaultMQPushConsumerImpl (group): contains the RebalanceImpl, OffsetStore, and allocation strategy
4.3 OffsetStore (group): has two implementations, one per mode: RemoteBrokerOffsetStore for cluster mode and LocalFileOffsetStore for broadcast mode. It records the offset consumed so far.
4.4 RebalanceImpl (group): has two implementations, RebalancePushImpl and RebalancePullImpl, corresponding to the push and pull modes. Both distribute all MessageQueues evenly. In push mode, pulling then starts from a position computed per queue; in pull mode, the pull position is always 0.
4.5 PullMessageService: loops over all PullRequests, repeatedly calling pullMessage to pull from each MessageQueue
4.6 RebalanceService: loops over all consumers and calls doRebalance on each
4.7 AllocateMessageQueueStrategy: the allocation strategy, which divides all MessageQueues among the consumer instances
4.8 PullAPIWrapper
4.9 ConsumeMessageService: has two implementations, ConsumeMessageConcurrentlyService and ConsumeMessageOrderlyService, which invoke the MessageListener to do the actual consumption
4.10 MessageListener: the interface implemented by the client for business logic
4.11 MQClientAPIImpl: handles network communication
V. Modules
The consumer is divided into the following modules:
1. Rebalance module:
It consists of the following parts:
RebalanceImpl
AllocateMessageQueueStrategy
RebalanceService
Adding PullRequests
1.1 Average allocation: with AllocateMessageQueueAveragely, the work is as follows:
It assigns a topic's mqSet to the individual consumers in cidSet according to the policy. The terms:
mqSet: all the queues that can be consumed; think of it as one big cake.
cidSet: all the consumers of the topic, who share the cake.
The rebalance pass traverses each consumer, then each topic that consumer subscribes to, and invokes rebalanceByTopic on each topic. The averaging strategy takes the full mqSet and cidSet and divides the queues evenly. Comparing the sizes of the two sets, there are three cases:
A. mqSet < cidSet
B. mqSet > cidSet, and mqSet % cidSet != 0
C. mqSet >= cidSet, and mqSet % cidSet = 0
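The three cases can be worked through with a small sketch. This is an illustration of the averaging idea only, not the RocketMQ source: the class name AverageAllocationSketch is hypothetical, queues are represented by plain integers, and it assumes every client sees mqAll and cidAll in the same sorted order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of average allocation; not the RocketMQ class itself.
final class AverageAllocationSketch {

    /** Returns the contiguous slice of mqAll that belongs to currentCid. */
    static List<Integer> allocate(String currentCid, List<Integer> mqAll, List<String> cidAll) {
        List<Integer> result = new ArrayList<>();
        int index = cidAll.indexOf(currentCid);
        if (index < 0 || mqAll.isEmpty()) {
            return result; // unknown consumer or nothing to share
        }
        int base = mqAll.size() / cidAll.size(); // queues every consumer gets
        int mod = mqAll.size() % cidAll.size();  // leftover queues
        // The first `mod` consumers each take one leftover queue.
        int size = base + (index < mod ? 1 : 0);
        int start = index * base + Math.min(index, mod);
        for (int i = 0; i < size; i++) {
            result.add(mqAll.get(start + i));
        }
        return result;
    }
}
```

With 5 queues and consumers A and B (case B), A gets queues 0, 1, 2 and B gets 3, 4. With 2 queues and 3 consumers (case A), the third consumer gets nothing and stays idle.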
1.2 Consistent hashing: with AllocateMessageQueueConsistentHash, the allocation strategy is as follows:
The outline matches the standard consistent hashing algorithm. Two inputs are involved, the queues and the consumers, and the goal is to assign queues to consumers. Each consumer has a CID, which is the instanceName set at startup; if it is not set, the CID defaults to ip@pid. This is the physical node on the ring. In practice virtual nodes are used, and the CID of a virtual node is the ip@pid-index of its physical node (where index is the sequence number of the virtual node on that physical node). The CID is then hashed with MD5 to obtain its position on the ring.
As for adding and removing nodes: the rebalance module detects that consumers have been added or removed, and calls the allocation strategy again to reallocate.
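A minimal sketch of such a ring, assuming virtual node names of the form cid-index and taking the first 8 bytes of the MD5 digest as the ring position. The class ConsistentHashSketch and the way queue keys are hashed are assumptions made for illustration; the real implementation may differ in detail.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of a consistent hash ring with virtual nodes.
final class ConsistentHashSketch {
    // Ring position -> physical consumer id (CID).
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashSketch(List<String> cidAll, int virtualNodes) {
        if (cidAll.isEmpty() || virtualNodes <= 0) {
            throw new IllegalArgumentException("need at least one consumer and one virtual node");
        }
        for (String cid : cidAll) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(md5ToLong(cid + "-" + i), cid); // virtual node "cid-index"
            }
        }
    }

    /** Walks clockwise from the queue's hash to the next virtual node, wrapping around. */
    String ownerOf(String queueKey) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(md5ToLong(queueKey));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** Returns the queues that land on currentCid. */
    List<String> allocate(String currentCid, List<String> queueKeys) {
        List<String> mine = new ArrayList<>();
        for (String q : queueKeys) {
            if (currentCid.equals(ownerOf(q))) {
                mine.add(q);
            }
        }
        return mine;
    }

    /** Uses the first 8 bytes of the MD5 digest as the ring position. */
    static long md5ToLong(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because each client builds the same ring from the same cidAll, every client computes the same owner for a queue. When a consumer is added or removed, only the queues whose nearest virtual node changed are moved.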
2. PullMessage module:
It consists of the following parts:
PullMessageService
PullAPIWrapper
PullCallback
ConsumeMessageConcurrentlyService.processConsumeResult
ConsumeMessageConcurrentlyService.ConsumeRequest
The main tasks are as follows:
PullMessageService loops over its pullRequestQueue, takes each PullRequest with take(), and calls pullMessage to pull messages. After the pull, PullCallback is invoked to process the result.
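The loop described above can be sketched as follows. Everything here is a simplified stand-in: Request, Callback, and the in-memory brokerLog are hypothetical, and the sketch uses poll() with a bounded number of rounds so that it terminates, whereas the service described above blocks on take() and keeps running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, trimmed-down model of the pull loop; not the RocketMQ classes.
final class PullLoopSketch {

    /** Stand-in for PullRequest: which queue to pull and from which offset. */
    static final class Request {
        final String queueName;
        long nextOffset;

        Request(String queueName, long nextOffset) {
            this.queueName = queueName;
            this.nextOffset = nextOffset;
        }
    }

    /** Stand-in for PullCallback. */
    interface Callback {
        void onSuccess(Request request, List<String> messages);
    }

    private final LinkedBlockingQueue<Request> pullRequestQueue = new LinkedBlockingQueue<>();
    private final List<String> brokerLog; // fake broker storage for a single queue
    private final int batchSize;

    PullLoopSketch(List<String> brokerLog, int batchSize) {
        this.brokerLog = brokerLog;
        this.batchSize = batchSize;
    }

    void submit(Request request) {
        pullRequestQueue.offer(request);
    }

    /** Handles at most maxRounds requests, then returns. */
    void run(int maxRounds, Callback callback) {
        for (int round = 0; round < maxRounds; round++) {
            Request request = pullRequestQueue.poll();
            if (request == null) {
                break; // nothing left to pull in this sketch
            }
            int from = (int) Math.min(request.nextOffset, brokerLog.size());
            int to = Math.min(from + batchSize, brokerLog.size());
            List<String> batch = new ArrayList<>(brokerLog.subList(from, to));
            request.nextOffset = to;            // advance the pull offset
            callback.onSuccess(request, batch); // hand the batch to the consumer side
            if (to < brokerLog.size()) {
                pullRequestQueue.offer(request); // re-queue for the next pull
            }
        }
    }
}
```

The point of the design is that one request object per queue circulates through the blocking queue: each pull advances nextOffset and puts the same request back, so pulling continues without any per-queue thread.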
3. RemoteBrokerOffsetStore module
offsetTable maintains an offset per queue. There are two kinds of operations on it: the first updates offsetTable inside RemoteBrokerOffsetStore to maintain the local offset; the second is persist, which saves these values to the remote broker.
3.1 updateConsumeOffsetToBroker
Builds an UpdateConsumerOffsetRequestHeader as the header, then calls updateConsumerOffsetOneway with UPDATE_CONSUMER_OFFSET as the request code to send the information to the broker
3.2 removeOffset: removes the entry from offsetTable
3.3 Querying the consumer offset (a long): queryConsumerOffset, with request code QUERY_CONSUMER_OFFSET
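The split between the in-memory offset and the persisted offset can be illustrated with a small model. OffsetStoreSketch is hypothetical: queues are keyed by name, and a plain HashMap named fakeBroker stands in for the remote broker, so no network request is made.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of a two-level offset store; not the RocketMQ class itself.
final class OffsetStoreSketch {
    // Local, frequently updated offsets (one per queue).
    private final ConcurrentHashMap<String, AtomicLong> offsetTable = new ConcurrentHashMap<>();
    // Stand-in for the offsets stored on the remote broker.
    private final Map<String, Long> fakeBroker = new HashMap<>();

    /** Updates only the local table; nothing is sent to the broker here. */
    void updateOffset(String mq, long offset, boolean increaseOnly) {
        AtomicLong old = offsetTable.putIfAbsent(mq, new AtomicLong(offset));
        if (old == null) {
            return; // first value for this queue
        }
        if (increaseOnly) {
            old.accumulateAndGet(offset, Math::max); // never move backwards
        } else {
            old.set(offset);
        }
    }

    /** Copies the local offset to the broker stand-in. */
    void persist(String mq) {
        AtomicLong value = offsetTable.get(mq);
        if (value != null) {
            fakeBroker.put(mq, value.get());
        }
    }

    /** Drops the local entry, for example after the queue is reassigned. */
    void removeOffset(String mq) {
        offsetTable.remove(mq);
    }

    /** Reads the local offset first and falls back to the broker; -1 if unknown. */
    long readOffset(String mq) {
        AtomicLong value = offsetTable.get(mq);
        if (value != null) {
            return value.get();
        }
        Long remote = fakeBroker.get(mq);
        return remote != null ? remote : -1L;
    }
}
```

If the local offset reaches 5 but only 3 was persisted before the queue moved to another consumer, the new owner reads 3 and re-consumes offsets 3 and 4. That gap between the two levels is the source of the duplicate consumption mentioned in the background section.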
4. Consumer module:
This works together with the PullMessage module above: when pullMessage finishes, PullCallback is called back. It calls ConsumeMessageService.submitConsumeRequest for processing, and then the consumption position and related information in OffsetStore are updated.
5. Update module:
Update namesrv
Update TopicRouteInfoFromServer: this involves adding subscriptions
Update sendHeartbeat: registers the consumer
Update Persistallconsumersetinterval: updates the OffsetStore
Update thread pool
6. Network transmission module
MQClientAPIImpl
VI. Main Flow (push + cluster mode)
Overview:
1. Create a DefaultMQPushConsumer with group "cid_001"
2. Call subscribe; the <topic, SubscriptionData> pair is registered with RebalanceImpl for subsequent message filtering
3. DefaultMQPushConsumerImpl.start()
3.1 copySubscription(): copies the subscription information from DefaultMQPushConsumer to DefaultMQPushConsumerImpl
3.2 Get the MQClientInstance
3.3 Set up the RebalanceImpl
3.4 Create the PullAPIWrapper
3.5 Create the OffsetStore: LocalFileOffsetStore for broadcasting, RemoteBrokerOffsetStore for clustering
Details:
One topic corresponds to one SubscriptionData and to many MessageQueues.
Each MessageQueue in turn corresponds to a ProcessQueue, which tracks the consumption progress of that queue.
1.1 Main functions: lock and unlock; these lock or unlock the MQ at the given broker addr for subsequent consumption
1.2 Main function: doRebalance; traverses each topic in the <String, SubscriptionData> subscriptionInner map and calls rebalanceByTopic;
rebalanceByTopic:
1.2.1 If in broadcast mode,
1.2.2 If in cluster mode:
1.2.2.1 First get all the MessageQueues for the topic, mqAll; these are the message queues
1.2.2.2 Get all the consumers under the group, cidAll; these are the consumers
1.2.2.3 Call strategy.allocate to get the Set<MessageQueue> allocateResultSet that this consumer should consume
1.2.2.4 Call updateProcessQueueTableRebalance(topic, allocateResultSet) to update processQueueTable:
A. First, traverse processQueueTable; for entries present there but absent from allocateResultSet, call removeUnnecessaryMessageQueue to delete them;
B. Second, for entries present in both, if in push mode the pull-expired time has been reached, the entry is likewise removed from processQueueTable;
C. Traverse allocateResultSet, find the queues not yet in processQueueTable, add a PullRequest for each to List<PullRequest> pullRequestList, and at the same time call processQueueTable.put(mq, pullRequest.getProcessQueue())
D. Call dispatchPullRequest(pullRequestList) with the newly built List<PullRequest> as the parameter;
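Steps A, C, and D amount to a set difference in both directions, which a short sketch makes concrete. RebalanceUpdateSketch and ProcessQueueStub are hypothetical stand-ins keyed by queue name; step B (pull expiry) is left out, and the returned list plays the role of the pullRequestList that would be passed to dispatchPullRequest.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the table update during rebalance; not the RocketMQ source.
final class RebalanceUpdateSketch {

    /** Stand-in for ProcessQueue; only tracks whether it was dropped. */
    static final class ProcessQueueStub {
        volatile boolean dropped;
    }

    final ConcurrentHashMap<String, ProcessQueueStub> processQueueTable = new ConcurrentHashMap<>();

    /** Returns the queues that need a new pull request after this rebalance. */
    List<String> update(Set<String> allocateResultSet) {
        // Step A: drop queues we hold but were not allocated this round.
        Iterator<Map.Entry<String, ProcessQueueStub>> it = processQueueTable.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, ProcessQueueStub> entry = it.next();
            if (!allocateResultSet.contains(entry.getKey())) {
                entry.getValue().dropped = true; // in-flight work can see it is stale
                it.remove();
            }
        }
        // Step C: register newly allocated queues and collect them.
        List<String> pullRequestList = new ArrayList<>();
        for (String mq : allocateResultSet) {
            if (!processQueueTable.containsKey(mq)) {
                processQueueTable.put(mq, new ProcessQueueStub());
                pullRequestList.add(mq);
            }
        }
        // Step D: the caller dispatches pull requests for these queues.
        return pullRequestList;
    }
}
```

Starting from a table holding q0 and q1, an allocation of q1 and q2 drops q0, keeps q1 untouched, and returns only q2 as needing a new pull request.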
To be continued: the two functions mentioned above,
removeUnnecessaryMessageQueue
dispatchPullRequest(pullRequestList);
VII. Some Practical Notes from Reading the Code
1. Heartbeat: sending a heartbeat must hold a lock, because a heartbeat is equivalent to registering and unregister is equivalent to deregistering; the lock prevents a registration from landing after the deregistration and causing problems. The critical variable here is consumerTable.
2. volatile: when multiple threads access a variable, this keyword guarantees that each read sees the latest written value, rather than a stale copy left in a register by compiler optimization
3. ConcurrentHashMap: segment locking, ensuring thread safety
4. AtomicInteger: atomic increment and decrement
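As a small illustration of note 4, the sketch below increments one AtomicInteger from several threads. The class AtomicCounterSketch is hypothetical and only demonstrates that incrementAndGet does not lose updates; a plain int incremented with ++ under the same load could.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo of atomic increments under contention.
final class AtomicCounterSketch {

    /** Starts `threads` workers, each incrementing the shared counter, and returns the total. */
    static int countWith(int threads, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait for every worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

Calling countWith(4, 10000) returns 40000, since each increment is applied atomically and join() makes the final value visible to the caller.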