I. RocketMQ Network Deployment Features

[Figure: http://img1.tuicool.com/amaIbe.jpg]
(1) NameServer is an almost stateless node that can be deployed as a cluster, with no information synchronized between nodes.
(2) Broker deployment is relatively complex. Brokers are divided into masters and slaves: one master can correspond to multiple slaves, but a slave corresponds to only one master. The master-slave relationship is defined by sharing the same brokerName with different brokerId values; brokerId 0 means master, and non-zero means slave. Multiple masters can be deployed. Each broker establishes a long-lived connection to every node in the NameServer cluster and periodically registers its topic information with all NameServers.
(3) A producer establishes a long-lived connection to one randomly selected node in the NameServer cluster, periodically fetches topic routing information from it, establishes long-lived connections to the masters that serve its topics, and sends heartbeats to those masters at fixed intervals. Producers are completely stateless and can be deployed as a cluster.
(4) A consumer establishes a long-lived connection to one randomly selected node in the NameServer cluster, periodically fetches topic routing information from it, establishes long-lived connections to the masters and slaves that serve its topics, and sends heartbeats to both at fixed intervals. A consumer can subscribe to messages from either the master or the slave; the subscription rule is determined by broker configuration. (A short producer sketch follows this list.)
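As a concrete illustration of (3), here is a minimal producer sketch, assuming the Apache RocketMQ Java client (the group name, topic, and NameServer addresses are placeholders): the producer is configured only with the NameServer list and discovers brokers from the routing information it returns.

```java
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.client.producer.SendResult;
import org.apache.rocketmq.common.message.Message;

public class ProducerExample {
    public static void main(String[] args) throws Exception {
        DefaultMQProducer producer = new DefaultMQProducer("example_producer_group");
        // Only the NameServer list is configured; broker addresses are
        // discovered from the topic routing information it returns.
        producer.setNamesrvAddr("192.168.0.1:9876;192.168.0.2:9876");
        producer.start();

        Message msg = new Message("ExampleTopic", "TagA", "hello".getBytes());
        SendResult result = producer.send(msg); // sent to a master serving the topic
        System.out.println(result);

        producer.shutdown();
    }
}
```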
II. RocketMQ Storage Features
(1) Zero-copy principle: during message consumption, RocketMQ uses zero copy. There are two zero-copy approaches, listed below; RocketMQ uses the first, because for small-block data transfers it is more efficient than sendfile (a Java sketch of both follows the list).
a) Using mmap + write
Advantages: efficient for small-block file transfers, even when called frequently.
Disadvantages: cannot make good use of DMA, consumes more CPU than sendfile, makes memory-safety control complex, and requires guarding against JVM crash problems.
b) Using sendfile
Advantages: can use DMA, consumes little CPU, is efficient for large-file transfers, and introduces no new memory-safety problems.
Disadvantages: less efficient than mmap for small-block files, and can only be used for BIO-style transfers, not NIO.
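To make the two approaches concrete, here is a minimal Java NIO sketch (illustrative, not RocketMQ source code): mapping a file region and writing it to a socket corresponds to mmap + write, while FileChannel.transferTo maps to sendfile on Linux.

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class ZeroCopyDemo {
    // mmap + write: map the file region into memory, then write it to the socket.
    // Suited to small blocks; repeated reads can also hit the page cache.
    static void mmapWrite(RandomAccessFile file, SocketChannel socket,
                          long pos, int len) throws Exception {
        FileChannel ch = file.getChannel();
        MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
        while (buf.hasRemaining()) {
            socket.write(buf);
        }
    }

    // sendfile: the kernel moves data from file to socket directly (DMA-friendly),
    // using less CPU, which favors large transfers.
    static void sendfile(RandomAccessFile file, SocketChannel socket,
                         long pos, long count) throws Exception {
        FileChannel ch = file.getChannel();
        long sent = 0;
        while (sent < count) {
            sent += ch.transferTo(pos + sent, count - sent, socket);
        }
    }
}
```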
(2) Data storage structure
650) this.width=650; "src=" Http://img2.tuicool.com/IFjMV3.png "style=" Border:0px;height:auto "alt=" ifjmv3.png "/ > three. ROCKETMQ key Features 1. Stand-alone support for more than 1W of persistent queues
[Figure: http://img0.tuicool.com/InaIRz.png]
(1) All data is stored in a single commit log, written fully sequentially and read randomly.
(2) The queues exposed to end users (consume queues) store only each message's location in the commit log, and flushing to disk is serial.
The benefits of doing this:
(1) Queues are lightweight, and the amount of data in a single queue is very small.
(2) Disk access is serialized, avoiding disk contention, so IOWAIT does not rise as the number of queues grows.
Every design has pros and cons; its weaknesses are:
(1) Although writes are sequential, reads become random.
(2) Reading a message requires first reading the consume queue and then reading the commit log, which increases overhead.
(3) The commit log must be kept fully consistent with the consume queues, which increases programming complexity.
How the above shortcomings are overcome:
(1) For random reads, hit the page cache as much as possible to reduce I/O operations, so the larger the memory the better. If a very large backlog of messages builds up, would reading data from the hard disk cause system performance to drop because of random reads? The answer is no, because:
a) When the page cache is accessed, even if only 1 KB of message data is read, the system reads ahead, so the next read may hit the page cache.
b) For random access to commit log data on disk, the system I/O scheduler is set to NOOP mode, which to some extent turns fully random reads into sequential-skip reads, and sequential-skip reading is about 5 times faster than fully random reading.
(2) Because consume queue entries are very small and are read sequentially, with page-cache read-ahead the consume queue's read performance is almost the same as reading memory, even with a backlog. So the consume queue can be assumed to add no obstacle to read performance.
(3) The commit log stores all meta information, including the message body, similar to MySQL's or Oracle's redo log. So as long as the commit log exists, the consume queue can be recovered even if its data is lost. (A sketch of the entry layout and the two-step read follows.)
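A minimal sketch of the two-step read described above, assuming RocketMQ's fixed 20-byte consume queue entry (commit log offset, message size, tag hashcode) and simplifying the memory-mapped files to plain ByteBuffers:

```java
import java.nio.ByteBuffer;

public class ConsumeQueueLookup {
    // Each consume queue entry is fixed-length (20 bytes in RocketMQ):
    // commit log offset (8) + message size (4) + tag hashcode (8).
    static final int CQ_ENTRY_SIZE = 20;

    // Step 1: read the entry at the logical queue index.
    static long[] readEntry(ByteBuffer consumeQueue, int queueIndex) {
        ByteBuffer cq = consumeQueue.duplicate();
        cq.position(queueIndex * CQ_ENTRY_SIZE);
        long commitLogOffset = cq.getLong();
        int size = cq.getInt();
        long tagsHashCode = cq.getLong();
        return new long[] { commitLogOffset, size, tagsHashCode };
    }

    // Step 2: read the full message from the commit log at that offset.
    // This is why one logical read costs a consume-queue read plus a
    // commit-log read.
    static ByteBuffer readMessage(ByteBuffer commitLog, long offset, int size) {
        ByteBuffer msg = commitLog.duplicate();
        msg.position((int) offset);
        msg.limit((int) offset + size);
        return msg.slice();
    }
}
```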
2. Disk Flush Strategy
All messages in RocketMQ are persisted: they are first written to the system page cache and then flushed to disk, so that both memory and disk hold a copy of the data, and reads can be served directly from memory.
2.1 Asynchronous Flush
With a RAID card and 15,000 RPM SAS disks, tests of sequential file writes reach about 300 MB/s, while NICs are generally gigabit, so the disk write speed is significantly faster than the rate at which data arrives over the network. Can the broker then return to the user as soon as the data is in memory, and leave flushing to a background thread?
(1) Since the disk is faster than the NIC, the flushing progress can certainly keep up with the message write rate.
(2) If the system is under too much pressure, messages may pile up; besides write I/O there is then read I/O, and if disk reads lag, will that cause the system to run out of memory? The answer is no, for the following reasons (a sketch of the flush thread follows this list):
a) When writing messages to the page cache, if memory is insufficient, the kernel tries to discard clean pages to free memory for new messages; the eviction policy is LRU.
b) If clean pages are insufficient, writes to the page cache block, and the system tries to flush some data to disk, about 32 pages per attempt, to obtain more clean pages.
In general, therefore, memory overflow does not occur.
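A minimal sketch of the asynchronous pattern, assuming a memory-mapped commit log segment (illustrative, not RocketMQ's actual flush service): writers return once their bytes are in the page cache, and a background thread periodically forces dirty pages to disk.

```java
import java.nio.MappedByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

public class AsyncFlusher implements Runnable {
    private final MappedByteBuffer mappedFile; // page-cache backed log segment
    private final AtomicLong wrotePosition;    // advanced by writer threads
    private volatile long flushedPosition;
    private volatile boolean running = true;

    AsyncFlusher(MappedByteBuffer mappedFile, AtomicLong wrotePosition) {
        this.mappedFile = mappedFile;
        this.wrotePosition = wrotePosition;
    }

    // Writers return to the caller as soon as bytes are in the page cache;
    // this loop flushes dirty pages to disk at a fixed interval.
    @Override
    public void run() {
        while (running) {
            long target = wrotePosition.get();
            if (target > flushedPosition) {
                mappedFile.force();        // flush dirty pages to disk
                flushedPosition = target;
            }
            try {
                Thread.sleep(500);         // flush interval; tunable
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```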
2.2 Synchronous Flush
The only difference between synchronous and asynchronous flushing is that an asynchronous flush returns as soon as the write to the page cache finishes, while a synchronous flush waits for the flush to complete before returning. The synchronous flush process is as follows (a sketch follows the figure below):
(1) After writing to the page cache, the thread waits and notifies the flush thread to flush.
(2) After flushing, the flush thread wakes up the waiting front-end threads, possibly a whole batch of them.
(3) The front-end waiting threads return success to the user.
[Figure: http://img1.tuicool.com/amyie2.png]
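A simplified sketch of steps (1) to (3), assuming one flush thread and a latch per waiting writer (illustrative; RocketMQ's group-commit service batches waiters in a similar way): a single force() call serves a whole batch of front-end threads.

```java
import java.nio.MappedByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

public class SyncFlusher extends Thread {
    private final MappedByteBuffer mappedFile;
    private final LinkedBlockingQueue<CountDownLatch> waiters =
            new LinkedBlockingQueue<>();

    SyncFlusher(MappedByteBuffer mappedFile) { this.mappedFile = mappedFile; }

    // Called by a front-end thread after writing to the page cache;
    // blocks until the flush thread reports completion (steps 1 and 3).
    public void flushAndWait() throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        waiters.put(latch);
        latch.await();
    }

    @Override
    public void run() {
        try {
            while (true) {
                CountDownLatch first = waiters.take();  // wait for a request
                List<CountDownLatch> batch = new ArrayList<>();
                batch.add(first);
                waiters.drainTo(batch);                 // group concurrent requests
                mappedFile.force();                     // step 2: one flush per batch
                for (CountDownLatch l : batch) l.countDown(); // wake all waiters
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```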
3. Message Query
3.1 Query messages by MessageID
[Figure: http://img0.tuicool.com/mYfAZn.png]
A msgId is 16 bytes in total, containing the message's storage host address and its commit log offset. The broker address and commit log offset are parsed from the msgId; the broker then locates the message at that offset and decodes the complete message from the stored message format.
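A hedged decoding sketch, assuming IPv4 and the classic 16-byte layout (4-byte broker IP, 4-byte port, 8-byte commit log offset), with the msgId hex-encoded as in the client API:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;

public class MsgIdDecoder {
    // Assumed msgId layout (16 bytes, hex-encoded):
    // broker IP (4) + broker port (4) + commit log offset (8).
    static void decode(String msgIdHex) throws Exception {
        ByteBuffer buf = ByteBuffer.wrap(hexToBytes(msgIdHex));
        byte[] ip = new byte[4];
        buf.get(ip);
        int port = buf.getInt();
        long commitLogOffset = buf.getLong();
        InetSocketAddress broker =
                new InetSocketAddress(InetAddress.getByAddress(ip), port);
        System.out.println("broker=" + broker + ", offset=" + commitLogOffset);
    }

    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(i * 2, i * 2 + 2), 16);
        }
        return out;
    }
}
```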
3.2 Query messages by message key
[Figure: http://img0.tuicool.com/j2uEru.png]
1. Compute hashcode(key) % slotNum for the queried key to obtain the slot position (slotNum is the maximum number of slots contained in one index file; in the figure, slotNum = 5,000,000).
2. Using the slotValue (the value stored at the slot position), find the last item in that slot's list of index entries (entries are chained in reverse order, so slotValue always points to the newest index entry).
3. Iterate over the index-entry list and return the result set within the queried time range (by default, at most 32 records are returned).
4. Hash collisions: locating a key's slot effectively applies two hash steps, hashing the key and then taking that hash modulo slotNum, so two kinds of conflict exist. First, the keys' hash values differ but their modulo results are the same; the query handles this by comparing the stored key hash (each index entry saves the key's hash value) and filtering out entries whose hash values are not equal. Second, the hash values are equal but the keys themselves differ; for performance, this collision detection is left to the client (the original key is stored in the message file, which avoids parsing the data file), and the client compares the keys in the message bodies.
5. Storage: to save space, the timestamp stored in an index entry is a difference (storage time minus the begin time, where the begin time is stored in the index file header). The whole index file is fixed-length, and its structure never changes. (A simplified model of this lookup follows.)
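A simplified in-memory model of the lookup, with the index-entry fields inferred from the description above (key hash, commit log offset, time difference, and a link to the previous entry in the same slot); field sizes and names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class IndexFileLookup {
    // Simplified model of an index file: header (begin timestamp etc.) +
    // slot table (slotNum slots) + chained index entries.
    static final int SLOT_NUM = 5_000_000;

    static class IndexEntry {
        int keyHash;
        long commitLogOffset;
        int timeDiffSeconds;   // storage time minus file begin time, to save space
        int prevIndex;         // previous entry in the same slot; 0 terminates
    }

    static int slotOf(String key) {
        int hash = key.hashCode() & 0x7fffffff; // non-negative hash
        return hash % SLOT_NUM;                 // step 1: hash, then modulo
    }

    // Steps 2-3: slotValue points at the newest entry; walk the chain backwards,
    // keeping entries whose stored hash matches and whose time is in range.
    // Hash match only: the caller must still compare the real keys (step 4).
    static List<IndexEntry> query(int[] slotTable, IndexEntry[] entries,
                                  String key, int beginSec, int endSec,
                                  int maxNum) {
        List<IndexEntry> result = new ArrayList<>();
        int hash = key.hashCode() & 0x7fffffff;
        for (int i = slotTable[slotOf(key)];
             i > 0 && result.size() < maxNum;
             i = entries[i].prevIndex) {
            IndexEntry e = entries[i];
            if (e.keyHash == hash
                    && e.timeDiffSeconds >= beginSec
                    && e.timeDiffSeconds <= endSec) {
                result.add(e);
            }
        }
        return result;
    }
}
```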
4. Server Message Filtering
RocketMQ's message filtering approach differs from other message middleware: filtering is done at subscription time. First, look at the consume queue storage structure.
[Figure: http://img0.tuicool.com/B3Ijmy.png]
1. The message tag is compared on the broker side. The broker traverses the consume queue; if a stored message's tag does not match the subscribed tag, the entry is skipped and the next one is compared; if it matches, the message is transferred to the consumer. Note that the message tag is a string, while the consume queue stores its hashcode, so this comparison is also done on hashcodes.
2. When the consumer receives the filtered messages, it repeats the broker-side comparison, but against the real message tag string rather than the hashcode.
Why does the filter work this way?
1. The message tag is stored as a hashcode so the consume queue can use fixed-length entries, saving space.
2. Filtering never accesses the commit log data, so it stays efficient even under a backlog.
3. Even if a hash collision occurs, it is corrected on the consumer side, making the result foolproof. (A sketch of the two-level check follows.)
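The two-level check can be sketched as follows (illustrative helper methods, not RocketMQ's API): the broker compares only hashcodes read from the consume queue, and the consumer re-checks the real tag string.

```java
import java.util.Set;

public class TagFilter {
    // Broker side: the consume queue stores only the tag's hashcode, so the
    // broker filters cheaply by hash comparison, never touching the commit log.
    static boolean brokerMatches(long storedTagsHashCode,
                                 Set<Integer> subscribedHashes) {
        return subscribedHashes.contains((int) storedTagsHashCode);
    }

    // Consumer side: hash collisions are possible, so the consumer re-checks
    // the real tag string carried in the delivered message.
    static boolean consumerMatches(String messageTag, Set<String> subscribedTags) {
        return messageTag != null && subscribedTags.contains(messageTag);
    }
}
```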
5. A single JVM process can also take advantage of the machine's very large memory
[Figure: http://img2.tuicool.com/mmyqia.png]
1. The producer sends a message: the message goes from the socket into the Java heap.
2. The producer sends a message: the message goes from the Java heap into the page cache, i.e. physical memory.
3. The producer sends a message: an asynchronous thread flushes it, moving the message from the page cache to disk.
4. The consumer pulls a message (normal consumption): the message is transferred directly from the page cache (data in physical memory) to the socket and on to the consumer, without passing through the Java heap. This is by far the most common consumption scenario; with 96 GB of physical memory online and messages of about 1 KB, roughly 100 million messages can be cached in physical memory.
5. The consumer pulls a message (abnormal consumption): the message is transferred directly from the page cache to the socket.
6. The consumer pulls a message (abnormal consumption): because the socket access touches virtual memory, a page fault occurs and disk I/O is generated; the message is loaded from disk into the page cache and then sent directly from the socket.
7. Same as 5.
8. Same as 6.

6. Message Stacking Problem Resolution
| # | Stacking performance indicator | Result |
|---|---|---|
| 1 | Message stacking capacity | Depends on disk size |
| 2 | To what extent is message throughput affected | Without a slave, it is affected to some extent; with a slave, it is not affected |
| 3 | Is normal consumer consumption affected | Without a slave, it is affected to some extent; with a slave, it is not affected |
| 4 | How much throughput is available when accessing messages stacked on disk | Depends on access concurrency; eventually drops to around 5,000 |
When a slave is present, once the master finds the consumer accessing data that has stacked up on disk, it issues a redirect instruction telling the consumer to pull data from the slave. This way, normal message sending and normal consumption are not affected by the backlog, because the system splits the stacked and non-stacked scenarios across two different nodes. A question arises here: will the slave's write performance degrade? The answer is no. Slave message writing pursues only throughput, not real-time latency; as long as overall throughput is high it is acceptable. The slave pulls data from the master in batches (for example, 1 MB at a time), and this batched sequential writing keeps the overall throughput impact of the stacking scenario small; only the write RT becomes longer.
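A hedged sketch of the redirect decision (parameter names are illustrative, not RocketMQ's actual fields): if the requested offset lags the newest commit log offset by more than the share of physical memory the broker can keep hot, the data must be on disk, so the consumer is told to pull from the slave.

```java
public class PullRedirect {
    // Returns the broker id the consumer should pull from next:
    // 0 = master (hot path), 1 = slave (stacked path).
    static long suggestBrokerId(long maxCommitLogOffset, long requestedOffset,
                                long totalPhysicalMemory, double inMemoryMaxRatio) {
        // Bytes of the commit log assumed still resident in page cache.
        long memoryResident = (long) (totalPhysicalMemory * inMemoryMaxRatio);
        boolean dataOnDisk = (maxCommitLogOffset - requestedOffset) > memoryResident;
        return dataOnDisk ? 1L : 0L;
    }
}
```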
This article is from the "Rise" blog, please be sure to keep this source http://binbinwudi8688.blog.51cto.com/3023365/1673371