The HBase client API provides a write buffer, which lets the client submit a batch of Put objects to the HBase server in bulk. Drawing on the relevant HBase source code, this article introduces the write buffer in depth and analyzes how to configure and use it sensibly in real projects.

1. When is the write buffer needed?
By default, each put operation performs one RPC to a region server, and its execution time can be split into the following three parts:
T1: RTT (round-trip time), i.e., the network round-trip delay: the total delay from the moment the client starts sending data until it receives the server's acknowledgment, excluding the time spent transmitting the data itself;
T2: data transfer time, i.e., the time spent moving the data between the client and the server; when the data volume is large, T2 cannot be neglected;
T3: server-side processing time; for a put, this includes writing the WAL (if writeToWAL is set to true), updating the MemStore, and so on.
Of these, T2 and T3 are unavoidable costs, but can T1 be reduced? If we package multiple put operations and submit them to the server at once, the total T1 cost drops from T1 * N to T1, where T1 is a single RTT and N is the number of records put.
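The saving can be written out as a rough estimate. This is a simplified model, assuming that the per-record transfer and processing times simply add up and that all N puts go to a single region server; it ignores server-side parallelism and the cost of building one larger request.

```latex
\begin{aligned}
T_{\text{unbuffered}} &= N\,(T_1 + T_2 + T_3) \\
T_{\text{buffered}}   &\approx T_1 + N\,(T_2 + T_3) \\
T_{\text{unbuffered}} - T_{\text{buffered}} &\approx (N - 1)\,T_1
\end{aligned}
```

Here T_2 and T_3 are per-record costs, so only the RTT term shrinks as records are batched.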
This is why HBase offers a client-side buffer with bulk submission (the write buffer). When the RTT is relatively long, say 1 ms, this approach can significantly improve the write performance of the whole cluster.
So, which scenarios suit this mode? A brief analysis:
If each put carries a small amount of data (a few KB or less), T2 is very small, so cutting the T1 overhead with this mode significantly improves write performance. If each put carries a large record (on the order of MB), T2 may already be much larger than T1, making T1 negligible by comparison; this mode then brings little performance gain, and increasing the write buffer size to use it is not recommended.
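A back-of-the-envelope estimate makes the small-versus-large distinction concrete. The sketch below assumes a 1 ms RTT (the figure used above) and a link of roughly 1 Gbit/s (125,000 bytes per ms), and it ignores T3; the class and helper names (`BufferGainEstimate`, `transferMs`, `rttShare`) are hypothetical and all figures are illustrative assumptions, not measurements.

```java
public class BufferGainEstimate {
    // Illustrative assumptions only; measure your own network before drawing conclusions.
    static final double RTT_MS = 1.0;                        // T1: assumed round-trip time
    static final double BANDWIDTH_BYTES_PER_MS = 125_000.0;  // ~1 Gbit/s, assumed

    /** T2 for one record of the given size, in milliseconds. */
    static double transferMs(long recordBytes) {
        return recordBytes / BANDWIDTH_BYTES_PER_MS;
    }

    /** Fraction of T1 + T2 that is RTT for one unbuffered put (T3 ignored). */
    static double rttShare(long recordBytes) {
        return RTT_MS / (RTT_MS + transferMs(recordBytes));
    }

    public static void main(String[] args) {
        long[] sizes = {1_024L, 1_048_576L};  // a 1 KB record and a 1 MB record
        for (long size : sizes) {
            System.out.printf("record=%d bytes, T2=%.3f ms, RTT share=%.1f%%%n",
                    size, transferMs(size), 100 * rttShare(size));
        }
    }
}
```

Under these assumptions, the RTT accounts for about 99% of T1 + T2 for a 1 KB record but only about 11% for a 1 MB record, which is why batching pays off mainly for small puts.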
2. How do I configure and use the write buffer?
To enable write buffer mode, call the following HTable method to set auto flush to false:
void setAutoFlush(boolean autoFlush)
By default, the write buffer size is 2 MB; depending on the application, it can be changed in either of the following ways:
1) Call the following HTable method, which affects only that HTable object:
void setWriteBufferSize(long writeBufferSize) throws IOException
2) Set it in hbase-site.xml, which applies to all HTable instances (the example below sets it to 5 MB):
<property>
  <name>hbase.client.write.buffer</name>
  <value>5242880</value>
</property>
In this mode, data is submitted to the server either explicitly or implicitly:
1) Explicit submission: the user calls flushCommits();
2) Implicit submission: the client submits automatically when the write buffer is full, and it also submits unconditionally when HTable's close() method is called.
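To make these flush rules concrete, here is a toy in-memory model in plain Java. `WriteBufferModel` and its methods are hypothetical names chosen to mirror the HTable calls above; the class does not talk to HBase. It simply counts payload bytes and flushes once the buffered total exceeds the limit, whereas the real client accounts for each Put's estimated in-memory size, so real batch sizes will differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a client write buffer (hypothetical, not HBase code). */
public class WriteBufferModel {
    private final List<Long> buffered = new ArrayList<>(); // sizes of pending puts
    private long bufferedBytes = 0;
    private long writeBufferSize = 2L * 1024 * 1024;       // 2 MB default, as in the text
    private boolean autoFlush = true;                      // default: one submission per put
    private int flushes = 0;                               // flushes that actually sent data

    public void setAutoFlush(boolean autoFlush) { this.autoFlush = autoFlush; }
    public void setWriteBufferSize(long size) { this.writeBufferSize = size; }

    /** Buffers one put; flushes if auto flush is on or the buffer limit is exceeded. */
    public void put(long sizeBytes) {
        buffered.add(sizeBytes);
        bufferedBytes += sizeBytes;
        if (autoFlush || bufferedBytes > writeBufferSize) {
            flushCommits(); // implicit submission
        }
    }

    /** Explicit submission: sends everything currently buffered as one batch. */
    public void flushCommits() {
        if (buffered.isEmpty()) return;
        flushes++;
        buffered.clear();
        bufferedBytes = 0;
    }

    /** close() submits whatever is still buffered. */
    public void close() { flushCommits(); }

    public int flushCount() { return flushes; }
    public int pending() { return buffered.size(); }

    public static void main(String[] args) {
        WriteBufferModel table = new WriteBufferModel();
        table.setAutoFlush(false);
        table.setWriteBufferSize(5L * 1024 * 1024); // 5 MB, matching the hbase-site.xml value
        for (int i = 0; i < 10_000; i++) {
            table.put(1_024);                        // assumed 1 KB per put
        }
        table.close();
        System.out.println("flushes: " + table.flushCount());
    }
}
```

With a 5 MB limit and 1 KB puts, the first implicit flush fires on the 5,121st put and close() sends the remaining 4,879, so two flushes replace 10,000 single-put submissions. Each flush may still fan out into several RPCs, one per region server.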
3. How do I determine the actual number of RPCs per flushCommits()?
When the client flushes, the buffered puts may touch many different rows. The client groups the Put objects by row key according to the region server that hosts each row's region, then sends one RPC request to each region server, as shown in the following illustration:
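The grouping step can be sketched as a toy model in plain Java. `FlushGrouping` and `groupByServer` are hypothetical names, and the region layout is an invented example: each region is identified by its start key (with "" marking the first region) and mapped to the server hosting it. The model assumes ASCII row keys, for which Java's String ordering matches HBase's byte-wise ordering; the number of RPCs for the flush is then the number of distinct servers that receive data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (hypothetical, not HBase code) of per-server grouping at flush time. */
public class FlushGrouping {
    /** Groups row keys by the server hosting the region that contains each row. */
    static Map<String, List<String>> groupByServer(TreeMap<String, String> regionStartKeyToServer,
                                                   List<String> rowKeys) {
        Map<String, List<String>> byServer = new LinkedHashMap<>();
        for (String row : rowKeys) {
            // The containing region is the one with the greatest start key <= row.
            String server = regionStartKeyToServer.floorEntry(row).getValue();
            byServer.computeIfAbsent(server, s -> new ArrayList<>()).add(row);
        }
        return byServer;
    }

    public static void main(String[] args) {
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "rs1");  // region ["", "g")
        regions.put("g", "rs2"); // region ["g", "p")
        regions.put("p", "rs1"); // region ["p", end), also hosted on rs1
        List<String> rows = List.of("apple", "grape", "kiwi", "zebra", "banana");
        Map<String, List<String>> groups = groupByServer(regions, rows);
        // One RPC per region server that receives at least one put.
        System.out.println(groups + " -> RPCs: " + groups.size());
    }
}
```

Here five puts spread over three regions still need only two RPCs, because two of the regions live on the same server.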