Table of Contents
- RegionServer Functional Responsibilities
- Lease Management
- Nonce Management
- Heap Memory Monitoring
- Health Testing
RegionServer Functional Responsibilities
Lease Management
In HBase, lease management applies mainly to scan queries. If a client opens a scanner and neither closes it nor renews its lease within 60 seconds, the scan lease expires and the RegionServer forcibly closes the corresponding scanner, which keeps stale connections from accumulating. The lease timeout is set by the hbase.client.scanner.timeout.period parameter, which defaults to 60 seconds.
The lease management logic is encapsulated mainly in the Leases class, which declares the following data structure to hold all lease information (each lease is represented by a Lease object):
leases: Map<String, Lease>
The class also declares the createLease and cancelLease methods, which add leases to and remove them from the leases collection. Once started, the Leases thread loops over the collection and, when it finds an expired lease, notifies that lease's LeaseListener so the callback can handle it. The Lease object implements Java's Delayed interface, whose getDelay method returns how long remains before the lease expires.
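The bookkeeping described above can be sketched with java.util.concurrent.DelayQueue. This is only an illustration under stated assumptions, not HBase's actual Leases class: the names LeaseSketch, SketchLease, ExpiryListener, and pollExpired are hypothetical, and the checker thread is reduced to a single polling method.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of lease bookkeeping; not the real HBase Leases class.
final class LeaseSketch {
    /** Callback fired when a lease expires (stands in for LeaseListener). */
    interface ExpiryListener { void leaseExpired(String leaseName); }

    /** A lease that reports, through getDelay, how long remains before it expires. */
    static final class SketchLease implements Delayed {
        final String name;
        final long expiresAtNanos;
        final ExpiryListener listener;

        SketchLease(String name, long periodMs, ExpiryListener listener) {
            this.name = name;
            this.expiresAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(periodMs);
            this.listener = listener;
        }

        @Override public long getDelay(TimeUnit unit) {
            return unit.convert(expiresAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    private final Map<String, SketchLease> leases = new ConcurrentHashMap<>();
    private final DelayQueue<SketchLease> queue = new DelayQueue<>();

    /** Registers a new lease that expires periodMs from now. */
    void createLease(String name, long periodMs, ExpiryListener listener) {
        SketchLease lease = new SketchLease(name, periodMs, listener);
        if (leases.putIfAbsent(name, lease) != null) {
            throw new IllegalStateException("lease already exists: " + name);
        }
        queue.add(lease);
    }

    /** Removes a lease before it expires; returns false if it was not registered. */
    boolean cancelLease(String name) {
        SketchLease lease = leases.remove(name);
        return lease != null && queue.remove(lease);
    }

    /** One pass of the checker loop: waits up to timeoutMs for an expired lease and fires its listener. */
    boolean pollExpired(long timeoutMs) {
        try {
            SketchLease lease = queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (lease == null) return false;
            leases.remove(lease.name);
            lease.listener.leaseExpired(lease.name);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // assumption: stop polling when interrupted
            return false;
        }
    }
}
```

In this sketch a background thread would call pollExpired in a loop, and a listener for a scan lease would close the scanner that owns the expired lease.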
Nonce Management
After a client submits an RPC request, if the server's response times out, the client resends the request until the retry count reaches the configured limit. This retry behavior can cause the following problem on the server:
Take an append operation as an example. The client intends to append a single KeyValue to the table, but because the server's response timed out, the append request is sent several times. The data is then appended on the server more than once, which leaves redundant appends.
To prevent this, HBase provides nonce management (implemented by the ServerNonceManager class). The client tags each request, and every retry of that request, with the same nonce before sending it to the server. The server first checks whether the nonce already exists. If it does not, the operation associated with the nonce (such as an append or increment) can safely be executed. Otherwise, the request is handled according to the nonce's current state:
If the nonce is in the WAIT state, the operation for that nonce is still executing; the current thread waits for it to complete and then proceeds according to its result;
If the nonce is in the PROCEED state, the operation for that nonce has already run but failed, so it can be re-executed here;
If the nonce is in the DONT_PROCEED state, the operation for that nonce has already succeeded, so nothing further needs to be done.
Once a nonce enters the DONT_PROCEED state, every later request carrying it is ignored, which prevents redundant operations. Note that after a nonce enters the DONT_PROCEED or PROCEED state, how long it survives is controlled by the hbase.server.hashNonce.gracePeriod parameter, which defaults to 30 minutes. After 30 minutes, ServerNonceManager deletes the nonce through its cleanUpOldNonces method.
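The three states can be sketched as a small state machine. The class below is a hypothetical illustration, not the real ServerNonceManager: the NonceSketch name, the single long nonce key, and the choice to give up on interruption are assumptions, and grace-period cleanup is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of nonce de-duplication; not the real ServerNonceManager.
final class NonceSketch {
    enum State { WAIT, PROCEED, DONT_PROCEED }

    private final Map<Long, State> nonces = new HashMap<>();

    /** Returns true if the caller should execute the operation tagged with this nonce. */
    synchronized boolean startOperation(long nonce) {
        while (true) {
            State state = nonces.get(nonce);
            if (state == null || state == State.PROCEED) {
                // Unknown nonce, or an earlier attempt failed: run it now.
                nonces.put(nonce, State.WAIT);
                return true;
            }
            if (state == State.DONT_PROCEED) {
                // An earlier attempt already succeeded: skip the duplicate.
                return false;
            }
            try {
                wait(); // WAIT: another thread is executing this nonce
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false; // assumption: give up if interrupted
            }
        }
    }

    /** Records the outcome and wakes any threads waiting on this nonce. */
    synchronized void endOperation(long nonce, boolean success) {
        nonces.put(nonce, success ? State.DONT_PROCEED : State.PROCEED);
        notifyAll();
    }
}
```

A handler would wrap an append as: if startOperation(nonce) returns true, perform the append and call endOperation(nonce, success); otherwise treat the request as already applied.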
Heap Memory Monitoring
When the combined share of heap memory allocated to the MemStore and BlockCache exceeds 80%, the system throws an exception.
The relevant parameters should therefore satisfy the following constraint:
hfile.block.cache.size + hbase.regionserver.global.memstore.size <= 0.8
When heap memory usage reaches 95%, the system prints a warning message.
The 95% threshold is set by the hbase.heap.occupancy.low_water_mark parameter, and the warning reads as follows:
heapOccupancyPercent is above heap occupancy alarm watermark
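The two thresholds above amount to simple arithmetic checks, sketched here. HeapConfigSketch and its method names are hypothetical helpers for illustration, not HBase APIs, and the small tolerance in the comparison is an assumption made to avoid floating-point rounding at the 0.8 boundary.

```java
// Hypothetical helper; the class and method names are illustrative, not HBase APIs.
final class HeapConfigSketch {
    static final double MAX_COMBINED_FRACTION = 0.8;    // limit stated in the text
    static final double DEFAULT_LOW_WATER_MARK = 0.95;  // hbase.heap.occupancy.low_water_mark
    private static final double EPSILON = 1e-9;         // assumption: tolerance for rounding

    /** True if the block cache and memstore fractions together stay within 80% of the heap. */
    static boolean isValidSplit(double blockCacheSize, double memstoreSize) {
        return blockCacheSize + memstoreSize <= MAX_COMBINED_FRACTION + EPSILON;
    }

    /** True if heap occupancy has reached the watermark and a warning is due. */
    static boolean shouldWarn(long usedBytes, long maxBytes, double lowWaterMark) {
        return (double) usedBytes / maxBytes >= lowWaterMark;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        System.out.println("valid split 0.4 + 0.4: " + isValidSplit(0.4, 0.4));
        System.out.println("warn now: " + shouldWarn(used, rt.maxMemory(), DEFAULT_LOW_WATER_MARK));
    }
}
```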
Elastic adjustment of the MemStore and BlockCache proportions (their combined size still cannot exceed 80% of heap memory)
When the following four parameters are specified, the RegionServer dynamically adjusts the sizes of the MemStore and BlockCache based on current heap memory usage (implemented by HeapMemoryTuner).
MemStore elastic range: [hbase.regionserver.global.memstore.size.min.range, hbase.regionserver.global.memstore.size.max.range]
BlockCache elastic range: [hfile.block.cache.size.min.range, hfile.block.cache.size.max.range]
Monitoring application pauses caused by GC
If a pause lasts longer than 1 second (controlled by the jvm.pause.info-threshold.ms parameter), the following message is printed:
Detected pause in JVM or host machine (eg GC): Pause of approximately ...
If a pause lasts longer than 10 seconds (controlled by the jvm.pause.warn-threshold.ms parameter), the same message is printed, but at WARN level. For the monitoring logic itself, see the implementation of the org.apache.hadoop.hbase.util.JvmPauseMonitor.Monitor class (HADOOP-9618).
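A common way to detect such pauses, and the one this sketch assumes, is to sleep for a short fixed interval and measure how much longer than requested the sleep took. PauseMonitorSketch is a hypothetical illustration, not the real JvmPauseMonitor; the 500 ms sleep interval and the printed text are assumptions.

```java
// Hypothetical sketch of sleep-based pause detection; not the real JvmPauseMonitor.
final class PauseMonitorSketch {
    enum Level { NONE, INFO, WARN }

    static final long INFO_THRESHOLD_MS = 1_000;   // value described for jvm.pause.info-threshold.ms
    static final long WARN_THRESHOLD_MS = 10_000;  // value described for jvm.pause.warn-threshold.ms

    /** Maps the extra time a sleep took onto a log level. */
    static Level classify(long extraSleepMs, long infoThresholdMs, long warnThresholdMs) {
        if (extraSleepMs > warnThresholdMs) return Level.WARN;
        if (extraSleepMs > infoThresholdMs) return Level.INFO;
        return Level.NONE;
    }

    /** Sleeps for sleepMs and returns how much longer than requested the sleep took. */
    static long measureExtraSleepMs(long sleepMs) {
        long start = System.nanoTime();
        try {
            Thread.sleep(sleepMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return Math.max(0, elapsedMs - sleepMs);
    }

    public static void main(String[] args) {
        long extra = measureExtraSleepMs(500); // assumption: 500 ms probe interval
        Level level = classify(extra, INFO_THRESHOLD_MS, WARN_THRESHOLD_MS);
        if (level != Level.NONE) {
            System.out.println(level + ": pause of about " + extra + " ms detected");
        }
    }
}
```

A monitor thread would repeat the measurement in a loop; an unusually long sleep suggests that the JVM or the host was stalled during that interval.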
Health Testing
After the RegionServer process starts, it launches the HealthCheckChore thread in the background, which by default checks every 10 seconds whether the RegionServer is healthy. The check interval is set by the hbase.node.health.script.frequency parameter.
Each check runs the script specified by the hbase.node.health.script.location parameter (for a sample script, see hbase-examples/src/main/sh/healthcheck/healthcheck.sh). The check fails if any of the following occurs while the script runs:
(1) The script exits with a non-zero return value;
(2) The script times out (the timeout is set by the hbase.node.health.script.timeout parameter and defaults to 60 seconds);
(3) The script prints a message beginning with ERROR.
If the HealthCheckChore thread records 3 failed checks (controlled by the hbase.node.health.failure.threshold parameter) and those failures fall within a 30-second window, it concludes that the RegionServer is unhealthy, forcibly shuts it down, and prints the following message:
The node reported unhealthy {threshold} number of times consecutively.
The 30-second window is calculated as follows:
hbase.node.health.script.frequency * hbase.node.health.failure.threshold
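The failure-window decision can be sketched as a small counter. HealthWindowSketch is a hypothetical illustration, not the real HealthCheckChore: the class and method names are assumptions, the current time is passed in so the logic can be exercised without waiting, and restarting the window once it lapses is also an assumption.

```java
// Hypothetical sketch of the failure-window decision; not the real HealthCheckChore.
final class HealthWindowSketch {
    private final int threshold;
    private final long failureWindowMs;
    private int numTimesUnhealthy = 0;
    private long windowStartMs;

    HealthWindowSketch(long frequencyMs, int threshold) {
        this.threshold = threshold;
        // Window = check frequency * failure threshold, as described in the text.
        this.failureWindowMs = frequencyMs * threshold;
    }

    long failureWindowMs() {
        return failureWindowMs;
    }

    /** Records a failed check at nowMs; returns true if the server should be stopped. */
    boolean recordFailure(long nowMs) {
        if (numTimesUnhealthy == 0 || nowMs - windowStartMs >= failureWindowMs) {
            // First failure, or the previous window has lapsed: start a new window.
            numTimesUnhealthy = 1;
            windowStartMs = nowMs;
        } else {
            numTimesUnhealthy++;
        }
        return numTimesUnhealthy >= threshold;
    }
}
```

With a 10-second frequency and a threshold of 3, failures at 0, 10, and 20 seconds trigger a stop on the third call, while failures spread over more than 30 seconds restart the count.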
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.