I. Background of the problem
When a client sends an RPC request to a server, the response may time out for any number of reasons. If the request modified data, the server may well have processed it already but been unable to notify the client. Waiting longer tells the client nothing, so its only option is to send the request again, and that may cause the server to process the same request twice. How can this be solved?
II. The solution
The client does in fact retry: after sending an RPC request to the server, if the response times out, it resends the request until the configured maximum number of retries is reached. The first send and every resend carry the same nonce. The server uses the nonce to recognize that it is the same request and, depending on how the earlier attempt turned out, decides whether to wait, reject the request, or process it directly.
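The idea can be illustrated with a small, self-contained sketch. It is not HBase code: ToyServer, ToyClient, Reply and the repliesToDrop parameter are hypothetical names invented for this example, and a lost response is simulated by returning null. The server applies an increment only the first time it sees a nonce, so the retries cannot apply it twice.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical reply type: the request was applied now, or had been applied earlier. */
enum Reply { APPLIED, ALREADY_APPLIED }

/** Hypothetical server that applies an increment at most once per nonce. */
class ToyServer {
    private final Set<Long> seenNonces = ConcurrentHashMap.newKeySet();
    private long counter = 0;
    private int repliesToDrop; // how many replies to "lose", simulating timeouts

    ToyServer(int repliesToDrop) {
        this.repliesToDrop = repliesToDrop;
    }

    /** Returns the reply, or null when the reply is lost on the way back. */
    synchronized Reply increment(long nonce) {
        Reply reply;
        if (seenNonces.add(nonce)) {   // first time this nonce is seen
            counter++;
            reply = Reply.APPLIED;
        } else {                       // a resend of a request already processed
            reply = Reply.ALREADY_APPLIED;
        }
        if (repliesToDrop > 0) {
            repliesToDrop--;
            return null;               // the work was done, but the client never hears about it
        }
        return reply;
    }

    synchronized long counter() {
        return counter;
    }
}

/** Hypothetical client that resends with the SAME nonce until it gets a reply. */
class ToyClient {
    static Reply incrementWithRetries(ToyServer server, long nonce, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            Reply reply = server.increment(nonce);
            if (reply != null) {
                return reply;
            }
        }
        throw new IllegalStateException("no reply after " + maxRetries + " retries");
    }
}
```

With two lost replies and up to three retries, the client eventually receives ALREADY_APPLIED and the counter has been incremented exactly once. This toy only distinguishes "new" from "seen"; it does not model the waiting case, where an earlier attempt is still running.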
III. How HBase implements it
HRegionServer has a member variable nonceManager of type ServerNonceManager, which is responsible for managing the nonces on that RegionServer. It is declared as follows:
final ServerNonceManager nonceManager;
ServerNonceManager has a very important method, startOperation(). It comes into play when the server has executed an operation but failed to respond to the client in time, and the client re-sends the same operation with the same nonceGroup and nonce. The server decides what to do based on the nonceGroup and nonce. The method is defined as follows:
/**
 * Starts the operation if operation with such nonce has not already succeeded. If the
 * operation is in progress, waits for it to end and checks whether it has succeeded.
 *
 * In other words: if the earlier operation did not succeed, start it again; if it is
 * still in progress, wait for it to complete and then check whether it succeeded.
 *
 * @param group Nonce group.
 * @param nonce Nonce.
 * @param stoppable Stoppable that terminates waiting (if any) when the server is stopped.
 * @return true if the operation has not already succeeded and can proceed; false otherwise.
 */
public boolean startOperation(long group, long nonce, Stoppable stoppable)
    throws InterruptedException {
  // If the incoming nonce is 0 (NO_NONCE), return true: the operation can be performed
  if (nonce == HConstants.NO_NONCE) return true;
  // Construct the NonceKey instance nk
  NonceKey nk = new NonceKey(group, nonce);
  // Construct the OperationContext instance ctx; its initial state is WAIT
  OperationContext ctx = new OperationContext();
  while (true) {
    // Map the NonceKey to the OperationContext in nonces, a ConcurrentHashMap
    OperationContext oldResult = nonces.putIfAbsent(nk, ctx);
    // If there was no earlier entry, the operation can be executed directly
    if (oldResult == null) return true;

    // Collision with some operation - should be extremely rare.
    // An entry already exists: inspect the OperationContext registered for this nonce
    synchronized (oldResult) {
      // Get the state of the OperationContext corresponding to the nonce
      int oldState = oldResult.getState();
      LOG.debug("Conflict detected by nonce: " + nk + ", " + oldResult);
      // If the previous state is not WAIT, the earlier operation has ended
      if (oldState != OperationContext.WAIT) {
        // PROCEED means the earlier operation completed and ended in failure, so
        // return true: the operation can be executed again. Otherwise return false.
        return oldState == OperationContext.PROCEED; // operation ended
      }
      // Wait for a while, then continue the loop
      oldResult.setHasWait();
      oldResult.wait(this.conflictWaitIterationMs); // operation is still active... wait and loop
      // Check the RegionServer status
      if (stoppable.isStopped()) {
        throw new InterruptedException("Server stopped");
      }
    }
  }
}
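To make the three states easier to follow, here is a toy, self-contained version of the start/end handshake. It is a sketch under stated assumptions, not the HBase class: ToyNonceManager, its Context class and the state values are invented for illustration, the key is a bare nonce instead of a (group, nonce) pair, and the end-of-operation logic is guessed from how the states are read above. It also deviates in one place on purpose: a retry that finds a failed earlier attempt re-registers itself instead of returning immediately, which keeps the toy's bookkeeping simple.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Toy nonce manager (hypothetical; names and state values are assumptions). */
class ToyNonceManager {
    static final int DONT_PROCEED = 0; // earlier attempt succeeded: reject retries
    static final int PROCEED = 1;      // earlier attempt failed: a retry may run
    static final int WAIT = 2;         // earlier attempt is still running

    static final class Context {
        int state = WAIT;              // guarded by synchronized (this)
    }

    private final ConcurrentHashMap<Long, Context> nonces = new ConcurrentHashMap<>();
    private final long waitMs;

    ToyNonceManager(long waitMs) {
        this.waitMs = waitMs;
    }

    /** Returns true if the caller may run the operation, false if it already succeeded. */
    boolean startOperation(long nonce) {
        Context ctx = new Context();
        while (true) {
            Context old = nonces.putIfAbsent(nonce, ctx);
            if (old == null) {
                return true;           // first attempt with this nonce
            }
            synchronized (old) {
                if (old.state == DONT_PROCEED) {
                    return false;      // duplicate of a successful operation
                }
                if (old.state == PROCEED) {
                    continue;          // failed attempt was removed; register again
                }
                try {
                    old.wait(waitMs);  // still running: wait, then re-check
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
        }
    }

    /** Records the outcome of an attempt and wakes up any waiting retries. */
    void endOperation(long nonce, boolean success) {
        Context ctx = nonces.get(nonce);
        if (ctx == null) {
            throw new IllegalStateException("no operation started for nonce " + nonce);
        }
        synchronized (ctx) {
            ctx.state = success ? DONT_PROCEED : PROCEED;
            if (!success) {
                nonces.remove(nonce, ctx); // let the next retry start from scratch
            }
            ctx.notifyAll();
        }
    }
}
```

A failed attempt followed by a retry is allowed to run again, while a retry arriving after a successful attempt is rejected.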
The append() method of RSRpcServices contains the following code:
if (r == null) {
  long nonce = startNonceOperation(m, nonceGroup);
  boolean success = false;
  try {
    r = region.append(append, nonceGroup, nonce);
    success = true;
  } finally {
    endNonceOperation(m, nonceGroup, success);
  }
  if (region.getCoprocessorHost() != null) {
    region.getCoprocessorHost().postAppend(append, r);
  }
}
The source code of the startNonceOperation() method used there is as follows:
/**
 * Starts the nonce operation for a mutation, if needed.
 *
 * @param mutation Mutation.
 * @param nonceGroup Nonce group from the request.
 * @return Nonce used (can be NO_NONCE).
 */
private long startNonceOperation(final MutationProto mutation, long nonceGroup)
    throws IOException, OperationConflictException {
  // If the nonceManager on the RegionServer is null, or the mutation carries no nonce,
  // return HConstants.NO_NONCE (that is, 0) directly
  if (regionServer.nonceManager == null || !mutation.hasNonce()) return HConstants.NO_NONCE;
  // Flag: whether the operation can proceed
  boolean canProceed = false;
  try {
    // Call startOperation() on the RegionServer's nonceManager to determine
    // whether the operation can be performed
    canProceed = regionServer.nonceManager.startOperation(
      nonceGroup, mutation.getNonce(), regionServer);
  } catch (InterruptedException ex) {
    throw new InterruptedIOException("Nonce start operation interrupted");
  }
  if (!canProceed) {
    // The operation cannot run: throw OperationConflictException
    // TODO: instead, we could convert append/increment to get w/mvcc
    String message = "The operation with nonce {" + nonceGroup + ", " + mutation.getNonce()
      + "} on row [" + Bytes.toString(mutation.getRow().toByteArray())
      + "] may have already completed";
    throw new OperationConflictException(message);
  }
  // Finally, return the mutation's nonce
  return mutation.getNonce();
}
It calls the startOperation() method of the nonceManager on the RegionServer to determine whether the operation can proceed, and throws an OperationConflictException if it cannot.