In a replicated system, each replica executes commands serially in order to stay consistent. As a result, it can be slower than a system with a single server, because a single server is free to process commands in parallel. It would be a big step forward if the servers in a replicated system could also process commands in parallel.
If the threads share no variables, however, each server in a replicated system can also process commands in parallel, and in fact much of the earlier work on parallelism in replicated systems builds on this observation.
P-SMR likewise relies on the dependencies between commands to decide which commands can run in parallel and which must run serially.
We need to abstract the service program as a state machine, specifically a Mealy machine, whose output depends on both the current state and the input. Building a state-machine replication system also requires proxies: a client proxy and a server proxy. For example, when a request arrives at a server, the server proxy decides whether it can be executed in parallel and how, and the proxies also carry information among replicas and between client and server, much like RPC (remote procedure call). Another example: for a single command, several servers produce identical responses, but thanks to the server proxy only one response is delivered to the client.
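The two ideas above, a service modeled as a Mealy machine and a proxy that turns several identical replica responses into one, can be sketched as follows. This is a minimal illustration under stated assumptions: KeyValueMachine, reply_once and the ("set", key, value) / ("get", key) command format are hypothetical names invented for this example, not part of P-SMR.

```python
class KeyValueMachine:
    """A tiny Mealy machine: each output depends on the current state and the input."""

    def __init__(self):
        self.state = {}

    def step(self, command):
        # Hypothetical command format: ("set", key, value) or ("get", key).
        op, key, *rest = command
        if op == "set":
            self.state[key] = rest[0]
            return "ok"
        if op == "get":
            return self.state.get(key)
        raise ValueError(f"unknown operation: {op}")


def reply_once(responses):
    """Simplified proxy role: collapse identical replica responses into one."""
    distinct = set(responses)
    if len(distinct) != 1:
        raise RuntimeError("replicas diverged")
    return distinct.pop()


commands = [("set", "v", 1), ("get", "v"), ("set", "v", 2), ("get", "v")]
replicas = [KeyValueMachine() for _ in range(3)]

# Every replica applies the same commands in the same order,
# so all replicas produce the same outputs and end in the same state.
replies = [reply_once([r.step(c) for r in replicas]) for c in commands]
```

Because the machine is deterministic, feeding the same command sequence to every replica is enough to keep them consistent, which is why the order of dependent commands matters so much in what follows.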
Let's look at when two commands are interdependent (dependent): they access (read or write) a common variable v, and at least one of them changes the value of v. For example, one command reads v and the other updates v. In that case neither a single server nor a replicated system can execute them in parallel; the commands can only be processed sequentially, according to the total order. So if commands are dependent, the replicas have to execute them serially. That raises a question: what problem is the replication system actually trying to solve? Does it want independent commands to execute in parallel, or dependent commands? The answer follows.
The answer is independent commands. You might ask: what is so hard about that? The problem we have overlooked is that, besides parallelizing the execution of commands, commands should also be delivered in parallel. So far, multiple commands are still delivered serially, and this is the challenge P-SMR takes on: parallelize everything, rather than half of it.
P-SMR stands for parallel state-machine replication, a method that parallelizes both command execution and command delivery. In addition to executing commands in parallel, a replica also achieves parallelism in the multicast step. What makes multicast hard to parallelize?
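To tell independent commands from dependent ones, the rule stated earlier (two commands are dependent when they access a common variable and at least one of them changes it) can be sketched as below. The Command class and its reads/writes fields are assumptions made for this example; the source does not say how a command's accesses are described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    # Hypothetical description of a command: the variables it reads and writes.
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def depends(a, b):
    """True if a and b access a common variable and at least one of them writes it."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)


read_v = Command("get_state", reads=frozenset({"v"}))
update_v = Command("set_state", writes=frozenset({"v"}))
read_w = Command("get_other", reads=frozenset({"w"}))

read_update = depends(read_v, update_v)      # True: one reads v, the other updates v
read_read = depends(read_v, read_v)          # False: two reads never conflict
different_vars = depends(update_v, read_w)   # False: no common variable
```

Only pairs for which depends returns True need to follow the total order; everything else is a candidate for parallel execution.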
C-DEP (command dependencies): generally speaking, if two commands take the same object as a parameter and an update operation is involved, they can be judged in advance to depend on each other. In the client proxy there is a function called command-to-group (c-g), which maps each client command to the group it should be sent to. In other words, it tells each command where to go. Here is the algorithm for this procedure:
Let's imagine what this c-g function needs to do, taking 4 commands as an example:
1) c-g must be able to determine the C-DEP.
2) If the 4 commands are independent, c-g lets them execute in parallel.
3) If there are dependencies among the 4 commands, c-g makes them execute serially.
How do we do 2)? c-g assigns the 4 commands to different groups.
Suppose the number of threads is 4: we can define 4 worker threads (T1, T2, T3, T4) on each server, running on 4 cores. At this point we can say that the system has 4 groups, namely:
g1: A(T1), B(T1), C(T1), D(T1), E(T1)
g2: A(T2), B(T2), C(T2), D(T2), E(T2)
g3: A(T3), B(T3), C(T3), D(T3), E(T3)
g4: A(T4), B(T4), C(T4), D(T4), E(T4)
g5: A(T5), B(T5), C(T5), D(T5), E(T5)
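The group layout above, where group gi contains worker thread Ti of every replica, and a very simple command-to-group mapping can be sketched as follows. This is only an illustration under assumptions: the replica names A to E come from the listing, four threads per replica is assumed, and command_to_group is a hypothetical stand-in that maps by variable name. It is not the c-g algorithm of P-SMR itself; it only shows one way to make commands on the same variable land in the same group, so that one thread executes them in order.

```python
import zlib

REPLICAS = ["A", "B", "C", "D", "E"]  # replica names taken from the listing
NUM_THREADS = 4                        # assumed worker threads per replica


def build_groups(replicas, num_threads):
    """Group g_i contains worker thread T_i of every replica."""
    return {
        f"g{i}": [(replica, f"T{i}") for replica in replicas]
        for i in range(1, num_threads + 1)
    }


def command_to_group(variable, num_threads):
    """Hypothetical c-g: commands on the same variable always map to the same group."""
    # crc32 is used instead of hash() so the mapping is identical on every client.
    index = zlib.crc32(variable.encode("utf-8")) % num_threads
    return f"g{index + 1}"


groups = build_groups(REPLICAS, NUM_THREADS)
group_for_x = command_to_group("x", NUM_THREADS)
```

With this mapping, two commands that touch the same variable are serialized by the thread that owns their group, while commands on different variables may be spread over different groups and run in parallel. Two unrelated variables can still hash to the same group, which costs parallelism but not correctness.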
Let's look at a scenario where c-g applies. Suppose there are two commands:
set_state(v)
get_state(v)
Although this is a replicated system, get_state does not need to be sent to all replicas; it is enough for one of them to execute it and return the result. set_state, on the other hand, must be executed by every replica.
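The routing rule just described can be sketched as a small function that picks the replicas a command is sent to. The function name destinations and the choice of a random replica for reads are assumptions made for this example; the text only says that one replica is enough for get_state.

```python
import random


def destinations(command_name, replicas, rng=random):
    """Pick which replicas should receive a command (simplified)."""
    if command_name == "get_state":
        # A read does not change state, so any single replica can answer it.
        return [rng.choice(replicas)]
    if command_name == "set_state":
        # An update must be applied by every replica to keep them consistent.
        return list(replicas)
    raise ValueError(f"unknown command: {command_name}")


replicas = ["A", "B", "C"]
read_targets = destinations("get_state", replicas)
write_targets = destinations("set_state", replicas)
```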
Here is a note on the advantages of P-SMR:
P-SMR offloads scheduling decisions from the replicas, avoiding a bottleneck-prone scheduler, which must deliver a single stream of commands and assign them to the worker threads for execution.
Let me try to explain. In traditional parallel execution, the Linux kernel does the dispatching and spreads 4 independent commands over 4 threads. In P-SMR, the c-g function in the client proxy plays the role of the dispatcher (scheduler): c-g defines the policy for assigning commands to threads, so a command can be assigned to a specific thread, which traditional Linux kernel scheduling does not offer. This is also the core idea that lets P-SMR execute in parallel. Here is my understanding of P-SMR:
See state machine Replication through P-SMR
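The dispatching idea described above, where the client side rather than a replica-side scheduler decides which worker thread runs a command, can be sketched as follows. This is a structural illustration under assumptions: pick_thread stands in for c-g, each worker thread has its own inbox queue instead of a real multicast group, and commands are simple (key, value) updates. It shows the shape of the design, not the performance of a real P-SMR deployment.

```python
import queue
import threading
import zlib

NUM_THREADS = 4  # assumed number of worker threads on this replica


def pick_thread(key):
    """Client-side choice of worker thread (hypothetical stand-in for c-g)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_THREADS


def worker(inbox, partition):
    """Execute the commands of this thread's own stream, in delivery order."""
    while True:
        command = inbox.get()
        if command is None:  # sentinel: no more commands
            break
        key, value = command
        partition[key] = value


# One inbox and one private slice of the state per worker: no shared variables.
inboxes = [queue.Queue() for _ in range(NUM_THREADS)]
partitions = [{} for _ in range(NUM_THREADS)]
threads = [
    threading.Thread(target=worker, args=(inboxes[i], partitions[i]))
    for i in range(NUM_THREADS)
]
for t in threads:
    t.start()

commands = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
for key, value in commands:
    # The sender picks the thread; the replica has no central scheduler.
    inboxes[pick_thread(key)].put((key, value))
for inbox in inboxes:
    inbox.put(None)
for t in threads:
    t.join()

state = {}
for partition in partitions:
    state.update(partition)
```

Both updates to "a" go through the same inbox, so they are applied in order and the final value is 3, while updates to other keys may be handled by other threads at the same time.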