Definition
In a distributed system with several Erlang nodes, it may be necessary to control applications in a distributed manner. If the node where a certain application is running goes down, the application is to be restarted on another node.
Such an application is called a distributed application. Note that it is the control of the application that is distributed. All applications can be distributed in the sense that they, for example, use services on other nodes.
Because a distributed application can move between nodes, some addressing mechanism is required so that other applications can find it, regardless of the node on which it is currently running. This issue is not discussed here, but the Kernel module global and the STDLIB module pg can be used for this purpose.
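As a minimal sketch of that kind of addressing (separate from the distributed-application mechanism itself), a server can register under a cluster-wide name through global, so callers reach it without knowing its current node. The module name myapp_server and its ping/0 function are hypothetical; gen_server:start_link/4 with a {global, Name} name and global:whereis_name/1 are standard OTP functions.

```erlang
%% Hypothetical server registered cluster-wide via the global module.
-module(myapp_server).
-behaviour(gen_server).
-export([start_link/0, ping/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% {global, Name} registers the process through global, so the name
%% is unique across all connected nodes.
start_link() ->
    gen_server:start_link({global, ?MODULE}, ?MODULE, [], []).

%% Callers address the server by name only; global resolves the pid,
%% whichever node the server currently runs on.
ping() ->
    gen_server:call({global, ?MODULE}, ping).

init([]) ->
    {ok, no_state}.

%% Reply with the node that actually served the request.
handle_call(ping, _From, State) ->
    {reply, {pong, node()}, State};
handle_call(_Other, _From, State) ->
    {reply, {error, unknown_call}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

From any connected node, global:whereis_name(myapp_server) returns the server's pid, or undefined if it is not running anywhere.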
Specifying Distributed Applications
A distributed application is controlled by both the application controller and a distributed application controller process, dist_ac. Both of these processes are part of the Kernel application. Distributed applications are therefore specified by configuring Kernel, using the following configuration parameter (see kernel(6)):
- distributed = [{Application, [Timeout,] NodeDesc}]
  - Specifies where the application Application = atom() can execute.
  - NodeDesc = [Node | {Node,...,Node}] is a list of node names in priority order. The order between nodes within a tuple is undefined.
  - Timeout = integer() specifies how many milliseconds to wait before restarting the application on another node.
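For illustration, the fragment below shows both forms of the entry in a system configuration file: one with an explicit Timeout and one without it. The application names appA and appB and the node names n1@host, n2@host and n3@host are hypothetical.

```erlang
%% Hypothetical sys.config fragment showing the two forms of `distributed`.
[{kernel,
  [{distributed,
    [%% appA: prefer n1@host; after 3000 ms fall back to n2@host or
     %% n3@host (the order between the two nodes in the tuple is undefined).
     {appA, 3000, [n1@host, {n2@host, n3@host}]},
     %% appB: Timeout omitted; strict priority n2@host, then n1@host.
     {appB, [n2@host, n1@host]}]}]}].
```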
For distribution of application control to work properly, the nodes where a distributed application can run must contact each other and negotiate where to start the application. This is done using the following Kernel configuration parameters:
- sync_nodes_mandatory = [Node]
  Specifies which other nodes must be started (within the timeout specified by sync_nodes_timeout).
- sync_nodes_optional = [Node]
  Specifies which other nodes can be started (within the timeout specified by sync_nodes_timeout).
- sync_nodes_timeout = integer() | infinity
  Specifies how many milliseconds to wait for the other nodes to start.
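A hypothetical node that must wait for one peer but only optionally for another could be configured as below; the node names n1@host, n2@host and n3@host are placeholders.

```erlang
%% Hypothetical sys.config fragment for node n1@host.
[{kernel,
  [%% n1@host terminates if n2@host is not up within sync_nodes_timeout.
   {sync_nodes_mandatory, [n2@host]},
   %% n1@host also waits for n3@host, but proceeds without it once the
   %% timeout has elapsed.
   {sync_nodes_optional, [n3@host]},
   %% Wait at most 10 seconds; `infinity` would wait indefinitely.
   {sync_nodes_timeout, 10000}]}].
```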
When started, the node waits for all nodes specified by sync_nodes_mandatory and sync_nodes_optional to come up. When all nodes are up, or when all mandatory nodes are up and the time specified by sync_nodes_timeout has elapsed, all applications are started. If not all mandatory nodes are up, the node terminates.
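The start-up rule in the paragraph above can be restated as a small pure function. This is a hypothetical model for illustration only (the module sync_rule and the function decide/4 are invented names), not how Kernel implements the check.

```erlang
%% Hypothetical model of the start-up rule for the sync_nodes_* parameters.
%% It only restates the rule; it is NOT the Kernel implementation.
-module(sync_rule).
-export([decide/4]).

%% Mandatory, Optional: node lists from the configuration.
%% Up: nodes currently up. TimedOut: has sync_nodes_timeout elapsed?
-spec decide([node()], [node()], [node()], boolean()) ->
          wait | start_applications | terminate.
decide(Mandatory, Optional, Up, TimedOut) ->
    AllUp = fun(Nodes) ->
                    lists:all(fun(N) -> lists:member(N, Up) end, Nodes)
            end,
    case {AllUp(Mandatory ++ Optional), AllUp(Mandatory), TimedOut} of
        {true, _, _}         -> start_applications; % every node is up
        {false, true, true}  -> start_applications; % mandatory up, timeout hit
        {false, false, true} -> terminate;          % a mandatory node is missing
        {false, _, false}    -> wait                % keep waiting
    end.
```

For example, with Mandatory = [b] and Optional = [c], a node that sees only b up keeps waiting until the timeout, then starts its applications anyway.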
For example, an application myapp is to run on the node cp1@cave. If this node goes down, myapp is to be restarted on cp2@cave or cp3@cave. A system configuration file for cp1@cave can look as follows:
[{kernel,
  [{distributed, [{myapp, 5000, [cp1@cave, {cp2@cave, cp3@cave}]}]},
   {sync_nodes_mandatory, [cp2@cave, cp3@cave]},
   {sync_nodes_timeout, 5000}
  ]
 }
].
The system configuration files for cp2@cave and cp3@cave are identical, except for the list of mandatory nodes, which should be [cp1@cave, cp3@cave] for cp2@cave and [cp1@cave, cp2@cave] for cp3@cave.
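Spelled out, the two remaining files would read as follows; only the sync_nodes_mandatory entry differs, while the distributed and sync_nodes_timeout entries are unchanged. The file names cp2.config and cp3.config are an assumption chosen to match the -config flags used when starting the nodes.

```erlang
%% cp2.config (for cp2@cave): mandatory peers are cp1@cave and cp3@cave.
[{kernel,
  [{distributed, [{myapp, 5000, [cp1@cave, {cp2@cave, cp3@cave}]}]},
   {sync_nodes_mandatory, [cp1@cave, cp3@cave]},
   {sync_nodes_timeout, 5000}]}].

%% cp3.config (for cp3@cave): mandatory peers are cp1@cave and cp2@cave.
[{kernel,
  [{distributed, [{myapp, 5000, [cp1@cave, {cp2@cave, cp3@cave}]}]},
   {sync_nodes_mandatory, [cp1@cave, cp2@cave]},
   {sync_nodes_timeout, 5000}]}].
```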
Starting and Stopping Distributed Applications
When all involved (mandatory) nodes have been started, the distributed application can be started by calling application:start(Application) on all of these nodes.
A boot script (see Releases) can also be used to start the application automatically.
The application is started on the first operational node listed in the distributed configuration parameter. The application is started as usual; that is, an application master is created and calls the application callback function:
Module:start(normal, StartArgs).
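A minimal sketch of such a callback module follows. It assumes the .app file for myapp names it with {mod, {myapp_app, []}}; the module name myapp_app and the supervisor name myapp_sup are hypothetical. To stay self-contained, the module also acts as its own (empty) top supervisor.

```erlang
%% Hypothetical callback module for myapp; also its own top supervisor.
-module(myapp_app).
-behaviour(application).
-behaviour(supervisor).
-export([start/2, stop/1, init/1]).

%% Called by the application master on the node chosen from `distributed`.
%% StartType is normal here; the other start types are described under
%% Failover and Takeover, so this sketch simply accepts any of them.
start(_StartType, _StartArgs) ->
    supervisor:start_link({local, myapp_sup}, ?MODULE, []).

%% Called after the application has been stopped on this node.
stop(_State) ->
    ok.

%% Supervisor callback: an empty one_for_one supervisor; real worker
%% child specs would go in the list.
init([]) ->
    {ok, {#{strategy => one_for_one, intensity => 1, period => 5}, []}}.
```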
For example, continuing the example from the previous section, the three nodes are started, each specifying its system configuration file:
> erl -sname cp1 -config cp1
> erl -sname cp2 -config cp2
> erl -sname cp3 -config cp3
When all nodes are operational, myapp can be started. This is done by calling application:start(myapp) on all three nodes. It is then started on cp1.
Similarly, the application must be stopped by calling application:stop(Application) on all involved nodes.
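Because these calls must be made on every involved node, it can be convenient to issue them from a single shell. The sketch below assumes the nodes are already connected; the module dist_ctl and its functions are hypothetical, while rpc:multicall/4 is the standard Kernel function that evaluates a call on several nodes and returns {Results, BadNodes}.

```erlang
%% Hypothetical helper: run application:start/stop on a list of nodes.
-module(dist_ctl).
-export([start_on/2, stop_on/2]).

%% Evaluate application:start(App) on every node in Nodes.
%% Returns {Results, BadNodes} exactly as rpc:multicall/4 does.
-spec start_on(atom(), [node()]) -> {[term()], [node()]}.
start_on(App, Nodes) ->
    rpc:multicall(Nodes, application, start, [App]).

%% Evaluate application:stop(App) on every node in Nodes.
-spec stop_on(atom(), [node()]) -> {[term()], [node()]}.
stop_on(App, Nodes) ->
    rpc:multicall(Nodes, application, stop, [App]).
```

In the running example, dist_ctl:start_on(myapp, [cp1@cave, cp2@cave, cp3@cave]) would issue the three application:start(myapp) calls; any unreachable node shows up in BadNodes.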
Failover
If the node where the application is running goes down, the application is restarted (after the specified timeout) on the first operational node listed in the distributed configuration parameter. This is called a failover.
The application is started the normal way on the new node, that is, by the application master calling:
Module:start(normal, StartArgs)
An exception is if the application has the start_phases key defined (see Included Applications). The application is then instead started by calling:
Module:start({failover, Node}, StartArgs)
Here Node is the terminated node.
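A callback module for an application with start_phases can match on the start type. The sketch below is hypothetical (module myapp_phased_app, supervisor name myapp_phased_sup, an empty child list, and a .app entry such as {start_phases, [{init, []}]} are all assumptions); it only shows the shape of the start types described in this chapter, and every clause starts the same top supervisor. Module:start_phase/3 is the application callback invoked for each listed phase.

```erlang
%% Hypothetical callback module for an application whose .app file
%% defines start_phases, e.g. {start_phases, [{init, []}]}.
-module(myapp_phased_app).
-behaviour(application).
-behaviour(supervisor).
-export([start/2, start_phase/3, stop/1, init/1]).

%% Regular start.
start(normal, _StartArgs) ->
    start_sup();
%% Failover: FailedNode is where the application ran before it went down.
start({failover, FailedNode}, _StartArgs) ->
    logger:notice("myapp failing over from ~p", [FailedNode]),
    start_sup();
%% Takeover: OldNode is the node the application is moving away from.
start({takeover, OldNode}, _StartArgs) ->
    logger:notice("myapp taking over from ~p", [OldNode]),
    start_sup().

%% Called once for each phase listed under start_phases.
start_phase(_Phase, _StartType, _PhaseArgs) ->
    ok.

stop(_State) ->
    ok.

start_sup() ->
    supervisor:start_link({local, myapp_phased_sup}, ?MODULE, []).

%% Supervisor callback: empty one_for_one supervisor.
init([]) ->
    {ok, {#{strategy => one_for_one, intensity => 1, period => 5}, []}}.
```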
For example, if cp1 goes down, the system checks which of the other nodes, cp2 or cp3, has the fewest running applications, but first waits 5 seconds for cp1 to restart. If cp1 does not restart and cp2 runs fewer applications than cp3, myapp is restarted on cp2.
Suppose now that cp2 also goes down and does not restart within 5 seconds. myapp is then restarted on cp3.
Takeover
If a node is started that, according to distributed, has a higher priority than the node where a distributed application is currently running, the application is restarted on the new node and stopped on the old node. This is called a takeover.
In this case, the application is started by the application master calling:
Module:start({takeover, Node}, StartArgs)
Here Node is the old node.
For example, if myapp is running on cp3 and cp2 now restarts, myapp is not moved to cp2, because the order between the nodes cp2 and cp3 is undefined.
However, if cp1 also restarts, the function application:takeover/2 moves myapp to cp1, because cp1 has a higher priority than cp3 for this application. In this case, Module:start({takeover, cp3@cave}, StartArgs) is executed on cp1 to start the application.
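The takeover condition in this example depends only on the priority groups in distributed. The hypothetical function below (module takeover_model, function should_take_over/3, both invented names) models that check: a newly started node takes over only if it sits in a strictly higher-priority group than the current node, so nodes within one tuple never take over from each other.

```erlang
%% Hypothetical model of the takeover priority check; not dist_ac itself.
-module(takeover_model).
-export([should_take_over/3]).

%% Position of Node's priority group in NodeDesc (1 = highest),
%% or undefined if Node is not listed.
rank(Node, NodeDesc) ->
    rank(Node, NodeDesc, 1).

rank(_Node, [], _I) ->
    undefined;
rank(Node, [Group | Rest], I) ->
    Members = if is_tuple(Group) -> tuple_to_list(Group); true -> [Group] end,
    case lists:member(Node, Members) of
        true  -> I;
        false -> rank(Node, Rest, I + 1)
    end.

%% True only if NewNode is in a strictly higher-priority group than Current.
%% The is_integer guards matter: in Erlang, any integer compares less than
%% the atom undefined, so an unlisted node must be rejected explicitly.
-spec should_take_over(node(), node(), list()) -> boolean().
should_take_over(NewNode, Current, NodeDesc) ->
    case {rank(NewNode, NodeDesc), rank(Current, NodeDesc)} of
        {New, Cur} when is_integer(New), is_integer(Cur) -> New < Cur;
        _ -> false
    end.
```

For the running example, should_take_over(cp2, cp3, [cp1, {cp2, cp3}]) is false and should_take_over(cp1, cp3, [cp1, {cp2, cp3}]) is true.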