In system operations work, we often need to run the same command on many machines at once. Parallel shell tools such as pssh and pdsh are made for this.
Of course, without these tools, if passwordless SSH is set up between the machines, we can write our own for loop to run the command, but a hand-written for loop runs serially,
whereas pdsh runs concurrently.
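As a rough illustration of the hand-written approach, here is a minimal sketch of such a serial loop. The function name run_serial, the RSH variable, and the host names in the usage note are hypothetical, chosen only for this example; it assumes passwordless SSH is already working.

```shell
# RSH is an assumed variable for this sketch: the remote-shell command.
# BatchMode makes ssh fail instead of prompting for a password.
RSH=${RSH:-ssh -o BatchMode=yes}

# run_serial CMD HOST...
# Runs CMD on each host in turn; the next host starts only after the
# previous one has finished, which is what makes the loop serial.
run_serial() {
    cmd=$1
    shift
    for h in "$@"; do
        # Prefix every output line with the host name, similar to pdsh.
        $RSH "$h" "$cmd" 2>&1 </dev/null | sed "s/^/$h: /"
    done
}
```

For example, run_serial uptime dn1 dn2 dn3 contacts the three hosts one by one, so the total time is the sum of the individual times. The pdsh equivalent, pdsh -w 'dn[1-3]' uptime, contacts them concurrently.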
For example, when you take over a new big data cluster, one of the first things to do is to get familiar with all of its configuration. Once you know the configuration files on the master,
you need to check that the configuration files on the DataNodes are consistent with them, and this is where pdsh helps:
pdsh -w dn[1-50] 'md5sum /app/hadoop/etc/hadoop/core-site.xml' | dshbak -c
dshbak -c groups the DataNodes that returned identical output and lists the differing ones separately, so you can see at a glance which nodes match and which do not.
The same approach works when troubleshooting: compare system configuration files across machines to check that they are consistent.
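To make the grouping idea concrete, here is a simplified sketch of what "collapse identical output" means. It is not how dshbak is implemented; the function name group_by_output is hypothetical, and it only handles one output line per host, whereas dshbak also handles multi-line output and compresses host ranges.

```shell
# group_by_output: read "host: output" lines on stdin (the format pdsh
# prints) and print each distinct output once, preceded by the list of
# hosts that produced it, in order of first appearance.
group_by_output() {
    awk '
    {
        host = $1
        sub(/:$/, "", host)        # drop the colon after the host name
        line = $0
        sub(/^[^ ]+ /, "", line)   # keep everything after "host: "
        if (line in seen) {
            hosts[line] = hosts[line] "," host
        } else {
            seen[line] = 1
            hosts[line] = host
            order[++n] = line
        }
    }
    END {
        for (i = 1; i <= n; i++) {
            print "hosts: " hosts[order[i]]
            print "  " order[i]
        }
    }'
}
```

Piping the md5sum output of pdsh through this function would show one checksum per group of hosts, so a node with a modified core-site.xml stands out as its own group.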
The pdsh package also ships with pdcp, a parallel copy tool that helps with distributing files in parallel.
For example, after modifying mapred-site.xml on dn1, you can send the file to the other DataNodes like this:
pdcp -w dn[2-100] /pathtohadoop/etc/hadoop/mapred-site.xml /pathtohadoop/etc/hadoop/
It is very easy to use.
Similarly, for a task such as a kernel update, you can first use pdcp to send out the installation package and then use pdsh to run the install command, completing the installation on all machines.
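The two-step flow could be sketched roughly as follows. Everything specific here is an assumption for illustration: the function name distribute_and_install, the default /tmp staging directory, and rpm -Uvh as the install command (substitute your distribution's package tool). Note that pdcp must also be installed on the target hosts.

```shell
# PDCP and PDSH are assumed variables so the commands can be overridden.
PDCP=${PDCP:-pdcp}
PDSH=${PDSH:-pdsh}

# distribute_and_install TARGETS PKG [REMOTE_DIR]
# TARGETS is a pdsh host list such as 'dn[1-50]' (quote it so the shell
# does not treat the brackets as a glob); PKG is a local package file.
distribute_and_install() {
    targets=$1
    pkg=$2
    remote_dir=${3:-/tmp}

    # Step 1: copy the package to every target in parallel.
    $PDCP -w "$targets" "$pkg" "$remote_dir/" || return 1

    # Step 2: run the install command on every target in parallel.
    # rpm is a placeholder installer, not something the article names.
    $PDSH -w "$targets" "rpm -Uvh $remote_dir/$(basename "$pkg")"
}
```

Stopping when the copy step fails avoids running the install command on nodes that never received the package.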