Before this article, I suggest reading this one first: Linux High-Performance Computing Cluster – Beowulf Cluster
One. Problems you may encounter when building a cluster
1. A major difficulty in cluster design/layout is the network; every vendor has its own strategy, generally some variant of the Beowulf layout.
2. The software deployment described here has not been exercised in practice, so it cannot be concluded that such a deployment is successful or superior.
3. Manual/automated deployment of the whole process: choosing and installing the operating system and the required software (including cluster management tools, drivers, energy-monitoring software, etc.).
4. Almost every hardware vendor has launched its own cluster management software; Inspur does not seem to have one, nor a corresponding HPC development community, while large companies such as Dell and IBM do.
5. Measuring energy consumption, and the specific directions and strategies for optimization.
Two. HPC overview
① Overall composition
Outside network (the external network)
Master node (the primary/head node)
Compute nodes
Storage
Computational network
Management network
② Most HPCC systems are equipped with two networks: a TCP-based management network, and a compute network that can be TCP-based or use another protocol, usually a high-speed interconnect such as InfiniBand or Myrinet 10G.
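As a concrete illustration (the host names and subnets here are hypothetical, not from the article), the two networks often show up simply as two address ranges, one on the Ethernet management interface and one on the InfiniBand (IPoIB) compute interface:

```bash
# Hypothetical addressing sketch: eth0 carries the TCP management network,
# ib0 (IPoIB) carries the high-speed compute network.
cat >> /etc/hosts <<'EOF'
# management network (Ethernet, eth0)
192.168.1.1    master
192.168.1.11   node01
192.168.1.12   node02
# compute network (InfiniBand, ib0)
10.10.1.1      master-ib
10.10.1.11     node01-ib
10.10.1.12     node02-ib
EOF
```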
③ Topology diagram
The version commonly seen on the Internet:
And this one is sketched ("imagined") from my actual situation:
④ Required software components (in installation order):
1. For installing the operating system:
Every node in the cluster needs an operating system (HPCC node article link): the master node, the login node, and the compute nodes. The OS can be installed on a node's hard disk drive, or even into a RAM disk, in which case the node is sometimes called "diskless" or "stateless". In general, the master node creates a so-called "image" and then sends it to the compute nodes for installation (onto a hard drive or RAM disk).
A system installed in memory runs faster, but after a power outage it has to be copied back onto the node from hard-disk storage again, which is more trouble.
Features of several installation tools:
From comparisons, the well-known options are Rocks (simpler to install than xCAT) and xCAT.
xCAT:
A. It is independent of the system it installs; you can choose the latest RHEL-based release;
B. Installation is command-line driven and requires manually editing configuration files; you install one node and then use scripts to network-install the bare-metal nodes (a sketch follows below);
C. xCAT itself installs like a standalone piece of software; other software needed for parallel computing can then be installed through xCAT commands;
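As a rough, hedged sketch of what item B can look like in practice, assuming xCAT is already set up on the master node (node names, MAC/IP addresses, and the image name are placeholders):

```bash
# Define a compute node in the xCAT database (placeholder MAC/IP).
mkdef -t node node01 groups=compute,all arch=x86_64 \
      ip=192.168.1.11 mac=aa:bb:cc:dd:ee:01 mgt=ipmi
# Regenerate name resolution and DHCP from the database:
makehosts
makedns -n
makedhcp -n
# Point the node group at an OS image and trigger a PXE (re)install:
nodeset compute osimage=rhels6.5-x86_64-install-compute
rpower compute boot
# List the nodes xCAT knows about:
nodels
```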
Rocks:
A. Rocks is based on the Red Hat distribution, which suits most people, but not those who use SUSE or who want to work from images built on a Red Hat 6.2 release with selectively installed software. In addition, Rocks is not a cloning solution;
B. It requires burning a CD beforehand and deploying through a GUI; afterwards comes node naming and node IP configuration, installing one node at a time (see the sketch below);
C. Rocks is more like an integrated package, bundling both the tools and the software;
Rocks = CentOS + Rolls
The Roll packages include:
Base: the basic Rocks cluster management tools
SGE: Sun Grid Engine, job scheduling on the cluster
HPC: provides the runtime environment for parallel applications on the cluster (MPI, PVM)
Area51: analyzes the integrity of files and kernels on the cluster
Ganglia: cluster monitoring software (mainstream HPC clusters basically all have this)
Bio: bioinformatics tools for the cluster
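For comparison with the xCAT sketch above, a hedged sketch of the Rocks workflow on the front-end (node names such as compute-0-0 are just the defaults Rocks assigns):

```bash
# On the Rocks front-end: discover and install compute nodes one at a time.
# insert-ethers watches for DHCP requests and assigns names like compute-0-0.
insert-ethers                       # choose "Compute" in the menu, then PXE-boot each node
# After installation, list the hosts Rocks knows about:
rocks list host
# Run a command across all compute nodes:
rocks run host compute command="uptime"
```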
In general, these two methods only differ in applicability and convenience.
There is also a video course: http://edu.51cto.com/course/course_id-507.html
As for the choice of operating system: after a lot of research, Red Hat and CentOS are by far the most common, partly because their communities are active, and partly because they are open source and compatible with more resources.
(MIC development basically uses Red Hat, CentOS, or SUSE; Windows support has also started to appear.)
2. Install drivers and development tools
Includes IB drivers, compilers, editors, debuggers, libraries, and more.
Parallel computing development environment: install the Intel Manycore Platform Software Stack (MPSS), which contains the various drivers.
https://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss#lx34rel
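A hedged sketch of what installing MPSS and checking the coprocessor typically looks like on RHEL/CentOS 6 (the archive name depends on the MPSS release you download, so treat it as a placeholder):

```bash
# Unpack the MPSS release and install its RPMs (placeholder archive name).
tar xf mpss-3.4-linux.tar
cd mpss-3.4
yum install --nogpgcheck ./*.rpm    # or: rpm -Uvh *.rpm
# Generate the default configuration for the Xeon Phi cards and start the service.
micctrl --initdefaults
service mpss start                  # systemctl start mpss on newer systems
# Verify that the coprocessor is visible.
micinfo
```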
If we use Intel's compilers (and we should be using Intel's): download and install Intel Parallel Studio XE 2015, which includes performance analysis tools, compilers, high-performance libraries, parallel programming tools, and more for optimizing and tuning code on Xeon Phi. Cilk Plus, OpenMP, and TBB multithreaded programming and vectorization techniques are already implemented on Xeon and Xeon Phi, so software developers have no additional porting cost. (30-day trial)
https://software.intel.com/en-us/intel-parallel-studio-xe/try-buy
Xeon Phi can also be used with third-party tools and libraries; refer to the Intel FAQ:
https://software.intel.com/en-us/articles/intel-and-third-party-tools-and-libraries-available-with-support-for-intelr-xeon-phitm
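As a small illustration of using the Intel toolchain once Parallel Studio is installed (the install path and the classic -mmic native-build workflow for the KNC-generation Xeon Phi are assumptions, not from the article):

```bash
# Load the Intel compiler environment (adjust the path to your install).
source /opt/intel/bin/compilervars.sh intel64
# Build an OpenMP program for the host...
icc -qopenmp -O2 -o hello_host hello_omp.c
# ...and a native build of the same source for the Xeon Phi coprocessor:
icc -mmic -qopenmp -O2 -o hello_mic hello_omp.c
# Copy it to the card and run it there (mic0 is the default card hostname;
# the MIC OpenMP runtime, libiomp5.so, must also be available on the card).
scp hello_mic mic0:/tmp/ && ssh mic0 /tmp/hello_mic
```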
3. Configure the node information storage system
Install NFS, PVFS, Lustre, GPFS, SNFS, or the like. Large HPC clusters generally use Lustre for better performance, but it is not well suited to small clusters; a small cluster can consider NFS or PVFS instead. NFS, however, is not designed for parallel computing, so PVFS is recommended as the slightly better choice.
About Lustre:
A Lustre file system consists of the following four components: the Management Server (MGS), the Metadata Target (MDT), the Object Storage Target (OST), and the Lustre clients (LC).
Viewed another way, it mainly consists of three parts: the Metadata Server (MDS), the Object Storage Server (OSS), and the clients.
The normal startup sequence is: OST, MDS, client.
"The Cloud File System is not Oracle's first product based on a cluster file system," said Bob Thome, head of product management at Oracle. "Oracle manages the Lustre project, and Lustre is better suited to large-scale HPC (high-performance computing) deployments with thousands of servers. The Cloud File System is better suited to small deployments of around 25 nodes, although it has passed tests of up to 100 nodes. Lustre can also implement many of the same features, but it has a high barrier to entry, its installation and configuration are cumbersome, and it is not suitable for small-scale deployment."
Blog about lustre: http://www.cnblogs.com/jpa2/category/384788.html
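To make the roles above concrete, here is a minimal and heavily simplified Lustre setup sketch; the host name mgs-node and the device /dev/sdb are placeholders, and a real deployment should follow the Lustre manual:

```bash
# On the MGS/MDS node: format a combined MGS+MDT target and mount it.
mkfs.lustre --fsname=testfs --mgs --mdt --index=0 /dev/sdb
mkdir -p /mnt/mdt && mount -t lustre /dev/sdb /mnt/mdt

# On an OSS node: format an OST that registers with the MGS and mount it.
mkfs.lustre --fsname=testfs --ost --index=0 --mgsnode=mgs-node@tcp /dev/sdb
mkdir -p /mnt/ost0 && mount -t lustre /dev/sdb /mnt/ost0

# On a client (compute node): mount the whole file system.
mkdir -p /mnt/lustre
mount -t lustre mgs-node@tcp:/testfs /mnt/lustre
```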
PVFS has the following deficiencies:
1) A single management node. PVFS has only one management node handling metadata; once the cluster reaches a certain scale, that node becomes overloaded and turns into the system's bottleneck.
2) No fault tolerance for data storage. When an I/O node fails, the data stored on it becomes unavailable.
3) Static configuration. PVFS can only be configured before it is started; once the system is running, the original configuration cannot be changed.
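For a small cluster that, given these trade-offs, settles on plain NFS, here is a minimal sketch of exporting a shared /home from the master node (the 192.168.1.0/24 management subnet is the same hypothetical one used earlier):

```bash
# On the master node: export /home to the cluster's management subnet.
echo '/home 192.168.1.0/24(rw,sync,no_root_squash)' >> /etc/exports
exportfs -ra
service nfs start                # or: systemctl start nfs-server

# On each compute node: mount it (add to /etc/fstab to make it permanent).
mount -t nfs master:/home /home
```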
4. Cluster management tools (consider whether some components are already integrated)
The cluster management tool (CMT) is what manages the cluster. It has several functions, some required and some optional. The required features include:
- maintaining a list of the compute nodes (that is, the nodes that belong to the cluster); this can be as simple as an /etc/hosts file replicated to every compute node, or served through local DNS;
- creating, managing, and installing package sets on the compute nodes;
- sending an image or packages to the compute nodes (typically via PXE);
- basic monitoring of the compute nodes (for example, node performance, and which nodes are up or down);
- compute node power control (not a hard requirement, but highly recommended), that is, remotely switching nodes on and off; this can be implemented in various ways, some of which require additional hardware.
To people with cluster experience this feature list may look too brief, but it is the real core of a CMT; other features are nice to have, but not essential for a cluster. A rough hand-rolled sketch of the last two features is shown below.
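Even without a full CMT, two of the required features (pushing the node list and power control) can be approximated by hand; a hedged sketch, reusing the hypothetical node names from earlier and assuming the node BMCs are reachable over IPMI:

```bash
# Push the shared /etc/hosts to every compute node (a crude stand-in for a CMT).
for n in node01 node02; do
    scp /etc/hosts "$n":/etc/hosts
done

# Remote power control through each node's BMC with ipmitool
# (BMC address and credentials are placeholders).
ipmitool -I lanplus -H 192.168.3.11 -U admin -P secret chassis power status
ipmitool -I lanplus -H 192.168.3.11 -U admin -P secret chassis power on
```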
CMTs include Platform OCS, Clustercorp Rocks+, Microsoft Windows CCS, Platform Manager, Mon, and more.
5. Optional components:
Not many tools are strictly required for a cluster, and with the ones above you can already operate it at a basic level. However, that only meets the needs of one user, or perhaps two or three. To gain full control over, and a clear picture of, how the cluster is running, you need to install some optional components; although they are technically optional, without these tools the cluster will not be very productive.
Some components can be layered on top of the CMT. As someone with years of experience managing multiple clusters, I strongly recommend considering the following add-ons:
- broader monitoring tools, including a graphical view of cluster status, such as Ganglia (http://ganglia.info/), Cacti (http://www.cacti.net/), and Nagios (http://www.nagios.org/);
- a reporting tool, which lets you create reports on how the cluster is being used;
- user account management tools, which let you create user accounts across the entire cluster and let users set passwords that are then propagated to all nodes, and which enable passwordless login to the nodes, something required for running MPI applications (a minimal sketch follows).
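For that last point, a minimal sketch of setting up passwordless SSH for a cluster user so MPI can start processes on the nodes (node names are the same hypothetical ones as above):

```bash
# Run as the cluster user on the master node.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa    # generate a key with no passphrase
for n in node01 node02; do
    ssh-copy-id "$n"                         # append the public key on each node
done
ssh node01 hostname                          # should now work without a password
# If /home is NFS-shared across the nodes, appending the key once to
# ~/.ssh/authorized_keys is enough and the loop is unnecessary.
```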
Another theoretically optional but highly recommended component is the task scheduler (also known as the resource manager). A task scheduler is a queueing system that lets users submit jobs that do not have to run immediately: it queues the submitted jobs and starts them once the resources (that is, the nodes) become available. Task schedulers include Platform LSF, PBS Pro, and MOAB.
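To show what using such a scheduler looks like from the user's side, here is a minimal PBS-style job script; the program ./my_mpi_app is a placeholder, and the exact directives differ between PBS Pro, Torque, and LSF:

```bash
#!/bin/bash
#PBS -N test_job              # job name
#PBS -l nodes=2:ppn=8         # request 2 nodes, 8 processes per node
#PBS -l walltime=01:00:00     # maximum run time
cd "$PBS_O_WORKDIR"           # start in the directory the job was submitted from
mpirun -np 16 ./my_mpi_app    # placeholder MPI program
```

Submitted with qsub job.pbs, the script sits in the queue until two nodes are free and is then started by the scheduler.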
6. Testing
Three. Reference links:
http://www.ibm.com/developerworks/cn/linux/l-cluster1/
http://zh.community.dell.com/techcenter/w/techcenter_wiki/50
http://www.hpcblog.com.cn/
Attached: a high-definition MIC diagram: