Do you need a machine that can perform hundreds of trillions of floating-point operations per second? Or are you just curious how the supercomputer in your basement gets powered on? Building your own computing cluster, that is, a supercomputer, is a project for any professional geek with spare time and money to burn! Technically speaking, a modern multi-processor supercomputer is really a network of computers working in parallel to solve a specific computing problem. This article unveils the secrets of building a supercomputer from both the hardware and the software perspective.
Steps
1. Determine the hardware components and required resources.
You need a head node, at least a dozen compute nodes, an Ethernet switch, a power distribution unit, and a server rack. Calculate power consumption, cooling requirements, and floor space. Likewise, decide on the IP address range for your private network, the node naming scheme, the software packages you expect to install, and the technology used to build the cluster (more on this later).
- ● Although the hardware is expensive, the software listed in this article is free, and most of it is open source;
- ● If you want to know how fast your supercomputer can theoretically be, you can use this tool: http://hpl-calculator.sourceforge.net/
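The calculator above is built on simple arithmetic: theoretical peak (Rpeak) is node count times cores per node times clock rate times floating-point operations per core per cycle. A quick sketch in shell; all the figures below are made-up examples, not recommendations:

```shell
# Estimate theoretical peak performance; every number here is illustrative.
nodes=16
cores_per_node=8
ghz=3                     # clock rate in GHz
flops_per_core_cycle=8    # depends on the CPU's vector units
rpeak=$((nodes * cores_per_node * ghz * flops_per_core_cycle))
echo "Theoretical peak: ${rpeak} GFLOPS"   # prints 3072 for these inputs
```

Real-world benchmark results will come in well below this number; the gap is what tuning (step 9) is about.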
2. Build the compute nodes
You can assemble the compute nodes yourself, or use pre-configured servers.
- ● Choose rack-mounted servers that make efficient use of space, cooling, and power;
- ● Alternatively, a dozen or so idle, outdated servers will do; working together they can deliver more performance than the sum of what they achieve individually, and they will save you a lot of money! The processors, network adapters, and motherboards of the whole system should be the same model for the best running efficiency. Of course, do not forget memory and a hard disk for each node, and an optical drive for at least the head node.
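When mixing second-hand machines, it helps to verify that the hardware really matches. A quick sketch: run this on every candidate node (for example over ssh) and compare the output; identical lines mean matching CPUs:

```shell
# One-line hardware fingerprint per node: CPU model and core count.
# Run on each machine and diff the results across nodes.
lscpu | grep -E 'Model name|^CPU\(s\)'
```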
3. Install the servers in the rack
Start the installation from the bottom up, so the rack does not become top-heavy. You may need a friend's help with this task: a fully loaded server is very heavy, and guiding one onto the rack's slide rails is hard work.
4. Install the Ethernet switch at the top of the rack
Now configure the switch: allow 9000-byte jumbo frames, set the IP address to the static address you chose in step 1, and disable unneeded protocols such as SMTP snooping.
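Once jumbo frames are enabled, you can verify end-to-end support from a node. With a 9000-byte MTU, the largest unfragmented ping payload is 9000 minus the 20-byte IP header and the 8-byte ICMP header. A sketch; the target address is a placeholder for one of your own node IPs:

```shell
# Largest ICMP payload that fits in one jumbo frame.
mtu=9000
payload=$((mtu - 20 - 8))   # 8972: subtract IP (20) and ICMP (8) headers
# -M do forbids fragmentation, so the ping only succeeds if every hop
# supports the jumbo MTU. 10.1.1.10 is a placeholder node address.
echo "ping -M do -s ${payload} 10.1.1.10"
```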
5. Install the power distribution unit (PDU)
Depending on the maximum current draw of your nodes, 220 V power may be needed to meet your high-performance computing needs.
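Sizing the circuit is simple arithmetic: total watts divided by line voltage gives the current the PDU must supply. A sketch with illustrative numbers; your nodes' real draw under load will differ and should come from their specifications or a power meter:

```shell
# Rough PDU sizing; all inputs are example values.
nodes=16
watts_per_node=350    # assumed per-node draw under full load
volts=220
awk -v n="$nodes" -v w="$watts_per_node" -v v="$volts" \
    'BEGIN { printf "Total draw: %d W, about %.1f A at %d V\n", n*w, n*w/v, v }'
```

Leave generous headroom above this figure; nodes briefly draw more than their steady-state load at power-on.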
6. After everything is properly installed, you can start the configuration process.
Linux is the de facto standard operating system for high-performance computing (HPC) clusters, not only because Linux is an ideal environment for scientific computing, but also because it costs nothing to install on hundreds or even thousands of nodes. Imagine what it would cost to license Windows on that many nodes!
- ● Start by flashing the motherboard BIOS and firmware of every node to the latest version, which should be identical across nodes;
- ● Install your preferred Linux distribution on each node; the head node needs one with a graphical interface. Popular options include CentOS, openSUSE, Scientific Linux, Red Hat, and SLES;
- ● We strongly recommend the Rocks Cluster Distribution for building a compute cluster. Besides all the tools a compute cluster needs, Rocks provides batch deployment of the remaining nodes via PXE and Red Hat's Kickstart.
7. Install the message passing interface, resource manager, and other required libraries.
If you did not choose Rocks as your nodes' operating system in the previous step, you will have to set up the software needed for parallel computing by hand.
- ● First, you need a portable batch system, such as the TORQUE Resource Manager, which lets you split up and allocate computing tasks;
- ● If you install TORQUE, you will also need the Maui Cluster Scheduler to complete the setup;
- ● Second, you need to install a message passing interface (MPI) implementation to share data between processes on different compute nodes. Don't think twice: Open MPI is your dish!
- ● Finally, do not forget multi-threaded math libraries and compilers to build your computing tasks. Did I mention you could have just used Rocks?
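With TORQUE and an MPI implementation in place, work is submitted as a batch job script. A minimal sketch; the resource counts and the program name `my_mpi_app` are placeholders, and this fragment only runs on a cluster where the scheduler is configured:

```shell
#!/bin/bash
#PBS -N test-job             # job name as shown by qstat
#PBS -l nodes=4:ppn=8        # request 4 nodes with 8 processors each
#PBS -l walltime=00:10:00    # ten-minute runtime limit
cd "$PBS_O_WORKDIR"          # start in the directory the job was submitted from
mpirun -np 32 ./my_mpi_app   # one MPI rank per requested processor
```

Save it as, say, job.sh, submit it with `qsub job.sh`, and watch it with `qstat`.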
8. Connect all the compute nodes to the network
The head node dispatches tasks to the compute nodes, which send their results back; messages pass between nodes the same way, so the faster the network, the better.
- ● Use a private network to interconnect all nodes in the cluster;
- ● The head node also acts as the NFS, PXE, DHCP, and NTP server for the LAN;
- ● You must isolate this network from the Internet so that its broadcast traffic does not interfere with other networks;
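Keeping node names and addresses consistent is easier if you generate them rather than type them. A sketch that prints /etc/hosts entries for a hypothetical 10.1.1.0/24 private network using Rocks-style names; the address range and naming scheme are examples, not requirements:

```shell
# Emit one hosts entry per compute node: 10.1.1.10 -> compute-0-0, and so on.
for i in $(seq 0 11); do
    printf '10.1.1.%d\tcompute-0-%d\n' $((10 + i)) "$i"
done
```

Appending the output to /etc/hosts on the head node keeps name resolution working even before DNS is set up.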
9. Test the cluster
Before handing your mighty TOP500-contending cluster over to customers, you should test its performance. The HPL (High-Performance Linpack) benchmark is the common choice for measuring a cluster's computing speed. You will need to compile it from source.
- ● When compiling, enable every optimization option your chosen architecture allows. For example, on an AMD CPU, compile with Open64 and the -Ofast optimization level;
- ● Compare your benchmark result with the fastest machines on top500.org!
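When comparing against TOP500 entries, the interesting figure is efficiency: the measured HPL result (Rmax) divided by the theoretical peak (Rpeak) from step 1. Real clusters never hit 100%, and slow interconnects such as Gigabit Ethernet drag the ratio down noticeably. A sketch with illustrative numbers:

```shell
# Compute HPL efficiency from example figures; substitute your own results.
rmax=2150    # example measured HPL result, in GFLOPS
rpeak=3072   # example theoretical peak, in GFLOPS
awk -v a="$rmax" -v b="$rpeak" \
    'BEGIN { printf "HPL efficiency: %.0f%%\n", 100 * a / b }'
```

If the ratio is poor, revisit the HPL tuning parameters and the network configuration before blaming the hardware.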