Managing computing resources such as CPU time, memory, network bandwidth, and applications is a headache for many enterprises. For companies with employees all over the world, making full use of limited computing resources is especially critical. The LSF MultiCluster system developed by Platform Computing lets multiple heterogeneous computers share computing resources over a LAN or WAN and gives users transparent access to those resources. LSF currently supports the following large-scale resource-sharing scenarios:
- Multiple departments in a large enterprise: each department can own one or more clusters, so resources can be shared both within and between departments.
- Data centers serving small departments: large, expensive resources such as supercomputers can be shared transparently with smaller, remote departments.
- Resource sharing among loosely connected sites.
LSF connects multiple clusters, each of which usually corresponds to one department in an enterprise. Each cluster has a master machine that collects load information from every host in the cluster and schedules jobs based on that information. Clusters share resources with one another according to configured policies, and each master machine defines which of its cluster's resources may be shared. When a user submits a task, LSF sends it to where the required resource is located and, following the scheduling policy, selects a lightly loaded machine to run it. When several users request the same resource, requests are ordered by priority so that urgent tasks from higher-priority users are served first. LSF also has the following features:
- Provides enhanced computing capabilities.
With global resource sharing, users can access a wide variety of computing resources. Idle computers can now be put to work on tasks, and many machines can execute the same task in parallel, which greatly increases the computing power available to users.
- LSF provides user-configurable security policies.
By letting users choose RFC 931 identification, Kerberos, or DCE authentication, the system ensures that remote tasks come from authorized users.
- Each cluster is an autonomous system.
The configuration file on each cluster's master machine records that cluster's sharing policy, for example the number and type of tasks that can be transferred between clusters and the user names allowed to share resources across clusters.
- Supports non-shared user accounts and file systems.
When a task is transferred between clusters, the user's account can be mapped to a local account according to the configuration file. To support heterogeneous systems, LSF handles non-shared file systems by transferring files between clusters before and after task execution.
- Good scalability.
Within a single cluster, all configuration information is managed by the master machine, and communication between clusters involves only the master machines, independent of the other hosts in each cluster. LSF clusters can therefore be expanded easily to hundreds or even thousands of machines.
- LSF supports multiple operating system platforms.
Supported platforms include the major UNIX variants (Sun Solaris, HP-UX, IBM AIX, Digital UNIX/Compaq Tru64 UNIX, and SGI IRIX) as well as Red Hat Linux, Windows NT, and Windows 2000.
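The scheduling behavior described for LSF above can be sketched in a few lines of Python: a receiving cluster checks its own sharing policy (which remote users it accepts, how many remote tasks, and which local account each user maps to), then dispatches requests in priority order to the least-loaded host that offers the requested resource. This is a minimal illustration under stated assumptions, not LSF's actual algorithm or API: `Host`, `SharePolicy`, `Job`, and `schedule` are hypothetical names, load is modeled as a single number where lower means lighter, and adding 1.0 to a host's load after dispatch is a crude stand-in for the load a new job adds.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    load: float           # hypothetical load index; lower means lighter
    resources: set[str]   # resources the host offers, e.g. {"linux"}


@dataclass
class SharePolicy:
    # Hypothetical contents of one cluster's sharing configuration.
    allowed_users: set[str]    # users permitted to send remote tasks here
    max_remote_jobs: int       # cap on tasks accepted from other clusters
    user_map: dict[str, str] = field(default_factory=dict)  # remote -> local account


@dataclass
class Job:
    user: str
    resource: str
    priority: int   # higher value = more urgent


def schedule(jobs, hosts, policy, remote=True):
    """Place jobs in priority order on the least-loaded host offering the resource.

    Returns (job, local_account, host_name) tuples; refused jobs are skipped.
    """
    # Min-heap on (-priority, submission index): most urgent first, FIFO on ties.
    queue = [(-job.priority, i, job) for i, job in enumerate(jobs)]
    heapq.heapify(queue)

    placements = []
    accepted_remote = 0
    while queue:
        _, _, job = heapq.heappop(queue)
        account = job.user
        if remote:
            # The receiving cluster's own policy decides whether to accept the task.
            if job.user not in policy.allowed_users or accepted_remote >= policy.max_remote_jobs:
                continue
            account = policy.user_map.get(job.user, job.user)
        candidates = [h for h in hosts if job.resource in h.resources]
        if not candidates:
            continue  # no host in this cluster offers the requested resource
        target = min(candidates, key=lambda h: h.load)
        target.load += 1.0  # crude stand-in for the load the new job adds
        placements.append((job, account, target.name))
        if remote:
            accepted_remote += 1
    return placements


if __name__ == "__main__":
    hosts = [Host("hostA", 0.5, {"linux"}), Host("hostB", 0.2, {"linux"}),
             Host("hostC", 0.1, {"solaris"})]
    policy = SharePolicy(allowed_users={"alice", "bob"}, max_remote_jobs=2,
                         user_map={"alice": "a_smith"})
    jobs = [Job("bob", "linux", 1), Job("alice", "linux", 5),
            Job("eve", "linux", 9), Job("bob", "linux", 3)]
    for job, account, host in schedule(jobs, hosts, policy):
        print(job.user, account, host)
    # alice a_smith hostB   (eve is refused; the last bob job exceeds max_remote_jobs)
    # bob bob hostA
```

Keeping the acceptance check inside the receiving cluster mirrors the autonomy described above: no cluster can push work onto another beyond what the other's own configuration allows.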
TurboCluster is an enterprise-level cluster solution for building highly available, scalable network services on multiple computers. It supports Intel and Alpha processors and the Linux, Windows NT, and Solaris operating systems. TurboCluster can significantly improve the quality of service of TCP/IP-based network services, including Web, mail, news, and FTP. It offers good availability, scalability, and manageability, and the number of servers in the cluster has no fixed upper limit. TurboCluster is a software-based cluster solution that supports heterogeneous network environments. Its structure is shown in Figure 1.
When a client sends a request to the cluster, the request first reaches the advanced traffic manager, which forwards it to one of the real servers in the cluster according to a scheduling policy. The server's response is sent directly back to the client rather than through the advanced traffic manager, which greatly reduces the manager's load and makes it less likely to become a bottleneck. TurboCluster's scheduling policies include Round Robin, Weighted Round Robin, and Least Connection. To reduce the risk of the advanced traffic manager failing, TurboCluster provides a backup machine for it. The backup machine periodically queries the primary manager to confirm that it is working normally, and if it finds that the primary has failed, the backup takes over.
Figure 1: TurboCluster structure
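The three scheduling policies named above can be illustrated with small Python classes. These are generic versions of each policy, not TurboCluster's code: the class names are hypothetical, and the weighted variant uses the "smooth" weighted round-robin scheme (every server accumulates its weight on each pick, the current leader is chosen and pays back the total weight), which is one common way to implement Weighted Round Robin; the text does not say which scheme TurboCluster uses.

```python
import itertools


class RoundRobin:
    """Hand requests to servers in a fixed cyclic order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class WeightedRoundRobin:
    """Smooth weighted round robin: over every sum(weights) picks,
    a server with weight w is chosen exactly w times, interleaved."""

    def __init__(self, weights):
        self.weights = dict(weights)                 # server -> positive integer weight
        self.current = {s: 0 for s in self.weights}  # running score per server
        self.total = sum(self.weights.values())

    def pick(self):
        # Every server gains its weight; the highest score wins and pays back the total.
        for server, weight in self.weights.items():
            self.current[server] += weight
        best = max(self.current, key=self.current.get)
        self.current[best] -= self.total
        return best


class LeastConnection:
    """Send each request to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        best = min(self.active, key=self.active.get)
        self.active[best] += 1   # a connection is opened on the chosen server
        return best

    def release(self, server):
        self.active[server] -= 1  # the connection has closed


if __name__ == "__main__":
    wrr = WeightedRoundRobin({"A": 5, "B": 1, "C": 1})
    print([wrr.pick() for _ in range(7)])  # ['A', 'A', 'B', 'A', 'C', 'A', 'A']
```

The smooth scheme spreads the lighter servers' turns among the heavier server's picks instead of sending five consecutive requests to A. Least Connection is the only one of the three that needs feedback from the servers (the `release` call when a connection closes), which is what lets it adapt to requests of uneven duration.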
TurboCluster also provides the following enhancements:
- A heartbeat running on the backup machine monitors the status of the primary traffic manager, providing higher availability.
- Automatically handles unpredictable system and application failures.
- Provides dynamic load balancing to eliminate bottlenecks and handle peak loads.
- The advanced traffic manager acts as a virtual firewall in front of the real application servers, forwarding only authorized requests and thereby improving network security.
- Only the traffic manager's IP address is exposed; the IP addresses of the real servers in the cluster are invisible to the outside world.
- Real servers can be taken offline for maintenance while the cluster continues to provide external services.
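The heartbeat failover described above can be sketched as a small state machine running on the backup machine. This is a minimal sketch under assumptions the text does not state: `HeartbeatMonitor` and `max_missed` are hypothetical names, the backup takes over after a fixed number of consecutive unanswered probes, and the actual takeover step (for example, claiming the traffic manager's public IP address) is left as a comment because the text does not say how TurboCluster performs it.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Runs on the backup traffic manager and decides when to take over.

    max_missed is a hypothetical tuning knob: how many consecutive failed
    probes of the primary are tolerated before failover.
    """
    max_missed: int = 3
    missed: int = 0
    active: bool = False   # True once the backup has taken over the service

    def on_probe(self, primary_replied: bool) -> bool:
        """Record one heartbeat probe result; return True if the backup is now active."""
        if self.active:
            return True             # already serving; failback is out of scope here
        if primary_replied:
            self.missed = 0         # any reply resets the failure counter
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                # Assumed takeover point: a real system would start accepting
                # client traffic here, e.g. on the traffic manager's public IP.
                self.active = True
        return self.active


if __name__ == "__main__":
    monitor = HeartbeatMonitor(max_missed=3)
    probes = [True, False, False, True, False, False, False]
    print([monitor.on_probe(p) for p in probes])
    # [False, False, False, False, False, False, True]
```

Requiring several consecutive misses rather than a single one avoids a needless failover when one probe is lost to transient network congestion; the reply on the fourth probe resets the counter, so only the final run of three misses triggers the takeover.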