openMosix Learning Notes: Differences Between openMosix and PBS Systems

Source: Internet
Author: User
openMosix (www.openmosix.org) is an open-source project derived from MOSIX; it is an open-source implementation of MOSIX. HPC clusters are generally Beowulf clusters, whereas MOSIX clusters work in a completely different way. openMosix is a kernel patch that balances load at the kernel level. For example, suppose a cluster has 10 nodes and we want to encode 10 MP3 files: we only need to start the jobs on any one node, and openMosix will automatically distribute them across the 10 nodes, with process migration taking only seconds. Because openMosix works at the kernel level, upper-layer applications need no changes. As long as the tasks are openMosix-friendly, openMosix schedules them for us automatically, moving them to machines with low CPU load. openMosix also has an auto-discovery feature: when a new node is added, openMosix detects it automatically and starts scheduling tasks onto it.
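The placement effect described above (each new job ends up on the node with the lowest load) can be modelled with a min-heap. This is a toy Python sketch, not openMosix code: the `place_jobs` function, the node names and the integer "load" counter are hypothetical. Real openMosix starts processes locally and migrates them in the kernel, using load information the nodes exchange with each other.

```python
import heapq


def place_jobs(jobs, node_loads):
    """Toy model: assign each job to the currently least-loaded node.

    node_loads maps node name -> current load (a hypothetical integer count
    of runnable jobs). Each placed job adds 1 to its node's load.
    Returns a dict mapping job -> node. The input dict is not modified.
    """
    if not node_loads:
        raise ValueError("need at least one node")
    # Min-heap of (load, node): the least-loaded node is always on top.
    heap = [(load, node) for node, load in node_loads.items()]
    heapq.heapify(heap)
    placement = {}
    for job in jobs:
        load, node = heapq.heappop(heap)
        placement[job] = node
        heapq.heappush(heap, (load + 1, node))
    return placement


if __name__ == "__main__":
    nodes = {f"node{i}": 0 for i in range(10)}           # 10 idle nodes
    mp3_jobs = [f"track{i:02d}.wav" for i in range(10)]  # hypothetical inputs
    for job, node in place_jobs(mp3_jobs, nodes).items():
        print(job, "->", node)
```

With 10 idle nodes and 10 jobs, every job lands on a different node, which is the outcome the MP3 example describes.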

Installing openMosix is very simple. Currently, openMosix supports only the 2.4 kernel, not 2.6. Given a 2.4 kernel source tree, we apply the openMosix patch with `patch`, rebuild the kernel, and boot the system with the new kernel. This openMosix kernel must be installed on every machine in the cluster.

Note that openMosix cannot turn a serial application into a parallel one! It only makes the cluster behave logically like a single SMP machine and then distributes tasks transparently. The official openMosix FAQ is a classic: it explains what openMosix can and cannot do, and how to install it (that is, how to compile a new kernel), including very detailed and practical notes on GCC versions:

Appendix 1

Next, let's compare openMosix with PBS. One interesting aside first: the openMosix website lists a checkpointing project, CHPO. It implements checkpointing very well, but it is designed for MOSIX clusters, so it cannot be used on ordinary HPC clusters.

This raised a question for me: what is the difference between openMosix and PBS? Users submit tasks to PBS, and PBS schedules them across the cluster, which looks much like what openMosix does. So I sent an email to the openMosix mailing list. Summarizing the replies from list members, there are several points:

1. With PBS, you must define the task logic yourself and submit tasks with PBS commands such as qsub. openMosix is completely transparent. PBS is of little help for applications whose task characteristics we cannot modify or customize.

2. Interactive programs are difficult to handle under PBS; openMosix has no such problem.

3. Once PBS has dispatched a task to a node, it leaves the task alone: even if the CPU load on that node rises, the task keeps running there. openMosix is different: if it finds that a node's CPU load is too high while another node in the cluster has idle CPU, it migrates the task again. In other words, openMosix keeps "scheduling" tasks for as long as they run.
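Point 3 can be illustrated with a toy Python model. Under PBS the job-to-node mapping is fixed at dispatch time; a migrating balancer keeps moving jobs from the busiest node to the idlest one. The names `node_loads` and `migrate_pass`, the nodes `n0` to `n3` and the integer load units are all hypothetical, and the balancing rule is a simplification for illustration, not openMosix's actual migration algorithm.

```python
def node_loads(assignment, background):
    """Total load per node: background (non-cluster) load plus one unit
    for every cluster job currently assigned to that node."""
    loads = dict(background)
    for node in assignment.values():
        loads[node] += 1
    return loads


def migrate_pass(assignment, background, threshold=1):
    """Toy dynamic balancer (not openMosix's real algorithm).

    While the busiest node's load exceeds the idlest node's by more than
    `threshold`, move one cluster job from the busiest to the idlest node.
    Returns a new job -> node mapping; the input is not modified.
    """
    if threshold < 1:
        raise ValueError("threshold must be >= 1 or jobs may ping-pong")
    assignment = dict(assignment)
    while True:
        loads = node_loads(assignment, background)
        busiest = max(loads, key=loads.get)
        idlest = min(loads, key=loads.get)
        if loads[busiest] - loads[idlest] <= threshold:
            return assignment
        movable = [job for job, node in assignment.items() if node == busiest]
        if not movable:
            # The busiest node is loaded only by non-cluster work; nothing to move.
            return assignment
        assignment[movable[0]] = idlest


if __name__ == "__main__":
    jobs = {"job0": "n0", "job1": "n1", "job2": "n2", "job3": "n3"}
    # After dispatch, someone starts heavy non-cluster work on n0.
    background = {"n0": 3, "n1": 0, "n2": 0, "n3": 0}
    print("PBS-style (static):", jobs)                     # job0 stays on n0
    print("migrating:", migrate_pass(jobs, background))    # job0 moves off n0
```

The `threshold >= 1` guard matters: with integer loads, moving a job across a gap of exactly 1 just swaps which node is busier, so a zero threshold could loop forever.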

Here are some of the replies from the list for your reference:

Not sure exactly what PBS is, but from what you describe it would
run on top of an openMosix cluster. openMosix creates what is called a
Single System Image. This way your jobs do not have to know anything
about a cluster; they do not have to be submitted to a scheduler, you
just run and forget. The cluster will automatically shift the load
around to get the best CPU usage per job. A scheduler, on the other hand,
requires either the jobs to understand how to do parallel work or
the submitter to pre-split the job to make use of the cluster. That is
a bit of a simple explanation, but I think it should give you a good
idea of the difference.
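The "pre-split" route mentioned in this reply can be sketched in Python: the submitter divides the work list into one chunk per node, and each chunk becomes a separate batch job. The function name, the input file names and the job-script names are hypothetical; the sketch only prints the `qsub <script>` lines it would run and does not call PBS.

```python
def split_round_robin(items, n_chunks):
    """Split work items into n_chunks lists, dealing them out round-robin
    so that chunk sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    for i, item in enumerate(items):
        chunks[i % n_chunks].append(item)
    return chunks


if __name__ == "__main__":
    wav_files = [f"track{i:02d}.wav" for i in range(10)]  # hypothetical inputs
    for k, chunk in enumerate(split_round_robin(wav_files, 3)):
        # Each chunk would be written into its own job script and submitted
        # separately; the script names here are hypothetical.
        print(f"qsub encode_chunk_{k}.sh  # {len(chunk)} files: {chunk}")
```

Under openMosix this bookkeeping disappears: you would simply start the encoder processes and let the kernel spread them.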

========================================================== ======================================

While having similar aims, the way batch systems like Grid Engine
achieve them is quite different from the openMosix approach. Roughly
speaking, batch systems start jobs on free nodes by "ssh-ing" to a
given node (or do something a bit more clever but still somehow
equivalent). This is also the reason why they have to bring their own job
management tools (e.g. qstat, qsub etc.) -- jobs just have to be
processes on your local node. This also limits the kind of jobs
you can use with batch systems, as they're usually unable to execute
interactive or X applications.

Contrary to that, openMosix engages at a much lower level. It is a
kernel patch that allows processes running locally on a node to be
migrated to another node transparently during runtime. The last
part is quite important, as it has some interesting consequences:

* If there is a load imbalance among the cluster nodes, it can be
equalized much more smoothly by simply migrating some jobs to idle
machines within seconds. Batch systems can only equalize load by
starting new jobs, which isn't as elegant and, more importantly, will
fail if the queue is empty.

* You don't have to use special job management tools, as openMosix
can migrate nearly every process on your node (OK, there are some
limitations: no multithreading, no shared memory, but for Om's use
case this is a weak limitation).

* Om also works with interactive and X applications. For instance,
if you have a graphical fractal generator which is creating a high
load on your login machine, Om could easily migrate it to an idle
machine without you noticing it.

So, Om is, despite some limitations, way more elegant. Give it a try.
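The limitation in the second point above can be expressed as a simple filter. In this hypothetical Python sketch, a migration pass would consider only single-threaded processes that do not use shared memory, following the rule stated in the reply; the `Proc` fields and function names are invented for illustration and do not mirror any real openMosix kernel structure.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    """Hypothetical process descriptor used only for this illustration."""
    pid: int
    threads: int = 1
    uses_shared_memory: bool = False


def migratable(proc: Proc) -> bool:
    """Rule quoted in the reply: no multithreading, no shared memory."""
    return proc.threads == 1 and not proc.uses_shared_memory


def migration_candidates(procs):
    """Return the PIDs a migration pass would be allowed to move."""
    return [p.pid for p in procs if migratable(p)]


if __name__ == "__main__":
    procs = [
        Proc(101),                           # single-threaded: can move
        Proc(102, threads=4),                # multithreaded: stays put
        Proc(103, uses_shared_memory=True),  # shared memory: stays put
        Proc(104),                           # single-threaded: can move
    ]
    print(migration_candidates(procs))  # [101, 104]
```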

========================================================== ======================================

By using the openMosix kernel alongside other clustering apps, a more
generalised Beowulf-style cluster can be built to cater for all types
of use.

I have used PBS and found it tricky to set up jobs to run quickly across
nodes, but that would not mean that you cannot use PBS alongside openMosix.

If a job you schedule for a particular node is openMosix friendly, then
openMosix could cause that particular job to migrate on to a faster free
node, and if your particular job spawns subprocesses that are openMosix
friendly, then each one of those processes could in fact migrate in order
to get 100% CPU usage from all the openMosix nodes in your network.

I.e.
PBS spawns 10 openMosix-friendly processes for 1 node on the network;
openMosix would migrate each of those processes to a different node.
If one node is then used for something else, openMosix could migrate
the process again to find the maximum CPU use for that process.
Without openMosix, PBS would only allow you to set the same 10 processes
to run across 10 nodes and stay where they run.

Quote from Andreas sch? Er:
So, Om is, despite some limitations, way more elegant. Give it a try.

*Yes, and use it alongside PBS and other clustering apps.*

I also use dsh from within cron...
My cron scheduler runs on a designated master node; jobs are set using
crontab -e from any node, and dsh is used to run them.

E.g.:

0 * * * * dsh -c -m 192.168.1.20 -m 192.168.1.21 /home/mydir/myscript.sh

would cause the script 'myscript.sh' to run hourly on nodes 192.168.1.20 and
192.168.1.21 concurrently.
(Please note that /home and /var/spool/cron/ are available via NFS.)
If myscript.sh contains openMosix-friendly processes, then those too will migrate
across the network to other nodes.

 
