CRAK: Checkpoint/Restart Technology on Linux

Source: Internet
Author: User

Anyone who has used a virtual machine knows that it can save the state of a running system. The technique behind this is checkpointing, and the saved state can later be restarted. That works on an entire system, though. It is also possible to checkpoint a single process, a direction that has attracted a lot of research, and CRAK is one of the nicer results.


Checkpoint/restart application scenarios:

1. Process migration: in distributed load balancing, it is often necessary to move a process from one host to another

2. Rollback: after an error, execution can be rolled back to an earlier checkpoint

3. Shutdown: checkpoint before the system shuts down, then resume program execution when the system restarts


Unfortunately, many mainstream commercial and popular operating systems, such as Unix and Linux, pay little attention to fault tolerance or distributed computing and have no checkpoint/restart mechanism, which makes adding one very difficult. Modifying the kernel source code is hard, but someone came up with a clever alternative: introduce checkpoint/restart as a loadable kernel module. CRAK puts this idea into practice and can be loaded as a module into Linux kernel 2.6.25 (it does not work with 2.6.24, 2.6.27 or 2.6.32).


First, the design goals of CRAK:

1. Do not rewrite the OS: work with a stock OS by loading a kernel module

2. Support legacy applications

3. Low overhead, so do not rely on a stub process or a home node


CRAK workflow (using process migration as the example):

1. Load the module into the kernel

2. Make a request: checkpoint a process

3. The kernel stops the process after receiving the request, then starts checkpointing it

4. The kernel saves the process state to a file and kills the process (because it is being migrated)

5. Create a new process on the destination host and restore the saved state into it


The user interface provided by CRAK:

1. Checkpoint: the user specifies which process to checkpoint and where the checkpoint image is written (to disk or sent over the network). In the workflow above the process is killed after the checkpoint, but the user can also choose not to kill it

2. Restart: to restore the process, the user regenerates it from that image


With this interface, we can write our own program that monitors the processes we want to checkpoint and checkpoints them automatically at regular intervals.
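
Such a monitor might look roughly like the sketch below. It is only an illustration: `checkpoint_fn`, `checkpoint_periodically` and `count_only` are hypothetical names, and the callback stands in for whatever actually issues the CRAK checkpoint request.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback type: performs one checkpoint of `pid` and
 * returns 0 on success. A real one would wrap the CRAK request. */
typedef int (*checkpoint_fn)(int pid, void *ctx);

/*
 * Checkpoint `pid` up to `rounds` times, sleeping `interval_sec` seconds
 * between rounds. Returns the number of successful checkpoints and stops
 * at the first failure.
 */
int checkpoint_periodically(int pid, unsigned interval_sec, int rounds,
                            checkpoint_fn do_ckpt, void *ctx)
{
    int done = 0;

    assert(do_ckpt != NULL);
    for (int i = 0; i < rounds; i++) {
        if (do_ckpt(pid, ctx) != 0)
            break;                  /* give up on the first failed checkpoint */
        done++;
        if (i + 1 < rounds)
            sleep(interval_sec);    /* wait before the next checkpoint */
    }
    return done;
}

/* Stand-in callback for illustration: it only counts its invocations. */
int count_only(int pid, void *ctx)
{
    (void)pid;
    ++*(int *)ctx;
    return 0;
}
```

Passing `count_only` with a pointer to an `int` counter exercises the loop without any kernel module; a real callback would issue the checkpoint request instead.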


Checkpoint:

When a user asks to checkpoint a process, the following process state has to be saved:

    • Address space
    • Register set
    • Open files/pipes/sockets
    • System V IPC structures
    • Current working directory
    • Signal handlers
    • Timers
    • Terminal settings
    • User identities (UID, GID, etc.)
    • Process identities (PID, PGRP, SID, etc.)
    • Resource limits (rlimit)
    • Any other data that needs to be saved

The first two are essential. The address space consists of several sections; a section is a memory block with start and end addresses and access flags (read, write, execute, private and shared). Checkpoint iterates over all sections of the process, saving the position, access flags and contents of each section into the image. A simple process can take up a few megabytes of memory, and a process like Apache can take tens or even hundreds of megabytes. CRAK is smart about this: many sections are code sections, which are read-only, so they do not have to be saved and can be reloaded directly from the program's binary file. Shared libraries do not have to be saved either. Only the sections that may be modified are saved.
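
The selection rule can be sketched in user space as follows. This is a simplified model, not CRAK's kernel code: `struct section`, `must_save` and `image_bytes` are hypothetical names, and the two `skip_*` switches stand for the options that leave code sections and shared libraries out of the image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical description of one memory section. */
struct section {
    unsigned long start, end;   /* address range [start, end) */
    bool writable;              /* section may be modified */
    bool shared;                /* mapped shared rather than private */
    bool file_backed;           /* contents come from a file on disk */
    bool from_binary;           /* true: program binary; false: anything
                                   else, e.g. a shared library */
};

/* Decide whether a section's contents must be written to the image. */
bool must_save(const struct section *s, bool skip_binary, bool skip_shlibs)
{
    assert(s != NULL);
    bool skippable_kind = (skip_binary && s->from_binary) ||
                          (skip_shlibs && !s->from_binary);
    /* Only file-backed sections that are read-only or shared can be
     * re-read from disk at restart time. */
    bool recoverable = s->file_backed && (!s->writable || s->shared);
    return !(skippable_kind && recoverable);
}

/* Total number of bytes that would go into the image. */
unsigned long image_bytes(const struct section *secs, size_t n,
                          bool skip_binary, bool skip_shlibs)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        if (must_save(&secs[i], skip_binary, skip_shlibs))
            total += secs[i].end - secs[i].start;
    return total;
}
```

In this model a private, writable section such as the heap is always saved, while read-only code from the binary or a library is dropped once the matching switch is on.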


If the process keeps running while it is being checkpointed, the checkpointed state and the final state of the process will be inconsistent, so the process is stopped before the checkpoint:

    /* We don't checkpoint a currently running process. Stop it first. */
    if (p != current) {
        send_sig(SIGSTOP, p, 0);
        stop = 1;
    }
    if ((ret = do_checkpoint(fd, p, flags)) != 0)
        return ret;
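
The same stop-first idea can be tried from user space with plain POSIX calls. The sketch below is only an analogue of the kernel code above, not part of CRAK; `stop_child` and `spawn_idle_child` are hypothetical helper names, and `waitpid()` only works here because the target is our own child.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send SIGSTOP to child `pid` and wait until it has really stopped.
 * Returns 0 on success, -1 on failure. */
int stop_child(pid_t pid)
{
    int status;

    assert(pid > 0);
    if (kill(pid, SIGSTOP) != 0)
        return -1;
    /* WUNTRACED makes waitpid() report stopped children as well. */
    if (waitpid(pid, &status, WUNTRACED) != pid)
        return -1;
    return WIFSTOPPED(status) ? 0 : -1;
}

/* Fork a child that only sleeps: a stand-in for the process that is
 * about to be checkpointed. Returns the child's PID to the parent. */
pid_t spawn_idle_child(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        for (;;)
            pause();
    }
    return pid;
}
```

Once `stop_child()` returns 0 the child is no longer scheduled, so its memory and registers stay put until it receives SIGCONT or is killed.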

In fact, a stopped process can still change: it can still receive signals, for example. CRAK has a sensible answer to this problem, but it is not the focus of this article, so it is not discussed here.


The smartest thing about CRAK is that it works as a dynamically loaded module: as soon as the module is loaded, it becomes part of kernel space and runs in privileged mode. This is hard to pull off, because a module can only work with the kernel and cannot change it arbitrarily.

Restart:

Restart works much like execve:

    • Create a new process
    • Restore address space from image
    • Restore register set
    • Re-open files, etc.


A thousand words are worth less than a hundred lines of code. I think we computer people should believe in one rule:

Don't talk, show me your code.

The idea is good, but implementing it takes painstaking effort. Let's look at how CRAK is implemented.


Implementation:

In a nutshell: two actions, one file, three questions.

Two actions: checkpoint and restart

One file: the image file that holds the process state

Three questions: Which state is saved? How is it saved? How is it restored?


After it is loaded, CRAK registers itself as a device file, /dev/ckpt. This lets user programs interact with the kernel module through standard file operations (open, close, write, read, ioctl). CRAK exposes checkpoint and restart through the ioctl interface.
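
Talking to such a device from user space follows the usual open/ioctl/close pattern. The sketch below shows only that pattern: `struct ckpt_args`, the request number `CKPT_IOCTL_CHECKPOINT` and the helper `ckpt_device_request` are all assumptions made for illustration, and the real argument layout and request codes have to be taken from the CRAK sources.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical argument block; the real layout is defined by CRAK. */
struct ckpt_args {
    int fd;      /* image file descriptor */
    int pid;     /* process to checkpoint */
    int flags;   /* checkpoint options */
};

/* Hypothetical request number, chosen only for this sketch. */
#define CKPT_IOCTL_CHECKPOINT 1UL

/* Open the control device, issue one ioctl, close the device again.
 * Returns -1 on any failure, otherwise the ioctl's return value. */
int ckpt_device_request(const char *dev_path, unsigned long request,
                        void *arg)
{
    assert(dev_path != NULL);
    int dev = open(dev_path, O_RDONLY);
    if (dev < 0)
        return -1;              /* module not loaded, or no permission */
    int ret = ioctl(dev, request, arg);
    close(dev);
    return ret < 0 ? -1 : ret;
}
```

If the module is not loaded the device node does not exist, `open()` fails, and the helper simply returns -1.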


int checkpoint (int fd, int pid, int flags);

fd is the checkpoint image file in which the process state is saved, pid is the process to be checkpointed, and flags can take three values:

    • CKPT_KILL: kill the process immediately after it has been checkpointed
    • CKPT_NO_BINARY_FILE: do not save code sections. This is the reduction in saved memory described above: the code sections are not put into the image
    • CKPT_NO_SHARED_LIBRARIES: do not save shared libraries


int restart (const char * filename, int pid, int flags);

filename is the image file to load, from which the process is resumed.

pid: if flags includes RESTART_NOTIFY, the kernel sends a SIGUSR1 signal to process pid when the restart is finished

flags:

    • RESTART_NOTIFY: as above, the kernel notifies process pid when the restart is complete
    • RESTART_STOP: when the restart is complete, immediately stop the restarted process


In addition, there are several important functions:


get_kernel_address()

We all know that memory addresses come in two kinds, virtual and physical. For a process, a virtual address and the physical address behind it are different, but for the kernel's own directly mapped memory a virtual address corresponds to a physical address in a fixed way. This function takes two parameters, a process p and the virtual address addr in p's address space that is to be accessed, and uses p's page directory and page tables to compute the physical address.
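
The page directory and page table lookup can be illustrated with a small user-space simulation. It assumes the classic two-level layout of 32-bit x86 without PAE (10 bits of directory index, 10 bits of table index, 12 bits of page offset); the structures and the `translate` function are hypothetical stand-ins, not the kernel's own types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SHIFT 12
#define SIM_PAGE_SIZE  (1u << SIM_PAGE_SHIFT)
#define SIM_ENTRIES    1024u    /* 10 index bits per level */
#define SIM_PRESENT    0x1u     /* low bit marks a valid entry */

/* Simulated page table: each entry holds a frame address plus flags. */
struct page_table { uint32_t entry[SIM_ENTRIES]; };
/* Simulated page directory: each slot points to a table or is NULL. */
struct page_dir   { struct page_table *table[SIM_ENTRIES]; };

/* Translate `vaddr`. Returns 0 and fills *paddr on success,
 * -1 if the address is not mapped. */
int translate(const struct page_dir *dir, uint32_t vaddr, uint32_t *paddr)
{
    assert(dir != NULL && paddr != NULL);
    uint32_t dir_idx = vaddr >> 22;                                 /* top 10 bits */
    uint32_t tbl_idx = (vaddr >> SIM_PAGE_SHIFT) & (SIM_ENTRIES - 1); /* middle 10 bits */
    uint32_t offset  = vaddr & (SIM_PAGE_SIZE - 1);                 /* low 12 bits */

    const struct page_table *pt = dir->table[dir_idx];
    if (pt == NULL)
        return -1;
    uint32_t pte = pt->entry[tbl_idx];
    if (!(pte & SIM_PRESENT))
        return -1;
    /* Frame base from the entry, byte offset from the virtual address. */
    *paddr = (pte & ~(SIM_PAGE_SIZE - 1)) + offset;
    return 0;
}
```

The address is split into three fields: the first two select the page frame, and the last one is carried over unchanged as the offset inside that frame.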


Next, the register set. Saving register state is nothing new: on a context switch, the process being switched out first has its register state saved. The same principle is used here, and the implementation question is where that state is stored. A process has its own kernel stack that occupies an 8 KB (2 x PAGE_SIZE) frame; the task_struct of the process lives in this frame, and the register set sits at the top of the stack. For example, given the task_struct *p of a process, the position of the register set is:

    struct pt_regs *regs = ((struct pt_regs *) (2*PAGE_SIZE + (unsigned long) p)) - 1;
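
The pointer arithmetic is easier to see in a user-space model. In the sketch below every type is a hypothetical stand-in (the real task_struct and pt_regs are much larger); it only shows that adding 2 x PAGE_SIZE to p reaches the end of the frame, and stepping back one element lands on the last register-set-sized slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SIZE 4096UL

/* Hypothetical stand-ins for the kernel structures. */
struct sim_task { long pid; };
struct sim_regs { unsigned long ip, sp, flags; };

/* The 8 KB frame: task structure at the bottom, stack area on top. */
union sim_task_frame {
    struct sim_task task;
    unsigned char   stack[2 * SIM_PAGE_SIZE];
};

/* Same arithmetic as in the article: go to the end of the frame, then
 * step back by exactly one register set. */
struct sim_regs *sim_task_regs(struct sim_task *p)
{
    assert(p != NULL);
    return (struct sim_regs *)((uintptr_t)p + 2 * SIM_PAGE_SIZE) - 1;
}
```

The `- 1` is applied after the cast, so it subtracts `sizeof(struct sim_regs)` bytes rather than a single byte.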



Unfortunately, CRAK was developed a long time ago. In 2001, the experimental environment was:

    • Gateway PCs with Intel Celeron 433MHz CPUs and $ MB RAM running Redhat 6.1 with Linux kernel 2.2.14.
    • All client machines were on a 100MB Fast Ethernet network.
    • The tests were done over NFS v3. The NFS server was a dual-processor sun4u Sparc running Solaris 7 with a. MB RAM.
    • The file system involved in the test was on a Seagate 4.2GB SCSI disk (there were several other disks hosting other filesystems on the server).
    • The Linux client ran with NFS v3 UDP support.

The author himself says that the tests can produce unexpected results such as segfaults, so CRAK can only be regarded as a prototype, and improving it is up to us.

Finally, let's talk about its overhead. The main cost is saving the checkpoint image, so the overhead is relatively low. CRAK can also choose which memory sections to store and keep only the necessary ones, which reduces the overhead further. In the author's tests, the runs optimized with CKPT_NO_BINARY_FILE and CKPT_NO_SHARED_LIBRARIES (code sections and shared libraries left out of the image) save 80% to 90% of time and space.




If you want to study it, the best setup is VirtualBox + Ubuntu 8.10 with a recompiled 2.6.25 kernel. You may well find that it misbehaves. That is quite likely: it is a prototype after all, and the author himself reports wrong results, so the work is still up to us. If it runs abnormally, the error may be in the memory dump:

    for (i = 0, vm = p->mm->mmap; vm != NULL; i++, vm = vm->vm_next) {
        unsigned char valid_mem;

        /* Dump the memory segment */
        valid_mem = valid_memory_segment(regs, p->mm, vm);

        /* Dump pages and shared libs if we are allowed to */
        if (!(((no_binary && valid_mem) || (no_shrlib && !valid_mem)) &&
              (!(vm->vm_flags & VM_WRITE) || (vm->vm_flags & VM_MAYSHARE)) &&
              vm->vm_file)) {
            if (dump_vm_area(f, p, vm)) {
                ret = -EAGAIN;
                goto out;
            }
        }
    }

You can comment out this section first and then track down the root cause. Good luck!




Here are its source code and documentation, as well as research on Zap, which builds on it.


CRAK source:

http://www.cs.fsu.edu/~baker/devices/projects/ale/crak-2.6.25.6.tar.gz  


CRAK documentation:

http://www.cs.columbia.edu/techreports/cucs-014-01.pdf

http://www.cs.fsu.edu/~baker/devices/projects/ale/


Zap Papers:

http://systems.cs.columbia.edu/projects/zap/

http://dl.acm.org/citation.cfm?id=844162


For a checkpoint/restart application scenario, you can read this paper on Rex (Microsoft):

http://research.microsoft.com/pubs/216938/ppaxos.pdf


Information about kernel-based checkpoint and restart:

http://lwn.net/Articles/293575/






