This series of articles describes the relationship between cloud and backup, including:
(1) VMware Virtual machine backup and recovery
(2) KVM virtual machine backup and recovery
(3) Cloud and backup
(4) OpenStack and backup
(5) Public Cloud and backup
1. VMware basics related to backup
1.1 VMware virtual machine disk files on the ESXi host
Simply put, each virtual disk of a VM consists of three files on the ESXi host. For a virtual machine named sammy-target-win-small, the three files of its first disk are:
- sammy-target-win-small.vmdk (config file, size 633 bytes)
- sammy-target-win-small-flat.vmdk (binary file, size 12884901888 bytes)
- sammy-target-win-small-ctk.vmdk (binary file, size 78694 bytes)
Of these:
- The first file is the descriptor. It holds the metadata for the disk, including references to the other two files:
    # Extent description
    RW 25165824 VMFS "sammy-target-win-small-flat.vmdk"
    # Change Tracking File
    changeTrackPath="sammy-target-win-small-ctk.vmdk"
- The second file is the extent that the descriptor refers to; the binary data of the disk is saved in this file. Reading the data in this file through the API is covered later.
- The third file is the CTK (change tracking) file, which is discussed later.
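As a rough illustration of how the descriptor ties the three files together, the sketch below parses the two descriptor lines shown above. It assumes that extent sizes are expressed in 512-byte sectors and that the change tracking entry is spelled changeTrackPath; the function name parse_descriptor is made up for this example and is not part of any VMware API.

```python
import re

SECTOR_SIZE = 512  # assumption: extent sizes in the descriptor are in 512-byte sectors

def parse_descriptor(text):
    """Return (extents, ctk_path) found in the text of a VMDK descriptor."""
    extents = []
    ctk_path = None
    for line in text.splitlines():
        line = line.strip()
        # Extent line: <access> <size in sectors> <type> "<file name>"
        m = re.match(r'^(RW|RDONLY|NOACCESS)\s+(\d+)\s+(\S+)\s+"([^"]+)"', line)
        if m:
            access, sectors, kind, name = m.groups()
            extents.append({
                "access": access,
                "type": kind,
                "file": name.strip(),
                "bytes": int(sectors) * SECTOR_SIZE,
            })
            continue
        # Change tracking entry: changeTrackPath="<ctk file name>"
        m = re.match(r'^changeTrackPath\s*=\s*"([^"]+)"', line)
        if m:
            ctk_path = m.group(1).strip()
    return extents, ctk_path

descriptor = '''
# Extent description
RW 25165824 VMFS "sammy-target-win-small-flat.vmdk"

# Change Tracking File
changeTrackPath="sammy-target-win-small-ctk.vmdk"
'''

extents, ctk = parse_descriptor(descriptor)
print(extents[0]["file"], extents[0]["bytes"])
print(ctk)
```

The extent size of 25165824 sectors multiplied by 512 gives 12884901888 bytes, which is exactly the size of the flat file listed above.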
1.2 Snapshots
A snapshot of a virtual machine captures the state and data of the virtual machine at a point in time. State means the state of the virtual machine itself, including its running state and configuration; data means the virtual disk data of the VM. The basic snapshot operations are:
- Create a snapshot (create)
- Delete a snapshot (delete)
- Consolidate snapshots (consolidate)
- Revert to a snapshot (revert)
1.2.1 Creating a Snapshot
After a snapshot of the virtual machine above is created, three new files appear for the disk, in addition to the snapshot definition file:
-rw------- 1 root root 786944 Jul 10:55 sammy-target-win-small-000001-ctk.vmdk
-rw------- 1 root root 28672 Jul 10:55 sammy-target-win-small-000001-delta.vmdk
-rw------- 1 root root 428 Jul 10:55 sammy-target-win-small-000001.vmdk
The first is again a CTK file, the second is the delta file, and the third is a text (non-binary) descriptor file. After a second snapshot is created, the disk chain looks like this:
(RW = read-write, RO = read-only)
From the data point of view:
(The green part is the data as seen from the virtual machine; the bottom red box is the data in the base VMDK; the middle red box is the data in the delta VMDK.)
The features of VMware snapshots can now be summarized briefly:
- A snapshot saves the state and data of a virtual machine at a point in time.
- Taking a snapshot of a virtual machine is equivalent to making the virtual machine's current disk read-only and creating a delta VMDK file that receives all new writes. When there are multiple snapshots, the delta disk of the previous snapshot becomes read-only as well.
- Write overhead: writes follow a copy-on-write mechanism. Data is divided into blocks, and when a block needs to be modified it is first copied from the parent VMDK into the delta VMDK and then modified there.
- Read overhead: when reading a block, ESXi has to decide where to read it from. Blocks that have not been modified are read from the parent VMDK, and blocks that have been modified are read from the delta VMDK. A single read issued by the guest may therefore need data from several VMDK files.
- A delta VMDK does not grow beyond the size of the base VMDK, because the worst case is that every block has been copied into the delta VMDK and modified there.
- Because snapshots add read and write overhead, a virtual machine should not have too many of them. vSphere limits a virtual machine to a maximum of 32 snapshots, but the recommended maximum is 2-3, and a snapshot should be kept for no more than one day.
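The read and write paths summarized above can be modelled with a toy snapshot chain. This is only a simplified sketch, not how ESXi implements delta disks: the class DiskLayer and the function take_snapshot are hypothetical names, and every write replaces a whole block, so the copy step of copy-on-write is not needed here.

```python
class DiskLayer:
    """One VMDK in a snapshot chain: the base disk or a delta disk."""

    def __init__(self, parent=None):
        self.parent = parent
        self.blocks = {}        # block index -> data written to this layer
        self.read_only = False

    def read(self, index):
        # Walk from the newest layer down to the base until the block is found.
        layer = self
        while layer is not None:
            if index in layer.blocks:
                return layer.blocks[index]
            layer = layer.parent
        return b"\x00"          # block was never written

    def write(self, index, data):
        if self.read_only:
            raise PermissionError("layer is frozen by a snapshot")
        self.blocks[index] = data


def take_snapshot(current):
    """Freeze the current disk and return a new delta that takes the writes."""
    current.read_only = True
    return DiskLayer(parent=current)


base = DiskLayer()
base.write(0, b"A")
base.write(1, b"B")

delta1 = take_snapshot(base)     # first snapshot: base becomes read-only
delta1.write(1, b"b")

delta2 = take_snapshot(delta1)   # second snapshot: delta1 becomes read-only
delta2.write(2, b"c")

print([delta2.read(i) for i in range(3)])
```

Reading block 0 has to walk through both deltas down to the base disk, which is the read overhead described above; the longer the chain, the longer the walk.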
1.2.2 Deleting a snapshot
A snapshot is internal data only: it records the state of the virtual machine at a point in the past and is not visible from outside, so deleting a snapshot does not affect the current state and data of the VM. There are three possible cases:
(1) The parent of the snapshot is the original (base) disk of the virtual machine: the data in the delta VMDK is merged into the base VMDK and then the delta VMDK is deleted (e.g., S1 in the figure).
(2) The snapshot to be deleted is on the virtual machine's data path: the data in the delta VMDK is merged into the VMDK of the parent snapshot and then the delta VMDK is deleted (e.g., S2 in the figure).
(3) The snapshot to be deleted is no longer on the virtual machine's data path: no merge is needed and the delta VMDK is deleted directly (e.g., S3 in the figure).
The features of deleting a snapshot can now be summarized briefly:
- Deleting a snapshot merges the changes made after the snapshot into the data from before the snapshot, so the VM can no longer return to the state of that snapshot.
- The delete process consists of two asynchronous operations: removing the snapshot from Snapshot Manager and merging the VMDK data. If the first step succeeds and the second fails, the leftover delta files remain on disk and need the manual consolidation described below.
- Deleting a snapshot can cause a large number of write operations, during which the performance of the virtual machine suffers.
- Deleting a snapshot can take a long time, especially for a long-lived snapshot of a large disk.
- Deleting all snapshots has been optimized since vSphere 4 Update 2: instead of merging each delta into the next layer in turn, each layer is merged directly into the base disk.
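The merge that happens when a snapshot on the data path is deleted (cases 1 and 2 above) can be sketched with plain dictionaries. This is a simplified model under stated assumptions: each layer maps a block index to its data, chain[0] is the base disk, and the helper names view and delete_snapshot are invented for this example.

```python
def view(chain):
    """Data as the VM sees it: later layers override earlier ones."""
    merged = {}
    for layer in chain:
        merged.update(layer)
    return merged


def delete_snapshot(chain, n):
    """Delete snapshot n (1-based) by folding delta n into its parent layer."""
    if not 1 <= n < len(chain):
        raise IndexError("no such snapshot")
    parent, child = chain[n - 1], chain[n]
    parent.update(child)   # changed blocks overwrite the parent's copies
    del chain[n]           # the delta file is no longer needed


# chain[0] is the base disk; chain[1] and chain[2] are the deltas
# created by the first and second snapshot.
chain = [{0: "A", 1: "B"}, {1: "b"}, {2: "c"}]
before = view(chain)

delete_snapshot(chain, 1)

print(chain)
print(view(chain) == before)
```

The view of the disk is the same before and after the deletion, but the old content of block 1 has been overwritten in the base layer, which is why the VM can no longer return to the deleted snapshot.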
1.2.3 Snapshot Merge (Consolidation)
As mentioned above, the data merge performed when a snapshot is deleted may fail. Such a failure causes a number of problems, including wasted disk space and degraded VM performance. When this happens, vCenter prompts the user to run a consolidation, which checks the VM's entire current VMDK hierarchy, merges the redundant delta files, and then deletes them.
1.2.4 Revert to a snapshot
Reverting to a snapshot is also easy to understand: the VMDK that the VM runs on is pointed back at the target snapshot's VMDK, so all changes made since the target snapshot was created are discarded.
1.3 VMware API
VMware offers a very rich set of APIs:
The APIs related to backup fall into two categories: control plane APIs, which are mainly used to manage the vSphere virtualized environment, and data plane APIs, which operate on the virtual disks of virtual machines.
1.3.1 VMware APIs and SDKs
VMware exposes its management interface to clients through web services, which can be used to manage virtual machines and other virtual infrastructure, including datacenters, datastores, networks, and so on. It also provides SDKs for Java, .NET, Python, Perl, and REST, as well as for several other languages such as Ruby. Other languages have to access the web service through the SOAP protocol; gSOAP is a commonly used toolkit for writing web service clients in C++.
For more information, please read https://www.vmware.com/support/pubs/sdk_pubs.html
1.3.2 VDDK and VADP
VDDK stands for Virtual Disk Development Kit. It helps developers create applications that access virtual machine storage, and it is based on the Virtual Disk API.
The Virtual Disk API, or VixDiskLib, is a set of functions that manipulate virtual disk files in the VMDK format. Its main features include:
- Creating, converting, expanding, defragmenting, shrinking, and renaming virtual disk files
- Creating redo logs and deleting VMDK files
- Accessing arbitrary data in a VMDK file and reading its metadata
- Connecting to remote vSphere storage using advanced transports: SAN (the server running the backup program is connected directly, over FC or iSCSI, to the storage that holds the virtual machine disks), HotAdd (the virtual disk is attached as a disk to the virtual machine that runs the backup program), and LAN (the backup program accesses the virtual disk over the LAN).
VADP stands for VMware vSphere Storage APIs - Data Protection. It uses the Virtual Disk API together with some vSphere APIs to create and manage virtual machine snapshots in support of backup.
1.4 CBT (Changed Block Tracking)
CBT is a feature that VMware introduced in vSphere 4.0 to make incremental backups possible. VADP uses this feature to give the VM backup applications built on it incremental backup capability.
A full backup saves all data blocks of the VMDK (left), whereas a CBT-based incremental backup saves only the blocks that have changed since the last backup (right). For each virtual disk of a CBT-enabled VM, ESXi creates a CTK file that holds the metadata of the changed blocks. The feature has a small performance cost on the disk, so it can be turned off when not in use, but its benefit for backup is obvious.
The function that returns the blocks changed under CBT is QueryChangedDiskAreas(snapshot, deviceKey, startOffset, changeId), where:
- snapshot is the current snapshot, which marks the end of the "change" period;
- deviceKey is the device ID of the target virtual disk;
- startOffset is the disk offset from which to start looking for changed blocks;
- changeId marks the start of the "change" period, that is, the change ID of the old snapshot.
The result looks like "(117768192, 65536), (132120576, 65536), (145096704, 43122688), (265289728, 65536), (958398464, 65536)", where each item has the form (offset, length) and represents one changed area of the disk.
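A backup program typically calls this function in a loop, advancing startOffset until the whole disk has been covered. The sketch below shows that loop without talking to vSphere: query is a hypothetical callable that wraps the real call for one snapshot, disk, and change ID, and ChangeInfo is a stand-in for the returned object, assumed to carry the covered range plus the changed areas inside it. The fake_query function, the 1 GiB capacity, and the 256 MiB window are made up so the example can run on its own with the sample result above.

```python
from typing import Callable, List, NamedTuple, Tuple

Area = Tuple[int, int]  # (offset, length) in bytes


class ChangeInfo(NamedTuple):
    """Hypothetical stand-in for one QueryChangedDiskAreas result."""
    start_offset: int          # start of the range this result covers
    length: int                # size of the range this result covers
    changed_areas: List[Area]  # changed extents inside that range


def collect_changed_areas(query: Callable[[int], ChangeInfo],
                          capacity: int) -> List[Area]:
    """Call query repeatedly until the whole disk has been covered."""
    areas: List[Area] = []
    offset = 0
    while offset < capacity:
        info = query(offset)
        if info.length <= 0:   # defensive: avoid looping forever
            break
        areas.extend(info.changed_areas)
        offset = info.start_offset + info.length
    return areas


CAPACITY = 1 << 30   # assumed 1 GiB disk
WINDOW = 256 << 20   # assumed range covered by one call
SAMPLE = [(117768192, 65536), (132120576, 65536), (145096704, 43122688),
          (265289728, 65536), (958398464, 65536)]


def fake_query(start_offset: int) -> ChangeInfo:
    """Test double that serves the sample areas one window at a time."""
    end = min(start_offset + WINDOW, CAPACITY)
    hits = [(o, n) for (o, n) in SAMPLE if start_offset <= o < end]
    return ChangeInfo(start_offset, end - start_offset, hits)


areas = collect_changed_areas(fake_query, CAPACITY)
changed_bytes = sum(length for _, length in areas)
print(len(areas), changed_bytes)
```

Summing the lengths gives the amount of data an incremental backup has to read, which for the sample result is about 41 MiB.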
1.5 Quiesced snapshots and VMware Tools
Virtual machine snapshots can be divided into three types according to their level of consistency:
- Crash-consistent snapshot: the snapshot is taken while the applications on the virtual machine are still running and I/O is still in progress. The disk ends up in the same state as if the computer had suddenly lost power.
- File-system-consistent snapshot: before the snapshot is taken, the file system of the virtual machine is temporarily frozen and the dirty data in memory is flushed to disk; after the snapshot is done, the file system is thawed. The resulting snapshot is consistent at the file system level.
- Application-consistent snapshot: before the snapshot is taken, the application is temporarily frozen and all of the application's in-memory data is flushed to disk; after the snapshot is done, the application is thawed.
The default snapshot is the first type. Obtaining either of the other two requires additional steps, and the implementation depends on the guest operating system:
- On newer Windows guests, Windows provides the Volume Shadow Copy Service (VSS). Through its requester/writer model, the applications and file systems that need to be quiesced are frozen before the snapshot is taken and thawed after it is done.
- On older Windows guests, VMware provides the SYNC driver, and on Linux guests VMware provides the vmsync kernel module, for file-system-consistent snapshots.
- For application-consistent snapshots on non-Windows guests, you need to write scripts specific to the application that are called to freeze it before the snapshot and to thaw it afterwards.
What calls the VSS service, the SYNC driver, the vmsync kernel module, and the custom scripts? VMware Tools. It is a standalone program, available in versions for different operating systems, that has to be installed in the guest. In the case of VSS, VMware Tools takes the role of VSS requester: it invokes the VSS service, and the VSS service in turn invokes the registered VSS writers to perform the appropriate operations. The following is a simple example:
The latter two types of snapshots are called quiesced snapshots. The complete process is roughly:
- The user sends a quiesced snapshot request to vCenter, and vCenter forwards it to the hostd service on the ESXi host where the virtual machine is located
- hostd on ESXi passes the request to VMware Tools inside the guest
- VMware Tools, acting as a VSS requester, notifies VSS, and VSS notifies the registered VSS writers of the file systems and applications to perform the freeze operation
- Once the freeze is complete, VMware Tools reports the result to hostd
- hostd then performs the snapshot operation
- When the operation ends, the file systems and applications are thawed along the same path as before
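The freeze, snapshot, thaw sequence above maps naturally onto a context manager, which guarantees that the thaw happens even if the snapshot step fails. The sketch below is purely illustrative: Writer is a hypothetical stand-in for a VSS writer or a custom freeze/thaw script, no real VSS or VMware Tools interface is called, and thawing in the reverse of the freeze order is a design choice of this example rather than something the steps above prescribe.

```python
from contextlib import contextmanager


class Writer:
    """Hypothetical stand-in for a VSS writer or a freeze/thaw script."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def freeze(self):
        self.log.append(f"freeze {self.name}")

    def thaw(self):
        self.log.append(f"thaw {self.name}")


@contextmanager
def quiesced(writers):
    """Freeze all writers, run the body, then thaw whatever was frozen."""
    frozen = []
    try:
        for writer in writers:
            writer.freeze()
            frozen.append(writer)
        yield
    finally:
        # Thaw in reverse order, even if freezing or the snapshot failed.
        for writer in reversed(frozen):
            writer.thaw()


log = []
writers = [Writer("database", log), Writer("filesystem", log)]

with quiesced(writers):
    log.append("snapshot")   # the point at which hostd would take the snapshot

print(log)
```

Keeping the frozen window as short as possible matters in practice, because the application cannot make progress on its writes while it is frozen.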
Back to VMware Tools. On Windows, its installation package contains many drivers that improve the user experience of the virtual machine, such as smoother mouse movement, higher resolution, and better sound. In addition to these drivers it includes VSS support, which is the bridge between VMware Tools and Windows VSS. This component must be installed in order to create quiesced snapshots.
Note: to install VMware Tools, first select Guest -> Install/Upgrade VMware Tools in VWC, then log in to the virtual machine, locate the disc that was attached in the previous step, and double-click Setup to begin the installation.
2. Basic architecture of backup software for traditional VMware environments
3. Brief VMware virtual machine image backup and recovery process
3.1 Backup process
Brief process:
- The backup program uses the vSphere API to connect to the virtual machine and back up the virtual machine's configuration information
- It creates a snapshot using the vSphere API; usually a quiesced snapshot is created to ensure application or file system consistency
- It uses the VDDK API to open a connection to the first disk of the snapshot; the transport mode of the connection is one of SAN, HotAdd, NBDSSL, or NBD
- For that disk, it calls the QueryChangedDiskAreas interface to get the list of data blocks that have changed since the disk was last backed up
- It calls the VDDK API to read the contents of the changed data blocks and writes them to the backup in the backup store
- It processes the other disks in turn
- After all disks have been processed, it deletes the snapshot and disconnects from the virtual machine
Characteristics:
- The snapshot feature saves the state and data of a VM at a point in time, and the VM resumes running normally after a short pause. At the end of the backup the snapshot is deleted, so the performance of the virtual machine is not affected afterwards.
- With the VADP API, only the data blocks that have changed on the disk between two backups are read. The first backup is, of course, a full backup.
- Only the changed blocks are written to the back-end storage, which means that the back-end storage is responsible for maintaining the relationship between the first full backup and each subsequent incremental backup. This is equivalent to moving the functionality of VMware's Snapshot Manager into the back-end storage of the backup software.
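The relationship that the back-end storage has to maintain can be illustrated with a small sketch: each incremental backup stores only (offset, data) pairs, and the disk image at the latest backup point is rebuilt by replaying the increments over the full backup in order. This is an in-memory toy under stated assumptions, not how any particular backup product stores its data; read_changed and synthesize are hypothetical helper names, and an io.BytesIO object stands in for the virtual disk that would really be read through VDDK.

```python
import io


def read_changed(disk, areas):
    """Read only the (offset, length) extents listed in areas."""
    blocks = []
    for offset, length in areas:
        disk.seek(offset)
        blocks.append((offset, disk.read(length)))
    return blocks


def synthesize(full, increments):
    """Rebuild the disk image at the latest backup point."""
    image = bytearray(full)
    for blocks in increments:          # oldest increment first
        for offset, data in blocks:
            image[offset:offset + len(data)] = data
    return bytes(image)


# Full backup of a 16-byte "disk".
disk = io.BytesIO(b"AAAABBBBCCCCDDDD")
full = disk.getvalue()

# The guest overwrites bytes 4-7; the first incremental backup reads only that area.
disk.seek(4)
disk.write(b"bbbb")
inc1 = read_changed(disk, [(4, 4)])

# The guest overwrites bytes 12-15; the second incremental backup follows.
disk.seek(12)
disk.write(b"dddd")
inc2 = read_changed(disk, [(12, 4)])

restored = synthesize(full, [inc1, inc2])
print(restored == disk.getvalue())
```

The order of the increments matters: if two increments touch the same block, the newer one has to be applied last.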
3.2 Recovery Process
Brief process:
- The backup program uses the vSphere API to connect to the virtual machine to be restored and restores the virtual machine's configuration information
- It creates a snapshot using the vSphere API; usually a quiesced snapshot is created to ensure application or file system consistency
- It uses the VDDK API to open a connection to the first disk of the snapshot
- For that disk, it calls the QueryChangedDiskAreas interface to get the list of data blocks that have changed since the disk was last backed up
- It calls the VDDK API to read the changed block data from the stored backup and writes it to the corresponding locations on the snapshot disk. After all changed blocks of the disk have been written, it closes the connection to the disk
- It processes the other disks in turn
- It reverts the virtual machine to the restored snapshot
- It deletes the snapshot and disconnects from the virtual machine
Characteristics:
- Before starting, make sure that the virtual machine is shut down
- Recovery also takes a snapshot, then uses the API to get the data blocks that have changed between this snapshot and the snapshot of the last backup, and then overwrites those changed blocks on the snapshot disk with the data from the saved backup
- After the snapshot's disk VMDK files have been restored, the virtual machine is reverted to the snapshot
- When this is finished, the snapshot is deleted
- Although only the changed data blocks are uploaded during backup, all data blocks need to be read when doing recovery.
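The core of the recovery step, overwriting only the areas that have changed since the last backup with the data from that backup, can be sketched as follows. It is a toy under stated assumptions: an io.BytesIO object stands in for the snapshot disk that would really be written through VDDK, backup_image is the disk image as of the last backup, and restore_changed is a hypothetical helper name.

```python
import io


def restore_changed(target, backup_image, areas):
    """Overwrite the changed areas of target with data from the backup.

    Returns the number of bytes written.
    """
    written = 0
    for offset, length in areas:
        target.seek(offset)
        written += target.write(backup_image[offset:offset + length])
    return written


backup_image = b"AAAABBBBCCCCDDDD"          # disk content at the last backup
target = io.BytesIO(b"AAAAxxxxCCCCyyyy")    # current content of the snapshot disk

# Areas reported as changed since the last backup.
changed = [(4, 4), (12, 4)]

written = restore_changed(target, backup_image, changed)
print(written, target.getvalue() == backup_image)
```

Only 8 of the 16 bytes are written here, but producing backup_image in the first place requires the full backup plus every increment, which is why recovery has to read all of the stored data blocks.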
Cloud and Backup (1): VMware Virtual machine backup and recovery