Enterprise customers (in fact, all customers) who want to deploy applications to Windows Azure are most concerned about the security of their data. One aspect of data protection that is sometimes overlooked is making sure that, when disk space is freed and reassigned to another customer, the new owner cannot read the data that was on the disk before it was freed. An extreme case is the disposal of drives that are removed from the data center or repurposed for other tasks. The simplest way to ensure this is to overwrite freed space with zeros or some other pattern before releasing it. Such overwriting can significantly affect performance, so Azure, like most systems, uses a more complex but more efficient mechanism.
In this article, we look at the practices that Windows Azure and SQL Azure software follow to ensure that deleting a Windows Azure virtual machine instance, a Windows Azure virtual machine drive, a Windows Azure Drive, Windows Azure Storage data, SQL Azure data, or a SQL Azure instance itself does not leak data or expose one customer's data to another customer. The details of these mechanisms vary, but the concept is the same: no user is allowed to read data from a disk location that the user has not previously written to.
The detailed information in this article was provided by Charlie Kaufman, Windows Azure Chief Software Engineer and distinguished security architect. You can find some of Charlie's books here and here. Charlie, thank you!
About the concept of data protection
In practice, disks are sparsely allocated. This means that the full amount of disk space is not allocated when a virtual disk is created. Instead, a table is created that maps addresses on the virtual disk to areas on the physical disk, and the table is initially empty. The first time a customer writes data to a region of the virtual disk, physical disk space is allocated and the corresponding entry is set in the table. The following series of figures illustrates the concept:
Figure 1: The data block assigned to the user
In Figure 1 above, two blocks of data are allocated for each of the two users based on their respective write requests.
Figure 2: The user frees the data block
In Figure 2 above, a user "deletes" data to free a data block. The freed block is marked as available, and the other blocks are not affected.
Figure 3: Assigning a recently released block of data to a user
In Figure 3 above, a new user issues write requests and is allocated the recently freed block together with a previously unassigned block. The other, already allocated blocks are still unaffected. Essentially, when a user requests a write to disk, the system must determine whether the space already assigned to that user on the disk is sufficient to store the new data. If so, the new data overwrites the data in the existing block. If not, a new block is allocated and the data is written to the new block. This logic is shown in the following illustration.
Figure 4: User requests to write data to disk
The concern now is that a customer might read data that another customer has deleted, or that an Azure administrator might read data that a customer has deleted. If anyone tries to read from a region of a virtual disk that they have not yet written to, no physical space has been assigned to that region, so no stored data is returned (the read returns zeros). The logic and the result are shown in the following figure. Only an Azure administrator can read blocks marked as available, but no utility is available to the administrator for determining who the previous owner of a block was.
Figure 5: The user makes a read request
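The write and read logic described above can be sketched as a small simulation. This is only an illustrative model under stated assumptions, not Azure's implementation: the names PhysicalDisk, VirtualDisk, BLOCK_SIZE, and the users "alice" and "bob" are hypothetical, and block sizes are tiny so the example stays readable.

```python
BLOCK_SIZE = 4  # hypothetical block size, kept tiny for illustration


class PhysicalDisk:
    """Shared pool of physical blocks. Freed blocks keep their stale bytes."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.free = list(range(num_blocks))

    def allocate(self):
        if not self.free:
            raise RuntimeError("physical disk is full")
        return self.free.pop(0)

    def release(self, index):
        # The block is only marked as available; it is NOT zeroed.
        self.free.insert(0, index)


class VirtualDisk:
    """Sparsely allocated virtual disk backed by a shared PhysicalDisk."""

    def __init__(self, physical):
        self.physical = physical
        self.table = {}  # virtual block number -> physical block index

    def write(self, vblock, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block long")
        if vblock not in self.table:
            # First write to this region: allocate physical space now.
            self.table[vblock] = self.physical.allocate()
        # Overwrites the whole block, so stale bytes never survive a write.
        self.physical.blocks[self.table[vblock]] = bytes(data)

    def read(self, vblock):
        if vblock not in self.table:
            # Never written by this owner: no physical block is consulted.
            return bytes(BLOCK_SIZE)
        return self.physical.blocks[self.table[vblock]]

    def delete(self):
        for index in self.table.values():
            self.physical.release(index)
        self.table.clear()


pool = PhysicalDisk(num_blocks=4)

alice = VirtualDisk(pool)
alice.write(0, b"SECR")
alice.delete()                    # block 0 is freed but still holds b"SECR"

bob = VirtualDisk(pool)
bob_first_read = bob.read(0)      # unmapped region, so zeros come back
stale_on_disk = pool.blocks[0]    # visible only to someone with raw access
bob.write(0, b"BOB!")             # reuses the freed block and overwrites it
bob_second_read = bob.read(0)
```

In this sketch the stale bytes remain on the physical block after deletion, yet the new owner cannot reach them: a read either finds no table entry and returns zeros, or finds a block the same owner has already overwritten.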
Conceptually, this applies regardless of which software tracks reads and writes. For SQL Azure, this is done by the SQL software. For Azure Storage, it is done by the Azure Storage software. For a VM's non-persistent drives, it is done by the VHD handling code of the host operating system. Because customer software can address only virtual disks (the mapping from virtual addresses to physical addresses takes place outside the customer's VM), it has no way to issue a read or write request to a physical address that is assigned to another customer or to a free physical address.
Note: In some cases, the write logic (see Figure 4) is modified so that data on disk is not overwritten when a block is written a second time. Instead, a new block is allocated and the data is written to the new block, and the old block is marked as available. This approach is often referred to as a log-structured file system. It may sound inefficient, but it allows most data to be written to contiguous locations on the physical disk, minimizing seek time and achieving better performance. These details are transparent to the customer, but they are relevant: they mean that even if the customer explicitly overwrites every block of a virtual disk with zeros before releasing the disk, there is no guarantee that the customer's data does not remain on the physical disk.
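The effect described in the note can be illustrated with a minimal sketch of a log-structured write path. It is a simplification under stated assumptions, not the actual Azure code: the class LogStructuredDisk and its fields are hypothetical, and the sketch only appends to the log and never reuses blocks marked as available.

```python
BLOCK_SIZE = 4  # hypothetical block size, kept tiny for illustration


class LogStructuredDisk:
    """Every write goes to a new physical block appended to the log."""

    def __init__(self):
        self.log = []           # physical blocks, in the order written
        self.table = {}         # virtual block number -> index into log
        self.available = set()  # superseded blocks: marked free, not erased

    def write(self, vblock, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block long")
        if vblock in self.table:
            # The old physical block is only marked as available.
            self.available.add(self.table[vblock])
        self.log.append(bytes(data))
        self.table[vblock] = len(self.log) - 1

    def read(self, vblock):
        if vblock not in self.table:
            return bytes(BLOCK_SIZE)
        return self.log[self.table[vblock]]


disk = LogStructuredDisk()
disk.write(0, b"SECR")
disk.write(0, bytes(BLOCK_SIZE))  # the customer "wipes" the block with zeros

visible = disk.read(0)            # the customer sees only zeros
leftover = disk.log[0]            # the original bytes are still on the medium
```

The customer's own view is clean after the overwrite, but the first physical block still holds the original bytes until the system reuses it, which is why explicit zeroing through the virtual disk is not a guarantee of physical erasure.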
Windows Azure Virtual Machine (VM)
After a VM is deleted, the disk space that stored the contents of its local virtual disks is marked as available but is not zeroed out. The space will eventually be used to store data for other VMs, but there is no specified maximum time for which the stale content may remain on disk. However, the virtualization mechanism is designed to ensure that those locations on the disk cannot be read by another customer (or even by the same customer) until data has been written to them again, so there is no threat of data leakage. A newly created virtual disk appears to the VM to have been zeroed out, because reads from an unwritten region of a virtual disk always return zeros. Reinitializing a VM instance is equivalent to moving it to new hardware.
Windows Azure VM Drive and Windows Azure Drive (x-drive)
In Windows Azure, there are two types of virtual drives that a VM instance can access. The first type is the local disks of the compute node, which are the C:, D:, and E: drives in Web roles and Worker roles. The data on these disks is not stored redundantly and must be treated as transient. If a hardware failure occurs, the VM instance is moved to a different node, and the contents of the virtual disks are reset to their initial values. If the VM instance is reinitialized, the C:, D:, and E: drives revert to their initial state, which is equivalent to moving the instance to new hardware.
The second type, Windows Azure Drive (also known as "x-drive"), is stored in a blob in Windows Azure Storage. An x-drive is persistent and is not reset unless the customer takes explicit action to replace it. Its data is stored redundantly and is not lost even if a hardware failure occurs. Deleting a VM instance does not delete the data in the associated x-drive. Deleting the blob itself (or the storage account that contains the blob) deletes the x-drive. The next section describes how data deletion is handled in Windows Azure Storage.
Windows Azure Storage (tables, blobs, queues)
In the Windows Azure Storage subsystem, customer data becomes unavailable as soon as a delete operation is invoked. All storage operations, including deletions, are designed to be immediately consistent. A successful delete operation removes all references to the associated data item, so the item can no longer be accessed through the storage APIs. All copies of the deleted data item are eventually reclaimed. When the associated storage block is reused to store other data, the physical bits are overwritten (that is, reinitialized), just as on a standard computer hard disk drive.
SQL Azure
In SQL Azure, deleted data is marked for deletion but is not zeroed. Deleting an entire database is equivalent to deleting all of its contents. In either case, the SQL Azure implementation ensures that customer data is not leaked by preventing all access to the underlying storage except through the SQL Azure API. The API allows users to read, write, and delete data, but never allows a user to read data that the user has not previously written.
Automatic backup and forensics
Typically, customers want to ensure that their data is not accessed without authorization. In some cases, they also want to ensure that deleted data is not accessed without authorization. Once data has been deleted or changed, it can no longer be retrieved through the interfaces provided to customers, but it may remain on disk for a considerable period of time and could theoretically be recovered with internal forensic tools (although the likelihood of recovering it diminishes over time). Eventually, any physical disk removed from the production environment is completely erased or destroyed.
We are considering introducing features in the near future that will allow customers to recover deleted data (and restore changed data) without having to make an explicit backup. With such tools in place, we cannot fully assure customers that data is inaccessible to authorized parties after it is deleted. Any such tool would be able to retrieve deleted data only for a limited time (no more than 30 days) unless the customer chooses a longer backup period. At the time of this writing, some tools are available that can recover data deleted from a SQL Azure database within 14-21 days. No such tools exist for Azure Storage or for the temporary disks of Azure compute nodes.