Software-defined storage logic 2: Energy-Adaptive Distributed Storage System



This paper [3] proposes a flexible and scalable distributed storage system named FlexStore. The system adapts to the constantly changing energy supply in the data center while providing good disk I/O performance for de-duplicated virtual machines. The researchers propose an intelligent control method to cope with power supply restrictions in data centers, which arise because the node density of storage arrays keeps increasing and because green and conventional energy sources may power a data center together. The most important component of the framework is the policy engine, a software layer that provides interfaces through which applications specify their performance requirements as well as energy-related policies. The policy engine then enforces these policies in the storage system by continuously adjusting how storage resources are allocated.

Finally, the authors evaluate a prototype. The experiments show that the system can adapt to changes in workload and in power supply restrictions while minimizing the impact of those changes on performance. The prototype also shows that the adaptive replication policy reduces I/O latency by about 65% when the power supply is sufficient and by more than 40% when it is insufficient.

Note: In my previous blog post on software-defined storage logic, I described the paper as follows: "Compared with IOFlow, this paper focuses more on a software-defined storage framework (it uses existing frameworks to create a new one, and then uses existing protocols) rather than on a communication-oriented protocol like IOFlow. In addition, the framework is a software-defined environment (SDE) framework, not just a storage framework, but the full text concentrates on storage (which is more challenging)... The article fully defines the software-defined storage framework, integrates it with SDE, and focuses on an energy-adaptive replica system."

Green energy

The paper first introduces the scale-out storage architecture. In this architecture, performance and reliability are crucial, so a data block is often replicated across multiple nodes; however, multiple copies consume more resources and raise costs for the data center. Besides replication, data centers may also use network RAID, but that technology is more expensive in terms of resource consumption and hurts performance under high load.

In addition to performance and reliability, energy consumption is a crucial factor for data center storage systems. A survey shows that storage accounts for more than 40% of total data center energy consumption, so considerable research effort goes into supplying data centers with green energy. Using green energy changes how data centers are powered: the supply fluctuates constantly and is sometimes insufficient. Google, for example, has a green energy plan, and its homepage says:

"At Google, we have been committed to using all renewable energy to support our enterprise operations. In addition to considering the environmental benefits that can be brought, we also regard renewable energy as a commercial opportunity and continue investing to accelerate its development. We believe that by using renewable energy to support network operations, we will create a better future for everyone."


Therefore, data center architectures and services need to adapt to the variability and uncertainty of green energy. When the green energy supply is insufficient and the conventional grid used as backup is also unavailable, performance degradation must still be acceptable. When green energy is sufficient, there is no need to draw conventional energy from the power grid. This flexible operation is essential for integrating green energy into the data center. The same researchers previously studied energy adaptive computing (EAC) to provide intelligent control in data centers, cutting back over-provisioned energy use and adapting automatically to the available energy supply. The emergence of software-defined storage (SDS) provides a better management interface for storage. FlexStore is a software-defined system that can adaptively manage storage resources.

System design objectives, architecture, and prototype implementation

In a green-energy data center, virtual machine replication is very important for disaster tolerance. When green energy is plentiful, more replicas can be kept; when energy is insufficient, the number of virtual machine replicas can be reduced to fit the available energy. So how can the performance loss be minimized, and QoS preserved, while the energy constraints are met? This is the goal of the FlexStore architecture. To achieve it, we first need to analyze the relationship between energy and performance.


Virtual Machine deduplication

Why do VMs need to be de-duplicated? A data center hosts many virtual machines, and they usually share the same operating system and configuration, so de-duplicating them at storage time can greatly reduce the disk space used.

How are duplicates removed?

What are the difficulties of removing duplicates?

These questions are covered in many papers, for example [1] and [2], but they are not the subject of this article.


Virtual machine replica model

For data safety and data center fault tolerance, several copies of a virtual machine are generally kept. Depending on whether performance or fault tolerance matters more, different consistency models apply. Three models are considered here.

Strong consistency: a write returns only after all copies have been written.

Weak consistency: a write returns as soon as one copy has been written; another copy is brought up to date only when it has to be used.

FlexStore model: a write returns as soon as one copy has been written, and the other copies are synchronized at certain points in time.
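The difference between the three models is easiest to see in code. The following is a minimal sketch, not the paper's implementation: the `Replica` class, the function names, and the idea of counting the log threshold in chunks are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a VM disk, modelled as a chunk-id -> data dictionary."""
    chunks: dict = field(default_factory=dict)
    log: dict = field(default_factory=dict)  # new chunks not yet propagated


def write_strong(replicas, key, data):
    """Strong consistency: acknowledge only after every copy is written."""
    for r in replicas:
        r.chunks[key] = data
    return len(replicas)  # copies written before the acknowledgement


def write_weak(replicas, key, data):
    """Weak consistency: acknowledge after the first copy is written.
    Other copies are refreshed only when they are needed (not shown)."""
    replicas[0].chunks[key] = data
    return 1


def write_flexstore(replicas, key, data, log_threshold):
    """FlexStore-style: acknowledge after the first copy is written, and
    synchronize the others once the log of new chunks reaches a threshold.
    The threshold is counted in chunks here, which is an assumption."""
    primary = replicas[0]
    primary.chunks[key] = data
    primary.log[key] = data
    if len(primary.log) >= log_threshold:
        for r in replicas[1:]:
            r.chunks.update(primary.log)
        primary.log.clear()
    return 1
```

In this sketch the strong model pays for every copy on each write, while the other two pay for one copy and differ only in when the remaining copies catch up.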


A: Relationship between performance and energy

Data center replicas are used for disaster tolerance and for virtual machine load balancing. With many virtual machines, I/O latency decreases as the number of data copies increases (accesses are spread across more replicas, so each I/O queue is shorter and latency is lower), but more replicas also consume more energy. In a green-energy data center, therefore, an energy restriction reduces the number of data copies and increases I/O latency. If energy is sufficient, the data can have enough copies and latency is not affected.

As the paper's figure shows, I/O latency decreases and energy consumption increases as the number of VM replicas grows.
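A toy model makes the trade-off concrete. It assumes, purely for illustration, that outstanding requests spread evenly across replicas and that every active replica draws the same power; neither assumption nor any of the function names comes from the paper.

```python
def per_replica_queue(outstanding_ios: int, replicas: int) -> float:
    """Average queue depth per replica if requests spread evenly."""
    return outstanding_ios / replicas


def total_power(replicas: int, watts_per_replica: float) -> float:
    """Power drawn when every active replica consumes the same amount."""
    return replicas * watts_per_replica


def max_replicas(power_budget: float, watts_per_replica: float,
                 configured: int) -> int:
    """Largest replica count that fits the power budget (at least one)."""
    affordable = int(power_budget // watts_per_replica)
    return max(1, min(configured, affordable))
```

For example, with 120 outstanding requests, going from 4 replicas to 2 doubles the per-replica queue from 30 to 60 while halving the power drawn.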


In a large virtualized data center, one very specific application is removing duplicates from virtual machine disks while storing multiple copies in the lower-level storage system. The FlexStore architecture is designed for this case: it must guarantee virtual machine disk performance and also achieve de-duplication.



FlexStore components

The paper describes FlexStore in terms of a data plane and a control plane. The data plane consists of heterogeneous storage systems with different types of storage devices, and the control plane provides programmable interfaces to control data placement and layout.

The environment and management of data center virtual machines are described under metadata management and chunk storage.

Metadata management: The metadata manager component sits on a FUSE file system, which stores the disk chunks of the VMs. The file system also performs de-duplication and acts as an "adapter" layer that provides block semantics to virtual machines.

Chunk storage: VM disks are divided into chunk objects stored on lower-level storage devices. Traditional storage systems usually expose storage through a uniform interface such as a LUN (block storage) or a NAS volume (file storage), whereas FlexStore uses a replica interface. The storage system is a distributed object store keyed by SHA-1 hash. FlexStore keeps several replicas of each chunk; at any given time a VM is assigned to one replica and reads and writes that replica.
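The chunking and de-duplication idea can be sketched in a few lines. This is an illustration only: the 4096-byte chunk size, the `ChunkStore` class, and the helper names are assumptions, and the real system sits behind FUSE and a distributed store rather than a Python dictionary.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size in bytes


class ChunkStore:
    """Content-addressed store: key = SHA-1 of the chunk, value = the chunk."""

    def __init__(self):
        self.chunks = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        self.chunks.setdefault(key, data)  # identical chunks are stored once
        return key

    def get(self, key: str) -> bytes:
        return self.chunks[key]


def write_disk(store: ChunkStore, image: bytes) -> list:
    """Split a disk image into fixed-size chunks; return its chunk keys."""
    return [store.put(image[i:i + CHUNK_SIZE])
            for i in range(0, len(image), CHUNK_SIZE)]


def read_disk(store: ChunkStore, keys: list) -> bytes:
    """Rebuild a disk image from its ordered list of chunk keys."""
    return b"".join(store.get(k) for k in keys)
```

Two VM disks that share the same operating system blocks then share the same keys, so the common chunks occupy space only once.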


So what does the data plane do? After a virtual machine's volume is created, the virtual machine reads and writes the volume directly. A device driver or the hypervisor is required to communicate with the different storage systems; the driver's main function is to pass VM block requests to the lower-layer storage system. The data plane also includes counters and monitors that track the number of read/write requests and their latencies and pass the measurements to the policy engine (PE), which then automatically executes the relevant policies according to system conditions. In addition, the data plane exposes an interface that lets the control plane control data layout, for example migrating data across storage devices or replicas; when energy is insufficient, data is consolidated onto fewer disks.
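The counters and monitors mentioned above could look roughly like this. The class and method names are hypothetical, and the report format is an assumption about what a policy engine might want to receive.

```python
from collections import defaultdict


class IOMonitor:
    """Counts requests and accumulates latency per (vm, operation)."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total_latency_ms = defaultdict(float)

    def record(self, vm: str, op: str, latency_ms: float) -> None:
        """Register one completed read or write request."""
        self.count[(vm, op)] += 1
        self.total_latency_ms[(vm, op)] += latency_ms

    def report(self) -> dict:
        """Snapshot for the policy engine:
        {(vm, op): (request count, mean latency in ms)}."""
        return {k: (n, self.total_latency_ms[k] / n)
                for k, n in self.count.items()}
```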


Prototype implementation

Metadata manager

• Chunk storage: replica

• FUSE module (VM disks are mounted to the FUSE directory)

• Process: when a VM performs any vdisk I/O operation, the metadata manager converts the I/O request into fixed-size chunks identified by SHA-1 hash. It maintains a hash map in memory and interacts with the storage system through a Thrift client.


Storage system

• The Thrift server interacts with the Thrift client of the metadata manager (key-value pairs: the SHA-1 hash value and the stored chunk).

• It also interacts with the client of the policy engine.

It uses the getLogSize() API to obtain the amount of new chunk data and reports it to the policy engine client (the log stores new chunks, and only these new chunks are transferred during synchronization). Policies from the policy engine client, such as data transfer, are passed on to the storage system.
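To illustrate why logging new chunks keeps synchronization cheap, here is a small sketch. `ReplicaNode`, `get_log_size`, and `sync_to` are hypothetical names loosely modelled on the getLogSize() call mentioned above; the actual Thrift interface is not given in this article.

```python
class ReplicaNode:
    """A storage node that logs chunks written since the last sync."""

    def __init__(self):
        self.chunks = {}   # SHA-1 key -> chunk bytes
        self.log = []      # keys of new chunks, in arrival order

    def put(self, key: str, data: bytes) -> None:
        """Store a chunk; only chunks not seen before enter the log."""
        if key not in self.chunks:
            self.chunks[key] = data
            self.log.append(key)

    def get_log_size(self) -> int:
        """Bytes of new chunk data waiting to be synchronized."""
        return sum(len(self.chunks[k]) for k in self.log)

    def sync_to(self, others: list) -> int:
        """Copy only the logged chunks to each node in `others`.
        Returns the number of chunk transfers, then empties the log."""
        sent = 0
        for other in others:
            for key in self.log:
                if key not in other.chunks:
                    other.chunks[key] = self.chunks[key]
                    sent += 1
        self.log.clear()
        return sent
```

A policy engine client could poll `get_log_size()` and trigger `sync_to` once the value crosses its threshold.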


Policy engine

• Distributed: a client on each host plus a central coordinator

• Client: interacts with the metadata manager to collect information, such as the number of VM disk files accessed and the number of disk nodes accessed, and then reports this local state to the central coordinator.

• Client + central coordinator: together they maintain a global view of the system: number of storage nodes, number of active hosts, and available power.

• The central coordinator periodically collects information from the clients. The global information is then fed back to the clients, and each client selects the least loaded nodes to store data on or move data to.

• The client initiates a data transfer in the following cases:

1) The amount of unsynchronized data on a replica exceeds the threshold.

2) The load on a replica exceeds the threshold.

3) The I/O latency of a VM disk exceeds the threshold.

4) A replica should be disabled, or a new replica should be enabled.

The actual data transfer takes place only between storage nodes.
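The four trigger conditions listed above translate directly into a check that a policy engine client might run. The field names, units, and threshold structure below are assumptions for illustration; the article does not give the actual values or data structures.

```python
from dataclasses import dataclass


@dataclass
class ReplicaState:
    unsynced_bytes: int      # data not yet synchronized on the replica
    load: float              # replica load, e.g. utilization
    vm_io_latency_ms: float  # observed VM disk I/O latency
    membership_change: bool  # a replica must be disabled or enabled


@dataclass
class Thresholds:
    unsynced_bytes: int
    load: float
    vm_io_latency_ms: float


def transfer_reasons(state: ReplicaState, limits: Thresholds) -> list:
    """Return the reasons (possibly none) for starting a data transfer."""
    reasons = []
    if state.unsynced_bytes > limits.unsynced_bytes:
        reasons.append("unsynced data")
    if state.load > limits.load:
        reasons.append("replica load")
    if state.vm_io_latency_ms > limits.vm_io_latency_ms:
        reasons.append("VM I/O latency")
    if state.membership_change:
        reasons.append("replica enabled/disabled")
    return reasons
```

An empty list means no transfer is needed; any non-empty list would prompt the client to pick target nodes from the global view.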



C: Adaptive replica consistency (energy-related) algorithm


Replicas always need to be synchronized. In the FlexStore model, synchronization happens at certain points: when the size of a replica's log file (its new chunks) reaches a threshold, that replica starts propagating its updates and the other copies start synchronizing with it. Given the power and bandwidth available for this, how can the updates be propagated as widely as possible? In other words, the more replicas that can be synchronized, the higher the QoS. The adaptive replica consistency algorithm solves this problem.

Consider a data center with N replicas. A virtual machine is allocated to one replica, and that replica serves the virtual machine's reads and writes. As new chunks are added to this replica, it keeps propagating them so that the other copies remain consistent.


The paper describes the algorithm clearly; in outline:

It is an integer linear programming problem. When N = 2 or 3, the policy engine can quickly obtain the optimal solution ......
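The paper's exact formulation is not reproduced here, so the following is only a toy 0/1 program in the same spirit: choose which replicas to synchronize so that as many as possible are covered without exceeding a power budget and a bandwidth budget. The cost model, the function name, and the brute-force search are all assumptions; exhaustive search is adequate only because N is small.

```python
from itertools import combinations


def best_sync_set(costs, power_budget, bandwidth_budget):
    """costs: list of (power, bandwidth) needed to synchronize each replica.
    Returns the largest tuple of replica indices that fits both budgets,
    or an empty tuple if no replica can be synchronized."""
    n = len(costs)
    for size in range(n, 0, -1):  # try larger sets first
        for subset in combinations(range(n), size):
            power = sum(costs[i][0] for i in subset)
            bandwidth = sum(costs[i][1] for i in subset)
            if power <= power_budget and bandwidth <= bandwidth_budget:
                return subset
    return ()
```

With N = 2 or 3 there are at most 7 non-empty subsets to check, which is consistent with the remark that the optimum can be found quickly for small N.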


Experiments and evaluation

The experimental environment is Amazon's IaaS platform:

• 4 EC2 M1.xlarge instances that have 4 vCPUs and 15 GB of RAM each as the storage nodes which ran the storage software

• 8 EC2 M1.large instances that have 2 vCPUs and 7.5 GB of RAM each were used as hosts on which the metadata manager component and the distributed policy engine module ran

• All EC2 instances ran Ubuntu Server 13.04 as the operating system.

• A maximum of 4 VMs on each of the hosts.




Workload: Fio and MSR Cambridge trace

Fio: stress testing

MSR Cambridge trace: simulates data from a real environment

Deduplication rate: 25%-75%

• Fio: a standard benchmark to evaluate file system workloads

• MSR Cambridge trace: The traces were collected over a period of 7 days from 36 different volumes in a Microsoft datacenter.

• Each VM replayed one out of the four traces. The use of real workload traces in the evaluations helps to demonstrate the behavior of FlexStore in a real datacenter.




Experiment 1: Trade-off

The larger the number of copies, the lower the I/O latency: this experiment confirms that adding replicas reduces I/O latency. It also shows how to configure the system: because an energy shortage reduces the number of replicas, we need to track the resulting performance changes dynamically and accurately and make the corresponding trade-offs.


Experiment 2: Adaptive consistency


The size threshold of the log file on the replica serving a VM is a tunable variable. Figure 6 shows the log file size (in MB) at which read/write I/O latency is lowest; that size is chosen as the threshold because it minimizes the workload's read I/O latency. The average I/O latency of the three consistency models, strong, weak, and FlexStore (using the log file threshold found above), was then measured. The strong model incurs a high write latency, as expected from the earlier analysis, since every copy must be written before a write returns. The weak model and FlexStore return after one copy is written and so avoid that cost. Of course, changing the log file size may affect the results.


Experiment 3: The impact of cache

This experiment increases the memory size on the VM host, that is, it enlarges the buffer cache. The test shows that the average latency is lower with the larger cache.

The de-duplication rate of the VMs and the cache size of the storage nodes are also important factors affecting latency. With the same memory size, a de-duplication rate of 75% gives lower latency than a rate of 25%: when the de-duplication rate is high, much of the data requested is already in the cache, which reduces cache misses. Likewise, the larger the RAM of a storage node, the larger its buffer cache, so at the same de-duplication rate a larger cache leads to lower latency.
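The effect of de-duplication rate and cache size on cache hits can be reproduced with a small simulation. This is a simplified model, not the paper's experiment: the LRU policy, the pool of shared chunks, and all default parameters are assumptions chosen for illustration.

```python
import random
from collections import OrderedDict


def hit_rate(dedup_rate, cache_chunks=100, shared_pool=50,
             accesses=10_000, seed=0):
    """Fraction of chunk reads served from an LRU cache.
    With probability `dedup_rate` a read targets one of `shared_pool`
    chunks common to many VMs; otherwise it targets a brand-new chunk."""
    rng = random.Random(seed)
    cache = OrderedDict()
    hits = 0
    next_unique = 0
    for _ in range(accesses):
        if rng.random() < dedup_rate:
            key = ("shared", rng.randrange(shared_pool))
        else:
            key = ("unique", next_unique)
            next_unique += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_chunks:
                cache.popitem(last=False)  # evict least recently used
    return hits / accesses
```

Comparing `hit_rate(0.75)` with `hit_rate(0.25)`, or the same rate with a larger `cache_chunks`, shows the direction described above: more sharing or more cache means fewer misses.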


Experiment 4: Energy adaptation

This experiment compares how the weak model and FlexStore adapt to energy changes. When energy is tight, the number of replicas decreases and I/O latency increases; under the weak model the latency becomes very large for a period of time before falling back, whereas FlexStore recovers almost instantly. When energy becomes sufficient again, FlexStore's I/O latency drops at once, while the weak model takes a long time. The reason is that the weak model does not keep its disabled replicas up to date, whereas FlexStore keeps updating replicas even when power is short (using green energy), and devices such as SSDs do not consume much power.



In addition, the consistency model shows that the resources allocated to I/O and to synchronization constrain each other. When the energy supply changes, the I/O bandwidth can therefore be adjusted, which also changes latency; if the bandwidth allocation is optimized, latency is reduced.


Other studies on reducing data center energy consumption

The write off-loading technique [4] keeps disks in the spun-down state to reduce data center energy consumption: write requests aimed at spun-down disks (disks put into a resting state, I guess) are redirected to other storage devices, and the data blocks are written back once the disks are working again.

Others have proposed MAID [5] as a replacement for conventional high-performance RAID to reduce energy consumption. Some disks in the system serve as cache disks that hold hot data, while the remaining non-cache disks can stay spun down.

Another study designed a distributed file system that optimizes data layout across storage nodes so that some nodes can be shut down when they are not needed, reducing energy consumption [6]. There is also a decentralized SAN de-duplication technology []. Unlike the energy-saving technologies above, FlexStore adapts to energy changes more flexibly and dynamically.
