The future of mass storage: the memory cloud?

Source: Internet
Author: User

John Ousterhout, the creator of Tcl, a member of the U.S. National Academy of Engineering, and an ACM Fellow, currently teaches at Stanford University. His main research project in recent years has been RAMCloud, the "memory cloud". As the name suggests, RAMCloud is a new kind of data center storage system: a large-scale system built from the main memory of thousands of commodity servers. At all times, all information is held in fast DRAM (dynamic random access memory, commonly just called memory). DRAM replaces the hard disk of traditional systems, and disks are used only for backup.

The memory cloud offers both large scale (100-1000 TB) and low latency: an application in the same data center can access a small amount of memory cloud data in just 5-10 μs, which is 100-1000 times faster than current systems. The memory cloud not only simplifies the development of large web applications, but also makes new kinds of data-intensive applications possible.

The August 2011 issue of Communications of the ACM published "The Case for RAMCloud", written by Ousterhout and his team. The rest of this article summarizes it.

For the past 40 years, disk has been the primary storage location for online information in computer systems.

During this time, disk technology has improved significantly and has been leveraged by higher-level storage systems such as file systems and relational databases. But disk performance has not improved nearly as much as disk capacity. As a result, engineers are finding it increasingly difficult to scale disk-based systems to meet the needs of large web applications.

Many computer scientists have proposed new disk-based storage solutions, and others have suggested replacing disks with flash memory devices. In contrast, the basic idea of the solution discussed here is to move online data from disk to DRAM, creating a new class of storage in which disk serves only as backup.

This new storage approach, called RAMCloud, would provide a new storage architecture for many future applications.

What is RAMCloud?

The idea behind the RAMCloud architecture is to keep all information in the main memory of commodity servers and to build large storage systems from hundreds or thousands of such servers. Data stored in a RAMCloud can reportedly be accessed with 100-1000 times lower latency than data in a disk-based system, and with 100-1000 times higher throughput.

Although the memory of an individual server is volatile, RAMCloud can use replication and backup techniques to provide the same data durability and availability as traditional disk-based systems.
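As a rough illustration of the idea in the preceding paragraph, the Python sketch below keeps the primary copy of every object in memory and appends each write to backup logs, so the contents can be rebuilt after the in-memory copy is lost. The class ToyMemoryStore, its method names, and the choice of two backups are hypothetical and introduced here only for illustration; this is a minimal model of "replication plus backup", not RAMCloud's actual mechanism.

```python
class ToyMemoryStore:
    """Hypothetical model: one in-memory primary copy plus several backup logs."""

    def __init__(self, num_backups=2):
        # Stands in for DRAM on the primary storage server (volatile).
        self.memory = {}
        # Each list stands in for an append-only log on a backup server's disk.
        self.backup_logs = [[] for _ in range(num_backups)]

    def write(self, key, value):
        # Record the write on every backup before updating the in-memory copy.
        for log in self.backup_logs:
            log.append((key, value))
        self.memory[key] = value

    def read(self, key):
        # Reads are served only from memory, never from the backup logs.
        return self.memory.get(key)

    def crash(self):
        # Simulate losing the volatile in-memory copy.
        self.memory = {}

    def recover(self, backup_index=0):
        # Replay one backup log in order; later entries overwrite earlier ones.
        for key, value in self.backup_logs[backup_index]:
            self.memory[key] = value


store = ToyMemoryStore()
store.write("a", 1)
store.write("a", 2)
store.write("b", 3)
store.crash()
lost_value = store.read("a")       # nothing is readable right after the crash
store.recover(backup_index=1)      # any surviving backup log is sufficient
```

In this toy model the disk is never on the read path, which mirrors the article's point that disk is used only as backup.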

By combining low latency with large scale, RAMCloud could change the storage landscape in three ways. First, it eliminates many of the scalability problems that complicate the development of large web applications. Second, its low latency enables a new class of applications that manipulate data 100-1000 times more intensively than is possible today. Third, a RAMCloud cluster can support a single large application or many smaller ones, and a small application can grow into a large one without additional complexity for its developers.

Architecture principles of RAMCloud

1. In a RAMCloud architecture, the servers in a large data center fall into two categories: application servers, which implement the application logic (such as generating web pages and enforcing business rules), and storage servers, which provide longer-term shared storage for the application servers.

2. RAMCloud represents a new way of organizing storage servers. Two key attributes distinguish it from traditional storage systems. First, all information is kept in DRAM at all times. Second, a RAMCloud must scale to a large number (thousands) of storage servers. Applications see a single storage system, independent of the actual number of storage servers.

3. Information stored in a RAMCloud must be as durable as if it were stored on disk. The failure of a single storage server must never cause data loss, or even make data unavailable for more than a few seconds. Techniques for achieving durability and availability are discussed later in the original article.

4. Keeping all data in DRAM allows RAMCloud to deliver 100 to 1000 times the performance of today's highest-performing disk-based storage systems.

5. The latency for a process running on an application server to read a few hundred bytes of data over the network from a storage server in the same data center may be as low as 5-10 μs. By contrast, today's systems typically require 0.5-10 ms, depending on whether the data is in the server's memory cache or must be read from disk.

6. A multi-core storage server should be able to serve at least 1 million network requests per second. By contrast, a comparable machine in a disk-based system (with multiple disks and a main-memory cache) can serve 1,000 to 10,000 network requests per second, depending on its configuration and cache behavior.
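The "100 to 1000 times" claim can be checked against the figures in items 5 and 6. The short Python sketch below does the arithmetic; the constant and function names are made up for this example, and pairing the low ends and the high ends of the two latency ranges is an assumption about how the ranges are meant to be compared.

```python
# Latencies in microseconds, taken from item 5 of the list above.
RAMCLOUD_LATENCY_US = (5, 10)
DISK_SYSTEM_LATENCY_US = (500, 10_000)      # 0.5 ms to 10 ms

# Requests per second, taken from item 6 of the list above.
RAMCLOUD_REQUESTS_PER_SEC = 1_000_000
DISK_SYSTEM_REQUESTS_PER_SEC = (1_000, 10_000)


def latency_speedups():
    """Ratio of disk-system latency to RAMCloud latency, low end and high end."""
    return tuple(
        slow / fast
        for slow, fast in zip(DISK_SYSTEM_LATENCY_US, RAMCLOUD_LATENCY_US)
    )


def throughput_speedups():
    """Ratio of RAMCloud request rate to each disk-system request rate."""
    return tuple(
        RAMCLOUD_REQUESTS_PER_SEC / rate for rate in DISK_SYSTEM_REQUESTS_PER_SEC
    )


print(latency_speedups())
print(throughput_speedups())
```

Under these assumptions both ratios fall between 100 and 1000, which matches the range quoted in item 4.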

Example configurations

In a RAMCloud configuration that is feasible today, each server holds 24 GB of DRAM, which is the most cost-effective amount; adding more memory per server raises the cost sharply. A cluster of 2,000 such servers would provide 48 TB of storage at an average cost of 65 USD per gigabyte. It is predicted that by 2020, with continued improvements in DRAM technology, a RAMCloud of 1-10 PB will cost only 6 USD per gigabyte.
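The configuration figures above can be reproduced with a few lines of Python. This is only a worked calculation: the names are invented for the example, decimal units (1 TB = 1,000 GB) are an assumption, and the total cost is derived from the quoted per-gigabyte price rather than stated in the text.

```python
GB_PER_TB = 1_000            # assumption: decimal units

SERVERS = 2_000              # from the text
DRAM_PER_SERVER_GB = 24      # from the text
COST_PER_GB_USD = 65         # from the text


def total_capacity_gb(servers, dram_per_server_gb):
    """Total DRAM across the cluster, in gigabytes."""
    return servers * dram_per_server_gb


def total_cost_usd(capacity_gb, cost_per_gb_usd):
    """Implied system cost at a given average price per gigabyte."""
    return capacity_gb * cost_per_gb_usd


capacity_gb = total_capacity_gb(SERVERS, DRAM_PER_SERVER_GB)
capacity_tb = capacity_gb / GB_PER_TB
cost_usd = total_cost_usd(capacity_gb, COST_PER_GB_USD)

print(f"{capacity_tb:.0f} TB for about {cost_usd:,} USD")
```

The capacity works out to the 48 TB quoted above, and the implied price of the whole cluster is a little over 3 million USD.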

RAMCloud is already practical for some applications. For example, a large online retailer or an airline could keep its data in a RAMCloud at a cost of hundreds of thousands of dollars. By August 2009, Facebook's non-image data totaled about 260 TB, which is probably close to the practical upper limit for a RAMCloud today.

RAMCloud is not yet used on a large scale for media such as video, photos, and songs, but in principle it could hold all other online data. As DRAM technology continues to improve, RAMCloud will become more attractive.

The value of RAMCloud

RAMCloud can provide a new architecture for data-intensive applications. In the traditional architecture, an application's code and data are loaded into the main memory of a single server. This allows complex data manipulation, but the bottleneck is obvious: the size of the application is limited by the capacity of one machine.

Over the past 10 years, a new architecture has emerged for large web applications that serve millions of users. It places application code and data on different servers in the same data center. Application servers hold only the state needed to process the current browser requests, which allows an application to scale to thousands of application servers and storage servers.

Unfortunately, this large-scale architecture increases both application complexity and data access latency, which grows by 4-5 orders of magnitude. For example, when Facebook receives an HTTP request for a web page, the application server must issue more than 130 internal data requests to generate the HTML page, and these requests must be issued in sequence. The accumulated latency of these requests is one of the factors that limit overall response time for users, so considerable development effort goes into minimizing the number and size of requests sent to the storage servers.
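To see why sequential requests matter, the sketch below multiplies the request count by the per-request latencies quoted earlier in the article (0.5-10 ms for today's systems, 5-10 μs for RAMCloud). The function name is invented for this example, 130 is used as a lower bound for "more than 130", and the model assumes strictly sequential requests with no other overhead.

```python
REQUESTS_PER_PAGE = 130      # lower bound; the text says "more than 130"


def cumulative_latency_ms(num_requests, per_request_latency_us):
    """Total latency in milliseconds for requests issued one after another."""
    return num_requests * per_request_latency_us / 1000


# Today's systems: 0.5 ms to 10 ms per request (500 to 10,000 microseconds).
current_best_ms = cumulative_latency_ms(REQUESTS_PER_PAGE, 500)
current_worst_ms = cumulative_latency_ms(REQUESTS_PER_PAGE, 10_000)

# RAMCloud target: 5 to 10 microseconds per request.
ramcloud_best_ms = cumulative_latency_ms(REQUESTS_PER_PAGE, 5)
ramcloud_worst_ms = cumulative_latency_ms(REQUESTS_PER_PAGE, 10)

print(current_best_ms, current_worst_ms)
print(ramcloud_best_ms, ramcloud_worst_ms)
```

Under these assumptions a page costs between 65 ms and 1.3 s of storage latency on today's systems, but only about 1 ms at RAMCloud latencies.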

MapReduce is a technology that has emerged in recent years to improve data access speed and avoid latency problems. It solves large-scale problems by organizing data access sequentially, but this reliance on sequential access makes MapReduce hard to use for applications that need random access to data.

RAMCloud combines the advantages of both approaches, scale and low latency: it preserves the scalability of web applications while reducing data access latency to a level close to that of traditional applications.

Scalable storage for existing applications

Besides enabling new applications, RAMCloud will make applications like today's easier to build. At present, the lack of a scalable storage system makes large web applications difficult to develop.

In the past, nearly all web applications used a relational database for storage, but as data volumes grew, a single relational database could no longer meet their I/O requirements. Developers then had to upgrade their systems and introduce additional techniques to scale their storage (such as partitioning data across multiple databases).

For example, in 2009 Facebook had 4,000 MySQL servers, yet this storage system still could not meet its I/O requirements because of the large volume of interactive data access. Facebook therefore also used 2,000 memcached servers as a distributed in-memory object cache that keeps key-value pairs in main memory. The drawback is that consistency between memcached and the MySQL servers has to be managed by the application (for example, flushing cached values when the database is updated), which adds to the complexity of the application.
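The consistency burden described above can be illustrated with a toy look-aside cache in Python. Plain dictionaries stand in for memcached and MySQL, and the class and method names are hypothetical; the sketch shows only the general pattern in which the application itself must invalidate cached values on every update, not Facebook's actual code.

```python
class LookAsideStore:
    """Toy look-aside cache: the application keeps cache and database in sync."""

    def __init__(self):
        self.database = {}     # stands in for the MySQL servers
        self.cache = {}        # stands in for the memcached servers
        self.database_reads = 0

    def read(self, key):
        # Serve from the cache when possible.
        if key in self.cache:
            return self.cache[key]
        # On a miss, fall back to the database and populate the cache.
        self.database_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value):
        self.database[key] = value
        # The application must remember to drop the stale cached value;
        # forgetting this line would let readers see outdated data.
        self.cache.pop(key, None)


store = LookAsideStore()
store.write("user:1", "alice")
first = store.read("user:1")     # miss: goes to the database
second = store.read("user:1")    # hit: served from the cache
store.write("user:1", "bob")     # invalidates the cached value
third = store.read("user:1")     # miss again, returns the new value
```

Every write path in the application has to carry the invalidation step, which is the kind of extra complexity the article attributes to running a cache alongside a database.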

As a result, NoSQL systems began to appear. Instead of the relational model, they store data as key-value pairs. The structure is not fixed: each tuple can have different fields and can add its own key-value pairs, so the data is not bound to a fixed schema, which can reduce some time and space overhead. But their bottleneck is still disk speed.

One of the goals of RAMCloud is to provide a general-purpose storage system that scales far beyond existing systems, so that application developers do not need to resort to special-purpose approaches (such as NoSQL systems). Ideally, RAMCloud provides a simple model that is easy to use and scales without requiring changes to the application's architecture.

Technology trends

RAMCloud's development is driven by the evolution of disk technology. Disk capacity has increased more than 10,000-fold since the 1980s and will continue to grow (Table 3). Unfortunately, the speed of access to information on disk has lagged far behind: despite improvements in seek time and rotational latency, it has increased only about 50-fold, which is far out of proportion to the growth in capacity.

Because of this unbalanced development of disk technology, data that is accessed frequently must increasingly be kept in memory. If a disk is filled with blocks of a given size, how often can each block be accessed, judging by the capacity/bandwidth ratio? And what about random access? One possible workaround is to reduce disk utilization: if only half of each disk is used, each block can in theory be accessed twice as often. Data show that by the end of 2009, Facebook could effectively use only 10% of its disk capacity. This cost is clearly too high, and in terms of both economy and energy consumption it is not a good design.
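The capacity/bandwidth argument can be made concrete with a small calculation. The Python sketch below uses only the growth factors quoted in this section (10,000-fold capacity, 50-fold access speed) and the utilization figures from the paragraph above; the function names are invented for this example, and the model is deliberately simplified to plain ratios.

```python
def scan_time_growth(capacity_growth, access_speed_growth):
    """Factor by which the time to read an entire disk has grown."""
    return capacity_growth / access_speed_growth


def access_rate_gain(utilization):
    """Factor by which per-block access frequency rises when only a
    fraction of the disk (0 < utilization <= 1) holds data."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return 1 / utilization


print(scan_time_growth(10_000, 50))   # capacity grew 200 times faster than speed
print(access_rate_gain(0.5))          # half-full disks
print(access_rate_gain(0.1))          # 10% utilization, as in the Facebook figure
```

In other words, reading a full disk takes about 200 times longer than it used to, and buying back access speed by leaving 90% of each disk empty means paying for ten times the needed capacity.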

Cache

For software engineers, if most accesses touch only a small fraction of the disk blocks, high performance can be maintained by keeping the most frequently accessed blocks in DRAM. In the ideal case, a caching system provides DRAM-like performance at disk-like cost. But there is a 1,000-fold gap between DRAM and disk access times, which means the cache must have a very low miss rate to avoid a significant performance penalty.
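A short calculation shows how sensitive a cache is to its miss rate. The sketch below uses the standard weighted-average formula for access time and the 1,000-fold DRAM/disk gap from the paragraph above; the function name and the sample miss rates are assumptions made for this example.

```python
def effective_access_time(miss_rate, dram_time=1.0, disk_time=1000.0):
    """Average access time in units of one DRAM access.

    Hits cost dram_time; misses cost disk_time (1,000 times more by default).
    """
    if not 0 <= miss_rate <= 1:
        raise ValueError("miss_rate must be between 0 and 1")
    return (1 - miss_rate) * dram_time + miss_rate * disk_time


for miss_rate in (0.0, 0.001, 0.01, 0.1):
    slowdown = effective_access_time(miss_rate)
    print(f"miss rate {miss_rate:>5.1%}: {slowdown:.2f}x the cost of pure DRAM")
```

Even a 1% miss rate makes the average access roughly 11 times slower than pure DRAM, which is why a cache in front of disk cannot match a system that keeps everything in memory.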

SSD

At present there are two kinds of SSD: DRAM-based devices and flash-based devices.

DRAM-based storage devices are faster than flash-based ones, but cost much more.

For example, a 2 TB flash-based storage device costs about 180,000 dollars; by contrast, a DRAM-based device of the same capacity costs up to about 1 million dollars. A DRAM-based drive reads or writes data in as little as 0.015 milliseconds and can handle up to 400,000 random I/O operations per second. Such a drive is best suited to write-intensive software and to companies running high-performance database applications.

A flash-based drive reads or writes data in 0.2 milliseconds, with a maximum read rate of 100,000 I/O operations per second and a maximum write rate of 25,000 I/O operations per second. This technology is better suited to read-oriented applications.
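The price and performance figures for the two device types can be put side by side with a short calculation. The Python sketch below uses only the numbers quoted above; the dictionary layout and function names are invented for this example, decimal units (1 TB = 1,000 GB) are an assumption, and the flash device is rated by its read rate.

```python
GB_PER_TB = 1_000            # assumption: decimal units
DEVICE_CAPACITY_TB = 2       # both devices in the text are 2 TB

FLASH_DEVICE = {"price_usd": 180_000, "max_iops": 100_000, "latency_ms": 0.2}
DRAM_DEVICE = {"price_usd": 1_000_000, "max_iops": 400_000, "latency_ms": 0.015}


def cost_per_gb(device):
    """Purchase price divided by capacity in gigabytes."""
    return device["price_usd"] / (DEVICE_CAPACITY_TB * GB_PER_TB)


def cost_per_iops(device):
    """Purchase price divided by the maximum I/O operations per second."""
    return device["price_usd"] / device["max_iops"]


for name, device in (("flash", FLASH_DEVICE), ("DRAM", DRAM_DEVICE)):
    print(f"{name}: {cost_per_gb(device):.0f} USD/GB, "
          f"{cost_per_iops(device):.2f} USD per IOPS")
```

On these figures the DRAM-based device costs about five and a half times as much per gigabyte but only modestly more per I/O operation, so which device is better value depends on whether the workload is bound by capacity or by operations per second.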
