The future of mass storage: the memory cloud

The fastest storage in a computer system is RAM, which has traditionally been used only as main memory. As the relative access performance of hard drives has deteriorated over the years while the cost of RAM has kept falling, many researchers have been exploring whether memory can replace the hard disk altogether.

As early as the 1980s, David DeWitt, Hector Garcia-Molina, and others published papers on main-memory databases. Steven Robbins's article points out that Turing Award winner Jim Gray put the idea even more bluntly: "memory is the new disk, and disk is the new tape" (quoted in a 2006 blog post by Tim Bray on grid computing; Gray had already expressed the idea in a 2003 interview). The IMDG (in-memory data grid) was once a very popular concept. In practice, search engines such as Google and Yahoo keep their entire indexes in DRAM, and Google even keeps snapshots of all the pages on the Internet in memory. Memcached and BigTable are also practical examples of using memory as a primary storage medium. Dare Obasanjo, in his 2008 analysis of Twitter's architecture, likewise observed that the biggest burden on this new class of applications is hard disk I/O, so the trend is to move random operations into RAM and leave only sequential operations to the hard drive.

John Ousterhout, the creator of Tcl, a member of the US National Academy of Engineering, and an ACM Fellow, left industry in 2008 to teach at Stanford. With support from Facebook, Mellanox, NEC, NetApp, and SAP, he leads a team working on RAMCloud (memory cloud) research, which pushes this trend to its extreme. As the name suggests, RAMCloud is a new kind of data center storage system: a large-scale system composed of the main memory of thousands of commodity servers, in which all information is kept at all times in fast DRAM (dynamic random access memory). Memory replaces the hard disk of traditional systems, and the hard disk is used only for backup.

At the end of 2009, Ousterhout's team published the paper "The Case for RAMCloud" in ACM SIGOPS Operating Systems Review, attracting widespread attention. The August 2011 issue of Communications of the ACM carries a paper written by Ousterhout together with his team that describes the theory and practice of the memory cloud more comprehensively and completely.

The paper points out that for the past 40 years the main storage medium of computer systems has been the hard disk; file systems and relational databases were all designed around it. However, while hard disk capacity has grown rapidly (more than 1,000-fold since the mid-1980s), performance has not kept pace: transfer rates have improved only about 50-fold and latency only about 2-fold. Measured by capacity/bandwidth (Jim Gray's rule), the effective access capability of the hard drive has actually deteriorated dramatically, as shown in the figure below (from the CACM paper).
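
To make the capacity/bandwidth yardstick concrete, the sketch below runs the arithmetic with rough, illustrative disk parameters (my own numbers, not the exact figures from the CACM paper): capacity has grown far faster than bandwidth, so the time needed to read everything a drive holds keeps getting longer.

```python
# Rough, illustrative disk parameters -- not the exact figures from the CACM paper.
disks = {
    "mid-1980s disk": {"capacity_mb": 30, "bandwidth_mb_s": 2},
    "2009 disk": {"capacity_mb": 500_000, "bandwidth_mb_s": 100},
}

for name, d in disks.items():
    # Jim Gray's rule: capacity / bandwidth = time to read the whole drive sequentially.
    full_scan_s = d["capacity_mb"] / d["bandwidth_mb_s"]
    print(f"{name}: {full_scan_s:,.0f} s to read the entire drive")
```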

At the same time, the rapid growth of the Internet has dramatically changed the structure of software. As the following illustration shows, unlike traditional applications where data, computation, and application logic all live on one computer, web application architectures typically separate computation from storage: the data center has dedicated application servers for business logic and front-end rendering, plus dedicated storage servers. The application servers are stateless, holding only the state of the current browser request. This separation and statelessness let the system scale well to hundreds of servers and serve millions of users. However, the architecture also aggravates the problem of data access latency: large sites such as Facebook or Amazon may need to make one or two hundred internal requests, touching many hard drives, just to generate a single HTML page. And as the number of servers grows four- or five-fold, application complexity rises sharply, software development becomes harder, and the workload surges.

Hard disk access speed has thus become the main bottleneck in the development of computer systems.

To address the problem of data access latency, engineers and researchers have proposed a variety of solutions: memcached caches, database partitioning, replacing hard drives with flash-based SSDs, asynchronous job scheduling with MapReduce and Hadoop, NoSQL, distributed file systems, and so on.

Ousterhout's team, however, proposed a new solution: RAMCloud (the memory cloud), which moves the primary home of online data from the hard disk into the aggregated DRAM of a large cluster of commodity servers, relegating the hard disk to backup and archival use. Such a memory cloud can be both large-scale (100-1,000 TB) and low-latency: an application in the same data center can access a small amount of memory cloud data in just 5-10 μs, 100-1,000 times faster than current systems.

RAMCloud Overview

The most suitable setting for RAMCloud is a data center that divides its servers into application servers (primarily for generating web pages and executing business rules) and storage servers (which provide long-term shared storage for the application servers). Such data centers generally host many applications: some are small and use only part of one server's capacity, while others are very large and use thousands of dedicated application and storage servers.

RAMCloud differs from other storage systems in two respects:

First, all information is kept in DRAM at all times; it is not a cache like memcached, nor does DRAM merely front an I/O device such as flash. The system must also scale automatically to thousands of storage servers, and that aggregation must be transparent to applications: to the developer it appears as a single storage system.

Second, the information stored in the memory cloud must be as durable as if it were on disk: the failure of a single storage server must not cause data loss, or even a few seconds of unavailability.

Because RAMCloud keeps all data in DRAM, its performance can be 100-1,000 times higher than that of today's highest-performance disk-based storage systems. In terms of access latency, a process running on an application server can read a few hundred bytes over the network from a storage server in the same data center in just 5-10 μs, whereas current systems typically take 0.5-10 ms, depending on whether the data is in the server's memory cache or on disk. In terms of throughput, a multi-core storage server can serve at least one million small read requests per second, while the same machine in a disk-based system can serve only 1,000-10,000 requests per second.
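
A rough, hypothetical calculation (my own illustrative request counts, combined with the latency figures quoted above) shows why per-request latency dominates page generation when a page needs a hundred or more internal fetches:

```python
# Hypothetical page build: ~150 internal data fetches per HTML page, issued as
# sequential "rounds" of 15 parallel requests (dependency chains prevent issuing
# everything at once). Latency figures are the ones quoted in this article.
total_fetches, parallelism = 150, 15
rounds = total_fetches // parallelism            # 10 sequential round trips

latencies_s = {
    "disk store, data on disk": 10e-3,           # ~10 ms per fetch
    "disk store, data cached": 0.5e-3,           # ~0.5 ms per fetch
    "RAMCloud": 5e-6,                            # ~5 us per fetch
}

for name, t in latencies_s.items():
    page_latency = rounds * t                    # each round waits one storage round trip
    print(f"{name}: ~{page_latency * 1e3:.3f} ms of storage latency per page")
```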

The practicality of RAMCloud

Using memory as the primary storage medium across the board raises an obvious concern: cost. The paper addresses this. The table below lists a RAMCloud configuration of 2,000 servers, each with 24 GB of DRAM, giving the memory cloud a total capacity of 48 TB; at 2010 prices the cost averages about $65 per gigabyte. By adding servers, total capacity can reach hundreds of terabytes. By 2020, as DRAM technology keeps improving and prices keep falling, memory cloud capacity could reach 1-10 PB at a cost of only about $6 per gigabyte.
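
The per-gigabyte figure follows from simple arithmetic. The sketch below checks it and back-solves an implied per-server cost from the article's numbers; the implied figure is my own derivation, not a price taken from the paper.

```python
servers = 2000
dram_per_server_gb = 24
cost_per_gb_usd = 65                               # 2010 figure quoted above

total_capacity_gb = servers * dram_per_server_gb   # 48,000 GB = 48 TB
total_cost_usd = total_capacity_gb * cost_per_gb_usd
implied_cost_per_server = total_cost_usd / servers # back-solved; not from the paper

print(f"total capacity: {total_capacity_gb / 1000:.0f} TB")
print(f"total cost:     ${total_cost_usd / 1e6:.2f}M")
print(f"implied cost:   ~${implied_cost_per_server:,.0f} per server")
```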

RAMCloud could already be used in many practical applications. The table below estimates that a large online retailer or a large ticket-booking system could be served by a RAMCloud costing between tens of thousands and hundreds of thousands of dollars. As of August 2009, all of Facebook's non-image data amounted to roughly 260 TB, which probably marks the upper limit for current RAMCloud applications. The memory cloud is not yet suitable for storing data such as videos, photos, and songs, but that situation is expected to change quickly within a few years.

Comparison with other approaches

Many schemes have been proposed to address the data access latency bottleneck of large-scale Internet applications, but each has its own problems. The paper summarizes them.

1. MapReduce

MapReduce breaks an application into a number of parallel steps and organizes data into large sequential reads, which solves many large-scale problems effectively and has made it very popular in recent years. But MapReduce is hard to apply to data that must be accessed randomly. In fact, besides MapReduce, Google has a separate architecture dedicated to graph processing, called Pregel. Incidentally, Grzegorz Malewicz, who initiated the Pregel project, will attend this year's Hadoop in China conference.

RAMCloud, by contrast, offers scalability and low latency at the same time, so it has essentially no such limitation. On the contrary, applications involving large-scale collaboration and statistical language translation, which need to traverse large graph models, are probably among the best uses of the memory cloud.

2. NoSQL

The rise of NoSQL stems mainly from Internet sites reaching an unprecedented scale that conventional relational databases cannot handle. However, the various NoSQL solutions generally cannot match the generality of relational databases, and they are still limited by the performance of hard disk storage.

RAMCloud's goal is to provide a unified storage system that is far more scalable than existing solutions and greatly simplifies development, not only for new applications but also for existing ones, without even requiring the application code to be restructured.

3. Caching

Ideally, caching can bring DRAM-level read/write performance to a system. The problem with caching is that missed data still lives on the hard disk, so even a very small miss rate causes a large performance penalty. Moreover, the increasingly complex links among objects in newer applications (such as friend and follower relationships on Facebook and Twitter) are hard to localize, which forces caches to grow ever larger. At Facebook, for example, by August 2009 roughly 25% of all online data was kept on memcached servers, and if the caches inside the database servers are counted, nearly 75% of the data was actually in memory.
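
A quick calculation (hypothetical hit rates, combined with the latency figures used earlier in the article) shows why even a small miss rate erases most of the benefit of keeping data in DRAM:

```python
t_hit_s = 5e-6     # ~5 us when the requested data is already in DRAM
t_miss_s = 10e-3   # ~10 ms when a miss falls through to the hard disk

for hit_rate in (0.90, 0.99, 0.999):
    # Expected latency = hit_rate * hit_cost + miss_rate * miss_cost
    avg_s = hit_rate * t_hit_s + (1 - hit_rate) * t_miss_s
    print(f"hit rate {hit_rate:.1%}: average latency {avg_s * 1e6:,.0f} us "
          f"({avg_s / t_hit_s:.0f}x slower than all-in-DRAM)")
```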

Clearly, compared with caching, RAMCloud adds only a modest amount of cost (in the Facebook example, roughly one quarter more memory), yet it avoids the problems of access patterns and locality.

4. Flash memory

A RAMCloud could in fact be built with flash memory instead of DRAM, at lower cost and lower energy consumption, but flash still lags DRAM in both latency and throughput. Even on cost alone, DRAM is cheapest for high query rates and small datasets, the hard disk is cheapest for low query rates and large datasets, and flash sits in between.
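
A rough model of that trade-off, in the spirit of the paper's cost analysis but with made-up 2010-era prices and per-server figures of my own: you must buy enough servers to satisfy both the capacity requirement and the query-rate requirement, so the cheapest medium depends on which requirement dominates.

```python
import math

# Per-server characteristics for each medium -- illustrative numbers only.
media = {
    "disk":  dict(cap_gb=2_000, qps=1_000,     cost_usd=2_000),
    "flash": dict(cap_gb=500,   qps=50_000,    cost_usd=3_000),
    "dram":  dict(cap_gb=24,    qps=1_000_000, cost_usd=2_500),
}

def cheapest_medium(dataset_gb, total_qps):
    costs = {}
    for name, m in media.items():
        # Buy enough servers to satisfy BOTH capacity and query rate.
        servers = max(math.ceil(dataset_gb / m["cap_gb"]),
                      math.ceil(total_qps / m["qps"]))
        costs[name] = servers * m["cost_usd"]
    return min(costs, key=costs.get), costs

# Small/hot dataset -> DRAM wins; huge/cold dataset -> disk wins; flash in between.
for dataset_gb, qps in [(1_000, 5_000_000), (100_000, 2_000_000), (1_000_000, 50_000)]:
    best, costs = cheapest_medium(dataset_gb, qps)
    print(f"{dataset_gb:,} GB at {qps:,} queries/s -> cheapest: {best} {costs}")
```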

Flash latency may catch up with DRAM in the future, and newer technologies such as phase-change memory may eventually hold advantages over DRAM. But there is no uncertainty about the benefit of building on DRAM today, and the techniques developed for it (replication mechanisms, cluster management, and low-latency methods) will still apply to those future technologies.

Challenges facing RAMCloud

Low-latency RPC: most current network designs sacrifice latency in favor of throughput. Although high-speed gear such as InfiniBand, Myrinet, and the Arista 7100S already exists, ordinary data centers are built on Ethernet/TCP/IP, and achieving a 5-10 μs round trip (Ousterhout originally envisioned 1 μs RPCs, which now looks quite difficult) requires solving many hardware and software problems.

Durability and availability: DRAM is volatile storage, so achieving at least the same durability and availability as disk is of course critical. The failure of a single server, a data-center-wide power outage, and so on must not cause data loss or service interruption. The simplest solution is to keep multiple replicas in the DRAM of different servers, but that costs too much, and the data would still be lost if the whole data center lost power. What about backing up to disk on each server? If every write had to update the disk synchronously, the latency would be too high and the advantage of the memory cloud would be lost. For this reason, Ousterhout and his colleagues proposed the idea of "buffered logging." The principle is shown in the illustration below.
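
A minimal sketch of the buffered-logging idea as the paper describes it (the class and method names below are my own, not RAMCloud's API): a write updates the master's DRAM and is forwarded to backup servers, which buffer the log entry in their own DRAM and acknowledge immediately; the backups flush buffered entries to disk asynchronously, so no disk write sits on the critical path of a write request.

```python
class Backup:
    """Holds log entries in DRAM and flushes them to disk in the background."""
    def __init__(self):
        self.buffer = []           # entries waiting in DRAM
        self.on_disk = []          # stand-in for the backup's on-disk log

    def buffer_entry(self, entry):
        self.buffer.append(entry)  # DRAM only -- fast, then acknowledge
        return "ack"

    def flush(self):               # runs periodically, off the write path
        self.on_disk.extend(self.buffer)
        self.buffer.clear()


class Master:
    """Keeps the primary copy of all objects in its own DRAM."""
    def __init__(self, backups):
        self.dram = {}
        self.backups = backups

    def write(self, key, value):
        self.dram[key] = value                                 # 1. update primary copy
        acks = [b.buffer_entry((key, value)) for b in self.backups]  # 2. replicate to backups' DRAM
        assert all(a == "ack" for a in acks)
        return "ok"                                            # 3. reply without any disk I/O


# Usage sketch: two backups, one write, asynchronous flush afterwards.
backups = [Backup(), Backup()]
master = Master(backups)
master.write("user:42", {"name": "alice"})
for b in backups:
    b.flush()                                                  # disk writes happen in the background
```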

Cluster management: when a data table becomes large, the RAMCloud software must be able to partition it automatically across multiple servers without affecting running applications. To preserve system throughput, replication should be minimized and performed only where needed to maintain data durability and availability. A simple sketch of such partitioning follows.
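
The paper does not spell out a particular partitioning scheme in this short description; one natural approach, sketched below with hypothetical names of my own, is to split a table into contiguous key ranges and route each key to the server that owns its range, so the application still sees a single table.

```python
import bisect

class PartitionedTable:
    """Toy range-partitioned table: each key is routed to a server by key range.

    Hypothetical sketch -- the names and the scheme are illustrative, not RAMCloud's API.
    """
    def __init__(self, split_points, servers):
        # split_points are range boundaries; len(servers) == len(split_points) + 1
        self.split_points = split_points
        self.servers = servers            # one dict per storage server (stand-ins)

    def _server_for(self, key):
        return self.servers[bisect.bisect_left(self.split_points, key)]

    def write(self, key, value):
        self._server_for(key)[key] = value

    def read(self, key):
        return self._server_for(key)[key]


# Three servers hold the ranges (-inf, "h"), ["h", "p"), ["p", +inf); the client
# library hides which server actually holds a key.
table = PartitionedTable(split_points=["h", "p"], servers=[{}, {}, {}])
table.write("alice", 1)
table.write("zoe", 2)
print(table.read("alice"), table.read("zoe"))
```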

Multi-tenancy: to support the multi-tenant model of cloud computing, the memory cloud needs to support a wide range of applications and provide appropriate billing methods. It also needs access control and security mechanisms for different users, and heavily loaded applications must not degrade the performance of others.

Data model: relational databases are inherently expensive and hard to reconcile with the memory cloud's low-latency goals. The memory cloud will probably need key-value storage with ACID properties.

Concurrency, transactions, and consistency: to what extent the memory cloud should provide transactions is an open question. But the memory cloud's high update speed can significantly reduce transactional conflicts, making it possible to provide atomicity and consistency at a larger scale.

The disadvantages of RAMCloud

RAMCloud's most obvious drawback is its relatively high cost and energy consumption per bit: roughly 50-100 times worse than a disk-based system and 5-10 times worse than a flash-based system, and it also takes up more data center floor space. If a system is cost-constrained and does not need high access performance, RAMCloud is not the ideal solution. Measured per operation, however, RAMCloud's cost and energy consumption are far better than the alternatives. For high-throughput systems, RAMCloud delivers not only high performance but also energy efficiency.

Another drawback is that RAMCloud provides its full performance only within a single data center. For applications spanning multiple data centers, update latency is determined by the distance between the data centers, so RAMCloud has no advantage for writes, though it can still offer lower latency for cross-data-center reads.

The potential impact of RAMCloud

Clearly, if the memory cloud sees large-scale practical use, it will have a broad impact on the computing world.

First, it will spawn new data-intensive applications, such as those involving large-scale graph algorithms, and it will resolve many of the scalability issues that currently sap developer productivity, simplifying the development of large web applications. A RAMCloud of 1,000-10,000 storage servers could support thousands of application servers accessing a 10^14-10^15-byte (100 TB-1 PB) dataset at an aggregate rate of 10^9-10^10 requests per second. And the memory cloud's storage model is flat: any object can be accessed at high speed no matter where it is placed. Second, it would provide the scalable storage infrastructure needed by cloud computing and other data center applications, accelerating the adoption of cloud computing. Third, the memory cloud's demand for very low latency will significantly affect network infrastructure that was originally designed around bandwidth, including protocols and devices. Fourth, it may affect the design and management of data centers: operators will need to think more in terms of low latency, and the memory cloud will drive the adoption of new storage devices. Finally, it will prompt new approaches to server architecture, including new power-management techniques (battery backup, supercapacitors), server designs that strike a new balance among speed, memory capacity, and energy consumption, and automatic management of server clusters.

Comments on the memory cloud

There are many memory-based architectures: an article on High Scalability once mentioned Cisco's UCS, Oracle's Coherence, IBM's WebSphere eXtreme Scale, Terracotta, GigaSpaces, and so on, and SAP's recent heavy push into in-memory computing could also be added to the list. But there are plenty of skeptics about whether a radical project like RAMCloud, which completely demotes the hard disk to a backup device, will actually see practical use, let alone become mainstream.

A Hacker News discussion dismissed it as pure hype. InfoQ collected two dissenting voices in an article in January this year. One is Jeff Darcy, who works on cloud file system development at Red Hat; in a blog post criticizing IMDGs, he argued that so-called memory-based architectures are essentially the same as existing cache systems: they are inevitably limited in capacity, still depend on hard disks, and still use them for recovery after failures, so why not adopt battle-tested caching systems with very mature algorithms? The other is Murat Demirbas, an associate professor at the University at Buffalo (SUNY), who said the cost analysis in the paper is too optimistic: because hard drive prices are falling even faster, memory's cost disadvantage will remain obvious in the long run; when a hard drive costs only $0.07 per gigabyte, who will pay $60 per gigabyte for memory?

Amazon's James Hamilton heard Ousterhout give a related talk for the second time at the end of 2009 and took detailed notes on his blog, calling the talk "thought-provoking" and worth hearing again. But Hamilton, who believes there is no one-size-fits-all solution, disagrees with Ousterhout's predictions that relational databases will disappear and that all future data will live in memory.

Greg Linden, a veteran architect who has worked at Amazon and Microsoft, commented that the in-memory database concept is not new; he had previously thought it suited only special cases and had not expected Ousterhout to predict that the memory cloud would become mainstream.

In fact, more than a year has passed since the original version of the RAMCloud paper was published, and Ousterhout's main views have not changed; on the contrary, the RAMCloud project has made rapid progress. It is worth noting that the reviewers of this CACM paper include Jeff Dean, a principal developer of Google's MapReduce and other projects; James Larus of Microsoft Research; Jeff Rothschild, a co-founder of Veritas; and the computer architecture authority David Patterson, among others. You can also see design reviews of the project by Jeff Dean and others on the RAMCloud project website.

Looking back, once large power grids emerged, the small power plants and generators that factories and residential areas once ran for themselves gradually disappeared; apart from emergency backup power, society basically draws on the grid. Grid companies then developed a series of highly specialized, large-scale generation and transmission technologies that would previously have been unthinkable. One can predict that, as cloud computing becomes mainstream and computing, networking, and storage are provided by a handful of cloud service providers, a range of highly specialized technologies very different from today's will emerge. The technologies that big Internet companies such as Google and Facebook have already built for themselves hint at this trend. So everything is possible, and not necessarily along the linear path we used to imagine; the same goes for the memory cloud.
