Disk is tape, Flash is Disk

Last Update: 2018-12-06 Source: Internet

Author: User



Today I came across a figure from Google that lists the latency of common operations.

The figure reminded me of a saying by Jim Gray that I first ran into while using Redis: "Tape is dead, disk is tape, flash is disk, RAM locality is king." The original source of the saying is here: http://www.signallake.com/innovation/Flash_is_Good.pdf

Here is a Chinese explanation: http://www.infoq.com/cn/news/2008/07/ram-is-disk

Jim Gray contributed a great deal to technology over the past 40 years, and "memory is the new disk, disk is the new tape" is one of his best-known sayings. "Real-time" web applications are emerging, and more and more systems are reaching massive scale. What impact does this trend have on software and hardware?

Before grid computing became a hot topic, Tim Bray discussed the advantages of a RAM- and network-centric hardware architecture, which can be used to build RAM clusters that are faster than disk clusters.

For random data access, memory is several orders of magnitude faster than a hard disk (even the highest-end disk storage systems barely reach 1,000 seeks per second). In addition, as data center networks get faster, the cost of accessing memory over the network keeps falling: reading from another machine's memory over the network is cheaper than reading from disk. As I write this, Sun's InfiniBand product line includes a switch with nine fully interconnected, non-blocking ports, each running at 30 Gbit/sec! The Voltaire product has even more ports. (If you want to follow the latest developments in this kind of ultra-high-performance networking, see Andreas Bechtolsheim's courses at Stanford.)

Tim also spelled out the second half of Jim Gray's saying: for random access, hard disks are painfully slow, but if you treat a hard disk like a tape, it can absorb sequential data at an astonishing rate, which makes it a natural fit for logging and journaling in RAM-dominated applications.

Today, only a few years later, the hardware trends are clear: RAM and networking keep advancing, while hard drives have stalled. Bill McColl described the emergence of massive memory systems for parallel computing:
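To make the comparison concrete, here is a rough back-of-the-envelope sketch in Python. Only the 1,000 seeks per second figure comes from the text above; the 100 ns RAM access time and the 0.5 ms data center round trip are assumed, commonly quoted ballpark values, and the helper name `total_seconds` is made up for illustration.

```python
# Back-of-the-envelope cost of 1,000,000 random reads on three media.
# SEEKS_PER_SECOND comes from the article; the other two constants are
# assumed ballpark values, not measurements.

SEEKS_PER_SECOND = 1_000               # high-end disk system (from the text)
RAM_ACCESS_SECONDS = 100e-9            # assumption: ~100 ns per local RAM access
NETWORK_ROUND_TRIP_SECONDS = 0.5e-3    # assumption: ~0.5 ms inside a data center


def total_seconds(per_access_seconds: float, accesses: int) -> float:
    """Time to perform `accesses` independent random reads, one at a time."""
    return per_access_seconds * accesses


ACCESSES = 1_000_000
disk = total_seconds(1 / SEEKS_PER_SECOND, ACCESSES)
local_ram = total_seconds(RAM_ACCESS_SECONDS, ACCESSES)
# A remote read pays one network round trip plus the RAM access itself.
remote_ram = total_seconds(NETWORK_ROUND_TRIP_SECONDS + RAM_ACCESS_SECONDS, ACCESSES)

print(f"disk:       {disk:10.1f} s")
print(f"remote RAM: {remote_ram:10.1f} s")
print(f"local RAM:  {local_ram:10.1f} s")
```

Under these assumptions, a million random reads take about 1,000 seconds on disk, about 500 seconds from a remote machine's RAM, and about 0.1 seconds from local RAM. The exact values depend entirely on the assumed latencies, but the ordering is the point the paragraph makes.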

Memory is the new disk! Hard disk speeds are improving slowly, while memory chip capacity is growing exponentially. In-memory software architectures promise to improve the performance of all kinds of data-intensive applications by an order of magnitude. Small servers (1U and 2U) will soon ship with a terabyte of memory or more, which will shift the balance between memory and disk in server architectures. The hard disk will become the new tape: a sequential storage medium (sequential disk access is quite fast) rather than a random-access medium (which is very slow). There are plenty of opportunities here, and new products can expect performance gains of 10x to 100x.

Dare Obasanjo pointed out what happens when this saying is not taken seriously, using the troubles Twitter was facing as an example. Discussing how Twitter manages its content, Obasanjo said, "If a design simply mirrors the problem description, your implementation will descend into disk I/O hell. Whether you use Ruby on Rails, Cobol on Cogs, C++, or hand-written assembly, the read/write load will still kill you." In other words, random operations should be pushed to RAM, leaving only sequential operations for the hard disk.
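The pattern in that last sentence can be sketched as an in-memory key-value store backed by an append-only journal. This is a minimal illustration under my own assumptions, not how Twitter or Redis is actually implemented; the class name `JournaledStore` and the JSON line format are hypothetical.

```python
import json
import os
import tempfile
from typing import Optional


class JournaledStore:
    """Hypothetical key-value store: reads hit RAM, writes append to a log."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.data = {}                       # all random access happens here
        self._replay()                       # rebuild RAM state from the journal
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        """Read the journal once, front to back (a sequential scan)."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                key, value = json.loads(line)
                self.data[key] = value       # later entries overwrite earlier ones

    def set(self, key: str, value: str) -> None:
        # Sequential disk write: always appended at the end of the file.
        self._log.write(json.dumps([key, value]) + "\n")
        self._log.flush()
        self.data[key] = value

    def get(self, key: str) -> Optional[str]:
        # Served from RAM only; the disk is never touched on reads.
        return self.data.get(key)

    def close(self) -> None:
        self._log.close()


path = os.path.join(tempfile.mkdtemp(), "journal.log")
store = JournaledStore(path)
store.set("user:1", "alice")
store.set("user:1", "bob")
store.close()

recovered = JournaledStore(path)             # simulates a restart
print(recovered.get("user:1"))               # prints "bob"
recovered.close()
```

The disk only ever sees appends and one front-to-back scan at startup, which is the tape-like usage described above. A production system would also need `fsync`, log compaction, and error handling, all of which are left out of this sketch.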

Below are two related slide decks from SlideShare (the embedded viewers are not reproduced here):
Redis: memory as the new disk, by Tim Lossen
What every data programmer needs to know about disks, shared by iammutex

Computer Systems: A Programmer's Perspective

Let's look at how the classic textbook Computer Systems: A Programmer's Perspective explains caches. The book builds its explanation on the simplest possible program, hello:

The machine instructions of the hello program are initially stored on disk. When the program is loaded, they are copied to main memory. When the processor runs the program, the instructions are copied from main memory into the processor. Similarly, the data string "hello, world\n" starts out on disk, is copied to main memory, and is finally copied to the display device.

Because of physical laws, larger storage devices are slower than smaller ones, and fast devices cost far more to build than comparable slow ones. For example, the disk drive on a typical system may be 1,000 times larger than main memory, but it may take the processor 10 million times longer to read a word from disk than from main memory. To deal with the gap between the processor and main memory, system designers add smaller, faster storage devices called cache memories (or simply caches), which act as temporary staging areas for information the processor is likely to need in the near future.

The idea of inserting a smaller, faster storage device (such as a cache) between the processor and a larger, slower device (such as main memory) has turned out to be a general one. In fact, the storage devices in every computer system are organized as a memory hierarchy (illustrated by a figure in the book). Moving from the top of the hierarchy to the bottom, devices become slower to access, larger in capacity, and cheaper per byte. The register file sits at the top of the hierarchy, at level 0, or L0. Three levels of cache, L1 to L3, occupy levels 1 to 3 of the hierarchy, and main memory occupies level 4.

The main idea of a memory hierarchy is that storage at one level serves as a cache for storage at the next lower level. Thus the register file is a cache for the L1 cache, L1 is a cache for L2, L2 is a cache for L3, L3 is a cache for main memory, and main memory is a cache for the disk. In some networked systems with distributed file systems, the local disk serves as a cache for data stored on the disks of other systems.
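The "each level caches the level below" idea can be sketched as a toy simulation. This is only an illustration under simplifying assumptions: the `Level` and `Disk` classes are hypothetical, capacities are counted in whole blocks, and LRU is used as the eviction policy purely for convenience (real hardware caches use set-associative designs and other policies).

```python
from collections import OrderedDict


class Disk:
    """Bottom of the toy hierarchy: holds every block, never misses."""

    def __init__(self):
        self.reads = 0

    def read(self, address):
        self.reads += 1
        return f"block-{address}"


class Level:
    """One layer of the toy hierarchy, acting as a cache for `lower`."""

    def __init__(self, name, capacity, lower):
        self.name = name
        self.capacity = capacity        # number of blocks this level can hold
        self.lower = lower              # next larger, slower level
        self.blocks = OrderedDict()     # address -> value, least recent first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(address)    # mark as most recently used
            return self.blocks[address]
        self.misses += 1
        # Miss: fetch from the level below, then keep a copy at this level.
        value = self.lower.read(address)
        self.blocks[address] = value
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used
        return value


disk = Disk()
memory = Level("main memory", capacity=8, lower=disk)
l1 = Level("L1", capacity=2, lower=memory)

for address in [0, 1, 0, 1, 2, 0, 1, 2]:
    l1.read(address)

print("L1 hits/misses:", l1.hits, l1.misses)
print("memory hits/misses:", memory.hits, memory.misses)
print("disk reads:", disk.reads)
```

In this run the disk is read only three times, once per distinct block; every later miss in the small L1 level is absorbed by the larger main memory level instead of going back to disk.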

Good night!
