NetApp Storage Basics: WAFL, NVRAM, RAID, Snapshot

Source: Internet
Author: User
This chapter is the focus. First, let's go over the basic concepts.

WAFL stands for Write Anywhere File Layout. It is similar to other UNIX file systems, such as the Berkeley Fast File System (FFS) and the Transarc Episode file system. Its core idea is "write files anywhere possible."

A conventional file system consists of inodes and data. The inodes sit at a fixed location on the disk (usually near the beginning), and the OS already knows their address. An inode can be thought of as the directory of the file system: after the system starts, it reads the inode file to learn which files the file system contains and at which addresses their data lives.

In WAFL, the inode layer is split in two: the root inode and the ordinary inodes. The root inode sits at a fixed location on the disk and is small, only 128 bytes; it points to the addresses of the inodes. The inodes themselves can be stored anywhere, and an inode file is generally 4 KB.

In fact, the inode layer contains many chained inodes. When one inode file is not large enough to hold all the block addresses of a large file, several inode files serve that file, with an upper-layer inode pointing to the lower-layer inodes. Conversely, when a file is smaller than 4 KB, its data can be stored directly inside the inode, which saves space.

(Thinking: Microsoft's scheme stores only the address of a file's first data block in the metadata, and each data block then points to the next one, so the metadata does not need to be so large; a normal-sized inode is enough. Why doesn't WAFL do this and save inode space? Is it a Microsoft patent? Haha.)
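As a toy illustration of the two-layer inode idea above (all names and sizes here are illustrative, not NetApp code): a fixed-location root inode points at inodes, and each inode either inlines a small file's data directly or lists the block addresses of a larger file.

```python
# Toy sketch of WAFL's two-layer inode idea (illustrative, not NetApp code).
BLOCK_SIZE = 4096

class Inode:
    def __init__(self, data=None, block_addrs=None):
        self.data = data                      # small file: payload lives in the inode
        self.block_addrs = block_addrs or []  # large file: pointers to data blocks

class ToyWAFL:
    def __init__(self):
        self.blocks = {}       # simulated disk: address -> 4 KB block
        self.inodes = {}       # inode number -> Inode (can live anywhere)
        self.root_inode = {}   # fixed location: file name -> inode number
        self.next_addr = 0

    def write_file(self, name, payload: bytes):
        if len(payload) <= BLOCK_SIZE:
            ino = Inode(data=payload)          # small file stored in the inode itself
        else:
            addrs = []
            for off in range(0, len(payload), BLOCK_SIZE):
                self.blocks[self.next_addr] = payload[off:off + BLOCK_SIZE]
                addrs.append(self.next_addr)
                self.next_addr += 1
            ino = Inode(block_addrs=addrs)     # large file: inode holds block addresses
        inum = len(self.inodes)
        self.inodes[inum] = ino
        self.root_inode[name] = inum           # only the root inode has a fixed home

    def read_file(self, name) -> bytes:
        ino = self.inodes[self.root_inode[name]]
        if ino.data is not None:
            return ino.data
        return b"".join(self.blocks[a] for a in ino.block_addrs)
```

Reading always starts at the fixed root inode and fans out from there, which is exactly why (as discussed later) a snapshot can get away with copying only the root inode.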
NVRAM

Strictly speaking, this cannot be called NVRAM; it should be called RAM with a battery. True NVRAM keeps its data without any power supply, while RAM needs continuous power to hold data. NetApp's "NVRAM" is actually battery-backed RAM; it is still referred to as NVRAM below.

Physically, the NVRAM takes one of two forms. In the first, it shares DIMMs with ordinary RAM: a portion of main memory is set aside and protected by a battery, and that portion becomes the NVRAM. In the second, the NVRAM is a separate card with no physical connection to the RAM, although both look like memory. The first form is used in low-end NetApp products, the second in high-end ones.

The RAM holds the data cache, while the NVRAM holds the operation log, similar to a database's archive log. (Thinking: the NVRAM is battery-protected, but what about the RAM? The data cache lives in RAM, and if power is lost, that data is gone. I have asked many people and cannot get a clear, detailed answer...)

RAID

NetApp uses only RAID 4 and RAID-DP. We know that RAID 4 is rarely used in the industry. Why? Because it cannot support concurrent random I/O writes! Why is that?
Start with RAID 3. Its stripes are very short (for example, a few bits per stripe) to improve throughput, which is very effective for large files. But random I/O is impossible, because every random read or write makes all the disks in the array work at the same time: one disk cannot serve one I/O while another disk serves a different I/O.

Hence RAID 4, which simply increases the stripe size of RAID 3, for example from 512 bits to 4 KB, so that small random files (under 4 KB) can be read and written. In fact, RAID 4 handles random reads fine; the problem is writes. Writing a small file smaller than 4 KB requires rewriting data on two disks: the data disk and the parity disk. And in RAID 4, one dedicated disk holds all the parity, so when two small files are to be written concurrently, the parity disk can only absorb one update at a time. The parity disk becomes the bottleneck.

This is the reason for RAID 5, which distributes the parity across all the disks, improving concurrency! So today RAID 3 and RAID 5 are both in use, while RAID 4 gets no love.
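The small-write bottleneck can be seen in a minimal sketch (a toy model, not a real RAID driver). The standard parity update rule for a single-block write is new_parity = old_parity XOR old_data XOR new_data, so every small write also writes the one dedicated parity disk:

```python
# Toy RAID 4 stripe: N data disks plus one dedicated parity disk.
def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class Raid4Stripe:
    def __init__(self, n_data_disks, block_size=4):
        self.data = [bytes(block_size) for _ in range(n_data_disks)]
        self.parity = bytes(block_size)
        self.parity_writes = 0      # counts hits on the single parity disk

    def write_block(self, disk, new_data: bytes):
        # read-modify-write: new_parity = old_parity XOR old_data XOR new_data
        old = self.data[disk]
        self.parity = xor_blocks(self.parity, xor_blocks(old, new_data))
        self.data[disk] = new_data
        self.parity_writes += 1     # the parity disk is touched on EVERY small write

    def reconstruct(self, lost_disk) -> bytes:
        # XOR of the parity and all surviving data blocks recovers the lost block
        block = self.parity
        for i, d in enumerate(self.data):
            if i != lost_disk:
                block = xor_blocks(block, d)
        return block
```

Two small writes to different disks mean two serialized parity-disk writes, which is exactly the bottleneck described above.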
How, then, does NetApp use RAID 4? There is exactly one scenario in which RAID 4 can perform concurrent I/O writes: when two or more small files to be written land in exactly the same stripe! In that case the stripe units on the N data disks are rewritten, and the stripe unit on the parity disk is rewritten only once as well. WAFL makes this happen at the software layer: it arranges for the data written within a period of time to land in the same stripe as much as possible, and when the data is larger than a stripe, in adjacent stripes. RAID 4 is what it is; WAFL cleverly exploits it to optimize performance.
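The same-stripe trick can be sketched as follows (a toy model under the assumption stated above): if pending small writes are batched into one full stripe, the parity is computed once for the whole stripe instead of once per block.

```python
# Toy sketch of batching small writes into one full stripe (illustrative only).
def xor_all(blocks):
    out = blocks[0]
    for b in blocks[1:]:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

def flush_stripe(pending_blocks):
    """pending_blocks: one block (bytes) per data disk of the stripe.
    Returns (data_writes, parity_block): N data-disk writes plus a single
    parity-disk write, instead of one parity write per small file."""
    parity = xor_all(pending_blocks)   # computed once for the whole stripe
    return pending_blocks, parity
```

One parity write now covers N concurrent small-file writes, which is the only case where RAID 4 writes scale.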
RAID-DP (double parity) is in fact similar to RAID 6: both use two parity disks so that the array survives two data disks failing at the same time. But their algorithms differ. In RAID 6, the second parity is generated by multiplying the data by a coefficient. RAID-DP instead generates its second parity by XORing across different stripes: of its two parity disks, one holds horizontal (row) stripe parity and the other holds diagonal stripe parity. I will not go into the details here; I will cover RAID in a separate article.
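As a rough illustration of the two parity sets (the real RAID-DP diagonal layout is a specific published scheme; this toy version only shows that the second parity is another XOR, taken across diagonals rather than produced by a coefficient as in classic RAID 6):

```python
# Toy row + diagonal parity (illustrative; not the actual RAID-DP layout).
def xor_all(values):
    out = 0
    for v in values:
        out ^= v
    return out

def double_parity(stripes):
    """stripes: list of rows, each row one int 'block' per data disk.
    Returns (row_parity, diag_parity), two independent XOR parity sets."""
    n = len(stripes)
    row_parity = [xor_all(row) for row in stripes]
    # toy diagonal assignment: block (r, c) belongs to diagonal (r + c) mod n
    diag_parity = [0] * n
    for r, row in enumerate(stripes):
        for c, block in enumerate(row):
            diag_parity[(r + c) % n] ^= block
    return row_parity, diag_parity
```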
Snapshot

In general, a snapshot captures the current data at a certain point in time and "records" it, so that the data as of the snapshot moment can be used later. Every vendor follows the same methodology but implements it differently. The methodology is to copy the inodes: because the inode file is very small, the copy completes almost instantly. The actual implementations differ mainly on three questions:

1. When the snapshot starts, what should be done about data still cached in memory?
2. What inode information, or which inodes, are actually copied?
3. After the snapshot completes, how is the snapshotted data protected? That is, how do we ensure the snapshot is unchanged when data is rewritten after the snapshot?

First, how does the system protect data that is still cached in memory? That data belongs to the snapshot, but it has not yet been written to the disk, and the on-disk inodes carry no information about it. So the system must include it in the snapshot operation. Every vendor implements this differently, and I don't know how the others do it; NetApp flushes the in-memory data and the in-memory inode updates to the hard disk, updates the block-map file, and updates the disk cache (note: the cache on the hard disk itself, which should also be protected).

Second, other vendors may copy the whole inode file as the snapshot, while NetApp copies only the root inode file.

Third, NetApp protects the snapshot data with redirect-on-write (calling it copy-on-write is inaccurate; essentially it is redirect-on-write). That is, after a snapshot completes, deleting data only removes the corresponding entries from the root inode and the inodes; the real data is not deleted. Rewriting data leaves the original data unchanged: the rewritten data is written entirely elsewhere, and the inode is pointed at the new address instead of reusing the original one.

(Thinking: how can the cache inside a hard disk be trusted? Its safety is very low; if power is lost, the disk's own cache cannot be protected!!)

The above are the basic concepts; I will repost relevant articles with details later. That is the extent of my understanding of these technologies. Next, let's consider these technologies as a whole, because they are closely related.
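The redirect-on-write scheme just described can be sketched in a toy model (illustrative names only): a snapshot copies just the tiny root mapping, overwrites always go to fresh blocks, and blocks referenced by a snapshot are never touched.

```python
# Toy redirect-on-write snapshot (illustrative, not NetApp code).
class RowFS:
    def __init__(self):
        self.blocks = {}      # address -> data; never overwritten in place
        self.root = {}        # active "root inode": name -> block address
        self.snapshots = {}   # snapshot name -> frozen copy of the root mapping
        self.next_addr = 0

    def _alloc(self, data):
        addr = self.next_addr
        self.blocks[addr] = data
        self.next_addr += 1
        return addr

    def write(self, name, data):
        # redirect-on-write: always allocate a new block and repoint the root
        self.root[name] = self._alloc(data)

    def snapshot(self, snap_name):
        # copying the (tiny) root mapping is the entire snapshot operation
        self.snapshots[snap_name] = dict(self.root)

    def read(self, name, snap_name=None):
        root = self.snapshots[snap_name] if snap_name else self.root
        return self.blocks[root[name]]
```

After an overwrite, the active root points at the new block while the snapshot's copy of the root still points at the old one, so both versions remain readable.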
WAFL features:

1. The root inode sits at a fixed position on the hard disk, while data and inodes can be stored anywhere.
2. WAFL places data and its inodes in adjacent locations, and tries to put the data to be written on the same stripe of the RAID 4 array.
3. A block-map file records which file-system image is using each block: for example, whether it is used by the current file system or by a snapshot taken at some point in time.
4. WAFL never overwrites old blocks.
5. At each checkpoint, WAFL flushes not only the actual data to new blocks but also the inode metadata to new blocks, instead of overwriting the old blocks. This is why a snapshot dares to copy only the root inode.

(Thinking: in fact, Data ONTAP at this level is not RAID-aware; it is only responsible for storing data on adjacent blocks.)
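Feature 3's block-map file might be imagined like this (an assumed layout, purely for illustration): each block address carries a set of bits saying which file-system image still references it, and a block is free for reuse only when no bit is set.

```python
# Assumed, illustrative block-map layout (not NetApp's actual on-disk format).
ACTIVE = 1 << 0   # bit 0: the active file system references this block
SNAP1  = 1 << 1   # bit 1: snapshot #1 references it
SNAP2  = 1 << 2   # bit 2: snapshot #2 references it

def block_is_free(entry: int) -> bool:
    # a block can be reused only when no image references it at all
    return entry == 0

blockmap = {
    100: ACTIVE | SNAP1,  # used by the active FS and by snapshot 1
    101: SNAP2,           # only snapshot 2 still holds it: not reusable yet
    102: 0,               # unreferenced: free for new writes
}
```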
Write process:

The RAM holds the data waiting to be written to the hard disk; this is the data cache. The NVRAM holds the operation log, with records like "create a file whose attributes are XX" and "delete XX content in file XX".

The NVRAM space is divided evenly into two halves, which are used in turn. When one half fills past a certain threshold, or 10 seconds elapse (the consistency-point trigger), a flush occurs: based on the log in NVRAM, the inodes are written to the hard disk and that half of the NVRAM log is cleared. The inodes are written, but what about the data? In fact, even before the CP, the data was constantly being written to the hard disk! The flush writes only the inodes. Once the flush starts, the other half of the NVRAM takes over from the original half; at the next CP they switch back and the first half takes over again. (Note: in a cluster, the NVRAM is divided into four identical spaces; two are used by the local node and the other two by the partner node.)
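The alternating-halves log can be sketched as follows (toy model; the 10-second timer is omitted here, so only the size threshold triggers a consistency point):

```python
# Toy two-half NVRAM operation log (illustrative, not NetApp code).
class NvramLog:
    def __init__(self, half_capacity=4):
        self.halves = [[], []]         # the two halves used in turn
        self.active = 0                # index of the half absorbing new ops
        self.half_capacity = half_capacity
        self.flushed = []              # record of completed consistency points

    def log_op(self, op: str):
        self.halves[self.active].append(op)
        if len(self.halves[self.active]) >= self.half_capacity:
            self._consistency_point()

    def _consistency_point(self):
        # switch halves FIRST so new operations keep flowing during the flush
        full, self.active = self.active, 1 - self.active
        self.flushed.append(list(self.halves[full]))  # "flush" metadata to disk
        self.halves[full].clear()                     # then clear that half's log
```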
WAFL hands the data and its locations down to the RAID layer. The RAID layer knows nothing about inodes or the root inode; RAID only knows data and the LBA addresses at which that data must be stored.

Looking at the lower layer: the data and inodes from the WAFL layer are written by RAID onto the same stripe, and of course RAID also computes the parity and writes it to the parity disk of that same stripe. The root inode from the WAFL layer is written to a fixed position on the hard disk (presumably near the beginning?). Moving the head from the data/inode location to the root inode location costs a certain seek time, and that time cannot be saved. The order of writing is data/inodes first, then the root inode, so there is seek time involved here too.

I used to have a question about this: an ordinary file system spends seek time moving between data and inodes, and WAFL likewise spends seek time moving from data/inodes to the root inode, so it does not seem to reduce seek time at all. After some thought, I can only guess: NetApp writes the root inode once, after writing all the data/inodes of a CP, which saves some of the seeking. This question needs more consideration later.
For example, after a power failure, WAFL does not have to spend a long time performing consistency checks, because it always writes the data first and the root inode last.

WAFL updates the consistency point every 10 seconds. This consistency point is actually an internal snapshot, though one that cannot be accessed by users.

When power is lost, the NVRAM preserves the log, that is, the operation records. The file system therefore knows the latest snapshot and every operation performed before the power loss. Moreover, the log recorded by NVRAM is excellent.
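The recovery path just described (restore the last consistency point, then replay the operations still held in NVRAM) can be sketched in a highly simplified, assumed form:

```python
# Assumed, highly simplified crash recovery via log replay (not NetApp code).
def recover(last_cp_state: dict, nvram_log: list) -> dict:
    """last_cp_state: file-system image at the last consistency point.
    nvram_log: operations logged since then, as (op, name, value) tuples."""
    state = dict(last_cp_state)        # start from the consistent on-disk image
    for op, name, value in nvram_log:  # re-apply logged operations in order
        if op == "write":
            state[name] = value
        elif op == "delete":
            state.pop(name, None)
    return state                       # no full consistency check required
```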
The NVRAM log records tens of thousands of operations in a small amount of space. No file-system consistency check is needed, and the file system can be restored in a short time. However, there is a problem: the data in RAM has no protection at all! To this day I do not know how NetApp handles the RAM. If the data in RAM is not logged, it is lost forever on power failure. Perhaps NetApp only wants to save the consistency-check time and does not care that much about protecting that data?

Attached: hard disk space diagram.



