Analysis of the evolution and development of network file systems

Network File System (NFS) is a network abstraction over a file system that allows a remote client to access files over the network much as it would a local file system. Although NFS is not the first such system, it has grown into the most powerful and widely used network file system in UNIX® environments. NFS allows a large number of users to share a common file system and, by centralizing data, minimizes the storage space required.

NFS: as useful and evolving as ever

The Network File System (NFS) has evolved continuously since its introduction in 1984 and has become a foundation of distributed file systems. Today, NFS (extended through pNFS) provides scalable access to files distributed across a network. This article explores the ideas behind distributed file systems, with a focus on recent developments in NFS.

Brief NFS history

The first network file system, called File Access Listener, was developed by Digital Equipment Corporation (DEC) in 1976. It was an implementation of the Data Access Protocol (DAP), part of the DECnet protocol suite. As with TCP/IP, DEC published protocol specifications for its networking protocols, including DAP.

NFS was the first modern network file system (built over the IP protocol). It began in the early 1980s as an experimental file system developed internally at Sun Microsystems. The NFS protocol was later documented as a Request for Comments (RFC) standard and evolved into what is known as NFSv2. As a standard, NFS spread rapidly because of its ability to interoperate with other clients and servers.

The standard continued to evolve into NFSv3, defined in RFC 1813. This new protocol scaled better than earlier versions, supporting large files (larger than 2 GB), asynchronous writes, and TCP as the transport protocol, which paved the way for file systems over wide area networks. In 2000, RFC 3010 (later revised by RFC 3530) brought NFS into the enterprise setting: Sun introduced NFSv4 with strong security and a stateful protocol (earlier versions of NFS were stateless). Today, NFS stands at version 4.1 (defined in RFC 5661), which adds support for parallel access across distributed servers (the pNFS extension).

The NFS timeline, including the specific RFCs that document its features, is shown in Figure 1.

Figure 1. NFS protocol timeline

Remarkably, NFS has been under development for almost 30 years. It represents a very stable (and portable) network file system that is scalable, high-performance, and of enterprise quality. As network speeds increase and latencies decrease, NFS remains an attractive option for serving a file system over a network. Even in local network settings, virtualization drives storage into the network to support more mobile virtual machines, and NFS supports these latest computing models to optimize virtualized infrastructures.

NFS Architecture

NFS follows the client-server model of computing (see Figure 2). The server implements the shared file system and the storage to which clients attach. The clients implement the user interface to the shared file system, mounting it into their local file space.

Figure 2. NFS client-server architecture

In Linux®, the virtual file system switch (VFS) provides the means to support multiple file systems concurrently on a single host (such as International Organization for Standardization [ISO] 9660 on a CD-ROM and ext3fs on the local hard disk). The VFS determines which storage a request targets and which file system must be used to satisfy it. For this reason, NFS is a pluggable file system just like any other; the only difference is that its input/output (I/O) requests cannot be satisfied locally and instead must cross the network to be completed.
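
As a small illustration of this pluggability, the sketch below (a hypothetical check, assuming a Linux host with glibc) uses statfs() to ask the VFS which file system backs a given path; the NFS magic number from <linux/magic.h> identifies an NFS mount just as other constants identify ext4 or ISO 9660.

```c
#include <stdio.h>
#include <sys/vfs.h>        /* statfs() */
#include <linux/magic.h>    /* NFS_SUPER_MAGIC, EXT4_SUPER_MAGIC, ISOFS_SUPER_MAGIC */

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/";
    struct statfs fs;

    if (statfs(path, &fs) != 0) {
        perror("statfs");
        return 1;
    }

    /* The VFS returns the same structure regardless of the backing file system. */
    switch (fs.f_type) {
    case NFS_SUPER_MAGIC:
        printf("%s is served by NFS\n", path);
        break;
    case EXT4_SUPER_MAGIC:
        printf("%s is on a local ext4 file system\n", path);
        break;
    case ISOFS_SUPER_MAGIC:
        printf("%s is on an ISO 9660 file system\n", path);
        break;
    default:
        printf("%s is on file system type 0x%lx\n", path, (unsigned long)fs.f_type);
    }
    return 0;
}
```

Running it against a local path and an NFS mount point shows the same VFS interface answering for two very different back ends.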

Once a request is found to be destined for NFS, the VFS passes it to the NFS instance in the kernel. NFS interprets the I/O request and translates it into an NFS procedure (OPEN, ACCESS, CREATE, READ, CLOSE, REMOVE, and so on). These procedures, documented in the corresponding NFS RFC, specify the behaviors within the NFS protocol. Once a procedure is selected for the I/O request, it is carried out in the remote procedure call (RPC) layer. As its name implies, RPC provides the means to perform procedure calls between systems: it marshals the NFS request and its arguments, sends them to the appropriate remote peer, then manages and tracks the response and delivers it back to the requester.

Equally important within RPC is the interoperability layer called External Data Representation (XDR), which ensures that all NFS participants speak the same language when it comes to data types. When a request is issued from a given architecture, the representation of its data types may differ from that of the destination host satisfying the request. XDR converts the types to a common representation so that all architectures can interoperate and share file systems. XDR specifies the byte format for types such as float and the byte ordering for types such as fixed- and variable-length arrays. Although XDR is best known for its use within NFS, it is a useful specification whenever you deal with multiple architectures in a common application setting.
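
To make the idea concrete, here is a minimal sketch (assuming the classic Sun RPC/XDR API from <rpc/xdr.h>, which on modern distributions may require the libtirpc package) that encodes an int and a float into the portable XDR byte format in a memory buffer and decodes them back.

```c
#include <stdio.h>
#include <rpc/xdr.h>   /* xdrmem_create(), xdr_int(), xdr_float() */

int main(void)
{
    char buf[64];
    XDR xdrs;

    /* Encode: host representation -> portable XDR byte stream. */
    int value = 42;
    float ratio = 3.14f;
    xdrmem_create(&xdrs, buf, sizeof(buf), XDR_ENCODE);
    if (!xdr_int(&xdrs, &value) || !xdr_float(&xdrs, &ratio)) {
        fprintf(stderr, "XDR encode failed\n");
        return 1;
    }
    xdr_destroy(&xdrs);

    /* Decode: the same buffer can be read back on any architecture. */
    int decoded_value = 0;
    float decoded_ratio = 0.0f;
    xdrmem_create(&xdrs, buf, sizeof(buf), XDR_DECODE);
    if (!xdr_int(&xdrs, &decoded_value) || !xdr_float(&xdrs, &decoded_ratio)) {
        fprintf(stderr, "XDR decode failed\n");
        return 1;
    }
    xdr_destroy(&xdrs);

    printf("decoded: %d, %.2f\n", decoded_value, decoded_ratio);
    return 0;
}
```

In a real NFS exchange, the RPC layer performs this marshaling automatically for each procedure's arguments and results.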

Once XDR has converted the data to its common representation, the request is transmitted over the network using a transport-layer protocol. Early NFS used the User Datagram Protocol (UDP), but TCP is far more common today because of its superior reliability.

On the server side, NFS operates in a similar fashion. The request travels up the network protocol stack and through RPC/XDR (which converts the data types to the server's architecture) to the NFS server, which is responsible for satisfying the request. The request is passed to the NFS daemon, which identifies the target file system tree, and the VFS is used again to reach that file system on local storage. This entire process is shown in Figure 3. Note that the local file system on the server is a typical Linux file system (such as ext4fs); NFS is therefore not a file system in the traditional sense but rather a protocol for accessing file systems remotely.

Figure 3. Client and server NFS stacks

For higher-latency networks, NFSv4 implements what is called the compound procedure. It allows multiple RPC calls to be embedded in a single request, minimizing the per-request transfer overhead on the network. NFSv4 also implements a callback scheme for responses.

NFS Protocol

From the client's perspective, the first operation in NFS is the mount, which mounts the remote file system into the local file system space. The process begins with a call to mount (a Linux system call), which is routed through the VFS to the NFS component. After establishing the port number for the mount (via a get_port RPC request to the remote server), the client performs an RPC mount request. This request is exchanged with a special daemon responsible for the mount protocol (rpc.mountd), which checks the request against the server's currently exported file systems. If the requested file system exists and the client has access, the RPC mount reply establishes a file handle for the file system. The client stores the remote mount information with the local mount point and is then able to issue I/O requests. Because this ancillary mount protocol represents a potential security problem, NFSv4 replaces it with internal RPC calls to manage mount points.
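
As a rough illustration of the client side of this step, the sketch below (a hypothetical example with a made-up server address and export path; real deployments normally use the mount.nfs helper, which negotiates ports and options) calls the Linux mount(2) system call directly to attach an NFSv4 export to a local mount point.

```c
#include <stdio.h>
#include <sys/mount.h>   /* mount() */

int main(void)
{
    /* Hypothetical server and export; replace with real values.
     * The "addr=" option tells the kernel NFS client which host to contact. */
    const char *source  = "192.168.1.10:/export/home";
    const char *target  = "/mnt/nfs";
    const char *fstype  = "nfs4";
    const char *options = "addr=192.168.1.10";

    /* Requires root privileges and an existing /mnt/nfs directory. */
    if (mount(source, target, fstype, 0, options) != 0) {
        perror("mount");
        return 1;
    }

    printf("mounted %s on %s\n", source, target);
    return 0;
}
```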

To read a file, the file must first be opened. There is no OPEN procedure in this RPC exchange; instead, the client simply checks that the directory and file exist in the mounted file system. The client begins with a GETATTR RPC request for the directory, which returns either the directory's attributes or an indication that the directory does not exist. Next, the client issues a LOOKUP RPC request to see whether the requested file exists. If it does, a GETATTR RPC request is issued for that file, returning its attributes. From these successful GETATTRs and LOOKUPs, the client constructs a file handle that is used for future requests.

The client can then issue a READ RPC request for a file in the remote file system. The READ consists of the file handle, the state, the offset, and the count of bytes to read. The client uses the state to determine whether the operation can be performed (that is, whether the file is locked). The offset indicates where reading should begin, and the count specifies the number of bytes to read. The server may or may not return the requested number of bytes, but it indicates the number of bytes actually returned (along with the data) in the READ RPC reply.
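
None of this is visible to applications: the kernel NFS client performs the procedure sequence on their behalf. The sketch below (assuming /mnt/nfs/data.txt is a hypothetical file on an NFS mount) uses ordinary POSIX calls; on an NFS mount, stat() is typically satisfied by LOOKUP/GETATTR, and pread() maps onto a READ carrying the offset and count.

```c
#include <stdio.h>
#include <fcntl.h>      /* open() */
#include <unistd.h>     /* pread(), close() */
#include <sys/stat.h>   /* stat() */

int main(void)
{
    const char *path = "/mnt/nfs/data.txt";  /* hypothetical file on an NFS mount */
    struct stat st;
    char buf[512];

    /* On NFS, this is typically satisfied by LOOKUP and GETATTR procedures. */
    if (stat(path, &st) != 0) {
        perror("stat");
        return 1;
    }
    printf("size: %lld bytes\n", (long long)st.st_size);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* The offset (100) and count (sizeof(buf)) map onto the READ request. */
    ssize_t n = pread(fd, buf, sizeof(buf), 100);
    if (n < 0) {
        perror("pread");
        close(fd);
        return 1;
    }
    printf("read %zd bytes starting at offset 100\n", n);

    close(fd);
    return 0;
}
```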

Innovations in NFS

The two most recent versions of NFS (4 and 4.1) are among the most interesting and important in its history. Let's look at some of the key aspects of their innovations.

Before NFSv4, a number of ancillary protocols existed for mounting, locking, and other elements of file management. NFSv4 consolidates these into a single protocol and drops UDP as a transport protocol. NFSv4 also integrates support for both UNIX and Windows®-based file-access semantics, extending NFS toward native integration with other operating systems.

NFSv4.1 introduces the concept of parallel NFS (pNFS) for greater scalability and performance. To support this scalability, NFSv4.1 implements a split data/metadata architecture similar to that of clustered file systems. As shown in Figure 4, pNFS splits the ecosystem into three pieces: the client, the server, and the storage. Two paths exist: one for data and one for control. pNFS separates the data layout from the data itself, permitting this dual-path architecture. When a client wants to access a file, the server responds with the layout, which describes the mapping of the file to the storage devices. Once the client has the layout, it can access the storage directly without going through the server (which permits greater scalability and performance). When the client is done with the file, it commits the data (the changes) and the layout. If necessary, the server can recall the layout from the client.

pNFS implements a number of new protocol operations to support this behavior. LAYOUTGET and LAYOUTRETURN retrieve and release a layout from the server, and LAYOUTCOMMIT commits the client's data to storage so that it becomes available to other users. The server recalls a layout from a client using a LAYOUTRECALL callback. Layouts can be spread across multiple storage devices to permit parallel access and higher performance.

Figure 4. NFSv4.1 pNFS Architecture

Both data and metadata are kept in the storage tier. The client may perform direct I/O once it has received a layout, while the NFSv4.1 server handles metadata management and storage. This behavior is not necessarily new, but pNFS adds the ability to support multiple access methods to storage. Today, pNFS supports block-based protocols (Fibre Channel), object-based protocols, and NFS itself (even in a non-pNFS form).

Work on NFS continues with the requirements for NFSv4.2, released in September 2010. The new improvements address the changing nature of storage in virtualized environments. For example, duplication of data is very likely in virtual machine environments (many operating systems read, write, and cache the same data). For this reason, it is desirable for the storage system as a whole to understand where duplication occurs, which preserves cache space at the client and capacity at the storage end. NFSv4.2 proposes a shared block mechanism to address this problem. Because storage systems have begun to integrate processing capabilities on the back end, server-side copy is introduced: when the server can efficiently perform the data copy at the storage back end, the load on the interior storage network is reduced. Other innovations are appearing as well, including sub-file caching for flash storage and client hints for I/O (potentially using mapadvise as the path).
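
On Linux, this kind of server-side copy is exposed to applications through the copy_file_range() system call; on an NFSv4.2 mount with a server that supports it, the kernel can offload the copy so the data never travels to the client. Below is a minimal sketch (the paths are hypothetical, and whether the copy is actually offloaded depends on kernel and server support):

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>      /* open() */
#include <unistd.h>     /* copy_file_range(), close() */
#include <sys/stat.h>   /* fstat() */

int main(void)
{
    /* Hypothetical files on the same NFSv4.2 mount. */
    int in  = open("/mnt/nfs/source.img", O_RDONLY);
    int out = open("/mnt/nfs/copy.img", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0) {
        perror("open");
        return 1;
    }

    struct stat st;
    if (fstat(in, &st) != 0) {
        perror("fstat");
        return 1;
    }

    /* Copy the whole file; with NFSv4.2 server-side copy, the server can
     * perform this internally instead of streaming data through the client. */
    off_t remaining = st.st_size;
    while (remaining > 0) {
        ssize_t copied = copy_file_range(in, NULL, out, NULL, remaining, 0);
        if (copied < 0) {
            perror("copy_file_range");
            return 1;
        }
        if (copied == 0)   /* unexpected end of input */
            break;
        remaining -= copied;
    }

    close(in);
    close(out);
    return 0;
}
```

If the server does not support copy offload, recent kernels typically fall back to copying the data through the client, so the call still behaves correctly.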

Alternatives to NFS

Although NFS is the most popular network file system on UNIX and Linux systems, it is certainly not the only choice. On Windows® systems, Server Message Block (SMB), also known as CIFS, is the most widely used option (and just as Linux supports SMB, Windows also supports NFS).

One of the newest distributed file systems, also supported in Linux, is Ceph. Ceph is designed as a fault-tolerant distributed file system with a UNIX-compatible Portable Operating System Interface (POSIX). For more information about Ceph, see the references.

Other examples include OpenAFS, an open-source version of the Andrew distributed file system (from Carnegie Mellon University and IBM); GlusterFS, a general distributed file system focused on scalable storage; and Lustre, a massively parallel distributed file system aimed at cluster computing. All are open source solutions for distributed storage.

Further steps

NFS continues to evolve, and, much like Linux itself (which supports low-end, embedded, and high-end deployments), NFS provides a scalable storage solution for consumers and enterprises alike. It will be interesting to see where NFS goes next, but judging by its history and recent advancements, it will keep changing the way people view and use network-attached storage (NAS).
