Distributed Systems Notes: NFS, AFS, GFS

CMU 95702 notes on NFS, AFS, GFS.
NFS (Network File System)

Objective: Your files are available from any machine. Distribute the files and we won't have to implement new protocols.

Features: Defines a virtual client/server (C/S) file system. Stateless. Uses RPC over TCP or UDP.

NFS is essentially about sharing disks among computers. A user accesses the network through an NFS client and can access the hard disks of other computer systems on the same network (those computers act as NFS servers). An NFS client can mount part or all of a remote file system for local access, just as it would access a file system on a local disk.
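Mounting can be pictured as a client-side table that maps a local directory to a (server, exported directory) pair. The sketch below is a minimal illustration under that assumption: `MountTable`, `Export`, the host `fileserver.example`, and the paths are all hypothetical, and this is not how the real `mount` command is implemented. Note that the client has to name the server explicitly for every mount.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Export:
    server: str       # host exporting the file system (hypothetical name)
    remote_dir: str   # directory exported by that server


class MountTable:
    """Hypothetical client-side mount table: local directory -> remote export."""

    def __init__(self) -> None:
        self._mounts: dict[str, Export] = {}

    def mount(self, server: str, remote_dir: str, local_dir: str) -> None:
        # The client has to name the server explicitly when mounting.
        self._mounts[local_dir.rstrip("/")] = Export(server, remote_dir.rstrip("/"))

    def resolve(self, path: str) -> tuple[str, str] | None:
        """Map a local path to (server, remote path), or None if it is a local file."""
        # Try the longest mount point first so nested mounts win.
        for local_dir in sorted(self._mounts, key=len, reverse=True):
            if path == local_dir or path.startswith(local_dir + "/"):
                export = self._mounts[local_dir]
                return export.server, export.remote_dir + path[len(local_dir):]
        return None


if __name__ == "__main__":
    table = MountTable()
    table.mount("fileserver.example", "/export/home", "/home")
    print(table.resolve("/home/alice/notes.txt"))  # ('fileserver.example', '/export/home/alice/notes.txt')
    print(table.resolve("/tmp/scratch"))           # None
```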

NFS can access data at a speed close to that of a local disk, and the performance of an NFS client depends directly on the performance of the server and of the network. Factors include:
- the maximum throughput of the network;
- server hardware performance: network card, disks, and so on;
- server-side cache size;
- TCP/IP configuration;
- the number of server-side service instances;
- the number of network files requested by clients;
- file system performance;
- other processes running on the client or server that compete with NFS for resources.

The NFS client translates the user's file operations into RPCs; the NFS server converts those RPCs back into local file operations.
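Because the server is stateless, each read request has to carry everything the server needs: a file handle, an offset, and a byte count. The notion of a "current position" in an open file lives only on the client. The sketch below illustrates that split, assuming a hypothetical in-memory `StatelessServer` in place of a real RPC transport; the class and method names are illustrative and do not reflect the NFS wire format.

```python
from __future__ import annotations

from dataclasses import dataclass


class StatelessServer:
    """Hypothetical NFS-like server: stores file data but no per-client state."""

    def __init__(self, files: dict[bytes, bytes]) -> None:
        self._files = files  # file handle -> file contents

    def read(self, fh: bytes, offset: int, count: int) -> bytes:
        # Each request names the file, position and size, so it can be served
        # without remembering earlier calls (even after a server restart).
        return self._files[fh][offset:offset + count]


@dataclass
class OpenFile:
    fh: bytes        # opaque file handle obtained from the server
    offset: int = 0  # current position: client-side state only


class Client:
    """Translates a local read(fd, n) into a self-contained READ(fh, offset, count)."""

    def __init__(self, server: StatelessServer) -> None:
        self._server = server
        self._table: dict[int, OpenFile] = {}
        self._next_fd = 3

    def open(self, fh: bytes) -> int:
        fd = self._next_fd
        self._next_fd += 1
        self._table[fd] = OpenFile(fh)  # no open call reaches the server here
        return fd

    def read(self, fd: int, n: int) -> bytes:
        f = self._table[fd]
        data = self._server.read(f.fh, f.offset, n)  # stands in for an RPC over TCP or UDP
        f.offset += len(data)
        return data


if __name__ == "__main__":
    client = Client(StatelessServer({b"fh-1": b"hello, world"}))
    fd = client.open(b"fh-1")
    print(client.read(fd, 5))  # b'hello'
    print(client.read(fd, 7))  # b', world'
```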
The main disadvantage of NFS is that the location of the file server is not transparent to the client: the client needs to know the exact address of the server (the mount point). This leads to poor scalability and makes maintenance difficult. Its advantages are that it has been developed for many years, is supported directly by the Linux kernel, and is easy to use.

Figure: NFS architecture

Figure: NFS server operations

Note: the directory and file operations are integrated into a single service.

Figure: NFS client
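Since directory and file operations share one service, a client resolves a pathname by asking that same service about one component at a time. In NFS versions 2 and 3, LOOKUP takes a directory's file handle and a single name and returns the handle for that entry. The `FileService` class and its handles below are hypothetical stand-ins, used only to show the component-by-component walk.

```python
from __future__ import annotations


class FileService:
    """Hypothetical single service holding both directory entries and file data."""

    def __init__(self) -> None:
        # Directory entries: (directory handle, name) -> handle of that entry.
        self.entries: dict[tuple[str, str], str] = {
            ("root", "home"): "h1",
            ("h1", "alice"): "h2",
            ("h2", "notes.txt"): "h3",
        }
        self.data: dict[str, bytes] = {"h3": b"NFS notes"}  # file handle -> contents

    def lookup(self, dir_fh: str, name: str) -> str:
        # One name per call; raises KeyError if there is no such entry.
        return self.entries[(dir_fh, name)]

    def read(self, fh: str, offset: int, count: int) -> bytes:
        return self.data[fh][offset:offset + count]


def resolve(service: FileService, root_fh: str, path: str) -> str:
    """Client side: walk the path, issuing one LOOKUP per component."""
    fh = root_fh
    for name in path.split("/"):
        if name:  # skip empty components from leading or repeated slashes
            fh = service.lookup(fh, name)
    return fh


if __name__ == "__main__":
    service = FileService()
    fh = resolve(service, "root", "/home/alice/notes.txt")  # three LOOKUP calls
    print(fh, service.read(fh, 0, 3))  # h3 b'NFS'
```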

AFS (Andrew File System)

Objective: Scalability

Features (adapted from Coulouris):
Caching: Whole files are cached in client nodes to reduce client-server interactions, which achieves scalability.
A client cache would typically hold several hundreds of files most recently used on that computer.
Permanent cache, surviving reboots. Consider UNIX commands and libraries copied to the client. Consider files only used by a single user.
These last two cases represent the vast majority of cases. Gain: your files are available from any workstation. Principle: make the common case fast.
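The caching policy described above (whole files, the few hundred most recently used, surviving reboots) can be sketched as a least-recently-used cache that is saved to local disk. This is a minimal sketch under stated assumptions: `WholeFileCache`, the JSON store, and the default capacity of 300 are hypothetical choices, and a real AFS client also has to keep cached copies valid, which is omitted here.

```python
from __future__ import annotations

import base64
import json
from collections import OrderedDict
from pathlib import Path


class WholeFileCache:
    """Hypothetical LRU cache of whole files, saved to disk to survive reboots."""

    def __init__(self, store: Path, capacity: int = 300) -> None:
        self.store = store
        self.capacity = capacity  # assumed value for "several hundreds of files"
        self.files: OrderedDict[str, bytes] = OrderedDict()  # oldest entry first
        if store.exists():  # reload what was cached before the reboot
            for name, blob in json.loads(store.read_text()):
                self.files[name] = base64.b64decode(blob)

    def get(self, name: str) -> bytes | None:
        if name not in self.files:
            return None  # miss: the caller must fetch the whole file from a server
        self.files.move_to_end(name)  # mark as most recently used
        return self.files[name]

    def put(self, name: str, contents: bytes) -> None:
        self.files[name] = contents
        self.files.move_to_end(name)
        while len(self.files) > self.capacity:
            self.files.popitem(last=False)  # evict the least recently used file

    def save(self) -> None:
        # Store entries in LRU order so recency is preserved across restarts.
        entries = [[n, base64.b64encode(b).decode("ascii")] for n, b in self.files.items()]
        self.store.write_text(json.dumps(entries))
```

Calling `save()` before shutdown and then constructing a new cache from the same path restores the same files in the same recency order, which is the "permanent cache" property in miniature.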

Open file: When the client tries to open a file:
The client cache is tried first.
If the file is not there, a server is located and called for the file. The copy is stored on the client side and is opened. Subsequent reads and writes hit the copy on the client.

Close file: When the client closes the file, if the file has changed it is sent back to the server. The client-side copy is retained for possible further use.
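The open/close protocol above can be sketched as follows: open fetches the whole file only on a cache miss, reads and writes touch only the local copy, and close sends the file back only if it changed. `Server` and `AfsClient` are hypothetical in-memory stand-ins with no networking and no cache-consistency mechanism. The usage at the end shows a consequence of writing back on close: when two clients update the same file, the last one to close wins.

```python
from __future__ import annotations


class Server:
    """Hypothetical file server holding the master copies."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}

    def fetch(self, name: str) -> bytes:
        return self.files[name]

    def store(self, name: str, contents: bytes) -> None:
        self.files[name] = contents


class AfsClient:
    """Whole-file caching client: work on a local copy, write back on close."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache: dict[str, bytearray] = {}
        self.dirty: set[str] = set()

    def open(self, name: str) -> None:
        if name not in self.cache:  # try the local cache first
            self.cache[name] = bytearray(self.server.fetch(name))  # fetch the whole file

    def read(self, name: str) -> bytes:
        return bytes(self.cache[name])  # hits the local copy, no server call

    def write(self, name: str, data: bytes) -> None:
        self.cache[name][:] = data  # replace the local contents, no server call
        self.dirty.add(name)

    def close(self, name: str) -> None:
        if name in self.dirty:  # only changed files are sent back
            self.server.store(name, bytes(self.cache[name]))
            self.dirty.discard(name)
        # the local copy stays in self.cache for possible further use


if __name__ == "__main__":
    server = Server()
    server.store("doc.txt", b"v0")
    a, b = AfsClient(server), AfsClient(server)
    a.open("doc.txt")
    b.open("doc.txt")
    a.write("doc.txt", b"from A")
    b.write("doc.txt", b"from B")
    a.close("doc.txt")
    b.close("doc.txt")
    print(server.files["doc.txt"])  # b'from B': the last close overwrote A's update
```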

AFS (Andrew File System) is primarily used to manage files distributed across different network nodes. AFS uses secure authentication and flexible access control to provide a distributed file and authorization service that can scale to many clients.

Unlike NFS, AFS gives users a completely transparent, globally unique logical path, which makes it cross-platform and truly distributed. However, because AFS uses the local file system to cache the most recently accessed file blocks, accessing even a locally cached AFS file can be much slower than accessing a local file directly, since extra time-consuming operations are involved. AFS is optimized for reads, while writes are complicated: reads are fast, writes are slow, and it cannot provide good read/write concurrency.

Figure: AFS architecture

Figure: Implementation of file system calls in AFS

Figure: File name space seen by clients of AFS

Figure: System call interception in AFS

Figure: The main components of the Vice service interface

CMU's Coda is an enhanced descendant of AFS.
Very briefly, two important features are:
Disconnected operation for mobile computing.
Continued operation during partial network failures in server network.
During normal operation, a user reads and writes to the file system normally, while the client fetches, or "hoards", all of the data the user has listed as important in the event of network disconnection.
If the network connection is lost, the Coda client serves data from its local cache and logs all updates.
Upon network reconnection, the client moves to the reintegration state: it sends the logged updates to the servers. (From Wikipedia.)

GFS (Google File System)

Objective: Scalability

Features: Reliability with component failures.
Massively large files: Solve problems that Google needs solved, that is, not a massive number of files but files that are massively large. Write once, append, read many times.
Streaming and no cache: Access is dominated by long sequential streaming reads and sequential appends. There is no need for caching on the client. Throughput is more important than latency. Each file is mapped to a set of fixed-size chunks (64 MB per chunk).
3 replicas: Each chunk is replicated on three different chunk servers.
Master and chunk servers: Each cluster has a single master and multiple (usually hundreds of) chunk servers.
The master knows the locations of chunk replicas.
The chunk servers know what replicas they have and are polled by the master on startup.
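The division of knowledge between the master and the chunk servers can be sketched as below: the master keeps each file's ordered list of chunk handles, and learns where replicas live by asking every chunk server what it holds. `Master`, `ChunkServer`, the addresses, and the handle strings are hypothetical names for illustration; persistence and failure handling are left out.

```python
from __future__ import annotations

from collections import defaultdict


class ChunkServer:
    """Hypothetical chunk server: knows which chunk replicas it stores."""

    def __init__(self, address: str, chunks: set[str]) -> None:
        self.address = address
        self.chunks = chunks  # chunk handles stored locally

    def report(self) -> set[str]:
        return set(self.chunks)


class Master:
    """Hypothetical GFS-style master holding metadata only, never file data."""

    def __init__(self, file_chunks: dict[str, list[str]]) -> None:
        self.file_chunks = file_chunks  # file name -> chunk handles, in order
        self.locations: dict[str, set[str]] = defaultdict(set)  # handle -> addresses

    def poll(self, servers: list[ChunkServer]) -> None:
        # On startup the master asks every chunk server which replicas it has.
        for cs in servers:
            for handle in cs.report():
                self.locations[handle].add(cs.address)

    def lookup(self, name: str, chunk_index: int) -> tuple[str, list[str]]:
        handle = self.file_chunks[name][chunk_index]
        return handle, sorted(self.locations[handle])


if __name__ == "__main__":
    master = Master({"/crawl/pages-000": ["c0", "c1"]})
    master.poll([
        ChunkServer("cs1", {"c0", "c1"}),
        ChunkServer("cs2", {"c0", "c1"}),
        ChunkServer("cs3", {"c0"}),
        ChunkServer("cs4", {"c1"}),
    ])
    print(master.lookup("/crawl/pages-000", 1))  # ('c1', ['cs1', 'cs2', 'cs4'])
```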

Very large files, each holding a very large number of HTML documents scanned from the Web. These need to be read and analyzed.
This isn't your everyday use of a distributed file system (such as NFS and AFS). Not POSIX.

Figure: Google physical infrastructure


Operations

Read

Suppose a client wants to perform a sequential read, processing a very large file from a particular byte offset. The client can compute the chunk index from the byte offset. The client calls the master with the file name and chunk index. The master returns the chunk identifier and the locations of the replicas. The client makes a call on a chunk server for the chunk, and the chunk is processed sequentially with no caching. The client may ask for and receive several chunks.

Mutation

Suppose a client wants to perform sequential writes to the end of a file. The client can compute the chunk index from the byte offset; this is the chunk holding the end of the file. The client calls the master with the file name and chunk index. The master returns the chunk identifier and the locations of the replicas, one of which is designated as the primary. The client sends all data to all replicas. The primary coordinates with the replicas to update the file consistently across replicas.
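Both flows start with the same arithmetic: with 64 MB chunks, the chunk index is the byte offset divided by the chunk size, rounded down (for example, offset 200 MB gives 200 / 64 = 3.125, so chunk index 3). The sketch below shows that step plus the request sequence for a read (metadata from the master, data from one chunk server) and for an append to the chunk holding the end of the file (data pushed to every replica, then applied by the primary and the secondaries in the same order). It uses hypothetical in-memory classes; treating the first replica as the primary and ignoring reads that span chunk boundaries are simplifications, not how GFS actually assigns primaries or orders mutations.

```python
from __future__ import annotations

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per chunk


def chunk_index(offset: int) -> int:
    """Which chunk of the file holds this byte offset."""
    return offset // CHUNK_SIZE


class Replica:
    """Hypothetical chunk replica held by one chunk server."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.staged: bytes | None = None  # data pushed by the client, not yet applied

    def push(self, data: bytes) -> None:
        self.staged = data

    def apply(self) -> None:
        self.data += self.staged
        self.staged = None


class Master:
    """Hypothetical master: (file name, chunk index) -> (chunk handle, replicas)."""

    def __init__(self, table: dict[tuple[str, int], tuple[str, list[Replica]]]) -> None:
        self.table = table

    def lookup(self, name: str, index: int) -> tuple[str, list[Replica]]:
        return self.table[(name, index)]


def read(master: Master, name: str, offset: int, length: int) -> bytes:
    _handle, replicas = master.lookup(name, chunk_index(offset))  # metadata only
    start = offset % CHUNK_SIZE  # position inside the chunk
    # Data comes straight from one chunk server; nothing is cached on the client.
    return bytes(replicas[0].data[start:start + length])


def append(master: Master, name: str, end_offset: int, data: bytes) -> None:
    _handle, replicas = master.lookup(name, chunk_index(end_offset))  # chunk at end of file
    primary, secondaries = replicas[0], replicas[1:]  # simplification: first is primary
    for r in replicas:
        r.push(data)  # the client sends the data to all replicas
    primary.apply()  # the primary applies the mutation first...
    for r in secondaries:
        r.apply()  # ...then each secondary applies it in the same order


if __name__ == "__main__":
    print(chunk_index(200 * 1024 * 1024))  # 3
    replicas = [Replica(), Replica(), Replica()]
    master = Master({("log", 0): ("h0", replicas)})
    append(master, "log", 0, b"rec1;")
    append(master, "log", 5, b"rec2;")
    print(read(master, "log", 5, 5))  # b'rec2;'
```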

Original address: http://www.shuang0420.com/2016/12/10/Distributed%20Systems%E7%AC%94%E8%AE%B0%EF%BC%8DNFS%E3%80%81AFS%E3%80%81GFS/
