Distributed File Systems for Big Data Storage (I)

1. Google File System (GFS)
GFS uses large numbers of cheap commodity machines to support large-scale data processing.

GFS Client: the access interface for applications.

Master server: the management node. Logically there is only one (there is also a "shadow server" that can supply metadata when the master server fails, but it is not a full hot standby). It stores the metadata and is responsible for managing the entire file system.

Chunk server (data server): responsible for the actual storage work; data is stored as files on the chunk servers, which serve the read and write requests from GFS clients.


Overall architecture:


The process of reading data:

An application submits a read request: read a file, starting at position p, for L bytes of data.

After the GFS client receives this request, it converts it internally: because the chunk size is fixed, the position p and the size L determine which chunk (or chunks) of the file the requested data falls in, so the request is converted into the form <file name, chunk index>.
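
As a concrete illustration, here is a minimal sketch (not GFS code) of how a byte range (p, L) maps to chunk indices, assuming GFS's default 64 MB chunk size; the helper name is made up for this example.

```python
# A minimal sketch of mapping a byte range (p, L) onto chunk indices,
# assuming the GFS default chunk size of 64 MB.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB

def byte_range_to_chunks(p, L):
    """Return (chunk index, start offset, end offset) tuples covering [p, p+L)."""
    first = p // CHUNK_SIZE
    last = (p + L - 1) // CHUNK_SIZE
    ranges = []
    for idx in range(first, last + 1):
        start = max(p, idx * CHUNK_SIZE) - idx * CHUNK_SIZE
        end = min(p + L, (idx + 1) * CHUNK_SIZE) - idx * CHUNK_SIZE
        ranges.append((idx, start, end))
    return ranges

# e.g. a 10 MB read starting 60 MB into the file spans chunks 0 and 1
print(byte_range_to_chunks(60 * 1024 * 1024, 10 * 1024 * 1024))
```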

The GFS client then sends this request to the master server. Because the master server keeps the management information, it knows which chunk servers hold the data; it translates the chunk index into the system-wide unique chunk handle and sends these two pieces of information back to the GFS client.

The GFS client then establishes a connection with the corresponding chunk server, sends the chunk handle and the read range, and the chunk server returns the requested data to the GFS client after receiving the request.
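
Putting the two steps together, the read path might look like the following sketch. It reuses the byte_range_to_chunks helper above; the master and chunk-server interfaces (lookup_chunk, read_range) are hypothetical stand-ins, not real GFS APIs.

```python
# A minimal sketch of the two-step read path described above.
def gfs_read(master, chunkservers, filename, p, L):
    data = bytearray()
    for idx, start, end in byte_range_to_chunks(p, L):
        # Step 1: ask the master which servers hold this chunk and what its
        # system-wide handle is (metadata only, no file data).
        handle, locations = master.lookup_chunk(filename, idx)
        # Step 2: read the byte range directly from one replica; the file
        # data itself never flows through the master.
        data += chunkservers[locations[0]].read_range(handle, start, end)
    return bytes(data)
```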


The advantages and disadvantages of using this master-slave structure:

Benefits: Relatively simple management

Disadvantage: many service requests go through the master server, which easily becomes the bottleneck of the whole system and may be a single point of failure.


PS: For data redundancy, each chunk has multiple replicas stored on different chunk servers.


The process of writing data: The GFS system must apply each write to all replicas of a chunk. For ease of management, GFS selects one replica as the primary replica and the others as secondary replicas; the primary replica determines the order in which the secondary replicas write the data.

The GFS client first communicates with the master server to learn which chunk servers store the chunk to be written, including the addresses of the primary and the secondary replicas, and then pushes the data to be written to all of the replicas. Each replica first places the data in a cache and then notifies the GFS client whether this succeeded. Once all replicas have acknowledged, the GFS client tells the primary replica to perform the write; the primary writes its cached data to the chunk and instructs the secondary replicas to write the data in the order it specifies. After the secondary replicas report to the primary that they have finished writing, the primary notifies the GFS client that the write operation completed successfully. If the data to be written spans chunk boundaries or needs multiple chunks to hold it, the client automatically decomposes it into multiple write operations.
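
A minimal sketch of this primary/secondary write protocol is shown below; the replica objects and their methods (cache, commit) are hypothetical stand-ins for the real chunk-server RPCs, not GFS's actual interface.

```python
# A minimal sketch of the primary/secondary write protocol described above.
def gfs_write(client_id, data, primary, secondaries):
    replicas = [primary] + secondaries

    # Phase 1: push the data to every replica's cache (data flow).
    if not all(r.cache(client_id, data) for r in replicas):
        return False  # some replica failed to buffer the data

    # Phase 2: ask the primary to commit; it assigns a serial order so that
    # every replica applies the write in the same order (control flow).
    serial = primary.commit(client_id)
    for s in secondaries:
        s.commit(client_id, serial)   # secondaries write in the primary's order
    return True
```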

Colossus is Google's next-generation distributed file system after GFS. Several of its improvements are as follows:

1) The single master server is replaced by a cluster of master servers, with all the management data sharded and distributed across the different masters.

2) Keeping multiple replicas of each chunk increases system availability but also the storage cost; a common compromise is to use an erasure-code algorithm instead.

3) The Colossus client can specify where data is stored, as required.
PS: The basic principle of the erasure-code algorithm (Reed-Solomon, RS) is as follows:

Given n data blocks D1, D2, ..., Dn and a positive integer m, RS generates m check blocks C1, C2, ..., Cm from the n data blocks. For any n and m, the original data can be decoded from any n of the n + m blocks (data blocks plus check blocks); that is, RS tolerates the loss of at most m blocks, whether data or check blocks, at the same time. (An erasure code can only tolerate data loss, not data tampering; that is where the name "erasure" comes from.)
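
To make the idea concrete, here is a minimal sketch of the simplest case m = 1: a single XOR parity block lets us recover any one lost data block (RAID-5 style). Real RS codes generalize this to arbitrary m using arithmetic over finite fields; the code below is illustrative only.

```python
# Simplest erasure code: n data blocks plus one XOR check block (m = 1).
def make_parity(blocks):
    """XOR all data blocks together to form one check block."""
    parity = bytes(len(blocks[0]))
    for b in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, b))
    return parity

def recover(blocks, parity, lost_index):
    """Rebuild the single lost data block from the survivors and the parity."""
    rebuilt = parity
    for i, b in enumerate(blocks):
        if i != lost_index:
            rebuilt = bytes(x ^ y for x, y in zip(rebuilt, b))
    return rebuilt

data = [b"AAAA", b"BBBB", b"CCCC"]           # n = 3 data blocks
p = make_parity(data)                        # m = 1 check block
assert recover(data, p, lost_index=1) == b"BBBB"
```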


For more details on erasure codes, refer to: http://blog.sina.com.cn/s/blog_3fe961ae0102vpxu.html

About data reliability: cold data uses erasure-coded redundancy, while hot data uses replica redundancy (the former reduces the storage cost, the latter reduces the computational cost, which should be easy to understand).



2. HDFS
HDFS is the large-scale distributed file system in Hadoop. Its overall architecture is roughly the same as that of GFS, but simplified; for example, it allows only one client at a time to append to a file.

It has the following characteristics:

1) suitable for storing very large files

2) suitable for streaming data reads, i.e., the "write once, read many times" data processing pattern

3) Suitable for deployment on inexpensive machines

However, HDFS is not suitable for the following scenarios (everything has two sides; only the technology that suits your business is really good technology):

1) Not suitable for storing a large number of small files, because of the NameNode memory size limit

2) Not suitable for real-time data reads; high throughput and low latency are at odds, and HDFS chooses the former

3) Not suitable for scenarios that require frequent data changes



Overall architecture

It consists of the NameNode, the DataNodes, the Secondary NameNode, and the client.
NameNode: responsible for managing the metadata of the entire distributed file system, including the file directory tree structure, the file-to-block mapping, the block replicas and their storage locations, and other management data. This data is kept in memory. The NameNode is also responsible for monitoring DataNode status, passing management information and data information through heartbeats.
Secondary NameNode

Its responsibility is not to be a hot standby for the NameNode, but to periodically pull the fsimage file (the on-disk image of the in-memory namespace metadata) and the editlog file (a write-ahead log of the metadata operations: each operation is recorded in this file before the in-memory data changes, to prevent data loss) from the NameNode, merge the two files into a new fsimage file, and return it to the NameNode, in order to relieve the NameNode's workload.
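
The merge step amounts to replaying the edit log over the last fsimage to produce a new, compact fsimage. The following sketch illustrates the idea with made-up on-disk formats and operation names, not the real HDFS formats.

```python
# A minimal sketch of the checkpoint idea: replay the edit log over the last
# fsimage to produce a new fsimage (illustrative formats only).
import json

def checkpoint(fsimage_path, editlog_path, new_fsimage_path):
    with open(fsimage_path) as f:
        namespace = json.load(f)          # e.g. {"/a.txt": ["blk_1", "blk_2"]}

    with open(editlog_path) as f:
        for line in f:                    # one logged metadata operation per line
            op = json.loads(line)
            if op["type"] == "create":
                namespace[op["path"]] = op["blocks"]
            elif op["type"] == "delete":
                namespace.pop(op["path"], None)

    with open(new_fsimage_path, "w") as f:
        json.dump(namespace, f)           # the merged image replaces the old one
```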
DataNode
Similar to the chunk server in GFS, it is responsible for the actual storage, reading, and writing of data blocks.
Client
Similar to the GFS client, the HDFS client contacts the NameNode to obtain the metadata of the file to be read or written; the actual data reads and writes are done by communicating directly with the DataNodes.
HA scheme
The master servers consist of an Active NameNode (ANN) and a Standby NameNode (SNN). The ANN is the server currently responding to client requests, and the SNN is a cold or hot standby machine. For the SNN to be a hot standby, all of its metadata must stay consistent with the ANN's metadata. This is ensured by the following two points: 1. Third-party shared storage is used to save namespace metadata such as directories and files. In essence this turns the single-point-of-failure problem of the NameNode into a single-point-of-failure problem of the third-party storage, but the third-party storage has strong redundancy and fault-tolerance mechanisms, so its reliability is much higher. 2. All DataNodes send heartbeat information to both the ANN and the SNN simultaneously.
A failover controller, independent of the NameNodes, is added; when the ANN fails, it elects the SNN as the new primary server. When the Hadoop system first starts, both NameNodes are SNNs, and one of them is elected to become the ANN. Isolation (fencing) measures are included to prevent split brain (i.e., multiple active master servers at the same time):
1) At any moment only one NameNode can write to the third-party shared storage
2) Only one NameNode can issue delete commands related to managing the data replicas
3) At any moment only one NameNode can respond to client requests
Shared-storage solution: QJM (Quorum Journal Manager). Using a Paxos-style protocol, the NameNode's editlog is written to 2f+1 JournalNodes, and each write operation is considered successful once a majority (f+1) of them return success; the system can therefore tolerate the simultaneous failure of up to f JournalNodes.
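A minimal sketch of this quorum-write idea is shown below; the journal-node interface (append) is a hypothetical stand-in, not the real QJM RPC, and a real implementation would issue the calls in parallel.

```python
# A minimal sketch of quorum writes: succeed once a majority of the 2f+1
# journal nodes acknowledge the log record.
def quorum_write(journal_nodes, txid, record):
    acks = 0
    for jn in journal_nodes:               # in practice these calls run in parallel
        try:
            jn.append(txid, record)
            acks += 1
        except ConnectionError:
            pass                            # a failed or slow node just costs one ack
    majority = len(journal_nodes) // 2 + 1  # f + 1 out of 2f + 1
    return acks >= majority                 # tolerate up to f failed journal nodes
```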
NameNode federation. Core idea: a large namespace is cut into several sub-namespaces, each managed by a separate NameNode; the NameNodes are independent of each other, and all the DataNodes are shared. The mapping between the DataNodes and the sub-namespaces is established by a block management layer, which consists of a number of block pools of data blocks. Each data block belongs to exactly one block pool, and a sub-namespace can correspond to multiple block pools.
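The sketch below illustrates this layout with made-up names: sub-namespaces map to block pools, while each shared DataNode stores blocks from several pools.

```python
# Illustrative federation layout: independent sub-namespaces, shared DataNodes.
from collections import defaultdict

namespace_to_pools = {                      # each NameNode manages one sub-namespace
    "/user":    ["pool-1"],
    "/project": ["pool-2", "pool-3"],
}

# DataNodes are shared: each one holds blocks from several block pools.
datanode_blocks = defaultdict(dict)
datanode_blocks["dn-07"]["pool-1"] = ["blk_101", "blk_102"]
datanode_blocks["dn-07"]["pool-2"] = ["blk_201"]

def pools_for_path(path):
    """Find which block pools back a given sub-namespace."""
    for ns, pools in namespace_to_pools.items():
        if path.startswith(ns):
            return pools
    raise KeyError(path)

print(pools_for_path("/project/data.csv"))  # ['pool-2', 'pool-3']
```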
PS: HDFS still looks a bit foggy to me; I will review it again later when I study Hadoop.
