Application practice of cloud storage based on the Hadoop platform

Keywords: cloud storage, practice, storage devices, bandwidth

Cloud computing is an Internet-based supercomputing model in which thousands of computers and servers are connected into a "cloud" hosted in remote data centers. Users access the data center from PCs, notebooks, mobile phones, and other devices, and perform computation according to their own needs. There is still no universally agreed definition of cloud computing, but from the description above we can summarize its essential characteristics: distributed computing and storage, high scalability, ease of use, and ease of management.

1 Cloud storage architecture diagram

In Figure 1, the orange parts are the storage nodes, which are responsible for storing files; the blue part is the control node, which maintains the file index and monitors capacity and load balancing across the storage nodes. Together, the two parts form the cloud storage system. Both storage nodes and control nodes are simple servers. A storage node only contributes its hard disks and does not need RAID functionality; it only has to be able to run Linux. The control node, in order to protect its data, needs a simple RAID 0+1 capability.
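Purely as an illustration of this division of labor (not the design of any particular product), the control node's file index and load-balancing role can be sketched as below; all class and method names here are hypothetical:

    import java.util.*;

    // Illustrative sketch of a control node: a file index plus a simple
    // least-loaded placement rule. Names are hypothetical, not a real product API.
    public class ControlNodeIndex {
        // file path -> storage nodes that hold a copy of the file
        private final Map<String, List<String>> fileIndex = new HashMap<>();
        // storage node -> bytes currently stored on it
        private final Map<String, Long> nodeLoad = new HashMap<>();

        public void addStorageNode(String node) {
            nodeLoad.putIfAbsent(node, 0L);
        }

        // Choose the least-loaded storage node for a new file (simple load balancing).
        // At least one storage node must have been registered first.
        public String placeFile(String path, long sizeBytes) {
            String target = Collections.min(nodeLoad.entrySet(),
                    Map.Entry.comparingByValue()).getKey();
            fileIndex.computeIfAbsent(path, k -> new ArrayList<>()).add(target);
            nodeLoad.merge(target, sizeBytes, Long::sum);
            return target;
        }

        // Look up which storage nodes hold a file.
        public List<String> locate(String path) {
            return fileIndex.getOrDefault(path, Collections.emptyList());
        }
    }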

Cloud storage is not meant to replace existing disk arrays; rather, it is a new form of storage system built to cope with rapidly growing data volumes and bandwidth demands. Cloud storage is therefore typically designed with the following three points in mind:

(1) Simple capacity and bandwidth expansion

Expansion requires no downtime: the capacity of a newly added storage node is automatically brought into the original storage pool, and no complicated configuration is needed.

Figure 1 Cloud storage architecture diagram

(2) Linear growth of bandwidth

Many customers adopting cloud storage are planning for future bandwidth growth, so the quality of a cloud storage product's design makes a great difference: some products saturate at just over ten nodes, which hurts future bandwidth expansion. This must be clarified beforehand; by the time you discover the product cannot meet demand, you may already have bought hundreds of TB, and it is too late for regret.

(3) Easy management

2 Key technologies for cloud storage

Cloud storage must provide nine main elements: ① performance, ② security, ③ automatic ILM (information lifecycle management) storage, ④ storage access modes, ⑤ availability, ⑥ primary data protection, ⑦ secondary data protection, ⑧ storage flexibility, and ⑨ storage reporting.

The development of cloud computing is inseparable from core technologies such as virtualization, parallel computing, and distributed computing. These are described below:

(1) Cluster technology, grid technology, and distributed file systems

A cloud storage system is a collection of many storage devices, many applications, and many services; a storage system at any single point is not cloud storage.

Since cloud storage is composed of multiple storage devices, cluster technology, distributed file systems, and grid computing are needed to make those devices cooperate, so that many storage devices can present one and the same service externally and deliver greater, stronger data access performance. Without these technologies, cloud storage could not truly be implemented; the so-called cloud storage would be only a single storage system, not a cloud-like structure.
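A minimal sketch of this single-namespace idea, assuming the standard Hadoop HDFS client API; the NameNode address and file path below are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    public class HdfsSingleNamespace {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode address; in a real cluster this comes from core-site.xml.
            conf.set("fs.defaultFS", "hdfs://namenode:9000");
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/demo/hello.txt");   // one logical path, many physical disks
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("hello cloud storage\n".getBytes(StandardCharsets.UTF_8));
            }

            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
                System.out.println(in.readLine());     // the client never names a storage device
            }
            fs.close();
        }
    }

The application only ever sees one file system; where the bytes physically land is decided by the storage layer, which is exactly the cooperation that cluster and distributed file system technology provides.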

(2) CDN content distribution, P2P technology, data compression, data deduplication, and data encryption technology

CDN content distribution and data encryption technology ensure that data in cloud storage cannot be accessed by unauthorized users, while a variety of data backup and disaster-tolerance techniques ensure that data in cloud storage is not lost, guaranteeing the security and stability of the cloud storage service. If data security in cloud storage cannot be guaranteed, no one will dare to store their data in the cloud.
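As a sketch of the encryption aspect only (key management and the actual upload are omitted, and this is an illustrative example rather than the design of any particular product), data can be encrypted on the client with AES-GCM before it is handed to cloud storage:

    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;
    import java.nio.charset.StandardCharsets;
    import java.security.SecureRandom;

    public class ClientSideEncryption {
        public static void main(String[] args) throws Exception {
            // Generate a 256-bit AES key (in practice this would come from a key-management service).
            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(256);
            SecretKey key = keyGen.generateKey();

            // A fresh random IV is required for every encryption with GCM.
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);

            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] ciphertext = cipher.doFinal(
                    "sensitive file content".getBytes(StandardCharsets.UTF_8));

            // Only the ciphertext (plus the IV) would be uploaded; without the key,
            // an unauthorized reader sees nothing but random bytes.
            System.out.println("ciphertext length: " + ciphertext.length);
        }
    }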

(3) Storage virtualization and networked storage management technology

Cloud storage contains a large number of storage devices distributed across many different regions. Implementing logical volume management, storage virtualization, and multi-link redundancy across devices from different vendors, of different models, and even of different types (such as FC storage and IP storage) is a huge challenge. If this problem is not solved, the storage devices become the performance bottleneck of the entire cloud storage system, the structure cannot form a coherent whole, and later capacity and performance expansion become difficult.

3 Deploying Hadoop

Historically, data analysis software has been powerless in the face of today's massive data volumes, but this situation is quietly changing as new engines for massive data analysis emerge. Apache Hadoop, for example, has proven to be one of the best open-source platforms for large-scale data processing.

In the cloud storage center, a large number of servers act as the data nodes (DataNodes) that make up Hadoop; they are responsible for saving the contents of files and provide distributed file storage, load balancing, and fault-tolerance control.
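A small sketch of how this distribution can be observed through the HDFS client API, assuming a file already stored in the cluster (the path is illustrative): each block of the file reports the DataNodes holding its replicas.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlockLocations {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/demo/hello.txt");          // illustrative path
            FileStatus status = fs.getFileStatus(file);

            // Each block of the file is stored on one or more DataNodes.
            for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println("offset " + block.getOffset()
                        + " length " + block.getLength()
                        + " hosts " + String.join(",", block.getHosts()));
            }
            fs.close();
        }
    }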

Using Hadoop as the experimental platform, we demonstrate step by step how to deploy a three-node cluster and test the power of MapReduce distributed processing: two files are saved in the Hadoop Distributed File System (HDFS), and MapReduce is then used to count the number of times the names appear in the two name-list files. The program architecture is shown in Figure 2.

Figure 2 3-node Hadoop cluster

The distribution of the NameNode master node and the DataNode slave nodes is shown in Table 1.

Table 1 NameNode and DataNode node distribution

(1) Start Hadoop cluster

You only need to execute the start-all.sh command on the NameNode master node; the master node then logs on to each of the slave nodes via SSH to start the other related processes.

(2) MapReduce test

Once the NameNode and DataNode processes were both running properly and the Hadoop deployment was confirmed to be successful, we prepared two name-list files on the NameNode master node as input for the job.
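A minimal sketch of such a name-counting job, written against the standard Hadoop MapReduce API; the class names and the input/output paths are illustrative, not those of the original program:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import java.io.IOException;
    import java.util.StringTokenizer;

    public class NameCount {
        // Map: emit (name, 1) for every name found in the input lines.
        public static class NameMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text name = new Text();
            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer tokens = new StringTokenizer(value.toString());
                while (tokens.hasMoreTokens()) {
                    name.set(tokens.nextToken());
                    context.write(name, ONE);
                }
            }
        }

        // Reduce: sum the counts for each name.
        public static class NameReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable total = new IntWritable();
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                total.set(sum);
                context.write(key, total);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "name count");
            job.setJarByClass(NameCount.class);
            job.setMapperClass(NameMapper.class);
            job.setCombinerClass(NameReducer.class);
            job.setReducerClass(NameReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path("/input/namelist"));    // illustrative HDFS paths
            FileOutputFormat.setOutputPath(job, new Path("/output/namecount"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

The job is packaged into a JAR and submitted to the cluster with the hadoop jar command; each line of the output directory then holds a name and the number of times it appeared across the two input files.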

4 Experiment and results

5 Conclusion

The result was the same as we expected: the files were stored in HDFS on the Hadoop platform, and the number of times each name appeared in the files was counted and then displayed.
