As one of the most important components of cloud computing in recent years, cloud storage delivers data collection, storage, and processing as a service.
Enterprise and individual users host their data with a third party and access it on demand through public, private, or hybrid cloud deployments. The advantages of cloud storage are: pay-as-you-go, on-demand use with no additional hardware or dedicated maintenance staff, which reduces the management burden; data replication, backup, server capacity expansion, and similar work are handled by the third party; and deployment and configuration are rapid, with capacity that can be scaled up or down at any time, making the service more flexible and controllable.
In terms of business prospects, the "user-created content" and "sharing" ethos of Web 2.0 has driven broad user acceptance of online services across the Web. With the improvement of the relevant supporting technologies, cloud storage has also matured technically.
Key technologies for cloud storage
With the diversification of business demands such as voice, data, and images, network construction has been trending toward broadband: higher speeds, support for more types of traffic, and ever-improving transmission quality.
Only when the broadband network is sufficiently developed can users obtain enough transmission bandwidth to move large volumes of data and truly enjoy cloud storage services that outperform local storage.
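To make the bandwidth argument concrete, here is a back-of-envelope calculation with illustrative numbers (the sizes and link speeds are my own, not from the text): the ideal time to move a dataset at two different access speeds, ignoring protocol overhead.

```python
def transfer_hours(size_gb, link_mbps):
    """Ideal transfer time in hours for size_gb gigabytes over a link_mbps link.

    Uses decimal units (1 GB = 10**9 bytes, 1 Mbit/s = 10**6 bit/s) and
    ignores protocol overhead, so real transfers will be somewhat slower.
    """
    bits = size_gb * 8 * 1000**3
    return bits / (link_mbps * 10**6) / 3600

# Moving 50 GB to or from the cloud:
print(round(transfer_hours(50, 100), 2))   # 100 Mbit/s link -> 1.11 hours
print(round(transfer_hours(50, 1000), 2))  # 1 Gbit/s link   -> 0.11 hours
```

A tenfold increase in access bandwidth cuts the transfer time tenfold, which is why broadband penetration is treated here as a precondition for cloud storage matching local storage.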
The key technologies of cloud storage, including CDNs, Web 2.0, data coding, and storage virtualization, have become both an intrinsic demand of and a key driving force behind the development of cloud computing and cloud storage.
A CDN is a content delivery network. Its basic idea is to route around the bottleneck links on the Internet that are most likely to affect the speed and stability of data transmission, so that content is delivered faster and more reliably.
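One common way a CDN steers clients toward fast, stable paths is to direct each client to the edge node with the lowest measured latency. The sketch below is illustrative only; the node names and latency figures are invented, and real CDNs combine latency with geography, load, and cost.

```python
def pick_edge(latencies_ms):
    """Return the edge node with the lowest measured round-trip latency.

    latencies_ms maps edge-node name -> latency in milliseconds, as probed
    from (or on behalf of) one client.
    """
    return min(latencies_ms, key=latencies_ms.get)

# Hypothetical probe results for one client in East Asia:
probes = {"edge-tokyo": 12.0, "edge-frankfurt": 140.0, "edge-virginia": 95.0}
print(pick_edge(probes))  # edge-tokyo
```

Serving the content from the chosen nearby edge avoids the long, congestion-prone transit links that the paragraph above identifies as the bottleneck.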
In Web 2.0, users are both consumers and producers of site content. Using the Internet this way gives users a cloud-style access model and cultivates the habits cloud computing depends on: users grow accustomed to storing their data on the network for sharing.
Cloud storage is not just storage; it is, above all, application. The development of application-aware storage technology can greatly reduce the number of servers in a cloud storage system and thus its construction cost. It also reduces the single points of failure and performance bottlenecks that individual servers introduce, shortens data transmission paths, improves system performance and efficiency, and plays an important role in keeping the whole system running efficiently and stably.
A cloud storage system is a collection of multiple storage devices, multiple applications, and multiple services; no single-node storage system is cloud storage. Multiple storage devices cooperate to present one service externally and to provide larger capacity and stronger, better data access performance. Without distributed technology, cloud storage would be only a stand-alone system, not a cloud at all.
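The text does not say how many devices jointly present one service; one common distributed technique (my choice here, not the article's) is to hash each object key onto a ring of devices and replicate it to the next few devices, so any node can compute where data lives without a central lookup. All names below are hypothetical.

```python
import hashlib

def replica_set(key, devices, replicas=3):
    """Pick `replicas` devices for `key` by hashing onto a device ring.

    The hash chooses a primary device; the replicas are the following
    devices on the ring, which keeps copies on distinct machines.
    """
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(devices)
    return [devices[(start + i) % len(devices)] for i in range(replicas)]

devices = ["dev0", "dev1", "dev2", "dev3", "dev4"]
print(replica_set("photos/cat.jpg", devices))  # three distinct devices
```

Because every front-end server computes the same placement, the devices appear externally as one service, and losing any single device still leaves two copies of each object.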
For the operator of a cloud storage service, practical and effective means are needed to overcome the difficulties of centralized management, status monitoring, and maintenance, as well as high labor costs. As a result, cloud storage must provide an efficient centralized management platform, similar to network management software, that builds on storage virtualization to enable centralized management and status monitoring of the storage devices, servers, and network devices in the system.
Typical architecture for cloud storage
The Google File System (GFS), a scalable distributed file system for large-scale data-intensive applications, runs on inexpensive commodity hardware, provides redundancy for fault tolerance, and delivers high-performance service to a large number of clients. It is a typical cloud storage architecture built on distributed technology.
A GFS cluster contains a single master node and multiple chunk servers, and is accessed by many clients simultaneously. All of these machines are usually ordinary Linux machines running user-level server processes.
Files stored in GFS are split into fixed-size chunks. When a chunk is created, the master assigns it an immutable, globally unique 64-bit chunk handle. A chunk server stores each chunk as a Linux file on its local disk and reads or writes chunk data according to a specified chunk handle and byte range. For reliability, each chunk is replicated to multiple chunk servers.
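Fixed-size chunks make addressing simple arithmetic: a file byte offset maps to a chunk index and an offset within that chunk. The sketch below uses GFS's published 64 MB chunk size; the function name is mine.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS's chunk size: 64 MB

def locate(byte_offset):
    """Map a file byte offset to (chunk index, offset within that chunk)."""
    return divmod(byte_offset, CHUNK_SIZE)

# Byte 200 MB into a file falls in the 4th chunk (index 3), 8 MB in:
idx, off = locate(200 * 1024 * 1024)
print(idx, off)  # 3 8388608
```

The client translates offsets this way before ever contacting the master, so the master only needs to be asked "where is chunk 3 of this file?", not "where is byte 209715200?".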
The master node manages all of the file system metadata and also controls system-wide activities. It communicates periodically with each chunk server via heartbeat messages, sending it instructions and collecting its status.
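A side effect of the heartbeat exchange is that the master can notice chunk servers that have gone silent. This is a hedged sketch of that bookkeeping only; the names, timestamps, and 60-second timeout are illustrative, not taken from GFS.

```python
def stale_servers(last_report, now, timeout=60.0):
    """Return chunk servers whose last heartbeat is older than `timeout` seconds.

    last_report maps server name -> time (in seconds) of its last heartbeat.
    """
    return [s for s, last in last_report.items() if now - last > timeout]

# cs1 reported 10 s ago and is healthy; cs2 reported 80 s ago and is stale:
reports = {"cs1": 160.0, "cs2": 90.0}
print(stale_servers(reports, now=170.0))  # ['cs2']
```

In a real master, a stale server would trigger re-replication of the chunks it held, since each chunk must keep multiple live replicas.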
The GFS client code is linked into applications as a library. It implements the API of the GFS file system and communicates with the master node and the chunk servers to read and write data. The client exchanges only metadata with the master; all data operations go directly between the client and the chunk servers.
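The read path described above can be sketched in two steps: ask the master for metadata (chunk handle and location), then fetch the bytes directly from a chunk server. The classes below are minimal stand-ins of my own, not GFS's actual interfaces.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS's 64 MB chunk size

class FakeMaster:
    """Holds only metadata: (path, chunk index) -> (chunk handle, server)."""
    def __init__(self, table):
        self.table = table

    def lookup(self, path, index):
        return self.table[(path, index)]  # never returns file data

class FakeChunkServer:
    """Holds chunk bytes, keyed by 64-bit chunk handle."""
    def __init__(self, chunks):
        self.chunks = chunks

    def read(self, handle, off, length):
        return self.chunks[handle][off:off + length]

def client_read(master, path, offset, length):
    """Client-side read: metadata from the master, bytes from a chunk server."""
    index, chunk_off = divmod(offset, CHUNK_SIZE)
    handle, server = master.lookup(path, index)    # step 1: metadata only
    return server.read(handle, chunk_off, length)  # step 2: direct data path

cs = FakeChunkServer({0xABCD: b"hello world"})
m = FakeMaster({("/logs/a", 0): (0xABCD, cs)})
print(client_read(m, "/logs/a", 6, 5))  # b'world'
```

Keeping bulk data off the master is what lets a single master scale: it answers small metadata queries while the chunk servers carry the traffic.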