What does it take to architect a successful, highly scalable cloud storage or NAS system?
The concept of cloud storage began with Amazon's Simple Storage Service (S3), offered alongside its cloud computing product (EC2). Behind the S3 service, Amazon manages large numbers of commodity hardware devices and bundles the appropriate software to present them as a single storage pool. Emerging web companies embraced the product, and the term cloud storage and its related concepts grew out of it.
Cloud storage is an architecture, not a service. Whether you own or lease that architecture is a secondary issue. Fundamentally, cloud storage extends capacity and performance simply by adding standard hardware reached over a shared standard network (the public Internet or a private intranet). It turns out that managing hundreds of servers so that they behave like a single large storage pool is a fairly challenging job. Early vendors such as Amazon took on that task and profit by renting the result online. Other vendors, such as Google, employ large numbers of engineers to implement this management inside their own firewalls and customize the storage nodes to run applications on them. Cloud storage has become a highly disruptive technology in the data center as Moore's Law has driven down the prices of commodity disks and CPUs.
Clustered NAS systems have improved considerably over the past decade. This article reviews the architectural approaches to building cloud storage or large-scale, scalable NAS systems. These approaches matter both to enterprise IT managers who want to build private cloud storage for internal consumption and to service providers who want to build public cloud storage products that deliver storage as a service. The approaches fall into two categories: architectures delivered as a service, and architectures delivered as software or hardware devices.
Traditional systems use tightly coupled symmetric architectures, originally designed to solve high-performance computing (HPC, i.e., supercomputing) problems, and are now being stretched outward into cloud storage to meet fast-growing market demand. Next-generation architectures instead adopt a loosely coupled asymmetric design that centralizes metadata and control operations; they are less suited to the highest-performance HPC workloads, but are built to address the bulk storage requirements of cloud deployments. The two classes of architecture are summarized below:
Tightly coupled symmetric (TCS) Architecture:
TCS was built to address the single-file performance limits that held back traditional NAS systems. HPC workloads quickly overwhelm such storage because they demand far more I/O against a single file than a single device can deliver. The industry's response was to build products on the TCS architecture, most of which come with distributed lock management (locking writes to different regions of the same file) and cache coherency. This approach handles single-file throughput well, and HPC customers in several different industries have already adopted it. It is, however, sophisticated technology that requires a fair degree of technical expertise to install and operate.
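To make the idea of distributed lock management more concrete, the sketch below models the core behavior in a single process: writers to non-overlapping byte ranges of a file proceed in parallel, while overlapping ranges serialize. This is a minimal illustration under assumed names (ByteRangeLockManager and its methods are invented for the example), not the protocol of any particular TCS product.

```python
import threading

class ByteRangeLockManager:
    """Toy lock manager: grants write locks on byte ranges of a file.

    Overlapping ranges conflict and must wait; disjoint ranges are
    granted concurrently, which is what lets writes to different
    parts of one large file proceed in parallel.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._held = []  # list of (start, end) ranges currently locked

    def _overlaps(self, start, end):
        return any(s < end and start < e for s, e in self._held)

    def acquire(self, start, end):
        with self._cond:
            while self._overlaps(start, end):
                self._cond.wait()          # block until a conflicting range is released
            self._held.append((start, end))

    def release(self, start, end):
        with self._cond:
            self._held.remove((start, end))
            self._cond.notify_all()        # wake writers waiting on this range


if __name__ == "__main__":
    mgr = ByteRangeLockManager()

    def write_region(start, end, label):
        mgr.acquire(start, end)
        print(f"{label}: writing bytes [{start}, {end})")
        mgr.release(start, end)

    # Two writers hit disjoint regions of the same file and do not block each other.
    t1 = threading.Thread(target=write_region, args=(0, 4096, "client-A"))
    t2 = threading.Thread(target=write_region, args=(4096, 8192, "client-B"))
    t1.start(); t2.start(); t1.join(); t2.join()
```

In a real TCS cluster this bookkeeping is itself distributed across the nodes, which is exactly the coordination overhead the next architecture tries to avoid.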
Loosely coupled asymmetric (LCA) Architecture:
LCA systems scale out in a different way. Instead of running a protocol that keeps every node aware of what every other node is doing, they place a central metadata control server outside the data path (a sketch of this flow follows the list below). Centralized control brings many benefits and enables new levels of scale:
Storage nodes can focus on serving read and write requests without needing acknowledgements from other nodes.
Nodes can use different CPU and storage configurations and still participate in the cloud storage pool.
Users can tune the cloud storage to exploit raw hardware performance or run it in virtualized instances.
Eliminating the heavy shared-state overhead between nodes also removes the need for expensive interconnects such as Fibre Channel or InfiniBand, further reducing cost.
Mixing and matching heterogeneous hardware lets users expand storage at current economies of scale when needed, while keeping data continuously available.
Centralized metadata means storage nodes can be spun down for deep archive applications while the metadata remains constantly available on the control nodes.
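The following sketch illustrates the control/data-path split described above: a client asks a metadata service which storage node holds an object and then transfers the data directly with that node, so the metadata server never sits in the data path. It is a simplified illustration under assumed names (MetadataServer, StorageNode, Client, and the placement policy are all invented here), not the design of any specific product.

```python
# Minimal model of a loosely coupled asymmetric (LCA) cluster:
# the metadata server only answers "where does this object live?",
# while reads and writes go straight to the storage nodes.

class StorageNode:
    def __init__(self, name):
        self.name = name
        self.blobs = {}          # object_id -> bytes, standing in for local disks

    def put(self, object_id, data):
        self.blobs[object_id] = data

    def get(self, object_id):
        return self.blobs[object_id]


class MetadataServer:
    """Central control node: tracks placement, never touches object data."""

    def __init__(self, nodes):
        self.nodes = nodes

    def locate(self, object_id):
        # Trivial placement policy for illustration: hash the id across nodes.
        return self.nodes[hash(object_id) % len(self.nodes)]


class Client:
    def __init__(self, metadata_server):
        self.mds = metadata_server

    def write(self, object_id, data):
        node = self.mds.locate(object_id)   # small control-path request
        node.put(object_id, data)           # bulk data goes directly to the node
        return node.name

    def read(self, object_id):
        return self.mds.locate(object_id).get(object_id)


if __name__ == "__main__":
    cluster = MetadataServer([StorageNode("node-1"), StorageNode("node-2"), StorageNode("node-3")])
    client = Client(cluster)
    placed_on = client.write("backup/2024-01.tar", b"...archive bytes...")
    print(f"stored on {placed_on}; read back {len(client.read('backup/2024-01.tar'))} bytes")
```

Because the storage nodes never talk to each other in this model, adding capacity is just a matter of registering more nodes with the metadata server, which is the scaling property the list above describes.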
Cloud storage selection
While there are many scalable NAS options, they typically take the form of a service, a hardware appliance, or a software solution, each with its own advantages and disadvantages:
Service model: This is the most common thing people picture when they think of cloud storage: a hosted service offering. It is easy to get started with, and capacity can be expanded almost instantly. By definition you get an offsite copy of your data. However, bandwidth to the provider is limited, so think through your recovery model, and you must be comfortable with your data living outside your own network. (A minimal usage sketch follows this list.)
HW model: This deployment sits behind your firewall and delivers better throughput than a service reached over the public Internet. Buying an integrated hardware storage solution is convenient, and if the vendor has done its installation and management work well it usually comes with a simple rack-and-stack deployment model. However, you give up some of the benefit of Moore's Law, because you are locked to the vendor's hardware devices.
SW model: The software model shares the advantages of the hardware model and adds a price advantage that hardware appliances do not have. However, pay close attention to installation and management: some software is genuinely difficult to install, and some products impose conditions that restrict your choice of hardware, so evaluate carefully before choosing a software solution.
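As an illustration of why the service model is easy to start with, the snippet below stores and retrieves an object in Amazon S3 using the boto3 SDK. The bucket, key, and file names are hypothetical, and it assumes AWS credentials are already configured; a hardware or software deployment exposing an S3-compatible API could be driven the same way by pointing the client at a different endpoint.

```python
import boto3

# Assumes credentials are configured (environment variables, ~/.aws/credentials, or an IAM role).
s3 = boto3.client("s3")

BUCKET = "example-backup-bucket"   # hypothetical bucket name

# Upload a local file as an object; capacity "expands" simply by writing more objects.
s3.upload_file("nightly-dump.tar.gz", BUCKET, "backups/nightly-dump.tar.gz")

# Later, pull the object back down, e.g. during a restore.
s3.download_file(BUCKET, "backups/nightly-dump.tar.gz", "restored-dump.tar.gz")
```

Note that the second call is where the service model's limited bandwidth matters: restoring a large data set over the Internet can take far longer than writing it did, which is why the recovery model deserves attention up front.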
The following table summarizes the offerings of different vendors:
(Products that have not been publicly announced are not listed, so their details remain unclear.)
With the arrival of the era of massive digital data, in which companies distribute training videos over YouTube, this kind of data is everywhere. For organizations devoted to content creation and distribution, genome research, or medical imaging, the requirements will only become more rigorous and exacting. Cloud storage built on the LCA architecture is ideal for this type of workload and delivers substantial cost, performance, and management advantages.