Handling Big Data: A Look at EMC Isilon Cluster Storage
Driven by Intel, the communications bandwidth and computing power of IT systems have followed Moore's Law to record highs, doubling every 12 to 18 months. At the same time, IDC's latest "Digital Universe" study predicts that data will grow even faster than Moore's Law, reaching 1.8ZB in 2011, and that over the next 10 years enterprises will manage 50 times the current amount of data while file volumes grow 75-fold. Against the backdrop of this rapidly expanding digital universe, the concept of "big data" came into being.
Big Data Explained
In fact, big data and cloud computing are two closely related concepts. Although the industry does not yet have an official definition of big data, vendors have in practice reached a consensus on how to understand it.
Pat Gelsinger, President and Chief Operating Officer of EMC Information Infrastructure Products, believes that big data involves three elements. First, big data means large datasets, generally around 10TB in scale; several datasets brought together can reach petabyte volumes. Second, these datasets often come from different applications and data sources, requiring the system to integrate semi-structured, unstructured, and structured data well. Finally, big data has real-time, iterative characteristics.
Benjamin Woo, IDC Vice President for worldwide storage and big data research, proposes that big data has four basic elements: Volume, Variety, Velocity, and Value. First, the data is massive in volume; big data is a huge dataset contributed by a large number of people and exhibiting a wide variety of characteristics. The value of this data is very high, both for companies and for individual users around the world. In addition, systems are expected to deliver the data very quickly. These four Vs summarize the characteristics of big data.
In addition, EMC has a deeper interpretation of the relationship between big data and the cloud: big data and cloud are two different concepts, but they intersect in many places. The underlying principles supporting big data and cloud computing are the same: scale, automation, resource allocation, and self-healing. So there is, in fact, a great deal of synergy between big data and the cloud.
"When we build our cloud facilities, we think about what kind of apps we should run on the cloud, and the big data is a very typical application that runs on the cloud," he said. For example, although e-mail is one of the applications on the cloud, it can also be detached from the cloud architecture, but large data applications must be architected on the cloud infrastructure. This is the relationship between the two-big data is inseparable from the cloud. "said Pat Gelsinger.
Traditional Storage Bottlenecks
Today, the concept of big data is becoming clearer, but how to store big data remains a problem for every user. Beyond that, technology across the entire IT field has developed rapidly: many technologies and architectures that were new 20 years ago are now facing obsolescence or have already vanished in the tide of technological development, and many of today's new technologies will face the same fate 20 years from now. The pace of technological change in storage has been more pronounced than in any other area.
SAN and NAS architectures, the key technologies in the storage area, have now been around for nearly 20 years and replaced DAS as the mainstream standard architectures for enterprise storage about 10 years ago. However, SAN and NAS platforms are essentially improvements on DAS and do not break through the bottlenecks of traditional storage technology. Legacy storage architectures still have fundamental flaws:
First of all, the traditional storage architecture is static and inherently lacks scalability by design. When expanding, often only the number of disks can be increased; backplane, memory, and processor resources cannot be extended. If an enterprise wants to meet growing capacity and performance requirements, it has to spend heavily, and data risk keeps increasing. The end result is that users must manage increasingly complex storage, while the organization and staffing required cannot grow sustainably.
Volumes are the most basic building blocks of all storage technologies, providing data services to users' front-end applications, and the way storage volumes are used shows most clearly why new storage models are needed. In an ideal "cloud" environment, volumes should be flexible and unconstrained; it is hard to find a reason to confine data to a specific location. Given sufficient security and reliability, individuals and applications should be able to access files and folders easily from any geographic location, just as if the data were local, and the corresponding storage volumes should grow seamlessly as applications scale.
The reality, however, is that storage volumes cannot migrate freely between devices, and the expansion and contraction of storage volumes is clearly not as flexible as we might imagine. When storage volumes are constrained by reliability issues, technical limitations, or performance, the end result for users is inefficiency: these fixed sets of resources never realize their full potential.
In addition, another important challenge in traditional storage environments is waste. Many storage vendors estimate that up to 50% of the resources in a user environment are underutilized. This, of course, benefits the storage vendor, but it means wasted power, cooling, and management effort for the user, as the rough sketch below illustrates.
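The following is a purely hypothetical back-of-the-envelope calculation of what 50% utilization can mean in practice. None of these figures come from the article; the deployed capacity and power-per-terabyte values are placeholder assumptions chosen only to show the arithmetic.

```python
# Hypothetical illustration of the cost of underutilized storage.
deployed_tb = 500          # raw capacity the user has purchased (assumed)
utilization = 0.50         # share actually used, per the vendor estimate above
watts_per_tb = 8.0         # assumed power draw per raw TB of spinning disk

stranded_tb = deployed_tb * (1 - utilization)
stranded_watts = stranded_tb * watts_per_tb
kwh_per_year = stranded_watts * 24 * 365 / 1000

print(f"Stranded capacity: {stranded_tb:.0f} TB")
print(f"Power spent on idle capacity: ~{kwh_per_year:,.0f} kWh/year")
```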
These inherent bottlenecks leave traditional storage even more overstretched in the face of big data challenges.
Isilon: Coping with Big Data
Isilon is a veteran vendor in the cluster storage area and once faced several powerful competitors in the field. But after a series of acquisitions (Dell acquired Exanet, LSI acquired ONStor, HP acquired IBRIX, and HDS acquired BlueArc), there are now few independent vendors left in cluster storage, and the cluster storage market has become the domain of integrated storage solution providers. EMC's $2.25 billion acquisition of Isilon was likewise interpreted as a move to secure Isilon's leadership position for the big data era.
Isilon's products are cluster storage systems; the core technology is the OneFS file system, and the hardware platform is built from standardized Intel x86 components. OneFS (see Figure 1) combines the three traditional storage architecture layers (file system, volume manager, and RAID) into one unified software layer, creating a single intelligent file system that spans every node in the storage system. The standardized components of the Intel x86 platform provide an ideal hardware base for OneFS: OneFS builds its storage management capabilities on this standardized hardware, drawing on the strong performance, superior energy efficiency, and outstanding price/performance of the Intel standardized platform.
The OneFS file system is designed to address big data storage challenges. An Isilon cluster consists of multiple x86-based storage "nodes", each containing memory, CPU, network, NVRAM, InfiniBand interconnect, and storage media. With the support of the OneFS file system, an Isilon cluster can scale horizontally from three nodes to as many as 144 nodes. In addition, Isilon provides the largest single file system for big data, extending beyond 15PB within a single file system and a single volume. Every node added to the cluster increases the total disk, cache, CPU, and network capacity. At full scale, a 144-node cluster can access a globally coherent shared cache of up to 13.8TB. Moreover, all of this capacity and performance sits within a single storage system, a single file system, and a single volume, so the complexity of the system and the management time required of the storage administrator do not grow as the system expands.
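To make the scale-out model above more concrete, here is a minimal sketch (plain Python, not Isilon's management software or any real API) of how per-node resources aggregate into a single pool as nodes are added. The per-node figures are assumptions for illustration only; the 96GB of cache per node is simply back-calculated from the 13.8TB global cache quoted for a 144-node cluster.

```python
from dataclasses import dataclass

@dataclass
class Node:
    disk_tb: float   # raw disk contributed by the node (illustrative value)
    cache_gb: float  # memory contributed to the globally coherent cache
    cpus: int        # processors contributed to the cluster

def cluster_totals(nodes):
    """Every node added grows the disk, cache, and CPU of the single file system."""
    return {
        "nodes": len(nodes),
        "disk_tb": sum(n.disk_tb for n in nodes),
        "cache_tb": sum(n.cache_gb for n in nodes) / 1000,  # decimal TB, as in the article
        "cpus": sum(n.cpus for n in nodes),
    }

# Assumed per-node spec: 24TB of disk, 96GB of cache memory, 2 processors.
node = Node(disk_tb=24.0, cache_gb=96.0, cpus=2)

# Scaling from the 3-node minimum toward the 144-node maximum mentioned above.
for count in (3, 36, 144):
    print(cluster_totals([node] * count))
# At 144 nodes the cache total comes to 13.8TB, matching the figure in the text.
```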
This year, EMC made a new round of upgrades to the Isilon product line. The new EMC Isilon S200 and X200 are based on Intel's Westmere and Nehalem processors respectively, with the high-end S200 system delivering more than twice the file throughput of its predecessor. The S200 replaces the earlier IQ 5400S, while the X200 replaces the existing IQ 7200X. The near-line models, the 72NL and 36NL, will remain in the Isilon product portfolio for now, although the upcoming 200NL is expected to enter the near-line product area as a refresh of the existing hardware.
Each S200 node has two four-core Westmere processors; at full scale, an S200 cluster delivers 1.4 million NFS IOPS and 85GB/s of throughput within a single file system. The earlier 5400S could provide only 600,000 NFS IOPS with roughly 45GB/s of throughput. Each midrange X200 node has one Nehalem processor and up to 48GB of memory, and holds up to 24TB of data on twelve 2TB 3.5-inch SATA drives. An X200 cluster scales to 5.2PB and can deliver 309,312 IOPS, 35.7GB/s of throughput, and a 6.9TB global cache. In addition to traditional disks, the X200 also supports solid-state drives.
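As a quick sanity check on those figures, the back-of-the-envelope sketch below divides the cluster-level maxima by the 144-node limit to estimate each node's contribution. Treating the quoted IOPS and throughput numbers as full-cluster maxima is an interpretation of the text, not something it states outright.

```python
# Cluster-level maxima quoted above, treated here as 144-node figures.
MAX_NODES = 144

s200 = {"nfs_iops": 1_400_000, "throughput_gbps": 85.0}
x200 = {"nfs_iops": 309_312, "throughput_gbps": 35.7, "cache_tb": 6.9}

for name, spec in (("S200", s200), ("X200", x200)):
    per_node_iops = spec["nfs_iops"] / MAX_NODES
    per_node_mbps = spec["throughput_gbps"] * 1000 / MAX_NODES
    print(f"{name}: ~{per_node_iops:,.0f} NFS IOPS and ~{per_node_mbps:.0f} MB/s per node")

# The X200's 6.9TB global cache split across 144 nodes works out to roughly the
# 48GB of memory per node quoted in the text, a useful consistency check.
print(f"X200 cache per node: ~{x200['cache_tb'] * 1000 / MAX_NODES:.0f} GB")
```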
Application of the x86 Platform in Cluster Storage
Isilon is one of the key pillars of EMC's big data strategy, meeting big data storage needs with a unique distributed file system and an efficient hardware base. Driven by big data requirements, the cluster architecture has extended from computing to data storage, and today most high-end systems in the storage field have shifted from the traditional scale-up model to a scale-out architecture. Because the scale-out architecture relies on distributed parallel computing, the Intel x86 architecture also plays an important role in cluster storage.
In short, Intel's strategy in the storage area can be summed up in three elements: standardization, high integration, and low cost.
Intel has spared no effort in pushing standardization forward. Intel Storage Chief Technology Officer Mike McGrath said in an interview: "Intel follows the principle of openness and open technology; closed technologies and products are not necessarily what users must choose. The trend toward storage consolidation and integration is unstoppable, and standardization is critical in this process. From the perspective of long-term development and user cost, the storage industry ultimately needs to move toward openness."
In a cluster storage system architecture, adopting a standardized hardware platform helps reduce overall system cost, allows users to adopt the industry's latest technology more easily and quickly, provides the best compatibility, and lays a solid foundation for future expansion and upgrades.
On the next-generation Intel server platform, Romley-EP (expected to be released in 2012), which comprises the Sandy Bridge-EP server processor (to be marketed as Xeon E5) and the Patsburg chipset, Intel will integrate 10Gb Ethernet switching technology on the motherboard. Intel is also integrating RAID acceleration into the processor itself and a 6Gb/s SAS interface into the Patsburg chipset. The addition of these important storage functions makes Romley a highly integrated processor platform that greatly simplifies the architecture and lowers the design threshold of storage systems, while driving storage integration into a new phase.
In addition, thanks to Intel's standardization and economies of scale, x86 nodes tend to offer high computational density, lower cost and power consumption, and simple, flexible management while delivering excellent performance. This gives x86-based cluster storage an unmatched cost advantage in meeting large-scale storage requirements such as big data.