Cloud computing and big data are undoubtedly the hottest concepts of the moment, and industry discussion around them keeps intensifying. So what happens when cloud computing and big data meet? Some say the two are twins: distinct entities that depend on and complement each other. Others say big data is the disruptive force.
Cloud computing vs. big data
Rod Adkins, IBM Senior Vice President and General Manager of the Systems and Technology Group (STG), believes the global IT sector now faces both exciting trends and real challenges. Enormous volumes of data and information are generated every day, which creates opportunities for big data analytics while also posing new challenges for the data center. Cloud computing, for example, lowers data center costs; through intelligent computing, IBM hopes to realize its vision of a Smarter Planet.
He Jingxiang, General Manager of Intel Asia-Pacific R&D Ltd. and General Manager of the Software and Services Group in China, said big data itself is the new frontier of the information revolution. In the next few years, as the Internet of Things develops, there may be 210 billion RFID tags or sensor clusters in our environment. Once mobile Internet and the Internet of Things become reality, our lives will be surrounded by sensors and data-collection devices, and the volume of data will grow even larger. By itself, this mass of data solves nothing; only when data is turned into information, then into intelligence, and finally into business value does the real value of big data emerge.
Fan, VMware Global Senior Vice President, said that over the past three years the big data market has grown from nothing: three years ago hardly anyone used the term, and now it is in full swing. Beyond the change in the data itself, cloud computing has also made data more decentralized. Under this trend, traditional databases struggle to meet developers' demands for huge data volumes, speed, and diversity, which has opened the door to a wide range of new solutions.
Li Junpeng, EMC big data and storage expert and senior product manager, said big data itself is a set of problems, and cloud technology is currently the most important and effective means of solving that problem set. Cloud computing provides the infrastructure platform on which big data applications run. Distributed processing, currently recognized as the most effective way to handle big data, is itself a concrete embodiment of cloud computing ideas.
As for the impact of big data on cloud computing, Teradata chief technology officer Stephen Brobst says the public cloud architecture has little effect on the data warehouse, because CIOs will not put financial or customer data in a public cloud without good reason; it is simply too risky. The private cloud architecture, however, does have an impact: first, a private cloud allows data marts to be consolidated, reducing under-utilization; second, data can be integrated in an agile way to deliver business value.
Big data and cloud computing differ in their applications
In fact, the difference between cloud computing and big data lies in how they are applied, mainly in two respects:
First, the concepts differ: cloud computing changes IT, while big data changes business. That said, big data needs the cloud as its infrastructure in order to run smoothly.
Second, the target audiences differ. Cloud computing is technology and products sold to the CIO; it is an advanced IT solution. Big data is sold to the CEO and to the business side, and its decision makers sit in the business layer, because they feel the pressure of market competition directly and must beat their rivals through more competitive ways of doing business.
Big data is not just Hadoop
Hadoop was launched and developed under the Apache Foundation and is currently one of the industry's recognized open platforms; licensed companies can publish their own Hadoop distributions. Distributed systems of the kind Hadoop represents are a necessary part of any big data system. The necessity stems from the fact that much of today's data is machine-generated or comes from the Internet of Things: logs produced by all kinds of sensors and computers. Data of this sort arrives in huge volumes and is not well suited to being loaded directly into a traditional database, whereas Hadoop provides a way to scale out easily, put the data into a repository, and run arbitrary analysis on it. Hadoop has successfully built this environment, allowing the software around it to provide a wide variety of functions for intelligent analysis work.
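To make the "scale out and analyze" idea concrete, here is a minimal, hypothetical sketch in the Hadoop Streaming style: a word-count mapper and reducer that each read from standard input and write tab-separated key/value pairs, which is the contract Hadoop Streaming expects when it fans the work out across nodes. The file name, the word-count task, and the command-line flag are illustrative assumptions, not part of any vendor's product.

```python
#!/usr/bin/env python
# wc.py -- a self-contained Hadoop Streaming-style word count sketch.
import sys

def run_mapper():
    # Emits "word<TAB>1" for every word on stdin; Hadoop Streaming treats
    # everything before the first tab as the key used for the shuffle/sort.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def run_reducer():
    # Hadoop Streaming delivers mapper output sorted by key, so identical
    # words arrive as consecutive lines and can be summed in a single pass.
    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            count += int(value)
        else:
            if current_word is not None:
                print(f"{current_word}\t{count}")
            current_word, count = word, int(value)
    if current_word is not None:
        print(f"{current_word}\t{count}")

if __name__ == "__main__":
    # In a real job the two roles would be passed to the Hadoop Streaming jar
    # as separate -mapper and -reducer scripts; a flag keeps this sketch in one file.
    run_reducer() if "--reduce" in sys.argv else run_mapper()
```

Locally, the pipeline can be simulated with `cat input.txt | python wc.py | sort | python wc.py --reduce`, which mirrors the map, shuffle/sort, and reduce phases that Hadoop runs in parallel across its nodes.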
However, big data is not just Hadoop. When analyzing data, users can pour it into a pool and Hadoop will spread the work across hundreds or thousands of nodes; in certain scenarios this must be part of the application. But many more scenarios demand real-time or interactive responses, which call for other technologies, including in-memory retrieval and even responses at the moment the data is generated. Only when these technologies are combined do you get a complete big data processing system.
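As a contrast to batch processing, the following toy sketch (not tied to any vendor's product) illustrates the in-memory, respond-as-data-arrives idea: each incoming event updates an aggregate held in memory, so an interactive query can be answered immediately instead of waiting for a batch job to finish. The event fields and the running-average metric are assumptions made for illustration.

```python
from collections import defaultdict

class StreamAggregator:
    """Keeps running counts and sums in memory so queries answer instantly."""

    def __init__(self):
        self._count = defaultdict(int)
        self._total = defaultdict(float)

    def ingest(self, event):
        # Called once per event, at the moment the data is generated.
        key, value = event["sensor_id"], event["reading"]
        self._count[key] += 1
        self._total[key] += value

    def average(self, key):
        # Interactive query: served from memory, no batch job required.
        n = self._count[key]
        return self._total[key] / n if n else None

if __name__ == "__main__":
    agg = StreamAggregator()
    # Simulated stream of sensor readings (hypothetical data).
    for event in ({"sensor_id": "s1", "reading": 21.5},
                  {"sensor_id": "s1", "reading": 22.1},
                  {"sensor_id": "s2", "reading": 19.8}):
        agg.ingest(event)
        print("s1 average so far:", agg.average("s1"))
```

A production system would of course add windowing, persistence, and fault tolerance, which is exactly where the memory-class and streaming technologies mentioned above come in.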
How major vendors are responding to big data
Whether or not the big data era has truly arrived, enterprise service vendors want to stay ahead of it and be ready to handle big data.
1. IBM: 4V theory + big data analysis platform
Starting from big data, IBM proposes the "4V theory" of Volume (scale), Variety (diversity), Velocity (speed), and Veracity (authenticity) to help enterprises intuitively and accurately grasp the attributes of big data.
IBM launched its "Blue Cloud" computing platform as early as November 2007. Blue Cloud is built on the cloud infrastructure of the IBM Almaden Research Center and includes Xen and PowerVM virtualization, Linux operating system images, and Hadoop parallel workload scheduling.
In addition, IBM has a big data analysis platform, InfoSphere. InfoSphere is IBM's main weapon in the big data field and includes BigInsights and Streams, which complement each other. BigInsights, based on Hadoop, analyzes large-scale static data and provides distributed computing across multiple nodes; nodes can be added at any time to expand data processing capacity. Streams, by contrast, uses in-memory computing to analyze data in real time. The InfoSphere big data analysis platform also integrates data warehouse, database, data integration, and business process management components.
2. Intel: hardware + software firepower
On the hardware side, Intel is looking at CPU, storage, and memory technology to determine how next-generation system architectures and data center solutions can better fit big data requirements.
On the software side, Intel provides optimized middleware. Taking Hadoop as an example, Intel has enhanced and optimized HBase and HDFS in the Hadoop stack so that they perform significantly better on Intel hardware platforms, and has launched Intel Hadoop Manager 2.0.
Intel Hadoop Manager 2.0 optimizes Hadoop's processing capability, shortening the path from data collection to data processing to near real time and multiplying performance on Intel platforms.
3. VMware: virtualization architecture + cloud platform
Song, VMware's Greater China president, believes that the cloud platform is the only way to meet the demands of explosive data growth, and that moving key applications onto cloud platforms is an inevitable trend. VMware therefore aims to bring big data applications into virtualized cloud environments.
VMware's open source project Serengeti lets enterprises deploy and manage Hadoop on vSphere in cloud and virtual environments. Reportedly, in such an environment Hadoop deployment time can be cut from days to about ten minutes.
In addition, on its unified vSphere virtualization architecture, VMware offers three processing modes: GemFire real-time processing, Greenplum interactive analysis, and Hadoop batch processing, to meet users' needs for massive, fast, and flexible data handling, and to provide data analysis and visual presentation for developers, data analysts, data scientists, and business users. VMware has also unveiled UAP (Unified Analytics Platform), a big data analytics platform that includes the Greenplum database, Hadoop, and the Chorus analytics software, helping customers analyze structured and unstructured data at the same time.
4. EMC: EMC Hadoop
EMC has released its own Hadoop distribution, focused on real-time processing of unstructured data. EMC's Hadoop product portfolio covers three offerings: Greenplum HD Community Edition, Greenplum HD Enterprise Edition, and the Greenplum HD Data Computing Appliance.
According to EMC, Greenplum HD contains several core technical innovations. Pluggable I/O allows the Isilon OneFS storage system, Atmos, and other back ends to be used, improving efficiency and performance. Real-time processing enables interactive analysis of data as it arrives. For fault tolerance, the single point of failure at the NameNode is eliminated, and many optimizations have been made to job tracking and other key components. Its biggest highlight is the tight combination of the Greenplum database with Apache Hadoop, so that a single appliance achieves seamless integration of structured and unstructured data.
In addition, Oracle has launched a big data appliance that gives enterprises a way to handle massive amounts of unstructured data. It integrates hardware, storage, and software, including an open source distribution of Apache Hadoop, the new Oracle NoSQL Database, and an open source distribution of the R language for statistical analysis.