The age of big data has arrived and is quietly changing our lives. According to a recent IDC study, 1 million new links are shared and 10 million user comments are posted on Facebook every 20 minutes. Facebook and other websites and Internet applications have gradually grown into complete architectures for collecting, analyzing, processing, and adding value to data.
In China, social networks are also booming. Sina vice president Wang Gaofei has said that Sina Weibo has more than 300 million registered users, who together post more than 100 million microblogs per day on average, the equivalent of one in every 10 Chinese people publishing a microblog each day. Each user spends an average of 60 minutes online, 60% of active users log in from mobile terminals, and 40% of the original content posted from mobile terminals is shared photos. Thanks to social networks, users can consume and create data through mobile devices at any time, in any place, and in any situation.
The development of social networks and the mobile Internet has generated large amounts of unstructured data, a type of data that differs from traditional structured data. Common images, videos, music, Office documents, web pages, microblogs, instant messages, and sensor-generated data are all unstructured. According to Dr. He Jingxiang, general manager of the Software and Services Division of Intel Asia-Pacific Research and Development Co., Ltd. and general manager for China, the amount of data now produced every 48 hours equals the total amount produced by human civilization up to 2003. As the Internet and smart cities develop, this figure will become even more striking, and much of it will be unstructured data produced by data acquisition devices such as sensors.
Traditional enterprises also face big data challenges. According to Gartner, enterprise data will grow by 800% in five years, and 80% of it will be unstructured. Non-business data from groups, communities, and social networks will make up a major part of this growth. The explosive growth of unstructured data is challenging traditional databases, and Hadoop is becoming a favorite of the global IT industry.
Hadoop is a fully distributed system with its own distributed file system, and it has been called the most successful open-source software since Linux. Its biggest advantage is storing and computing over unstructured data. Hadoop can build a high-performance cluster out of cost-effective x86 servers; when the data volume grows beyond what the cluster can carry, adding nodes is enough to meet the computing demand. Low-cost storage and computing is the driving force behind big data.
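To make the scale-out idea concrete, here is a minimal capacity sketch in Python. The function name nodes_needed and the figure of 12 TB usable storage per node are hypothetical and chosen only for illustration; the replication factor of 3 is the HDFS default. Real capacity planning also has to account for compute, memory, and headroom, which this sketch ignores.

```python
import math


def nodes_needed(data_tb: float, usable_tb_per_node: float, replication: int = 3) -> int:
    """Estimate how many nodes are needed to store data_tb terabytes.

    Every block is stored `replication` times, so the raw storage
    requirement is the logical data size multiplied by that factor.
    """
    raw_tb = data_tb * replication
    return math.ceil(raw_tb / usable_tb_per_node)


# Hypothetical cluster: 12 TB of usable disk per commodity x86 node.
current_nodes = nodes_needed(100, 12)   # 100 TB of data today
future_nodes = nodes_needed(250, 12)    # data grows to 250 TB

# Scaling out means adding the difference rather than replacing the cluster.
print(f"nodes today: {current_nodes}, after growth: {future_nodes}, "
      f"to add: {future_nodes - current_nodes}")
```

Under these assumptions, growth in data volume translates directly into a count of servers to add, which is the property the paragraph above describes.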
Traditional databases and their Hadoop dreams
Unlike Hadoop, the database has carried the day-to-day management of an enterprise's structured data since it was born. Data management has gone through three stages, namely manual management, file systems, and database systems, and under the influence of market trends the database is now changing direction again. According to IDC, global data volume reached 1.8 ZB in 2011 and will reach 35 ZB by 2020, which means global data is entering an era of explosive growth. Traditional database vendors have launched their own big data solutions, and these solutions share a common keyword: Hadoop.
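As a rough illustration of what the IDC figures imply, the short Python sketch below derives the compound annual growth rate from 1.8 ZB in 2011 to 35 ZB in 2020. It assumes smooth year-over-year compounding, which the source does not claim; the variable names are illustrative only.

```python
import math

start_zb, end_zb = 1.8, 35.0      # IDC figures quoted above
years = 2020 - 2011               # nine yearly steps between the two figures

growth_factor = end_zb / start_zb                 # total multiple over the period
cagr = growth_factor ** (1 / years) - 1           # compound annual growth rate
doubling_years = math.log(2) / math.log(1 + cagr) # time for the volume to double

print(f"total growth: {growth_factor:.1f}x")
print(f"implied annual growth: {cagr:.1%}")
print(f"implied doubling time: {doubling_years:.1f} years")
```

Under that assumption the volume grows by roughly 39% per year and doubles about every two years.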
Hadoop is a distributed system infrastructure composed mainly of HDFS, MapReduce, and HBase. It is a software platform on which big data processing applications can be developed and run easily. Hadoop is not a database. The biggest difference between the two is that databases are good at handling structured data while Hadoop is good at handling unstructured data, and diversity of data types is one of the defining characteristics of big data. For database vendors, Hadoop is both a challenge and an opportunity: a database that puts Hadoop to use can open up new territory for itself. Below, the author takes stock of the databases that support Hadoop and briefly analyzes each vendor's big data strategy.
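For readers unfamiliar with MapReduce, the following is a minimal single-process sketch of the programming model in Python, using word counting over free text as the example. It is not the Hadoop API: the functions map_phase, shuffle, reduce_phase, and word_count are hypothetical names, and on a real cluster the map and reduce steps would run in parallel across nodes over data stored in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: turn one line of unstructured text into (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


# Two short "posts" standing in for unstructured input.
posts = ["Hadoop stores big data", "big data needs Hadoop"]
counts = word_count(posts)
print(counts)
```

The input needs no schema, which is the contrast with a relational table that the paragraph above draws.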
Oracle: Oracle holds a leading position in the database industry, and Oracle Database is one of the most popular relational database products. Oracle has long focused on structured tools and the RDBMS platform, but over the past year it has also begun to move into the big data age, said Sishing, global vice president and general manager of technology for Greater China. Indeed, recognizing the potential of Hadoop for big data processing, Oracle has introduced the Hadoop-based Oracle Big Data Appliance, which includes open-source Apache Hadoop, Oracle NoSQL Database, Oracle Data Integrator Application Adapter for Hadoop, Oracle Loader for Hadoop, and open-source R, and it is working with Cloudera to provide the Apache Hadoop family of software.
IBM DB2: IBM is the originator of the relational database and played an important role in the birth and development of databases, but in the new era of big data even the veteran of relational databases has to innovate to meet the challenge. Wangyun, a Fellow and chief technology officer of IBM's China Research Institute, said at the 2012 China Database Technology Conference that big data cannot be processed by traditional methods. Traditional relational databases grew out of OLTP workloads and are good at recording data accurately, whereas big data is a new kind of application and an embodiment of OLAP, which is why relational databases fall short for big data. IBM has launched a big data platform with two components, Hadoop and stream computing, to tackle big data analysis and processing along a new path.
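The OLTP and OLAP distinction mentioned above can be illustrated with a small Python sketch using the standard-library sqlite3 module. The orders table and its columns are hypothetical, and an in-memory SQLite database stands in for any relational system; the point is only the difference in access pattern, not performance at scale.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: short transactions that record individual facts accurately.
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("north", 120.0), ("south", 80.0), ("north", 30.0)],
    )

# OLAP-style work: a read-only query that scans and aggregates many rows.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
)
print(totals)

conn.close()
```

The inserts touch one row at a time, while the aggregate has to read the whole table; it is the second pattern that becomes costly as data volume grows.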
SQL Server: Microsoft, the world's leading software company, is also well regarded in the database field. Microsoft SQL Server 2012 brings in Hadoop to help customers seamlessly store and process all types of data, including structured, unstructured, and real-time data. In addition, Microsoft will offer Hadoop on the Windows Azure platform and on Windows Server to form a complete big data solution. As Sun Boke, Microsoft's chief technology officer for Asia-Pacific Research and Development, puts it, Microsoft and Hadoop make a powerful pairing that combines Hadoop's high performance and scalability with the traditional strengths of Microsoft products: ease of use and ease of deployment.
SAP: SAP is a world-renowned provider of enterprise management software, and since acquiring Sybase in 2010 it has become a rising star in the database industry. SAP made database technology one of its key development areas in 2012, forming a big data strategy based on SAP HANA and the SAP Sybase databases. Hadoop is a particularly important part of this strategy. By integrating SAP HANA and SAP Sybase IQ with Hadoop, SAP strengthens its ability to access big data sources such as Hadoop and provides a deeply integrated preprocessing infrastructure.
EMC Greenplum: EMC is a world-renowned information storage provider. Like SAP, it entered the database market through an acquisition, buying Greenplum in 2010. Greenplum's products currently include the traditional Greenplum Database and Greenplum HD (Hadoop). Together they address enterprise structured data and allow unstructured data to be imported into Greenplum for storage and analysis. EMC's market strategy in China centers on "Big Data for business transformation." Liuweiguang, general manager of EMC's Data Computing Products division for Greater China, told the author that the launch of the Greenplum Hadoop edition reflects EMC's full confidence in Hadoop's future prospects.
Beyond the five mainstream database vendors discussed above, more and more traditional vendors are joining the Hadoop camp, including database, data warehouse, and business intelligence providers such as Teradata, Informatica, Pentaho, and Talend. Hadoop is also one of the main architectures underlying NoSQL databases.