2013 is being called the "year of big data," a label with trans-era significance. Data has never been more valuable; some now compare it to oil as a new source of energy, and big data is seen as the next peak of the information revolution after the Internet itself. But big data is more than a slogan: enterprises need to put it into practice and dig real value out of otherwise monotonous data.
A survey earlier this year found that 28% of companies worldwide, and 25% of Chinese companies, have begun to put big data into practice. To better understand how Chinese enterprises actually use big data, IT168 recently ran a special survey on big data applications and trends on ITPUB and ChinaUnix, revealing the challenges enterprises face and the solutions they are adopting.
The survey officially launched on September 30, 2013 and ran for a month and a half. A total of 500 questionnaires were collected, both online and offline, from data architects, database administrators and operations engineers, database developers, data analysts, R&D directors, IT managers, and other technical personnel.
Main findings of the survey:
1. The share of enterprises adding more than 500 GB of new data per month rose from 16.67% in 2012 to 18.11%. Although the proportion of enterprises with big data has grown, actual data growth still lags far behind forecasts.
2. Only 5.61% of enterprises use domestic big data products. If domestic vendors seize the major opportunity presented by big data and information security, they will see a spring of growth.
3. Enterprises consider the three biggest difficulties in storing and processing big data to be data security, system performance bottlenecks, and the diversity of data types.
4. In the big data era, the biggest challenge enterprises face is a shortage of professional data talent.
5. For unstructured data, the problem enterprises most urgently need to solve is how to analyze it.
6. 21.89% of enterprises have already deployed big data, and 27.92% plan to deploy within a year, making 2014 the likely peak of big data deployment.
7. The top three factors in big data product selection are product performance, service and support, and compatibility with other applications.
8. Most enterprises choose big data analysis software as their big data product or solution.
9. The big data technologies respondents care most about are, in order, big data analysis, cloud databases, Hadoop, in-memory databases, and data security.
10. Respondents consider the three most important features of big data analysis to be real-time analysis, rich mining models, and a visual interface.
I. The current state of big data applications in enterprises
How should big data be defined? Opinions differ. The mainstream answer is the "3V" model: volume (Volume), speed (Velocity), and variety (Variety). By Gartner's definition, big data is data that exceeds the ability of common hardware environments and software tools to collect, manage, and process for its users within an acceptable time.
Sheer volume is undoubtedly the most obvious characteristic of big data. Forecasts say the world's information is growing at 59% per year. Has enterprise data actually reached an unbearable scale? Last year's survey asked about monthly new data volume; the results showed that enterprise data is growing but has not yet escaped enterprises' control, with 16.67% of enterprises adding more than 500 GB per month.
▲ Survey of enterprises' monthly new data volume
What about this year? As the figure above shows, 26.79% of surveyed enterprises add less than 10 GB of new data per month, 41.89% add 11–100 GB, 13.21% add 101–500 GB, and 18.11% add more than 500 GB.
Compared with last year's results, the share of enterprises adding more than 500 GB per month grew from 16.67% in 2012 to 18.11% in 2013, a relative increase of 8.64%. The proportion of enterprises with big data has risen, but actual growth still falls far short of the forecast rate (59%).
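The 8.64% figure is a relative change, not a percentage-point change; a quick check of the arithmetic:

```python
# Relative growth of the ">500 GB/month" segment between the 2012 and
# 2013 surveys: (18.11 - 16.67) / 16.67 ≈ 8.64% relative growth,
# versus a change of only 1.44 percentage points.
share_2012 = 16.67
share_2013 = 18.11

relative_growth = (share_2013 - share_2012) / share_2012 * 100
point_change = share_2013 - share_2012

print(f"relative growth: {relative_growth:.2f}%")   # 8.64%
print(f"point change:    {point_change:.2f} pts")   # 1.44 pts
```

This is why the share can "grow 8.64%" while still trailing the forecast 59% annual growth in data volume by a wide margin.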
▲ Survey of big data vendors
Which big data vendors do enterprises prefer? The answer can be read from the products already deployed. As the figure above shows, the top six vendors are IBM (18.74%), Oracle (18.33%), SAP (11.35%), Microsoft (9.71%), SAS (7.52%), and NetApp (7.52%).
Compared with the 2012 survey, Oracle's share dropped from 27.93% to 18.33%. Its dominance has ended, replaced by a crowded field in which vendors hold relatively even shares. Among the top three, IBM and SAP grew fastest, from 15.99% and 7.66% to the current 18.74% and 11.35% respectively.
This year's survey added domestic vendors as an option. Compared with the foreign IT giants, domestic vendors hold only a 5.61% share. The PRISM scandal this year sounded an alarm for enterprise information security, but it also brought domestic vendors great opportunity alongside the challenge. With demand surging in big data and information security, domestic vendors will see a spring of growth.
II. Analysis of enterprise big data pain points
Years ago, enterprises focused on informatization and the Internet; in recent years the focus has shifted to cloud computing, mobility, and social. Whichever trend dominates, processing and analyzing data poses many problems for enterprises: data proliferation, data diversity and complexity, and data security are all challenges they face. To better understand enterprises' real needs, this survey analyzed the pain points of the big data era.
▲ Difficulties enterprises face in storing and processing big data
As the figure above shows, the difficulties enterprises face in storing and processing big data are fairly evenly distributed. Data security ranks highest (18.98%), followed by system performance bottlenecks (18.42%) and the diversity of data types (18.01%). The rest are low data-analysis efficiency (15.24%), read/write bottlenecks (14.96%), and storage pressure (14.4%).
The small gaps between these options show that all six are seen as genuine difficulties in enterprise data storage and processing, with data security the greatest concern. In a big data environment, many enterprises are rethinking their information security policies to keep data resources from being compromised.
▲ Challenges enterprises face in the big data era
The challenges enterprises face in the big data era can be seen in the figure above. A shortage of professional data talent (26.99%) is the biggest challenge, followed by analyzing and processing unstructured data (26.65%), traditional technology struggling to handle big data (25.27%), and the high barrier to entry of new technologies (21.13%).
The shortage of big data talent will be an important brake on the growth of the big data market. Gartner predicts that by 2015 big data will create 4.4 million new jobs worldwide, and that 25% of organizations will have a Chief Data Officer. Big data roles demand versatile talent with a comprehensive command of mathematics, statistics, data analysis, machine learning, and natural language processing. A talent gap of roughly one million people is expected, and society, universities, and enterprises will need to work together to cultivate and discover them.
▲ Challenges of unstructured data
Enterprises are not good at handling unstructured data such as text, images, and video. As the survey results above show, the problem enterprises most urgently need to solve is how to analyze this data (38.96%), followed by integrating it with other data sources (32.5%), how to store it (14.72%), and data security (13.82%).
As the book "Mastering Big Data" puts it, the core of data is discovering value, and the core of controlling data is analysis. Analysis is the most critical link in big data, especially for unstructured data that traditional methods handle poorly; the usual first step is to transform it into structured data before processing and analyzing it.
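As a minimal sketch of that structure-first step, the example below uses a regular expression to turn free-form web-server log lines (a made-up format, not data from the survey) into structured records that ordinary aggregation can then analyze:

```python
import re
from collections import Counter

# Hypothetical unstructured log lines -- the format is an assumption
# chosen only to illustrate turning raw text into structured records.
raw_logs = [
    "2013-09-30 10:01:02 GET /index.html 200",
    "2013-09-30 10:01:05 GET /missing.html 404",
    "2013-09-30 10:01:09 POST /login 200",
]

LINE_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>[\d:]+) "
    r"(?P<method>\w+) (?P<path>\S+) (?P<status>\d{3})"
)

# Step 1: structure -- each matching line becomes a dict of named fields.
records = [m.groupdict() for line in raw_logs if (m := LINE_RE.match(line))]

# Step 2: analyze -- ordinary aggregation now works on the structured form.
status_counts = Counter(r["status"] for r in records)
print(status_counts)  # Counter({'200': 2, '404': 1})
```

Once the text is in a structured form, the same analysis, integration, and storage questions the survey raises can be handled with conventional database and BI tools.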
Compared with structured data, the security of unstructured data receives little attention from enterprises. Yet statistics suggest that up to 80% of business data is unstructured. Securing it is also urgent, and enterprises need to plan and build early warnings now.
III. Enterprise big data selection planning
Big data was undoubtedly the hottest topic of 2013. Amid the excitement, enterprises should calmly consider whether they need to deploy big data at all, what kind they need, and how to choose a suitable solution; selection planning should be targeted.
A survey this year put global enterprise software spending at nearly US$30 billion, up 6.4% from 2012. Enterprise spending in 2014 is expected to tilt toward big data, especially the three areas of enterprise content management, data integration, and data quality tools.
▲ Survey of big data application deployment plans
How do domestic enterprises' current big data deployments compare with their plans? The figure above shows that 21.89% of enterprises have already deployed big data applications, 27.92% plan to deploy within a year, and 14.34% plan to deploy within two years; 11.32% have no related plans, and 24.53% are undecided.
In the big data era, enterprises increasingly recognize the importance of data and are slowly accepting the shift from traditional databases to big data analysis. But the hardest part of big data is making it land: enterprises must work from their business requirements and choose a suitable big data solution.
▲ Survey of factors influencing big data selection
As the figure above shows, the top three factors enterprises consider in big data selection are product performance (19.79%), service and support (15.2%), and compatibility with different applications (13.94%). These are followed by product price (13.16%), ease of use (12.18%), support for mobility (11.11%), vendor and brand (7.8%), and whether the product is open source (6.82%).
That performance ranks first is no surprise. That service and support ranks ahead of price seems to confirm that IT vendors' transformation into service providers is the right road. And as mobility continues to deepen, big data solutions with mobile support will become a future trend.
▲ Survey of big data product and solution types
Beyond selection factors, what type of big data product or solution suits the enterprise? As the figure above shows, 32.05% of enterprises choose big data analysis software, 28.96% choose an end-to-end big data solution, and 28.38% choose infrastructure products. Big data appliances are chosen least, at 10.62%.
Beyond confirming the importance of big data analysis discussed above, the results show that big data appliances are not as popular as one might expect. According to industry sources, an appliance is usually designed around a single business process, lacks generality, and carries a price that ordinary enterprises cannot accept. So today's big data appliances target mature business processes, where they can greatly simplify deployment and maintenance.
IV. Enterprise big data application trends
For a long time, Hadoop sprang to mind whenever big data was mentioned, almost as a synonym for it. In fact the technical field of big data is very broad, covering every aspect from data collection, integration, and governance to analysis, exploration, and machine learning.
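Hadoop's canonical introductory example is the MapReduce word count. The sketch below imitates the map / shuffle / reduce phases in plain single-process Python purely to illustrate the programming model; real Hadoop jobs run the same logic distributed across a cluster:

```python
from collections import defaultdict
from itertools import chain

# A toy corpus standing in for files split across a Hadoop cluster.
documents = ["big data is big", "data beats opinions"]

# Map phase: emit (word, 1) pairs for each input record.
def mapper(doc):
    return [(word, 1) for word in doc.split()]

# Shuffle phase: group values by key, as the framework does between phases.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: sum the counts emitted for each word.
def reducer(key, values):
    return key, sum(values)

pairs = chain.from_iterable(mapper(d) for d in documents)
counts = dict(reducer(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 2, 'is': 1, 'beats': 1, 'opinions': 1}
```

The appeal of the model is that the mapper and reducer stay this simple even when the framework spreads the work over thousands of machines.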
▲ Big data technology trend survey
As the figure above shows, the top five big data technologies respondents care about are big data analysis (12.91%), cloud databases (11.82%), Hadoop (11.73%), in-memory databases (11.64%), and data security (9.21%). These are followed by NoSQL (8.21%), data warehousing (8.21%), data integration (7.94%), business intelligence (7.13%), columnar databases (5.96%), big data (database) appliances (3.52%), and NewSQL (1.71%).
Happily, Hadoop is no longer the only big data technology in people's minds, and big data analysis has become the technology of greatest concern. Clearly people's understanding of big data has gradually deepened, and the technical points of focus have multiplied.
▲ Big data analysis feature survey
Since big data analysis is the technology trend of greatest concern, which of its features matter most? As the figure above shows, the top three are real-time analysis (21.32%), rich mining models (17.97%), and a visual interface (15.91%). These are followed by predictive analysis (13.1%), social data analysis (12.12%), cloud services (11.69%), and mobile BI (7.9%).
A similar survey in 2012 put rich mining models (27.22%) 7.34 percentage points ahead of real-time analysis (19.88%). In just one year, demand for real-time analysis has soared, and many big data vendors have made real-time analysis the focus of their innovation.
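"Real-time analysis" generally means computing over a moving window of recent events rather than over a stored batch. The sketch below is a minimal illustration of that idea; the window size and the transaction-count stream are invented for the example:

```python
from collections import deque

class SlidingWindowMean:
    """Running mean over the most recent `size` events -- the core of many
    real-time dashboards, in contrast to after-the-fact batch analysis."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # old events fall out automatically

    def update(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)

# Simulated event stream: per-minute transaction counts (made-up numbers).
stream = [120, 130, 126, 400, 410]
monitor = SlidingWindowMean(size=3)
means = [monitor.update(x) for x in stream]
print(means[-1])  # mean of the last 3 events: (126 + 400 + 410) / 3 = 312.0
```

The jump in the final means is exactly what a real-time dashboard would surface as it happens, whereas a batch job would only see it in the next scheduled run.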
Summary
This survey examined the state and trends of big data applications in 2013. The results show that enterprises urgently need to deploy big data within the next two years, and that demand has begun to evolve from infrastructure construction toward big data analysis and end-to-end big data solutions. At the same time, big data faces a talent shortage: enterprises and universities must join forces to cultivate versatile data talent and help enterprises win the "data war."