In the Internet industry, 2013 is widely recognized as the epoch-making "year of big data." This year, data has become more precious than ever, even a new kind of resource comparable to oil, and big data is regarded as the next peak of the information revolution after informatization and the Internet. However, big data is not a slogan: more enterprises need to put it into practice and dig potential value out of seemingly monotonous data.
A survey earlier this year found that 28% of companies worldwide and 25% of Chinese companies have already begun putting big data into practice. To better understand the real state of big data applications in Chinese enterprises, a dedicated survey on big data applications and trends was launched to reveal the challenges big data poses to enterprises and the solutions available.
The survey officially launched on September 30, 2013 and ran for a month and a half. It collected 500 questionnaires through online and offline channels. Respondents included data architects, database administration and operations engineers, database developers, data analysts, R&D directors, IT managers, and other technical staff.
I. Main conclusions of the big data survey
1. The share of enterprises adding more than 500 GB of new data per month rose from 16.67% in 2012 to 18.11%. Although the proportion of enterprises with large data volumes has increased, this growth still falls well short of the forecast rate of data growth.
2. Only 5.61% of enterprises chose domestic big data products. Domestic vendors that seize the opportunities in big data and information security will usher in a spring of growth.
3. Enterprises consider the three biggest difficulties in big data storage and processing to be data security, system performance bottlenecks, and the diversity of data types.
4. In the big data era, the biggest challenge enterprises face is the lack of professional data talent.
5. For unstructured data, the problem enterprises most urgently need to solve is how to analyze it.
6. 21.89% of enterprises have already deployed big data and 27.92% plan to deploy within one year, so 2014 will be the peak of big data deployment.
7. The top three factors enterprises consider when selecting big data products are product performance, the level of service and support, and compatibility with other applications.
8. The type of big data product or solution that most enterprises choose is big data analysis software.
9. The five big data technologies respondents care about most are big data analysis, cloud databases, Hadoop, in-memory databases, and data security.
10. Respondents consider the three most important capabilities in big data analysis to be real-time analysis, rich mining models, and a visual interface.
II. Current state of enterprise big data applications
How do you define big data? Opinions differ. The mainstream definition is the "3V" model: Volume, Velocity, and Variety. Gartner defines big data as data whose collection, management, and processing within an acceptable time exceeds the capabilities of commonly used hardware environments and software tools.
Sheer volume is undoubtedly one of the most obvious characteristics of big data. Forecasts say that global information is growing at a rate of 59% a year. Has the volume of enterprise data reached an unbearable level? Last year's survey examined the amount of new data enterprises add each month. The results showed that enterprise data was growing but had not yet grown beyond what enterprises could manage: enterprises adding more than 500 GB of new data per month accounted for 16.67%.
1. Survey of enterprises' monthly new data volume
What is the situation this year? As the figure above shows, 26.79% of the enterprises surveyed add less than 10 GB of new data per month, 41.89% add 11-100 GB, 13.21% add 101-500 GB, and 18.11% add more than 500 GB.
Compared with last year's results, the share of enterprises adding more than 500 GB of new data per month grew from 16.67% in 2012 to 18.11% in 2013, a relative increase of 8.64% (1.44 percentage points). The proportion of enterprises with large data volumes has risen, but this growth still falls well short of the forecast rate of data growth (59%).
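The 8.64% figure is a relative increase, not a difference in percentage points. The short sketch below reproduces both numbers from the two survey shares; the function names are illustrative and not part of the survey.

```python
def point_change(old_pct, new_pct):
    """Absolute change, in percentage points."""
    return new_pct - old_pct


def relative_change(old_pct, new_pct):
    """Change relative to the old value, expressed as a percentage."""
    return (new_pct - old_pct) / old_pct * 100


# Share of enterprises adding more than 500 GB of new data per month.
share_2012 = 16.67
share_2013 = 18.11

print(round(point_change(share_2012, share_2013), 2))     # 1.44
print(round(relative_change(share_2012, share_2013), 2))  # 8.64
```

Keeping the two measures separate matters here because the 59% forecast is itself a relative growth rate, so it should be compared with 8.64%, not with 1.44.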
2. Survey of the big data vendors enterprises choose
Which big data vendors do enterprises prefer? The products they have already deployed offer some clues. As the figure above shows, the top six vendors are IBM (18.74%), Oracle (18.33%), SAP (11.35%), Microsoft (9.71%), SAS (7.52%), and NetApp (7.52%).
Compared with the 2012 survey, Oracle dropped from 27.93% to 18.33%. Its single-vendor dominance has ended and been replaced by a field of many vendors with relatively even shares. Among the top three, IBM and SAP grew fastest, from 15.99% and 7.66% to 18.74% and 11.35% respectively.
This year's survey added an option for domestic vendors. Compared with the foreign IT giants, domestic vendors hold a share of only 5.61%. The shock of this year's "PRISM" incident sounded the alarm on enterprise information security, and it also brought domestic vendors great opportunities and challenges. With demand surging in big data and information security, domestic vendors will usher in a spring of growth.
III. Analysis of enterprise big data pain points
Years ago, companies focused on informatization and the Internet. In recent years they have focused more on cloud computing, mobility, and social technologies. Ma Haixiang believes that whatever the technology trend, it brings many problems for enterprise data processing and analysis. Rapid data growth, data diversity and complexity, and data security are all challenges enterprises face. To better understand the real needs of enterprises, the survey analyzed enterprise pain points in the big data era.
1. Survey of difficulties in enterprise big data storage and processing
As the figure above shows, the difficulties enterprises face in big data storage and processing are fairly evenly distributed. Data security ranks highest (18.98%), followed by system performance bottlenecks (18.42%) and the diversity of data types (18.01%). The remaining options are low data analysis efficiency (15.24%), data read/write bottlenecks (14.96%), and storage pressure (14.4%).
The small gap between the options shows that all six are regarded as real difficulties in enterprise data storage and processing, with data security the greatest concern. In a big data environment, many enterprises are rethinking their information security policies to protect their data assets from being compromised.
2. Challenges enterprises face in the big data era
The figure above shows the challenges enterprises face in the big data era. The lack of professional data talent (26.99%) is the biggest challenge, followed by the analysis and processing of unstructured data (26.65%), the difficulty of handling big data with traditional technology (25.27%), and the high barrier to entry of new technologies (21.13%).
The shortage of big data talent will become an important factor affecting the development of the big data market. Gartner predicts that by 2015 there will be 4.4 million new big data jobs worldwide, and 25% of organizations will have a chief data officer. Big data positions call for versatile professionals with a command of mathematics, statistics, data analysis, machine learning, natural language processing, and more. In the future, big data will face a talent gap of about 1 million people, which society, universities, and enterprises will need to work together to train and identify.
3. Challenges unstructured data poses to enterprises
Enterprises are not good at handling unstructured data such as text, images, and video. The survey results above show that the problem enterprises most urgently need to solve is how to analyze this data (38.96%). Next come integration with other data sources (32.5%), how to store the data (14.72%), and data security (13.82%).
As an earlier article on Ma Haixiang's blog noted, the core of big data is discovering value, and the core of working with data is analysis. Analysis is the most critical part of big data. This is especially true for unstructured data, which is hard to handle with traditional methods; the first approach that comes to mind is to convert it into structured data before processing and analyzing it.
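To make the idea of converting unstructured data into structured data concrete, here is a minimal sketch using Python's standard `re` module. The sample notes, the field names, and the `to_record` helper are all hypothetical illustrations; real unstructured data is far messier and usually needs much more than two regular expressions.

```python
import re

# Hypothetical free-text notes standing in for unstructured data.
NOTES = [
    "Customer called on 2013-10-02 about order 1042, refund of 59.90 requested.",
    "No issues reported.",
    "Order 2210 delayed; customer emailed on 2013-11-05.",
]

DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
ORDER_RE = re.compile(r"\b[Oo]rder (\d+)\b")


def to_record(text):
    """Turn one free-text note into a structured record; missing fields become None."""
    date = DATE_RE.search(text)
    order = ORDER_RE.search(text)
    return {
        "date": date.group(1) if date else None,
        "order_id": int(order.group(1)) if order else None,
        "text": text,
    }


records = [to_record(note) for note in NOTES]
for record in records:
    print(record["date"], record["order_id"])
```

Once the notes are in this form, they can be filtered, joined with other data sources, and aggregated like any other table, which is the step the survey respondents find hardest.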
Compared with the security of structured data, the security of unstructured data receives too little attention from enterprises. Yet according to statistics, up to 80% of business data is unstructured. Securing unstructured data is therefore urgent, and enterprises need to prepare and plan early.
IV. Enterprise big data selection planning
Big data is undoubtedly the hottest topic of 2013. Amid the excitement, Ma Haixiang thinks we should calmly consider whether an enterprise needs to deploy big data, what type of big data it needs to deploy, and how to choose an appropriate solution. All of this calls for targeted selection planning.
According to a survey this year, global enterprise software spending is nearly 30 billion U.S. dollars, up 6.4% from 2012. Enterprise spending in 2014 is expected to tilt toward big data, especially in three areas: enterprise content management, data integration, and data quality tools.
1. Survey of big data deployment plans
How do the current state and the plans for big data applications in domestic enterprises compare? The figure above shows that 21.89% of enterprises have already deployed big data applications, 27.92% plan to deploy within one year, and 14.34% plan to deploy within two years. Enterprises with no plans and those that are undecided account for 11.32% and 24.53% respectively.
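Adding these shares up shows how deployment would accumulate if every enterprise followed through on its stated plan, which is an assumption rather than a survey finding. The sketch below computes the running totals with `itertools.accumulate`; the labels are paraphrases of the survey options.

```python
from itertools import accumulate

# Shares reported in the survey (percent of respondents).
plans = [
    ("already deployed", 21.89),
    ("within 1 year", 27.92),
    ("within 2 years", 14.34),
]
other = [("no plans", 11.32), ("undecided", 24.53)]

# Running total of enterprises that would have deployed at each stage.
cumulative = [round(total, 2) for total in accumulate(share for _, share in plans)]
for (label, _), total in zip(plans, cumulative):
    print(f"{label}: {total}% cumulative")

# Sanity check: all five options together should cover every respondent.
total_all = round(sum(share for _, share in plans + other), 2)
print(total_all)
```

Under that assumption, roughly half of the respondents (49.81%) would have deployed within a year and 64.15% within two years, and the one-year group is the largest single step.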
In the big data era, enterprises have become increasingly aware of the importance of data and are slowly beginning to accept the shift from traditional databases to big data analysis. But the biggest difficulty with big data is putting it into practice: enterprises need to choose a suitable big data solution that fits their business requirements.
2. Survey of factors influencing big data selection
As the figure above shows, the top three factors enterprises consider when selecting big data products are product performance (19.79%), service and support (15.2%), and compatibility with different applications (13.94%). Next come product price (13.16%), ease of use (12.18%), support for mobility (11.11%), vendor and brand (7.8%), and whether the product is open source (6.82%).
It is no surprise that product performance ranks first. Service and support ranking ahead of product price seems to confirm that IT vendors' transformation into service providers is the right path. In addition, as mobility continues to deepen, big data solutions that support mobile use will become the trend.
3. Survey of big data product and solution types
Beyond the factors considered during selection, what type of big data product or solution suits enterprises? As the figure above shows, 32.05% of enterprises choose big data analysis software, 28.96% choose an end-to-end big data solution, and 28.38% choose infrastructure products. The least chosen option is the big data all-in-one appliance, at 10.62%.
Besides confirming the importance of big data analysis described above, the results show that big data all-in-one appliances are not as popular as might be expected. According to industry insiders cited by Ma Haixiang's blog, an all-in-one appliance is usually designed for one specific business process, lacks generality, and is too expensive for the average enterprise. As a result, today's all-in-one appliances are mostly aimed at mature business processes, where they can greatly simplify deployment and maintenance.
V. Enterprise big data application trends
For a long time, any mention of big data brought Hadoop to mind, and the two were almost synonymous. In fact, the technical field of big data is very broad, covering data collection, integration, governance, analysis, exploration, learning, and intelligence.
1. Survey of big data technology trends
As the figure above shows, the five big data technologies respondents care about most are big data analysis (12.91%), cloud databases (11.82%), Hadoop (11.73%), in-memory databases (11.64%), and data security (9.21%). These are followed by NoSQL (8.21%), data warehouses (8.21%), data integration (7.94%), business intelligence (7.13%), columnar databases (5.96%), big data (database) all-in-one appliances (3.52%), and NewSQL (1.71%).
Encouragingly, Hadoop is no longer the only big data technology people have in mind, and big data analysis has become the technology of greatest interest. This shows that people's understanding of big data is gradually deepening and that they are paying attention to a growing range of technologies.
2. Survey of big data analysis capabilities
Since big data analysis is the technology of greatest interest, which of its capabilities matter most? As the figure above shows, the top three are real-time analysis (21.32%), rich mining models (17.97%), and a visual interface (15.91%). These are followed by predictive analysis (13.1%), social data analysis (12.12%), cloud services (11.69%), and mobile BI (7.9%).
We ran a similar survey in 2012, when rich mining models (27.22%) ranked 7.34 percentage points ahead of real-time analysis (19.88%). In just one year real-time analysis has overtaken it, and many big data vendors have found success with innovative real-time analysis technology.
Ma Haixiang Blog Comments:
This survey covers the state and trends of big data applications in 2013. The results show that enterprises have an urgent need to deploy big data over the next two years, and that demand has gradually moved from initial infrastructure building toward big data analysis and end-to-end big data solutions. At the same time, big data faces a talent shortage. Enterprises and universities need to work together to cultivate versatile data professionals and help enterprises win the "data war."
This is an original article from Ma Haixiang's blog. If you reprint it, please credit the original source: http://www.mahaixiang.cn/sjfx/367.html. Reprinting without attribution is prohibited. Thank you for your cooperation!