Today, the way companies deal with big data is changing rapidly. Just a few years ago, big data was little more than a buzzword, and most organizations were still experimenting with Hadoop and related technologies. Now, big data technologies, especially big data analytics, have become an important part of most corporate strategies, and companies are under intense pressure to keep up with the rapid growth of big data.
NewVantage Partners' 2018 Big Data Executive Survey shows that adoption of big data projects, and the benefits derived from them, have become almost universal. Among respondents, 97.2% of executives said their companies are implementing big data or artificial intelligence (AI) initiatives, and 98.6% said their companies are trying to create a data-driven culture, up from 85.5% in 2017. The vast majority of respondents (73%) said they have achieved measurable value from their big data initiatives.
The 2018 Big Data Maturity Survey, conducted by vendor AtScale, found that 66% of organizations consider big data strategic or game-changing, while only 17% view the technology as experimental. In addition, 95% of respondents indicated that they plan to do more with big data over the next three months.
But what exactly do they do with big data?
A number of different trends are affecting big data initiatives, but four overarching themes have emerged as the key factors shaping big data in 2018: cloud computing, machine learning, data governance, and the need for speed.
1. Cloud computing
Analysts believe that big data is moving to the cloud. Brian Hopkins of research firm Forrester said: "Global big data solutions via cloud subscriptions will grow about 7.5 times faster than on-premises subscriptions. In addition, our 2016 and 2017 surveys show that the public cloud is the top technology priority for big data analytics professionals." He added that the cost advantages and innovation available through public cloud services will be irresistible to most companies.
And some surveys seem to support these conclusions:
In AtScale's survey, 59% of respondents said they have deployed big data in the cloud, and 77% expect some or all of their big data deployments to be in the cloud.
Teradata's report on the state of analytics in the cloud found strong demand for cloud-based big data analytics: 38% of respondents said the cloud is the best place to run analytics, and 69% said they want to run all of their analytics in the cloud by 2023.
Why are organizations so eager to move to the cloud? The expected benefits of cloud-based analytics include faster deployment (51%), better security (46%), better performance (44%), faster access to insights (44%), easier access for users (43%), and lower maintenance costs (41%).
Organizations will continue to migrate their data stores to public cloud services, and once the data is already in the cloud, performing big data analytics there is faster, easier, and less expensive.
In addition, many cloud providers offer artificial intelligence and machine learning tools, making cloud-based analytics even more attractive.
2. Machine learning and artificial intelligence
Machine learning, an important branch of artificial intelligence, enables computers to learn without being explicitly programmed. It is intrinsically linked to big data analytics, so the two terms are sometimes used interchangeably. In fact, this year's edition of the NewVantage annual survey was renamed to reflect that it covers both big data and artificial intelligence.
The authors of the survey wrote: “Big data and artificial intelligence projects are almost indistinguishable, especially considering that machine learning is one of the most popular technologies for handling large amounts of fast-moving data.”
When the survey asked executives which technology would have the biggest disruptive impact, artificial intelligence was the most popular choice, selected by 71.8% of respondents. This is a significant increase from 2017, when only 44.3% of respondents said the same. Notably, artificial intelligence is far ahead of cloud computing (12.7%) and blockchain (7.0%).
John-David Lovelock, research vice president at Gartner Inc., agrees with these executives. "Thanks to advances in computational power and in the volume, velocity, and variety of data, as well as advances in deep neural networks (DNNs), artificial intelligence promises to be the most disruptive class of technologies over the next 10 years," he said.
Gartner's recent forecasts indicate that global business value derived from artificial intelligence (AI) is projected to total $1.2 trillion in 2018, a 70% increase from 2017. Looking ahead, Gartner analysts predict that AI-derived business value will reach $3.9 trillion in 2022.
Given the potential business value, it is not surprising that companies plan to invest heavily in machine learning and related technologies. According to research firm IDC, global spending on cognitive and artificial intelligence (AI) systems will reach $19.1 billion in 2018, an increase of 54.2% over 2017.
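To make the idea of "learning without explicit programming" concrete, here is a minimal sketch of a machine learning workflow. It assumes the Python library scikit-learn is installed and uses a small synthetic dataset purely for illustration; none of the data or parameters come from the surveys cited above.

```python
# A minimal sketch: instead of hand-coding decision rules, a model infers
# them from labeled examples. Synthetic data, for illustration only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Generate 1,000 labeled examples with 10 numeric features each.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The "rules" are learned from the training data, not written by a programmer.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on examples the model has never seen.
print(f"Held-out accuracy: {model.score(X_test, y_test):.2f}")
```

The same pattern, fit on historical data and then score new data, is what links machine learning so closely to big data analytics: the more representative data available for training, the better the learned model tends to generalize.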
3. Data governance
However, while the potential benefits of cloud computing and machine learning are driving companies to invest in these big data technologies, they still face significant obstacles.
The most important of these is ensuring the accuracy, availability, security, and regulatory compliance of all that data.
When AtScale's survey asked respondents to identify their biggest big data challenges, governance ranked second, behind only skills, which remains the number one challenge in the annual survey. As recently as 2016, governance sat at the bottom of the list of challenges, so its rise to second place is particularly significant. Organizations are now more concerned about data governance than about performance, security, or data management.
Part of the reason for this renewed attention may be the recent data scandal involving Facebook and the UK firm Cambridge Analytica. That breach made clear the public relations nightmare that can result from mishandling data and failing to properly protect users' privacy.
The European Union's General Data Protection Regulation (GDPR), which took effect in May of this year, is another major force for change. It requires any organization that holds data on EU citizens to meet requirements such as breach notification, the right of access, the right to be forgotten, data portability, privacy by design, and the appointment of data protection officers.
Regulatory change is putting increasing pressure on organizations to know exactly what data they hold and where it resides, and to ensure that it is properly protected. For many companies this is a daunting task that requires stepping back and rethinking their big data strategies.
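As a rough illustration of what "knowing what data you have and where it lives" can look like in code, the sketch below models a tiny data inventory that can route access or erasure requests to the systems holding a given kind of personal data. Every name here (DataAsset, DataInventory, the registered systems) is a hypothetical example, not a prescribed GDPR solution.

```python
# A hypothetical data-inventory sketch: record where personal data is stored
# so access and erasure requests can be fanned out to the right systems.
from dataclasses import dataclass, field

@dataclass
class DataAsset:
    """One place where personal data is held."""
    system: str           # e.g. "crm"
    location: str         # e.g. "eu-west-1 RDS"
    fields: list[str]     # personal-data fields stored in this system
    retention_days: int   # how long records are kept

@dataclass
class DataInventory:
    assets: list[DataAsset] = field(default_factory=list)

    def register(self, asset: DataAsset) -> None:
        self.assets.append(asset)

    def systems_holding(self, data_field: str) -> list[str]:
        """Answer 'where does this kind of personal data live?'"""
        return [a.system for a in self.assets if data_field in a.fields]

inventory = DataInventory()
inventory.register(DataAsset("crm", "eu-west-1 RDS", ["email", "name"], 730))
inventory.register(DataAsset("clickstream", "eu-west-1 S3", ["email", "ip_address"], 90))

# An erasure ("right to be forgotten") request for a user's email address
# must reach every system that stores that field.
print(inventory.systems_holding("email"))   # ['crm', 'clickstream']
```

In practice an inventory like this is only one building block; breach notification, data portability, and privacy by design each impose their own requirements.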
4. The need for speed
At the same time that data governance concerns are pressing them to slow down, many companies feel a strong need for faster big data analytics.
In NewVantage's survey, 47.8% of executives said their primary uses of big data involve near-real-time dashboards and operational reports, real-time interaction, customer-facing streaming, or mission-critical applications. This is an important development, because data analytics has traditionally meant batch reports run on a daily, weekly, or monthly basis.
Similarly, Syncsort's survey found that 60.4% of respondents were interested in real-time analytics.
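The shift from periodic batch reports to continuously refreshed results can be sketched with a streaming query. The example below uses Apache Spark's Structured Streaming API (Spark comes up again just below), with a local socket source standing in for a real event feed such as Kafka; the host, port, and one-minute window are illustrative assumptions.

```python
# A minimal near-real-time sketch: count events per one-minute window as
# they arrive, instead of waiting for a nightly batch report.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("realtime-dashboard-sketch").getOrCreate()

# Read a continuous stream of lines from a local socket (placeholder for a
# production source such as Kafka). includeTimestamp adds an event timestamp.
events = (spark.readStream
          .format("socket")
          .option("host", "localhost")
          .option("port", 9999)
          .option("includeTimestamp", "true")
          .load())

# Aggregate into one-minute windows so a dashboard can refresh continuously.
per_minute = events.groupBy(window(col("timestamp"), "1 minute")).count()

# Print each updated result to the console as new events arrive.
query = (per_minute.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```

To try it locally, start a text source first (for example, nc -lk 9999) and type lines into it while the query runs.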
To meet these real-time or near-real-time performance requirements, companies are increasingly turning to in-memory technology. Because processing data in memory (RAM) is far faster than accessing data stored on a hard drive or SSD, in-memory technology can deliver dramatic speed increases.
SAP, for example, claims that its in-memory HANA platform has helped some customers speed up business processes by a factor of 10,000. While most companies will not see gains of that magnitude, SAP is far from the only vendor investing heavily in in-memory technology: Apache Spark, an open source big data analytics engine that runs in memory, claims to run workloads up to 100 times faster than a standard Hadoop engine.
Enterprises appear to be noticing these performance improvements. Vendor Qubole reports that hours of Apache Spark usage increased 298% between 2017 and 2018. Measured by the number of commands run on Spark, the growth is even more impressive: the total increased 439% over the same period.
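The in-memory pattern these engines rely on can be sketched in a few lines of PySpark: load a dataset once, pin it in RAM, and run repeated interactive queries against the cached copy. The file path and column names below are placeholder assumptions, not a benchmark.

```python
# A minimal in-memory sketch: cache a DataFrame so repeated queries read
# from RAM instead of re-scanning disk or object storage each time.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("in-memory-sketch").getOrCreate()

# Load once from (comparatively slow) storage; path is a placeholder.
orders = spark.read.parquet("/data/orders.parquet")

# Pin the DataFrame in memory for the rest of the session.
orders.cache()
orders.count()   # first action materializes the cache

# Subsequent interactive queries hit the in-memory copy.
orders.groupBy("region").sum("amount").show()
orders.filter(orders.amount > 1000).count()
```

Whether caching yields a 10x or 100x speedup depends on the workload and the cluster, which is why vendor claims like those above are best treated as best-case figures.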
In some respects, this demand for speed also drives the other three big data macro trends. Organizations are migrating big data to the cloud partly because they want the performance gains. They are investing in machine learning and artificial intelligence at least in part because they want faster, better insights. And they are running into data governance and compliance challenges at least partly because they have adopted big data technologies so quickly, without first addressing all of the data quality, privacy, security, and compliance issues involved.
In the near future, as companies look for new ways to use big data to disrupt their industries and gain a competitive advantage, all four of these trends can be expected to continue and accelerate.