Big data dominates current technology news and is touted as a possible solution to every problem, from intrusion detection and fraud prevention to cancer treatment and optimal product pricing.
But big data, which we define as data of large volume, many formats, and high velocity, is not a panacea for every problem. In fact, a company that buys into some of the big data myths may head further in the wrong direction, wasting a lot of time and money, weakening its competitive position in the market, or damaging its reputation.
Here are 10 of the biggest myths about big data that businesses should know. Understanding them helps avoid the negative effects of big data and capture its real commercial value.
Avoid wasted investment: recognize the 10 myths of big data
Myth 1: Only data scientists can handle big data
In fact, data scientists alone are not enough.
"Data scientists themselves can't get insights out of big data if they don't know which questions to ask in the first place," said Pat Farrell, senior director of data analysis at Penn Dentistry. "You need people who are familiar with the industry and have the domain knowledge to understand what the problems are and which insights are valuable to that particular industry."
For example, Penn Dentistry includes a health system and a medical school. The health system has long been collecting clinical data in data warehouses. In the medical school, meanwhile, new technology allows the sequencing of the human genome, which involves a large amount of data.
"We know where the value is, and we finally have the computing power to access it," Farrell said. Combining data analysis with medical expertise opens up a whole new area of predictive health care.
Myth 2: The larger the data, the greater the value
Farrell said that collecting, storing, and cataloging data takes time and resources, and that collecting large amounts of data indiscriminately can divert those resources away from more valuable projects.
Farrell advises companies to have a clear understanding of the specific metrics or key performance indicators they care about before they start collecting data.
Myth 3: Big data is for big companies
Big companies may have more data sources, but even small companies can take advantage of data from social media platforms, government agencies, and data providers.
"Regardless of the size of the organization, decisions based on data are more reliable than decisions based on intuition alone," said Darin Bartik, senior director of product management in the information management solutions group at Dell Software.
Smaller companies tend to use data-driven decisions less often than their larger peers, but when they do, they can adjust their strategy more quickly.
Myth 4: Collect now, sort it out later
Storage is getting cheaper, but it's not free. Brad Peters, chief executive of Birst, a cloud-based business intelligence firm based in San Francisco, says that for many companies, data is growing faster than the cost of storage is falling.
Some companies believe that if they just collect data, they will figure out how to use it later, but paying heavily to keep data that turns out to be worthless is a waste. In fact, some datasets are subject to the law of diminishing returns. For example, suppose you conduct an opinion poll to predict the outcome of an election. You need a certain number of voters to get a representative sample. But beyond a certain point, adding more people does not significantly change the margin of error.
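The diminishing return in the polling example can be made concrete with the standard approximation for the margin of error of a proportion, z * sqrt(p(1 - p) / n). The sketch below is only an illustration: it assumes a simple random sample, a 95 percent confidence level (z = 1.96), and the worst case p = 0.5, none of which come from the article.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a polled proportion.

    Assumes a simple random sample; z=1.96 corresponds to roughly
    95 percent confidence, and proportion=0.5 is the worst case.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Each step multiplies the number of respondents by 10.
for n in (100, 1_000, 10_000, 100_000):
    points = margin_of_error(n) * 100
    print(f"n={n:>7}: +/- {points:.2f} percentage points")
```

Under these assumptions, going from 1,000 to 10,000 respondents narrows the margin from about 3.1 points to about 1 point, while the next tenfold increase gains well under one more point.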
And it's not just about storage costs, says Dean Gonsowski, global head of information governance and big data management at Recommind, a San Francisco-based company focused on unstructured data analysis.
For example, the more data you have, the longer it takes to sort through. "When billions of records are in storage, a search takes hours or weeks," he said.
Myth 5: All data is equal
The state of Virginia has been collecting data on student enrollment, financial assistance, and incentive levels for the past 20 years. However, that does not mean the data stored in a given field today means the same thing as the data collected in that field 20 years ago.
"The biggest problem I've dealt with is that researchers think something is fair game just because it's in the data dictionary," says Tod Massa, director of higher education policy research and data warehousing for the state of Virginia. For example, students' ACT and SAT test scores were initially collected only for in-state students; then there was a gap, and after that the data were collected for both in-state and out-of-state students. Similarly, data on race is recorded differently at the K-12 level and in higher education.
In fact, any given data element may differ when it is reported by different agencies, by different people, or at different points in time.
As a result, analysts need not just statistical skills but also local knowledge of the data and familiarity with broader developments in the field, such as the recalibration of SAT and ACT scores.
"You can't program all these things into a data warehouse," Massa said.
The same applies to external data sources: to use any data well, you need to understand the culture and context in which it was collected.
Myth 6: The more specific the prediction, the better
It is human nature to think that more specific things are more accurate; "3:12" sounds more accurate than "sometime in the afternoon."
But the opposite is often true. In many cases, a more specific prediction is less likely to be accurate. For example, a customer buys a laptop in a particular configuration, and the only other customer who ever bought that laptop also bought a pair of hot pink heels.
"The recommendation of hot pink heels may be specific, but it may be too specific, resulting in a high error rate," said Jerry Jao, CEO of Retention Science, a marketing firm based in Santa Monica, California.
So predictions that look impressive may not actually help the business or its marketing.
Myth 7: Big Data equals Hadoop
Hadoop, a popular open-source framework for processing unstructured data, has recently received a lot of attention. But companies have other options.
"There's the whole NoSQL movement," said SAP senior vice president and big data manager Irfan Khan. "There are MongoDB, Cassandra, and a whole range of other technologies."
Some of these technologies may be more appropriate for a particular big data project. In particular, Hadoop works by dividing the data into multiple blocks and processing them in parallel. This approach suits many big data problems, but not all of them.
"While YARN and Hadoop 2 solve some of these problems, sometimes what you need to do is something Hadoop is not ideally suited for," said Grant Ingersoll, chief technology officer of Lucidworks, a big data consultancy. "People need to keep a cool head and decide what is best for them rather than follow fashion."
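The split-and-process-in-parallel idea can be sketched in a few lines. This is not Hadoop's API; it is a toy word count using only the Python standard library, and the function names (split_into_blocks, count_words, parallel_word_count) are hypothetical names chosen for this illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split_into_blocks(records, block_size):
    """Divide the input into fixed-size blocks, one per unit of work."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]

def count_words(block):
    """'Map' step: count words within a single block, independently of the others."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts

def parallel_word_count(records, block_size=2, workers=4):
    """Process blocks concurrently, then merge the partial results ('reduce' step)."""
    blocks = split_into_blocks(records, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, blocks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total

lines = ["big data", "data myths", "big myths big"]
print(parallel_word_count(lines))
```

The pattern works here because each block can be counted without knowing anything about the others. Problems in which every step depends on the result of the previous one do not split this cleanly, which is one reason the approach does not fit every workload.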
Myth 8: End users do not need direct access to big data
Big data is often assumed to be too complex for anyone but specialized staff to handle. But that is not necessarily the case.
Take, for example, all the data generated by the equipment in an ICU: heart rate, respiration data, ECG readings. Much of the time, doctors and nurses can see only the patient's current readings.
"I can't see what the situation was 10 minutes ago, and I can't draw a trend line for the next hour," said Anthony Jones, chief marketing officer for patient care and clinical informatics at Philips Healthcare.
But seeing a patient's historical data is valuable when a doctor has to make a decision. "If only a core data science team gets to look at this data, a huge opportunity is being missed," Jones said.
The challenge now is to get the data generated by all the different devices to work together, even though the devices were not designed for this and use different platforms, operating systems, and programming languages. Once that is done, doctors and nurses can get the data in a useful form when they need it.
Myth 9: Big data is only for big questions
A chief information officer at a big bank recently took part in a discussion about big data and was asked about end-user self-service.
"I don't believe in it," the CIO said, Birst chief executive Peters recalls.
This is a common attitude, he said: some executives think big data can answer only certain types of questions. The attitude can be summed up this way: "Our big data goal is to solve a very small number of high-value problems through a core team of data scientists. We don't want to muddy the data by giving ordinary people access to this information, because we don't think they need it."
Peters disagrees, but said the attitude is common in many industries. "It is a rampant myth at big insurance companies that business users are not smart enough to deal with the data."
Myth 10: The big data bubble will burst
Hype cycles may come and go, but the underlying technological change persists. The bursting of the dot-com bubble did not signal the end of the Internet.
Even if the hype calms down, companies will still have big data to deal with. In fact, because of exponential growth, they can expect to handle more data than ever before: IDC expects that, through 2020, the cumulative amount of data collected will double every two years.
And it's not just more of what companies are already collecting. New data types may appear as well, and they will require a lot of storage.
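To see what doubling every two years implies, the growth factor after a given number of years is 2 raised to (years / 2). The short sketch below only works through that arithmetic; the function name growth_factor and the time spans are illustrative and are not IDC figures.

```python
def growth_factor(years, doubling_period_years=2.0):
    """How many times larger the data volume is after `years`,
    assuming it doubles once every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)

for years in (2, 4, 6, 8, 10):
    print(f"after {years:>2} years: {growth_factor(years):.0f}x the starting volume")
```

At that rate, a decade of growth leaves a company with 32 times the data it started with.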
Bryan Hill, chief technical officer of Cadient Group, a Pennsylvania interactive marketing company, said that companies that dismiss "big data" as only a passing phase might miss the opportunity to capture data elements that could affect their business.
"The term 'big data' is likely to change, just as the cloud is not so different from the web that came before it," he said. "The word may change, but the spirit of big data will stay."