Three Myths About Big Data

As the industry's interest in big data grows, one of my favorite activities of 2013 was public speaking about big data, which I did more than in any previous year of my career. I gave many talks at industry conferences, events, universities, and within EMC, and over and over again these talks drew the same comments, questions, and misconceptions about big data. I thought it would be useful to share what I have heard. Here are three big myths about big data:
1. Big data is all about the size of the data itself
Big data is mainly about the size of the data, because big data is big, right? Actually, not exactly. Gary King of Harvard's Institute for Quantitative Social Science makes the same point: the size of the data is not what matters most. Of course, today's data volumes are far larger than ever before (the "volume" in the "3Vs" of volume, variety, and velocity), but people who focus only on gigabytes, terabytes, or petabytes see big data as nothing more than a storage and technology problem. Storage absolutely matters, but the more interesting aspects of big data are usually the other two Vs: variety and velocity. Velocity refers to streaming, fast-moving data, data that accumulates quickly or must land in the data warehouse with low latency so that people can make decisions faster (or even automatically). Streaming data is a genuinely hard problem, but for me, variety is the most interesting of the 3Vs.
Consider the sheer range of sources that generate big data. This points to a philosophical shift: it is not just that the data has changed, but that the definition of data itself has changed. Most people think of data as rows and columns, such as an Excel spreadsheet, an RDBMS, or a data warehouse storing terabytes of structured records. Make no mistake, big data is also about semi-structured and unstructured data. It includes all the things people don't usually think of as data: readings from RFID chips and smartphone geospatial sensors, images, video files, clickstreams, speech-recognition data, and the metadata that describes all of the above. Of course, we still need effective ways to store that much data, but I have found that once people grasp the variety and velocity of data, they start looking for more innovative ways to use it.
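To make "variety" concrete, here is a minimal Python sketch of why semi-structured data resists the rows-and-columns model (the event types, field names, and sample records are all invented for illustration): each record carries its own shape, so the schema has to be discovered at read time rather than fixed up front.

```python
import json

# Hypothetical sample: one clickstream event and one geospatial sensor
# reading, as JSON lines -- stand-ins for the mixed sources named above.
raw_events = [
    '{"type": "click", "user": "u42", "page": "/pricing", "ts": 1386000000}',
    '{"type": "geo", "device": "phone-7", "lat": 42.36, "lon": -71.06, "ts": 1386000005}',
]

for line in raw_events:
    event = json.loads(line)  # each record declares its own fields
    if event["type"] == "click":
        print(f"clickstream: {event['user']} viewed {event['page']}")
    elif event["type"] == "geo":
        print(f"sensor: {event['device']} at ({event['lat']}, {event['lon']})")
```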
2. Can't I just use my existing tools?
"All right, but why do I need new tools? Can't I use the original software tools to analyze big data?" We're talking about using Hadoop to arrange hundreds of thousands of unstructured data inputs. In the discussion, a listener asked why he could not simply use SPSS to analyze a large number of text corpora. In fact, once you understand the content in # #, you will realize that you need a new tool that can understand, store and analyze different data inputs (images, clickstream, video, voice, metadata, XML, etc.), and can process them in parallel. This is why in-memory desktop tools are sufficient to handle local in-memory analytics (Spss,r,weka, etc.) but cannot handle large numbers of large data sources. So we need new technologies to manage these disparate data sources and manage them in a parallel principle.
3. Imperfect data quality makes big data meaningless
"Yes, so big data, what about the quality of the data?" Does it mean a bigger "useless GIGO"? Big data can also be messy, and data quality is important for any analysis. However, the key is to keep in mind that data will inevitably be confusing. That is, there will be a lot of clutter, anomalies, and inconsistencies. It is important to focus on the number and type of data, and whether they can be pruned and used for valuable analysis. In other words, find some kind of signal in these chaos. In some cases, organizations may want to resolve and clean up a large number of data sources, and in other cases, these may not be important. Google trend analysis can be considered.
Google Trends is a case in point: it shows people the hottest search topics, like Google's roundup of the top searches for all of 2013. That takes enormous storage, processing power, and powerful analysis techniques to filter and rank the searches, and it is a good example of using big data without being paralyzed by GIGO. At this point many people say, "Oh! That sounds like a big change." Yes! As one of my colleagues has said, you can draw a distinction between big data as a noun and big data as a verb. As a noun, big data is merely "a whole lot of stuff" that needs to be stored somewhere. As a verb, big data implies action. People in this camp view big data as a disruptive force, with the power to change the way they operate. They use big data to test ideas creatively and solve business problems analytically, such as A/B testing (see Google's famous test of dozens of shades of blue to find the one Gmail users were most willing to click, rather than going with a marketing manager's guess). Or they try to measure things that couldn't be measured before, as companies and universities look for better ways to automate image categorization. They explore new ideas in new ways, using data to answer "what if" questions. In this contest, the organizations that treat big data as a verb will be the biggest winners!
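To ground the A/B testing point, here is a minimal two-proportion z-test in Python (the click counts are invented for illustration); this is the arithmetic that decides whether one variant actually outperforms another, rather than leaving it to a manager's guess:

```python
from math import sqrt

# Hypothetical A/B numbers, invented for illustration: clicks and
# impressions for two variants of the same link color.
clicks_a, shown_a = 1_150, 20_000   # variant A
clicks_b, shown_b = 1_260, 20_000   # variant B

p_a = clicks_a / shown_a
p_b = clicks_b / shown_b
p_pool = (clicks_a + clicks_b) / (shown_a + shown_b)

# Two-proportion z-test: is the difference in click rates more than
# chance alone would explain?
se = sqrt(p_pool * (1 - p_pool) * (1 / shown_a + 1 / shown_b))
z = (p_b - p_a) / se

print(f"A: {p_a:.2%}  B: {p_b:.2%}  z = {z:.2f}")
print("significant at 95%" if abs(z) > 1.96 else "not significant at 95%")
```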
"For more information on business intelligence, business intelligence solutions and business intelligence software downloads, visit Finebi Business Intelligence official website www.finebi.com"