The definition of big data

In recent years, people have become increasingly interested in "big data". The trigger was a research report released by the Siena Global Research Institute in 2011. The report predicted that people would generate an enormous amount of information, more than existing data systems can hold, and suggested that using this information a little more strategically could create huge business opportunities.

So what is big data? Literally, it refers to a volume of information too vast for the technology now available to handle. In practice, when the usage information for each service is stored together with the user's own information, and that information is available in full rather than in part, it is called big data.

A typical example is usage data from Internet services. Big data also includes retail sales data (POS), power usage data, and measurements from sensors such as accelerometers and wireless mobile controls.

Unlike sampled data, big data covers all of the information generated by a service's users. It is recorded not per day or per hour but per minute or per second, so it reflects the actual situation. Geographically, it is not divided by city or other small unit; in many cases it is accurate to six or seven digits of latitude.

Three characteristics of big data

The characteristics of big data are generally summed up as the "3Vs": volume, variety, and velocity. But typical big data such as power usage and accelerometer readings does not have the kind of variety that search keywords have. The same goes for velocity and sheer volume, so big data does not always meet the 3V standard.

From the perspective of utilization, big data has the following three characteristics.

The first is that the depth of the information available is shallower than what ordinary behavioral observation or questionnaire surveys provide. Even if you know what was used and how much, you cannot know in what circumstances, for what reason, or for what purpose it was used. There is also little information about user attributes (sex, whether they have children, career, education, income, and so on). This is a challenge when using such data for marketing.

The second is that it covers the entire population. This makes it possible to find distinctive patterns that do not show up in a sample and that occur only at the low-frequency end.

The third is real-time utilization. Because big data can be read, processed, and put to use on the spot at each stage, it can capture momentary behavior and demand. For example, when someone interested in a particular car is browsing a related Web page, you can immediately provide information about that car. Traditional marketing cannot do this.

Cases where big data has the advantage

Let's look at the wide-ranging use of big data from the marketing angle of "value for the customer", in six steps.

The first step is to "understand the market structure and needs", which is the basis for product development and service strategy. The second step is to design, based on that market judgment, the core value provided by products and services. The third is marketing tailored to different customers. As mentioned earlier, a further step is to provide push services (advertising, mail, Web sites, etc.) at the moment demand arises. The sixth is to test the effect of the first four steps and to predict the future from the data.

Big data is especially important in the last four steps. For example, big data is an essential prerequisite for providing services and information that match individual user characteristics. Typical cases are serving targeted advertising based on a user's Web browsing history, and issuing coupons that match an individual's spending.

In fact, when you start typing a few words of a hot topic into a search box, you are immediately shown suggested terms. These come from "machine learning" applied to how users use the service. Because the suggestions are learned from usage, the service provider can offer this feature even without understanding the language.

In addition, the more fine-grained the information you need, the less you can do without big data. One such case is tracking how chicken sales change in the evening around a particular station in a particular week.

Predictive analysis of election results, economic conditions, etc.

Another strong expectation is that big data can be used to make accurate quantitative predictions, either of the present or of the very near future. I have actually worked on two interesting predictions at Yahoo. One was to predict the outcome of the July 2013 upper house election.

In the end, most of our predictions were correct, and their accuracy exceeded that of all the major media. We found that activity on the Internet (search volume, the number of posts on Twitter and Facebook, and so on) is highly correlated with voting behavior, and we made predictions for each electoral district. Traditional forecasts rely on expert observation, telephone polls, survey results, and so on. We predicted using only the type and volume of the data, and obtained results more accurate than the traditional methods.

Another example is forecasting economic conditions. Economic indicators are usually released with a lag of one or two months, but what people want to know is not how the economy was two months ago but how it is now, so we started this prediction. We began by analyzing search terms. Yahoo has about 75 search terms all year round. After a thorough analysis of about 60 frequently searched specimens, we screened and selected 200 keywords that are closely related to the economic indicator. Based on these, we built a model to estimate the indicator, and the predictions achieved a basic level of accuracy.
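The keyword screening described above can be illustrated with a minimal sketch. It assumes that "closely related" means a strong Pearson correlation between a keyword's monthly search volume and the indicator; the article does not say which measure was actually used. The data, the 0.8 threshold, and the names `pearson` and `select_keywords` are all hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # A constant series has no defined correlation; treat it as unrelated.
    return cov / (sx * sy) if sx and sy else 0.0

def select_keywords(volumes, indicator, threshold=0.8):
    """Keep keywords whose search volume moves with or against the indicator."""
    return sorted(
        keyword
        for keyword, series in volumes.items()
        if abs(pearson(series, indicator)) >= threshold
    )

# Hypothetical monthly data, invented for illustration only.
indicator = [100.0, 102.0, 101.0, 105.0, 107.0, 110.0]
volumes = {
    "job openings": [10, 12, 11, 15, 17, 20],  # rises with the indicator
    "bankruptcy": [20, 18, 19, 15, 13, 10],    # falls as the indicator rises
    "cat videos": [5, 9, 4, 8, 5, 9],          # unrelated to the indicator
}

selected = select_keywords(volumes, indicator)
print(selected)
```

Negatively correlated keywords are kept on purpose, since a term that falls as the economy improves carries as much signal as one that rises. The selected series would then feed whatever estimation model is chosen.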

As mentioned above, big data plays a huge role in quantitative predictions of the present or the near future. In fact, it is already used in everyday supply chain management. It is the reason convenience stores, which receive deliveries of thousands of items three times a day, now rarely run out of stock.

The various problems that hinder use

Next, I want to talk about some typical problems that arise when using big data. First, almost no enterprise has big data in the first place. This is the problem before the problem.

Second, the data itself has not been organized well enough to be used in an integrated way. For example, retail product data is often classified into large, medium, and small categories, but this product management structure differs from chain to chain, even within the same retail group. This makes integrated use extremely difficult. How to integrate the data so that it can be used is a major problem.
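One way to picture the integration problem is a mapping table that translates each chain's own category path into a shared code. The sketch below is only an illustration under that assumption; the chain names, category paths, shared codes, and the `integrate` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from (chain, chain-specific category) to a shared code.
CATEGORY_MAP = {
    ("chain_a", "Food/Meat/Chicken"): "meat.chicken",
    ("chain_b", "Fresh>Poultry"): "meat.chicken",
    ("chain_a", "Food/Drinks/Tea"): "drink.tea",
    ("chain_b", "Beverage>Green tea"): "drink.tea",
}

def integrate(rows):
    """Sum sales per shared category; collect rows that have no mapping."""
    totals = defaultdict(int)
    unmapped = []
    for chain, category, amount in rows:
        common = CATEGORY_MAP.get((chain, category))
        if common is None:
            unmapped.append((chain, category, amount))
        else:
            totals[common] += amount
    return dict(totals), unmapped

# Hypothetical sales rows: (chain, chain-specific category, sales amount).
rows = [
    ("chain_a", "Food/Meat/Chicken", 1200),
    ("chain_b", "Fresh>Poultry", 800),
    ("chain_b", "Beverage>Green tea", 300),
    ("chain_b", "Snacks>Chips", 150),
]

totals, unmapped = integrate(rows)
print(totals)
print(unmapped)
```

Returning the unmapped rows separately, rather than dropping them, shows how much of the data still cannot be integrated.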

Even if these two problems are solved, a system that can actually process and exploit big data is still needed. Most businesses do not have such a system, nor do they have the infrastructure to store the data that keeps flooding in. Even where a storage infrastructure exists, there are no personnel to operate and maintain it.

To solve these issues, companies desperately need people with data science skills, who use information science and engineering to find answers to business questions in huge volumes of data.

No data, no way to integrate it, no system to process and use it, no storage infrastructure, no staff to operate and maintain it, and no data science talent to solve the problems: this is where most companies now stand with big data.

Polarization of security policy

In discussions of big data, people often raise the problem of privacy protection, but because different arguments get mixed together, the discussion often proceeds from mismatched points of view.

On security, most of the major Internet operators have already taken effective measures. Yahoo, for example, clearly separates information that can identify a person from behavioral logs, and the data it uses is anonymized.
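The separation described above might look roughly like the following sketch. It is not Yahoo's actual method, which the article does not describe. It uses keyed hashing (HMAC-SHA256), which is strictly pseudonymization rather than full anonymization, to give each log entry a token that cannot be traced back to the user without a secret key. The key value, field names, and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be stored outside the source code.
SECRET_KEY = b"example-secret"

def pseudonym(user_id: str) -> str:
    """Derive a stable token from a user ID with HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened here for readability

def split_record(record: dict):
    """Split one raw record into a personal part and a behavioral-log part."""
    personal = {
        "user_id": record["user_id"],
        "name": record["name"],
        "email": record["email"],
    }
    log = {
        "token": pseudonym(record["user_id"]),
        "action": record["action"],
        "timestamp": record["timestamp"],
    }
    return personal, log

# Hypothetical raw record.
raw = {
    "user_id": "u123",
    "name": "Taro Yamada",
    "email": "taro@example.com",
    "action": "viewed car page",
    "timestamp": "2013-07-01T19:30:00",
}

personal, log = split_record(raw)
print(log)
```

The two parts would be stored and access-controlled separately, so that analysis can run on the log alone.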

On the other hand, many traditional enterprises do not separate personal information from log data but manage them together. Many of these enterprises also lack a sound risk-control system, to the point that it is difficult even to find out whether data has been leaked.

Businesses have thus polarized in how they handle data. To protect users, each enterprise needs to establish guidelines for managing data before it starts using it.

The three "obstacles" facing Japan

In general, compared with countries such as the United States, Japan's current environment for using big data faces three major obstacles. The first is that few enterprises generate big data. The second is the lack of infrastructure for using data. For example, Japan's electricity costs are several times those of the United States, which makes data centers expensive to build and run. This is also one of the reasons why the major information companies have not set up data centers in Japan. To attract the industry, there should at least be some exceptions, such as special pricing.

The third is the talent shortage mentioned above. What is required is a combination of three competencies: "data science ability", "engineering ability", and the "business ability" to sort out and solve problems based on an understanding of their background.

Data science ability is the ability to understand and apply knowledge from information science fields such as information management, artificial intelligence, and statistics. Engineering ability is the ability to put data science to meaningful use by building and operating real systems. I do not think one person must have all three, but consciously nurturing more such talent will be critical to the success or failure of big data.

