You have probably heard a lot about "big data" without really understanding what it is, even though you can feel some of the conveniences it brings:
For example, a week ago you took an interview test on a social networking site, and only after finishing did you learn that its real purpose was to measure how well your personality fits the team. Another example: before the National Day Golden Week, you received a push notification from a ticketing app advising that booking train tickets by phone 16 days in advance had a higher chance of success than buying them online on the same day. When you actually called, the phone booking system was busy, and you had to wait 2 hours before you got through and booked your ticket. But that was still better than being stuck on the 12306 website at the same moment, unable to pay.
Big data is seeping into all walks of life, and it is even closely tied to everyday scenarios such as your aptitude tests and your chances of developing a disease. In the future, big data will be like water and electricity in our lives, improving the quality of information across society and making its use more efficient.
In this system, data sampling and the later analysis still have to be done by people.
Crowdsourcing makes data sampling more automated
"Human intervention will become increasingly unnecessary in the future, at least in front-end data collection," James, a product manager, told Tencent Technology. Much data is now collected from users' interactive behavior, such as searches and microblog interactions. Small design features in apps, such as "like", "praise", and "throw in the trash", allow data quality to be inferred in the back end, as long as users actively complete them.
Rising onion prices shape the trend of India's inflation rate. A start-up called Premise collects the retail prices of different onions in real time from more than 700 users, who upload them every day through an app the company developed itself.
David Soloff, the company's co-founder, said this is an effective channel for sensing global financial dynamics in real time, since local stores generally adjust the prices of goods promptly in response to changes in the economic environment, including wholesale prices and consumer confidence.
"Based on the data collected, Premise's analysis has shown that in some economies inflation indicators can be forecast 4-6 weeks ahead of time. You don't have to wait for the monthly 'economic weather forecast'," Soloff stressed.
For retail stores, how a brand is displayed on the shelf directly affects sales. Checking that a brand keeps a good display position in front of passing shoppers is both time-consuming and tedious work.
To this end, a company called Quri developed an app called EasyShift, which pays users to contribute their time and effort to data collection. A user accepts a task issued by the app, photographs the specified spot at the specified location, uploads the photo to Quri's server, and receives a small payment in return.
EasyShift's philosophy is easy to understand: most users now carry smartphones. Brand makers want to know how their products are displayed in large retail outlets, track what competitors are doing, get reports on out-of-stock products and pricing, and monitor promotions and product launches. EasyShift pays consumers to collect this information while they shop.
During the Japan earthquake, real-time navigation data from a car brand was unexpectedly put to use and visualized through the "Green Life Channel" project "Connection Lifeline".
The project leader, Mr. Kan, is a senior director of the Creative Design Center in Japan, and had taken on a cooperation project for a car brand before the great earthquake. The project recorded which car traveled on which road, at what time, at what latitude and longitude, and at what speed and in what direction. About 100,000 such dynamic data points were recorded every minute in a vehicle navigation database. Mr. Kan fed the data into a program and displayed it in the form of a map of Japan.
When the earthquake struck Japan, this navigation data unexpectedly came in handy.
"When the earthquake struck, communication signals were unreliable, and people could only confirm the safety of relatives and friends over the network. The challenge we faced was how to get rescue teams into the disaster area," said Mr. Kan.
The navigation data had originally been used to collect traffic data on congestion. "Seen the other way around, where the data shows traffic, the road is passable," he said. After the earthquake, any road on which a vehicle was moving was marked in green, so that the marks formed a path.
At the same time, the team organized users on Twitter to post real-time information about roads and road signs across the country. Combining the two types of information, the Green Life Channel data was released online for public download 20 hours after the earthquake. In addition to the website, programmers quickly developed a mobile version. In a crisis, information spreads very fast. Soon many green lines appeared on the website and in the mobile app, giving rescue teams a reference for reaching the area quickly.
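The green-marking rule described above can be illustrated with a short sketch. This is not the project's actual program: the record fields (`road_id`, `timestamp`, `speed_kmh`), the function names, and the 5 km/h "moving" threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class NavRecord:
    """One hypothetical navigation sample reported by a vehicle."""
    road_id: str
    timestamp: datetime
    speed_kmh: float


def passable_roads(records, quake_time, min_speed_kmh=5.0):
    """Return the IDs of roads on which at least one vehicle was
    moving after the earthquake (assumed threshold: min_speed_kmh)."""
    return {
        r.road_id
        for r in records
        if r.timestamp > quake_time and r.speed_kmh >= min_speed_kmh
    }


def color_for(road_id, green_roads):
    """Roads with observed movement are drawn green; others stay unmarked."""
    return "green" if road_id in green_roads else "unmarked"
```

A road with no moving vehicles is left unmarked rather than flagged as blocked, because missing data does not prove that the road is impassable.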
Human intervention is still necessary in the big data age
Machine learning does dominate big data, but can it really do without human intervention? For example, you have grown used to online marketing everywhere, but do you really trust marketing recommendations that rely on simple mathematical models and large-scale data analysis?
ZestFinance is a platform that uses machine learning and large-scale data analysis to assess customer quality for the payday loan industry (payday loans are short-term, high-interest loans, similar to usury).
Unlike traditional methods of analysis, ZestFinance runs multiple models at once over large amounts of data to estimate probabilities. It draws on a growing number of data sources and types, which are translated into tens of thousands of metrics that measure borrower behavior, such as fraud probability, long- and short-term credit risk, and solvency. The outputs of the individual models are then combined into a final result. The platform can provide users with reliable results in a matter of seconds. "We tend to combine machine learning with human intervention," said founder Merrill.
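As a rough illustration of combining several model outputs and leaving room for human intervention, consider the sketch below. It is not ZestFinance's method: the weighted average, the model names, and the review band between 0.3 and 0.7 are assumptions chosen only to make the idea concrete.

```python
def combine_scores(scores, weights=None):
    """Weighted average of per-model risk probabilities, each in [0, 1].

    `scores` maps a hypothetical model name to its probability.
    With no weights given, every model counts equally.
    """
    if not scores:
        raise ValueError("at least one model score is required")
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def needs_human_review(final_score, low=0.3, high=0.7):
    """Send borderline cases to a person (the band is an assumed example)."""
    return low <= final_score <= high
```

For instance, `combine_scores({"fraud": 0.2, "short_term_risk": 0.4, "long_term_risk": 0.6})` averages the three probabilities, and `needs_human_review` then decides whether an analyst should look at the case before a decision is made.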
In the medical field, for example, data analysis based on machine learning alone is far from enough. "Machine learning can compute a probability, but it cannot be precise," Zember, CTO of Spring Rain Palm Doctor, told Tencent Technology. As an example, a disease model is built by retrieving all questions in the existing database that are more than 90% similar, then analyzing and summarizing the results to produce a disease probability model. The doctors' advice on each question is also summarized into a ratio of "nothing serious" to "go to the hospital", giving patients an intuitive data reference.
"But this still only yields a probability, and it is meant for users' self-examination. Whether the patient really has the condition still requires human analysis (a doctor's diagnosis), and our data analysts in the back end check the data again to verify its accuracy," he said.
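The self-check model described above can be sketched in a few lines. This is only an illustration, not Spring Rain Palm Doctor's implementation: the archive layout, the function names, and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions.

```python
from collections import Counter
from difflib import SequenceMatcher


def similar_questions(query, archive, threshold=0.9):
    """Return archived items whose question text is at least `threshold`
    similar to the query (character-level similarity, an assumed stand-in)."""
    query = query.lower()
    return [
        item
        for item in archive
        if SequenceMatcher(None, query, item["question"].lower()).ratio() >= threshold
    ]


def advice_ratio(matches):
    """Share of each kind of doctor's advice among the matched questions."""
    counts = Counter(item["advice"] for item in matches)
    total = sum(counts.values())
    return {advice: n / total for advice, n in counts.items()} if total else {}
```

The resulting ratio is a reference for self-examination only. As the quote above stresses, confirming that a patient actually has a condition remains a doctor's job.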