When the list of Oscar winners was announced, no one was more excited than Microsoft. Its official blog said that a team led by David Rothschild of Microsoft had analyzed data on the nominated films to predict this year's Oscar winners, and that "apart from Best Director, every other Oscar prediction was correct."
Microsoft may have gotten a little carried away. In fact, David's team made predictions in all 24 categories and got 19 right and 5 wrong. The misses were Best Director, an acting award, Best Makeup, Best Documentary and Best Art Direction.
Even so, the accuracy of their predictions is remarkably high. On Microsoft's blog, David described how they arrived at these results by mining data and building predictive models:
"My approach to predicting the Oscars is the same as my approach to predicting other things, including politics," David said. "I focus on the most efficient data and then build statistical models that are not tied to the results of any particular year. All models are tested and calibrated against historical data to make sure they predict out-of-sample outcomes accurately. These models forecast the future, not just explain past events."
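The article does not say which metric David uses to test his models against history. As an illustration only, the sketch below uses the Brier score, a standard way to grade probability forecasts against what actually happened; the metric choice and the back-test numbers are assumptions, not his published method.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; always forecasting 0.5 scores 0.25.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical back-test: probabilities a model gave to past nominees,
# paired with whether each nominee actually won (1) or lost (0).
past_forecasts = [0.9, 0.7, 0.2, 0.6]
past_outcomes = [1, 1, 0, 0]
print(round(brier_score(past_forecasts, past_outcomes), 3))
```

Scoring only on years the model was not fitted to keeps the check out-of-sample, which is the property the quote emphasizes.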
"I focus on four types of data," he said, "polling data, prediction-market data, fundamental data and user-generated data."
"For general elections, fundamental data such as past election results, incumbency and economic indicators carry more weight. Early in the forecast cycle I set a baseline from the fundamentals, then shift the emphasis toward market and polling data as they become more informative. I used only a small amount of user-generated data to predict the 2012 presidential election, but Xbox Live data were critical for real-time analysis of major events.
"The Oscars, however, have no polling data, and box-office and film-rating data are not statistically predictive. So I concentrate on prediction-market data and add some user-generated data, which helps me understand how a film's chances in different categories are related, for example how many awards Lincoln will win.
"Whenever I turn to a new field, I think about what is key to making a meaningful prediction:
"First, I determine which predictions are most relevant. For the Oscars, I focus on the likely winners in all 24 categories and on the total number of awards each film will win.
"Second, all predictions are updated live. From a research standpoint, it is critical to understand the value of events that occur between the prediction and the final outcome. For the Oscars, those events are the results of other awards, such as the Golden Globes.
"Finally, I build the model from the field's historical data and keep updating it to ensure its accuracy. I want to stress that everything we do is domain-independent, so that it can scale to many problems. If this research produces more efficient forecasting models that can be applied in more fields to solve more problems, it will be of great value to Microsoft, academia and the world."
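The election workflow David describes starts from a fundamentals baseline and shifts weight toward market and polling data as the event approaches. The sketch below shows one simple way to express that idea, assuming a linear hand-off over the forecast cycle; the linear schedule and the function name `blended_probability` are hypothetical, not his actual weighting.

```python
def blended_probability(fundamental_p, market_p, days_left, cycle_length):
    """Blend a fundamentals baseline with a market/poll estimate.

    Hypothetical rule: the weight on the market estimate grows linearly
    from 0 at the start of the cycle to 1 on the final day.
    """
    if cycle_length <= 0:
        raise ValueError("cycle_length must be positive")
    # Fraction of the cycle already elapsed, clamped to [0, 1].
    elapsed = min(max(1 - days_left / cycle_length, 0.0), 1.0)
    return (1 - elapsed) * fundamental_p + elapsed * market_p


# Hypothetical inputs: fundamentals say 60%, markets say 80%.
for days in (200, 100, 0):
    print(days, round(blended_probability(0.6, 0.8, days, 200), 2))
```

A real model would more likely set the weights from how informative each source proved to be historically, rather than from the calendar alone.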
David and his team run PredictWise, a website dedicated to forecasting major events. He said the Oscars were especially difficult because they span 24 categories (most events involve only about six), and the odds keep changing as other awards are handed out.
To address this, David increased the weight of dynamic data in the overall forecasting model.
"Real-time prediction is very important. Because real-time forecasts provide the latest prediction at any moment, mining dynamic data means the whole forecast keeps absorbing new information. It also gives a finer-grained record of when and why changes occurred and which factors affected the final result."
Take the dynamic data for Best Picture as an example. The early favorite, Lincoln, slid quickly after Argo won a string of other awards. When the Oscar nominations were announced, Argo had only an 8% chance of winning, but its later wins quickly pushed that figure to 93%.
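Odds like these come largely from prediction-market prices. Contract prices for one category rarely add up to exactly 100, so a forecaster has to turn them into probabilities that sum to 1 every time a new snapshot arrives. The sketch below assumes simple proportional rescaling; the prices are made up, and PredictWise's own adjustments may differ.

```python
def implied_probabilities(prices):
    """Turn contract prices (0-100 scale) into probabilities summing to 1.

    Assumption: proportional rescaling removes the market's overround.
    """
    total = sum(prices.values())
    if total <= 0:
        raise ValueError("prices must sum to a positive number")
    return {name: price / total for name, price in prices.items()}


# Hypothetical Best Picture snapshot taken right after another award
# ceremony; the raw prices sum to 110 rather than 100.
snapshot = {"Argo": 88.0, "Lincoln": 16.5, "Other": 5.5}
for film, p in implied_probabilities(snapshot).items():
    print(f"{film}: {p:.0%}")
```

Re-running this on each new snapshot is what makes the forecast "live": the curve for each film is just the sequence of normalized probabilities over time.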
Besides changes over time, the interactions between data series also matter. David's model found a strong correlation between Best Picture and Best Adapted Screenplay, so Lincoln's and Argo's trends in the two categories were nearly identical, differing only slightly. Lincoln initially had a 70% chance of winning Best Adapted Screenplay, but as Argo's chances for Best Picture climbed, Argo overtook Lincoln in Adapted Screenplay as well, rising to 57%.
To put this dynamic data mining into practice, David also worked with Microsoft Office to release an Excel app called "Oscars Ballot Predictor" that updates the forecasts in real time.
Unlike David, who relies mainly on prediction-market and fundamental data, the analytics firm Brandwatch built its prediction model on social data. It counts how often actors, directors and films are mentioned on social networks, tallies the positive mentions, and uses them to predict each nominee's chance of winning. Twitter accounted for about 40% of Brandwatch's sample.
Brandwatch's approach is not new, but it differs from earlier analyses in that it separates comments by professionals from those of the general public and counts only positive ones. Two variables are involved: the number of mentions and the sentiment behind them. Brandwatch says this filters out some irrelevant data; for example, comments about Helen Hunt's red-carpet appearance are not counted as primary data.
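The counting step can be sketched as follows, assuming each mention already carries a source label (professional or public) and a sentiment label from some upstream classifier. The tuple layout and the function name `positive_share` are hypothetical; Brandwatch has not published its pipeline in this form.

```python
from collections import Counter


def positive_share(mentions, source="public"):
    """Share of positive mentions per nominee from one kind of source.

    mentions: iterable of (nominee, source, sentiment) tuples, where source
    is e.g. "critic" or "public" and sentiment is "positive", "negative" or
    "neutral" (labels assumed to be assigned upstream).
    """
    counts = Counter(
        nominee
        for nominee, src, sentiment in mentions
        if src == source and sentiment == "positive"
    )
    total = sum(counts.values())
    return {nominee: n / total for nominee, n in counts.items()} if total else {}


# Hypothetical labeled mentions.
sample = [
    ("Argo", "public", "positive"),
    ("Argo", "critic", "positive"),
    ("Lincoln", "public", "positive"),
    ("Lincoln", "public", "negative"),
    ("Argo", "public", "positive"),
]
print(positive_share(sample))            # public, positive mentions only
print(positive_share(sample, "critic"))  # professional reviewers only
```

Keeping the two sources in separate tallies mirrors the professional/public split described above, and dropping non-positive mentions is what keeps off-topic chatter out of the totals.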
Nate Silver, who used statistics to predict last year's U.S. presidential election, also published his own predictions and model. We won't go into the details here; interested readers can find them in his column in The New York Times.
(Editor: Schpeppen)