Why are Microsoft's Oscar predictions so accurate?

Source: Internet
Author: User
Keywords: Microsoft, data mining, Oscar, MSR

The Oscars are an international film awards ceremony and one of the most closely watched events of the year. When the list of winners was announced, film fans were not the only ones who were excited; Microsoft was too. The company said that David Rothschild's team at Microsoft Research had used data analysis of the nominated films to predict who would take home this year's Oscars, and that "except for Best Director, all the other Oscar predictions were correct."

The results were not quite as miraculous as Microsoft claims. In fact, David's team made predictions for all 24 awards, of which 19 were correct and 5 were wrong. The misses included Best Actor, Best Supporting Actor, Best Makeup, Best Documentary and Best Art Direction.

Even so, it must be said that the accuracy of their predictions was very high. On Microsoft's blog, David describes how they produced this result by mining data and building predictive models:

"The way I predict the Oscars is consistent with how I predict other things, including politics," says David. "I focus on the most effective data and then create statistical models that are not swayed by the results of any particular year. All models are tested and calibrated against historical data to ensure that they can accurately predict out-of-sample results. These models can predict the future, not just the results of past events."

"I focus on four different types of data: polling data, prediction market data, fundamental data and user-generated data.

For general elections, fundamental data, such as past election results, incumbency and economic indicators, are even more important. Early in the forecast cycle, the fundamental data establish a baseline; as prediction market data and polling data become more and more informative, the emphasis shifts to them. I used only a small amount of user-generated data to predict the 2012 presidential election, but Xbox Live data were critical for providing real-time analysis of major events.
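The shift in emphasis described here can be sketched as a time-varying weighted average. This is only an illustration under stated assumptions: the linear weighting schedule, the function name `blend_forecast` and the sample probabilities are hypothetical and are not taken from David's actual model.

```python
def blend_forecast(p_fundamentals, p_market, days_left, horizon_days):
    """Blend a fundamentals baseline with a market/polling estimate.

    Hypothetical linear schedule: the weight on fundamentals falls from
    1.0 at the start of the cycle (days_left == horizon_days) to 0.0 on
    the day of the event (days_left == 0).
    """
    if horizon_days <= 0:
        raise ValueError("horizon_days must be positive")
    # Clamp so out-of-range inputs cannot push the weight outside [0, 1].
    w_fund = min(max(days_left / horizon_days, 0.0), 1.0)
    return w_fund * p_fundamentals + (1.0 - w_fund) * p_market


# Illustrative numbers only: baseline 60%, market estimate 80%.
for days_left in (100, 50, 0):
    print(days_left, round(blend_forecast(0.60, 0.80, days_left, 100), 3))
```

Early in the cycle the blended estimate equals the baseline, and by the day of the event it equals the market estimate. A real model would more likely learn the weights from historical data than fix them by hand.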

But Oscar predictions lack polling data, and box-office returns and film ratings are not statistically effective predictors. So I focus more on prediction market data and add some user-generated data, which helps me understand the relationships within a film and across categories, such as how many awards "Lincoln" will win.

Whenever I take on a new field, I think about the key elements of a meaningful prediction:

First, I determine which predictions are the most relevant. For the Oscars, I focus on the likely winners in all 24 categories and on the total number of awards each movie will win.

Second, all predictions are updated in real time. From a research standpoint, it is critical to understand the value of the events that occur between the initial prediction and the final result. For the Oscars, these events are the results of other awards (such as the Golden Globes).

Finally, I use historical data from the field to build the model, and then keep updating it to ensure its accuracy. What I want to emphasize is that everything we do is designed to be domain-independent, so that it can extend to many other problems. If this research can produce more efficient forecasting models and be applied to more fields to solve more problems, it will be of great value to Microsoft, academia and the world."
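The two quantities mentioned in the first step, the per-category winners and a film's total number of awards, are connected by a simple identity: the expected number of awards for a film is the sum of its win probabilities across the categories in which it is nominated. A minimal sketch, using made-up probabilities and a hypothetical function name:

```python
def expected_awards(win_probabilities):
    """Expected number of wins = sum of per-category win probabilities.

    This follows from linearity of expectation and holds even when the
    categories are correlated with one another.
    """
    for p in win_probabilities.values():
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return sum(win_probabilities.values())


# Hypothetical per-category probabilities for one film (not real forecasts).
film = {"Best Picture": 0.10, "Best Actor": 0.90, "Best Director": 0.70}
print(round(expected_awards(film), 2))
```

The identity gives only the expected count. Estimating the full distribution of a film's total awards would also require modeling how the categories move together.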

David and his team have launched a website, PredictWise, dedicated to predicting major events. He said the Oscars were very difficult because they involve 24 categories (predictions usually cover only 6), and because the results kept changing as other awards were handed out.

To address this, David increased the weight of dynamic data in the overall forecasting model.

"Real-time predictions are very important. Because real-time forecasts provide the latest prediction at any moment, mining dynamic data means that the forecast is constantly incorporating new information. In addition, it provides a finer-grained record of when and why changes occurred, and of which factors affected the final result."

Take the dynamic data for the Best Picture award as an example. The early favorite, "Lincoln", slid quickly once "Argo" began winning multiple other awards: "Argo" had only an 8% chance of winning at the time of the Oscar nominations, but its subsequent wins quickly raised its chance to 93%.
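One way to read the jump from 8% to 93% is in terms of odds. Under Bayes' rule, posterior odds equal prior odds multiplied by a likelihood ratio, so the two probabilities imply how strong the combined evidence from the intervening awards would have to be. This is a worked illustration of the arithmetic only, not a description of how David's model computes its updates; the function names are hypothetical.

```python
def odds(p):
    """Convert a probability in (0, 1) to odds in favor."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return p / (1.0 - p)


def update(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * LR."""
    post_odds = odds(prior) * likelihood_ratio
    return post_odds / (1.0 + post_odds)


def implied_likelihood_ratio(prior, posterior):
    """Likelihood ratio that would move `prior` to `posterior`."""
    return odds(posterior) / odds(prior)


lr = implied_likelihood_ratio(0.08, 0.93)
print(round(lr, 1))                # combined strength of the evidence
print(round(update(0.08, lr), 2))  # applying it recovers the posterior
```

The implied ratio is roughly 153, which means the precursor awards taken together would need to be about 153 times more likely in a world where the film goes on to win than in one where it does not.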

In addition to changes over time, the interactions among the data also deserve attention. David's model points to a strong correlation between the Best Picture award and the Best Adapted Screenplay award, so the trends for "Lincoln" and "Argo" in these two categories are basically the same, with only slight differences. "Lincoln" initially had a 70% chance of winning Best Adapted Screenplay, but as "Argo" became more likely to win Best Picture, its chance of winning the screenplay award overtook that of "Lincoln" and reached 57%.
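The link between the two categories can be made concrete with the law of total probability: a film's chance of winning the screenplay award is a mix of its chances in the cases where it does and does not win the top award. The conditional probabilities below are invented for illustration and are not taken from David's model.

```python
def screenplay_probability(p_picture, p_sp_given_picture, p_sp_given_no_picture):
    """P(SP) = P(SP | BP) * P(BP) + P(SP | not BP) * (1 - P(BP))."""
    return (p_sp_given_picture * p_picture
            + p_sp_given_no_picture * (1.0 - p_picture))


# Hypothetical conditionals: winning the top award makes a screenplay
# win more likely, which is what a positive correlation means here.
P_SP_IF_PICTURE = 0.65
P_SP_IF_NO_PICTURE = 0.15

for p_picture in (0.10, 0.50, 0.90):
    p_sp = screenplay_probability(p_picture, P_SP_IF_PICTURE, P_SP_IF_NO_PICTURE)
    print(f"P(top award) = {p_picture:.2f} -> P(screenplay) = {p_sp:.3f}")
```

As the top-award probability rises, the screenplay probability rises with it, which is why the two trend lines move together.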

To better support dynamic data mining, David also worked with Microsoft's Office team to publish an Excel app called "Oscars Ballot Predictor" that updates the forecasts in real time.

Unlike David, who relies mostly on prediction market data and fundamental data, the analytics firm Brandwatch chose to build its prediction model on social data. It counts how often actors, directors and movies are mentioned on social networks, and counts the positive evaluations among those mentions, to estimate their chances of winning. Twitter accounts for about 40% of Brandwatch's sample.

Brandwatch's approach is not new, but it differs from previous analyses in that it distinguishes professionals' comments from those of the general public and collects only positive evaluations. Two variables are involved: the number of mentions and the sentiment behind them. Brandwatch believes this filters out certain irrelevant data; for example, comments about Helen Hunt on the red carpet will not be counted as primary data.
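The two variables can be combined into a very simple score: count only the positive mentions of each nominee and normalize the counts into shares. This is a rough sketch of the general idea, not Brandwatch's actual method; the data layout, the sentiment labels and the nominee names are all hypothetical.

```python
from collections import Counter


def positive_mention_shares(mentions):
    """Share of positive mentions per nominee.

    `mentions` is an iterable of (nominee, sentiment) pairs, where
    sentiment is one of "positive", "neutral" or "negative".
    """
    counts = Counter(name for name, sentiment in mentions
                     if sentiment == "positive")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


# Made-up sample data.
sample = [
    ("Nominee A", "positive"), ("Nominee A", "positive"),
    ("Nominee A", "negative"), ("Nominee B", "positive"),
    ("Nominee B", "neutral"),
]
print(positive_mention_shares(sample))
```

Separating professionals' comments from the public's, as the text describes, would amount to computing these shares for each group of authors separately before combining them.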

In addition, Nate Silver, who used statistics to predict last year's U.S. presidential election, also published his own predictions and model. We won't go into the details here; interested readers can find them in his column in The New York Times.

Executive editor: Xiao Yun, Tel: (010) 68476606