Some things are in the limelight, it is not said by a certain industry, a certain category of people, but the human race. June is a big thing, that is the "World Cup"!
Some love the football partners must be concerned about the course of the game, the results of their favorite football players can be in the game, to recruit the Wind, the exhibition. However, we all know that in addition, there are some aboveboard football spinach exist, for some veteran players, even said to me 100 yuan, I can buy a house ... (three paragraphs at the end of the text, must look!) )
Do you think you are a python programmer? You have the scientific basis to prove who can win the game, the outcome of the match is a few? You can only rely on experience, a little stronger than the average person. Just like I heard a joke: "The World Cup theory has turned 1000, if you have three yuan, you buy the World Cup, then you have three dollars will be gone!" ”
With data to be trusted, Python AI predicts a wave of "World Cup" results.
Principle:
Step one: Deep analysis of two datasets
Step Two: Do feature engineering to select the most relevant features for prediction
Step three: Data processing, selecting a machine learning model
Fourth step: Deploy it to the data set
Predicted race record:
Portugal:
Cristiano Ronaldo, a god of war, exists. Speed, awareness, skills and other aspects are the top of the existence, more worthy of respect is the age of such a large but still maintain a non-yielding heart. The other members of the Portuguese team are also very strong, for the ball machine grasp, for the cooperation is also the top, after all, you are the first European Cup champions!
Spain:
After a failure, even if many good players have gone, they can also find the reasons for failure, difficult to reorganize the team, so that they are also a bad existence, spirit and willpower indestructible. The young team members appear the rise of the trend, become "the King of Qualifiers", who and fight?
Load Data Set
Depth analysis
Deep analysis and feature engineering: the most time-consuming is to analyze which features are related to machine learning models. Add the target difference and the result column to the result data set, the detailed Python source code is as follows:
Part of the process includes data only from Nigeria participating in the contest. Helps us to know the characteristics of those national teams and then expand to other countries in the World Cup. The specific code is as follows:
Create a list of the years and select all the matches since the World Cup was founded in 1930, Python code is as follows:
And all competing teams merge into one data frame, code:
Filter the result data frame to show the squad that was only in the 2018 World Cup since 1930, removing useless elements.
Python implementation code:
Logistic regression algorithm: Estimating probability, measuring the relationship between the dependent variable and one or more independent variables, is the cumulative logistics distribution.
Build a Learning Model: Learn how to use the form of each data to produce positive, negative effects on game results. Give it true and accurate data, fill it in, and you have a model that predicts the outcome later.
Pass the final data frame to the logistic regression algorithm, Python implementation code:
The accuracy of this model on the training set is only about 60% accuracy, which certainly does not meet our requirements, we want to further train the model, create a data frame, deploy our model.
Deploy the model to a dataset and compare it from the deployment of the model to the start of the team competition. The code and results are as follows:
"Results of the group's predictions"
Time 2:00
Group: Spain, Portugal
Predicted victory group: Spain
Time 18:00
Group: France, Australia
Predicted victory group: France
Time 21:00
Group: Iceland, Argentina
Predicted victory group: Argentina
The tournament time is as follows:
In fact, linear regression, we learned in high school math textbooks, remember? In the coordinates, a dense number of points, so you go to find their most relevant expression? All points, whichever line is closest, let you list the expression, so the Python algorithm is not as difficult as you think, the key lies in your association, as well as the knowledge you used to reserve.
Since it is closest to that line, it can only be said that the probability of predicting the correct outcome will be greater. So, you all know, what's the prediction? The result is still uncertain!
Using the Python Calculus World Cup results, the programmer was angry at the table and regretted the late!