The World Cup is coming! A big Python analysis: which countries will make the top four?

Last Update: 2018-06-15 Source: Internet

Author: User

There is no doubt that who will lift the World Cup is the question we care about most. As a "senior" fan, I naturally have to put my own expertise to use and simulate the 2018 World Cup in Python, to quench our thirst ahead of time.

Objective

The World Cup is about to kick off and everything is still unknown, but the full schedule has been set. We can therefore follow the schedule and simulate the scores of all 64 matches 10,000 times, then derive which teams advance from each of groups A to H, the probability of each team reaching the top four, and the probability of each team winning the championship.
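The idea of replaying the schedule many times and counting outcomes can be sketched for a single group. This is a minimal sketch, not the article's model: it assumes each team's goals per match follow an independent Poisson distribution, and the scoring rates in `GROUP_A_RATES` are hypothetical values chosen for illustration, not fitted to any data. The function names are also illustrative.

```python
import math
import random
from collections import Counter
from itertools import combinations


def poisson(rng, lam):
    """Draw a Poisson-distributed goal count (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_group(rates, rng):
    """Play one round-robin group and return the two teams that advance."""
    points = Counter()
    goal_diff = Counter()
    for home, away in combinations(rates, 2):
        goals_home = poisson(rng, rates[home])
        goals_away = poisson(rng, rates[away])
        goal_diff[home] += goals_home - goals_away
        goal_diff[away] += goals_away - goals_home
        if goals_home > goals_away:
            points[home] += 3
        elif goals_home < goals_away:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    # Rank by points, then goal difference; remaining ties are broken at random
    # (a simplification of the real tiebreak rules).
    ranking = sorted(
        rates,
        key=lambda team: (points[team], goal_diff[team], rng.random()),
        reverse=True,
    )
    return ranking[:2]


def advance_probabilities(rates, n_runs=10000, seed=0):
    """Estimate each team's chance of advancing by repeating the group n_runs times."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_runs):
        counts.update(simulate_group(rates, rng))
    return {team: counts[team] / n_runs for team in rates}


# Hypothetical expected goals per match for the 2018 Group A teams.
GROUP_A_RATES = {"Russia": 1.2, "Saudi Arabia": 0.8, "Egypt": 1.1, "Uruguay": 1.6}

if __name__ == "__main__":
    for team, prob in advance_probabilities(GROUP_A_RATES).items():
        print(f"{team}: {prob:.3f}")
```

The same counting approach extends to the knockout rounds: simulate each tie, carry the winner forward, and tally how often each team reaches the top four or wins the final.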

Data acquisition

Identify the sites and resources you want to collect from, then get ready to start collecting data. This collection uses the Scout network (win007.com) as an example:

First find the page link for each of the 32 national teams, then visit each of the 32 links in turn to collect that team's match records.

Analyze the website, work out the overall approach, and write a web crawler to carry out the collection. Because the site is static, it is easy to scrape. During collection, we first find each national team's link and pair it with the team's name, then collect all the historical match data on that team's page.
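Pairing each team's link with its name can be illustrated on a static HTML fragment using only the standard library. This is a sketch under assumptions: `SAMPLE_HTML` is a made-up fragment, the real page markup may differ, and `TeamLinkParser` is a hypothetical helper rather than part of the article's crawler.

```python
import re
from html.parser import HTMLParser

# Team pages are assumed to end in /cn/team/CTeamSche/<id>.html.
TEAM_HREF = re.compile(r"/cn/team/CTeamSche/(\d+)\.html$")


class TeamLinkParser(HTMLParser):
    """Collect (team_id, team_name) pairs from anchors that point at team pages."""

    def __init__(self):
        super().__init__()
        self.teams = []
        self._current_id = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = TEAM_HREF.search(href)
        if match:
            self._current_id = int(match.group(1))
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a team anchor.
        if self._current_id is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_id is not None:
            self.teams.append((self._current_id, "".join(self._text).strip()))
            self._current_id = None


# Hypothetical fragment for illustration only.
SAMPLE_HTML = """
<table>
  <tr><td><a href="http://zq.win007.com/cn/team/CTeamSche/778.html">Brazil</a></td></tr>
  <tr><td><a href="/cn/other/page.html">Not a team</a></td></tr>
</table>
"""

parser = TeamLinkParser()
parser.feed(SAMPLE_HTML)
print(parser.teams)  # [(778, 'Brazil')]
```

Anchors that do not match the team URL pattern are ignored, so navigation links on the same page do not end up in the team list.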

When looking for a team's link, pay attention to its accuracy. Each team on the Scout network has its own link. For example, Brazil's ID is 778, so its link is http://zq.win007.com/cn/team/CTeamSche/778.html. If you are unsure about a link, first paste it into a browser to check that the page can be found.
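The relationship between a team ID and its link can be written down as a pair of small helpers. This is a minimal sketch: the URL pattern is taken from the Brazil example above, while `team_url`, `team_id_from_url`, and `is_reachable` are hypothetical names introduced here. `is_reachable` is a scripted stand-in for the browser check and needs network access.

```python
import urllib.request
from urllib.parse import urlparse

TEAM_URL_TEMPLATE = "http://zq.win007.com/cn/team/CTeamSche/{team_id}.html"


def team_url(team_id):
    """Build the team page URL from a numeric team ID."""
    if not isinstance(team_id, int) or team_id <= 0:
        raise ValueError("team_id must be a positive integer")
    return TEAM_URL_TEMPLATE.format(team_id=team_id)


def team_id_from_url(url):
    """Recover the numeric ID from the last path segment of a team URL."""
    stem = urlparse(url).path.rsplit("/", 1)[-1].split(".")[0]
    if not stem.isdigit():
        raise ValueError("no numeric team ID in URL: " + url)
    return int(stem)


def is_reachable(url, timeout=10):
    """Return True if the page answers with HTTP 200 (requires network access)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False


print(team_url(778))                    # http://zq.win007.com/cn/team/CTeamSche/778.html
print(team_id_from_url(team_url(778)))  # 778
```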

The following is the detailed code for data acquisition:

    from __future__ import print_function, division

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    import pandas as pd


    class Spider(object):

        def __init__(self):
            self.driver = webdriver.Chrome()
            # The wait time is missing from the source listing; 30 seconds is an assumed value.
            self.driver.implicitly_wait(30)
            self.verificationErrors = []
            self.accept_next_alert = True

        def get_all_team_data(self):
            # Get all 32 team IDs (used to form the team URLs) from the World Cup homepage
            self.get_team_ids()

            # Loop over the teams and collect each team's match data
            data = []
            for i, (team_id, team_name) in enumerate(self.team_list):
                print(i, team_id, team_name)
                df = self.get_team_data(team_id, team_name)
                data.append(df)

            output = pd.concat(data)
            output.reset_index(drop=True, inplace=True)
            output.to_csv('data_2018worldcup.csv', index=False, encoding='utf-8')
            self.driver.close()

        def get_team_ids(self):
            main_url = 'http://zq.win007.com/cn/CupMatch/75.html'
            self.driver.get(main_url)
            # The style value is garbled in the source listing; this string is a reconstruction.
            teams = self.driver.find_elements(
                By.XPATH, "//td[@style='background-color:#fff;text-align:left;']")

            data = []
            for team in teams:
                link = team.find_element(By.XPATH, ".//a")
                # The team ID is the file name of the link, e.g. .../778.html -> 778
                team_id = int(link.get_attribute('href').split('/')[-1].split('.')[0])
                team_name = link.text
                print(team_id, team_name)
                data.append([team_id, team_name])

            self.team_list = data
            # self.team_list = pd.DataFrame(data, columns=['team_name', 'team_id'])
            # self.team_list.to_excel('National Team id.xlsx', index=False)

        def get_team_data(self, team_id, team_name):
            """Get the match data for one national team.

            TODO: paging is not implemented.
            """
            url = 'http://zq.win007.com/cn/team/CTeamSche/%d.html' % team_id
            self.driver.get(url)

            # The second attribute name is obscured in the source listing; @class is assumed.
            table = self.driver.find_element(
                By.XPATH, "//div[@id='Tech_schedule' and @class='data']")
            matches = table.find_elements(By.XPATH, ".//tr")
            print(len(matches))

            # Grab the match data and save it as a DataFrame
            data = []
            for i, match in enumerate(matches):
                if i == 0:
                    # The first row holds the column headers
                    headers = match.find_elements(By.XPATH, ".//th")
                    h1, h2, h3, h4, h5 = [h.text for h in headers[:5]]
                    print(h1, h2, h3, h4, h5)
                    continue
                try:
                    info = match.find_elements(By.XPATH, ".//td")
                    cup = info[0].text
                    match_time = info[1].text
                    home_team = info[2].text
                    fts = info[3].text  # full-time score, e.g. "2-1"
                    fs_a, fs_b = int(fts.split('-')[0]), int(fts.split('-')[1])
                    away_team = info[4].text
                    print(cup, match_time, home_team, away_team, fs_a, fs_b)
                    data.append([cup, match_time, home_team, away_team,
                                 fs_a, fs_b, team_name])
                except (IndexError, ValueError):
                    # Stop at the first row that cannot be parsed
                    break

            df = pd.DataFrame(data, columns=['tournament', 'time', 'home', 'away',
                                             'home team goal', 'away goal',
                                             'national name'])
            return df


    if __name__ == "__main__":
        spider = Spider()
        # Step 1: get the IDs of the 2018 World Cup teams.
        # Step 2: loop over the teams and collect each one's match data.
        spider.get_all_team_data()

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
