Hello everyone, this is Hadron. It has been a long time since I brought you a new technical article. Recently several students asked me whether automatic ticket grabbing on 12306 can be implemented, so I spent the past two days building a 12306 automatic ticket-grabbing project in Python. Let me walk you through how to conquer the notorious 12306, step by step. There is a bonus at the end of the article!
We want to grab tickets on 12306, but the site provides no official interface for this (and never will). The only option is to study 12306's network packets and ticket purchase flow and simulate the browser's behavior to automate the operation. Put plainly, this is a web crawler. Now for the main part: high energy ahead, please fasten your seat belts~
Before buying a ticket we first need to confirm that tickets are left, so we start with a normal query: open the 12306 ticket query page at https://kyfw.12306.cn/otn/leftTicket/init, enter the origin and destination, and search.
Looking at this page, how can we get hold of the trains and their related information? Beginners will first think of looking in the page source, but the source contains none of it: the data is loaded dynamically by JavaScript through asynchronous Ajax requests. So the only option is to capture packets and watch the data exchanged between the browser and the server. I use Google Chrome, where the shortcut for opening Developer Tools is F12.
Make sure the option marked with the red box is selected. Every data exchange between the browser and the server will then be shown in the list below it. Now click the Query button again.
Two requests appear in the list, which means that clicking the Query button makes the browser send two requests to the server. Next we analyze the return values to find out which request actually fetches the train data, so that we can simulate it with Python.
First request:
The value returned by the first request clearly does not contain the train information we need.
Second request:
The second request returns a lot of data. We still cannot see the train information directly, but one feature stands out: one of the values is a list with 6 elements, and our search from Changsha to Chengdu also returned exactly 6 trains, so the two are surely related. Let's fetch this data with Python before analyzing further:
# -*- coding: utf-8 -*-
import urllib2
import ssl

# Turn off certificate verification for HTTPS requests
ssl._create_default_https_context = ssl._create_unverified_context

def getList():
    req = urllib2.Request('https://kyfw.12306.cn/otn/leftTicket/query?leftTicketDTO.train_date=2017-07-10&leftTicketDTO.from_station=CDW&leftTicketDTO.to_station=CSQ&purpose_codes=ADULT')
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36')
    html = urllib2.urlopen(req).read()
    return html

print getList()
The code first defines a function that fetches the train list information.
The request URL is taken from the captured packet data: https://kyfw.12306.cn/otn/leftTicket/query?leftTicketDTO.train_date=2017-07-10&leftTicketDTO.from_station=CDW&leftTicketDTO.to_station=CSQ&purpose_codes=ADULT
To reduce the chance of 12306 detecting our requests as automated, we simply add a header so that they look like browser requests:
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36')
As for this line:
ssl._create_default_https_context = ssl._create_unverified_context
12306 uses HTTPS, but its SSL certificate is not trusted by browsers, and Python by default refuses to request sites with untrusted certificates. This line turns off certificate verification so the request can go through.
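For readers on Python 3, where urllib2 no longer exists, the same request can be sketched as below. This is only an illustration: the helper names (make_unverified_context, build_request, fetch) are hypothetical and not part of the original project. It also disables verification on one context passed to a single call instead of patching the global default, and whether the endpoint still answers this way is not something this sketch can promise.

```python
import ssl
import urllib.request


def make_unverified_context():
    """Return an SSL context that skips certificate checks (insecure, demo only)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be switched off before verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def build_request(url, user_agent):
    """Create a request that carries a browser-like User-Agent header."""
    req = urllib.request.Request(url)
    req.add_header("User-Agent", user_agent)
    return req


def fetch(url, user_agent, timeout=10):
    """Send the request and return the response body as text."""
    req = build_request(url, user_agent)
    ctx = make_unverified_context()
    with urllib.request.urlopen(req, context=ctx, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Passing the context per call keeps the rest of the program on verified HTTPS, which is usually the safer default.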
So let's see whether we get the information we want:
It turns out our approach works. Next we extract the list with the 6 items.
The returned data is in JSON format. Python has no built-in JSON type, so to Python it is just a string. To work with it conveniently, we can use Python's json module to convert the JSON string into a dict, then take the list out by key and return it.
# -*- coding: utf-8 -*-
import urllib2
import ssl
import json

ssl._create_default_https_context = ssl._create_unverified_context

def getList():
    req = urllib2.Request('https://kyfw.12306.cn/otn/leftTicket/query?leftTicketDTO.train_date=2017-07-10&leftTicketDTO.from_station=CDW&leftTicketDTO.to_station=CSQ&purpose_codes=ADULT')
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36')
    html = urllib2.urlopen(req).read()
    # Convert the JSON string into a dict, then take the list out by key
    data = json.loads(html)
    result = data['data']['result']
    return result
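The JSON step can be tried without touching the network. The sketch below uses a made-up payload that only mimics the data / result nesting described above; the helper name extract_result and the sample strings are assumptions for illustration, and a real response may contain more fields.

```python
import json


def extract_result(raw):
    """Parse a JSON string and return the list stored under data -> result.

    Returns an empty list when the expected keys are missing.
    """
    payload = json.loads(raw)
    data = payload.get("data")
    if not isinstance(data, dict):
        return []
    return data.get("result", [])


# Made-up sample with the same nesting; not real 12306 output.
sample = '{"status": true, "data": {"result": ["a|b|c", "d|e|f"]}}'

for row in extract_result(sample):
    print(row)
```

Returning an empty list for an unexpected payload lets the caller loop over the result without extra checks.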
The function now returns a list. Let's loop over it with for and see what each item contains:
for i in getList():
    print i
Here is what the first item looks like when printed:
| Booking | 76000g131805| g1318| icw| izq| icw| cwq| 07:54| 18:54| 11:00| n| uhesfcaidex22z0zwfqttduzxjfuwpdia148i6tnk5spiqfp| 20170710| 3| w2| 21> 16| 0|0| | | | | | | | | | | none | none | No | | o0m090| OM9
Looking carefully, we can see that it contains train information such as g1318, 07:54 and 18:54, along with values like "none". It looks messy, but every item has one thing in common: its fields are separated by the | symbol. So we can split on that symbol and see what turns up:
for i in getList():
    for n in i.split('|'):
        print n
    break
Now all the values are printed. We can add an index number to each one to see clearly what sits at each position. For example, if a train has 3 hard-seat tickets and 8 soft-sleeper tickets left, we check which index holds the value 3 and which holds the value 8, and in this way work out which index corresponds to which seat type or other parameter.
c = 0
for i in getList():
    for n in i.split('|'):
        print '[%s] %s' % (c, n)
        c += 1
    c = 0
    break
# index 3 = train number
# index 8 = departure time
# index 9 = arrival time
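The same indexing idea can be written more compactly with enumerate, and the three positions identified so far can be collected into a small lookup table. This is a sketch only: the names FIELD_INDEX, dump_indexed and parse_row are hypothetical, the sample row is an abbreviated version of the row shown earlier, and only indices 3, 8 and 9 come from the article.

```python
# Positions identified in the article; other indices are not mapped here.
FIELD_INDEX = {
    "train_number": 3,
    "departure_time": 8,
    "arrival_time": 9,
}


def dump_indexed(row):
    """Return every field of a |-separated row labelled with its index."""
    return ["[%d] %s" % (i, value) for i, value in enumerate(row.split("|"))]


def parse_row(row):
    """Pick the known fields out of a |-separated row."""
    parts = row.split("|")
    return {name: parts[index] for name, index in FIELD_INDEX.items()}


# Abbreviated sample row (illustrative, truncated after the duration field).
sample_row = "|Booking|76000G131805|G1318|ICW|IZQ|ICW|CWQ|07:54|18:54|11:00|N"

for line in dump_indexed(sample_row):
    print(line)
print(parse_row(sample_row))
```

Once more positions are worked out, for example for individual seat types, they can be added to FIELD_INDEX without changing the parsing code.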
At this point you may have noticed a problem: this function can only fetch data for Changsha to Chengdu, and other people will not necessarily want trains on that route. So we need to figure out where the departure and arrival station values in the request URL come from.
https://kyfw.12306.cn/otn/leftTicket/query?leftTicketDTO.train_date=2017-07-10&leftTicketDTO.from_station=CDW&leftTicketDTO.to_station=CSQ&purpose_codes=ADULT
The parameters for the departure and arrival stations are:
leftTicketDTO.from_station=CDW
leftTicketDTO.to_station=CSQ
However, after searching and analyzing I found no pattern in these two values, which means they must have been obtained in an earlier request. They are not in the page source either, so once again the only way to find them is by capturing packets.
While capturing packets, I found one response whose return value contains the city codes. Its URL is:
https://kyfw.12306.cn/otn/resources/js/framework/station_name.js?station_version=1.9018
So we copy the city data out of this response into a new file, cons.py, as a variable named station_names, and save it.
The departure and arrival cities entered by the user can then be matched directly against this data to get the corresponding city codes. The code is as follows:
import cons

# Build a mapping from city name to station code
station = {}
for i in cons.station_names.split('@'):
    if i:
        tmp = i.split('|')
        station[tmp[1]] = tmp[2]
# print station
train_date = raw_input('Please enter the departure date: ')
from_station = station[raw_input('Please enter the departure city: ')]
to_station = station[raw_input('Please enter the destination city: ')]
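To see the lookup work without the real cons.py, the sketch below runs the same parsing over a tiny made-up string. The entry layout (name in position 1, code in position 2) is inferred from the indices used above; the other fields in the sample, the English city names, and the helper names build_station_map and lookup_code are assumptions for illustration.

```python
def build_station_map(station_names):
    """Map station name -> station code from an @-separated, |-delimited string."""
    stations = {}
    for entry in station_names.split("@"):
        if not entry:
            continue                      # skip the empty piece before the first @
        fields = entry.split("|")
        if len(fields) > 2:
            stations[fields[1]] = fields[2]
    return stations


def lookup_code(stations, name):
    """Return the code for a station name, with a readable error if it is unknown."""
    try:
        return stations[name]
    except KeyError:
        raise ValueError("Unknown station: %s" % name)


# Made-up sample in the assumed layout; the real file holds far more entries.
sample_names = "@cdu|Chengdu|CDW|chengdu|cd|0@csh|Changsha|CSQ|changsha|cs|1"

stations = build_station_map(sample_names)
print(lookup_code(stations, "Chengdu"))
print(lookup_code(stations, "Changsha"))
```

Raising a readable error for an unknown name avoids the bare KeyError that a direct dictionary lookup on user input would produce.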
After entering the date and the cities, we can get the corresponding train information.
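With the date and station codes in hand, the query URL no longer needs to be hard-coded. A minimal Python 3 sketch follows; the function name build_query_url is hypothetical, the parameter names are the ones seen in the captured URL, and the date normalization step is an addition of this sketch, not something the article does.

```python
from datetime import datetime
from urllib.parse import urlencode

BASE_URL = "https://kyfw.12306.cn/otn/leftTicket/query"


def build_query_url(train_date, from_station, to_station, purpose="ADULT"):
    """Assemble the ticket query URL from a date and two station codes."""
    # Validate and normalize the date; raises ValueError for other formats.
    day = datetime.strptime(train_date, "%Y-%m-%d").strftime("%Y-%m-%d")
    params = [
        ("leftTicketDTO.train_date", day),
        ("leftTicketDTO.from_station", from_station),
        ("leftTicketDTO.to_station", to_station),
        ("purpose_codes", purpose),
    ]
    # A list of pairs keeps the parameters in the order seen in the capture.
    return BASE_URL + "?" + urlencode(params)


print(build_query_url("2017-07-10", "CDW", "CSQ"))
```

Checking the date before sending anything gives the user an immediate error for a mistyped date.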
With a few simple checks added, we can then query whether the trains for a given date and route still have tickets left.
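One way such a check might look is sketched below, together with a bounded retry loop around it. Several things here are assumptions: the article does not say which index holds which seat type, so seat_index is left as a parameter; the set of values meaning "no tickets" is a guess based on the sample row; and the names has_ticket and wait_for_ticket are hypothetical. The fetch function is passed in, so the sketch runs without any network access.

```python
import time

# Assumed markers for "no tickets"; the live site may use others.
NO_TICKET_VALUES = {"", "0", "none", "no", "无", "--", "*"}


def has_ticket(row, seat_index):
    """Return True if the field at seat_index looks like available tickets."""
    parts = row.split("|")
    if seat_index >= len(parts):
        return False
    return parts[seat_index].strip().lower() not in NO_TICKET_VALUES


def wait_for_ticket(fetch_rows, seat_index, max_attempts=5, delay=1.0,
                    sleep=time.sleep):
    """Call fetch_rows repeatedly until some row has tickets or attempts run out.

    Returns the rows with tickets, or an empty list if none appeared.
    """
    for attempt in range(max_attempts):
        available = [row for row in fetch_rows() if has_ticket(row, seat_index)]
        if available:
            return available
        if attempt < max_attempts - 1:
            sleep(delay)                  # pause before refreshing again
    return []
```

Capping the number of attempts and pausing between them keeps the loop from sending requests without limit.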
Combined with the login and ticket purchase steps, the script can check automatically whether tickets are available. If there are none, it keeps refreshing until some appear, then logs in and places the order automatically, and finally notifies the buyer's mobile phone by SMS or phone call. See, for example:
Use Python to break 12306's last line of defense