Faster than 12306! Write a train ticket viewer in Python
Author: Wayne Shi
Link: https://zhuanlan.zhihu.com/p/22235740
Planning a trip and need to look up train tickets? Are you still using the official 12306 website? In this tutorial we use Python to write a command-line train ticket viewer: type a single command and get the ticket information you want!
1. Experiment Introduction
1.1 Knowledge Points
Comprehensive application of basic Python 3 knowledge
Use of the docopt, requests, and prettytable libraries
Use of setuptools
1.2 Results
2. Interface Design
Let's give this small application a name first. Since it queries ticket information, we will call it tickets. We want users to get the information they need just by entering the departure station, the arrival station, and the date. So tickets should be used like this:
$ tickets from to date
In addition, there are several types of trains: high-speed trains, bullet trains (EMU), express trains, fast trains, and direct trains. We want options for querying only one or more specific types, so we should provide the following options:
-g high-speed trains
-d EMU
-t express
-k fast
-z direct
These options can be combined. Therefore, the final interface should look like this:
$ tickets [-gdtkz] from to date
The interface is settled; what remains is to implement it.
3. Code Implementation
A good practice when writing Python programs is to use virtualenv to create a virtual environment. Our program is developed with Python 3. Create a folder named tickets in your working directory, then create a virtual environment inside it and activate it:
$ virtualenv -p /usr/bin/python3 venv
$ . venv/bin/activate
Install the libraries required for the experiment (inside an activated virtual environment, sudo is not needed):
$ pip install requests prettytable docopt
requests needs no introduction: it is the go-to Python library for accessing HTTP resources.
docopt, a command-line argument parsing tool for Python.
prettytable, a formatting tool that prints data as tables, the way the MySQL client does.
3.1 Parsing Arguments
Python has many command-line argument parsing tools, such as argparse, docopt, and click. Here we use docopt, which is simple and easy to use.
docopt parses arguments according to the format defined in the module docstring. For example, in tickets.py:
# coding: utf-8
"""Train tickets query via command-line.

Usage:
    tickets [-gdtkz] <from> <to> <date>

Options:
    -h, --help  Display the help menu
    -g          High-speed trains
    -d          EMU
    -t          Express
    -k          Fast
    -z          Direct

Example:
    tickets shanghai beijing 2017-12-05
"""
from docopt import docopt


def cli():
    """Command-line interface"""
    # docopt builds the parser from the module docstring above
    arguments = docopt(__doc__)
    print(arguments)


if __name__ == '__main__':
    cli()
Run the program:
$ python3 tickets.py shanghai beijing 2017-12-05
We get the following result:
{'-d': False, '-g': False, '-k': False, '-t': False, '-z': False, '<date>': '2017-12-05', '<from>': 'shanghai', '<to>': 'beijing'}
3.2 Obtaining Data
The arguments are parsed; next comes fetching the data, which is the most important part. First, open 12306 and go to the remaining-ticket query page. If you use Chrome, press F12 to open the developer tools and select the Network tab. Then enter Shanghai to Beijing with the date December 5 in the query form and click Query. In the developer tools we can see that the query system actually requests this URL:
https://kyfw.12306.cn/otn/lcxxcx/query?purpose_codes=ADULT&queryDate=2017-12-05&from_station=SHH&to_station=BJP
The returned data is in JSON format!
From here the problem is simple: we only need to construct the request URL and parse the returned JSON data. However, from_station and to_station in the URL are neither Chinese characters nor pinyin but codes, whereas what we want to type is Chinese characters or pinyin. How do we get the codes? Let's look at the page source to see whether it gives anything away.
Sure enough, we find this link in the page source: https://kyfw.12306.cn/otn/resources/js/framework/station_name.js?station_version=1.8955 It appears to contain the Chinese names, pinyin, abbreviations, codes, and other information for all stations. But all of this is packed together, and we only want each station's pinyin and its uppercase letter code. What should we do?
Regular expressions are the answer. Let's write a small script, parse_station.py, to match and extract the information we want:
# coding: utf-8
import re
import requests
from pprint import pprint

url = 'https://kyfw.12306.cn/otn/resources/js/framework/station_name.js?station_version=1.8955'
# re.findall needs a string, so take the body text of the response
text = requests.get(url, verify=False).text
# Each match is a (CODE, pinyin) pair
stations = re.findall(r'([A-Z]+)\|([a-z]+)', text)
stations = dict(stations)
# Swap keys and values so that pinyin maps to code
stations = dict(zip(stations.values(), stations.keys()))
pprint(stations, indent=4)
Note: when the matches of the regular expression above are converted into a dictionary, the keys are the uppercase letter codes and the values are the pinyin names. That is the reverse of what we want, so we swap the keys and values.
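The extraction and the key/value swap can be tried offline on a short sample. This is a minimal sketch: the sample string is only an assumed approximation of the layout of station_name.js (entries separated by @, fields separated by |), not a copy of the real file, and invert is a hypothetical helper name.

```python
import re

# Assumed entry layout: '@abbr|Chinese name|CODE|pinyin|abbr|index'.
# Illustrative sample only, not the real file contents.
sample = (
    "@bjb|北京北|VAP|beijingbei|bjb|0"
    "@bji|北京|BJP|beijing|bj|2"
    "@sha|上海|SHH|shanghai|sh|5"
)


def invert(pairs):
    """Turn (code, pinyin) pairs into a pinyin -> code dictionary."""
    return {pinyin: code for code, pinyin in pairs}


# Same pattern as in parse_station.py: uppercase code, '|', lowercase pinyin
pairs = re.findall(r'([A-Z]+)\|([a-z]+)', sample)
stations = invert(pairs)
print(stations)
```

A dictionary comprehension does the swap in one step, so the intermediate dict(...) and zip(...) calls are not needed.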
Running this script prints a dictionary of all stations, mapping each pinyin name to its uppercase letter code. We redirect the output to stations.py:
$ python3 parse_station.py > stations.py
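The redirect writes a bare dictionary literal, which Python can only import once it is bound to a name. Instead of editing the file by hand, the script could print the assignment itself. A sketch of that idea, where render_stations_module is a hypothetical helper and the two-entry dictionary stands in for the real data:

```python
from pprint import pformat


def render_stations_module(stations):
    """Return Python source that defines a module-level `stations` dict."""
    return '# coding: utf-8\nstations = ' + pformat(stations, indent=4) + '\n'


source = render_stations_module({'beijing': 'BJP', 'shanghai': 'SHH'})
print(source)

# Sanity check: the generated text is valid Python and round-trips
namespace = {}
exec(source, namespace)
```

Printing this text in place of the pprint call would make the redirected stations.py importable as it is.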
After giving the dictionary the name stations (that is, prefixing the contents of stations.py with stations = ), we can look up a station's letter code by its pinyin name:
...
from stations import stations


def cli():
    arguments = docopt(__doc__)
    from_station = stations.get(arguments['<from>'])
    to_station = stations.get(arguments['<to>'])
    date = arguments['<date>']
    # Construct the URL
    url = 'https://kyfw.12306.cn/otn/lcxxcx/query?purpose_codes=ADULT&queryDate={}&from_station={}&to_station={}'.format(
        date, from_station, to_station
    )
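Note that stations.get returns None for a name that is not in the dictionary, and str.format would then quietly put the text None into the URL. Below is a sketch of a stricter builder based on the standard library's urllib.parse.urlencode. The names build_query_url and QUERY_URL are hypothetical, and the endpoint is the one observed above, which may have changed since.

```python
from urllib.parse import urlencode

# Endpoint as observed in the browser's developer tools (assumed still valid)
QUERY_URL = 'https://kyfw.12306.cn/otn/lcxxcx/query'


def build_query_url(date, from_code, to_code):
    """Build the query URL, rejecting stations missing from the dictionary."""
    if from_code is None or to_code is None:
        raise ValueError('unknown station name')
    # A list of pairs keeps the parameters in the order the site uses
    params = [
        ('purpose_codes', 'ADULT'),
        ('queryDate', date),
        ('from_station', from_code),
        ('to_station', to_code),
    ]
    return QUERY_URL + '?' + urlencode(params)


print(build_query_url('2017-12-05', 'SHH', 'BJP'))
```

urlencode also escapes any characters that are not URL-safe, which plain string formatting does not.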
Everything is ready, so let's request this URL for the data! Here we use the requests library, which provides a very easy-to-use interface:
...
import requests


def cli():
    ...
    # Pass verify=False to skip certificate verification
    r = requests.get(url, verify=False)
    print(r.json())
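A bare requests.get call waits indefinitely if the server never answers and will happily hand back an error page. The sketch below adds a timeout and a status check; fetch_json is a hypothetical wrapper, and FakeResponse and fake_get exist only so the example runs without network access.

```python
def fetch_json(url, get=None, timeout=10):
    """GET `url` and return the decoded JSON body.

    `get` defaults to requests.get; passing another callable lets the
    function be exercised without touching the network.
    """
    if get is None:
        import requests  # imported lazily, only needed for real requests
        get = requests.get
    response = get(url, verify=False, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()


class FakeResponse:
    """Stand-in for requests.Response, used only in this demonstration."""

    def raise_for_status(self):
        pass

    def json(self):
        return {'data': {'datas': []}}


def fake_get(url, verify=True, timeout=None):
    return FakeResponse()


print(fetch_json('https://example.invalid/query', get=fake_get))
```

In cli() the call would then read fetch_json(url), using the real requests.get by default.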
From the result we can see that the ticket information still needs to be extracted from the response:
def cli():
    ...
    r = requests.get(url, verify=False)
    rows = r.json()['data']['datas']
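Indexing with ['data']['datas'] raises KeyError or TypeError if the response ever has a different shape. Which shapes the server returns in such cases is not shown here, so the following is only a defensive sketch under that assumption; extract_rows is a hypothetical helper.

```python
def extract_rows(payload):
    """Return the list of train rows, or [] if the payload has an unexpected shape."""
    data = payload.get('data') if isinstance(payload, dict) else None
    if not isinstance(data, dict):
        return []
    rows = data.get('datas')
    return rows if isinstance(rows, list) else []


print(extract_rows({'data': {'datas': [{'station_train_code': 'G2'}]}}))
print(extract_rows({'data': None}))
```

Returning an empty list lets the rest of the program print an empty table instead of crashing.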
3.3 Parsing Data
We wrap the parsing in a simple class:
from prettytable import PrettyTable


class TrainCollection(object):

    # Columns: train number, departure/arrival stations, departure/arrival times,
    # duration, first-class seat, second-class seat, soft sleeper, hard sleeper, hard seat
    header = 'train station time duration first second softsleep hardsleep hardsit'.split()

    def __init__(self, rows):
        self.rows = rows

    def _get_duration(self, row):
        """Get the train's running time"""
        duration = row.get('lishi').replace(':', 'h') + 'm'
        if duration.startswith('00'):
            # Under an hour: drop the leading '00h' (three characters)
            return duration[3:]
        if duration.startswith('0'):
            return duration[1:]
        return duration

    @property
    def trains(self):
        for row in self.rows:
            train = [
                # Train number
                row['station_train_code'],
                # Departure and arrival stations
                '\n'.join([row['from_station_name'], row['to_station_name']]),
                # Departure and arrival times
                '\n'.join([row['start_time'], row['arrive_time']]),
                # Duration
                self._get_duration(row),
                # First-class seat
                row['zy_num'],
                # Second-class seat
                row['ze_num'],
                # Soft sleeper
                row['rw_num'],
                # Hard sleeper
                row['yw_num'],
                # Hard seat
                row['yz_num']
            ]
            yield train

    def pretty_print(self):
        """The data has been obtained; what remains is to extract the
        information we want and display it. The prettytable library lets us
        format and display data as a table, like the MySQL client.
        """
        pt = PrettyTable()
        # Set the title of each column
        pt.field_names = self.header
        for train in self.trains:
            pt.add_row(train)
        print(pt)
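The _get_duration method trims leading zeros by slicing the string at fixed offsets, which is easy to get wrong by one character. As an alternative sketch (not the author's method), the same formatting can be done by parsing the numbers. This assumes the lishi field always has the form HH:MM, and format_duration is a hypothetical name.

```python
def format_duration(lishi):
    """Format an 'HH:MM' duration string, e.g. '05:30' becomes '5h30m'."""
    hours, minutes = (int(part) for part in lishi.split(':'))
    if hours == 0:
        # Under an hour: show the minutes only
        return '{}m'.format(minutes)
    return '{}h{:02d}m'.format(hours, minutes)


for value in ('00:45', '05:30', '12:05'):
    print(value, '->', format_duration(value))
```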
3.4 Displaying Results
Finally, we put the steps above together and print the result to the screen:
...

class TrainCollection:
    ...


def cli():
    arguments = docopt(__doc__)
    from_station = stations.get(arguments['<from>'])
    to_station = stations.get(arguments['<to>'])
    date = arguments['<date>']
    # Build the URL
    url = 'https://kyfw.12306.cn/otn/lcxxcx/query?purpose_codes=ADULT&queryDate={}&from_station={}&to_station={}'.format(
        date, from_station, to_station
    )
    r = requests.get(url, verify=False)
    rows = r.json()['data']['datas']
    trains = TrainCollection(rows)
    trains.pretty_print()


if __name__ == '__main__':
    cli()
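The code above parses the -g, -d, -t, -k and -z options but never uses them. One possible way to apply them is sketched below, assuming a train's type is given by the first letter of its number (G2, D312 and so on). selected_types and filter_rows are hypothetical helpers, and the rows are made-up sample data.

```python
def selected_types(arguments):
    """Return the set of train-type letters the user asked for, e.g. {'G', 'D'}."""
    return {flag[1].upper() for flag in ('-g', '-d', '-t', '-k', '-z')
            if arguments.get(flag)}


def filter_rows(rows, types):
    """Keep rows whose train number starts with a selected letter.

    An empty selection means no filtering.
    """
    if not types:
        return rows
    return [row for row in rows if row['station_train_code'][:1] in types]


# Shaped like docopt's result for `tickets -g ...` (positional entries omitted)
arguments = {'-g': True, '-d': False, '-t': False, '-k': False, '-z': False}
rows = [{'station_train_code': 'G2'},
        {'station_train_code': 'D312'},
        {'station_train_code': 'T110'}]
print(filter_rows(rows, selected_types(arguments)))
```

In cli() the filtered list would be passed to TrainCollection in place of rows.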
That's all for today's experiment. Give it a try yourself!