Python crawler: using the Baidu map API to crawl data and save it to a MySQL database

Source: Internet
Author: User


First, I have a txt file listing cities and the number of parks in each city:
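The file itself is not shown here; based on how the later code splits each line on a tab and takes the first field, it presumably looks something like the following (city names and counts below are purely illustrative):

```
Beijing	170
Shanghai	140
Guangzhou	120
```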

Second, use the interfaces provided by Baidu map API to crawl information about city parks.
Two APIs are used:

1. http://api.map.baidu.com/place/v2/search?q=Park&region=Beijing&output=json&ak=<user access key>
2. http://api.map.baidu.com/place/v2/detail?uid=xxxxx&output=json&scope=2&ak=<user access key>

The first API returns general information about a city's parks. The second API returns detailed information about a single park.

Parameter description:
q: keyword
region: region to be retrieved (city level or above)
page_size: number of records per page
page_num: page number
output: output format, json or xml
ak: user access key, which you can apply for on the Baidu map API platform
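As a minimal sketch of how the first API's query string is assembled from these parameters (`YOUR_AK` is a placeholder for the access key you applied for on the Baidu map API platform):

```python
from urllib.parse import urlencode

# Assemble the search API request URL from its documented parameters.
base = 'http://api.map.baidu.com/place/v2/search'
params = {
    'q': 'Park',          # keyword
    'region': 'Beijing',  # region to be retrieved (city level or above)
    'page_size': '20',    # number of records per page
    'page_num': '0',      # page number
    'output': 'json',     # output format: json or xml
    'ak': 'YOUR_AK',      # user access key (placeholder)
}
url = base + '?' + urlencode(params)
print(url)
```

Passing the same dictionary as `params=` to `requests.get`, as the code below does, produces an equivalent URL.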
I. Try the first API to obtain data and store it in the MySQL database

The following is the result returned when you access the first API:

Because the final results are stored in a MySQL database, for ease of operation I used the graphical management tool MySQL-Front to create a database named baidumap with two tables: table city stores the results of the first API, and table park stores the results of the second API. Table city has the following structure:
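The screenshot of the table structure did not survive; reconstructed from the columns named in the INSERT statement below, the city table presumably looks roughly like this (the column types and the auto-increment id are assumptions):

```sql
CREATE TABLE baidumap.city (
    id INT NOT NULL AUTO_INCREMENT,
    city VARCHAR(100),
    park VARCHAR(255),
    location_lat FLOAT,
    location_lng FLOAT,
    address VARCHAR(255),
    street_id VARCHAR(64),
    uid VARCHAR(64),
    time DATETIME,
    PRIMARY KEY (id)
) DEFAULT CHARSET = utf8;
```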

Next, write the code to request the data and store the results in the city table:

import requests
import json
import MySQLdb
from datetime import datetime

# Obtain the relevant cities from the txt file and build a list
city_list = []
with open('cities.txt', 'r', encoding='utf-8') as f:
    for eachline in f:
        if eachline != '' and eachline != '\n':
            city = eachline.split('\t')[0]
            city_list.append(city)

# Define a getjson function to request and parse the returned data
def getjson(palace, page_num=0):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                             '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
    url = 'http://api.map.baidu.com/place/v2/search'
    params = {
        'q': 'Park',
        'region': palace,
        'scope': '2',
        'page_size': '20',
        'page_num': page_num,
        'output': 'json',
        'ak': 'xm53lmurtnqaapfukvy1wzsyzcnmna9h',
    }
    response = requests.get(url=url, params=params, headers=headers)
    decodejson = json.loads(response.text)
    return decodejson

# Connect to the database, obtain a cursor, fetch the data, and insert it
# into the database. Reading fields with get() avoids interrupting the
# program when a field is missing from the response.
conn = MySQLdb.connect(host='localhost', user='root', password='root',
                       db='baidumap', charset='utf8')
cur = conn.cursor()
for city in city_list:
    not_last_page = True
    page_num = 0
    while not_last_page:
        decodejson = getjson(city, page_num)
        print(city, page_num)
        if decodejson.get('results'):
            for result in decodejson.get('results'):
                park = result.get('name')
                lat = result.get('location').get('lat')
                lng = result.get('location').get('lng')
                address = result.get('address')
                street_id = result.get('street_id')
                uid = result.get('uid')
                sql = """INSERT INTO baidumap.city
                         (city, park, location_lat, location_lng,
                          address, street_id, uid, time)
                         VALUES (%s, %s, %s, %s, %s, %s, %s, %s);"""
                cur.execute(sql, (city, park, lat, lng, address,
                                  street_id, uid, datetime.now()))
                conn.commit()
            page_num = page_num + 1
        else:
            not_last_page = False
cur.close()
conn.close()

Export data from MySQL:

II. Try the second API to obtain data

Second API interface: http://api.map.baidu.com/place/v2/detail?uid=xxxxx&output=json&scope=2&ak=<user access key>
It takes a uid parameter, which we can obtain from the city table saved earlier. Accessing this API returns:
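The code below walks the returned JSON with chained .get() calls, so a missing field yields None instead of raising an exception. Here is a minimal sketch of that pattern on a made-up response fragment (the field values are invented; the field names match those used in the code):

```python
import json

# A made-up fragment in the shape of the detail API's response;
# real responses contain many more fields under detail_info.
raw = '''{
  "status": 0,
  "result": {
    "name": "Beihai Park",
    "location": {"lat": 39.92, "lng": 116.39},
    "detail_info": {"tag": "park", "overall_rating": "4.5"}
  }
}'''
decodejson = json.loads(raw)
data = decodejson.get('result')
if data:
    name = data.get('name')
    lat = data.get('location').get('lat')
    rating = data.get('detail_info').get('overall_rating')
    # a key absent from the response simply yields None
    shop_hours = data.get('detail_info').get('shop_hours')
```

This is why get() is preferable to bracket indexing here: parks without opening hours or ratings would otherwise abort the whole crawl with a KeyError.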

Create the table park with the following structure:
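As with the city table, the structure screenshot is missing; reconstructed from the columns named in the INSERT statement below, the park table presumably looks roughly like this (the column types and the auto-increment id are assumptions):

```sql
CREATE TABLE baidumap.park (
    id INT NOT NULL AUTO_INCREMENT,
    park VARCHAR(255),
    location_lat FLOAT,
    location_lng FLOAT,
    address VARCHAR(255),
    street_id VARCHAR(64),
    telephone VARCHAR(64),
    detail INT,
    uid VARCHAR(64),
    tag VARCHAR(255),
    detail_url VARCHAR(255),
    type VARCHAR(100),
    overall_rating VARCHAR(16),
    image_num INT,
    comment_num INT,
    shop_hours VARCHAR(255),
    alias VARCHAR(255),
    scope_type VARCHAR(64),
    scope_grade VARCHAR(64),
    description TEXT,
    time DATETIME,
    PRIMARY KEY (id)
) DEFAULT CHARSET = utf8;
```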

First get the uid values from the table city, then request the second API with each uid and store the returned data in the table park. The code is as follows:

from datetime import datetime
import requests
import json
import MySQLdb

# Obtain the uids from the city table
conn = MySQLdb.connect(host='localhost', user='root', password='root',
                       db='baidumap', charset='utf8')
cur = conn.cursor()
sql = "SELECT uid FROM baidumap.city WHERE id > 0;"
cur.execute(sql)
uids = cur.fetchall()

# Define a getjson function to request and parse the returned data
def getjson(uid):
    try:
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                                 '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
        params = {
            'uid': uid,
            'scope': '2',
            'output': 'json',
            'ak': 'xm53lmurtnqaapfukvy1wzsyzcnmna9h',
        }
        url = 'http://api.map.baidu.com/place/v2/detail'
        response = requests.get(url=url, headers=headers, params=params)
        decodejson = json.loads(response.text)
        return decodejson
    except:
        pass

# Fetch the data and store it
for uid in uids:
    uid = uid[0]
    print(uid)
    decodejson = getjson(uid)
    data = decodejson.get('result') if decodejson else None
    if data:
        park = data.get('name')
        location_lat = data.get('location').get('lat')
        location_lng = data.get('location').get('lng')
        address = data.get('address')
        street_id = data.get('street_id')
        telephone = data.get('telephone')
        detail = data.get('detail')
        uid = data.get('uid')
        tag = data.get('detail_info').get('tag')
        detail_url = data.get('detail_info').get('detail_url')
        type = data.get('detail_info').get('type')
        overall_rating = data.get('detail_info').get('overall_rating')
        image_num = data.get('detail_info').get('image_num')
        comment_num = data.get('detail_info').get('comment_num')
        shop_hours = data.get('detail_info').get('shop_hours')
        alias = data.get('detail_info').get('alias')
        scope_type = data.get('detail_info').get('scope_type')
        scope_grade = data.get('detail_info').get('scope_grade')
        description = data.get('detail_info').get('description')
        sql = """INSERT INTO baidumap.park
                 (park, location_lat, location_lng, address, street_id,
                  telephone, detail, uid, tag, detail_url, type,
                  overall_rating, image_num, comment_num, shop_hours,
                  alias, scope_type, scope_grade, description, time)
                 VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s,
                         %s, %s, %s, %s, %s, %s, %s, %s, %s, %s);"""
        cur.execute(sql, (park, location_lat, location_lng, address,
                          street_id, telephone, detail, uid, tag,
                          detail_url, type, overall_rating, image_num,
                          comment_num, shop_hours, alias, scope_type,
                          scope_grade, description, datetime.now()))
        conn.commit()
cur.close()
conn.close()

Export data from MySQL:
