Purpose: crawl data on all middle schools within Kunming, including each school's name and coordinates.
This is the basic article. It focuses on the principle and breaks the implementation into steps, laying the groundwork for writing the Python code.
Since it is aimed at readers starting from zero, it goes into considerable detail.
Concretely, crawling this data for Kunming means obtaining from Baidu Map the geographic information (points of interest, or POIs) for every place in the Kunming area that matches the keyword "middle school".
How do you capture data from Baidu Map?
Here is the tutorial.
Contents:
1. Baidu Map Open Platform registration, AK acquisition
2. Description of AK
3. Request URL Description
4. Baidu Map Coordinate picker
5. Get POIs within a coordinate range
6. Build the URL array in Excel
1. Baidu Map Open Platform registration, AK acquisition
(1) To get POI data, first log in to the Baidu Map Open Platform (http://lbsyun.baidu.com/) and complete the registration.
The platform provides developers with many interfaces and other functions; only the parts related to POI crawling are covered here.
If you already have a Baidu account, you can simply log in with it.
The login page is familiar, so there is no need to say more about it.
(2) After logging in to the Baidu Map Open Platform, proceed as follows.
First, click into the console;
Second, click to create an application;
Third, give the application a name;
Fourth, if necessary, set an IP whitelist to restrict which computers may call the AK;
Fifth, submit.
Sixth, leave all other settings at their defaults.
Then you can see the AK you created.
2. Description of AK
For a detailed explanation of this part, see the Web Service API section of the development documentation.
One point worth noting here is quotas.
First, an AK you create cannot be used without limit. For most unverified users the quota is capped: no more than 100,000 calls per day and no more than 6,000 calls per minute.
This quota mainly concerns the location functions, though.
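A crawler can fire requests much faster than a person pasting URLs into a browser, so it may help to count calls on the client side. The sketch below is a hypothetical sliding-window limiter (the class `QuotaThrottle` and its `try_acquire` method are my own names, not part of any Baidu SDK). It uses the 6,000-per-minute and 100,000-per-day figures quoted above as defaults; the real limits for your account may differ.

```python
import time
from collections import deque


class QuotaThrottle:
    """Hypothetical client-side limiter: at most `per_minute` calls in any
    60-second window and `per_day` calls in any 86,400-second window."""

    def __init__(self, per_minute=6000, per_day=100000, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock
        self.minute_calls = deque()  # timestamps of calls in the last 60 s
        self.day_calls = deque()     # timestamps of calls in the last 24 h

    def try_acquire(self):
        """Record a call and return True if both quotas allow it, else False."""
        now = self.clock()
        # Drop timestamps that have slid out of each window.
        while self.minute_calls and now - self.minute_calls[0] >= 60:
            self.minute_calls.popleft()
        while self.day_calls and now - self.day_calls[0] >= 86400:
            self.day_calls.popleft()
        if (len(self.minute_calls) >= self.per_minute
                or len(self.day_calls) >= self.per_day):
            return False
        self.minute_calls.append(now)
        self.day_calls.append(now)
        return True
```

A caller would sleep briefly and retry whenever `try_acquire()` returns False. Injecting the clock makes the limiter testable without actually waiting.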
For POI crawling, one URL page requested with an AK shows information for at most 20 points of interest, and within one coordinate range at most 20 URL pages can be generated.
In other words, within one coordinate range the URL pages generated with an AK can crawl at most 20 × 20 = 400 points of interest.
If Kunming has no more than 400 middle schools, one coordinate range is enough; if it has more than 400, one range is not enough.
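The arithmetic behind the 400 limit, written as a small Python sketch (the constant and function names are illustrative). `min_ranges_needed` only gives a lower bound, because real points of interest are rarely spread evenly across sub-ranges.

```python
import math

PAGE_SIZE = 20                          # POIs per URL page (page_size=20)
MAX_PAGES = 20                          # page_num runs from 0 to 19
MAX_PER_RANGE = PAGE_SIZE * MAX_PAGES   # 400 POIs per coordinate range


def min_ranges_needed(total_pois):
    """Smallest number of coordinate ranges that could hold `total_pois`,
    assuming the POIs could be divided perfectly evenly between them."""
    return max(1, math.ceil(total_pois / MAX_PER_RANGE))


print(MAX_PER_RANGE)            # 400
print(min_ranges_needed(350))   # 1
print(min_ranges_needed(1000))  # 3
```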
The AK quota is explained here first; it also comes up in the following steps.
3. Request URL description
Copy the following URL into the browser to see it.
http://api.map.baidu.com/place/v2/search?query=中学&region=昆明&output=json&ak=9s5gsyzswbmafu8ps2v2vwvdldlqgaao
Does the browser show a page of JSON results?
This is one page of crawled information about middle schools in Kunming.
Next, let's break this URL into its parts.
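Since the page is JSON, a script can read it instead of a person. Below is a minimal standard-library sketch; the function names are hypothetical. It assumes the response has a top-level `status` (0 for success) and a `results` list whose items contain `name` and `location` with `lat`/`lng`, which is the shape I expect from this Place API but should be checked against the page you actually get.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """Download one URL page and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_pois(data):
    """Return (name, lat, lng) tuples from one page of results.

    Assumed layout: {"status": 0, "results": [{"name": ...,
    "location": {"lat": ..., "lng": ...}}, ...]}.
    """
    if data.get("status") != 0:
        raise RuntimeError(f"API error {data.get('status')}: {data.get('message')}")
    pois = []
    for item in data.get("results", []):
        loc = item.get("location")
        if loc is None:  # skip any result that carries no coordinates
            continue
        pois.append((item["name"], loc["lat"], loc["lng"]))
    return pois
```

Keeping the download and the parsing separate means `extract_pois` can be tried on a saved copy of the page without spending quota.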
http://api.map.baidu.com/place/v2/search?query=中学&region=昆明&output=json&ak=9s5gsyzswbmafu8ps2v2vwvdldlqgaao
http://api.map.baidu.com/place/v2/search? is the request prefix, taken from the Baidu Map API documentation.
query=中学: the query keyword is 中学 (middle school).
region=昆明: the query region is Kunming (昆明).
output=json: the results are returned in JSON format.
ak=9s5gsyzswbmafu8ps2v2vwvdldlqgaao: the AK. (This is the AK I just applied for; it is mosaicked in the screenshot. I created it only for writing this tutorial and did not set an IP whitelist, so I am sharing it here for you to practice with.)
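The parts listed above can be assembled programmatically. In this sketch `build_search_url` is a hypothetical helper name; `urllib.parse.urlencode` percent-encodes the Chinese keyword and region as UTF-8, which is what the browser does when you paste the URL.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.map.baidu.com/place/v2/search"


def build_search_url(query, region, ak, output="json", **extra):
    """Assemble a search URL from its parts.

    Any extra keyword arguments (e.g. page_size=20, page_num=0) are placed
    between region and output, matching the order used in this article.
    """
    params = {"query": query, "region": region}
    params.update(extra)
    params["output"] = output
    params["ak"] = ak
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("中学", "昆明", "9s5gsyzswbmafu8ps2v2vwvdldlqgaao"))
```

Searching a different keyword or city only changes the first two arguments.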
This URL is easy to understand. For example, to find hotels in Beijing, the URL is: http://api.map.baidu.com/place/v2/search?query=酒店&region=北京&output=json&ak=9s5gsyzswbmafu8ps2v2vwvdldlqgaao
However, the page shows only four or five results, and Kunming obviously has more than a few middle schools, just as Beijing has more than a few hotels.
Taking Kunming's middle schools as the example again, let's improve the URL:
http://api.map.baidu.com/place/v2/search?query=中学&region=昆明&page_size=20&page_num=0&output=json&ak=9s5gsyzswbmafu8ps2v2vwvdldlqgaao
Copy this URL into the browser and count the middle schools on the page: there are 20.
This shows that each URL page displays at most 20 points of interest.
Then take a closer look at the two URLs:
The first one:
http://api.map.baidu.com/place/v2/search?query=中学&region=昆明&output=json&ak=9s5gsyzswbmafu8ps2v2vwvdldlqgaao
The second one:
http://api.map.baidu.com/place/v2/search?query=中学&region=昆明&page_size=20&page_num=0&output=json&ak=9s5gsyzswbmafu8ps2v2vwvdldlqgaao
What is the difference between them?
The second one adds page_size=20&page_num=0. What does this mean? page_size=20 means each page holds 20 results, and page_num is the page index. At most 20 URL pages can be generated, and this is page 0: as in most programming languages, numbering starts from 0.
Change page_size=20&page_num=0 to page_size=20&page_num=1 to see what the second page returns, then change it to page_size=20&page_num=2, and so on.
When you change it to page_size=20&page_num=19, the URL is http://api.map.baidu.com/place/v2/search?query=%e4%b8%ad%e5%ad%a6&region=%e6%98%86%e6%98%8e&page_size=20&page_num=19&output=json&ak=9s5gsyzswbmafu8ps2v2vwvdldlqgaao
Now try page_size=20&page_num=20; the URL is http://api.map.baidu.com/place/v2/search?query=%e4%b8%ad%e5%ad%a6&region=%e6%98%86%e6%98%8e&page_size=20&page_num=20&output=json&ak=9s5gsyzswbmafu8ps2v2vwvdldlqgaao
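The `%e4%b8%ad%e5%ad%a6` and `%e6%98%86%e6%98%8e` in these URLs are the percent-encoded UTF-8 bytes of 中学 and 昆明, produced when the browser encodes the Chinese characters. Python's standard library can show the conversion in both directions:

```python
from urllib.parse import quote, unquote

# Decode the %xx sequences from the URLs above back into Chinese text.
print(unquote("%e4%b8%ad%e5%ad%a6"))  # 中学 (middle school)
print(unquote("%e6%98%86%e6%98%8e"))  # 昆明 (Kunming)

# Encode in the other direction; hex case (e4 vs E4) does not matter.
print(quote("中学"))                   # %E4%B8%AD%E5%AD%A6
```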
The page shows no points of interest. This confirms what was said above: for one coordinate range, at most 20 URL pages (page_num 0 to 19) can be generated.
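The page-by-page procedure just described (page_num 0, 1, 2, ... until a page comes back empty, never past 19) maps onto a simple loop. In this sketch `crawl_all_pages` is a hypothetical name, and `fetch_page` is a callable you supply that returns the list of POIs for a given page_num, so the loop makes no assumptions about how requests are actually sent.

```python
def crawl_all_pages(fetch_page, max_pages=20):
    """Collect POIs from page_num = 0, 1, ... for one query.

    Stops at the first empty page, or after `max_pages` pages
    (page_num 0..19), whichever comes first.
    """
    all_pois = []
    for page_num in range(max_pages):
        pois = fetch_page(page_num)
        if not pois:  # an empty page means there is nothing further
            break
        all_pois.extend(pois)
    return all_pois
```

With 20 results per page and 20 pages, this loop can never return more than 400 items, which is exactly the limit discussed next.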
So a single query can crawl at most 400 points of interest. What if we need more than 400?
Don't worry; read on.
4. Baidu Map Coordinate picker
Before answering that question, let's get to know a tool: the coordinate picker.
Go to Development Documentation > Tool Support > Coordinate Picker.
Open the coordinate picker to enter the Baidu Map coordinate picking system.
In the picking system, first set the area; second, click on the map; third, the coordinates of the clicked point are shown, and copying them gives you that point's Baidu coordinates.
Estimate a rectangle that covers the area you need: pick the coordinates of its lower-left corner, then of its upper-right corner.
If the estimate is really difficult, consult a map of China's administrative divisions.
This is somewhat tedious; I will write a separate tutorial later on how to get the extent of an administrative area.
Here is a rectangle covering all of Kunming (the picker gives longitude first, then latitude):
Lower-left corner: 102.174112,24.390894
Upper-right corner: 103.678942,26.548645
(Getting the lower-left and upper-right coordinates of the rectangle is simplified in the Advanced article.)
5. Get POIs within a coordinate range
We now know the coordinate range of Kunming.
Now modify the URL from above.
The URL from above: http://api.map.baidu.com/place/v2/search?query=中学&region=昆明&page_size=20&page_num=0&output=json&ak=9s5gsyzswbmafu8ps2v2vwvdldlqgaao
The modified URL: http://api.map.baidu.com/place/v2/search?query=中学&bounds=24.390894,102.174112,26.548645,103.678942&page_size=20&page_num=0&output=json&ak=9s5gsyzswbmafu8ps2v2vwvdldlqgaao
Compare them: what is different?
The previous range parameter is region=昆明 (Kunming).
The modified range parameter is bounds=24.390894,102.174112,26.548645,103.678942.
Note the order inside bounds=: lower-left latitude, lower-left longitude, upper-right latitude, upper-right longitude. This is the reverse of the longitude,latitude order the coordinate picker shows.
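Because the picker shows longitude,latitude while bounds= expects latitude,longitude, it is easy to swap them by accident. A small hypothetical helper that does the reordering:

```python
def picker_to_bounds(lower_left, upper_right):
    """Turn two coordinate-picker readings ("lng,lat" strings) into the
    bounds= value: lat,lng of the lower-left corner, then lat,lng of the
    upper-right corner."""
    ll_lng, ll_lat = lower_left.split(",")
    ur_lng, ur_lat = upper_right.split(",")
    return ",".join(s.strip() for s in (ll_lat, ll_lng, ur_lat, ur_lng))


print(picker_to_bounds("102.174112,24.390894", "103.678942,26.548645"))
# 24.390894,102.174112,26.548645,103.678942
```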
Nothing else changes, and crawling within one coordinate range still yields at most 400 points of interest.
Now for the earlier question: what if there are more than 400 points of interest?
You can split the rectangular range!
Expressed in latitude and longitude, the lower-left and upper-right corners of Kunming's range define a rectangle. If that rectangle contains more than 400 points of interest, slice it into four smaller rectangles, get the points of interest within each of the four, and combine the results.
If four are not enough, cut it into eight; if eight are not enough, cut it into 16, and so on, until no rectangle contains more than 400 points of interest.
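The splitting idea can be written as a recursive function. Note that this sketch is an adaptive variant of the approach described above: instead of cutting the whole area into 4, then 8, then 16 pieces, it splits only the rectangles that are still too dense (a quadtree-style split). `count_pois` is a hypothetical callable you supply; one way to implement it is to crawl a rectangle's 20 pages and count the results, in which case a count of exactly 400 should be read as "possibly more", which is why the check splits when the count reaches the limit. `max_depth` guards against endless splitting when many POIs share one spot.

```python
MAX_PER_RANGE = 400  # 20 results per page x 20 pages


def split_into_four(bounds):
    """Cut (lat1, lng1, lat2, lng2) into four equal quadrants."""
    lat1, lng1, lat2, lng2 = bounds
    mid_lat = (lat1 + lat2) / 2
    mid_lng = (lng1 + lng2) / 2
    return [
        (lat1, lng1, mid_lat, mid_lng),  # lower-left
        (lat1, mid_lng, mid_lat, lng2),  # lower-right
        (mid_lat, lng1, lat2, mid_lng),  # upper-left
        (mid_lat, mid_lng, lat2, lng2),  # upper-right
    ]


def plan_rectangles(bounds, count_pois, limit=MAX_PER_RANGE, max_depth=6):
    """Return rectangles that each hold fewer than `limit` POIs
    (or that have reached the maximum split depth)."""
    if max_depth == 0 or count_pois(bounds) < limit:
        return [bounds]
    rects = []
    for sub in split_into_four(bounds):
        rects.extend(plan_rectangles(sub, count_pois, limit, max_depth - 1))
    return rects
```

When combining results from neighbouring rectangles, a POI lying exactly on a shared edge might be returned twice, so deduplicating (for example by name and coordinates) is worth considering.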
How many pieces to cut, and how to divide them, is a judgment call based on experience.
This is the idea that the Python program will follow.
6. Build the URL array in Excel
This step is a warm-up for the later Python crawler script.
We use Excel to reinforce the logic the code will follow.
Objective:
Crawl the POIs of middle schools in Kunming.
Keyword: 中学 (middle school)
AK (already created): 9S5GSYZSWBMAFU8PS2V2VWVDLDLQGAAO
Kunming coordinate range (latitude, longitude):
Lower-left corner: 24.390894,102.174112
Upper-right corner: 26.548645,103.678942
URL template:
http://api.map.baidu.com/place/v2/search?query=中学&bounds=24.390894,102.174112,26.548645,103.678942&page_size=20&page_num=0&output=json&ak=9s5gsyzswbmafu8ps2v2vwvdldlqgaao
Enter the coordinate range into Excel and compute the ranges of 4 sub-rectangles. Each rectangle gets its own column of URLs (page_num 0 through 19), so Excel formulas generate 4 columns, 80 URLs in total.
Across these URLs only two parameters change: bounds (the rectangle) and page_num (pages 0 to 19); everything else stays the same.
The URLs therefore follow a regular, predictable pattern.
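The same 4 × 20 URL array can be produced with a few lines of Python instead of Excel formulas. This sketch assumes the 4 rectangles come from cutting the Kunming range in half along both latitude and longitude (the article does not say exactly how the Excel sheet divides it); the function names are hypothetical, and it reuses the practice AK from above.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.map.baidu.com/place/v2/search"
AK = "9s5gsyzswbmafu8ps2v2vwvdldlqgaao"  # the practice AK from this article


def quadrants(lat1, lng1, lat2, lng2):
    """Assumed split: halve the rectangle along latitude and longitude."""
    mid_lat, mid_lng = (lat1 + lat2) / 2, (lng1 + lng2) / 2
    return [
        (lat1, lng1, mid_lat, mid_lng),
        (lat1, mid_lng, mid_lat, lng2),
        (mid_lat, lng1, lat2, mid_lng),
        (mid_lat, mid_lng, lat2, lng2),
    ]


def url_array(bounds, query="中学", pages=20):
    """One column per rectangle, one URL per page_num (0..pages-1)."""
    columns = []
    for rect in quadrants(*bounds):
        bounds_str = ",".join(f"{v:.6f}" for v in rect)
        column = []
        for page_num in range(pages):
            params = {"query": query, "bounds": bounds_str,
                      "page_size": 20, "page_num": page_num,
                      "output": "json", "ak": AK}
            # safe="," keeps the commas in bounds readable, as in the template.
            column.append(BASE_URL + "?" + urlencode(params, safe=","))
        columns.append(column)
    return columns


cols = url_array((24.390894, 102.174112, 26.548645, 103.678942))
print(len(cols), sum(len(c) for c in cols))  # 4 80
print(cols[0][0])
```

As in the Excel version, only bounds and page_num vary between URLs.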
This is the programming idea; the code itself is explained in a later article.
Reposted from: http://blog.csdn.net/sinat_41310868/article/details/78746094
From Zero: Mastering a Baidu Map POI Crawler (Python) (Basic Article)