This is an original post by the blogger; please credit the source when reprinting.
Bus and metro data reflect how a city's mass transit works, and can be mined to study the city's traffic structure, road network planning, bus stop siting, and so on. However, this kind of data is usually held by specific government departments and is hard to obtain. Internet maps carry a great deal of information, including bus and metro data, and once you work out how the map site returns that data, you can collect it with a Python crawler. Enough chit-chat: the rest of this post details how to use a Python crawler to collect a city's bus and metro stations and line data.
First, crawl the names of all bus and metro lines in the city, i.e. "Route XX" and "Metro Line X". These can be obtained from websites such as Mapbar Bus, the public transit network sites, 8684, and Bendibao, which list bus line names grouped by digit and letter. A simple Python crawler is enough to collect them; see Wenwu_both's article, which explains in detail how to use Python to crawl all the bus station data for a city from 8684. That post collects detailed station information, but it lacks the coordinates of the bus stations and of the bus lines. This is maddening: without spatial coordinates there is nothing to put on a map and nothing to analyze. So this post focuses on obtaining the station and line coordinates.
Take Mapbar Bus as an example. After you click a bus line, the page shows that line's detailed station information along with a map. I got excited and thought success was just around the corner, but after all kinds of packet capturing I found that the responses could not be parsed. This may well be down to my own limited skills; if an expert out there can pull the station and line coordinates from this site, please do enlighten me. It was enough to drive anyone to despair: the prize was right at my lips and I could not take a bite.
In desperation I looked at a map provider's API and found that it can be called; by analyzing the calls you can track down the backend address that serves the data. Readers who are comfortable with front-end work can try this route. My own front-end skills stop at Hello World, so I did not pursue it. It is still a valid line of attack, and practice shows that it works.
If the map API works, can the same data be captured by sniffing the map website itself? Open the home page of a map site, type a bus line name for the city directly into the search box, and watch the network traffic: the station and line information turns up in the captured requests. The captured response is as shown in the figure. busline_list holds the detailed station and line information. It contains two entries, which are the same bus line in opposite directions; the two differ slightly, so take care when choosing one. With the entry point found, the crawler can get to work.
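To make the two-direction point concrete, here is a minimal sketch that lists what each entry of busline_list describes. The sample payload is hypothetical and heavily trimmed, and the lower-case key names (data, busline_list, name, front_name, terminal_name, stations) are assumed from the parsing code later in this post; a real response carries many more fields.

```python
import json

# Hypothetical, trimmed response body: one line, two directions.
SAMPLE = json.loads("""
{"data": {"message": "Successful.",
          "busline_list": [
            {"name": "Route 11 (A to B)", "front_name": "A", "terminal_name": "B",
             "stations": [{"name": "A"}, {"name": "M"}, {"name": "B"}]},
            {"name": "Route 11 (B to A)", "front_name": "B", "terminal_name": "A",
             "stations": [{"name": "B"}, {"name": "A"}]}
          ]}}
""")


def list_directions(payload):
    """Return (name, first stop, last stop, station count) for each direction."""
    data = payload.get("data") or {}
    lines = data.get("busline_list") or []
    return [
        (ln.get("name"), ln.get("front_name"), ln.get("terminal_name"),
         len(ln.get("stations", [])))
        for ln in lines
    ]


for row in list_directions(SAMPLE):
    print(row)
```

Printing both entries side by side makes it easy to see how the two directions differ before deciding which one to keep.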
The main crawling code is shown below, and it is actually very simple. First, build the request parameters: mainly the line name, the city code, the geographic extent, and the zoom level. The geographic extent can be read off with a coordinate picker. The parameters are URL-encoded, the request is sent, and the returned data is checked to see whether it meets the requirements. (Note: the line may have been discontinued or may not exist on the map; also, if you request too quickly, the anti-crawler mechanism kicks in and requires manual verification. I ran into this while crawling, which is why the code sleeps for a random interval.) The next step is to parse the JSON data. The extratstations and extractline functions simply pull out the required fields; nothing complicated about it. Finally the results are saved, with stations and lines stored in separate files.
import json
import random
import urllib.parse
import urllib.request
from time import sleep

import pandas as pd


def main():
    # Spreadsheet of line names collected in the first step
    df = pd.read_excel("line name.xlsx")
    baseurl = "https://ditu.amap.com/service/poiinfo?query_type=tquery&pagesize=20&pagenum=1&qii=true&cluster_state=5&need_utd=true&utd_sceneid=1000&div=pc1000&addr_poi_merge=true&is_classify=true&"
    for bus in df[u"Line"]:
        params = {
            'keywords': bus,  # line name from the spreadsheet, e.g. '11路'
            'zoom': '11',
            'city': '610100',
            'geoobj': '107.623|33.696|109.817|34.745'
        }
        print(bus)
        parammerge = urllib.parse.urlencode(params)
        # print(parammerge)
        targeturl = baseurl + parammerge
        stationfile = "./busstation/" + bus + ".csv"
        linefile = "./busline/" + bus + ".csv"

        req = urllib.request.Request(targeturl)
        res = urllib.request.urlopen(req)
        content = res.read()
        jsondata = json.loads(content)
        data = jsondata.get("data") or {}
        if data.get("message") and data.get("busline_list"):
            buslist = data["busline_list"]  # busline list
            # buslist holds two lines, the same bus in opposite directions; crawl one of them
            buslistslt = buslist[0]

            busstations = extratstations(buslistslt)
            busline = extractline(buslistslt)
            writestation(busstations, stationfile)
            writeline(busline, linefile)

            # sleep for a random interval to avoid tripping the anti-crawler check
            sleep(random.random() * random.randint(0, 7) + random.randint(0, 5))
        else:
            continue
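The main function hands its results to two writer helpers that are not listed in this post. As a rough sketch only, a single CSV writer along the following lines could serve both; the name write_rows, the optional header argument, and the choice of utf-8-sig (so that Chinese station names open cleanly in Excel) are all assumptions, not the blogger's actual code.

```python
import csv
import os


def write_rows(rows, path, header=None):
    """Write a list of row lists to a CSV file, creating the folder if needed."""
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    # utf-8-sig adds a BOM, which helps spreadsheet tools detect the encoding
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        if header:
            writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

With such a helper, the station writer and the line writer would differ only in the header row and the output folder they pass in.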
The blogger's parsing functions are attached below:
def extratstations(buslistslt):
    # One row per station: id, line name, station name, original x/y, WGS-84 x/y
    busname = buslistslt["name"]
    stationset = []
    stations = buslistslt["stations"]
    for bs in stations:
        tmp = []
        tmp.append(bs["station_id"])
        tmp.append(busname)
        tmp.append(bs["name"])
        cor = bs["xy_coords"].split(";")
        tmp.append(cor[0])
        tmp.append(cor[1])
        wgs84cor1 = gcj02towgs84(float(cor[0]), float(cor[1]))
        tmp.append(wgs84cor1[0])
        tmp.append(wgs84cor1[1])
        stationset.append(tmp)
    return stationset


def extractline(buslistslt):
    # buslist contains two lines; pay attention to the name
    keyname = buslistslt["key_name"]
    busname = buslistslt["name"]
    fromname = buslistslt["front_name"]
    toname = buslistslt["terminal_name"]
    lineset = []
    xstr = buslistslt["xs"]
    ystr = buslistslt["ys"]
    xset = xstr.split(",")
    yset = ystr.split(",")
    length = len(xset)
    for i in range(length):
        # One row per vertex of the line geometry
        tmp = []
        tmp.append(keyname)
        tmp.append(busname)
        tmp.append(fromname)
        tmp.append(toname)
        tmp.append(xset[i])
        tmp.append(yset[i])
        wgs84cor2 = gcj02towgs84(float(xset[i]), float(yset[i]))
        tmp.append(wgs84cor2[0])
        tmp.append(wgs84cor2[1])
        lineset.append(tmp)
    return lineset
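The parsing functions call gcj02towgs84, which is not listed in this post. The sketch below is the widely circulated approximate GCJ-02 to WGS-84 conversion, assuming the usual constants and a (longitude, latitude) argument order; it is a stand-in for illustration, not necessarily the blogger's implementation. It is a one-step approximation (accurate to a few metres) and does no check for points outside China.

```python
import math

A = 6378245.0                 # semi-major axis used by the GCJ-02 offset formula
EE = 0.00669342162296594323   # eccentricity squared


def _transform_lat(x, y):
    ret = -100.0 + 2.0 * x + 3.0 * y + 0.2 * y * y + 0.1 * x * y + 0.2 * math.sqrt(abs(x))
    ret += (20.0 * math.sin(6.0 * x * math.pi) + 20.0 * math.sin(2.0 * x * math.pi)) * 2.0 / 3.0
    ret += (20.0 * math.sin(y * math.pi) + 40.0 * math.sin(y / 3.0 * math.pi)) * 2.0 / 3.0
    ret += (160.0 * math.sin(y / 12.0 * math.pi) + 320.0 * math.sin(y * math.pi / 30.0)) * 2.0 / 3.0
    return ret


def _transform_lng(x, y):
    ret = 300.0 + x + 2.0 * y + 0.1 * x * x + 0.1 * x * y + 0.1 * math.sqrt(abs(x))
    ret += (20.0 * math.sin(6.0 * x * math.pi) + 20.0 * math.sin(2.0 * x * math.pi)) * 2.0 / 3.0
    ret += (20.0 * math.sin(x * math.pi) + 40.0 * math.sin(x / 3.0 * math.pi)) * 2.0 / 3.0
    ret += (150.0 * math.sin(x / 12.0 * math.pi) + 300.0 * math.sin(x / 30.0 * math.pi)) * 2.0 / 3.0
    return ret


def gcj02towgs84(lng, lat):
    """Approximate inverse of the GCJ-02 offset; returns [lng, lat] in WGS-84."""
    dlat = _transform_lat(lng - 105.0, lat - 35.0)
    dlng = _transform_lng(lng - 105.0, lat - 35.0)
    radlat = lat / 180.0 * math.pi
    magic = 1 - EE * math.sin(radlat) ** 2
    sqrtmagic = math.sqrt(magic)
    # Convert the metre offsets to degrees at this latitude
    dlat = (dlat * 180.0) / ((A * (1 - EE)) / (magic * sqrtmagic) * math.pi)
    dlng = (dlng * 180.0) / (A / sqrtmagic * math.cos(radlat) * math.pi)
    # Treat the input as if it were WGS-84, compute the offset, and subtract it
    return [lng - dlng, lat - dlat]


print(gcj02towgs84(108.94, 34.26))
```

The function returns a two-element list, so it drops straight into the indexing used by the parsing functions above.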
The crawler collects the raw data as follows:
Below is the processed data for one bus line's stations and route. Because different map providers use different coordinate systems, the points are offset by varying amounts and need coordinate correction. In the next post, I will explain in detail how to correct the coordinates of these stations and lines and vectorize them in batches.
Python crawler--city bus, metro station and line data