Warning: This article is intended for study purposes only and should not be used for illegal purposes.
The previous article, "Mobike Bike unofficial Big Data analysis", presented my analysis of Mobike bike data during the Spring Festival. In this and the following articles I will explain how my crawler collected that data efficiently.
Why crawl Mobike's data
Mobike was the first bike-sharing service to enter Chengdu. Every day when I came out of the subway I could see plenty of bikes in the app, but when I walked to the spot, the bike was not there. Some bikes were hidden who knows where, some may have been behind a building and could not be found because of GPS error, and some had been taken inside residential compounds, fenced off so that riders could not reach them.
So is there a way to use the bikes' data to analyze whether they have become zombie bikes? Is someone deliberately leaving them in compounds that other people cannot enter?
With these questions in mind, I began to study how to get the data.
Where to get the data
If you can see the data, there is always a way to automate collecting it. But the collection method determines how efficient the collection is, and for analyzing bike data the crawler has to gather a large amount of data in a short period (usually around 10 minutes) for the result to be useful. So where does the data come from?
The most direct source is the bike app itself. Modern software design emphasizes separating the front end from the back end, with the back-end service feeding the app, web pages, and so on. Given that trend, we only need to work out the HTTP requests the software makes. The following approaches and tools are generally available:
Direct packet capture:
HTTP request capture and debugging through a proxy:
Fiddler 4
Charles
Packet Capture (Android)
My phone is not rooted, and capturing packets on the router picks up too much noise and does not cope well with HTTPS, so I could only try Fiddler or Charles first. I pointed the phone at the Fiddler proxy and then moved the location in the app to see whether any new requests appeared. Unfortunately, the only requests seemed to be fetching map data from Amap (Gaode), and nothing related to Mobike bikes showed up.
What was going on? I tried capturing on the phone itself. After switching to Packet Capture, the traffic did show up, and among the requests I found the one I cared about most:
[Image: screenshot of the captured request]
This API request is obvious at a glance. I replayed it in Postman and it returned the correct information, so it looked like this was the one!
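The same Postman check can be scripted. The following is only a minimal sketch: the endpoint URL and the field names are hypothetical placeholders (the real ones come from your own packet capture), and a form-encoded POST is assumed. It builds the request without sending it, so the body can be compared with the captured one first.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder: substitute the URL seen in the packet capture.
API_URL = "http://api.example.invalid/nearby-bikes"


def build_nearby_request(lat: float, lng: float) -> Request:
    """Build (but do not send) a form-encoded POST for bikes near a point."""
    # Field names are assumptions for illustration, not the real API's.
    body = urlencode(
        {"latitude": f"{lat:.6f}", "longitude": f"{lng:.6f}"}
    ).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(API_URL, data=body, headers=headers, method="POST")


req = build_nearby_request(30.657, 104.066)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Once the body matches the capture, the request can be sent with urllib.request.urlopen(req, timeout=10) and the response decoded as JSON, assuming the service returns JSON.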
Celebrating too early
After crawling continuously for a few days and analyzing the data, I found that the bikes' GPS positions kept jumping, sometimes by more than a few kilometers, which is obviously not normal.
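Jumps like these can be flagged automatically. The sketch below is not the analysis used for this article. It assumes each bike's samples are stored as (minutes, latitude, longitude) tuples, and it marks any pair of consecutive samples whose implied speed exceeds a threshold; the 40 km/h default and the sample track are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def find_jumps(samples, max_kmh=40.0):
    """Return index pairs of consecutive samples with an implausible speed.

    samples: list of (minutes, lat, lng) for one bike, sorted by time.
    """
    jumps = []
    for i in range(1, len(samples)):
        t0, lat0, lng0 = samples[i - 1]
        t1, lat1, lng1 = samples[i]
        hours = (t1 - t0) / 60.0
        if hours <= 0:
            continue  # duplicate or out-of-order timestamp; skip it
        if haversine_km(lat0, lng0, lat1, lng1) / hours > max_kmh:
            jumps.append((i - 1, i))
    return jumps


# Made-up track: a parked bike that "jumps" several km away and back.
track = [
    (0, 30.6570, 104.0660),
    (5, 30.6571, 104.0661),
    (10, 30.7000, 104.1200),
    (15, 30.6570, 104.0660),
]
print(find_jumps(track))  # [(1, 2), (2, 3)]
```

A speed threshold is used rather than a raw distance so that genuine rides between two samples are not flagged.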
Was their interface tampering with the results and returning false data? I noticed that even in the app itself, the positions returned for the bikes also jump around. Early one morning I refreshed the bikes near my home every few minutes to see whether this was true.
I couldn't find the picture, but the observation led to the conclusion that the locations shown in the app really were wrong. A bike would appear in a very remote spot, soon disappear, and later come back, and this matched the data I had captured. The jumps had nothing to do with the phone, the phone number, or even the mobile carrier, which indicates that the problem lies in the interface. It also explains why you sometimes see a bike in the app but find nothing when you get there.
This is a video I posted to WeChat Moments earlier. You can see a point near the camp gate, which is where the bike was actually parked, but the GPS track shows it darting around the neighborhood within a short time, even jumping very far away, and then returning to that position.
This kind of data is useless for analysis, and I almost gave up.
A turning point
As WeChat mini programs became popular, Mobike released its own mini program right away. I smiled when I saw it: good, one more data source to try. Capturing one request with Packet Capture was enough to identify the API, so I will not go through the process. After crawling two or three days of data I found the turning point: the data matched the normal pattern of bike usage.
The remaining thing is to improve the efficiency of the crawler.
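One common way to cover a whole city within a few minutes is to query a grid of points concurrently. The sketch below illustrates that general idea and is not necessarily how my crawler works: grid_points, crawl, and fake_fetch are hypothetical names, the bounding box and step are made up, and fake_fetch stands in for the real HTTP call.

```python
from concurrent.futures import ThreadPoolExecutor


def grid_points(south, west, north, east, step):
    """Yield (lat, lng) points covering a bounding box, step in degrees."""
    rows = int(round((north - south) / step)) + 1
    cols = int(round((east - west) / step)) + 1
    for r in range(rows):
        for c in range(cols):
            yield (round(south + r * step, 6), round(west + c * step, 6))


def crawl(points, fetch, workers=16):
    """Call fetch(lat, lng) for every point concurrently; merge by bike id."""
    bikes = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(lambda p: fetch(*p), points):
            for bike in result:
                # Neighbouring queries overlap, so deduplicate by id.
                bikes[bike["id"]] = bike
    return bikes


def fake_fetch(lat, lng):
    """Stand-in for the real request; returns one made-up bike per point."""
    return [{"id": f"bike-{lat:.3f}-{lng:.3f}", "lat": lat, "lng": lng}]


points = list(grid_points(30.60, 104.00, 30.62, 104.02, 0.01))
print(len(points), len(crawl(points, fake_fetch)))  # 9 9
```

Threads suit this workload because the time goes to waiting on the network. The results are merged in the main thread, so the shared dictionary needs no lock.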
Other attempts
Sometimes analyzing the app's code directly is a convenient way to find the API entry points. I decompiled the Android app, but apart from some useful resource files, everything else was wrapped in Qihoo 360's obfuscating packer. There is an article online that analyzes how to unpack it, but I did not have much time to study it, so I let it go.
A few words on API design
Mobike's API is easy to crawl and analyze, largely because of its simplistic design:
It uses plain HTTP requests only, which makes packet capture and analysis easy.
Nothing in the API requests is encrypted, so anyone can easily call the service.
In addition, the mini program is an important source of API leakage. In the app, a request can be encrypted in native code before it is sent, but mini programs do not seem to have that capability.
If you are interested, take a look at the requests made by the Bluegogo app. They use HTTPS and encrypt the request data, which makes crawling their data much harder.
Of course, if Mobike does not care about this data, such an API design is fine.