Crawler (III): Parsing JS to Get the Real Playback Address of a Free Youku Video

Source: Internet
Author: User

Tools: Google Chrome + Fiddler (packet capture tool)

Note: no working crawler code is posted here; this is only about the ideas!

Original URL = v.youku.com/v_show/id_xmziwnjgymdgwoa==.html? (a randomly chosen movie link, called the "original URL" below)

Start Analysis:

Open Fiddler, then open Chrome, enter the original URL, and press F12. This gives:

What the capture shows: Fiddler reveals that the real playback address is split into segments (label 3). Copying one of these segment addresses into the browser returns a 403 error, so the link evidently has to be reconstructed in some way and the request sent from code, or it will be rejected. That leads to label 4: parsing the request URL.

First, compare the URLs of different video segments. Only the "ts_seg_no" parameter differs, starting from 0 and increasing by 1 each time, though where it ends is unknown. Next, analyze how the request URL for the real video address differs when the original URL is opened at different times. The method is the same as before: open the original URL again in a new tab and compare the request URLs from the two loads. The parameter-by-parameter comparison is omitted here; the result is as follows:

Each time the page is opened, both the psid and vkey parameters change, and what they represent is unknown. Between different segments of the same load, the ts_start, ts_end, and ts_seg_no parameters also change. The pattern is known, but there is no way to tell at which values the three parameters stop, so they too count as unknown. At this point it is safe to say that some link or JS file loaded before the segment links must carry these unknown parameters, or the segment request URLs themselves. So search the Network panel for the psid and vkey values. They turn up in a long link that starts with "pl-ali.youku.com/playlist/m3u8?". Click it and view the response:

Sure enough, the response contains exactly the request links for the video segments. So there is no longer any need to reconstruct the segment request links. The focus shifts to this link, called the "play URL" below: once the play URL can be obtained, the task is done.
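Pulling the segment links out of such a response can be sketched in Python. This is a minimal sketch, assuming the response is an ordinary HLS media playlist, in which lines starting with "#" are tags and every other non-blank line is a segment address. The function name segment_urls and the sample playlist (including the example.com addresses and parameter values) are made up for illustration, not taken from a real Youku response.

```python
def segment_urls(playlist_text):
    """Return the segment addresses listed in an m3u8 playlist, in order."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are playlist tags or comments; skip them
        # together with blank lines. Everything else is a segment address.
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


# Hypothetical playlist body, shaped like an HLS media playlist.
SAMPLE = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
http://example.com/seg.ts?ts_seg_no=0&psid=abc&vkey=xyz
#EXTINF:10.0,
http://example.com/seg.ts?ts_seg_no=1&psid=abc&vkey=xyz
#EXT-X-ENDLIST
"""

for url in segment_urls(SAMPLE):
    print(url)
```

Because the playlist lists every segment, the question of where ts_seg_no ends no longer needs an answer: the last entry in the list is the last segment.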

Now start analyzing the play URL, which looks like this:

It looks a bit complicated, so use the same compare-and-analyze method as before to find which parameters of the play URL differ:

The comparison shows that only two parameters in the play URL need to be reconstructed: psid and ups_key. So start tracing where these two parameters come from.
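The compare-two-URLs step, used here and earlier for the segment links, can be sketched in Python with the standard urllib.parse module. This is only an illustration: the helper name changed_params, the example.com addresses, and the vid and type parameters are assumptions, and only psid and ups_key come from the analysis above.

```python
from urllib.parse import urlsplit, parse_qs


def changed_params(url_a, url_b):
    """Return {name: (values_in_a, values_in_b)} for query parameters that differ."""
    qa = parse_qs(urlsplit(url_a).query, keep_blank_values=True)
    qb = parse_qs(urlsplit(url_b).query, keep_blank_values=True)
    diff = {}
    # Look at every parameter name seen in either URL.
    for name in sorted(set(qa) | set(qb)):
        if qa.get(name) != qb.get(name):
            diff[name] = (qa.get(name), qb.get(name))
    return diff


# Hypothetical play URLs captured from two separate page loads.
first = "http://example.com/playlist/m3u8?vid=123&type=mp4&psid=aaa&ups_key=k1"
second = "http://example.com/playlist/m3u8?vid=123&type=mp4&psid=bbb&ups_key=k2"

for name, (a, b) in changed_params(first, second).items():
    print(name, a, b)
```

A parameter that is missing from one of the two URLs shows up with None on that side, so added or dropped parameters are reported as well as changed ones.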

Press Ctrl+F in Chrome's Network panel and search for the psid and ups_key values. Both appear in a link named "acs.youku.com/h5/mtop.youku.play.ups.appinfo.get", called the "JS link" below:

The JS link is a JS file. Opening it and inspecting the response shows JSON wrapped in a JS function call:

Searching the response for the psid and ups_key values then turns up, surprisingly, the play URL from before. Here is a screenshot:

OK! Now everything is clear: get the response of the JS link, extract m3u8_url from it, request that URL, and extract its response, which holds the real video playback addresses. So the next step is to work out how to request the JS link.
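Assuming the body of the JS link's response has already been obtained somehow, the extraction step can be sketched in Python: strip the function-call wrapper, parse the JSON inside, and collect every value stored under the key m3u8_url. The helper names unwrap_jsonp and find_values are made up, and so is the nesting of the sample response (the "data" and "stream" keys); the analysis above only establishes that the JSON sits inside mtopjsonp1() and contains m3u8_url.

```python
import json
import re


def unwrap_jsonp(text):
    """Strip a callback wrapper such as mtopjsonp1(...) and parse the JSON inside."""
    match = re.match(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", text, re.DOTALL)
    if match is None:
        raise ValueError("response is not wrapped in a function call")
    return json.loads(match.group(1))


def find_values(node, key):
    """Yield every value stored under `key` anywhere in nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# Hypothetical response body; the real nesting may differ.
SAMPLE = (
    ' mtopjsonp1({"data": {"stream": [{"m3u8_url": '
    '"http://example.com/playlist/m3u8?psid=aaa&ups_key=k1"}]}})'
)

urls = list(find_values(unwrap_jsonp(SAMPLE), "m3u8_url"))
print(urls)
```

The recursive search is a deliberate choice: it avoids hard-coding a path into a response whose exact structure is not known here.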

Where the problem appears

Hmm...? (Guessing...) The response shows the JSON sitting inside mtopjsonp1(), so what is mtopjsonp1()? Is it a JS function? If so, it should be searchable. I tried, and found no function with exactly that name. Should I try requesting the JS link directly? I looked at that link too: it is extremely long and looks very complex, so that route is set aside for now. With the first two options set aside, what remains is reconstructing the play URL, which after all only needs the psid and ups_key parameters. The reasoning: if those two parameters are produced anywhere else, it should be in some JS function, so search the Network panel for psid and ups_key. And there are traces of the psid parameter:

psid does show up in another JS file, but it doesn't look quite like the same thing, and since I'm not familiar with JS I can't work out how psid is generated. So set this variable aside and search for ups_key. Unfortunately, it doesn't appear in any JS file, so it's back to the earlier question of how to request that very long and complicated JS link. Why is it complicated? See for yourself:

It is a GET request, too, so the data is sent as the link's parameters (first wave of despair):

To be honest, seeing a link like that, I really don't want to deal with it.

But I still have to take the time to figure it all out, or my crawling skills will stop here. (I still want to do this, but when a crawler problem comes up there is no one to ask, which is rather sad; a technical bottleneck can only be worn down with my own time.) So, on with it.

Closing notes

I've hit a technical bottleneck with cracking JS in crawlers. Last time, with a search site, it was again JS-encrypted parameters that stopped me: I had no way to crack the JS encryption and finally failed. Thinking it over, the only way past this bottleneck is to start learning JS now and to practice encrypting data with JS myself. Keep at it, and cracking JS should be just around the corner. Once I've learned JS, I want to try cracking encryption I wrote myself.

Also, this write-up is entirely my own thinking. It may be right, it may be wrong, it may be partly off. If you are unlucky enough to come across this essay, and unlucky enough to read this far, I sincerely hope you will point out the mistakes.

This article will be updated if the crack later succeeds.
