12-18
Tonight my elder sister phoned. She has bought a lot of videos on the "Medical Education Network" and wants me to help her download all of them.
I took a look: there are 24 subjects, each with more than 40 sections. Doing that by hand would be the death of me.
Repetitive work like this is best left to a program! What follows is a live blog of writing it.
Site to crawl: http://www.med66.com/
A few days ago I finished a crawler for qihuiwang. Sizing this one up, the video download crawler poses several new challenges compared with the last one:
(1) Handling login. The previous crawler could fetch pages without logging in; this one has to log in first, which means submitting the login form data with a POST request.
(2) Interpreting JavaScript. The download button on the page carries a handler like onclick="godownload('700914', ...)", and this has to be converted into a URL.
(3) Recording which files have already been downloaded, so that the program does not start from the beginning every time it runs. That would be unreasonable.
(4) Organizing the downloaded files into directories by course.
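Points (2) to (4) can be sketched offline before touching the site. The following is a minimal Python sketch, not the finished crawler. The names parse_godownload, DownloadLog and target_path are hypothetical helpers made up for illustration, and the ".mp4" extension is an assumption. How the godownload arguments map to a real video URL is not known yet and would have to be read from the site's own JavaScript, so the sketch only extracts the arguments.

```python
import json
import re
from pathlib import Path

# Matches a godownload(...) call and captures its raw argument list.
_CALL_RE = re.compile(r"godownload\s*\(([^)]*)\)")
# Matches one single-quoted string argument.
_ARG_RE = re.compile(r"'([^']*)'")


def parse_godownload(onclick):
    """Return the quoted arguments of a godownload(...) call, whitespace-stripped."""
    match = _CALL_RE.search(onclick)
    if match is None:
        return []
    return [arg.strip() for arg in _ARG_RE.findall(match.group(1))]


class DownloadLog:
    """Remembers finished video IDs in a JSON file so a rerun can skip them."""

    def __init__(self, path):
        self.path = Path(path)
        self.done = set()
        if self.path.exists():
            self.done = set(json.loads(self.path.read_text(encoding="utf-8")))

    def is_done(self, video_id):
        return video_id in self.done

    def mark_done(self, video_id):
        # Rewrite the whole file after every download; fine for ~1000 entries.
        self.done.add(video_id)
        self.path.write_text(json.dumps(sorted(self.done)), encoding="utf-8")


def target_path(root, course, section, ext=".mp4"):
    """Build root/<course>/<section><ext>, replacing characters unsafe in file names."""
    def clean(name):
        return re.sub(r'[\\/:*?"<>|]', "_", name).strip()
    return Path(root) / clean(course) / (clean(section) + ext)
```

In the download loop, the idea would be to call is_done() before fetching a video and mark_done() only after the file has been written completely, so an interrupted download is retried on the next run.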
The navigation path through the site is as follows:
Login page --(log in)--> Student course page --(enter course)--> Directory page --(download center)--> Download page --> Section video
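The first hop of this path, the login, might look roughly like the sketch below, using only the Python standard library. Everything specific here is an assumption: the form field names "username" and "password" are placeholders, and the real login URL and field names would have to be taken from the site's login form. The key point is the cookie jar, which lets the logged-in session carry over to the course, directory and download pages.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_opener():
    """Create an opener that keeps cookies, so the login persists across pages."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Build (but do not send) the POST request that submits the login form."""
    # Hypothetical field names; check the real <form> on the login page.
    form = {"username": username, "password": password}
    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(login_url, data=data, method="POST")
```

With these in place, opener.open(build_login_request(...)) would perform the login, and later opener.open(url) calls on the same opener would fetch each page along the path with the session cookie attached.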
All right, let's do it tomorrow.
Medical Education web crawler Program (live)