In the previous post I walked through the www.med66.com login process live. Blog: http://my.oschina.net/hevakelcj/blog/357852
A successful login gets us through the front door of the website. The rest of the job is to go inside and take out what we want.
The following is the page shown after a successful login. We need to get the list of courses from this page.
Open the Firefox debugging tool and see how the above elements are laid out.
With the Firefox debugging tool it is easy to find the elements of the course list: all the courses are listed inside <div class="Ul_con_uc_show">.
Each <div class="Uc_row"> is one course.
Every course has a "Click here to learn from the beginning" link. For the course above it is href="http://elearning.med66.com/cware/video/videoList/videoList.shtm?cwareID=700914"
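As a minimal sketch of how those links could be collected, the snippet below walks the HTML with Python's standard `html.parser` and gathers the `href` values found inside each course row. The class name is copied as it appears in this post (the casing on the live page may differ), the `CourseLinkParser` name is made up for illustration, and the sample HTML is a hand-written stand-in for the real page.

```python
from html.parser import HTMLParser


class CourseLinkParser(HTMLParser):
    """Collect the hrefs found inside each course row <div> (hypothetical helper)."""

    def __init__(self, row_class="Uc_row"):
        super().__init__()
        self.row_class = row_class  # assumption: class name as quoted in the post
        self.depth_in_row = 0       # > 0 while we are inside a course row
        self.links = []             # one list of hrefs per course row

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div":
            if self.depth_in_row:
                self.depth_in_row += 1      # nested <div> inside a row
            elif self.row_class in classes:
                self.depth_in_row = 1       # a new course row starts here
                self.links.append([])
        elif tag == "a" and self.depth_in_row and attrs.get("href"):
            self.links[-1].append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth_in_row:
            self.depth_in_row -= 1


# Hand-written sample that mimics the layout described above.
SAMPLE = """
<div class="Ul_con_uc_show">
  <div class="Uc_row">
    <div><a href="http://elearning.med66.com/cware/video/videoList/videoList.shtm?cwareID=700914">Click here to learn from the beginning</a></div>
  </div>
</div>
"""

parser = CourseLinkParser()
parser.feed(SAMPLE)
course_links = parser.links
print(course_links)
```

Tracking the nesting depth by hand keeps the sketch free of third-party dependencies. A library such as BeautifulSoup would make the same lookup shorter.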
Let's analyze this link. It points to a fixed page, http://elearning.med66.com/cware/video/videoList/videoList.shtm
followed by the parameter cwareID=700914. This "700914" is the ID number of the course.
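Pulling the course ID out of such a link can be sketched with the standard `urllib.parse` module. The function name `course_id_from_url` is made up for illustration; the only thing taken from the site is the `cwareID` parameter name seen in the URL above.

```python
from urllib.parse import urlparse, parse_qs


def course_id_from_url(url):
    """Return the cwareID query parameter as a string, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("cwareID")  # parameter name as it appears in the link
    return values[0] if values else None


link = "http://elearning.med66.com/cware/video/videoList/videoList.shtm?cwareID=700914"
print(course_id_from_url(link))
```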
Go to the download page for this course:
On this "Download Center" page you can download handouts, exercises, videos, and more. I was surprised to find that the address of the Download Center is related to the course ID:
http://elearning.med66.com/cware/download/downloadIndex.shtm?cwareID=700914
This URL is also a fixed page address, followed by the parameter cwareID=700914.
A bold guess: do all the course download pages use cwareID to tell courses apart?
I open the "Course handout Word document download" link on the "Download Center" page and look at its address:
http://elearning.med66.com/cware/download/wordDownload.shtm?wordType=1&cwareID=700914
I then open "Practice Center Word document download" and look at its address:
http://elearning.med66.com/cware/download/wordDownload.shtm?wordType=2&cwareID=700914
The two differ only in the wordType parameter. Applying the same reasoning to the other resources gives the following table:
| Download content | Download link |
|---|---|
| Handout | http://elearning.med66.com/cware/download/wordDownload.shtm?wordType=1&cwareID=700914 |
| Practice | http://elearning.med66.com/cware/download/wordDownload.shtm?wordType=2&cwareID=700914 |
| Mobile video | http://elearning.med66.com/cware/download/videoDownload.shtm?cwareDownType=down12&cwareID=700914 |
| Mobile audio | http://elearning.med66.com/cware/download/videoDownload.shtm?cwareDownType=down13&cwareID=700914 |
| Tablet video | http://elearning.med66.com/cware/download/videoDownload.shtm?cwareDownType=down14&cwareID=700914 |
| Tablet audio | http://elearning.med66.com/cware/download/videoDownload.shtm?cwareDownType=down15&cwareID=700914 |
With this table, once you know a course's cwareID you can work out the addresses for downloading its resources. That's a big breakthrough!
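The table can be turned into a small lookup, sketched below. The page names and parameter values are the ones observed in the URLs above. The dictionary keys (`"handout"`, `"mobile_video"` and so on) and the `download_url` function are names invented for this example.

```python
from urllib.parse import urlencode

BASE = "http://elearning.med66.com/cware/download/"

# Resource kind -> (page, fixed query parameters), taken from the table above.
# The keys on the left are labels made up for this sketch.
RESOURCES = {
    "handout": ("wordDownload.shtm", {"wordType": "1"}),
    "practice": ("wordDownload.shtm", {"wordType": "2"}),
    "mobile_video": ("videoDownload.shtm", {"cwareDownType": "down12"}),
    "mobile_audio": ("videoDownload.shtm", {"cwareDownType": "down13"}),
    "tablet_video": ("videoDownload.shtm", {"cwareDownType": "down14"}),
    "tablet_audio": ("videoDownload.shtm", {"cwareDownType": "down15"}),
}


def download_url(kind, cware_id):
    """Build the download page address for one resource kind of one course."""
    page, params = RESOURCES[kind]
    query = dict(params, cwareID=str(cware_id))  # fixed parameter first, then the course ID
    return BASE + page + "?" + urlencode(query)


for kind in RESOURCES:
    print(kind, download_url(kind, 700914))
```

Whether every course really follows this pattern is still the guess made above; the sketch only reproduces the addresses observed for course 700914.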
Although we can now reach the download page of a resource, that page is not a single download: it lists the course section by section. The following is the "mobile video" download page:
Use the Firefox debugging tool to inspect the layout of the elements:
We want to capture each row of the table and grab the section name and the resource link address. Each row has more than one link; there are 4 to choose from. Let's use the third one (which should be the least busy).
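The row-by-row capture could look roughly like the sketch below. The exact markup of the download page is not reproduced in this post, so the layout here is an assumption: one `<tr>` per section, the section name in the first `<td>`, and the four links as `<a>` tags in the same row. The sample HTML, the example.com addresses, and the names `SectionRowParser` and `pick_link` are all made up for illustration.

```python
from html.parser import HTMLParser


class SectionRowParser(HTMLParser):
    """Collect (section name, [hrefs]) per table row (assumed layout, see above)."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows: (section name, list of hrefs)
        self._cells = None   # text of each <td> in the current row
        self._hrefs = None   # hrefs found in the current row
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._hrefs = [], []
        elif tag == "td" and self._cells is not None:
            self._in_td = True
            self._cells.append("")
        elif tag == "a" and self._hrefs is not None:
            href = dict(attrs).get("href")
            if href:
                self._hrefs.append(href)

    def handle_data(self, data):
        if self._in_td:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._cells is not None:
            # Keep only rows that have a name cell and at least one link.
            if self._cells and self._hrefs:
                self.rows.append((self._cells[0].strip(), self._hrefs))
            self._cells = self._hrefs = None
            self._in_td = False


def pick_link(hrefs, index=2):
    """Return the link at `index` (the third by default), or None if there are fewer."""
    return hrefs[index] if len(hrefs) > index else None


# Hand-written stand-in for one row of the download table.
SAMPLE = """
<table>
  <tr><td>Section 1</td><td>
    <a href="http://example.com/1/a">1</a>
    <a href="http://example.com/1/b">2</a>
    <a href="http://example.com/1/c">3</a>
    <a href="http://example.com/1/d">4</a>
  </td></tr>
</table>
"""

parser = SectionRowParser()
parser.feed(SAMPLE)
sections = [(name, pick_link(hrefs)) for name, hrefs in parser.rows]
print(sections)
```

If the real page nests its links differently, only the tag checks in `handle_starttag` should need adjusting. The "third link" choice stays isolated in `pick_link`.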
All right, that's enough analysis for today; we'll continue tomorrow. Stay tuned!
Medical Education web crawler--Website Walk (live)