Crawling i春秋 (ichunqiu) courses with Python

Source: Internet
Author: User

The lesson I followed crawls the course list with a GET request, but when I tried it myself I found that the site had switched to a POST request. Here is how I did it.

Open the course page

I use Firefox. Press F12 and click the Network tab. There may already be many requests listed, but that does not matter; just clear them. Then click through to the second page of the course list (any other page works too). There are 17 pages in total.

A lot of new requests will appear. Find the request to ajaxCourses and click it to view the details.

In the Headers panel, look at the request URL. This is the URL we want to use.

Copy it into the code:

'https://www.ichunqiu.com/courses/ajaxCourses'

Next, look at the parameters.

Among them is "PageIndex", which clearly holds the page number; you can check that on the fourth page its value is 4. These parameters are the form data to send with the POST request, so put them into the code. Because the PageIndex value has to change for each page, I set it to an empty value for now.

post_data = {
    'Coursetag': '',
    'coursediffcuty': '',
    'Isexp': '',
    'Producerid': '',
    'OrderField': '',
    'orderdirection': '',
    'PageIndex': '',
    'Tagtype': ''
}
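Since only PageIndex changes between requests, one way to keep the form data tidy is a small helper that copies a template and fills in the page number. This is just a minimal sketch: the names FORM_TEMPLATE and build_post_data are made up for illustration, and the keys are the ones listed above.

```python
# Hypothetical helper (not from the original post): build the POST form
# data for one page from a shared template.
FORM_TEMPLATE = {
    'Coursetag': '',
    'coursediffcuty': '',
    'Isexp': '',
    'Producerid': '',
    'OrderField': '',
    'orderdirection': '',
    'PageIndex': '',
    'Tagtype': '',
}


def build_post_data(page_index):
    """Return a fresh copy of the template with PageIndex filled in."""
    post_data = dict(FORM_TEMPLATE)  # copy, so the template stays untouched
    post_data['PageIndex'] = page_index
    return post_data


print(build_post_data(4)['PageIndex'])
```

Copying the template on every call avoids one page's value leaking into the next request.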

Because i春秋 has anti-crawler measures in place, we need to add headers. I won't say much about the headers (I don't want to expose my own configuration, and I'm certainly not going to tell you that I still use XP). With Firefox it is very simple: click "Edit and Resend" and copy them. The next step is to view the response.
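The exact headers depend on your own browser, so the following is only a sketch of what the copied headers might look like once placed in a dictionary. The helper name build_headers and every value are placeholders assumed for illustration; which headers the site actually checks is not stated here, so copy the real ones from Firefox.

```python
def build_headers(user_agent, referer):
    """Assemble a headers dict to pass as headers=... to requests.post."""
    return {
        'User-Agent': user_agent,  # copy the real string from your browser
        'Referer': referer,        # the page that issued the request
        # Commonly sent by AJAX calls; whether the site needs it is an assumption.
        'X-Requested-With': 'XMLHttpRequest',
    }


headers = build_headers(
    user_agent='Mozilla/5.0 (placeholder; replace with your own)',
    referer='https://www.ichunqiu.com/',  # placeholder value
)
print(sorted(headers))
```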

This is obviously JSON data, so first import json. You can see that the courses are in 'result' under 'Course', indexed from 0. We can use len(raw_data['Course']['result']) to see how many courses are on the page, and raw_data['Course']['result'][0]['Coursename'] to get the name of the first course. Here's the code:
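To see the indexing in isolation, here is a small sketch that parses a hand-written payload instead of a live response. The sample data is invented; only its nesting mirrors the description above. The key names follow the spelling used in the full script, and since JSON keys are case-sensitive, check them against what your browser actually shows.

```python
import json

# Invented sample payload: the course names are made up for illustration.
SAMPLE_RESPONSE = '''
{"Course": {"result": [
    {"Coursename": "Example course A"},
    {"Coursename": "Example course B"}
]}}
'''


def course_names(raw_data):
    """Collect the name of every course on one page of results."""
    return [course['Coursename'] for course in raw_data['Course']['result']]


raw_data = json.loads(SAMPLE_RESPONSE)
print(len(raw_data['Course']['result']))  # number of courses on the page
print(course_names(raw_data))
```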

#-------------
# author: glasses
#-------------
import json

import requests


def getData():
    i_url = 'https://www.ichunqiu.com/courses/ajaxCourses'
    headers = {}  # fill in your own headers
    # The course list has 17 pages, so request pages 1 through 17.
    for p_index in range(1, 18):
        post_data = {
            'Coursetag': '',
            'coursediffcuty': '',
            'Isexp': '',
            'Producerid': '',
            'OrderField': '',
            'orderdirection': '',
            'PageIndex': '',
            'Tagtype': ''
        }
        post_data['PageIndex'] = p_index
        r = requests.post(i_url, headers=headers, data=post_data, timeout=10)
        raw_data = json.loads(r.text)
        # Print the name of every course on this page.
        for i in range(len(raw_data['Course']['result'])):
            print(raw_data['Course']['result'][i]['Coursename'])


getData()
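If you want to try the paging logic without hitting the site, one option is to pass the page-fetching step in as a function. This is a sketch under that assumption rather than the author's method: crawl_course_names and fake_fetch are hypothetical names, and the fake data is invented.

```python
def crawl_course_names(fetch_page, first_page=1, last_page=17):
    """Call fetch_page(n) for each page and gather all course names.

    fetch_page is any callable that returns the decoded JSON for page n,
    for example a wrapper around requests.post(...).json().
    """
    names = []
    for page in range(first_page, last_page + 1):
        raw_data = fetch_page(page)
        for course in raw_data['Course']['result']:
            names.append(course['Coursename'])
    return names


def fake_fetch(page):
    """Stand-in for the real request; returns one invented course per page."""
    return {'Course': {'result': [{'Coursename': 'course on page %d' % page}]}}


print(crawl_course_names(fake_fetch, first_page=1, last_page=3))
```

Swapping fake_fetch for a function that performs the real POST keeps the loop unchanged while letting you test it offline first.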

Let's study together on i春秋.

Keep at it! I'm a beginner, but as long as a fall doesn't kill me, I'll keep falling and getting back up!
