[Python learning] A simple crawl of CSDN download resource information

Last Update:2015-07-21 Source: Internet

Author: User


This is an example of crawling CSDN download resource information with Python. It mainly uses urllib2 to fetch, for all of one user's CSDN resources, the resource URL, resource name, download count, score, and other information. I wrote this article because I want to get all the comments on my resources. However, because the comments are loaded dynamically by JS, this article first briefly describes how to parse the HTML pages manually to crawl the information.

Source

# coding=utf-8
import urllib
import time
import re
import os

#**************************************************
# Step 1: loop to get the URL of each list page
# http://download.csdn.net/user/eastmount/uploads/1
# http://download.csdn.net/user/eastmount/uploads/8
#**************************************************
num = 1      # running count of resources (46 in total)
number = 1   # list page index, 1-8

fileurl = open('csdn_url.txt', 'w+')
fileurl.write('****************Get resource URL*************\n\n')

while number < 9:
    url = 'http://download.csdn.net/user/eastmount/uploads/' + str(number)
    fileurl.write('Download list URL: ' + url + '\n')
    print unicode('Download list URL: ' + url, 'utf-8')
    content = urllib.urlopen(url).read()
    open('csdn.html', 'w+').write(content)

    # Get the block that contains the URLs
    # (a regex match would have to count the </div> tags)
    start = content.find(r'<div class="list-container mb-bg">')
    end = content.find(r'<div class="page_nav">')
    cutcontent = content[start:end]
    # print cutcontent

    # Get the URLs inside the block
    # of the form <dt><div></div>
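The slicing step in the listing can be tried offline. The following is a minimal sketch in Python 3 (the article's own listing is Python 2). The helper name cut_between and the sample HTML string are hypothetical, made up for illustration; only the two marker strings come from the listing.

```python
def cut_between(content, start_marker, end_marker):
    """Return the text from start_marker up to (not including) end_marker.

    Returns an empty string if either marker is missing, because a raw
    find() result of -1 would otherwise produce a misleading slice.
    """
    start = content.find(start_marker)
    if start == -1:
        return ''
    # Search for the end marker only after the start marker.
    end = content.find(end_marker, start)
    if end == -1:
        return ''
    return content[start:end]


# Hypothetical sample page, just enough to exercise the helper.
sample = (
    '<html><body>'
    '<div class="list-container mb-bg"><dl><dt>item</dt></dl></div>'
    '<div class="page_nav">1 2 3</div>'
    '</body></html>'
)

block = cut_between(sample,
                    '<div class="list-container mb-bg">',
                    '<div class="page_nav">')
print(block)
```

Checking for -1 is a small addition to the approach in the listing: if CSDN changes either class name, the helper returns nothing rather than a wrong fragment of the page.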
Show Results
The output includes each resource's URL, title, required credits, download count, type, and size.

For example, to crawl the resources of the expert Guo Lin, the list pages are as follows (7 pages in total):
http://download.csdn.net/user/sinyu890807/uploads/1
http://download.csdn.net/user/sinyu890807/uploads/7
After simply changing the URL in the Python source, the script downloads these pages and produces the same kind of results. The original post illustrates both the downloaded page and the results with screenshots.
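Switching users only changes the user name and the page count in the URL, which can be sketched as follows. This is an illustrative Python 3 snippet, not the author's code; the function name list_page_urls is hypothetical, while the URL pattern and the two user names are taken from the article.

```python
def list_page_urls(user, pages):
    """Build the upload-list URLs for one CSDN user, pages 1..pages."""
    base = 'http://download.csdn.net/user/{}/uploads/{}'
    return [base.format(user, n) for n in range(1, pages + 1)]


# Guo Lin's resources span 7 list pages according to the article.
urls = list_page_urls('sinyu890807', 7)
for u in urls:
    print(u)
```

Keeping the page count as a parameter avoids hard-coding the loop bound, which the original listing fixes at 8 pages for the user eastmount.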



HTML Analysis
First, get the URL and title of every resource in each list by parsing the page source.
<dt> <div class="icon"></div> <div class="btns"></div>
The corresponding HTML is shown in a screenshot in the original post.
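Pulling a URL and a title out of each <dt> entry can be sketched with the re module, as below. This is a Python 3 illustration under stated assumptions: the article does not show the full contents of a <dt> entry, so the link markup, the paths, and the titles in the sample are hypothetical, as is the function name extract_links.

```python
import re

# Hypothetical markup: the real <dt> contents are not shown in the article.
sample = '''
<dt>
  <div class="icon"></div>
  <div class="btns"></div>
  <a href="/detail/eastmount/0000001">First resource</a>
</dt>
<dt>
  <div class="icon"></div>
  <div class="btns"></div>
  <a href="/detail/eastmount/0000002">Second resource</a>
</dt>
'''

# re.S lets "." match newlines, since each entry spans several lines.
DT_RE = re.compile(r'<dt>(.*?)</dt>', re.S)
LINK_RE = re.compile(r'<a\s+href="([^"]+)"[^>]*>(.*?)</a>', re.S)


def extract_links(html):
    """Return (href, title) for the first link found in each <dt> entry."""
    results = []
    for dt_body in DT_RE.findall(html):
        match = LINK_RE.search(dt_body)
        if match:
            results.append((match.group(1), match.group(2).strip()))
    return results


links = extract_links(sample)
for href, title in links:
    print(href, title)
```

Regular expressions are enough for a fixed, known page layout like this one, but they break easily when the markup changes, so the patterns would need adjusting against the real page source.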


Then follow each URL to the individual resource page to get what I call the message-box information. In the browser's Inspect Element view, the value to extract sits inside a span, for example <span>0</span>.
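Reading a number out of a span such as <span>0 </span> can be sketched as follows. This Python 3 snippet is an illustration only: the function name span_numbers and the sample string (including its "Downloads" and "Score" labels) are hypothetical, and the pattern assumes the span has no attributes, as in the fragment quoted in the article.

```python
import re

# Tolerates whitespace around the digits, as in "<span>0 </span>".
SPAN_RE = re.compile(r'<span>\s*(\d+)\s*</span>')


def span_numbers(html):
    """Return every integer that appears alone inside a <span>...</span>."""
    return [int(n) for n in SPAN_RE.findall(html)]


# Hypothetical fragment of a resource page.
sample = '<p>Downloads: <span>0 </span> Score: <span>12</span></p>'
print(span_numbers(sample))
```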

The last thing I want to get is the comment information, but the comments are loaded by JS:
<div class="section-list panel panel-default"> <div class="panel-heading">

Finally, I hope this article is helpful to you! Next I plan to analyze how Python can get comments that are loaded by JS. This article gives a simple example of manually parsing a page; you can also use it to list one person's CSDN resources and pick out those with many downloads or high scores. Basic knowledge, for reference only ~
(By: Eastmount, 2015-7-21, 5 p.m. http://blog.csdn.net/eastmount/)


 
Copyright notice: This article is the blogger's original work and may not be reproduced without the blogger's permission.
 

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
