Parsing 115 Network Disk links from web page source code with Python

Source: Internet
Author: User

This article shows how to use Python to extract 115 Network Disk links from the source code of a web page. It is shared here for your reference. The method is as follows:

The input file, 1.txt, is the web page http://bbs.pediy.com/showthread.php? T0000144788133 saved locally as 1.txt.

The code is as follows:

import re

if __name__ == "__main__":
    fp = open("c:\\1.txt")
    # ".*" stops at the end of a line, so each match runs from "http://u" to the line end
    https = re.compile(r"(http://u.*)")
    for url in https.findall(fp.read()):
        print url

Output result:

http://u.115.com/file/f61cb107c8
http://u.115.com/file/f6806f45b8
http://u.115.com/file/f6ec42d4d3
http://u.115.com/file/f6deb05ec4
http://u.115.com/file/f6e51f6838
http://u.115.com/file/f66edaf8d3
http://u.115.com/file/f6d07e07b9
http://u.115.com/file/f6d7f585a8
http://u.115.com/file/f639d8b3cf
http://u.115.com/file/f6dcadbde6
http://u.115.com/file/f6ea3f01c1
http://u.115.com/file/f65b96a06f
http://u.115.com/file/f682da085a
http://u.115.com/file/f6486e698
http://u.115.com/file/f6b7491d9f
http://u.115.com/file/f622b7f9a7
http://u.115.com/file/f64e2424b9
http://u.115.com/file/f6e5132d4d
http://u.115.com/file/f655c10e86
http://u.115.com/file/f6b22e64e6
http://u.115.com/file/f6812126a4
http://u.115.com/file/f6523e625c
http://u.115.com/file/f63e0ccb28
http://u.115.com/file/f611e07b8a#
http://u.115.com/file/f6e047bccc#
http://u.115.com/file/f6d348d781#
http://u.115.com/file/f6ada24153#
http://u.115.com/file/f64f97518b#
http://u.115.com/file/f6f9ba96f8#
http://u.115.com/file/f650e06f38#
http://u.115.com/file/f683ee5b2a#
http://u.115.com/file/f69009bfc2#
http://u.115.com/file/f6ea427646#
http://u.115.com/file/f6acdc6b7f#
http://u.115.com/file/f6c85745d0#
http://u.115.com/file/f61a26cf12#
http://u.115.com/file/f631edf5c6#
http://u.115.com/file/f6b0fa6fb8#
http://u.115.com/file/f6f5fe8962#
http://u.115.com/file/f6bf975e0#
http://u.115.com/file/f6d522784c#
http://u.115.com/file/f6b5ac9991#
http://u.115.com/file/f62e80ced5#
http://u.115.com/file/f6bff09c0c#
http://u.115.com/file/f663fc4a54#
http://u.115.com/file/blpk4pv1
http://u.115.com/file/c4rjotdz
http://u.115.com/file/f6a960aca8#
http://u.115.com/file/efnn38jr
http://u.115.com/file/c4leomjd
http://u.115.com/file/dlpw9s6i
http://u.115.com/file/f6d3cbebe0#
http://u.115.com/file/f6de8062b2#
http://u.115.com/file/ef8og8la
http://u.115.com/file/f6f6391ac6#
http://u.115.com/file/f628d256ae#
http://u.115.com/file/f66a049dc9#
http://u.115.com/file/f62bf1750a#
http://u.115.com/file/f642e47260#
http://u.115.com/file/f693eb7c89#
http://u.115.com/file/f6ed68ba9b#
http://u.115.com/file/f6f099c3f9#
http://u.115.com/file/f61ac19339#
http://u.115.com/file/f6f3c78d2c#
http://u.115.com/file/f6696f6348#
http://u.115.com/file/f6e88eeefb#
http://u.115.com/file/f66471e4eb#
http://u.115.com/file/f672da54ae#
http://u.115.com/file/dnasw0kp#
http://u.115.com/file/dnagnndx#
http://u.115.com/file/clwr2xxg#
http://u.115.com/file/bhbcnnwe#
http://u.115.com/file/aq2rp9ga#
http://u.115.com/file/e601turs#
http://u.115.com/file/dn46qs7x#
http://u.115.com/file/clwonrwg#
http://u.115.com/file/dn43i7jf#
http://u.115.com/file/bhbgrnfz#
http://u.115.com/file/dnsl0kxp#
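
The pattern (http://u.*) runs to the end of each line, so it depends on every link ending its line, and it keeps the trailing # seen in some of the results. The following is a minimal sketch of a stricter variant, written for Python 3 (an assumption; the code above is Python 2). The function name extract_links and the ID pattern [0-9a-z]+ are illustrative choices inferred from the output above, not part of the original method.

```python
import re

# Assumed ID format: lowercase letters and digits, as seen in the output above.
LINK_RE = re.compile(r"http://u\.115\.com/file/[0-9a-z]+")


def extract_links(text):
    """Return the 115 links found in text, in order of first appearance, without duplicates."""
    seen = set()
    links = []
    for url in LINK_RE.findall(text):
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


if __name__ == "__main__":
    # Two links on one line, one of them followed by "#" inside an HTML attribute.
    sample = 'x http://u.115.com/file/f61cb107c8 <a href="http://u.115.com/file/dnasw0kp#">y</a>'
    for url in extract_links(sample):
        print(url)
```

Because the match stops at the first character that is not a lowercase letter or digit, several links on one line are returned separately and the trailing # is dropped.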

I hope this article will help you with Python programming.


Python webpage capture code

import urllib2
# urlopen() needs a full URL, including the http:// scheme.
print urllib2.urlopen("http://www.baidu.com").read()

# urllib2.urlopen() returns a file-like object, so its content is read with read().
# The content can also be saved to a local file.

outfile = open("output.html", "w")
outfile.write(urllib2.urlopen("http://www.baidu.com").read())
outfile.close()

Reference: docs.python.org/library/urllib2.html
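
The urllib2 module exists only in Python 2; in Python 3 its functionality lives in urllib.request. As a rough equivalent of the snippet above, here is a minimal sketch for Python 3. The helper names fetch and save and the 10-second timeout are assumptions made for illustration.

```python
import urllib.request


def fetch(url, timeout=10):
    """Return the raw bytes served at url."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def save(url, path):
    """Fetch url and write the bytes to the local file at path."""
    data = fetch(url)
    # Binary mode, because read() returns bytes in Python 3.
    with open(path, "wb") as outfile:
        outfile.write(data)


# Example usage (requires network access):
# save("http://www.baidu.com", "output.html")
```

The with statements close the response and the output file even if an error occurs, which replaces the explicit close() call.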

The source code printed by Python is different from the source code on the web page.

I have run into this problem too. For example, Google Translate's result is clearly displayed on the web page, but it does not appear in the source code that Python fetches. The reason is that some websites have anti-scraping measures, so the request has to simulate a browser. In other cases, additional requests are needed to handle cookies and account authentication.
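
One common way to simulate a browser is to send a browser-like User-Agent header and to keep cookies between requests. The following is a minimal sketch for Python 3 using only the standard library. The helper names build_request and build_opener and the User-Agent string are illustrative assumptions, and this alone may not be enough for every site.

```python
import http.cookiejar
import urllib.request

# Hypothetical browser-like value; substitute the string your own browser sends.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"


def build_request(url):
    """Return a Request for url that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def build_opener():
    """Return (opener, jar); the opener stores received cookies in jar and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


# Example usage (requires network access):
# opener, jar = build_opener()
# html = opener.open(build_request("http://www.baidu.com"), timeout=10).read()
```

Reusing the same opener for a login request and the following page requests lets the cookies set at login accompany the later requests.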
