Python crawler learning notes for beginners (with comments)

Source: Internet
Author: User

1. Install the programming tool and open the programming interface

First, type jupyter notebook at the command prompt and press Enter (Windows 7). A browser-based editing page opens automatically. Click the New button and choose Python 3; a new window opens with an empty editing cell where you can type code.

2. Crawl the entire page
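
A minimal sketch of this step, under stated assumptions. It uses the standard-library urllib.request instead of the requests package that the later snippets seem to rely on (comments.text), so it runs without extra installs. The function name fetch_page and the sample data: URL are hypothetical; the data: URL stands in for a real page so the example works offline.

```python
from urllib.request import urlopen


def fetch_page(url, encoding="utf-8"):
    """Download a page and return its full HTML as a string."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode(encoding)


# A data: URL stands in for a real page so the sketch runs without network access.
sample_url = "data:text/html,<html><body><h1>Hello</h1></body></html>"
html = fetch_page(sample_url)
print(html)
```

For a real site, pass its http or https address to fetch_page and set encoding to whatever the page declares.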

3. Crawl the text of the specified tag
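
A short sketch of this step, assuming the page is parsed with BeautifulSoup (the bs4 package), which the soup.select calls in the next section suggest. The HTML string and the id values in it are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML standing in for a downloaded page.
html = """
<html><body>
  <h1 id="title">Sample headline</h1>
  <div id="artibody">
    <p> First paragraph. </p>
    <p> Second paragraph. </p>
  </div>
  <a href="/news/1.html">Read more</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Text of the element whose id is "title".
title = soup.select("#title")[0].text
# href attribute of every a tag.
links = [a["href"] for a in soup.select("a")]
# Stripped text of every p element under the element with id "artibody".
paragraphs = [p.text.strip() for p in soup.select("#artibody p")]

print(title)
print(links)
print(paragraphs)
```

select always returns a list, so index into it (or loop over it) before reading .text.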

4. Common Code

a = soup.select('a') # select all a tags; returns a list
l = len(a) # length of the list
aa = a[0].contents # children of the first a tag (a list)
a[0].text.strip() # text of the first a tag with leading and trailing whitespace removed
type(a) # data type of a
dt = datetime.strptime(timestr, '%Y年%m月%d日%H:%M') # convert a string to a datetime; requires "from datetime import datetime"
dt.strftime('%Y-%m-%d') # convert a datetime to a string
soup.select('#div p')[:-1] # all p elements under the element with id "div", except the last one
article = [] # define a list
article.append(a[0].text) # append an element to the list
'@'.join(article) # join the elements of article into one string, separated by '@'
[p.text.strip() for p in soup.select('#artibody p')] # list of the stripped text of each p element under #artibody
newsurl.split('/') # split the string on '/'
newsurl.rstrip('.html') # remove trailing characters; strips any of '.', 'h', 't', 'm', 'l', not the exact suffix
newsurl.lstrip('aaa') # remove leading characters; strips any run of 'a' from the start of the string
re.search('aaa(.+).html', newsurl) # capture part of the string; requires "import re"
jd = json.loads(comments.text.strip('var data=')) # read JSON; requires "import json"
commentURL.format('gda') # replace '{}' in commentURL with 'gda'
def getNewsDetial(newsurl): # define a function with the parameter newsurl
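
Several of the string, date, regex, and JSON snippets above can be combined into one runnable sketch. All input values here (the example.com URLs, the timestamp, the JSON payload) are hypothetical stand-ins for scraped data, and the prefix is removed by slicing rather than str.strip, because strip removes a set of characters, not an exact prefix.

```python
import json
import re
from datetime import datetime

# Hypothetical inputs standing in for values scraped from a page.
timestr = "2017-05-01 09:30"
newsurl = "http://example.com/news/doc-abc123.html"
raw = 'var data={"count": 2, "items": ["x", "y"]}'
commentURL = "http://example.com/comments?id={}"

# String -> datetime -> string.
dt = datetime.strptime(timestr, "%Y-%m-%d %H:%M")
day = dt.strftime("%Y-%m-%d")

# Take the last path segment, then capture the id between 'doc-' and '.html'.
filename = newsurl.split("/")[-1]
match = re.search(r"doc-(.+)\.html", newsurl)
newsid = match.group(1)

# rstrip() strips a set of characters, not a suffix, so it can remove too much.
stripped = "math.html".rstrip(".html")

# Cut the JavaScript prefix off before parsing the JSON payload.
jd = json.loads(raw[len("var data="):])

# Fill the '{}' placeholder in the URL template.
url = commentURL.format(newsid)

print(day, filename, newsid, stripped, jd["count"], url)
```

The stripped value shows why rstrip('.html') is risky on names that end in any of those letters; slicing or a regular expression is safer when an exact suffix has to go.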
