Python crawler learning notes for beginners (with notes)
1. Install the programming tool and open the programming interface
First, type jupyter notebook at the command prompt and press Enter (Windows 7). The editing page opens automatically in your browser. Click the New button and create a new Python 3 notebook; a new window opens where you can type code.
2. Crawl the entire page
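A minimal sketch of this step, assuming the standard library's urllib.request rather than a third-party HTTP client. The helper name fetch_html, the 10-second timeout, and the UTF-8 fallback are illustrative assumptions, not something taken from these notes.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download a page and return its whole body as text.

    fetch_html is a hypothetical helper name used only for illustration.
    """
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Prefer the charset declared in the Content-Type header;
        # fall back to UTF-8 (an assumption) when none is declared.
        charset = response.headers.get_content_charset() or "utf-8"
    # Replace undecodable bytes instead of raising, so a sloppy page still loads.
    return raw.decode(charset, errors="replace")
```

Calling fetch_html with a page address should return the full HTML source as one string, which can then be handed to a parser to pick out individual tags.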
3. Crawl the text of the specified tag
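The common code in the next section works on a soup object, and its soup.select(...) calls suggest BeautifulSoup. To stay free of third-party packages, the sketch below swaps in the standard library's html.parser to show the same idea: collecting the text inside every occurrence of one tag. The names TagTextCollector and tag_texts and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside every occurrence of one tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0   # greater than 0 while inside the wanted tag
        self.texts = []  # one string per matched outermost element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            if self.depth == 1:
                self.texts.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text is kept only while we are inside the wanted tag.
        if self.depth > 0:
            self.texts[-1] += data


def tag_texts(html, tag):
    """Return the stripped text of each <tag> element in html."""
    collector = TagTextCollector(tag)
    collector.feed(html)
    collector.close()
    return [text.strip() for text in collector.texts]


# Made-up sample markup, not taken from a real site.
sample = '<div><a href="/one">First link</a><p>Body <a href="/two"> Second </a></p></div>'
print(tag_texts(sample, "a"))
```

This sketch relies on closing tags to know when an element ends, so markup with unclosed tags would need a more forgiving parser.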
4. Common Code
a = soup.select('a')  # all a tags, as a list
l = len(a)  # length of the list
aa = a[0].contents  # children of the first a tag, as a list
aa[0].strip()  # remove leading and trailing whitespace from the first child
type(a)  # data type of a
dt = datetime.strptime(timestr, '%Y年%m月%d日%H:%M')  # convert a string to a datetime; requires from datetime import datetime, and the format must match timestr
dt.strftime('%Y-%m-%d')  # convert a datetime to a string
soup.select('#div p')[:-1]  # all p elements, except the last one, under the element whose id is div
article = []  # define a list
article.append(a[0].text)  # append an element to the list
'@'.join(article)  # join the elements of article into one string, separated by '@'
[p.text.strip() for p in soup.select('#artibody p')]  # list comprehension: the stripped text of each p
newsurl.split('/')  # split the string on '/'
newsurl.rstrip('.html')  # remove trailing characters; the argument is a set of characters ('.', 'h', 't', 'm', 'l'), not a suffix
newsurl.lstrip('aaa')  # remove the given characters from the start of the string
re.search('aaa(.+).html', newsurl)  # capture part of the string with a group; requires import re
jd = json.loads(comments.text.strip('var data='))  # parse JSON after stripping the 'var data=' prefix; requires import json
commentURL.format('gda')  # replace '{}' in commentURL with 'gda'
def getNewsDetial(newsurl):  # define a function with the parameter newsurl
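Several of the one-liners above (time conversion, rstrip, re.search, json.loads, format, join) can be tried without a live page. The sketch below runs them on made-up sample values. The URLs, the timestamp, and the JSON payload are all assumptions for illustration, and the "var data=" prefix is removed by slicing rather than strip, which is a swapped-in, stricter alternative.

```python
import json
import re
from datetime import datetime

# Every value below is made-up sample data, not output from a real site.
timestr = "2017年06月01日09:30"
newsurl = "http://example.com/news/doc-xyzl.html"
comment_url = "http://example.com/comments?newsid={}"
comments_text = 'var data={"result": {"count": {"total": 42}}}'

# String -> datetime -> string.
dt = datetime.strptime(timestr, "%Y年%m月%d日%H:%M")
day = dt.strftime("%Y-%m-%d")

# rstrip() removes a set of characters, so it also eats the final "l" of the id.
stripped = newsurl.rstrip(".html")

# A capture group extracts exactly the part between "doc-" and ".html".
match = re.search(r"doc-(.+)\.html", newsurl)
news_id = match.group(1) if match else None

# Fill the '{}' placeholder with the captured id.
filled_url = comment_url.format(news_id)

# Drop the JavaScript prefix by slicing, then parse the remaining JSON.
prefix = "var data="
payload = comments_text[len(prefix):] if comments_text.startswith(prefix) else comments_text
jd = json.loads(payload)
total = jd["result"]["count"]["total"]

# Build a list piece by piece, then join it into one string.
article = []
article.append(day)
article.append(news_id)
joined = "@".join(article)

print(day, stripped, news_id, filled_url, total, joined, sep="\n")
```

Comparing stripped with news_id shows why a regular expression, or a plain suffix check with endswith and slicing, is usually safer than rstrip for cutting off a file extension.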