Python Crawler Learning 3: A Simple Crawl of a Fiction Site's Leaderboard Information

Source: Internet
Author: User

Fiction site: https://www.qu.la/paihangbang/

Goal: fetch each novel's name and its link from every leaderboard, then write them to a table that Excel can open.

Press F12 to inspect the page elements and find the class of the information you want to locate.
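To make that concrete, here is a minimal sketch of the kind of markup the crawler targets. The class names match what the code below searches for, but the sample HTML itself is illustrative, not copied from the live page:

# Illustrative sample of the leaderboard markup this crawler assumes;
# the real page at qu.la may differ in detail.
from bs4 import BeautifulSoup

sample = '''
<div class="index_toplist mright mbottom">
  <div class="toptab"><span>Fantasy</span></div>
  <div class="topbooks">
    <ul>
      <li><a href="/book/123/" title="Some Novel">Some Novel</a></li>
    </ul>
  </div>
</div>
'''

soup = BeautifulSoup(sample, 'html.parser')
for cate in soup.find_all('div', class_='index_toplist mright mbottom'):
    print(cate.find('div', class_='toptab').span.text)      # category name
    for li in cate.find('div', class_='topbooks').find_all('li'):
        print(li.a['title'], li.a['href'])                   # novel name and link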

See the code for a detailed explanation.

# coding: utf-8  # required so the source file decodes correctly
import codecs    # prepares for creating the output file below with proper transcoding

__author__ = 'Administrator'

import requests
from bs4 import BeautifulSoup


def get_html(url):
    """
    get_html fetches the HTML page at the given url and returns it.
    Everything could have been written as one big function, but that function
    would look very bloated; writing this public helper independently and
    encapsulating it makes it easy to reuse later.
    """
    try:
        r = requests.get(url, timeout=3000)
        r.raise_for_status()
        r.encoding = 'utf-8'
        return r.text
    except:
        return


def get_content(url):
    """
    get_content extracts the information we need and writes it to the table.
    """
    url_list = []
    html = get_html(url).encode('utf-8')
    soup = BeautifulSoup(html, "html.parser")
    category_list = soup.find_all('div', class_='index_toplist mright mbottom')
    history_list = soup.find_all('div', class_='index_toplist mbottom')
    for cate in category_list:
        name = cate.find('div', class_='toptab').span.text
        name = name.encode('utf-8')
        with codecs.open('novel_list.csv', 'a+', 'utf-8') as f:
            f.write('\nFiction category: {}\n'.format(name))
        book_list = cate.find('div', class_='topbooks').find_all('li')
        for book in book_list:
            link = 'http://www.qu.la/' + book.a['href']
            title = book.a['title'].encode('utf-8')
            url_list.append(link)
            with codecs.open('novel_list.csv', 'a+', 'utf-8') as f:
                f.write('Novel name: {}\tNovel address: {}\n'.format(title, link))
    for cate in history_list:
        name = cate.find('div', class_='toptab').span.string
        with codecs.open('novel_list.csv', 'a+', 'utf-8') as f:
            f.write('\nFiction category: {}\n'.format(name))
        # find the overall leaderboard
        general_list = cate.find(style='display:block;')
        book_list = general_list.find_all('li')
        for book in book_list:
            link = 'http://www.qu.la/' + book.a['href']
            title = book.a['title']
            url_list.append(link)
            with codecs.open('novel_list.csv', 'a+', 'utf-8') as f:
                f.write('Novel name: {:<}\tNovel address: {:<}\n'.format(title, link))
    return url_list


def main():
    # leaderboard address
    base_url = 'http://www.qu.la/paihangbang/'
    # get the links to all the novels on the leaderboards
    url_list = get_content(base_url)


if __name__ == '__main__':
    main()
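Before worrying about the encoding of the output, it helps to confirm the fetch step works at all. A quick hypothetical check of get_html (my addition, not part of the original post):

# get_html's bare except swallows any network or HTTP error and returns None,
# so a caller has to check for a missing result explicitly.
html = get_html('http://www.qu.la/paihangbang/')
if html:
    print(html[:200])        # preview the first 200 characters of the page
else:
    print('fetch failed')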

This post mainly records an encoding problem.

After the first run finished, the table opened in Excel as garbled text.

So the debugging began.

I set breakpoints at each step and inspected each variable, such as name and title: what encoding do these values actually have?
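A quick way to answer that at a breakpoint (or with plain print statements) is to look at each value's type and repr; under Python 2, which this code appears to target, that distinguishes decoded unicode text from raw byte strings. An illustrative check, not from the original post:

# Python 2: unicode objects print as u'...', byte strings show \xe6-style escapes.
print(type(title))    # <type 'unicode'> -> decoded text; <type 'str'> -> bytes
print(repr(title))    # the repr makes the difference visible at a glance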

In this version, I added .encode('utf-8') almost everywhere,

so each value comes out as a plain byte string.

However, even with those calls added, the output was still garbled.

Then I tried writing the same text to a .txt file, and that worked fine.

So the problem was on the Excel side: Excel was not reading the file's encoding correctly, so I switched to codecs.open('novel_list.csv', 'a', 'utf-8').

That finally solved the problem.
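For anyone hitting the same symptom on Python 3, here is a minimal sketch of the equivalent fix (my adaptation, not the author's code). The built-in open takes an encoding argument, and the 'utf-8-sig' codec writes a byte-order mark that helps Excel auto-detect UTF-8 when it opens the CSV:

# Python 3 sketch; title and link are assumed to be str values as in the
# crawler above. 'utf-8-sig' prepends a BOM so Excel recognizes UTF-8.
with open('novel_list.csv', 'a+', encoding='utf-8-sig') as f:
    f.write('Novel name: {}\tNovel address: {}\n'.format(title, link))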

The approach here is only for reference; encoding problems come up all the time, and they are a real headache.
