Python 2.7 crawler: crawl the chapter titles and URLs of the novel Daomu Biji and import them into a MySQL database (2016-12-01)

Source: Internet
Author: User

1. Crawl the page http://www.quanshu.net/book/9/9055/

2. Use the modules urllib (to download the page), re (regular expressions to match each chapter title and its titleurl), urlparse (to join relative links into full URLs), and MySQLdb (to import the results into the MySQL database).

3. Loop over the matched list with a for loop to get each Daomu Biji chapter title and titleurl (a short standalone sketch of this step appears after the full listing below).

4. Handle errors with try/except.

5. Python Code

# -*- coding: utf-8 -*-
import urllib
import re
import urlparse
import MySQLdb

rooturl = 'http://www.quanshu.net/book/9/9055/'

def getlist(url):
    html = urllib.urlopen(url).read()
    # The site is served as gb2312; re-encode to utf-8 before matching
    html = html.decode('gb2312').encode('utf-8')
    reg = r'<li><a href="(.*?)" title=".*?">(.*?)</a></li>'
    return re.findall(reg, html)

try:
    conn = MySQLdb.connect(host='localhost', user='root', passwd='123456',
                           db='local_db', port=3306, charset='utf8')
    with conn:
        cursor = conn.cursor()
        drop_table_sql = 'DROP TABLE IF EXISTS daomubiji'
        cursor.execute(drop_table_sql)
        conn.commit()
        create_table_sql = '''CREATE TABLE daomubiji (
                id INT(11),
                title VARCHAR(255),
                titleurl VARCHAR(255)
            ) ENGINE=InnoDB DEFAULT CHARSET=utf8'''
        cursor.execute(create_table_sql)
        conn.commit()
        urllist = getlist(rooturl)
        # The URL taken from the href attribute is incomplete (only the right-hand
        # part of the full URL), hence the loop variable name righturl
        id = 0
        for righturl in urllist:
            title = righturl[1]
            newurl = righturl[0]
            # urlparse.urljoin joins righturl onto rooturl to build the complete URL
            titleurl = urlparse.urljoin(rooturl, newurl)
            id += 1
            print id, title, titleurl
            cursor.execute("INSERT INTO daomubiji VALUES (%s, %s, %s)",
                           (id, title, titleurl))
        conn.commit()
        print "Inserted " + str(id) + " rows of data"
except MySQLdb.Error:
    print "Connection failed!"
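
To see what steps 2 and 3 do in isolation, here is a minimal sketch that runs the same kind of regular expression and urlparse.urljoin call against a hard-coded HTML fragment. The fragment and the chapter names in it are invented for illustration and are not taken from quanshu.net; only the rooturl and the shape of the regex follow the script above.

# -*- coding: utf-8 -*-
# Standalone sketch of the extract-and-join step (hypothetical sample data)
import re
import urlparse

rooturl = 'http://www.quanshu.net/book/9/9055/'

# Made-up fragment shaped like the chapter list items on the index page
sample_html = '''
<li><a href="1.html" title="Chapter One">Chapter One</a></li>
<li><a href="2.html" title="Chapter Two">Chapter Two</a></li>
'''

reg = r'<li><a href="(.*?)" title=".*?">(.*?)</a></li>'
for href, title in re.findall(reg, sample_html):
    # urljoin turns the relative href into a full chapter URL
    print title, urlparse.urljoin(rooturl, href)

Running it should print two lines ending in .../9055/1.html and .../9055/2.html, which is the same joining behaviour the main loop relies on.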

Execution status of the main script:

6. Query the MySQL database to check whether the import succeeded:

SELECT * FROM daomubiji;
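
If you would rather check from Python instead of the MySQL client, a minimal sketch is shown below; it reuses the placeholder connection settings from the main script, so adjust host, user, password and database to your own environment.

# -*- coding: utf-8 -*-
# Quick check from Python: count the rows and show the first few.
# Connection settings are the same placeholders used in the main script.
import MySQLdb

conn = MySQLdb.connect(host='localhost', user='root', passwd='123456',
                       db='local_db', port=3306, charset='utf8')
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM daomubiji")
print "rows inserted:", cursor.fetchone()[0]
cursor.execute("SELECT id, title, titleurl FROM daomubiji LIMIT 5")
for row in cursor.fetchall():
    print row
conn.close()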


7. Successful execution
