1. Crawl the page http://www.quanshu.net/book/9/9055/
2. Use the modules urllib (to download the page), re (regular expressions to extract the title and titleurl), urlparse (to join relative hrefs into full URLs; see the short urljoin sketch after the code), and MySQLdb (to write into the MySQL database)
3. Use a for loop to traverse the result list and collect each Daomubiji chapter title and titleurl
4. Use try/except for exception handling
5. Python Code
#-*- coding:utf-8 -*-
import urllib
import re
import urlparse
import MySQLdb

rooturl = 'http://www.quanshu.net/book/9/9055/'

def getList(url):
    html = urllib.urlopen(url).read()
    # The site is served as gb2312; re-encode it as utf-8 before matching
    html = html.decode('gb2312').encode('utf-8')
    reg = r'<li><a href="(.*?)" title=".*?">(.*?)</a></li>'
    return re.findall(reg, html)

try:
    conn = MySQLdb.connect(host='localhost', user='root', passwd='123456',
                           db='local_db', port=3306, charset='utf8')
    with conn:
        cursor = conn.cursor()
        drop_table_sql = 'DROP TABLE IF EXISTS daomubiji'
        cursor.execute(drop_table_sql)
        conn.commit()
        create_table_sql = '''CREATE TABLE daomubiji (
            id INT(11),
            title VARCHAR(255),
            titleurl VARCHAR(255)
        ) ENGINE=InnoDB DEFAULT CHARSET=utf8'''
        cursor.execute(create_table_sql)
        conn.commit()
        urllist = getList(rooturl)
        # The url taken from the href attribute is incomplete: it is only the
        # right half of the full URL, so the loop variable is named righturl
        id = 0
        for righturl in urllist:
            title = righturl[1]
            newurl = righturl[0]
            # urlparse.urljoin joins righturl onto rooturl to build the full URL
            titleurl = urlparse.urljoin(rooturl, newurl)
            id += 1
            print id, title, titleurl
            cursor.execute("INSERT INTO daomubiji VALUES (%s, %s, %s)", (id, title, titleurl))
            conn.commit()
        print "Inserted " + str(id) + " rows"
except MySQLdb.Error:
    print "Connection failed!"
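For reference, here is a minimal sketch of what the urlparse.urljoin step does. The relative href '123456.html' is a made-up example, not a real chapter path from the site:

# Minimal sketch of the URL-joining step; '123456.html' is a hypothetical href value
import urlparse

rooturl = 'http://www.quanshu.net/book/9/9055/'
print urlparse.urljoin(rooturl, '123456.html')
# prints: http://www.quanshu.net/book/9/9055/123456.html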
Code Execution Status:
6. Query the MySQL database to confirm the import succeeded (a Python version of this check is sketched below)
SELECT * FROM daomubiji;
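The same check can also be run from Python; a minimal sketch, assuming the same local_db connection parameters as the crawler above:

# Minimal verification sketch; assumes the same credentials and table used above
import MySQLdb

conn = MySQLdb.connect(host='localhost', user='root', passwd='123456',
                       db='local_db', port=3306, charset='utf8')
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM daomubiji")
print "rows imported:", cursor.fetchone()[0]
conn.close()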
7. Successful execution
Python 2.7 crawler: crawl the chapter titles and URLs of the novel Daomubiji and import them into MySQL (2016-12-01)