Build a simple crawler system with Python 3.x and a MySQL database

This is my first article on the blog park. Since I am still a programming rookie and can't yet write those lofty, impressive articles, this post is simply a summary of what I have learned about Python recently.

As is well known, Python is a very beginner-friendly programming language, and people like me take to it quickly! Of course, there is still a lot to learn before you can master it. Enough of the chit-chat; let's get to the topic: how to build a simple crawler system with Python 3.x and a MySQL database (in other words, store the content crawled from web pages in a MySQL database).

The first step is to set up the environment, so here is a brief description of mine. The operating system is Windows 7, the Python version is 3.3, the MySQL version is 5.6, and the MySQL Workbench version is 5.2.
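If you want to confirm that your interpreter matches, a minimal check using only the standard library (so it runs before anything else is installed) looks like this:

# Confirm the interpreter is Python 3.x before running the crawler,
# since the code below uses Python 3 APIs such as urllib.request
import sys
assert sys.version_info[0] == 3, 'this crawler needs Python 3'
print(sys.version)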

After the environment is set up, you can start writing the crawler. The site used in this experiment is the Shuhui comics site, ishuhui.com (I am a manga fan ^_^). First browse the site and look for what we need; here I only need to crawl, for every comic chapter on the site, the chapter's title and the links to its images. Here is the code.

import urllib.request
import re
from mysql.connector import connect

# Method that crawls an entire web page
def open_url(url):
    req = urllib.request.Request(url)
    respond = urllib.request.urlopen(req)
    html = respond.read().decode('utf-8')
    return html

# Crawl the link corresponding to each comic chapter on a page
def get_url_list(url):
    html = open_url(url)
    p = re.compile(r'<a href="(.+)" title=".+<br>.+?">')
    url_list = re.findall(p, html)
    return url_list

# Automatically follow each chapter link, crawl the link of every image in
# the chapter, and insert the results into the MySQL database
def get_img(url):
    # Get the link corresponding to each chapter on this page
    url_list = get_url_list(url)
    # Connect to the MySQL database
    conn = connect(user='root', password='', database='test2')
    # Create a cursor
    c = conn.cursor()
    try:
        # Create a database table (the column length for name was lost in
        # the original post; 100 is a reconstruction)
        c.execute('CREATE TABLE cartoon (name VARCHAR(100), img VARCHAR(100))')
    except Exception:
        # The table already exists
        pass
    # count tracks how many rows of data are inserted for this page
    count = 0
    try:
        for each_url in url_list:
            html = open_url(each_url)
            # The next two patterns were garbled in the original post; these
            # are plausible reconstructions for the image links and the title
            p1 = re.compile(r'<img src="(.+?)"')
            img_list = re.findall(p1, html)
            p2 = re.compile(r'<h1>(.+)</h1>')
            title = re.findall(p2, html)
            for each_img in img_list:
                c.execute('INSERT INTO cartoon VALUES (%s, %s)',
                          [title[0], each_img])
                count += c.rowcount
        print('%d rows of data are inserted' % count)
    finally:
        # Commit the data; this step is very important!
        conn.commit()
        # The next two steps close the cursor and the connection; this is
        # also necessary!
        c.close()
        conn.close()

num = int(input('how many pages to crawl: '))
for i in range(num):
    url = 'http://www.ishuhui.com/page/' + str(i + 1)
    get_img(url)

This is the result in the database:
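To see the rows for yourself, you can query the table directly. Here is a minimal sketch, assuming the same test2 database and cartoon table that the crawler above creates:

# Print a few crawled rows
# (assumes the test2 database and cartoon table from the crawler above)
from mysql.connector import connect

conn = connect(user='root', password='', database='test2')
c = conn.cursor()
c.execute('SELECT name, img FROM cartoon LIMIT 10')
for name, img in c:
    print(name, img)
c.close()
conn.close()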

The code is commented quite thoroughly. One thing to note is that you need to download the mysql-connector-python module, which is the Python module for connecting to MySQL. Install it directly with:

pip install mysql-connector-python --allow-external mysql-connector-python  
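Once it is installed, a quick sanity check, assuming a local MySQL server and the root user, confirms that the connector can actually reach the database:

# Check that mysql-connector-python is installed and can reach MySQL
# (assumes a local server and the root user; adjust credentials to your setup)
from mysql.connector import connect

conn = connect(user='root', password='', host='127.0.0.1')
print(conn.is_connected())  # True if the connection succeeded
conn.close()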

As you can see, writing a crawler in Python and storing the data in a database is very simple; this is part of Python's elegance! Of course, this is only a very simple crawler system, there are still many details to polish, and it is only suitable for small amounts of data. But learning always starts with the simple things.


Original post: http://www.cnblogs.com/tester-zhenghan/p/4887838.html
