Python Crawler in Practice

Source: Internet
Author: User

Requirement

Crawl the list of candidates who passed the first review of Sichuan University's 2018 independent admissions.

Prerequisites

1. Regular expressions.

2. Basic Python syntax, crawling, and database operations.
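A minimal example of the regex technique used in the instance below: capturing the four <td> cells of one table row. The sample HTML here is made up for illustration; the pattern is the one the crawler uses.

```python
import re

# Simplified sample of the HTML table structure the list page uses.
html = """
<tr>
  <td>Zhang San</td>
  <td>Male</td>
  <td>No. 7 Middle School</td>
  <td>Sichuan</td>
</tr>
"""

# [\s\S]*? non-greedily matches any run of characters, including newlines;
# each (.*?) captures the text of one <td> cell.
pattern = (r"<tr>[\s\S]*?<td>(.*?)</td>"
           r"[\s\S]*?<td>(.*?)</td>"
           r"[\s\S]*?<td>(.*?)</td>"
           r"[\s\S]*?<td>(.*?)</td>[\s\S]*?</tr>")

rows = re.findall(pattern, html)
print(rows)  # [('Zhang San', 'Male', 'No. 7 Middle School', 'Sichuan')]
```

re.findall returns one tuple per matched row, with one element per capture group, which is exactly the shape the database layer needs.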

Procedure

1. Crawl a Web page.

2. Parse out the required data.

3. Move on to the next page and repeat steps 1-2 until the last page.

4. Store the parsed data in the database.
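The loop in step 3 can be sketched with the page fetcher and parser passed in as plain functions. `fetch` and `parse` here are hypothetical stand-ins for the real download and regex-parsing functions; the step of 30 rows per page matches the target site's pagination.

```python
def crawl(fetch, parse, step=30, last=90):
    """Collect parsed rows from every page of a paginated list.

    fetch(start) returns one page's HTML; parse(html) returns a list
    of rows. Both are stand-ins for the real crawler's functions.
    """
    rows = []
    start = 0
    while start <= last:
        html = fetch(start)        # step 1: crawl the page
        rows.extend(parse(html))   # step 2: parse out the data
        start += step              # step 3: move on to the next page
    return rows                    # step 4: hand the rows to the DB layer

# Tiny dry run with fake pages instead of real HTTP requests.
pages = crawl(lambda s: "page-%d" % s, lambda h: [h])
print(pages)  # ['page-0', 'page-30', 'page-60', 'page-90']
```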

Instance

Using Python 3.6 and MySQL.

import urllib.request
import urllib.error
import re
import pymysql

def catch_page(url_addr):
    try:
        page_data = urllib.request.urlopen(url_addr).read()
    except urllib.error.URLError as e:
        if hasattr(e, 'code'):
            print('The server could not fulfil the request. Error code:', e.code)
        elif hasattr(e, 'reason'):
            print('Failed to reach the server; check the URL.\nReason:', e.reason)
        page_data = b""  # return an empty page so the caller can continue
    return page_data

def find_all_data(html):
    # [\s\S]*? non-greedily matches any run of characters, including newlines
    pattern = (r"<tr>[\s\S]*?<td>(.*?)</td>"
               r"[\s\S]*?<td>(.*?)</td>"
               r"[\s\S]*?<td>(.*?)</td>"
               r"[\s\S]*?<td>(.*?)</td>[\s\S]*?</tr>")
    user_data = re.findall(pattern, html)
    return user_data

def add_to_mysql(user_datas):
    conn = pymysql.connect(host='127.0.0.1', port=3306, user='root',
                           passwd='root', db='scdx_zzzs_db', charset='utf8')
    cursor = conn.cursor()
    for user_data in user_datas:
        sql = ("INSERT INTO student (name, sex, school, province) "
               "VALUES ('%s', '%s', '%s', '%s');"
               % (user_data[0], user_data[1], user_data[2], user_data[3]))
        try:
            cursor.execute(sql)
            print("√ executed successfully ----->> " + sql)
        except pymysql.MySQLError:
            print("x execution failed ----->> " + sql)
    conn.commit()
    cursor.close()
    conn.close()

user_datas = []
i = 0
while i <= 3300:
    url = ("https://gaokao.chsi.com.cn/zzbm/mdgs/detail.action"
           "?oid=476754340&lx=1&start=%d" % i)
    html = catch_page(url).decode()
    user_datas.extend(find_all_data(html))
    print(i)  # progress: current start offset
    i += 30
add_to_mysql(user_datas)
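One design note: the string-formatted INSERT above breaks if a value contains a quote, and it is open to SQL injection. The usual fix is a parameterized query. This sketch uses the stdlib sqlite3 driver as a stand-in so it runs anywhere; pymysql follows the same DB-API pattern, but with %s placeholders instead of ?.

```python
import sqlite3

user_datas = [('Zhang San', 'Male', 'No. 7 Middle School', 'Sichuan')]

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("CREATE TABLE student (name TEXT, sex TEXT, school TEXT, province TEXT)")
# Parameterized insert: the driver escapes the values itself, so quotes
# in a name cannot break the statement and no SQL can be injected.
cursor.executemany(
    "INSERT INTO student (name, sex, school, province) VALUES (?, ?, ?, ?)",
    user_datas)
conn.commit()
rows = cursor.execute("SELECT * FROM student").fetchall()
print(rows)  # [('Zhang San', 'Male', 'No. 7 Middle School', 'Sichuan')]
conn.close()
```

executemany also replaces the per-row loop, inserting the whole batch in one call.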

After the script runs, the data fetched from the Web pages is stored in the database.

