Today I tried using Python to scrape some data from a forum


Scraping result:

Year: 15Fall
Degree: MS
Offer/Rej: Rej
Major: CS
University: Rutgers
T:
GRE:
GPA: ()
Detailed Major:
BackGround: 本科其他
Abroad_BackGround:

 

The source code is as follows:

# -*- coding: utf-8 -*-
import urllib.request

# Forum listing page to scrape.
url = 'http://www.1point3acres.com/bbs/forum.php?mod=forumdisplay&fid=82&sortid=164&sortid=164&page=2'
req = urllib.request.Request(url)
response = urllib.request.urlopen(req)
the_page = response.read()

# Decode the raw response bytes as GBK text.
con1 = the_page.decode('gbk')

 

# Locate each field by its HTML marker. Every search starts where the
# previous field ended, so the fields are found in page order.
year_start = con1.find('#666">')
year_end = con1.find('</font>', year_start)
degree_start = con1.find('blue">', year_end)
degree_end = con1.find('</font>', degree_start)
offer_start = con1.find('"black"><b>', degree_end)
offer_end = con1.find('</b>]</font>', offer_start)
major_start = con1.find('"#F60"><b>', offer_end)
major_end = con1.find('</b></font>', major_start)
school_start = con1.find('"#00B2E8">', major_end)
school_end = con1.find('</font>', school_start)
t_start = con1.find('T</b>:', school_end)
t_end = con1.find('</font>', t_start)
g_start = con1.find('<b>G</b>', t_end)
g_end = con1.find('</font>', g_start)
major2_start = con1.find('<font color="green">', g_end)
major2_end = con1.find('</font>', major2_start)
gpa_start = con1.find('<font color="darkcyan">', major2_end)
gpa_end = con1.find('</font>', gpa_start)
homebj_start = con1.find('<font color="purple">', gpa_end)
homebj_end = con1.find('</font>', homebj_start)
abroadbj_start = con1.find('<font color="hotpink">', homebj_end)
abroadbj_end = con1.find('</font>', abroadbj_start)

 

# Slice out each value. The offset added to a start index skips past the
# marker itself. For year and gre the offset is one more than the marker
# length, so the first character after the marker is skipped as well.
year = con1[year_start + 7 : year_end]
degree = con1[degree_start + 6 : degree_end]
offer = con1[offer_start + 11 : offer_end]
major = con1[major_start + 10 : major_end]
school = con1[school_start + 10 : school_end]
toefl = con1[t_start + 6 : t_end]
gre = con1[g_start + 9 : g_end]
major2 = con1[major2_start + 20 : major2_end]
gpa = con1[gpa_start + 23 : gpa_end]
homebj = con1[homebj_start + 21 : homebj_end]
abroadbj = con1[abroadbj_start + 22 : abroadbj_end]

 

# Print the extracted record.
print("=======++========")
print("Year: " + year)
print("Degree: " + degree)
print("Offer/Rej: " + offer)
print("Major: " + major)
print("University: " + school)
print("T: " + toefl)
print("GRE: " + gre)
print("GPA: " + gpa)
print("Detailed Major: " + major2)
print("BackGround: " + homebj)
print("Abroad_BackGround: " + abroadbj)

 

# Save the record to a text file, with fields separated by "===".
fields = [year, degree, offer, major, school, toefl, gre, major2, gpa, homebj, abroadbj]
with open("day01.txt", "w", encoding="utf-8") as fout:
    fout.write("===".join(fields))
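
The script above repeats the same find-then-slice pattern for every field, with hand-counted offsets. A minimal sketch of how that pattern might be folded into one helper is shown here. The name extract_between and the sample markup are hypothetical, made up for illustration, and the real forum page may be structured differently.

```python
def extract_between(text, start_marker, end_marker, pos=0):
    """Return (value, next_pos) for the text between two markers.

    The search begins at index pos. If either marker is missing, the
    function returns (None, pos) rather than slicing from index -1.
    """
    start = text.find(start_marker, pos)
    if start == -1:
        return None, pos
    # Skip past the marker using its length instead of a hand-counted offset.
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        return None, pos
    return text[start:end], end + len(end_marker)


# Hypothetical fragment imitating the forum markup (an assumption).
sample = (
    '<font color="#666">15Fall</font>.'
    '<font color="blue">MS</font>.'
    '<font color="#00B2E8">Rutgers</font>'
)

year, pos = extract_between(sample, '#666">', '</font>')
degree, pos = extract_between(sample, 'blue">', '</font>', pos)
school, pos = extract_between(sample, '"#00B2E8">', '</font>', pos)
# This marker does not occur in the sample, so the value is None.
missing, pos_after = extract_between(sample, '<font color="green">', '</font>', pos)

print(year, degree, school, missing)
```

Taking the offset from len(start_marker) removes the need to count characters by hand, and returning None for a missing marker makes an absent field visible instead of producing a wrong slice. For anything beyond a quick experiment, an HTML parser would likely be more robust than string searching.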

 
