Simple data scraping with Python


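This attempt failed. Even with headers and parameters identical to what Firefox shows, it does not work: the request returns a web page, but the page contains no data. I tried disabling JavaScript and then saw a cookie (with JavaScript enabled there is no cookie), but scraping with that cookie did not work either. When I checked again some time later, the cookie's contents had not changed. It is a bit discouraging, but I am posting this anyway as a small task to come back to.

If any expert passes by, I would appreciate your advice.

My scripts for two other sites do work. I will post that code shortly without walking through the process.

Target site: http://www.geomag.bgs.ac.uk/data_service/models_compass/igrf_form.shtml

First, handle the conversion from a date to decimal years. This is a rough conversion that is only accurate to the day.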

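def date2dy(year, month, day):
    """Convert a calendar date to a decimal year, accurate to the day."""
    months = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    oneyear = 365
    # Gregorian leap-year rule: divisible by 4, except centuries not divisible by 400
    if year % 400 == 0 or (year % 4 == 0 and year % 100 != 0):
        months[1] = 29
        oneyear = 366
    # Sum the full months before `month`; the list is 0-indexed, so January is months[0]
    days = sum(months[:month - 1])
    # Add the days already elapsed in the current month
    days = days + day - 1
    # Divide by the actual length of this year rather than a fixed 366
    return year + days / oneyear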
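As a cross-check on the date-to-decimal-year conversion, here is a minimal sketch that lets the standard library's `datetime` module handle month lengths and leap years. The function name `decimal_year` is illustrative and not part of the original script.

```python
from datetime import date


def decimal_year(d):
    """Return the date as a decimal year, accurate to the day."""
    start = date(d.year, 1, 1)        # first day of this year
    end = date(d.year + 1, 1, 1)      # first day of next year
    elapsed = (d - start).days        # whole days since 1 January
    year_length = (end - start).days  # 365 or 366
    return d.year + elapsed / year_length


# 1 December 2016 is day 335 (counting from 0) of a 366-day year
print(round(decimal_year(date(2016, 12, 1)), 2))  # 2016.92
```

Rounded to two decimals, this matches the `'2016.92'` value used for the `date` field in the request.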

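The first small goal is to fetch the data for 2016-12-01.

Open the Firefox developer tools with F12 and switch to the Network tab.

Submit the form to capture the request.

The useful pieces are the request headers, the request URL, and the parameters. Copy them into a program and try it.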
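One way to check that a script sends the same form body as the browser is to encode the fields locally, then compare the string and its byte length with the request body and `Content-Length` shown in the Network tab. The sketch below uses only the standard library. The helper name `encode_form` and the three sample fields are illustrative; substitute the full set of parameters copied from the browser.

```python
from urllib.parse import urlencode


def encode_form(fields):
    """Encode a dict of form fields as application/x-www-form-urlencoded bytes."""
    # urlencode percent-encodes every value, so the result is pure ASCII
    return urlencode(fields).encode("ascii")


# Illustrative subset of the form fields
sample = {"coord": "1", "date": "2016.92", "alt": "150"}
body = encode_form(sample)
print(body.decode("ascii"))  # the body as it would be sent
print(len(body))             # compare with the browser's Content-Length
```

If the length differs from the browser's value, a field is probably missing, extra, or encoded differently.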

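I spent more than a day on this part and could not get the data. (I'm so bad at this.jpg)

Here is the code for now. If any expert passes by, I would appreciate your advice.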

#!/usr/bin/python
import os
import sys

import requests

web_url = 'http://www.geomag.bgs.ac.uk/data_service/models_compass/igrf_form.shtml'
request_url = 'http://www.geomag.bgs.ac.uk/cgi-bin/igrfsynth'
# Save the raw response in the script's directory
filepath = os.path.join(sys.path[0], 'data_igrf_raw_.html')

# Headers copied from Firefox. Content-Length is omitted because
# requests computes it from the encoded body.
headers = {
    'Host': 'www.geomag.bgs.ac.uk',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:53.0) Gecko/20100101 Firefox/53.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
    'Accept-Encoding': 'gzip, deflate',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Referer': web_url,
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1'
}

payload = {
    'name': '-',  # your name and email address
    'coord': '1',  # '1': Geodetic, '2': Geocentric
    'date': '2016.92',  # decimal years
    'alt': '150',  # altitude
    'place': '',
    'degmin': 'y',  # position coordinates: 'y': degrees and minutes, 'n': decimal degrees
    'latd': '60',  # latitude degrees (negative for south)
    'latm': '0',  # latitude minutes
    'lond': '120',  # longitude degrees (negative for west)
    'lonm': '0',  # longitude minutes
    'tot': 'y',  # Total Intensity (F)
    'dec': 'y',  # Declination (D)
    'inc': 'y',  # Inclination (I)
    'hor': 'y',  # Horizontal Intensity (H)
    'nor': 'y',  # North Component (X)
    'eas': 'y',  # East Component (Y)
    'ver': 'y',  # Vertical Component (Z)
    'map': '0',  # include a map of the location: '0': no, '1': yes
    'sv': 'n'  # set to 'y' if Secular Variation (rate of change) is needed
}

r = requests.post(request_url, data=payload, headers=headers)
# The with-block closes the file even if writing fails
with open(filepath, 'w', encoding='utf-8') as fid:
    fid.write(r.text)
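Because a cookie did show up in the browser under some conditions, one thing that could be tried is to have a `requests.Session` load the form page first, so that any cookies the server sets are sent back with the POST. This is only a sketch and is not confirmed to fix the missing data. The helper name `submit_via_session` is made up for illustration.

```python
def submit_via_session(session, form_url, post_url, fields):
    """GET the form page, then POST the fields through the same session.

    `session` is expected to behave like requests.Session: cookies set by
    the GET response are stored and sent again with the POST.
    """
    session.get(form_url)  # lets the session pick up any cookies
    return session.post(post_url, data=fields, headers={"Referer": form_url})


# Possible usage with the variables from the script (requires network access):
#   import requests
#   with requests.Session() as s:
#       r = submit_via_session(s, web_url, request_url, payload)
#       print(r.status_code, s.cookies.get_dict())
```

Printing `s.cookies.get_dict()` afterwards shows whether the server actually set a cookie for a client that does not run JavaScript.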

 
