A Small Python Crawler

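When the iPhone 6 had just come out, China Mobile's official site ran a promotion that gave away an iPhone every time a counter passed another 100,000. I used this little crawler to poll that counter automatically and, in another version, to open a browser when it got close to 100,000 (this version doesn't seem to include that feature). Haha, it no longer works now. I'm moving this post over from my QQ Zone (I was young then and QQ was all I knew), because QQ Zone displays code terribly. The script has a handy line that looks up the local encoding for transcoding output, and it uses urllib2, cookies, and json, so it still has some reference value.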
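The transcoding step mentioned above (decode the server's UTF-8 bytes, then re-encode them in the local encoding before printing) can be sketched roughly as follows. This is a Python 3 sketch rather than the Python 2 of the original script, and the helper name `to_console_bytes` and its parameters are made up for illustration.

```python
import sys


def to_console_bytes(raw, source_encoding="utf-8", target_encoding=None):
    """Decode bytes received from a server and re-encode them for local output.

    Hypothetical helper: the name and parameters are assumptions for this sketch.
    """
    if target_encoding is None:
        # Same lookup the original script performs to find the local encoding.
        target_encoding = sys.getfilesystemencoding()
    text = raw.decode(source_encoding)
    # errors="replace" substitutes "?" for characters the target encoding lacks,
    # instead of raising UnicodeEncodeError.
    return text.encode(target_encoding, errors="replace")


if __name__ == "__main__":
    sample = "中文".encode("utf-8")
    print(to_console_bytes(sample, target_encoding="gbk"))
```

In Python 3 the explicit re-encoding is rarely needed for printing, since `print` accepts `str` directly; the sketch only mirrors what the Python 2 line was doing.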

# -*- coding: utf-8 -*-
# Python 2 script (urllib2 and cookielib were renamed in Python 3)
import sys
import urllib2
import cookielib
import json
import time

url = "http://service.js.10086.cn/act_js/activity_web/1319/home.html"
getNumUrl = "http://service.js.10086.cn/cmp_service/actionDispatcher.do"

# Add cookie support to urllib2
cj = cookielib.LWPCookieJar()
cookie_support = urllib2.HTTPCookieProcessor(cj)
opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)
urllib2.install_opener(opener)

# Headers that imitate a browser; annoyingly, the China Mobile site checks the browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36',
    'Origin': 'http://service.js.10086.cn',
    'Referer': 'http://service.js.10086.cn/act_js/activity_web/1319/index.html?WT.mc_ev=1412RRHB_JL',
}

# Visit the activity page first so the cookie jar picks up the session cookies
req = urllib2.Request(url, headers=headers)
urllib2.urlopen(req)
local_encoding = sys.getfilesystemencoding()  # look up the local encoding

# Print the first page visited
# content = urllib2.urlopen(req).read()
# print content.decode("UTF-8").encode(local_encoding)  # convert encode format

# Request parameters (URL-encoded JSON)
postData = """jsonParam=%5B%7B%22activityCode%22%3A%221319%22%2C%22dynamicURI%22%3A%22doubleAct%22%2C%22dynamicParameter%22%3A%7B%22method%22%3A%22queryUserLoginInfo%22%2C%22actStageCode%22%3A%221319%22%7D%2C%22dynamicDataNodeName%22%3A%22API_queryUserLoginInfo_doubleAct%22%2C%22dynamicPriority%22%3A1%7D%5D"""
request = urllib2.Request(getNumUrl, postData, headers)
response = urllib2.urlopen(request)
text = response.read()
nextNum = int(json.loads(text)["API_queryUserLoginInfo_doubleAct"]["resultObj"]["nextNum"])
max_seen = 0  # highest count seen so far
print "Next round's cap:", nextNum, "x 10,000; counting down through the last hundred"

while True:
    response = urllib2.urlopen(request)
    text = response.read()
    # print text.decode("UTF-8").encode(local_encoding)
    num = int(json.loads(text)["API_queryUserLoginInfo_doubleAct"]["resultObj"]["fnum"])
    if num > max_seen:
        max_seen = num
        print max_seen
    if num > nextNum * 10000:  # stop once the cap has been passed
        break
    elif num > nextNum * 10000 - 100:  # speed up for the last hundred
        time.sleep(0.1)
    elif num > nextNum * 10000 - 500:
        time.sleep(0.5)
    elif num > nextNum * 10000 - 1000:
        time.sleep(1)
    else:
        time.sleep(3)
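Two parts of the script above are easier to follow when pulled out on their own: the URL-encoded `jsonParam` body and the shrinking sleep interval. The following is a Python 3 sketch, not the author's code. The function names `build_post_data` and `poll_delay` are invented for illustration, and the JSON structure is simply what the script's `postData` string decodes to.

```python
import json
from urllib.parse import quote, unquote


def build_post_data(activity_code="1319"):
    """Rebuild the script's form body from a plain Python structure.

    Hypothetical helper; the field values are decoded from the original postData.
    """
    payload = [{
        "activityCode": activity_code,
        "dynamicURI": "doubleAct",
        "dynamicParameter": {
            "method": "queryUserLoginInfo",
            "actStageCode": activity_code,
        },
        "dynamicDataNodeName": "API_queryUserLoginInfo_doubleAct",
        "dynamicPriority": 1,
    }]
    # Compact separators avoid spaces, matching the original encoded string.
    compact = json.dumps(payload, separators=(",", ":"))
    # safe="" percent-encodes every reserved character, including "/" and ":".
    return "jsonParam=" + quote(compact, safe="")


def poll_delay(num, next_num):
    """Return the seconds to sleep before the next poll, or None to stop.

    Mirrors the if/elif chain in the script: next_num is in units of 10,000.
    """
    target = next_num * 10000
    if num > target:
        return None  # cap passed, stop polling
    for window, delay in ((100, 0.1), (500, 0.5), (1000, 1.0)):
        if num > target - window:
            return delay
    return 3.0


if __name__ == "__main__":
    body = build_post_data()
    # Decode the body again to inspect the JSON the server would receive.
    print(json.loads(unquote(body.split("=", 1)[1])))
    print(poll_delay(99950, 10))
```

Building the body from a dictionary makes the request parameters readable and editable, while the original hard-coded string has the advantage of being exactly what the browser sent.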