Scraping data from the Ganji app


#!/usr/bin/env python
# -*- coding:utf-8 -*-
import time

import requests

url = "https://app.ganji.com/datashare/"
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "userid": "C1ED10776D9B6108D8FEFEE4EA53058A",
    "model": "Generic/iphone",
    "customerid": "705",
    "clientagent": "iPhone 6S Plus#414*736#11.0.3",
    "versionid": "8.3.0",
    "os": "ios",
    "net": "wifi",
    "dv": "iPhone 6S Plus",
    "interface": "SearchPostsByJson3",
    "accept-language": "zh-cn",
}


def req(url, headers, data):
    # POST the form body and return the parsed JSON, or None on any failure.
    content = None
    try:
        r = requests.post(url, headers=headers, data=data, timeout=5)
        content = r.json()
    except Exception as e:
        print("requests error: ", e, "requests url: ", url)
    return content


def get_ganji_list_data():
    # Fetch the listing data (the keyword is the URL-encoded post title to search for).
    data = 't=-576747455&&showType=0&showtype=0&jsonArgs={"pageSize":20,"cityScriptIndex":2300,"majorCategoryScriptIndex":7,"queryFilters":[],"categoryId":7,"andKeywords":[{"name":"title","value":"%E5%95%86%E9%93%BA%E5%87%BA%E5%94%AE"}],"customerId":"705","sortKeywords":[{"field":"post_at","sort":"desc"}],"pageIndex":1}'
    return req(url, headers, data)


def get_article_data():
    ganji_data = get_ganji_list_data()
    if ganji_data is not None:
        data_list = ganji_data["posts"]
        print("count: ", ganji_data["total"])
        for data_ in data_list:
            title, d_sign, puid = data_["title"], data_["d_sign"], data_["puid"]
            print(title, d_sign)
            data_article = "d_sign={0}&cityId=176&post_type_for_maidian=5&categoryId=7&spfy=0".format(d_sign)
            # Fetch the full post by puid; the puid has to be sent in the headers.
            # Work on a copy so the shared headers dict is left untouched.
            detail_headers = dict(headers, interface="GetPostByPuid", puid=puid)
            content_data = req(url, detail_headers, data_article)
            # req() returns None when the request fails, so guard before indexing.
            if content_data is not None and content_data["status"] == 0:
                data = content_data["data"]
                end_data = {
                    "price": data["price"]["v"],
                    "price_unit": data["price"]["u"],
                    "title": data["title"],
                    "city": data["city"],
                    "description": data["description"],
                    "district_name": data["district_name"],
                    "street_name": data["street_name"],
                    "latlng": data["latlng"],
                    "id": data["id"],
                }
            # Pause between detail requests.
            time.sleep(2)


if __name__ == "__main__":
    get_article_data()
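The listing request body in the script is a hand-written string, which is easy to break when changing the keyword or page. As a rough sketch, the same body can be assembled from a dict. The helper name `build_search_body` and its parameters are hypothetical, the field values are copied from the script, and it is only an assumption that the server uses `pageIndex` and `pageSize` for paging. The meaning of the `t` value is not stated in the post, so it is kept as a fixed string.

```python
import json
from urllib.parse import quote


def build_search_body(keyword, page_index=1, page_size=20):
    """Assemble the form body for a SearchPostsByJson3 request (hypothetical helper)."""
    json_args = {
        "pageSize": page_size,
        "cityScriptIndex": 2300,
        "majorCategoryScriptIndex": 7,
        "queryFilters": [],
        "categoryId": 7,
        # The script sends the keyword percent-encoded inside the JSON.
        "andKeywords": [{"name": "title", "value": quote(keyword)}],
        "customerId": "705",
        "sortKeywords": [{"field": "post_at", "sort": "desc"}],
        "pageIndex": page_index,
    }
    # Compact separators reproduce the whitespace-free JSON used in the script.
    encoded = json.dumps(json_args, separators=(",", ":"))
    # The prefix, including the double "&", is copied verbatim from the script.
    return "t=-576747455&&showType=0&showtype=0&jsonArgs=" + encoded
```

Calling `build_search_body("商铺出售")` yields the same string that the script hard-codes, and passing a different `page_index` changes only the `pageIndex` field.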

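The app sends a lot of headers. After testing, only the ones kept in the script's headers dict turned out to be necessary. Working that out was exhausting.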
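The post does not say how the header set was narrowed down. One possible approach, shown here only as a sketch, is to drop one header at a time and keep it out whenever the request still succeeds. The function `find_required_headers` and the `is_ok` callback are hypothetical names; `is_ok` stands for whatever check decides that a response is acceptable.

```python
def find_required_headers(headers, is_ok):
    """Greedily remove headers that is_ok() shows to be unnecessary."""
    required = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in required.items() if k != name}
        if is_ok(trial):
            # The request still works without this header, so leave it out.
            required = trial
    return required


def fake_is_ok(candidate):
    # Stand-in for a real request: pretend the server needs exactly these two.
    return {"interface", "customerid"} <= candidate.keys()


captured = {
    "interface": "SearchPostsByJson3",
    "customerid": "705",
    "net": "wifi",
    "os": "ios",
}
print(find_required_headers(captured, fake_is_ok))
```

Against the real endpoint, `is_ok` would wrap an actual request and inspect the response, with a pause between attempts. The sketch makes one request per header, so it stays cheap even for a long captured header list.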
