Python: fetching JSON data from the web, saving it locally, and reading it back


Fetching JSON data from the web and processing it directly always feels too slow. The bottleneck is the fetch itself, not the processing.

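So the idea is to fetch the JSON data during idle time and save it to a local file first, then read and process it from that file. That should be much faster.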

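The main issue here is format conversion: 1. serialize the fetched JSON data and save it to a local file; 2. read the file line by line and deserialize each line back into Python objects.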
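The two steps above can be tried without any network access. The following is only a minimal sketch, not the author's program: the helper names `append_record` and `read_records`, the sample record, and the temporary file path are assumptions made for illustration.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Step 1: serialize one Python object to a single line of JSON and append it."""
    with open(path, "a", encoding="utf8") as f:
        # ensure_ascii=False keeps Chinese characters readable in the file;
        # the trailing newline puts each record on its own line.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(path):
    """Step 2: read the file line by line and deserialize each non-empty line."""
    records = []
    with open(path, "r", encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Hypothetical sample data standing in for a fetched result.
sample = [{"cont": "鄭州", "pos": "ns", "id": 0}, {"cont": "是", "pos": "v", "id": 1}]

# A temporary directory is used so the sketch does not touch real files.
demo_path = os.path.join(tempfile.mkdtemp(), "jsondatafile.json")
append_record(demo_path, sample)
append_record(demo_path, {"note": "second record"})

loaded = read_records(demo_path)
print(loaded[0] == sample)  # True
```

Writing one JSON document per line means new records can be appended without rewriting the file, and each line can be parsed on its own.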

A complete example program follows:

[To protect personal information, some of the code in the program is incomplete.]

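from urllib.request import urlopen
from urllib.parse import quote
import json


# Purpose: fetch JSON data from the ltp-cloud platform and save it to a local file
# Parameter: sent is the sentence to be processed
def getAndSaveJSON(sent):
    # Build the target URL
    url_get_base = "http://api.ltp-cloud.com/analysis/?"
    api_key = 'your api_key value'
    text = quote(sent)  # quote() percent-encodes the Chinese characters for use in the URL
    fmt = 'json'        # renamed from "format" to avoid shadowing the built-in
    pattern = 'all'
    fullurl = (url_get_base + "api_key=" + api_key + "&text=" + text
               + "&format=" + fmt + "&pattern=" + pattern)

    try:
        # Fetch the JSON data; json.loads returns Python lists and dicts
        rawtext = urlopen(fullurl, timeout=15).read()
        jsonStr = json.loads(rawtext.decode('utf8'))

        # Append the LTP result to a text file (the txt/ directory must already exist)
        with open("txt/jsondatafile.json", "a", encoding="utf8") as f:
            # Serialize the first sentence of the first paragraph back to a JSON string
            # before saving, and end it with a newline so each record is on its own line
            f.write(json.dumps(jsonStr[0][0], ensure_ascii=False) + "\n")
    except Exception as err:
        print(err)
        print(fullurl)
        print('error accessing url')


# Call the function to fetch and save the JSON data
getAndSaveJSON("河南省公安廳曾以涉嫌騙取出入境證件,對山西前首富張新明進行通緝,並懸賞500元。")
getAndSaveJSON("鄭州是河南的省會。")

# Read the JSON data back from jsondatafile.json
with open("txt/jsondatafile.json", "r", encoding="utf8") as f:
    for eachLine in f:
        jsonData = json.loads(eachLine)  # deserialize one line into Python objects
        print(jsonData)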


Output:

[{'cont': '河南省', 'parent': 1, 'relate': 'ATT', 'ne': 'B-Ni', 'pos': 'ns', 'arg': [], 'id': 0}, {'cont': '公安廳', 'parent': 5, 'relate': 'SBV', 'ne': 'E-Ni', 'pos': 'n', 'arg': [], 'id': 1}, {'cont': '曾', 'parent': 5, 'relate': 'ADV', 'ne': 'O', 'pos': 'd', 'arg': [], 'id': 2}, {'cont': '以', 'parent': 5, 'relate': 'ADV', 'ne': 'O', 'pos': 'p', 'arg': [], 'id': 3}, {'cont': '涉嫌', 'parent': 3, 'relate': 'POB', 'ne': 'O', 'pos': 'v', 'arg': [], 'id': 4}, {'cont': '騙取', 'parent': -1, 'relate': 'HED', 'ne': 'O', 'pos': 'v', 'arg': [{'type': 'A0', 'end': 1, 'id': 0, 'beg': 0}, {'type': 'ADV', 'end': 2, 'id': 1, 'beg': 2}, {'type': 'MNR', 'end': 4, 'id': 2, 'beg': 3}, {'type': 'A1', 'end': 7, 'id': 3, 'beg': 6}], 'id': 5}, {'cont': '出入境', 'parent': 7, 'relate': 'ATT', 'ne': 'O', 'pos': 'j', 'arg': [], 'id': 6}, {'cont': '證件', 'parent': 5, 'relate': 'VOB', 'ne': 'O', 'pos': 'n', 'arg': [], 'id': 7}, {'cont': ',', 'parent': 5, 'relate': 'WP', 'ne': 'O', 'pos': 'wp', 'arg': [], 'id': 8}, {'cont': '對', 'parent': 14, 'relate': 'ADV', 'ne': 'O', 'pos': 'p', 'arg': [], 'id': 9}, {'cont': '山西', 'parent': 12, 'relate': 'ATT', 'ne': 'S-Ns', 'pos': 'ns', 'arg': [], 'id': 10}, {'cont': '前', 'parent': 12, 'relate': 'ATT', 'ne': 'O', 'pos': 'nd', 'arg': [], 'id': 11}, {'cont': '首富', 'parent': 13, 'relate': 'ATT', 'ne': 'O', 'pos': 'n', 'arg': [], 'id': 12}, {'cont': '張新明', 'parent': 9, 'relate': 'POB', 'ne': 'S-Nh', 'pos': 'nh', 'arg': [], 'id': 13}, {'cont': '進行', 'parent': 5, 'relate': 'COO', 'ne': 'O', 'pos': 'v', 'arg': [{'type': 'A1', 'end': 15, 'id': 0, 'beg': 15}], 'id': 14}, {'cont': '通緝', 'parent': 14, 'relate': 'VOB', 'ne': 'O', 'pos': 'v', 'arg': [], 'id': 15}, {'cont': ',', 'parent': 5, 'relate': 'WP', 'ne': 'O', 'pos': 'wp', 'arg': [], 'id': 16}, {'cont': '並', 'parent': 18, 'relate': 'ADV', 'ne': 'O', 'pos': 'c', 'arg': [], 'id': 17}, {'cont': '懸賞', 'parent': 5, 'relate': 'COO', 'ne': 'O', 'pos': 'v', 'arg': [{'type': 'A1', 'end': 19, 'id': 0, 'beg': 19}], 'id': 18}, {'cont': '500', 'parent': 20, 'relate': 'ATT', 'ne': 'O', 'pos': 'm', 'arg': [], 'id': 19}, {'cont': '元', 'parent': 18, 'relate': 'VOB', 'ne': 'O', 'pos': 'q', 'arg': [], 'id': 20}, {'cont': '。', 'parent': 5, 'relate': 'WP', 'ne': 'O', 'pos': 'wp', 'arg': [], 'id': 21}]
[{'cont': '鄭州', 'parent': 1, 'relate': 'SBV', 'ne': 'S-Ns', 'pos': 'ns', 'arg': [], 'id': 0}, {'cont': '是', 'parent': -1, 'relate': 'HED', 'ne': 'O', 'pos': 'v', 'arg': [{'type': 'A0', 'end': 0, 'id': 0, 'beg': 0}, {'type': 'A1', 'end': 4, 'id': 1, 'beg': 2}], 'id': 1}, {'cont': '河南', 'parent': 4, 'relate': 'ATT', 'ne': 'S-Ns', 'pos': 'ns', 'arg': [], 'id': 2}, {'cont': '的', 'parent': 2, 'relate': 'RAD', 'ne': 'O', 'pos': 'u', 'arg': [], 'id': 3}, {'cont': '省會', 'parent': 1, 'relate': 'VOB', 'ne': 'O', 'pos': 'n', 'arg': [], 'id': 4}, {'cont': '。', 'parent': 1, 'relate': 'WP', 'ne': 'O', 'pos': 'wp', 'arg': [], 'id': 5}]
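Once a record has been read back, it is an ordinary list of dicts and can be processed locally. The sketch below uses the second record from the output above, trimmed to the fields it needs. The helper names are hypothetical. The reading of the fields is an assumption based only on that output, not on the service's documentation: `parent == -1` is taken to mark the root word, and `ne == 'O'` is taken to mean "no entity tag".

```python
# The second record from the output above, without the 'arg' field.
sentence = [
    {"cont": "鄭州", "parent": 1, "relate": "SBV", "ne": "S-Ns", "pos": "ns", "id": 0},
    {"cont": "是", "parent": -1, "relate": "HED", "ne": "O", "pos": "v", "id": 1},
    {"cont": "河南", "parent": 4, "relate": "ATT", "ne": "S-Ns", "pos": "ns", "id": 2},
    {"cont": "的", "parent": 2, "relate": "RAD", "ne": "O", "pos": "u", "id": 3},
    {"cont": "省會", "parent": 1, "relate": "VOB", "ne": "O", "pos": "n", "id": 4},
    {"cont": "。", "parent": 1, "relate": "WP", "ne": "O", "pos": "wp", "id": 5},
]


def head_word(tokens):
    """Return the text of the first token whose parent is -1 (assumed to be the root)."""
    for tok in tokens:
        if tok["parent"] == -1:
            return tok["cont"]
    return None


def tagged_entities(tokens):
    """Return (text, tag) pairs for tokens whose 'ne' value is not 'O'."""
    return [(tok["cont"], tok["ne"]) for tok in tokens if tok["ne"] != "O"]


def dependency_pairs(tokens):
    """Return (word, relation, parent word) triples; the root's parent is shown as 'ROOT'."""
    by_id = {tok["id"]: tok["cont"] for tok in tokens}
    return [(tok["cont"], tok["relate"], by_id.get(tok["parent"], "ROOT"))
            for tok in tokens]


print(head_word(sentence))        # 是
print(tagged_entities(sentence))  # [('鄭州', 'S-Ns'), ('河南', 'S-Ns')]
for triple in dependency_pairs(sentence):
    print(triple)
```

In the read loop of the program above, `jsonData` has the same shape as `sentence`, so these helpers could be applied to each line as it is read.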

