Basic MongoDB operations from Python with pymongo, and exporting to a CSV file

Last updated: 2018-07-28  Source: Internet

Uploader: User


1. Environment

Python: 3.6.1
Python IDE: PyCharm
OS: Windows 7

2. A simple example

import pymongo

# Address and port of the MongoDB service
mongo_url = "127.0.0.1:27017"

# Connect to MongoDB; if no argument is given, the default is "localhost:27017"
client = pymongo.MongoClient(mongo_url)

# Select the database myDatabase
DATABASE = "myDatabase"
db = client[DATABASE]

# Select the collection (table) myDatabase.myCollection
COLLECTION = "myCollection"
db_coll = db[COLLECTION]

# Find the records in myCollection whose date field equals 2017-08-29,
# and sort the result by age in descending order
queryArgs = {'date': '2017-08-29'}
search_res = db_coll.find(queryArgs).sort('age', -1)
for record in search_res:
    print(f"_id = {record['_id']}, name = {record['name']}, age = {record['age']}")

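3. Key points

For read operations that compute statistics over the data, use multiple threads where possible to save time. Keep an eye on the number of threads, though, because they consume a lot of memory.

4. MongoDB data types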

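MongoDB supports many data types, as follows:

String - the most commonly used type for storing data. Strings in MongoDB must be UTF-8.
Integer - stores numeric values. An integer can be 32-bit or 64-bit, depending on the server.
Boolean - stores a boolean (true / false) value.
Double - stores floating-point values.
Min/Max keys - used to compare a value against the lowest and highest BSON elements.
Array - stores an array, a list, or multiple values under one key.
Timestamp - ctimestamp. Handy for recording when a document was modified or added.
Object - used for embedded documents.
Null - stores a Null value.
Symbol - identical to a string, but generally reserved for languages that use a specific symbol type.
Date - stores the current date or time in UNIX time format. You can specify your own date and time by creating a Date object and passing the day, month and year to it.
Object ID - stores the document's ID.
Binary data - stores binary data.
Code - stores JavaScript code in the document.
Regex - stores a regular expression.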
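pymongo encodes ordinary Python values into these BSON types. A Python set has no BSON counterpart, so one option is to convert sets to lists before inserting. The following is a minimal sketch under that assumption; `to_bson_friendly` and the sample document are hypothetical and not part of pymongo.

```python
import datetime
import re


def to_bson_friendly(value):
    """Recursively replace sets with sorted lists (hypothetical helper)."""
    if isinstance(value, (set, frozenset)):
        return sorted(to_bson_friendly(v) for v in value)
    if isinstance(value, dict):
        return {k: to_bson_friendly(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_bson_friendly(v) for v in value]
    return value


# Hypothetical document built from Python values that map onto the types above
doc = {
    "name": "pro audio",                        # string
    "sales": 120,                               # integer
    "avgStar": 4.3,                             # double
    "activity": False,                          # boolean
    "priceLst": [19.9, 24.5],                   # array
    "detail": {"color": "Yellow & Black"},      # object (embedded document)
    "rank2": None,                              # null
    "created": datetime.datetime(2017, 8, 1),   # date
    "raw": b"\x00\x01",                         # binary data
    "pattern": re.compile("audio"),             # regular expression
    "tags": {"tv", "audio"},                    # set: converted to a list below
}

clean = to_bson_friendly(doc)
print(clean["tags"])  # ['audio', 'tv']
```

Sorting gives a stable order but only works when the set's members are mutually comparable; `list(value)` would be the fallback otherwise.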

Unsupported data type: Python's set.

5. Operations on a table (collection)

import pymongo

# Address and port of the MongoDB service
mongo_url = "127.0.0.1:27017"

# Connect to MongoDB; if no argument is given, the default is "localhost:27017"
client = pymongo.MongoClient(mongo_url)

# Select the database
DATABASE = "amazon"
db = client[DATABASE]

# Select the collection (table)
COLLECTION = "galance20170801"
db_coll = db[COLLECTION]

5.1. Finding records: find

(5.1.1) Specifying which fields to return

# Example 1: all fields
# select * from galance20170801
searchRes = db_coll.find()
# or
searchRes = db_coll.find({})

# Example 2: use a dict to specify which fields to show
# select _id,key from galance20170801
queryArgs = {}
projectionFields = {'_id': True, 'key': True}  # specified with a dict
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'_id': 'B01EYCLJ04', 'key': 'pro audio'}

# Example 3: use a dict to specify which fields to drop
queryArgs = {}
projectionFields = {'_id': False, 'key': False}  # specified with a dict
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'activity': False, 'avgStar': 4.3, 'color': 'Yellow & Black', 'date': '2017-08-01'}

# Example 4: use a list to specify which fields to show
# select _id,key,date from galance20170801
queryArgs = {}
projectionFields = ['key', 'date']  # specified with a list; _id is always returned
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'_id': 'B01EYCLJ04', 'date': '2017-08-01', 'key': 'pro audio'}
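The dict form of a projection either includes fields or excludes them; apart from `_id`, MongoDB does not accept a mix of the two in one projection. A small sketch of a helper that enforces this before the query is sent; `build_projection` is a hypothetical name, not a pymongo function.

```python
def build_projection(include=None, exclude=None, with_id=True):
    """Build a value for find(..., projection=...) from field name lists."""
    if include and exclude:
        raise ValueError("give either include or exclude, not both")
    if include:
        projection = {field: True for field in include}
    else:
        projection = {field: False for field in (exclude or [])}
    if not with_id:
        projection["_id"] = False  # _id is the one field that may be mixed in
    # None (instead of an empty dict) asks find() for every field
    return projection or None


print(build_projection(include=["key", "date"]))
print(build_projection(include=["key"], with_id=False))
print(build_projection(exclude=["_id", "key"]))
```

The result can be passed straight to the call shown in the examples, for instance `db_coll.find(queryArgs, projection=build_projection(include=['key', 'date']))`.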

(5.1.2) Specifying query conditions
(5.1.2.1). Comparison: =, !=, >, <, >=, <=

$ne: not equal
$gt: greater than
$lt: less than
$lte: less than or equal
$gte: greater than or equal

# Example 1: equal
# select _id,key,sales,date from galance20170801 where key = 'TV & Video'
queryArgs = {'key': 'TV & Video'}
projectionFields = ['key', 'sales', 'date']
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'_id': '0750699973', 'date': '2017-08-01', 'key': 'TV & Video', 'sales': 0}

# Example 2: not equal
# select _id,key,sales,date from galance20170801 where sales != 0
queryArgs = {'sales': {'$ne': 0}}
projectionFields = ['key', 'sales', 'date']
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'_id': 'B01M996469', 'date': '2017-08-01', 'key': 'stereos', 'sales': 2}

# Example 3: greater than
# where sales > 100
queryArgs = {'sales': {'$gt': 100}}
# Result: {'_id': 'B010OYASRG', 'date': '2017-08-01', 'key': 'Sound Bar', 'sales': 124}

# Example 4: less than
# where sales < 100
queryArgs = {'sales': {'$lt': 100}}
# Result: {'_id': 'B011798DKQ', 'date': '2017-08-01', 'key': 'pro audio', 'sales': 0}

# Example 5: a range
# where sales > 50 and sales < 100
queryArgs = {'sales': {'$gt': 50, '$lt': 100}}
# Result: {'_id': 'B008D2IHES', 'date': '2017-08-01', 'key': 'Sound Bar', 'sales': 66}

# Example 6: a range with inclusive bounds
# where sales >= 50 and sales <= 100
queryArgs = {'sales': {'$gte': 50, '$lte': 100}}
# Result: {'_id': 'B01M6DHW26', 'date': '2017-08-01', 'key': 'radios', 'sales': 50}
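The range examples all follow the same shape, so the filter can be generated. A minimal sketch; `range_query` is a hypothetical helper that only builds the dict and does not talk to the database.

```python
def range_query(field, low=None, high=None, inclusive=False):
    """Build a filter such as {'sales': {'$gt': 50, '$lt': 100}}."""
    lower_op, upper_op = ("$gte", "$lte") if inclusive else ("$gt", "$lt")
    bounds = {}
    if low is not None:
        bounds[lower_op] = low
    if high is not None:
        bounds[upper_op] = high
    if not bounds:
        raise ValueError("at least one bound is required")
    return {field: bounds}


print(range_query("sales", 50, 100))                  # {'sales': {'$gt': 50, '$lt': 100}}
print(range_query("sales", 50, 100, inclusive=True))  # {'sales': {'$gte': 50, '$lte': 100}}
print(range_query("sales", low=100))                  # {'sales': {'$gt': 100}}
```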

(5.1.2.2). and

# Example 1: different fields, conditions combined
# where date = '2017-08-01' and sales = 100
queryArgs = {'date': '2017-08-01', 'sales': 100}
# Result: {'_id': 'B01BW2YYYC', 'date': '2017-08-01', 'key': 'Video', 'sales': 100}

# Example 2: same field, conditions combined
# where sales >= 50 and sales <= 100
# Correct:
queryArgs = {'sales': {'$gte': 50, '$lte': 100}}
# Wrong (the same key appears twice in one dict):
# queryArgs = {'sales': {'$gt': 50}, 'sales': {'$lt': 100}}
# Result: {'_id': 'B01M6DHW26', 'date': '2017-08-01', 'key': 'radios', 'sales': 50}
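The reason the repeated-key form is wrong lies in Python rather than in MongoDB: a dict literal keeps only the last value for a duplicated key, so the first condition never reaches the server. The sketch below shows this, along with two forms that keep both conditions (the variable names are illustrative only).

```python
# The duplicated key is silently collapsed by Python; only the last value survives
wrong = {'sales': {'$gt': 50}, 'sales': {'$lt': 100}}
print(wrong)  # {'sales': {'$lt': 100}}

# Option 1: put both operators in a single sub-dict
merged = {'sales': {'$gt': 50, '$lt': 100}}

# Option 2: spell the conjunction out with $and
explicit_and = {'$and': [{'sales': {'$gt': 50}}, {'sales': {'$lt': 100}}]}
```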

(5.1.2.3). or

# Example 1: different fields, "or" condition
# where date = '2017-08-01' or sales = 100
queryArgs = {'$or': [{'date': '2017-08-01'}, {'sales': 100}]}
# Result: {'_id': 'B01EYCLJ04', 'date': '2017-08-01', 'key': 'pro audio', 'sales': 0}

# Example 2: same field, "or" condition
# where sales = 100 or sales = 120
queryArgs = {'$or': [{'sales': 100}, {'sales': 120}]}
# Result:
#    {'_id': 'B00X5RV14Y', 'date': '2017-08-01', 'key': 'Chargers', 'sales': 120}
#    {'_id': 'B0728GGX6Y', 'date': '2017-08-01', 'key': 'Glasses', 'sales': 100}

(5.1.2.4). in, not in, all

# Example 1: in
# where sales in (100,120)
queryArgs = {'sales': {'$in': [100, 120]}}
# Result:
#    {'_id': 'B00X5RV14Y', 'date': '2017-08-01', 'key': 'Chargers', 'sales': 120}
#    {'_id': 'B0728GGX6Y', 'date': '2017-08-01', 'key': 'Glasses', 'sales': 100}

# Example 2: not in
# where sales not in (100,120)
queryArgs = {'sales': {'$nin': [100, 120]}}
# Result: {'_id': 'B01EYCLJ04', 'date': '2017-08-01', 'key': 'pro audio', 'sales': 0}

# Example 3: match every value in the condition: all
# where sales = 100 and sales = 120
queryArgs = {'sales': {'$all': [100, 120]}}  # every value must match at once
# Result: no results (a single number cannot equal both 100 and 120)

# Example 4: match every value in the condition: all
# where sales = 100 and sales = 100
queryArgs = {'sales': {'$all': [100, 100]}}  # every value must match at once
# Result: {'_id': 'B01BW2YYYC', 'date': '2017-08-01', 'key': 'Video', 'sales': 100}

(5.1.2.5). Whether a field exists

# Example 1: the field does not exist
# where rank2 is null
queryArgs = {'rank2': None}  # matches documents where rank2 is missing or explicitly null
projectionFields = ['key', 'sales', 'date', 'rank2']
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'_id': 'B00ACOKQTY', 'date': '2017-08-01', 'key': '3D TVs', 'sales': 0}

# A command of this kind in the mongo shell:
# db.categoryAsinSrc.find({'isClawered': true, 'avgCost': {$exists: false}})

# Example 2: the field exists
# where rank2 is not null
queryArgs = {'rank2': {'$ne': None}}
projectionFields = ['key', 'sales', 'date', 'rank2']
searchRes = db_coll.find(queryArgs, projection=projectionFields).limit(100)
# Result: {'_id': 'B014I8SX4Y', 'date': '2017-08-01', 'key': '3D TVs', 'rank2': 4.0, 'sales': 0}
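Matching against `None` and testing with `$exists` are not the same thing: the former treats a missing field and an explicit null alike, while the latter looks only at whether the key is present. The sketch below is a plain-Python model of those matching rules on three made-up documents; it does not query MongoDB, and `ids_where` is a hypothetical helper.

```python
docs = [
    {"_id": 1, "rank2": 4.0},   # field present with a value
    {"_id": 2, "rank2": None},  # field present but explicitly null
    {"_id": 3},                 # field missing
]


def ids_where(predicate):
    """Return the _id of every sample document the predicate accepts."""
    return [d["_id"] for d in docs if predicate(d)]


# Models {'rank2': None}: null or missing
null_or_missing = ids_where(lambda d: d.get("rank2") is None)
# Models {'rank2': {'$ne': None}}: present and not null
not_null = ids_where(lambda d: d.get("rank2") is not None)
# Models {'rank2': {'$exists': False}}: missing only
missing = ids_where(lambda d: "rank2" not in d)
# Models {'rank2': {'$exists': True}}: present, even when null
present = ids_where(lambda d: "rank2" in d)

print(null_or_missing, not_null, missing, present)  # [2, 3] [1] [3] [1, 2]
```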

(5.1.2.6). Regex matching: $regex (SQL: like)

# Example 1: the keyword key contains the substring audio
# where key like "%audio%"
queryArgs = {'key': {'$regex': '.*audio.*'}}
# Result: {'_id': 'B01M19FGTZ', 'date': '2017-08-01', 'key': 'pro audio', 'sales': 1}
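Because `$regex` matches anywhere in the string unless the pattern is anchored, the bare pattern `audio` selects the same records as `.*audio.*`. When the substring comes from user input, regex metacharacters such as `.` should be escaped first. A minimal sketch using the standard `re.escape`; `contains_query` is a hypothetical helper, and `$options: 'i'` is MongoDB's case-insensitive flag.

```python
import re


def contains_query(field, substring, ignore_case=False):
    """Build a LIKE '%substring%' style filter with metacharacters escaped."""
    condition = {"$regex": re.escape(substring)}
    if ignore_case:
        condition["$options"] = "i"
    return {field: condition}


query = contains_query("key", "audio")
print(query)  # {'key': {'$regex': 'audio'}}

# Check the escaped pattern locally with Python's own regex engine
pattern = contains_query("version", "4.3")["version"]["$regex"]
print(bool(re.search(pattern, "v4.3 final")))  # True
print(bool(re.search(pattern, "v413 final")))  # False: the dot is literal now
```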

(5.1.2.7). An array must contain given elements: $all

// linkNameLst is an array. Require that the linkNameLst field contains the element 'Electronics, Computers & Office'.
db.getCollection("2018-01-24").find({'linkNameLst': {'$all': ['Electronics, Computers & Office']}})

// linkNameLst is an array. Require that the linkNameLst field contains both 'Wearable Technology' and 'Electronics, Computers & Office'.
db.getCollection("2018-01-24").find({'linkNameLst': {'$all': ['Wearable Technology', 'Electronics, Computers & Office']}})

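(5.1.2.8). Querying by array size

There are two approaches.

The first approach: use $where (very flexible, but somewhat slower).

// priceLst is an array; the goal is to query len(priceLst) < 3
db.getCollection("20180306").find({$where: "this.priceLst.length < 3"})

For details on $where, see the official documentation: http://docs.mongodb.org/manual/reference/operator/query/where/.

The second approach: check whether the element at a given index of the array exists (this is more efficient).

For example, requiring len(priceLst) < 3 means that priceLst[2] does not exist:

// priceLst is an array; the goal is to query len(priceLst) < 3
db.getCollection("20180306").find({'priceLst.2': {$exists: 0}})

For example, requiring len(priceLst) > 3 means that priceLst[3] exists:

// priceLst is an array; the goal is to query len(priceLst) > 3
db.getCollection("20180306").find({'priceLst.3': {$exists: 1}})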
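The index trick translates directly into pymongo filters. The sketch below builds them in Python; both function names are hypothetical. One caveat worth keeping in mind: the "less than" filter also matches documents in which the field is missing altogether, since the indexed element does not exist there either.

```python
def array_len_less_than(field, n):
    """len(field) < n holds when the element at index n - 1 does not exist."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return {f"{field}.{n - 1}": {"$exists": False}}


def array_len_greater_than(field, n):
    """len(field) > n holds when the element at index n exists."""
    if n < 0:
        raise ValueError("n must not be negative")
    return {f"{field}.{n}": {"$exists": True}}


print(array_len_less_than("priceLst", 3))     # {'priceLst.2': {'$exists': False}}
print(array_len_greater_than("priceLst", 3))  # {'priceLst.3': {'$exists': True}}
```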

(5.1.3) Limiting, sorting and counting results
(5.1.3.1). Limiting the number of results: limit

# Example 1: sort by sales in descending order and take the first 100
# select top 100 _id,key,sales from galance20170801 where key = 'speakers' order by sales desc
queryArgs = {'key': 'speakers'}
projectionFields = ['key', 'sales']
searchRes = db_coll.find(queryArgs, projection=projectionFields)
topSearchRes = searchRes.sort('sales', pymongo.DESCENDING).limit(100)

(5.1.3.2). Sorting: sort

# Example 2: sales descending, rank ascending
# select _id,key,sales,rank from galance20170801 where key = 'speakers' order by sales desc,rank
queryArgs = {'key': 'speakers'}
projectionFields = ['key', 'sales', 'rank']
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# sortedSearchRes = searchRes.sort('sales', pymongo.DESCENDING)  # a single field
sortedSearchRes = searchRes.sort([('sales', pymongo.DESCENDING), ('rank', pymongo.ASCENDING)])  # several fields
# Result:
# {'_id': 'B000289DC6', 'key': 'speakers', 'rank': 3.0, 'sales': 120}
# {'_id': 'B001VRJ5D4', 'key': 'speakers', 'rank': 5.0, 'sales': 120}

(5.1.3.3). Counting: count

# Example 3: count the total number of matching records
# select count(*) from galance20170801 where key = 'speakers'
queryArgs = {'key': 'speakers'}
# PyMongo 3.7 and later; older versions used db_coll.find(queryArgs).count(),
# which was removed in PyMongo 4
searchResNum = db_coll.count_documents(queryArgs)
# Result:
# 106

5.2. Adding records

5.2.1. Inserting a single record

# Example 1: specify _id; a duplicate value raises an exception
ID = 'firstRecord'
insertDate = '2017-08-28'
count = 10
insert_record = {'_id': ID, 'endDate': insertDate, 'count': count}
insert_res = db_coll.insert_one(insert_record)
print(f"insert_id={insert_res.inserted_id}: {insert_record}")
# Result: insert_id=firstRecord: {'_id': 'firstRecord', 'endDate': '2017-08-28', 'count': 10}

# Example 2: do not specify _id; it is generated automatically
insertDate = '2017-10-10'
count = 20
insert_record = {'endDate': insertDate, 'count': count}
insert_res = db_coll.insert_one(insert_record)
print(f"insert_id={insert_res.inserted_id}: {insert_record}")
# Result: insert_id=59ad356d51ad3e2314c0d3b2: {'endDate': '2017-10-10', 'count': 20, '_id': ObjectId('59ad356d51ad3e2314c0d3b2')}

5.2.2. Bulk insert

# More efficient, but if _id is specified it must never repeat
# ordered = True: stop at the first error and raise an exception
# ordered = False: continue past errors and raise an exception after the loop ends
insertRecords = [{'i': i, 'date': '2017-10-10'} for i in range(10)]
insertBulk = db_coll.insert_many(insertRecords, ordered=True)
print(f"insert_ids={insertBulk.inserted_ids}")
# Result:
# insert_ids=[ObjectId('59ad3ba851ad3e1104a4de6d'), ObjectId('59ad3ba851ad3e1104a4de6e'),
#             ObjectId('59ad3ba851ad3e1104a4de6f'), ObjectId('59ad3ba851ad3e1104a4de70'),
#             ObjectId('59ad3ba851ad3e1104a4de71'), ObjectId('59ad3ba851ad3e1104a4de72'),
#             ObjectId('59ad3ba851ad3e1104a4de73'), ObjectId('59ad3ba851ad3e1104a4de74'),
#             ObjectId('59ad3ba851ad3e1104a4de75'), ObjectId('59ad3ba851ad3e1104a4de76')]

5.3. Modifying records

# Update the record selected by _id. If no matching record is found, insert it (upsert = True).
# item is assumed to be a dict holding the record to save, including its '_id'
updateFilter = {'_id': item['_id']}
updateRes = db_coll.update_one(filter=updateFilter,
                               update={'$set': dict(item)},
                               upsert=True)
print(f"updateRes = matched:{updateRes.matched_count}, modified = {updateRes.modified_count}")

# Update some fields of the records selected by a filter: i is an existing field, isUpdated is a new one
filterArgs = {'date': '2017-10-10'}
updateArgs = {'$set': {'isUpdated': True, 'i': 100}}
updateRes = db_coll.update_many(filter=filterArgs, update=updateArgs)
print(f"updateRes: matched_count={updateRes.matched_count}, "
      f"modified_count={updateRes.modified_count} upserted_id={updateRes.upserted_id}")
# Result: updateRes: matched_count=8, modified_count=8 upserted_id=None

5.4. Deleting records

5.4.1. Deleting one record

# Example 1: the condition has the same form as in a query
queryArgs = {'endDate': '2017-08-28'}
delRecord = db_coll.delete_one(queryArgs)
print(f"delRecord={delRecord.deleted_count}")
# Result: delRecord=1

5.4.2. Deleting many records

# Example 2: the condition has the same form as in a query
queryArgs = {'i': {'$gt': 5, '$lt': 8}}
# db_coll.delete_many({})  # empties the whole collection
delRecord = db_coll.delete_many(queryArgs)
print(f"delRecord={delRecord.deleted_count}")
# Result: delRecord=2

6. Writing database documents to a CSV file

6.1. Standard code

Reading a CSV file

import csv

with open("phoneCount.csv", "r", newline='') as csvfile:
    reader = csv.reader(csvfile)
    # readlines is not needed here
    for line in reader:
        print(f"line = {line}, typeOfLine = {type(line)}, lenOfLine = {len(line)}")
# The output looks like this:
# line = ['850', 'rest', '43', 'NN'], typeOfLine = <class 'list'>, lenOfLine = 4
# line = ['9865', 'min', '1', 'CD'], typeOfLine = <class 'list'>, lenOfLine = 4

Writing a CSV file

# Standard template for exporting every record of a collection
import pymongo
import csv

# Initialize the database
mongo_url = "127.0.0.1:27017"
DATABASE = "databaseName"
TABLE = "tableName"

client = pymongo.MongoClient(mongo_url)
db_des = client[DATABASE]
db_des_table = db_des[TABLE]

# Write the data to a CSV file.
# When exporting directly from MongoBooster, the columns end up misaligned
# as soon as some records lack a field.
# newline='' prevents blank lines in the output; this is specific to Python 3.
with open(f"{DATABASE}_{TABLE}.csv", "w", newline='') as csvfileWriter:
    writer = csv.writer(csvfileWriter)

    # Write the header row (the field names) first
    fieldList = [
        "_id",
        "itemType",
        "field_1",
        "field_2",
        "field_3",
    ]
    writer.writerow(fieldList)

    allRecordRes = db_des_table.find()
    # Write the data rows
    for record in allRecordRes:
        print(f"record = {record}")
        recordValueLst = []
        for field in fieldList:
            if field not in record:
                recordValueLst.append("None")  # placeholder for a missing field
            else:
                recordValueLst.append(record[field])
        try:
            writer.writerow(recordValueLst)
        except Exception as e:
            print(f"write csv exception. e = {e}")
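The per-field loop that fills in missing values can also be left to the standard library: `csv.DictWriter` takes `restval` for absent keys and `extrasaction="ignore"` for keys outside the header. A minimal sketch, using made-up records and an in-memory buffer in place of the MongoDB cursor and the output file:

```python
import csv
import io

field_list = ["_id", "itemType", "field_1"]

# Made-up records standing in for the documents returned by find()
records = [
    {"_id": "a1", "itemType": "tv", "field_1": 3, "extra": "not exported"},
    {"_id": "a2", "field_1": 5},  # itemType is missing
]

buffer = io.StringIO()  # stands in for open(..., "w", newline='')
writer = csv.DictWriter(
    buffer,
    fieldnames=field_list,
    restval="None",          # written for fields a record lacks
    extrasaction="ignore",   # silently skip fields not in field_list
)
writer.writeheader()
writer.writerows(records)

print(buffer.getvalue())
```

Since the column order comes from `fieldnames`, rows stay aligned even when individual records lack some fields.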

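6.2. Possible problems and solutions

6.2.1. Encoding problems when writing the CSV file

Reference article: Python UnicodeEncodeError: 'gbk' codec can't encode character, solutions: http://www.jb51.net/article/64816.htm

Key point: the encoding of the target file is the culprit behind the error in that title. On Windows, a newly opened file defaults to gbk, so Python uses gbk to encode the text being written. That text, fetched from the network, has already been decoded to Unicode and may contain characters gbk cannot represent, in which case the write fails with the error above. The fix is to change the encoding of the target file.

Solution: the most recommended approach is to specify the encoding when opening the file:

with open(f"{DATABASE}_{TABLE}.csv", "w", newline='', encoding='utf-8') as csvfileWriter:
    writer = csv.writer(csvfileWriter)

For example, when writing a CSV file on Windows the default encoding is 'gbk', while most data fetched from the web is 'utf-8', so some characters may turn out to be incompatible. For instance:

write csv exception. e = 'gbk' codec can't encode character '\xae' in position 80: illegal multibyte sequence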
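The failure can be reproduced without any file or database by encoding a string that contains the character from the error message ('\xae', the registered-trademark sign). A minimal sketch; the sample text is made up.

```python
text = "Sony\xae speaker"  # '\xae' is the character named in the error message

try:
    text.encode("gbk")
    gbk_ok = True
except UnicodeEncodeError as exc:
    gbk_ok = False
    print(f"gbk failed: {exc}")

# utf-8 can represent every Unicode character, so this always succeeds
utf8_bytes = text.encode("utf-8")

# 'utf-8-sig' additionally prefixes a byte order mark
bom_bytes = text.encode("utf-8-sig")
print(bom_bytes[:3])  # b'\xef\xbb\xbf'
```

If the exported file is meant to be opened in Excel, `encoding='utf-8-sig'` may be worth trying, since the byte order mark helps some spreadsheet programs detect UTF-8; that is a suggestion beyond what the original article covers.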

6.2.2. Blank rows when writing the CSV file (every other row is blank), Python 2.x

For a description and the solution, see: https://www.cnblogs.com/China-YangGISboy/p/7339118.html
