Basic MongoDB operations from Python with PyMongo, plus exporting to a CSV file
1. Environment
Python: 3.6.1
Python IDE: PyCharm
System: Windows 7
2. Simple example
import pymongo

# MongoDB service address and port number
mongo_url = "127.0.0.1:27017"
# Connect to MongoDB; if no argument is given, the default is "localhost:27017"
client = pymongo.MongoClient(mongo_url)
# Connect to the database myDatabase
DATABASE = "myDatabase"
db = client[DATABASE]
# Connect to the collection (table): myDatabase.mycollection
COLLECTION = "mycollection"
db_coll = db[COLLECTION]
# Find the records in mycollection whose date field equals 2017-08-29 and sort them by age, largest first
queryArgs = {'date': '2017-08-29'}
search_res = db_coll.find(queryArgs).sort('age', -1)
for record in search_res:
    print(f"_id = {record['_id']}, name = {record['name']}, age = {record['age']}")
3. The main points
For read operations and data statistics, use multithreading where possible to save time, but watch the number of threads: too many of them will eat a lot of memory.
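One way to keep the thread count under control is a bounded thread pool. The following is only a minimal sketch: load_day is a hypothetical stand-in for a real read such as db_coll.find({'date': day}), and it returns fake rows so that the example runs without a MongoDB server.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a per-day read such as
# db_coll.find({'date': day}); it returns fake rows so the sketch
# runs without a MongoDB server.
def load_day(day):
    return [{'date': day, 'sales': n} for n in range(3)]

days = ['2017-08-01', '2017-08-02', '2017-08-03', '2017-08-04']

# max_workers caps how many threads exist at once, which bounds memory use.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(load_day, days))  # results keep the order of days

total_rows = sum(len(rows) for rows in results)
print(total_rows)  # 12
```

Raising max_workers speeds things up only until the server or the network becomes the bottleneck, so it is worth measuring rather than guessing.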
4. MongoDB data types
MongoDB supports a number of data types, as follows:
String: the most commonly used type for storing data. Strings in MongoDB must be valid UTF-8.
Integer: stores numeric values. An integer can be 32-bit or 64-bit, depending on the server.
Boolean: stores a Boolean (true/false) value.
Double: stores floating-point values.
Min/Max keys: used to compare a value against the lowest and highest BSON elements.
Array: stores an array, a list, or multiple values under one key.
Timestamp: convenient for recording when a document was modified or added.
Object: used for embedded documents.
Null: stores null values.
Symbol: essentially the same as a string, but generally reserved for languages that use a specific symbol type.
Date: stores the current date or time in Unix time format. You can specify your own date and time by creating a Date object and passing the day, month, and year to it.
Object ID: stores the document's ID.
Binary data: stores binary data.
Code: stores JavaScript code in a document.
Regular expression: stores regular expressions.
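None of the types above corresponds to a Python set. A minimal sketch of one workaround, assuming it is acceptable to store the set as a sorted list; the tags field and the helper name sets_to_lists are made up for this illustration:

```python
# A document that holds a Python set, which has no BSON counterpart.
record = {'_id': 'B01eyclj04', 'tags': {'audio', 'pro'}}

def sets_to_lists(doc):
    """Return a copy of doc with every top-level set replaced by a sorted list."""
    return {key: sorted(value) if isinstance(value, (set, frozenset)) else value
            for key, value in doc.items()}

storable = sets_to_lists(record)
print(storable)  # {'_id': 'B01eyclj04', 'tags': ['audio', 'pro']}
```

The helper only looks at top-level values; nested documents would need a recursive version.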
Unsupported data type: the Python set.
5. Operations on a collection (table)
import pymongo

# MongoDB service address and port number
mongo_url = "127.0.0.1:27017"
# Connect to MongoDB; if no argument is given, the default is "localhost:27017"
client = pymongo.MongoClient(mongo_url)
# Connect to the database
DATABASE = "Amazon"
db = client[DATABASE]
# Connect to the collection (table) in that database
COLLECTION = "galance20170801"
db_coll = db[COLLECTION]
5.1. Query records: find
(5.1.1) Specify which fields are returned
# Example one: all fields
# select * from galance20170801
searchRes = db_coll.find()
# Example two: use a dictionary to specify which fields to show
# select _id, key from galance20170801
queryArgs = {}
projectionFields = {'_id': True, 'key': True}  # specified with a dictionary
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'_id': 'B01eyclj04', 'key': 'Pro Audio'}
# Example three: use a dictionary to specify which fields to leave out
queryArgs = {}
projectionFields = {'_id': False, 'key': False}  # specified with a dictionary
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'activity': False, 'avgstar': 4.3, 'color': 'Yellow & Black', 'date': '2017-08-01'}
# Example four: use a list to specify which fields to show
# select _id, key, date from galance20170801
queryArgs = {}
projectionFields = ['key', 'date']  # specified with a list; the result always includes the _id field
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'_id': 'b01eyclj04', 'date': '2017-08-01', 'key': 'Pro Audio'}
(5.1.2) Specify query conditions
(5.1.2.1). Comparison: =, !=, >, <, >=, <=
$ne: not equal
$gt: greater than
$lt: less than
$lte: less than or equal
$gte: greater than or equal
# Example one: equal
# select _id, key, sales, date from galance20170801 where key = 'TV & Video'
queryArgs = {'key': 'TV & Video'}
projectionFields = ['key', 'sales', 'date']
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'_id': '0750699973', 'date': '2017-08-01', 'key': 'TV & Video', 'sales': 0}
# Example two: not equal
# select _id, key, sales, date from galance20170801 where sales != 0
queryArgs = {'sales': {'$ne': 0}}
projectionFields = ['key', 'sales', 'date']
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'_id': 'b01m996469', 'date': '2017-08-01', 'key': 'Stereos', 'sales': 2}
# Example three: greater than
# where sales > ...
queryArgs = {'sales': {'$gt': ...}}
# Result: {'_id': 'B010oyasrg', 'date': '2017-08-01', 'key': 'Sound Bar', 'sales': 124}
# Example four: less than
# where sales < ...
queryArgs = {'sales': {'$lt': ...}}
# Result: {'_id': 'B011798dkq', 'date': '2017-08-01', 'key': 'Pro Audio', 'sales': 0}
# Example five: specify a range
# where sales > ... and sales < ...
queryArgs = {'sales': {'$gt': ..., '$lt': ...}}
# Result: {'_id': 'B008d2ihes', 'date': '2017-08-01', 'key': 'Sound Bar', 'sales': 66}
# Example six: specify a range, bounds included
# where sales >= ... and sales <= ...
queryArgs = {'sales': {'$gte': ..., '$lte': ...}}
# Result: {'_id': 'b01m6dhw26', 'date': '2017-08-01', 'key': 'Radios', 'sales': 50}
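To make the operator dictionaries less abstract, here is a rough pure-Python model of how one field value is checked against such a condition. It is only a sketch for plain numbers, not how MongoDB is implemented (missing fields, arrays, and cross-type comparison are ignored); the helper name field_matches and the thresholds 50 and 100 are made up for the illustration.

```python
import operator

# Python equivalents of the comparison operators listed above.
OPERATORS = {
    '$ne': operator.ne, '$gt': operator.gt, '$gte': operator.ge,
    '$lt': operator.lt, '$lte': operator.le,
}

def field_matches(value, condition):
    """Check one field value against a plain value or an operator dictionary."""
    if not isinstance(condition, dict):
        return value == condition  # a plain value means equality
    # Every operator in the dictionary must hold: they are ANDed together.
    return all(OPERATORS[op](value, arg) for op, arg in condition.items())

# Hypothetical thresholds, chosen only for this illustration.
in_range = {'$gte': 50, '$lte': 100}
print(field_matches(50, in_range))    # True
print(field_matches(124, in_range))   # False
print(field_matches(2, {'$ne': 0}))   # True
```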
(5.1.2.2). and
# Example one: different fields, conditions combined with and
# where date = '2017-08-01' and sales = 100
queryArgs = {'date': '2017-08-01', 'sales': 100}
# Result: {'_id': 'b01bw2yyyc', 'date': '2017-08-01', 'key': 'Video', 'sales': 100}
# Example two: same field, conditions combined with and
# where sales >= ... and sales <= ...
# Correct: queryArgs = {'sales': {'$gte': ..., '$lte': ...}}
# Wrong:   queryArgs = {'sales': {'$gt': ...}, 'sales': {'$lt': ...}}
# Result: {'_id': 'b01m6dhw26', 'date': '2017-08-01', 'key': 'Radios', 'sales': 50}
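The reason the second form is wrong has nothing to do with MongoDB: a Python dictionary literal cannot hold the same key twice, so the last 'sales' entry silently replaces the first before the query is ever sent. A small sketch, with bounds of 50 and 100 assumed purely for illustration:

```python
# Hypothetical bounds, used only for this illustration.
wrong = {'sales': {'$gte': 50}, 'sales': {'$lte': 100}}  # duplicate key
right = {'sales': {'$gte': 50, '$lte': 100}}             # one key, two operators

print(wrong)  # {'sales': {'$lte': 100}}
print(right)  # {'sales': {'$gte': 50, '$lte': 100}}
```

With the wrong form the lower bound is lost, so the query would return every record with sales <= 100.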
(5.1.2.3). or
# Example one: different fields, or condition
# where date = '2017-08-01' or sales = 100
queryArgs = {'$or': [{'date': '2017-08-01'}, {'sales': 100}]}
# Result: {'_id': 'b01eyclj04', 'date': '2017-08-01', 'key': 'Pro Audio', 'sales': 0}
# Example two: same field, or condition
# where sales = ... or sales = ...
queryArgs = {'$or': [{'sales': ...}, {'sales': ...}]}
# Results:
# {'_id': 'b00x5rv14y', 'date': '2017-08-01', 'key': 'Chargers', 'sales': ...}
# {'_id': 'B0728ggx6y', 'date': '2017-08-01', 'key': 'Glasses', 'sales': 100}
(5.1.2.4). in, not in, all
# Example one: in
# where sales in (100, 120)
queryArgs = {'sales': {'$in': [100, 120]}}
# Results:
# {'_id': 'B00x5rv14y', 'date': '2017-08-01', 'key': 'Chargers', 'sales': ...}
# {'_id': 'b0728ggx6y', 'date': '2017-08-01', 'key': 'Glasses', 'sales': 100}
# Example two: not in
# where sales not in (100, 120)
queryArgs = {'sales': {'$nin': [100, 120]}}
# Result: {'_id': 'b01eyclj04', 'date': '2017-08-01', 'key': 'Pro Audio', 'sales': 0}
# Example three: match every value in the list with $all
# where sales = 100 and sales = 120
queryArgs = {'sales': {'$all': [100, 120]}}  # both must be satisfied
# Result: no results
# Example four: match every value in the list with $all
# where sales = 100 and sales = 100
queryArgs = {'sales': {'$all': [100, 100]}}  # both must be satisfied
# Result: {'_id': 'b01bw2yyyc', 'date': '2017-08-01', 'key': 'Video', 'sales': 100}
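Why $all with [100, 120] finds nothing on a numeric field, while [100, 100] does, is easier to see in a small pure-Python model. This is a sketch of the matching rules as I understand them, not MongoDB's implementation; the helper names are made up, and missing fields are not modeled.

```python
def as_list(value):
    """Treat a scalar field as a one-element list, the way array matching does."""
    return value if isinstance(value, list) else [value]

def matches_in(value, candidates):
    """$in: at least one candidate equals the field (or one of its elements)."""
    return any(item in candidates for item in as_list(value))

def matches_nin(value, candidates):
    """$nin: no candidate equals the field (or any of its elements)."""
    return not matches_in(value, candidates)

def matches_all(value, required):
    """$all: every required value must be present in the field."""
    return all(item in as_list(value) for item in required)

print(matches_in(100, [100, 120]))             # True
print(matches_all(100, [100, 120]))            # False: a single number cannot be both
print(matches_all(100, [100, 100]))            # True
print(matches_all([100, 120, 7], [100, 120]))  # True: $all is meant for array fields
```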
(5.1.2.5). Whether a field exists
# Example one: the field does not exist
# where rank2 is null
queryArgs = {'rank2': None}
projectionFields = ['key', 'sales', 'date', 'rank2']
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'_id': 'B00acokqty', 'date': '2017-08-01', 'key': '3D TVs', 'sales': 0}
# MongoDB shell command
db.categoryAsinSrc.find({'isclawered': true, 'avgcost': {$exists: false}})
# Example two: the field exists
# where rank2 is not null
queryArgs = {'rank2': {'$ne': None}}
projectionFields = ['key', 'sales', 'date', 'rank2']
searchRes = db_coll.find(queryArgs, projection=projectionFields).limit(...)
# Result: {'_id': 'B014i8sx4y', 'date': '2017-08-01', 'key': '3D TVs', 'rank2': 4.0, 'sales': 0}
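Note that a None filter and $exists are not quite the same thing: a filter of None matches records where the field is missing and also records where it is stored as null, while $exists: false matches only the missing case. A rough pure-Python model over three made-up documents:

```python
docs = [
    {'_id': 1, 'rank2': 4.0},   # field present with a value
    {'_id': 2, 'rank2': None},  # field present but null
    {'_id': 3},                 # field missing
]

# Rough model of {'rank2': None}: the field is null or absent.
null_or_missing = [d['_id'] for d in docs if d.get('rank2') is None]
# Rough model of {'rank2': {'$exists': False}}: the field is absent.
missing_only = [d['_id'] for d in docs if 'rank2' not in d]
# Rough model of {'rank2': {'$ne': None}}: present and not null.
present_not_null = [d['_id'] for d in docs if d.get('rank2') is not None]

print(null_or_missing, missing_only, present_not_null)  # [2, 3] [3] [1]
```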
(5.1.2.6). Regular expression matching: $regex (SQL: like)
# Example one: the key field contains the substring audio
# where key like '%audio%'
queryArgs = {'key': {'$regex': '.*audio.*'}}
# Result: {'_id': 'B01m19fgtz', 'date': '2017-08-01', 'key': 'Pro Audio', 'sales': 1}
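Two details of such patterns can be checked locally with Python's re module, used here only as a stand-in for the server-side regex engine (the two agree for simple patterns like this one): an unanchored pattern already matches anywhere in the string, and matching is case-sensitive by default. The sample keys are made up.

```python
import re

keys = ['Pro Audio', 'audio cables', 'TV & Video']

# An unanchored pattern already matches anywhere in the string,
# so the surrounding '.*' is optional.
wrapped = [k for k in keys if re.search('.*audio.*', k)]
bare = [k for k in keys if re.search('audio', k)]

# Matching is case-sensitive unless asked otherwise.
ignore_case = [k for k in keys if re.search('audio', k, re.IGNORECASE)]

print(wrapped)      # ['audio cables']
print(bare)         # ['audio cables']
print(ignore_case)  # ['Pro Audio', 'audio cables']
```

In a MongoDB query the case-insensitive variant is written by adding '$options': 'i' next to '$regex'.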
(5.1.2.7). An array must contain elements: $all
# Query records where linkNameLst is an array that must contain the element 'Electronics, Computers & Office'.
db.getCollection("2018-01-24").find({'linkNameLst': {'$all': ['Electronics, Computers & Office']}})
# Query records where linkNameLst is an array that must contain both 'Wearable Technology' and 'Electronics, Computers & Office'.
db.getCollection("2018-01-24").find({'linkNameLst': {'$all': ['Wearable Technology', 'Electronics, Computers & Office']}})
(5.1.2.8). Query by array size
There are two approaches.
The first approach: use $where (very flexible, but slower)
# priceLst is an array; the goal is to query len(priceLst) < 3
db.getCollection("20180306").find({$where: "this.priceLst.length < 3"})
For $where, see the official documentation: http://docs.mongodb.org/manual/reference/operator/query/where/.
The second approach: check whether the element at a given index of the array exists (more efficient).
For example, requiring len(priceLst) < 3 means that priceLst[2] does not exist:
# priceLst is an array; the goal is to query len(priceLst) < 3
db.getCollection("20180306").find({'priceLst.2': {$exists: 0}})
For example, requiring len(priceLst) > 3 means that priceLst[3] exists:
# priceLst is an array; the goal is to query len(priceLst) > 3
db.getCollection("20180306").find({'priceLst.3': {$exists: 1}})
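The index trick rests on a simple equivalence that can be verified in plain Python. In this sketch, has_index is a made-up helper that plays the role of the $exists check on 'priceLst.<index>', and the sample lists are invented.

```python
def has_index(values, index):
    """True when values[index] exists, mirroring {'priceLst.<index>': {'$exists': 1}}."""
    return index < len(values)

samples = [[], [9.9], [9.9, 12.5], [9.9, 12.5, 15.0], [1, 2, 3, 4]]

# len < 3 is the same as "index 2 does not exist";
# len > 3 is the same as "index 3 exists".
shorter_than_3 = [s for s in samples if not has_index(s, 2)]
longer_than_3 = [s for s in samples if has_index(s, 3)]

print(len(shorter_than_3), len(longer_than_3))  # 3 1
```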
(5.1.3) Limit, sort, and count query results
(5.1.3.1). Limit the number of results: limit
# Example one: sort by sales in descending order and take the first 100
# select top 100 _id, key, sales from galance20170801 where key = 'Speakers' order by sales desc
queryArgs = {'key': 'Speakers'}
projectionFields = ['key', 'sales']
searchRes = db_coll.find(queryArgs, projection=projectionFields)
topSearchRes = searchRes.sort('sales', pymongo.DESCENDING).limit(100)
(5.1.3.2). Sorting: sort
# Example two: sales descending, then rank ascending
# select _id, key, sales, rank from galance20170801 where key = 'Speakers' order by sales desc, rank
queryArgs = {'key': 'Speakers'}
projectionFields = ['key', 'sales', 'rank']
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# sortedSearchRes = searchRes.sort('sales', pymongo.DESCENDING)  # single field
sortedSearchRes = searchRes.sort([('sales', pymongo.DESCENDING), ('rank', pymongo.ASCENDING)])  # multiple fields
# Results:
# {'_id': 'b000289dc6', 'key': 'Speakers', 'rank': 3.0, 'sales': ...}
# {'_id': 'B001vrj5d4', 'key': 'Speakers', 'rank': 5.0, 'sales': 120}
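The effect of a two-field sort (sales descending, then rank ascending) can be reproduced on a plain Python list, which helps when checking what order to expect. The rows below are invented for the illustration.

```python
rows = [
    {'_id': 'a', 'sales': 120, 'rank': 5.0},
    {'_id': 'b', 'sales': 120, 'rank': 3.0},
    {'_id': 'c', 'sales': 300, 'rank': 9.0},
]

# Same ordering as sorting by sales descending, then rank ascending:
# negate sales so that larger values come first, break ties by rank.
ordered = sorted(rows, key=lambda r: (-r['sales'], r['rank']))

print([r['_id'] for r in ordered])  # ['c', 'b', 'a']
```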
(5.1.3.3). Counting: count
# Example three: count the total number of matching records
# select count(*) from galance20170801 where key = 'Speakers'
queryArgs = {'key': 'Speakers'}
searchResNum = db_coll.find(queryArgs).count()
# Result:
# 106
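A note for newer PyMongo versions: Cursor.count() was deprecated in PyMongo 3.7 and removed in 4.0, and Collection.count_documents(filter) is the replacement. The sketch below shows the call shape only; FakeCollection is a hypothetical in-memory stand-in (equality filters only) so that the example runs without a server, and with a real connection you would pass db_coll instead.

```python
def count_matching(collection, query):
    """Count matching documents with Collection.count_documents (PyMongo 3.7+)."""
    return collection.count_documents(query)

class FakeCollection:
    """Hypothetical in-memory stand-in so the sketch runs without a server."""
    def __init__(self, docs):
        self.docs = docs

    def count_documents(self, query):
        # Supports plain equality filters only.
        return sum(all(d.get(k) == v for k, v in query.items())
                   for d in self.docs)

coll = FakeCollection([{'key': 'Speakers'}, {'key': 'Radios'}, {'key': 'Speakers'}])
print(count_matching(coll, {'key': 'Speakers'}))  # 2
```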
5.2. Add records
5.2.1. Single insert
# Example one: specify _id; a duplicate _id raises an exception
ID = 'firstRecord'
insertDate = '2017-08-28'
count = 10
insert_record = {'_id': ID, 'endDate': insertDate, 'count': count}
insert_res = db_coll.insert_one(insert_record)
print(f"insert_id={insert_res.inserted_id}: {insert_record}")
# Result: insert_id=firstRecord: {'_id': 'firstRecord', 'endDate': '2017-08-28', 'count': 10}
# Example two: do not specify _id; it is generated automatically
insertDate = '2017-10-10'
count = ...
insert_record = {'endDate': insertDate, 'count': count}
insert_res = db_coll.insert_one(insert_record)
print(f"insert_id={insert_res.inserted_id}: {insert_record}")
# Result: insert_id=59ad356d51ad3e2314c0d3b2: {'endDate': '2017-10-10', 'count': ..., '_id': ObjectId('59ad356d51ad3e2314c0d3b2')}
5.2.2. Bulk insert
# More efficient, but if you specify _id values they must not repeat
# ordered=True: stop at the first error and raise an exception
# ordered=False: continue past errors and raise an exception after the loop ends
insertRecords = [{'i': i, 'date': '2017-10-10'} for i in range(10)]
insertBulk = db_coll.insert_many(insertRecords, ordered=True)
print(f"insert_ids={insertBulk.inserted_ids}")
# Result: insert_ids=[ObjectId('59ad3ba851ad3e1104a4de6d'), ObjectId('59ad3ba851ad3e1104a4de6e'), ObjectId('59ad3ba851ad3e1104a4de6f'), ObjectId('59ad3ba851ad3e1104a4de70'), ObjectId('59ad3ba851ad3e1104a4de71'), ObjectId('59ad3ba851ad3e1104a4de72'), ObjectId('59ad3ba851ad3e1104a4de73'), ObjectId('59ad3ba851ad3e1104a4de74'), ObjectId('59ad3ba851ad3e1104a4de75'), ObjectId('59ad3ba851ad3e1104a4de76')]
5.3. Modify records
# Update the record that matches the filter on _id; insert it if no record is found (upsert=True)
# item is the document to save; it must contain an _id
updateFilter = {'_id': item['_id']}
updateRes = db_coll.update_one(filter=updateFilter,
                               update={'$set': dict(item)},
                               upsert=True)
print(f"updateRes = matched:{updateRes.matched_count}, modified = {updateRes.modified_count}")
# Update some fields of every record that matches the filter: i is an existing field, isUpdated is a new field
filterArgs = {'date': '2017-10-10'}
updateArgs = {'$set': {'isUpdated': True, 'i': ...}}
updateRes = db_coll.update_many(filter=filterArgs, update=updateArgs)
print(f"updateRes: matched_count={updateRes.matched_count}, "
      f"modified_count={updateRes.modified_count} modified_ids={updateRes.upserted_id}")
# Result: updateRes: matched_count=8, modified_count=8 modified_ids=None
5.4. Delete records
5.4.1. Delete one record
# Example one: the filter is written the same way as for a query
queryArgs = {'endDate': '2017-08-28'}
delRecord = db_coll.delete_one(queryArgs)
print(f"delRecord={delRecord.deleted_count}")
# Result: delRecord=1
5.4.2. Bulk delete
# Example two: the filter is written the same way as for a query
queryArgs = {'i': {'$gt': 5, '$lt': 8}}
# db_coll.delete_many({})  # empties the collection
delRecord = db_coll.delete_many(queryArgs)
print(f"delRecord={delRecord.deleted_count}")
# Result: delRecord=2
6. Write database documents to a CSV file
6.1. Standard code
Read a CSV file
import csv

with open("Phonecount.csv", "r") as csvfile:
    reader = csv.reader(csvfile)
    # There is no need for readlines here
    for line in reader:
        print(f"# line = {line}, typeOfLine = {type(line)}, lenOfLine = {len(line)}")
# The output is as follows:
# line = ['850', 'rest', 'n', 'NN'], typeOfLine = <class 'list'>, lenOfLine = 4
# line = ['9865', 'min', '1', 'CD'], typeOfLine = <class 'list'>, lenOfLine = 4
Write a CSV file
# Standard template for exporting all records of a collection
import pymongo
import csv

# Initialize the database
mongo_url = "127.0.0.1:27017"
DATABASE = "databaseName"
TABLE = "tableName"
client = pymongo.MongoClient(mongo_url)
db_des = client[DATABASE]
db_des_table = db_des[TABLE]

# Write the data to a CSV file
# If you export directly from MongoBooster and some records lack some fields, the columns in the result end up shifted
# newline='' prevents blank rows from appearing between data rows; it is specific to Python 3
with open(f"{DATABASE}_{TABLE}.csv", "w", newline='') as csvfileWriter:
    writer = csv.writer(csvfileWriter)
    # Write the first row: the field names
    fieldList = ["_id", "itemType", "field_1", "field_2", "field_3"]
    writer.writerow(fieldList)
    allRecordRes = db_des_table.find()
    # Write the data rows
    for record in allRecordRes:
        print(f"record = {record}")
        recordValueLst = []
        for field in fieldList:
            if field not in record:
                recordValueLst.append("None")
            else:
                recordValueLst.append(record[field])
        try:
            writer.writerow(recordValueLst)
        except Exception as e:
            print(f"write csv exception. e = {e}")
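The per-field loop that fills in "None" for missing fields can also be left to the standard library: csv.DictWriter has a restval argument for absent keys and an extrasaction argument for keys that are not in the column list. A minimal sketch as an alternative to the template, writing to an in-memory buffer instead of a file and using made-up records:

```python
import csv
import io

field_list = ["_id", "itemType", "field_1"]
records = [
    {"_id": 1, "itemType": "tv", "field_1": "a"},
    {"_id": 2, "field_1": "b", "extra": "ignored"},  # itemType is missing
]

# newline='' keeps the writer's own line endings untouched.
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=field_list,
                        restval="None",          # value for missing fields
                        extrasaction="ignore")   # drop fields not in field_list
writer.writeheader()
writer.writerows(records)

csv_text = buffer.getvalue()
print(csv_text)
```

Because every row is built from the same field list, a missing field can no longer shift the columns.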
6.2. Possible problems and solutions
6.2.1. Encoding problem when writing the CSV file
Reference article: solving Python UnicodeEncodeError: 'gbk' codec can't encode character: http://www.jb51.net/article/64816.htm
Key point: the encoding of the target file is the culprit behind the error named in that title. On a Chinese-locale Windows system a newly opened file defaults to GBK, so Python uses GBK to encode the text that came from the network. That text has already been decoded to Unicode and can contain characters that GBK cannot represent, so the write fails. The solution is to change the encoding of the target file.
Solution:
# It is strongly recommended to specify the encoding when you open a file:
with open(f"{DATABASE}_{TABLE}.csv", "w", newline='', encoding='utf-8') as csvfileWriter:
# When a CSV file is written on Windows as above, the default encoding is 'gbk', while most data fetched from the internet is 'utf-8', so some characters may be incompatible. For example: write csv exception. e = 'gbk' codec can't encode character '\xae' in position 80: illegal multibyte sequence
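The error can be reproduced without any file or database, because it comes from the codec alone. In this sketch the sample text is invented; only the '\xae' character is taken from the error message above.

```python
text = "Bose\xae speaker"  # '\xae' is the registered-trademark sign

try:
    text.encode("gbk")
    gbk_ok = True
except UnicodeEncodeError:
    gbk_ok = False  # GBK has no byte sequence for this character

utf8_bytes = text.encode("utf-8")  # UTF-8 can encode any Unicode text

print(gbk_ok)      # False
print(utf8_bytes)  # b'Bose\xc2\xae speaker'
```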
6.2.2. Blank lines in the written CSV file (an empty row between data rows)
Python 2.x version
For a description and the solution, see: https://www.cnblogs.com/China-YangGISboy/p/7339118.html
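For Python 3, the blank rows come from newline translation: csv.writer ends each row with '\r\n', and a text file opened without newline='' on Windows turns the '\n' into '\r\n' again. The sketch below makes this visible on any platform by passing newline='\r\n' explicitly to imitate the Windows default; the helper name written_bytes is made up.

```python
import csv
import os
import tempfile

def written_bytes(newline):
    """Write one CSV row with the given newline setting and return the raw bytes."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline=newline) as f:
            csv.writer(f).writerow(["a", "b"])
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

# newline='\r\n' reproduces what the Windows default does to each '\n'.
translated = written_bytes("\r\n")
untouched = written_bytes("")

print(translated)  # b'a,b\r\r\n'  (the extra '\r' shows up as a blank row)
print(untouched)   # b'a,b\r\n'
```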