Basic MongoDB operations in Python with PyMongo, plus CSV file export

Source: Internet
Author: User
Tags bulk insert mongoclient mongodb
1. Environment

Python: 3.6.1
Python IDE: PyCharm
System: Windows 7

2. Simple example

import pymongo

# MongoDB service address and port number
MONGO_URL = "127.0.0.1:27017"

# Connect to MongoDB; if no argument is given, the default is "localhost:27017"
client = pymongo.MongoClient(MONGO_URL)

# Connect to the database myDatabase
DATABASE = "myDatabase"
db = client[DATABASE]

# Connect to the collection (table): myDatabase.myCollection
COLLECTION = "myCollection"
db_coll = db[COLLECTION]

# Find the records in myCollection whose date field equals 2017-08-29 and sort the results by age in descending order
queryArgs = {'date': '2017-08-29'}
search_res = db_coll.find(queryArgs).sort('age', -1)
for record in search_res:
    print(f"_id = {record['_id']}, name = {record['name']}, age = {record['age']}")
3. Main points

For read operations and data statistics, use multithreading where possible to save time, but watch the number of threads: too many will consume a lot of memory.

4. MongoDB data types

MongoDB supports a number of data types, as follows:

String - the most commonly used type for storing data. Strings in MongoDB must be UTF-8.
Integer - for storing numeric values. An integer can be 32-bit or 64-bit, depending on the server.
Boolean - for storing a Boolean value (true/false).
Double - for storing double-precision floating-point values.
Min/Max keys - used to compare a value against the lowest and highest BSON elements.
Array - for storing an array, a list, or multiple values in one key.
Timestamp - convenient for recording when a document was modified or added.
Object - for embedded documents.
Null - for storing null values.
Symbol - essentially the same as a string, but generally reserved for languages that use a special symbol type.
Date - stores the current date or time in Unix time format. You can specify your own date and time by creating a Date object and passing it the day, month, and year.
Object ID - for storing the ID of a document.
Binary data - for storing binary data.
Code - for storing JavaScript code in a document.
Regular expression - for storing regular expressions.
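One Python type to watch for when building documents is the built-in set, which has no counterpart among the types listed above. A minimal sketch of one way to prepare a document before storing it, assuming a hypothetical helper named to_bson_friendly (it is not part of PyMongo) and assuming that a sorted list is an acceptable replacement for each set:

```python
def to_bson_friendly(value):
    """Recursively replace sets with sorted lists (hypothetical helper).

    Assumes the elements of each set can be compared with one another.
    """
    if isinstance(value, (set, frozenset)):
        return sorted(to_bson_friendly(v) for v in value)
    if isinstance(value, dict):
        return {k: to_bson_friendly(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_bson_friendly(v) for v in value]
    return value


# Illustrative record; the field names are made up for this example
record = {'_id': 'demo', 'tags': {'tv', 'audio'}, 'prices': (9.5, 12.0)}
clean = to_bson_friendly(record)
print(clean)
```

The cleaned dictionary could then be passed to an insert call; the original record is left untouched.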

Unsupported data type: Python's set.

5. Operations on a table (collection)

import pymongo

# MongoDB service address and port number
MONGO_URL = "127.0.0.1:27017"

# Connect to MongoDB; if no argument is given, the default is "localhost:27017"
client = pymongo.MongoClient(MONGO_URL)

# Connect to the database Amazon
DATABASE = "Amazon"
db = client[DATABASE]

# Connect to the collection (table): Amazon.galance20170801
COLLECTION = "galance20170801"
db_coll = db[COLLECTION]
5.1. Query records: find
(5.1.1) Specify which fields are returned
# Example one: all fields
# select * from galance20170801
searchRes = db_coll.find()
# Example two: use a dictionary to specify which fields to show
# select _id,key from galance20170801
queryArgs = {}
projectionFields = {'_id': True, 'key': True}  # specified with a dictionary
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'_id': 'B01EYCLJ04', 'key': 'Pro Audio'}
# Example three: use a dictionary to specify which fields to leave out
queryArgs = {}
projectionFields = {'_id': False, 'key': False}  # specified with a dictionary
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'activity': False, 'avgstar': 4.3, 'color': 'Yellow & Black', 'date': '2017-08-01'}
# Example four: use a list to specify which fields to show
# select _id,key,date from galance20170801
queryArgs = {}
projectionFields = ['key', 'date']  # specified with a list; the result always includes the _id field
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'_id': 'B01EYCLJ04', 'date': '2017-08-01', 'key': 'Pro Audio'}
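Since the list form always returns _id, showing a set of named fields without _id needs the dictionary form. A small sketch of building that dictionary, assuming a hypothetical helper named projection_from_fields (not part of PyMongo):

```python
def projection_from_fields(fields, include_id=True):
    """Build a projection dictionary from a list of field names (hypothetical helper)."""
    projection = {name: True for name in fields}
    if not include_id:
        # _id is the one field that may be excluded alongside included fields
        projection['_id'] = False
    return projection


with_id = projection_from_fields(['key', 'date'])
without_id = projection_from_fields(['key', 'date'], include_id=False)
print(with_id)
print(without_id)
```

Either dictionary can be passed as the projection argument of find, in the same way as the dictionaries in examples two and three.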
(5.1.2) Specify query criteria
(5.1.2.1). Comparison: =, !=, >, <, >=, <=
$ne: not equal
$gt: greater than
$lt: less than
$lte: less than or equal
$gte: greater than or equal
# Example one: equal
# select _id,key,sales,date from galance20170801 where key = 'TV & Video'
queryArgs = {'key': 'TV & Video'}
projectionFields = ['key', 'sales', 'date']
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'_id': '0750699973', 'date': '2017-08-01', 'key': 'TV & Video', 'sales': 0}
# Example two: not equal
# select _id,key,sales,date from galance20170801 where sales != 0
queryArgs = {'sales': {'$ne': 0}}
projectionFields = ['key', 'sales', 'date']
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'_id': 'B01M996469', 'date': '2017-08-01', 'key': 'Stereos', 'sales': 2}
# Note: the numeric thresholds of examples three to six are missing in the source;
# 50 and 100 below are illustrative values that are consistent with the results shown.
# Example three: greater than
# where sales > 100
queryArgs = {'sales': {'$gt': 100}}
# Result: {'_id': 'B010OYASRG', 'date': '2017-08-01', 'key': 'Sound Bar', 'sales': 124}
# Example four: less than
# where sales < 100
queryArgs = {'sales': {'$lt': 100}}
# Result: {'_id': 'B011798DKQ', 'date': '2017-08-01', 'key': 'Pro Audio', 'sales': 0}
# Example five: specify a range
# where sales > 50 and sales < 100
queryArgs = {'sales': {'$gt': 50, '$lt': 100}}
# Result: {'_id': 'B008D2IHES', 'date': '2017-08-01', 'key': 'Sound Bar', 'sales': 66}
# Example six: specify a range, bounds included
# where sales >= 50 and sales <= 100
queryArgs = {'sales': {'$gte': 50, '$lte': 100}}
# Result: {'_id': 'B01M6DHW26', 'date': '2017-08-01', 'key': 'Radios', 'sales': 50}
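The mapping between SQL comparison signs and the operators above can be written down as a small lookup table. The sketch below is only an illustration: the names SQL_TO_MONGO and compare are hypothetical, and the thresholds are made-up values.

```python
# Lookup table from SQL comparison signs to MongoDB query operators
SQL_TO_MONGO = {
    '=': '$eq',
    '!=': '$ne',
    '>': '$gt',
    '>=': '$gte',
    '<': '$lt',
    '<=': '$lte',
}


def compare(field, *conditions):
    """Combine one or more (sign, value) pairs on the same field into one filter."""
    criteria = {}
    for sign, value in conditions:
        criteria[SQL_TO_MONGO[sign]] = value
    return {field: criteria}


not_zero = compare('sales', ('!=', 0))
in_range = compare('sales', ('>', 50), ('<', 100))
print(not_zero)
print(in_range)
```

The resulting dictionaries have the same shape as the queryArgs values in the examples above and can be passed to find unchanged.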
(5.1.2.2). and
# Example one: different fields, conditions combined with and
# where date = '2017-08-01' and sales = 100
queryArgs = {'date': '2017-08-01', 'sales': 100}
# Result: {'_id': 'B01BW2YYYC', 'date': '2017-08-01', 'key': 'Video', 'sales': 100}
# Example two: same field, conditions combined with and
# (the bounds are missing in the source; 50 and 100 are illustrative values)
# where sales >= 50 and sales <= 100
# Correct:
queryArgs = {'sales': {'$gte': 50, '$lte': 100}}
# Wrong: queryArgs = {'sales': {'$gt': 50}, 'sales': {'$lt': 100}}
# Result: {'_id': 'B01M6DHW26', 'date': '2017-08-01', 'key': 'Radios', 'sales': 50}
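The reason the second form is wrong is a property of Python dictionaries rather than of MongoDB: a dictionary literal that repeats a key keeps only the last value, so the first condition is silently dropped before the query is sent. A short demonstration with illustrative bounds:

```python
# The repeated 'sales' key means only the last condition survives
wrong = {'sales': {'$gt': 50}, 'sales': {'$lt': 100}}
print(wrong)  # {'sales': {'$lt': 100}}

# Put both operators in one inner dictionary ...
right = {'sales': {'$gt': 50, '$lt': 100}}

# ... or keep the conditions separate and combine them with $and
also_right = {'$and': [{'sales': {'$gt': 50}}, {'sales': {'$lt': 100}}]}
print(right)
print(also_right)
```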
(5.1.2.3). or
# Example one: different fields, conditions combined with or
# where date = '2017-08-01' or sales = 100
queryArgs = {'$or': [{'date': '2017-08-01'}, {'sales': 100}]}
# Result: {'_id': 'B01EYCLJ04', 'date': '2017-08-01', 'key': 'Pro Audio', 'sales': 0}
# Example two: same field, conditions combined with or
# where sales = 100 or sales = 120
queryArgs = {'$or': [{'sales': 100}, {'sales': 120}]}
# Result:
#    {'_id': 'B00X5RV14Y', 'date': '2017-08-01', 'key': 'Chargers', 'sales': ...}
#    {'_id': 'B0728GGX6Y', 'date': '2017-08-01', 'key': 'Glasses', 'sales': 100}
(5.1.2.4). in, not in, all
# Example one: in
# where sales in (100,120)
queryArgs = {'sales': {'$in': [100, 120]}}
# Result:
#    {'_id': 'B00X5RV14Y', 'date': '2017-08-01', 'key': 'Chargers', 'sales': ...}
#    {'_id': 'B0728GGX6Y', 'date': '2017-08-01', 'key': 'Glasses', 'sales': 100}
# Example two: not in
# where sales not in (100,120)
queryArgs = {'sales': {'$nin': [100, 120]}}
# Result: {'_id': 'B01EYCLJ04', 'date': '2017-08-01', 'key': 'Pro Audio', 'sales': 0}
# Example three: match all of the listed values with $all
# where sales = 100 and sales = 120
queryArgs = {'sales': {'$all': [100, 120]}}  # both must be satisfied
# Result: no results
# Example four: match all of the listed values with $all
# where sales = 100 and sales = 100
queryArgs = {'sales': {'$all': [100, 100]}}  # both must be satisfied
# Result: {'_id': 'B01BW2YYYC', 'date': '2017-08-01', 'key': 'Video', 'sales': 100}
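Why examples three and four behave differently is easier to see with an in-memory model of the two operators. The functions below are a deliberately simplified sketch written for this explanation (they are not how MongoDB is implemented and ignore nested arrays, regular expressions, and $elemMatch): a scalar field is treated as a one-element list.

```python
def matches_in(field_value, candidates):
    """Simplified $in: at least one stored value is among the candidates."""
    values = field_value if isinstance(field_value, list) else [field_value]
    return any(v in candidates for v in values)


def matches_all(field_value, required):
    """Simplified $all: every required value is among the stored values."""
    values = field_value if isinstance(field_value, list) else [field_value]
    return all(r in values for r in required)


print(matches_in(120, [100, 120]))          # a scalar 120 satisfies $in
print(matches_all(100, [100, 120]))         # a scalar cannot equal both 100 and 120
print(matches_all(100, [100, 100]))         # both required values are the same
print(matches_all([100, 120, 5], [100, 120]))  # an array can contain both
```

Under this model, $all with two different values only makes sense for array fields, which is the use shown later for arrays.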
(5.1.2.5). Whether a field exists
# Example one: field does not exist
# where rank2 is null
queryArgs = {'rank2': None}  # None matches documents where rank2 is null or missing
projectionFields = ['key', 'sales', 'date', 'rank2']
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# Result: {'_id': 'B00ACOKQTY', 'date': '2017-08-01', 'key': '3D TVs', 'sales': 0}

# MongoDB command
db.categoryAsinSrc.find({'isclawered': true, 'avgcost': {$exists: false}})
# Example two: field exists
# where rank2 is not null
queryArgs = {'rank2': {'$ne': None}}
projectionFields = ['key', 'sales', 'date', 'rank2']
searchRes = db_coll.find(queryArgs, projection=projectionFields).limit(100)  # the limit value is illustrative
# Result: {'_id': 'B014I8SX4Y', 'date': '2017-08-01', 'key': '3D TVs', 'rank2': 4.0, 'sales': 0}
(5.1.2.6). Regular expression matching: $regex (SQL: like)
# Example one: the key field contains the substring Audio
# where key like '%Audio%'
queryArgs = {'key': {'$regex': '.*Audio.*'}}
# Result: {'_id': 'B01M19FGTZ', 'date': '2017-08-01', 'key': 'Pro Audio', 'sales': 1}
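When the substring comes from user input or contains characters such as "." or "(", it should be escaped before it is used as a pattern. The sketch below assumes a hypothetical helper named like_contains and uses Python's re.escape; MongoDB evaluates patterns with its own regular expression engine, so treat the escaping as an approximation that holds for ordinary punctuation. An unanchored pattern already matches anywhere in the string, so the surrounding ".*" is not needed.

```python
import re


def like_contains(field, substring, ignore_case=False):
    """Build a filter equivalent to SQL "field LIKE '%substring%'" (hypothetical helper)."""
    criteria = {'$regex': re.escape(substring)}
    if ignore_case:
        criteria['$options'] = 'i'  # case-insensitive matching
    return {field: criteria}


query = like_contains('key', 'Audio', ignore_case=True)
literal_dot = like_contains('key', 'a.c')
print(query)

# Local check with Python's own engine: the dot is matched literally
pattern = literal_dot['key']['$regex']
print(bool(re.search(pattern, 'xa.cx')), bool(re.search(pattern, 'abc')))
```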
(5.1.2.7). An array must contain given elements: $all
# linkNameLst is an array; query the records whose linkNameLst field contains the element 'Electronics, Computers & Office'.
db.getCollection("2018-01-24").find({'linkNameLst': {'$all': ['Electronics, Computers & Office']}})

# linkNameLst is an array; query the records whose linkNameLst field contains both 'Wearable Technology' and 'Electronics, Computers & Office'.
db.getCollection("2018-01-24").find({'linkNameLst': {'$all': ['Wearable Technology', 'Electronics, Computers & Office']}})
(5.1.2.8). Query by array size
There are two approaches.
First approach: use $where (very flexible, but slower)
# priceLst is an array; the goal is to query len(priceLst) < 3
db.getCollection("20180306").find({$where: "this.priceLst.length < 3"})
For $where, see the official documentation: http://docs.mongodb.org/manual/reference/operator/query/where/
Second approach: check whether the element at a given index of the array exists (more efficient).
For example, if len(priceLst) < 3 is required, priceLst[2] must not exist.
# priceLst is an array; the goal is to query len(priceLst) < 3
db.getCollection("20180306").find({'priceLst.2': {$exists: 0}})
For example, if len(priceLst) > 3 is required, priceLst[3] must exist.
# priceLst is an array; the goal is to query len(priceLst) > 3
db.getCollection("20180306").find({'priceLst.3': {$exists: 1}})
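The index trick above can be wrapped in two small functions so the off-by-one arithmetic lives in one place. This is a sketch with hypothetical helper names; note that the "shorter than" filter also matches documents in which the field is missing altogether, because in that case the indexed element does not exist either.

```python
def shorter_than(field, n):
    """Filter for len(field) < n, with n >= 1: the element at index n - 1 must not exist."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return {f'{field}.{n - 1}': {'$exists': False}}


def longer_than(field, n):
    """Filter for len(field) > n: the element at index n must exist."""
    return {f'{field}.{n}': {'$exists': True}}


print(shorter_than('priceLst', 3))  # {'priceLst.2': {'$exists': False}}
print(longer_than('priceLst', 3))   # {'priceLst.3': {'$exists': True}}
```

For an exact length, MongoDB also provides the $size operator, for example {'priceLst': {'$size': 3}}.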
(5.1.3) Process the query results
(5.1.3.1). Limit the number of results: limit
# Example one: sort by sales in descending order and take the top 100
# select top 100 _id,key,sales from galance20170801 where key = 'Speakers' order by sales desc
queryArgs = {'key': 'Speakers'}
projectionFields = ['key', 'sales']
searchRes = db_coll.find(queryArgs, projection=projectionFields)
topSearchRes = searchRes.sort('sales', pymongo.DESCENDING).limit(100)
(5.1.3.2). Sorting: sort
# Example two: sort by sales descending, then by rank ascending
# select _id,key,date,rank from galance20170801 where key = 'Speakers' order by sales desc, rank
queryArgs = {'key': 'Speakers'}
projectionFields = ['key', 'sales', 'rank']
searchRes = db_coll.find(queryArgs, projection=projectionFields)
# sortedSearchRes = searchRes.sort('sales', pymongo.DESCENDING)  # single field
sortedSearchRes = searchRes.sort([('sales', pymongo.DESCENDING), ('rank', pymongo.ASCENDING)])  # multiple fields
# Result:
# {'_id': 'B000289DC6', 'key': 'Speakers', 'rank': 3.0, 'sales': ...}
# {'_id': 'B001VRJ5D4', 'key': 'Speakers', 'rank': 5.0, 'sales': 120}
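To see what ordering a two-field sort produces, the same rule can be applied to a few in-memory records with Python's sorted. This is only an analogue of the server-side sort, and the records below are made up for the illustration: sales is compared first in descending order, and rank breaks ties in ascending order.

```python
# Made-up records; 'a' and 'c' tie on sales, so rank decides between them
records = [
    {'_id': 'a', 'sales': 120, 'rank': 5.0},
    {'_id': 'b', 'sales': 300, 'rank': 3.0},
    {'_id': 'c', 'sales': 120, 'rank': 2.0},
]

# Negating sales gives descending order; rank stays ascending
ordered = sorted(records, key=lambda r: (-r['sales'], r['rank']))
ordered_ids = [r['_id'] for r in ordered]
print(ordered_ids)  # ['b', 'c', 'a']
```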
(5.1.3.3). Statistics: count
# Example three: count the total number of matching records
# select count(*) from galance20170801 where key = 'Speakers'
queryArgs = {'key': 'Speakers'}
# count_documents is available from PyMongo 3.7; the older db_coll.find(queryArgs).count() was removed in PyMongo 4
searchResNum = db_coll.count_documents(queryArgs)
# Result:
# 106
5.2. Add a record
5.2.1. Single insert
# Example one: specify _id; a duplicate _id raises an exception
recordId = 'firstRecord'
insertDate = '2017-08-28'
count = 10
insert_record = {'_id': recordId, 'endDate': insertDate, 'count': count}
insert_res = db_coll.insert_one(insert_record)
print(f"insert_id={insert_res.inserted_id}: {insert_record}")
# Result: insert_id=firstRecord: {'_id': 'firstRecord', 'endDate': '2017-08-28', 'count': 10}
# Example two: do not specify _id; it is generated automatically
insertDate = '2017-10-10'
count = 10  # the value is missing in the source; 10 is illustrative
insert_record = {'endDate': insertDate, 'count': count}
insert_res = db_coll.insert_one(insert_record)
print(f"insert_id={insert_res.inserted_id}: {insert_record}")
# Result: insert_id=59ad356d51ad3e2314c0d3b2: {'endDate': '2017-10-10', 'count': 10, '_id': ObjectId('59ad356d51ad3e2314c0d3b2')}
5.2.2. Bulk insert
# More efficient, but note that if you specify _id, the values must not repeat
# ordered=True: stop at the first error and raise an exception
# ordered=False: continue past errors and raise an exception after all inserts have been attempted
insertRecords = [{'i': i, 'date': '2017-10-10'} for i in range(10)]
insertBulk = db_coll.insert_many(insertRecords, ordered=True)
print(f"insert_ids={insertBulk.inserted_ids}")
# Result: insert_ids=[ObjectId('59ad3ba851ad3e1104a4de6d'), ObjectId('59ad3ba851ad3e1104a4de6e'), ObjectId('59ad3ba851ad3e1104a4de6f'), ObjectId('59ad3ba851ad3e1104a4de70'), ObjectId('59ad3ba851ad3e1104a4de71'), ObjectId('59ad3ba851ad3e1104a4de72'), ObjectId('59ad3ba851ad3e1104a4de73'), ObjectId('59ad3ba851ad3e1104a4de74'), ObjectId('59ad3ba851ad3e1104a4de75'), ObjectId('59ad3ba851ad3e1104a4de76')]
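For large record lists it can help to insert in fixed-size batches instead of one huge call, so that memory use and the blast radius of a failed call stay bounded. The sketch below only shows the batching step; the helper name batches and the batch size of 4 are assumptions for the illustration, and each batch would then be handed to insert_many as in the example above.

```python
def batches(records, batch_size):
    """Yield consecutive slices of at most batch_size records (hypothetical helper)."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


insertRecords = [{'i': i, 'date': '2017-10-10'} for i in range(10)]
sizes = [len(batch) for batch in batches(insertRecords, 4)]
print(sizes)  # [4, 4, 2]
```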
5.3. Modify records
# Update the record that matches the filter on _id; if no record is found, insert it (upsert=True)
# item is the document being saved (a dict-like object that contains '_id', defined elsewhere)
updateFilter = {'_id': item['_id']}
updateRes = db_coll.update_one(filter=updateFilter,
                               update={'$set': dict(item)},
                               upsert=True)
print(f"updateRes = matched: {updateRes.matched_count}, modified = {updateRes.modified_count}")
# Update some fields of the records that match the filter: i is an existing field, isUpdated is a new field
filterArgs = {'date': '2017-10-10'}
updateArgs = {'$set': {'isUpdated': True, 'i': 100}}  # the new value of i is missing in the source; 100 is illustrative
updateRes = db_coll.update_many(filter=filterArgs, update=updateArgs)
print(f"updateRes: matched_count={updateRes.matched_count}, "
      f"modified_count={updateRes.modified_count} modified_ids={updateRes.upserted_id}")
# Result: updateRes: matched_count=8, modified_count=8 modified_ids=None

5.4. Delete records
5.4.1. Delete one record

# Example one: the criteria are written the same way as for a query
queryArgs = {'endDate': '2017-08-28'}
delRecord = db_coll.delete_one(queryArgs)
print(f"delRecord={delRecord.deleted_count}")
# Result: delRecord=1
5.4.2. Bulk deletion
# Example two: the criteria are written the same way as for a query
queryArgs = {'i': {'$gt': 5, '$lt': 8}}
# db_coll.delete_many({})  # empties the collection
delRecord = db_coll.delete_many(queryArgs)
print(f"delRecord={delRecord.deleted_count}")
# Result: delRecord=2
6. Write database documents to a CSV file
6.1. Standard code
Read a CSV file
import csv

with open("phoneCount.csv", "r") as csvFile:
    reader = csv.reader(csvFile)
    # No need for readlines here
    for line in reader:
        print(f"# line = {line}, typeOfLine = {type(line)}, lenOfLine = {len(line)}")
# The output is as follows:
# line = ['850', 'rest', 'n', 'NN'], typeOfLine = <class 'list'>, lenOfLine = 4
# line = ['9865', 'min', '1', 'CD'], typeOfLine = <class 'list'>, lenOfLine = 4
Write a CSV file
# Standard template for exporting all records of a collection
import pymongo
import csv

# Initialize the database
MONGO_URL = "127.0.0.1:27017"
DATABASE = "databaseName"
TABLE = "tableName"

client = pymongo.MongoClient(MONGO_URL)
db_des = client[DATABASE]
db_des_table = db_des[TABLE]

# Write the data to a CSV file
# If you export directly from MongoBooster, the columns become misaligned as soon as some records are missing a field
# newline='' prevents blank rows between data rows; this argument is specific to Python 3
with open(f"{DATABASE}_{TABLE}.csv", "w", newline='') as csvFileWriter:
    writer = csv.writer(csvFileWriter)
    # Write the first row: the field names
    fieldList = ["_id", "itemType", "field_1", "field_2", "field_3"]
    writer.writerow(fieldList)

    allRecordRes = db_des_table.find()
    # Write the data rows
    for record in allRecordRes:
        print(f"record = {record}")
        recordValueLst = []
        for field in fieldList:
            if field not in record:
                recordValueLst.append("None")
            else:
                recordValueLst.append(record[field])
        try:
            writer.writerow(recordValueLst)
        except Exception as e:
            print(f"write csv exception. e = {e}")
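The missing-field handling in the template can also be delegated to the standard library: csv.DictWriter fills absent fields with a placeholder (restval) and can skip fields that are not in the column list (extrasaction="ignore"). The sketch below writes to an in-memory buffer and uses made-up records in place of a MongoDB cursor, so it runs without a database.

```python
import csv
import io

fieldList = ["_id", "itemType", "field_1"]

# Made-up records standing in for the documents returned by find()
records = [
    {"_id": "r1", "itemType": "tv", "field_1": 1, "extra": "not exported"},
    {"_id": "r2", "field_1": 2},  # itemType is missing
]

buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldList,
                        restval="None", extrasaction="ignore")
writer.writeheader()       # first row: the field names
writer.writerows(records)  # one row per record, columns always aligned

rows = buffer.getvalue().splitlines()
print(rows)
```

With a real file, the buffer would be replaced by the open(...) call from the template, keeping newline='' as before.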
 
6.2. Possible problems and solutions
6.2.1. Encoding problem when writing the CSV file

Reference article: "Python UnicodeEncodeError: 'gbk' codec can't encode character" solution: http://www.jb51.net/article/64816.htm

Key point: the encoding of the target file is the culprit behind the error named in that title. When we create a file on Windows (with a Chinese locale), its default encoding is GBK, so the Python interpreter tries to encode the text we write as GBK. That text, however, is a Unicode string that has already been decoded, and it may contain characters that GBK cannot represent, so the write fails. The solution is to change the encoding of the target file.

Solution:

# It is strongly recommended to specify the encoding whenever you open a file:
with open(f"{DATABASE}_{TABLE}.csv", "w", newline='', encoding='utf-8') as csvFileWriter:
# When we write a CSV file on Windows the default encoding is 'gbk', while most data obtained from the Internet is 'utf-8', so some characters may not be encodable. For example: write csv exception. e = 'gbk' codec can't encode character '\xae' in position 80: illegal multibyte sequence
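The failure can be reproduced without a database or a Windows machine by encoding a string that contains the character from the error message ('\xae', the registered-trademark sign) with both codecs. The text and file name below are made up for the demonstration.

```python
import os
import tempfile

text = 'Brand\xae speaker'  # contains the character named in the error message

# Encoding with GBK fails, which is what happens inside a file opened with the GBK default
try:
    text.encode('gbk')
    gbk_ok = True
except UnicodeEncodeError as e:
    gbk_ok = False
    print(f"gbk failed: {e}")

# Writing with an explicit UTF-8 encoding succeeds and round-trips
path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
with open(path, 'w', newline='', encoding='utf-8') as f:
    f.write(text + '\n')
with open(path, encoding='utf-8') as f:
    round_trip = f.read()
print(round_trip == text + '\n')
```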
6.2.2. Blank lines in the written CSV file (an empty row between data rows), Python 2.x version
For a description and the solution, see: https://www.cnblogs.com/China-YangGISboy/p/7339118.html