MongoDB data is often too large to load into memory for analysis, and storing every document directly in a Python dictionary quickly exhausts memory through the lists used to hold the values. A better approach is to pull the data straight into NumPy arrays (and from there into pandas).
import numpy
import pymongo

c = pymongo.MongoClient()
collection = c.mydb.collection

# Preallocate one NumPy array per field, then fill them by
# iterating over the cursor one document at a time.
num = collection.count()
arrays = [numpy.zeros(num) for _ in range(5)]
for i, record in enumerate(collection.find()):
    for x in range(5):
        arrays[x][i] = record["x%i" % (x + 1)]

for array in arrays:
    # prove that we did something...
    print(numpy.mean(array))
Profiling the code above on a large collection shows that most of the time goes to iterating over the PyMongo cursor. The Monary library, implemented in C, performs the conversion directly and is far more efficient.
from monary import Monary
import numpy

with Monary("127.0.0.1") as monary:
    arrays = monary.query(
        "mydb",                          # database name
        "collection",                    # collection name
        {},                              # query spec
        ["x1", "x2", "x3", "x4", "x5"],  # field names (in Mongo record)
        ["float64"] * 5                  # Monary field types
    )

for array in arrays:
    # prove that we did something...
    print(numpy.mean(array))
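To see the difference on your own data, here is a minimal timing sketch. It assumes a local MongoDB instance with the same mydb.collection and x1..x5 fields used above; the two loader functions are our own wrappers for illustration, not part of either library.

import timeit

import numpy
import pymongo
from monary import Monary

FIELDS = ["x1", "x2", "x3", "x4", "x5"]

def load_with_pymongo():
    # Row-at-a-time iteration over the PyMongo cursor.
    collection = pymongo.MongoClient().mydb.collection
    num = collection.count()
    arrays = [numpy.zeros(num) for _ in range(5)]
    for i, record in enumerate(collection.find()):
        for x, field in enumerate(FIELDS):
            arrays[x][i] = record[field]
    return arrays

def load_with_monary():
    # Bulk conversion done in C by Monary.
    with Monary("127.0.0.1") as monary:
        return monary.query("mydb", "collection", {}, FIELDS, ["float64"] * 5)

for name, loader in [("pymongo", load_with_pymongo), ("monary", load_with_monary)]:
    print("%s: %.2f s per run" % (name, timeit.timeit(loader, number=3) / 3))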
What about converting the result to pandas? Refer here for details; one straightforward option is sketched below.
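A minimal sketch, assuming arrays holds the per-field results of the Monary query above: hand the column arrays to a DataFrame keyed by field name.

import pandas

# Build a DataFrame from the per-field arrays returned by the Monary
# query above; the column names match the query's field list.
fields = ["x1", "x2", "x3", "x4", "x5"]
df = pandas.DataFrame(dict(zip(fields, arrays)))
print(df.describe())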