MongoDB version: the MongoDB shell version is 2.4.4.
The operating environment is a shell window, as follows:
[mongo_user@mongodb_dbs ~]# mongo --port 30100
MongoDB shell version: 2.4.4
connecting to: 127.0.0.1:30000
mongos> use pos
switched to db pos
1. First, group the records by the paymentOrder field and compute statistics for every group. The groups of interest are those whose count is greater than 1.
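// Group by paymentOrder: _id holds the paymentOrder value, max_id the largest ObjectId in the group, count the number of records in the group
var group = [
    {$group: {_id: "$paymentOrder", max_id: {$max: "$_id"}, count: {$sum: 1}}},
    {$sort: {count: -1}}
];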
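To make the shape of the grouped output concrete, here is a minimal plain-JavaScript sketch that mimics what the $group and $sort stages compute, with no database involved. The sample documents and the helper name groupByPaymentOrder are hypothetical, and numeric _id values stand in for ObjectIds.

```javascript
// Hypothetical sample data: numeric _id values stand in for ObjectIds.
var sampleDocs = [
    { _id: 1, paymentOrder: "A100" },
    { _id: 2, paymentOrder: "A100" },
    { _id: 3, paymentOrder: "B200" },
    { _id: 4, paymentOrder: "A100" }
];

// Mimics {$group: {_id: "$paymentOrder", max_id: {$max: "$_id"}, count: {$sum: 1}}}
// followed by {$sort: {count: -1}}. Illustrative helper, not a MongoDB API.
function groupByPaymentOrder(docs) {
    var byKey = {};
    var groups = [];
    for (var i = 0; i < docs.length; i++) {
        var doc = docs[i];
        var g = byKey[doc.paymentOrder];
        if (!g) {
            // First record seen for this paymentOrder: start a new group.
            g = { _id: doc.paymentOrder, max_id: doc._id, count: 0 };
            byKey[doc.paymentOrder] = g;
            groups.push(g);
        }
        if (doc._id > g.max_id) { g.max_id = doc._id; }
        g.count += 1;
    }
    // Largest groups first, like {$sort: {count: -1}}.
    groups.sort(function (a, b) { return b.count - a.count; });
    return groups;
}

var grouped = groupByPaymentOrder(sampleDocs);
```

With this sample, grouped holds one entry for "A100" (max_id 4, count 3) followed by one for "B200" (max_id 3, count 1). Only the first is a duplicate group.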
2. Define the filter that finds the groups containing duplicates, using the pipeline operator $match. The condition is written in the usual query format, but it is applied to the output format of the group stage:
var match = {"$match": {"count": {"$gt": 2}}};
3. Finally, the aggregation framework function db.paymentinfo.aggregate(group, match) returns the groups that contain duplicated data. This process may look complicated, but it is simply the equivalent of GROUP BY ... HAVING ... in T-SQL.
var ds = db.paymentinfo.aggregate(group, match);
PS: The match here has no effect. Much of the returned data has a count of 1, which means {"$match": {"count": {"$gt": 2}}} fails to filter. Why?
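One plausible explanation, which the original does not confirm: group is already an array of two stages, and the 2.4-era shell helper appears to treat an array passed as the first argument as the complete pipeline, so a separate second argument such as match would be dropped. This is an assumption about the shell's behavior and is worth checking against your version. A minimal sketch that sidesteps the question by placing all stages in one array (the variable name pipeline is hypothetical):

```javascript
// Stages as defined in steps 1 and 2.
var group = [
    { $group: { _id: "$paymentOrder", max_id: { $max: "$_id" }, count: { $sum: 1 } } },
    { $sort: { count: -1 } }
];
var match = { $match: { count: { $gt: 2 } } };

// Append the $match stage so the filter is part of the single pipeline array.
// concat returns a new array and leaves group unchanged.
var pipeline = group.concat([match]);

// Run only where a shell `db` object exists
// (assumption: the shell is connected and the pos database is selected).
if (typeof db !== "undefined") {
    var ds = db.paymentinfo.aggregate(pipeline);
}
```

Note that {$gt: 2} keeps only groups with three or more records; {$gt: 1} would correspond to the "count greater than 1" condition stated in step 1.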
4. Back up the collection before deleting anything.
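Backup (the tool name was lost from the command below; mongoexport is assumed because the output is a single .txt file, and pos is the database selected above):
/usr/local/mongodb/mongodb-linux-x86_64-2.4.4/bin/mongoexport --port 30000 -d pos -c paymentinfo -o /home/backup/mongodb_pos_paymentinfo_3.txt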
5. Start the delete loop.
Here ds is a large result set; ds.result gives direct access to the documents produced by the grouped query:
// Loop over the aggregation output; the result behaves like an array, so a plain for loop works
var ds = db.paymentinfo.aggregate(group, match);
for (var i = 0; i < ds.result.length; i++) {
    var child = ds.result[i];
    var count = child.count;
    // Because the {"$match": {"count": {"$gt": 2}}} filter from step 2 has no effect,
    // check count > 1 here to skip data that is not duplicated and process only groups with duplicates
    if (count > 1) {
        // max_id is the largest ObjectId among the records in this group
        var oid = child.max_id;
        print(count);
        // The duplicated paymentOrder value
        var payorder = child._id;
        // Query all records that share this paymentOrder
        var ps = db.paymentinfo.find({"paymentOrder": payorder});
        // find() returns a cursor; toArray() turns it into an array that can be traversed
        var psc = ps.toArray();
        for (var j = 0; j < psc.length; j++) {
            var pchild = psc[j];
            // Keep the record with the largest ObjectId; remove every other record
            if (oid.toString() == pchild._id.toString()) {
                print("The same one");
                print(pchild._id.toString());
                print(oid.toString());
            } else {
                print("The other one -----");
                print(pchild._id.toString());
                print(oid.toString());
                db.paymentinfo.remove({"_id": pchild._id});
            }
        }
    }
}
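As a more compact variant of the inner loop, a single remove per group can delete every record that shares the paymentOrder except the one with the largest ObjectId. This is only a sketch, not the method used above: the helper name buildRemoveSelectors is hypothetical, and it assumes the grouped documents carry the _id, max_id, and count fields produced in step 1.

```javascript
// Builds one remove selector per duplicated group (count > 1).
// Hypothetical helper; expects documents shaped like the group output of step 1.
function buildRemoveSelectors(groups) {
    var selectors = [];
    for (var i = 0; i < groups.length; i++) {
        var g = groups[i];
        if (g.count > 1) {
            // Match every record of this paymentOrder except the one to keep.
            selectors.push({ paymentOrder: g._id, _id: { $ne: g.max_id } });
        }
    }
    return selectors;
}

// Dry run on hypothetical group output (numbers stand in for ObjectIds).
var selectors = buildRemoveSelectors([
    { _id: "A100", max_id: 4, count: 3 },
    { _id: "B200", max_id: 3, count: 1 }
]);

// In the mongo shell, the same helper could be applied to the real result:
// buildRemoveSelectors(ds.result).forEach(function (s) { db.paymentinfo.remove(s); });
```

Building the selectors first makes it possible to print and inspect them before any remove is issued, which is a useful check alongside the backup from step 4.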
By the way: if you copy my script into the shell of the mongos client and it raises an error, the formatting may have been mangled. Remove all the newline characters, or type the script in again by hand, and it should run without errors.
MongoDB: using group statistics to remove duplicate records