Removing duplicate records in MongoDB with a group (aggregate) query
The MongoDB version is 2.4.4 (the shell reports: MongoDB shell version: 2.4.4).
The operating environment and shell window are as follows:
[mongo_user@mongodb_dbs ~]# mongo --port 30100
MongoDB shell version: 2.4.4
connecting to: 127.0.0.1:30000/test
mongos>
mongos> use pos
switched to db pos
mongos>
1. First count the records per group, grouping on the paymentOrder field. This query returns the statistics for every group.
// For each group, output the grouping value paymentOrder as _id, the largest ObjectId in the group as max_id, and the number of records in the group as count
var group = [
    {$group: {_id: "$paymentOrder", max_id: {$max: "$_id"}, count: {$sum: 1}}},
    {$sort: {count: -1}}
];
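To make the output of this stage concrete, here is a minimal sketch in plain JavaScript that mimics what $group followed by $sort computes. It runs on a small in-memory array rather than on MongoDB. The sample records and the helper name groupByPaymentOrder are hypothetical, and the numeric _id values only stand in for ObjectIds.

```javascript
// Hypothetical sample data; numeric _id values stand in for ObjectIds.
var sample = [
    {_id: 1, paymentOrder: "A"},
    {_id: 2, paymentOrder: "B"},
    {_id: 3, paymentOrder: "A"},
    {_id: 4, paymentOrder: "A"}
];

// Mimics {$group: {_id: "$paymentOrder", max_id: {$max: "$_id"}, count: {$sum: 1}}}
// followed by {$sort: {count: -1}}.
function groupByPaymentOrder(records) {
    var groups = Object.create(null); // paymentOrder -> accumulated group
    records.forEach(function (r) {
        var g = groups[r.paymentOrder];
        if (!g) {
            g = groups[r.paymentOrder] = {_id: r.paymentOrder, max_id: r._id, count: 0};
        }
        if (r._id > g.max_id) {
            g.max_id = r._id; // keep the largest _id seen in this group
        }
        g.count += 1;         // one more record in this group
    });
    return Object.keys(groups)
        .map(function (k) { return groups[k]; })
        .sort(function (a, b) { return b.count - a.count; }); // largest count first
}

var grouped = groupByPaymentOrder(sample);
```

In this sketch the group for "A" comes first with a count of 3, and the group for "B" follows with a count of 1, which is the shape of document that the $match step later filters on.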
2. Define the filter that finds the duplicate groups, using the $match pipeline operator. The condition is written like an ordinary query, but it is applied to the output documents of the group stage:
// A group contains duplicates when it holds more than one record
var match = {"$match": {"count": {"$gt": 1}}};
3. Finally, call the aggregation framework function db.paymentinfo.aggregate(group, match) to obtain the groups that contain duplicate data. The process looks complex, but it is really just the equivalent of GROUP BY ... HAVING ... in T-SQL.
var ds=db.paymentinfo.aggregate(group, match);
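PS: the match did not take effect. Many groups with a count of 1 were still returned, so the $match condition was not applied. The likely reason is that group is already an array: aggregate takes that array as the whole pipeline, and the match passed as a separate second argument never becomes part of it.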
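A sketch of one way to make the $match take effect: put every stage into a single pipeline array and pass that one array to aggregate. This assumes the shell helper reads the pipeline only from its first argument when that argument is an array. The variable name pipeline is my own, and the actual call is left as a comment because it needs a live MongoDB connection.

```javascript
// Stages as defined in steps 1 and 2. A group counts as a duplicate
// when count > 1, matching the check used in the deletion loop.
var group = [
    {$group: {_id: "$paymentOrder", max_id: {$max: "$_id"}, count: {$sum: 1}}},
    {$sort: {count: -1}}
];
var match = {$match: {count: {$gt: 1}}};

// Build one array holding every stage, in order: $group, $sort, $match.
var pipeline = group.concat([match]);

// In the mongo shell this would then be run as:
// var ds = db.paymentinfo.aggregate(pipeline);
```

With the filter inside the pipeline, only groups with duplicates should come back, and the count check inside the deletion loop becomes a safety net rather than the only filter.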
4. Back up the data before deletion.
Backup
/usr/local/mongodb/mongodb-linux-x86_64-2.4.4/bin/mongoexport --port 30000 -d pos -c paymentinfo -o /home/backup/mongodb_pos_paymentinfo_3.txt
5. Delete the duplicates in a loop
Here ds is a large result set. Use ds.result to get the grouped data out of it:
// Loop over the result. ds.result produced by aggregate is already an array, so a for loop can process it directly
var ds = db.paymentinfo.aggregate(group, match);
for (var i = 0; i < ds.result.length; i++) {
    var child = ds.result[i];
    var count = child.count;
    // count > 1 filters out groups without duplicates; only groups that contain duplicates are processed
    if (count > 1) {
        // ObjectId of the record with the largest ObjectId in the group
        var oid = child.max_id;
        print(count);
        // the paymentOrder value shared by the group
        var payorder = child._id;
        // query all records with this repeated paymentOrder so they can be traversed
        var ps = db.paymentinfo.find({"paymentOrder": payorder});
        // convert the cursor to an array with toArray() and traverse it
        var psc = ps.toArray();
        for (var j = 0; j < psc.length; j++) {
            var pchild = psc[j];
            // the record with the largest ObjectId is kept; all others are removed
            if (oid.toString() == pchild._id.toString()) {
                print("the same one");
                print(pchild._id.toString());
                print(oid.toString());
            } else {
                print("other one -----");
                print(pchild._id.toString());
                print(oid.toString());
                db.paymentinfo.remove({"_id": pchild._id});
            }
        }
    }
}
By the way: if you copy my script and the mongos shell client reports an error when you run it, remove all the line breaks or type the script in manually; it then runs without errors.
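The keep-the-largest rule used in the loop can be checked on in-memory data before touching the real collection. The following is a minimal sketch in plain JavaScript, not the script above: the sample records and the helper name idsToRemove are hypothetical, and the numeric _id values stand in for ObjectIds.

```javascript
// Hypothetical records; numeric _id values stand in for ObjectIds.
var records = [
    {_id: 1, paymentOrder: "A"},
    {_id: 2, paymentOrder: "B"},
    {_id: 3, paymentOrder: "A"},
    {_id: 4, paymentOrder: "A"}
];

// Returns the _id values that would be removed: within each paymentOrder,
// every record except the one with the largest _id. A paymentOrder that
// occurs only once loses nothing, because its single record is the largest.
function idsToRemove(recs) {
    var maxId = Object.create(null); // paymentOrder -> largest _id seen
    recs.forEach(function (r) {
        if (!(r.paymentOrder in maxId) || r._id > maxId[r.paymentOrder]) {
            maxId[r.paymentOrder] = r._id;
        }
    });
    return recs
        .filter(function (r) { return r._id !== maxId[r.paymentOrder]; })
        .map(function (r) { return r._id; });
}

var removed = idsToRemove(records);
```

Here removed holds the two older "A" records (_id 1 and 3), while the newest "A" record and the single "B" record are kept. Running a dry check like this, or replacing remove with print on the real data first, is a cheap complement to the backup taken in step 4.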