Remove duplicate records in MongoDB using group query statistics

Source: Internet
Author: User
Tags: mongodb version


The MongoDB version is 2.4.4 (MongoDB shell version: 2.4.4).
The operating environment and shell window are as follows:

[mongo_user@mongodb_dbs ~]# mongo --port 30100
MongoDB shell version: 2.4.4
connecting to: 127.0.0.1:30000/test
mongos>
mongos> use pos
switched to db pos
mongos>

1. First count the records in each group, grouping on the paymentOrder field. This query returns the statistics for every group.
// Each group shows the value of the group field paymentOrder as _id, the largest ObjectId in the group as max_id, and the number of records in the group as count.
var group = [{$group: {_id: "$paymentOrder", max_id: {$max: "$_id"}, count: {$sum: 1}}}, {$sort: {count: -1}}];
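
To make the shape of the group output concrete, here is a plain JavaScript sketch that mimics what the $group and $sort stages compute, but on a small in-memory array instead of a collection. The sample documents and the helper name groupByPaymentOrder are hypothetical, and string ids stand in for ObjectIds.

```javascript
// Hypothetical sample documents; string _id values stand in for ObjectIds.
var sampleDocs = [
    {_id: "a1", paymentOrder: "PO-1"},
    {_id: "a2", paymentOrder: "PO-2"},
    {_id: "a3", paymentOrder: "PO-1"},
    {_id: "a4", paymentOrder: "PO-1"}
];

// Mimics {$group: {_id: "$paymentOrder", max_id: {$max: "$_id"}, count: {$sum: 1}}}
// followed by {$sort: {count: -1}}, on an in-memory array.
function groupByPaymentOrder(docs) {
    var byKey = {};
    docs.forEach(function (doc) {
        var g = byKey[doc.paymentOrder];
        if (!g) {
            g = byKey[doc.paymentOrder] = {_id: doc.paymentOrder, max_id: doc._id, count: 0};
        }
        // Track the largest _id seen so far in this group.
        if (doc._id > g.max_id) {
            g.max_id = doc._id;
        }
        g.count += 1;
    });
    // Return the groups ordered by count, largest first.
    return Object.keys(byKey)
        .map(function (key) { return byKey[key]; })
        .sort(function (a, b) { return b.count - a.count; });
}

var grouped = groupByPaymentOrder(sampleDocs);
```

In this sketch PO-1 ends up with a count of 3 and max_id "a4", which is the record the later steps would keep, while PO-2 has a count of 1 and is not a duplicate.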
2. Define a stage that finds the duplicated groups, using the $match pipeline operator. The condition is written like an ordinary query, but it is applied to the fields of the group output:
var match ={"$match":{"count" : {"$gt" : 2}}};
3. Finally, call the aggregation framework function db.paymentinfo.aggregate(group, match) to obtain the groups that contain duplicate data. The process looks complex, but it is simply the equivalent of GROUP BY ... HAVING ... in T-SQL.
var ds=db.paymentinfo.aggregate(group, match);

PS: the match has no effect. Many groups with a count of 1 are still returned, that is, {"$match": {"count": {"$gt": 2}}} is not applied. Why?
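
One likely explanation for the question in the PS: group is already an array, and the 2.4 shell helper appears to treat an array passed as the first argument as the complete pipeline, so the separate match argument never reaches the server. A minimal sketch of a workaround, assuming that behaviour, is to put all stages into a single pipeline array. The helper name buildPipeline is hypothetical, and the threshold of 1 (rather than 2) is an assumption made here so that groups with exactly two records also count as duplicates.

```javascript
// Hypothetical helper: returns one pipeline array with the $match stage included.
function buildPipeline(minCount) {
    return [
        {$group: {_id: "$paymentOrder", max_id: {$max: "$_id"}, count: {$sum: 1}}},
        // Acts like HAVING: keeps only groups whose count is greater than minCount.
        {$match: {count: {$gt: minCount}}},
        {$sort: {count: -1}}
    ];
}

var pipeline = buildPipeline(1);

// Only runs inside the mongo shell, where the global db object exists.
if (typeof db !== "undefined") {
    var dups = db.paymentinfo.aggregate(pipeline);
    printjson(dups);
}
```

Placing $match directly after $group filters the groups before they are sorted, so only the duplicated groups are sorted and returned.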

4. Back up the data before deletion.

Backup

/usr/local/mongodb/mongodb-linux-x86_64-2.4.4/bin/mongoexport --port 30000 -d pos -c paymentinfo -o /home/backup/mongodb_pos_paymentinfo_3.txt
5. Delete the duplicates in a loop

Here ds is a large result set. Use ds.result to get the grouped data out of the result set:

// Loop over the result set. The result produced by aggregate is already an array, so it can be processed directly in a for loop.
var ds = db.paymentinfo.aggregate(group, match);
for (var i = 0; i < ds.result.length; i++) {
    var child = ds.result[i];
    var count = child.count;
    // Skip groups that are not duplicated (count of 1); only process groups that are duplicated.
    if (count > 1) {
        // max_id is the ObjectId of the record with the largest ObjectId in the group.
        var oid = child.max_id;
        print(count);
        // _id of the group is the repeated paymentOrder value.
        var payorder = child._id;
        // Fetch all records with the repeated paymentOrder so they can be traversed.
        var ps = db.paymentinfo.find({"paymentOrder": payorder});
        // Convert the cursor to an array with toArray() and traverse it.
        var psc = ps.toArray();
        for (var j = 0; j < psc.length; j++) {
            var pchild = psc[j];
            // Compare ObjectIds: the record with the largest one is kept, the others are removed.
            if (oid.toString() == pchild._id.toString()) {
                print("the same one");
                print(pchild._id.toString());
                print(oid.toString());
            } else {
                print("the other one -----");
                print(pchild._id.toString());
                print(oid.toString());
                db.paymentinfo.remove({"_id": pchild._id});
            }
        }
    }
}

By the way: if you copy my script and it reports an error when run in the mongos shell client, remove all the line breaks or type the statements in manually; it then runs without error.
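
Because remove is destructive, it may be worth checking which records a group would lose before running the loop for real. The following is a minimal sketch of that selection logic as a pure function. The name idsToRemove is hypothetical, and string ids stand in for ObjectIds; ids are compared through their string form, as in the script above.

```javascript
// Hypothetical helper: given the records of one duplicated group and the id to
// keep, return the ids of every other record. Ids are compared as strings.
function idsToRemove(records, keepId) {
    var keep = String(keepId);
    return records
        .filter(function (rec) { return String(rec._id) !== keep; })
        .map(function (rec) { return rec._id; });
}

// Example: three records share one paymentOrder and "a4" is the max_id to keep.
var doomed = idsToRemove([{_id: "a1"}, {_id: "a3"}, {_id: "a4"}], "a4");
```

In the shell, the array returned for each group could be printed first and only passed to remove once the output looks right.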

 
