Computing retention rates in MongoDB with MapReduce

(Jinqing's Column)

Retention is defined as follows.
Day-X retention of new accounts: an account created on a given date counts as retained on day X if it logs in on the X-th day after that date.
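
The definition above can be sketched in plain JavaScript. This is only an illustration: the function names dayOffset and isRetainedOnDay are hypothetical, and dates are assumed to be "YYYY-MM-DD" strings as used elsewhere in this article.

```javascript
// Whole days between two "YYYY-MM-DD" strings, computed in UTC so that
// daylight-saving changes cannot produce fractional days.
function dayOffset(registerDate, loginDate) {
    var msPerDay = 24 * 3600 * 1000;
    var reg = Date.parse(registerDate + "T00:00:00Z");
    var login = Date.parse(loginDate + "T00:00:00Z");
    return Math.round((login - reg) / msPerDay);
}

// True if any login date falls exactly x days after the registration date.
function isRetainedOnDay(registerDate, loginDates, x) {
    return loginDates.some(function (d) {
        return dayOffset(registerDate, d) === x;
    });
}

console.log(dayOffset("2015-09-20", "2015-09-23"));  // 3
console.log(isRetainedOnDay("2015-09-20", ["2015-09-21", "2015-09-23"], 2));  // false
```

An account that registers on 2015-09-20 and logs in on 2015-09-21 and 2015-09-23 is therefore retained on day 1 and day 3, but not on day 2.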

The output looks like this (similar to the retention display in Umeng):
Retained users
Registration date   New users   Retention rate
                                Day 1   Day 2   Day 3   Day 4   Day 5   Day 6   Day 7   Day 14
2015-09-17          2300        20.7%   15.6%   13%     11.3%   9.9%
2015-09-18          2694        21.8%   14.8%   11.5%   10.5%
2015-09-19          3325        19%     11.4%   10.3%
2015-09-20          3093        16.2%   11.9%
2015-09-21          2303        20.5%

The server records each new account in the retention.register collection
and each day's logins in the retention.login collection.
A statistics script runs daily and computes the retention rates up to the previous day.

The following MongoDB collections are used for retention statistics.
retention.register and retention.login are written by the server code;
the other collections are generated by the statistics script.

retention.register
==================
New-account records for retention statistics.
Each record stores the date on which the account was created.
Fields:
platform, platform name
account_id, account ID
date, registration date, string, format: "2015-01-01"
For example: {platform: "Baidu", account_id: "Jinqing", date: "2015-09-20"}
Indexes: (platform, account_id) and (date)
Used to count the number of new accounts per day.

retention.login
===============
Account login records for retention statistics.
Fields:
date, login date
platform, platform name
account_id, account ID
register_date, account registration date
For example: {date: "2015-09-23", platform: "Baidu", account_id: "Jinqing", register_date: "2015-09-20"}
Index: (date, platform, account_id).
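
The indexes listed for retention.register and retention.login could be created from the mongo shell roughly as follows. This is a sketch under assumptions: the function name ensureRetentionIndexes is hypothetical, ascending key order is assumed, and no uniqueness constraint is set because the article does not state one.

```javascript
// db is the mongo shell's handle for the database that holds the
// retention.* collections (assumed here to be "mydb").
function ensureRetentionIndexes(db) {
    var register = db.getCollection("retention.register");
    register.createIndex({platform: 1, account_id: 1});
    register.createIndex({date: 1});

    var login = db.getCollection("retention.login");
    login.createIndex({date: 1, platform: 1, account_id: 1});
}

// In the mongo shell, after selecting the database:
// ensureRetentionIndexes(db);
```

The (date) index on retention.register supports the date-range query that the statistics script runs, and the login index leads with date for the same reason.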

retention.result
================
Retention rate results. MapReduce stores each result as {_id: <date>, value: {...}}. For example (the intermediate N_count fields are omitted):
{_id: "2015-09-01", value: {date: "2015-09-01", register: 3344, 1: 91.1, 2: 82.2, 3: 73.3, 4: 64.4, 5: 55.5, 6: 46.6, 7: 37.7, 14: 14, 30: 3.33}}
{_id: "2015-09-02", value: {date: "2015-09-02", register: 3344, 1: 91.1, 2: 82.2, 3: 73.3, 4: 64.4, 5: 55.5, 6: 46.6, 7: 37.7, 14: 14, 30: 3.33}}
The results can be exported to a CSV file with mongoexport.
For example:
D:\mongodb\bin>mongoexport -h localhost -d mydb -c retention.result -f value.date,value.register,value.1,value.2,value.3,value.4,value.5,value.6,value.7,value.14,value.30 --type=csv -o d:\temp\retention.csv
where:
date: registration date
register: number of new registrations
1, 2, ..., 7, 14, 30: retention percentage on day 1, day 2, ..., day 7, day 14 and day 30
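
To show how one result maps onto a CSV row, here is a small plain-JavaScript sketch. It is not how mongoexport is implemented; the function name toCsvLine is hypothetical, and the values are assumed to contain no commas or quotes.

```javascript
// Build one CSV line from the value part of a result document.
// Fields that have not been computed yet become empty cells.
function toCsvLine(value, fields) {
    return fields.map(function (field) {
        var v = value[field];
        return v === undefined || v === null ? "" : String(v);
    }).join(",");
}

var fields = ["date", "register", "1", "2", "3", "4", "5", "6", "7", "14", "30"];
var value = {date: "2015-09-01", register: 3344, "1": 91.1, "2": 82.2};
console.log(toCsvLine(value, fields));
```

Recent registration dates have fewer columns filled in, because later retention days have not happened yet.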


Retention Rate Statistics Script
--------------------------------
Use crontab on Linux or Scheduled Tasks on Windows
to run the statistics script at 00:30 every day.

The script may go unrun for several days; each run covers the period from the last computed date up to, but not including, the current day.
On the first run, statistics start from the earliest date in the retention.register collection.
Running it several times a day does not change the results.
However, multiple instances must not run at the same time.
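
The run rules above come down to choosing a start date and processing the half-open range [start, today). A minimal sketch, with hypothetical function names (nextDate, chooseStartDate, datesToProcess) and dates handled as "YYYY-MM-DD" strings in UTC:

```javascript
// Next calendar day for a "YYYY-MM-DD" string, computed in UTC.
function nextDate(dateStr) {
    var d = new Date(dateStr + "T00:00:00Z");
    d.setUTCDate(d.getUTCDate() + 1);
    return d.toISOString().slice(0, 10);
}

// First run (no results yet): start at the earliest registration date.
// Later runs: start the day after the last computed result.
function chooseStartDate(lastResultDate, firstRegisterDate) {
    return lastResultDate === null ? firstRegisterDate : nextDate(lastResultDate);
}

// All dates in [startDate, today); today itself is never processed.
function datesToProcess(startDate, today) {
    var dates = [];
    for (var d = startDate; d < today; d = nextDate(d)) {
        dates.push(d);
    }
    return dates;
}

// Script not run for two days: both missing days are processed.
console.log(datesToProcess(chooseStartDate("2015-09-20", "2015-09-01"), "2015-09-23"));
// Second run on the same day: nothing left to do.
console.log(datesToProcess(chooseStartDate("2015-09-22", "2015-09-01"), "2015-09-23"));
```

Because the start date is derived from the stored results, a second run on the same day finds an empty range, which is why repeated runs do not change the output.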

The mongo client is required.
The script can be run on the MongoDB host itself.

    mongo my.mongo.host retention.js

Results are written to the mydb.retention.result collection and can be exported to a CSV file with mongoexport.


    #!/bin/sh
    # retention.sh
    # Runs in the early morning every day to compute retention rates.
    # Requires the mongo client.

    # Change this to the actual directory; the script runs from there.
    cd /home/jinq/retention/

    # Change this to the address of the mongod server.
    MONGODB=192.168.8.9

    mongo ${MONGODB} retention.js >> log.txt

    echo Mongo export retention result...
    mongoexport -h ${MONGODB} -d mydb -c retention.result \
      --sort '{"value.date": 1}' \
      -f value.date,value.register,value.1,value.2,value.3,value.4,value.5,value.6,value.7,value.14,value.30 \
      --type=csv -o retention_tmp.csv

    DATE=`date +%Y%m%d`
    FILE=retention_${DATE}.csv
    # Replace the CSV header row.
    echo "Date,Registrations,Day 1,Day 2,Day 3,Day 4,Day 5,Day 6,Day 7,Day 14,Day 30" > ${FILE}
    tail -n +2 retention_tmp.csv >> ${FILE}
    echo Done ${FILE}!

    // Retention rate statistics script
    // Reference document: Retention rate statistics.txt
    // Usage:
    // mongo my.mongo.host retention.js

    print(Date());
    db = db.getSisterDB("mydb");  // use mydb

    var startDate = getStartDate();
    var endDate = formatDate(new Date());
    print("Calculating retention rate of [" + startDate + ", " + endDate + ")");
    if (startDate < endDate) {
        insertDefaultResult(startDate);
        calcRegisterCount(startDate);
        calcRetention(startDate);
        print(Date());
        print("Done.");
    } else {
        print("Do nothing.");
    }

    // Internal functions.

    // Gets the statistics start date. Earlier statistics are complete and are not redone.
    // Returns a string in the format "2015-01-01".
    // Takes the maximum date in retention.result plus one day;
    // only data for that day and later is processed.
    // On the first run retention.result is empty, so the earliest date
    // in retention.register is used as the start.
    function getStartDate() {
        var lastResultDate = getLastResultDate();
        if (null == lastResultDate) {
            return getFirstRegisterDate();
        }
        // Plus one day.
        return getNextDate(lastResultDate);
    }

    // Gets the earliest date in retention.register.
    function getFirstRegisterDate() {
        var cursor = db.retention.register.find(
            {date: {$gt: "2015-09-01"}},  // excludes null dates
            {_id: 0, date: 1}
        ).sort({date: 1}).limit(1);
        if (cursor.hasNext()) {
            return cursor.next().date;
        }
        return formatDate(new Date());
    }

    // Gets the latest date in retention.result.
    // Returns null if there is none, otherwise a string such as "2015-01-01".
    function getLastResultDate() {
        // _id is the date string
        var cursor = db.retention.result.find(
            {}, {_id: 1}).sort({_id: -1}).limit(1);
        if (cursor.hasNext()) {
            return cursor.next()._id;
        }
        return null;
    }

    function add0(m) {
        return m < 10 ? '0' + m : m;
    }

    // Returns a string such as "2015-01-02".
    function formatDate(date) {
        var y = date.getFullYear();
        var m = date.getMonth() + 1;  // 1..12
        var d = date.getDate();
        return y + '-' + add0(m) + '-' + add0(d);
    }

    // "2015-12-31" -> "2016-01-01"
    function getNextDate(dateStr) {
        var dateObj = new Date(dateStr + " 00:00:00");
        // Advance by one calendar day rather than by 24 hours, so that a
        // daylight-saving change cannot yield the same date twice.
        dateObj.setDate(dateObj.getDate() + 1);
        return formatDate(dateObj);
    }
    assert(getNextDate("2015-12-31") == "2016-01-01");
    assert(getNextDate("2015-01-01") == "2015-01-02");
    assert(getNextDate("2015-01-31") == "2015-02-01");

    // Inserts default results. On days without new registrations mapReduce
    // produces no result, so a default must be inserted explicitly.
    function insertDefaultResult(startDateStr) {
        var docs = new Array();
        var endDateStr = formatDate(new Date());
        for (var dateStr = startDateStr;
             dateStr < endDateStr;
             dateStr = getNextDate(dateStr)) {
            docs.push({_id: dateStr, value: {date: dateStr, register: 0}});
        }  // for
        db.retention.result.insert(docs);
    }

    // Reads the retention.register collection, counts the new registrations
    // per day, and records them in the retention.result value.register field.
    // startDate is like "2015-01-01".
    function calcRegisterCount(startDate) {
        var mapFunction = function() {
            var key = this.date;
            var value = {date: key, register: 1};
            emit(key, value);
        };  // mapFunction

        var reduceFunction = function(key, values) {
            var reducedObject = {date: key, register: 0};
            values.forEach(function(value) {
                reducedObject.register += value.register;
            });
            return reducedObject;
        };  // reduceFunction

        var endDate = formatDate(new Date());
        db.retention.register.mapReduce(mapFunction, reduceFunction, {
            query: {date: {$gte: startDate, $lt: endDate}},
            out: {merge: "retention.result"}
        });  // mapReduce()
    }  // function calcRegisterCount()

    // Reads the retention.login collection, calculates the retention rates,
    // and saves them in the retention.result collection.
    // startDate is like "2015-01-01".
    // The loop bound 30 below is assumed: it is the largest day column reported.
    function calcRetention(startDate) {
        var mapFunction = function() {
            var key = this.register_date;
            var registerDateObj = new Date(this.register_date + " 00:00:00");
            var loginDateObj = new Date(this.date + " 00:00:00");
            // Round so that a daylight-saving change cannot give a fractional day.
            var days = Math.round(
                (loginDateObj - registerDateObj) / (24 * 3600 * 1000));
            var value = {date: key, register: 0};
            var field = days + "_count";  // like: 1_count
            value[field] = 1;
            emit(key, value);
        };  // mapFunction

        var reduceFunction = function(key, values) {
            var reducedObject = {date: key, register: 0};
            for (var i = 1; i <= 30; i++) {
                var field = i + "_count";
                reducedObject[field] = 0;
            }
            values.forEach(function(value) {
                reducedObject.register += value.register;
                for (var i = 1; i <= 30; i++) {
                    var field = i + "_count";  // like: 1_count
                    var count = value[field];
                    if (null != count) {
                        reducedObject[field] += count;
                    }  // if
                }  // for
            });  // values.forEach()
            return reducedObject;
        };  // reduceFunction()

        var finalizeFunction = function(key, reducedVal) {
            if (0 == reducedVal.register)
                return reducedVal;
            for (var i = 1; i <= 30; i++) {
                var field = i + "_count";  // 1_count
                var count = reducedVal[field];
                reducedVal[String(i)] = count * 100 / reducedVal.register;
            }
            return reducedVal;
        };  // finalizeFunction

        var endDate = formatDate(new Date());
        db.retention.login.mapReduce(mapFunction, reduceFunction, {
            query: {date: {$gte: startDate, $lt: endDate}},
            out: {reduce: "retention.result"},
            finalize: finalizeFunction,
        });  // mapReduce()
    }  // function calcRetention()
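
The map, reduce and finalize steps of the retention calculation can be tried without a server using a small in-memory stand-in. This is only a sketch: runMapReduce, mapLogin, reduceCounts, finalizePercent and MAX_DAY are hypothetical names, emit is passed as an argument instead of being a global as in MongoDB, and reduce is always called here even for keys with a single value.

```javascript
// Largest day offset to report; kept small for this example.
var MAX_DAY = 3;
var MS_PER_DAY = 24 * 3600 * 1000;

// Map step: one login record becomes one count under its registration date.
function mapLogin(emit) {
    var days = Math.round(
        (Date.parse(this.date + "T00:00:00Z") -
         Date.parse(this.register_date + "T00:00:00Z")) / MS_PER_DAY);
    var value = {date: this.register_date, register: 0};
    value[days + "_count"] = 1;
    emit(this.register_date, value);
}

// Reduce step: sum registrations and per-day counts for one date.
function reduceCounts(key, values) {
    var out = {date: key, register: 0};
    for (var i = 1; i <= MAX_DAY; i++) {
        out[i + "_count"] = 0;
    }
    values.forEach(function (value) {
        out.register += value.register;
        for (var j = 1; j <= MAX_DAY; j++) {
            out[j + "_count"] += value[j + "_count"] || 0;
        }
    });
    return out;
}

// Finalize step: turn counts into percentages of new registrations.
function finalizePercent(key, val) {
    if (val.register === 0) return val;
    for (var i = 1; i <= MAX_DAY; i++) {
        val[String(i)] = val[i + "_count"] * 100 / val.register;
    }
    return val;
}

// Stand-in for mapReduce with out: {reduce: ...}: existing results are
// reduced together with the newly emitted values.
function runMapReduce(docs, existing, mapFn, reduceFn, finalizeFn) {
    var groups = {};
    Object.keys(existing).forEach(function (key) {
        groups[key] = [existing[key]];
    });
    var emit = function (key, value) {
        (groups[key] = groups[key] || []).push(value);
    };
    docs.forEach(function (doc) { mapFn.call(doc, emit); });
    return Object.keys(groups).sort().map(function (key) {
        return {_id: key, value: finalizeFn(key, reduceFn(key, groups[key]))};
    });
}

// Four accounts registered on 2015-09-20 (already counted in the result).
var existing = {"2015-09-20": {date: "2015-09-20", register: 4}};
var logins = [
    {date: "2015-09-21", account_id: "a", register_date: "2015-09-20"},
    {date: "2015-09-21", account_id: "b", register_date: "2015-09-20"},
    {date: "2015-09-22", account_id: "a", register_date: "2015-09-20"},
    {date: "2015-09-23", account_id: "c", register_date: "2015-09-20"}
];
var result = runMapReduce(logins, existing, mapLogin, reduceCounts, finalizePercent);
console.log(JSON.stringify(result[0].value));
```

With four registrations, two day-1 logins give 50 percent, and one login each on day 2 and day 3 gives 25 percent.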



Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
