Using ML.NET to implement customer value analysis based on the RFM model


RFM model

Among the many customer value analysis models, the RFM model is widely used, especially in retail and enterprise services, where it is regarded as a classic classification method. Its core idea is to start from basic transaction data and, with the help of a suitable clustering algorithm, arrive at an intuitive classification of customers. For start-ups that lack data analysis and machine learning support, it is a simple, easy-to-use approach to customer analysis.

The RFM model has three main indicators:

Recency: the time elapsed since the customer's last purchase

Frequency: how often the customer purchases

Monetary: how much the customer spends

We rate each customer on these three indicators. If each indicator is rated on three levels, there are 27 (3 × 3 × 3) combinations in total. The K-means algorithm reduces these to a specified, limited number of bins (typically 5 classes), and the bin a customer falls into represents that customer's value.

There are, of course, further derived versions of the RFM model; see WIKI: RFM (customer value).

ML.NET and K-means

Since v0.2, ML.NET has provided an implementation of k-means++ clustering. Clustering is the most common kind of unsupervised learning, and it is well suited to classifying customers under the RFM model.
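
To make the clustering step less abstract, here is a minimal sketch of the two steps that textbook K-means alternates between: assigning each point to its nearest centroid, and moving each centroid to the mean of the points assigned to it. It is plain C# for illustration only and is not ML.NET's implementation; the names KMeansSketch, NearestCentroid and UpdateCentroids are made up for this example, and the k-means++ seeding of the initial centroids is not shown.

```csharp
using System.Linq;

public static class KMeansSketch
{
    // Squared Euclidean distance between two points of equal dimension.
    public static float SquaredDistance(float[] a, float[] b)
    {
        float sum = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            float diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Assignment step: zero-based index of the centroid closest to the point.
    public static int NearestCentroid(float[] point, float[][] centroids)
    {
        int best = 0;
        float bestDistance = SquaredDistance(point, centroids[0]);
        for (int c = 1; c < centroids.Length; c++)
        {
            float distance = SquaredDistance(point, centroids[c]);
            if (distance < bestDistance)
            {
                best = c;
                bestDistance = distance;
            }
        }
        return best;
    }

    // Update step: move each centroid to the mean of its assigned points.
    // A centroid with no assigned points is left where it was.
    public static float[][] UpdateCentroids(float[][] points, int[] assignment, float[][] centroids)
    {
        var updated = new float[centroids.Length][];
        for (int c = 0; c < centroids.Length; c++)
        {
            var members = points.Where((p, i) => assignment[i] == c).ToArray();
            if (members.Length == 0)
            {
                updated[c] = (float[])centroids[c].Clone();
                continue;
            }
            updated[c] = new float[centroids[c].Length];
            for (int d = 0; d < updated[c].Length; d++)
            {
                updated[c][d] = members.Average(m => m[d]);
            }
        }
        return updated;
    }
}
```

In the RFM setting each point would be a customer's (R, M, F) score vector, and repeating the two steps until the assignments stop changing yields the k customer groups.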

Hands-on practice

Basic requirements
    • Visual Studio 2017 or Visual Studio Code
    • .NET Core 2.0+
    • ML.NET v0.3

Data sources

The data for this case comes from the UCI Online Retail data set, a transnational data set that contains all the transactions of a UK-based and registered non-store online retailer between December 1, 2010 and December 9, 2011. The company mainly sells unique all-occasion gifts, and many of its customers are wholesalers.

Attribute information:

InvoiceNo: Invoice number. Nominal, a 6-digit integer uniquely assigned to each transaction. If this code starts with the letter 'C', it indicates a cancellation.

StockCode: Product (item) code. Nominal, a 5-digit integer uniquely assigned to each distinct product.

Description: Product (item) name. Nominal.

Quantity: The quantity of each product (item) per transaction. Numeric.

InvoiceDate: Invoice date and time. Numeric, the day and time when each transaction was generated.

UnitPrice: Unit price. Numeric, the product price per unit in pounds sterling.

CustomerID: Customer number. Nominal, a 5-digit integer uniquely assigned to each customer.

Country: Country name. Nominal, the name of the country/region where each customer resides.

Data processing
    1. In Excel, add four fields to the original data: Amount (the unit price multiplied by the quantity), Date (the integer value of InvoiceDate), Today (the integer value of the current date), and DateDiff (the difference between Today and Date).

    2. Build a pivot table to get, for each customer, the sum of Amount and the maximum and minimum of DateDiff, then calculate the frequency value with the formula Amount / (max DateDiff - min DateDiff + 1).

    3. Calculate the RFM scores by the following rules:
      • R: the value (max DateDiff - min DateDiff - 2000): less than 480 scores 3 points, 480-570 scores 2 points, 570-750 scores 1 point, and greater than 750 scores 0 points.
      • F: the frequency value: greater than 1000 scores 5 points, 500-1000 scores 4 points, 100-500 scores 3 points, 50-100 scores 2 points, 0-50 scores 1 point, and less than 0 scores 0 points.
      • M: the sum of Amount: greater than 10000 scores 5 points, 5000-10000 scores 4 points, 2000-5000 scores 3 points, 1000-2000 scores 2 points, 0-1000 scores 1 point, and less than 0 scores 0 points.

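Some readers may wonder why the values are divided this way. The idea is simply to segment the data into reasonable bands that follow its distribution: to reduce the effect of imbalance in the data source on machine learning, we try to keep the distribution of the data as natural as possible.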
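
The article does these steps in Excel, but the same pivot and scoring can be sketched in C#. The block below is only an illustration under stated assumptions: the types InvoiceLine, CustomerSummary and RfmScoring are hypothetical, the input to ScoreR is taken to be the already-computed value described in the R rule, and the treatment of exact boundary values (480, 500, 1000 and so on) is a guess, since the rules do not say which band a boundary belongs to.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: one invoice line reduced to the fields used here.
// DateDiff = Today - Date, both as integer day numbers.
public sealed class InvoiceLine
{
    public string CustomerId;
    public float Quantity;
    public float UnitPrice;
    public int DateDiff;
}

// Hypothetical type: the per-customer pivot result.
public sealed class CustomerSummary
{
    public string CustomerId;
    public float Amount;      // sum of UnitPrice * Quantity
    public int MinDateDiff;
    public int MaxDateDiff;
    public float MeanAmount;  // the "frequency value" of step 2
}

public static class RfmScoring
{
    // Step 2: group the invoice lines by customer and aggregate.
    public static List<CustomerSummary> Summarize(IEnumerable<InvoiceLine> lines)
    {
        return lines
            .GroupBy(l => l.CustomerId)
            .Select(g =>
            {
                float amount = g.Sum(l => l.UnitPrice * l.Quantity);
                int min = g.Min(l => l.DateDiff);
                int max = g.Max(l => l.DateDiff);
                return new CustomerSummary
                {
                    CustomerId = g.Key,
                    Amount = amount,
                    MinDateDiff = min,
                    MaxDateDiff = max,
                    MeanAmount = amount / (max - min + 1)
                };
            })
            .ToList();
    }

    // Step 3, R rule. Boundary handling is an assumption.
    public static int ScoreR(float relativeDateDiff)
    {
        if (relativeDateDiff < 480) return 3;
        if (relativeDateDiff < 570) return 2;
        if (relativeDateDiff <= 750) return 1;
        return 0;
    }

    // Step 3, F rule. Boundary handling is an assumption.
    public static int ScoreF(float frequency)
    {
        if (frequency > 1000) return 5;
        if (frequency > 500) return 4;
        if (frequency > 100) return 3;
        if (frequency > 50) return 2;
        if (frequency >= 0) return 1;
        return 0;
    }

    // Step 3, M rule. Boundary handling is an assumption.
    public static int ScoreM(float amount)
    {
        if (amount > 10000) return 5;
        if (amount > 5000) return 4;
        if (amount > 2000) return 3;
        if (amount > 1000) return 2;
        if (amount >= 0) return 1;
        return 0;
    }
}
```

For example, a customer with two invoice lines (2 × 5.00 and 1 × 10.00) whose DateDiff values are 2500 and 2509 gets Amount = 20 and a frequency value of 20 / (2509 - 2500 + 1) = 2, which scores M = 1 and F = 1 under these rules.
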
Coding section

As usual, create a .NET Core console application and add a reference to ML.NET through NuGet.

    • Create a data structure for learning
// Output of the clustering model for one customer.
public class ClusteringPrediction
{
    [ColumnName("PredictedLabel")]
    public uint SelectedClusterId;

    [ColumnName("Score")]
    public float[] Distance;
}

// One row of the prepared CSV file; each ordinal is the column position.
public class ClusteringData
{
    [Column(ordinal: "0")]
    public string CustomId;

    [Column(ordinal: "1")]
    public float Amount;

    [Column(ordinal: "2")]
    public float MinDataDiff;

    [Column(ordinal: "3")]
    public float MaxDataDiff;

    [Column(ordinal: "4")]
    public float MeanAmount;

    [Column(ordinal: "5")]
    public float M;

    [Column(ordinal: "6")]
    public float F;

    [Column(ordinal: "7")]
    public float RelativaDataDiff;

    [Column(ordinal: "8")]
    public float R;
}
    • Training section
static PredictionModel<ClusteringData, ClusteringPrediction> Train()
{
    int n = 1000;
    int k = 5; // number of clusters

    var textLoader = new Microsoft.ML.Data.TextLoader(DataPath)
        .CreateFrom<ClusteringData>(useHeader: true, separator: ',', trimWhitespace: false);

    var pipeline = new LearningPipeline();
    pipeline.Add(textLoader);
    // Cluster on the three RFM scores only.
    pipeline.Add(new ColumnConcatenator("Features", "R", "M", "F"));
    pipeline.Add(new KMeansPlusPlusClusterer() { K = k });

    var model = pipeline.Train<ClusteringData, ClusteringPrediction>();
    return model;
}
    • Evaluation section
static void Evaluate(PredictionModel<ClusteringData, ClusteringPrediction> model)
{
    var textLoader = new Microsoft.ML.Data.TextLoader(DataPath)
        .CreateFrom<ClusteringData>(useHeader: true, separator: ',', trimWhitespace: false);

    var evaluator = new ClusterEvaluator();
    var metrics = evaluator.Evaluate(model, textLoader);

    Console.WriteLine("AvgMinScore:{0}", metrics.AvgMinScore);
    Console.WriteLine("Dbi:{0}", metrics.Dbi);
    Console.WriteLine("Nmi:{0}", metrics.Nmi);
}
    • Prediction section
static void Predict(PredictionModel<ClusteringData, ClusteringPrediction> model)
{
    // A test customer described only by its RFM scores.
    var predictedData = new ClusteringData()
    {
        R = 3F,
        M = 3F,
        F = 1F
    };

    var predictedResult = model.Predict(predictedData);
    Console.WriteLine("the predicted cluster id is: {0}", predictedResult.SelectedClusterId);
}
    • Calling section
static void Main(string[] args)
{
    var model = Train();
    Evaluate(model);
    Predict(model);
}
    • Run results

As you can see, the customer I used for testing was assigned to the 2nd class.

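Although the clustering itself is done, the five learned classes still have to be interpreted. You need to run a prediction for every record in the original data set to get its class, and only then, from the relationship between customers' RFM scores and their classes, can you explain what each class means.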
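
One way to do that interpretation is to collect the predicted class of every customer and then summarize the RFM scores per class. The following is a rough sketch rather than part of the original program: ScoredCustomer, ClusterProfile and ClusterProfiler are hypothetical names, and ClusterId is assumed to hold the SelectedClusterId obtained by calling model.Predict for each customer.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: a customer's RFM scores plus the predicted class.
public sealed class ScoredCustomer
{
    public string CustomerId;
    public float R, F, M;
    public uint ClusterId; // assumed to come from model.Predict(...).SelectedClusterId
}

// Hypothetical type: the average scores of one class.
public sealed class ClusterProfile
{
    public uint ClusterId;
    public int Count;
    public float MeanR, MeanF, MeanM;
}

public static class ClusterProfiler
{
    // Group customers by predicted class and average their R, F and M scores,
    // returning the profiles ordered by class id.
    public static List<ClusterProfile> Profile(IEnumerable<ScoredCustomer> customers)
    {
        return customers
            .GroupBy(c => c.ClusterId)
            .OrderBy(g => g.Key)
            .Select(g => new ClusterProfile
            {
                ClusterId = g.Key,
                Count = g.Count(),
                MeanR = g.Average(c => c.R),
                MeanF = g.Average(c => c.F),
                MeanM = g.Average(c => c.M)
            })
            .ToList();
    }
}
```

A class whose members average high on R, F and M can then be read as high-value customers, while a class with low averages points to customers who have lapsed or spend little.
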
End

This simple case shows how to use ML.NET for clustering. For those who are starting their own business and want to do some low-threshold customer analysis, ML.NET is a good choice. Of course, ML.NET is still evolving, so keep an eye on its new feature releases.

The complete code is as follows:

using Microsoft.ML;
using Microsoft.ML.Models;
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using System;

namespace RMFClusters
{
    class Program
    {
        const string DataPath = @".\data\Online Retail.csv";

        public class ClusteringPrediction
        {
            [ColumnName("PredictedLabel")]
            public uint SelectedClusterId;

            [ColumnName("Score")]
            public float[] Distance;
        }

        public class ClusteringData
        {
            [Column(ordinal: "0")]
            public string CustomId;

            [Column(ordinal: "1")]
            public float Amount;

            [Column(ordinal: "2")]
            public float MinDataDiff;

            [Column(ordinal: "3")]
            public float MaxDataDiff;

            [Column(ordinal: "4")]
            public float MeanAmount;

            [Column(ordinal: "5")]
            public float M;

            [Column(ordinal: "6")]
            public float F;

            [Column(ordinal: "7")]
            public float RelativaDataDiff;

            [Column(ordinal: "8")]
            public float R;
        }

        static PredictionModel<ClusteringData, ClusteringPrediction> Train()
        {
            int n = 1000;
            int k = 5;

            var textLoader = new Microsoft.ML.Data.TextLoader(DataPath)
                .CreateFrom<ClusteringData>(useHeader: true, separator: ',', trimWhitespace: false);

            var pipeline = new LearningPipeline();
            pipeline.Add(textLoader);
            pipeline.Add(new ColumnConcatenator("Features", "R", "M", "F"));
            pipeline.Add(new KMeansPlusPlusClusterer() { K = k });

            var model = pipeline.Train<ClusteringData, ClusteringPrediction>();
            return model;
        }

        static void Evaluate(PredictionModel<ClusteringData, ClusteringPrediction> model)
        {
            var textLoader = new Microsoft.ML.Data.TextLoader(DataPath)
                .CreateFrom<ClusteringData>(useHeader: true, separator: ',', trimWhitespace: false);

            var evaluator = new ClusterEvaluator();
            var metrics = evaluator.Evaluate(model, textLoader);

            Console.WriteLine("AvgMinScore:{0}", metrics.AvgMinScore);
            Console.WriteLine("Dbi:{0}", metrics.Dbi);
            Console.WriteLine("Nmi:{0}", metrics.Nmi);
        }

        static void Predict(PredictionModel<ClusteringData, ClusteringPrediction> model)
        {
            var predictedData = new ClusteringData()
            {
                R = 3F,
                M = 5F,
                F = 1F
            };

            var predictedResult = model.Predict(predictedData);
            Console.WriteLine("the predicted cluster id is: {0}", predictedResult.SelectedClusterId);
        }

        static void Main(string[] args)
        {
            var model = Train();
            Evaluate(model);
            Predict(model);
        }
    }
}
