Level Division of E-commerce Merchants Based on the K-means Clustering Algorithm (with Octave Simulation)

Source: Internet
Author: User

When running an e-commerce channel, there is one thing we always have to do at every key time node (a big promotion, the end of a quarter, and so on): rate the brand pool and update the level of every shop. For example, merchants may be divided into four levels (SKA, KA, ordinary shop, and new shop), and each level receives a different degree of traffic support or a different advertising strategy. Generally speaking, over a given period the evaluation dimensions can include UV, order amount, positive review rate, sales return amount, ad slot CTR, conversion rate, PC traffic, mobile traffic, average order value, and many more. How can we find an algorithm that uses these n dimensions to divide our brands into four levels? Today we discuss the K-means clustering algorithm and apply it to real weekly sales data for 296 brands from one e-commerce channel to divide the brand pool.

The K-means clustering algorithm can be described in the following steps:

1. Randomly select K centroids.

2. Compute the distance from each data point to each of the K centroids, and assign the point to the group of the nearest centroid. For example, if a data point is closest to centroid i, it belongs to group i.

3. Update the centroids: average the coordinates of the data points in each group to obtain that group's new centroid location.

4. Repeat steps 2 and 3 n times.

Here K and n are specified in advance.
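The four steps above can be sketched in a few lines of Python. This is a minimal, stdlib-only illustration, not the author's Octave code; the function names `kmeans` and `squared_distance` and the `seed` parameter are assumptions introduced here. It follows the steps literally (K random data points as starting centroids, a fixed number n of iterations) and, since the steps do not say what to do with an empty group, it simply leaves that group's centroid where it was.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iters, seed=None):
    """Plain K-means following steps 1-4; returns (centroids, labels).

    Labels are 0-based group numbers. Hypothetical helper, not the
    article's code.
    """
    rng = random.Random(seed)
    # Step 1: pick K distinct data points at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iters):  # Step 4: repeat steps 2 and 3 n times.
        # Step 2: assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 3: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty group keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

Because step 1 is random, different seeds can produce different groupings on real data, so it is worth running the clustering a few times and comparing the results.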

To visualize how K-means runs, we take only 2 of the dimensions for the 296 brands: UV and order amount. The main control code is as follows:

    %% ================= Part 1: Load data ====================
    fprintf('Load parameters.\n\n');
    pkg load io;                    % the io package provides xlsread in Octave
    tmp = xlsread('data.xlsx');
    id = tmp(:, 1);                 % column 1: brand id
    X = tmp(:, 2:3);                % columns 2-3: UV and order amount

    %% ================= Part 2: Set parameters ================
    K = 4;
    max_iters = 10;

    %% ================= Part 3: K-means clustering ============
    fprintf('\nRunning K-means clustering on example dataset.\n\n');
    initial_centroids = kMeansInitCentroids(X, K);
    % Run K-means algorithm. The 'true' at the end tells our function to plot
    % the progress of K-means (the plotting code is not part of the
    % runkMeans listing below).
    [centroids, idx] = runkMeans(X, initial_centroids, max_iters, true);
    fprintf('\nK-means done.\n\n');
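For readers following along in Python, the data-loading part of the script (column 1 as the brand id, columns 2 and 3 as the two features) can be mirrored with the standard `csv` module. Python's standard library has no built-in .xlsx reader, so this sketch assumes the spreadsheet has been exported to a headerless CSV; both that file layout and the function name `load_brand_table` are hypothetical.

```python
import csv
import io


def load_brand_table(text):
    """Split rows of (id, UV, order_amount) into ids and a feature matrix.

    Mirrors id = tmp(:, 1); X = tmp(:, 2:3) from the Octave script.
    Assumes a headerless CSV export of the spreadsheet (hypothetical format).
    """
    ids, X = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row]
        ids.append(int(values[0]))  # column 1: brand id
        X.append(values[1:3])       # columns 2-3: the two features
    return ids, X
```

In practice the text would come from reading the exported file, e.g. `open("data.csv").read()`, where the file name is again only an example.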

Core code of the K-means clustering algorithm:

    function [centroids, idx] = runkMeans(X, initial_centroids, ...
                                          max_iters, plot_progress)
      % plot_progress is accepted but unused: no plotting code in this listing
      [m n] = size(X);
      K = size(initial_centroids, 1);
      centroids = initial_centroids;
      previous_centroids = centroids;
      idx = zeros(m, 1);

      % Run K-means
      for i = 1:max_iters
        % Output progress
        fprintf('K-means iteration %d/%d...\n', i, max_iters);
        if exist('OCTAVE_VERSION')
          fflush(stdout);
        end

        % For each example in X, assign it to the closest centroid
        idx = findClosestCentroids(X, centroids);

        % Given the memberships, compute new centroids
        centroids = computeCentroids(X, idx, K);
      end
    end
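runkMeans stores `previous_centroids` but never compares against it, so it always runs all `max_iters` iterations. A common refinement is to stop as soon as the centroids no longer move. The Python sketch below adds that check; the names `run_kmeans` and `tol` are illustrative and not part of the original code, and group labels here are 0-based.

```python
def run_kmeans(points, initial_centroids, max_iters, tol=0.0):
    """Assign/update loop that stops early once the centroids stop moving.

    Returns (centroids, labels, iterations_run). Labels are 0-based.
    Hypothetical helper, not the article's code.
    """

    def dist2(a, b):
        # Squared Euclidean distance, like e*e' in the Octave code
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [list(c) for c in initial_centroids]
    k = len(centroids)
    labels = []
    iterations = 0
    for _ in range(max_iters):
        iterations += 1
        # Assignment step: index of the nearest centroid for every point
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        previous = centroids
        # Update step: mean of each group; an empty group keeps its old centroid
        centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                centroids.append(previous[j])
        # Largest squared movement of any centroid in this iteration
        shift = max(dist2(a, b) for a, b in zip(previous, centroids))
        if shift <= tol:
            break  # converged: further iterations would change nothing
    return centroids, labels, iterations
```

With `tol=0.0` the loop stops only when an iteration leaves every centroid exactly where it was; a small positive tolerance trades a little precision for fewer iterations.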

Finding the closest centroid for each data point:

    function idx = findClosestCentroids(X, centroids)
      K = size(centroids, 1);
      m = size(X, 1);
      idx = zeros(m, 1);
      for i = 1:m
        distance = -1;   % -1 marks "no centroid checked yet"
        index = -1;
        for j = 1:K
          e = X(i,:) - centroids(j,:);
          d_tmp = e * e';  % squared Euclidean distance
          if (distance == -1 || d_tmp < distance)
            distance = d_tmp;
            index = j;
          endif
        endfor
        idx(i) = index;
      endfor
    end
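For comparison, here is the same assignment step in Python. It is a direct transliteration, not a library API: the name `find_closest_centroids` simply echoes the Octave function, indices stay 1-based to match `idx`, and ties go to the lower-numbered centroid because the best distance is replaced only when a strictly smaller one appears.

```python
def find_closest_centroids(points, centroids):
    """Return, for each point, the 1-based index of its nearest centroid."""
    idx = []
    for p in points:
        best_j, best_d = -1, None  # None plays the role of the -1 sentinel
        for j, c in enumerate(centroids, start=1):
            d = sum((a - b) ** 2 for a, b in zip(p, c))  # squared distance
            if best_d is None or d < best_d:  # strict '<': ties keep the lower j
                best_j, best_d = j, d
        idx.append(best_j)
    return idx
```

For example, a point exactly halfway between centroids 1 and 2 is assigned to group 1.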

Recomputing the centroids and initializing them:

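    function centroids = computeCentroids(X, idx, K)
      [m n] = size(X);
      centroids = zeros(K, n);
      num = zeros(K, 1);               % number of points in each group
      for i = 1:m
        c = idx(i);
        centroids(c,:) += X(i,:);      % accumulate coordinates per group
        num(c) += 1;
      endfor
      % Divide each row sum by its count; an empty group gives 0/0 = NaN
      centroids = centroids ./ num;
    end

    function centroids = kMeansInitCentroids(X, K)
      % Randomly reorder the indices of the examples
      randidx = randperm(size(X, 1));
      % Take the first K examples as the initial centroids
      centroids = X(randidx(1:K), :);
    end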
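The update and initialization steps translate to Python as follows. This is a sketch with hypothetical snake_case names, and `idx` is 1-based as in the Octave code. One deliberate difference: where the Octave listing divides 0 by 0 for a group that received no points (producing NaN), this version returns `None` for that group so the caller can decide how to handle it. `random.Random(seed).sample` plays the role of `randperm` followed by taking the first K indices, so the K starting rows are always distinct.

```python
import random


def compute_centroids(points, idx, k):
    """Mean of the points assigned to each group (idx is 1-based).

    Returns None for a group with no points instead of NaN.
    """
    n = len(points[0])
    sums = [[0.0] * n for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, idx):
        for d in range(n):
            sums[c - 1][d] += p[d]  # accumulate coordinates per group
        counts[c - 1] += 1
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else None
        for j in range(k)
    ]


def kmeans_init_centroids(points, k, seed=None):
    """Pick K distinct data points at random as the initial centroids."""
    rng = random.Random(seed)
    return [list(points[i]) for i in rng.sample(range(len(points)), k)]
```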

After 10 iterations, the results of the grouping are as follows:

My local raw data table has about 20 dimensions measuring how each store operates, and K-means can categorize the stores just as easily. The result cannot be visualized, but the principle is identical to the two-dimensional case.
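K-means only returns numbered groups; it says nothing about which group is SKA and which is a new shop, and the article does not state the rule used to name them. One hypothetical rule is to rank the K centroids on a single dimension, such as order amount, and hand out the level names in that order. The sketch below works for centroids of any length, so it applies unchanged to the 20-dimension case; the function name `name_clusters`, the `LEVELS` list and the choice of ranking dimension are all assumptions.

```python
LEVELS = ["SKA", "KA", "ordinary shop", "new shop"]


def name_clusters(centroids, key_dim, levels=LEVELS):
    """Map 0-based cluster indices to level names by ranking centroids.

    Hypothetical rule: the cluster whose centroid is largest on dimension
    key_dim becomes levels[0] (SKA), the next largest levels[1], and so on.
    """
    order = sorted(range(len(centroids)), key=lambda j: centroids[j][key_dim], reverse=True)
    return {j: levels[rank] for rank, j in enumerate(order)}
```

With many dimensions in different units (UV counts versus order amounts), the larger-valued columns dominate the Euclidean distance, so standardizing each column before clustering is a common step; the article does not say whether it did so.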
