In e-commerce channel operations, at every key point in time (a big promotion, the end of a quarter, and so on) we have to do one thing: rate the brand pool and update the level of every shop. For example, merchants are divided into 4 levels: SKA, KA, ordinary shop, and new shop. Merchants at different levels receive different degrees of traffic support or different advertising strategies. Generally speaking, over a given period the evaluation dimensions can include UV, booking amount, positive review rate, return amount, ad slot CTR, conversion rate, PC traffic, mobile traffic, average order value, and many more. How can we find an algorithm that divides our brands into 4 levels across these n dimensions? Today we discuss the K-means clustering algorithm and use one week of real sales data for 296 brands from an e-commerce channel to divide the brand pool.
First, the K-means clustering algorithm can be described in the following steps:
1. Randomly select K centroids.
2. For each data point, calculate its distance to each of the K centroids and assign the point to the group of the centroid with the smallest distance. For example, if a data point is closest to the i-th centroid, it belongs to the i-th group.
3. Update the centroids: average the coordinates of the data points in each group to obtain the new location of that group's centroid.
4. Repeat steps 2 and 3 n times.
Here K and n are specified in advance.
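As a minimal sketch of steps 2 and 3, the snippet below runs a single iteration on four made-up 2-D points with K = 2. The data values and variable names are illustrative assumptions and are not taken from the brand dataset.

```octave
% Toy data (hypothetical): four points forming two obvious groups
X = [1 1; 1 2; 8 8; 9 8];
centroids = [1 1; 9 8];   % step 1: pretend these were picked at random
K = size(centroids, 1);

% Step 2: assign each point to its nearest centroid
% (squared Euclidean distance; the square root does not change the ordering)
m = size(X, 1);
idx = zeros(m, 1);
for i = 1:m
    diffs = centroids - X(i, :);            % K x 2 differences via broadcasting
    [~, idx(i)] = min(sum(diffs .^ 2, 2));  % index of the closest centroid
end

% Step 3: move each centroid to the mean of the points assigned to it
for k = 1:K
    centroids(k, :) = mean(X(idx == k, :), 1);
end

disp(idx');       % group of each point: 1 1 2 2
disp(centroids);  % new centroids: (1, 1.5) and (8.5, 8)
```

After this one pass the first two points share group 1 and the last two share group 2, and each centroid has moved to the middle of its group. Step 4 simply repeats the same two blocks.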
To visualize how K-means runs, we use only 2 of the dimensions for the 296 brands: UV and booking amount. The main control script is as follows:
%% ================= Part 1: Load data ====================
fprintf('Load parameters.\n\n');
pkg load io;
tmp = xlsread('data.xlsx');
ID = tmp(:, 1);
X = tmp(:, 2:3);

%% =================== Part 2: Set parameters ======================
K = 4;
max_iters = 10;

%% =================== Part 3: K-means clustering ======================
fprintf('\nRunning K-means clustering on example dataset.\n\n');
initial_centroids = kMeansInitCentroids(X, K);
% Run K-means algorithm. The 'true' at the end tells our function to plot
% the progress of K-means
[centroids, idx] = runkMeans(X, initial_centroids, max_iters, true);
fprintf('\nK-means done.\n\n');
Core code of the K-means clustering algorithm:
function [centroids, idx] = runkMeans(X, initial_centroids, ...
                                      max_iters, plot_progress)
% plot_progress is accepted, but the plotting code is not part of this listing
[m n] = size(X);
K = size(initial_centroids, 1);
centroids = initial_centroids;
previous_centroids = centroids;
idx = zeros(m, 1);

% Run K-means
for i = 1:max_iters
    % Output progress
    fprintf('K-means iteration %d/%d...\n', i, max_iters);
    if exist('OCTAVE_VERSION')
        fflush(stdout);
    end

    % For each example in X, assign it to the closest centroid
    idx = findClosestCentroids(X, centroids);

    % Given the memberships, compute new centroids
    centroids = computeCentroids(X, idx, K);
end
end
Code for selecting the nearest centroid:
function idx = findClosestCentroids(X, centroids)
K = size(centroids, 1);
idx = zeros(size(X, 1), 1);
m = size(X, 1);
for i = 1:m
    distance = -1;   % -1 marks "no distance computed yet"
    index = -1;
    for j = 1:K
        e = X(i, :) - centroids(j, :);
        d_tmp = e * e';   % squared Euclidean distance
        if (distance == -1)
            distance = d_tmp;
            index = j;
        elseif (d_tmp < distance)
            distance = d_tmp;
            index = j;
        endif
    endfor
    idx(i) = index;
endfor
end
Code for recalculating the centroids and for initializing the centroids:
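function centroids = computeCentroids(X, idx, K)
[m n] = size(X);
centroids = zeros(K, n);
num = zeros(K, 1);
for i = 1:m
    c = idx(i, :);
    centroids(c, :) += X(i, :);   % sum of the points in group c
    num(c, :)++;                  % number of points in group c
endfor
% Divide each sum by its count; a group with no points yields NaN (0/0)
centroids = centroids ./ num;
end

function centroids = kMeansInitCentroids(X, K)
centroids = zeros(K, size(X, 2));
% Pick K distinct rows of X at random as the initial centroids
randidx = randperm(size(X, 1));
centroids = X(randidx(1:K), :);
end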
After 10 iterations, the results of the grouping are as follows:
My local raw data table has about 20 dimensions that measure how each store is operating. The K-means clustering algorithm can classify the stores on all of them just as easily; the result can no longer be visualized, but the principle is identical to the two-dimensional case.
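When the dimensions are measured in very different units (UV counts versus rates between 0 and 1, for instance), squared Euclidean distance is dominated by the columns with the largest values. One common precaution, which this post does not describe and which is shown here only as an assumption, is to standardize each column before clustering. The toy matrix and variable names below are hypothetical.

```octave
% Hypothetical data with 3 metrics per shop: UV, booking amount, conversion rate
X = [1000  50000 0.02;
     2000  80000 0.04;
     3000 110000 0.06];

mu = mean(X, 1);             % per-column mean
sigma = std(X, 0, 1);        % per-column sample standard deviation
sigma(sigma == 0) = 1;       % avoid dividing by zero for constant columns
X_norm = (X - mu) ./ sigma;  % each column now has mean 0 and standard deviation 1

disp(X_norm);
% X_norm can be used in place of X in the clustering steps; the distance
% e * e' works unchanged for any number of columns.
```

Whether to standardize is a modeling choice: it gives every metric equal weight, which may or may not match how the business ranks shops.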
Level division of e-commerce merchants based on the K-means clustering algorithm (including Octave simulation)