[R Language] Association Rules 2: Considering the Strict Temporal Order Between Items

Source: Internet
Author: User

The previous article, Association Rules 1, did not take into account the order in which a user purchased items. In some cases, however, items are bought in a strict sequence. In some casual games, for example, a user must purchase prop A before prop B can be bought, and props A and B can each be purchased only once; in other words, having purchased prop A is a prerequisite (a necessary condition) for purchasing prop B. If users who have bought prop A usually go on to buy prop B, then mining that ignores the order will produce an association rule such as "B → A". This gives the operations team the following conclusion: "Users who buy prop B are also very likely to buy prop A, so prop A should be recommended to users when they buy prop B." From the data's point of view there is nothing wrong with this, but from a business point of view it is completely wrong: anyone who bought prop B must already have bought prop A, and because each prop can be bought only once, recommending prop A again is useless.
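To make the problem concrete, here is a minimal base-R sketch. The purchase paths in `paths` are made up for illustration (they are not the article's data set), and they assume prop B can only be bought after prop A. Ignoring order, the rule "B → A" gets a confidence of 1, even though A always comes first and recommending it to B buyers is pointless.

```r
# Hypothetical purchase paths (illustration only): B can be bought only after A,
# and each prop at most once.
paths <- list(
  u1 = c("A", "B"),
  u2 = c("A"),
  u3 = c("A", "B"),
  u4 = c("A", "C")
)

bought_A <- sapply(paths, function(p) "A" %in% p)
bought_B <- sapply(paths, function(p) "B" %in% p)

# Confidence of "B -> A" when purchase order is ignored
conf_B_to_A <- sum(bought_A & bought_B) / sum(bought_B)

# For users who bought both, check whether A was bought before B
a_before_b <- sapply(paths[bought_A & bought_B],
                     function(p) match("A", p) < match("B", p))

print(conf_B_to_A)
print(a_before_b)
```

The rule looks strong on paper, but the order check shows that every B buyer already owned A at the time of purchase.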

Against this background, this article describes how to take the strict temporal order between items into account when analyzing users' prop purchase paths and mining association rules. (The code and data set required for this article can be downloaded here.)

This article focuses on the R implementation and the visualization of association rules and does not explain the theory behind them. For the theory, see the Baidu Encyclopedia entry on association rules and the Wikipedia entries on the Apriori algorithm and association rule learning.

Table of Contents

0. Create a data set for purchase records
1. Convert the purchase records to a 0-1 matrix
2. Get the item purchase path for each user
3. Execute the Apriori algorithm and remove redundant rules
4. Visualization of association rules

0. Create a data set for purchase records

The following creates a data set of 10,000 (1W) purchase records, one row per purchase. The columns are: user ID (uid), item name (pname), amount paid (amount), and time of purchase (time).

The style of the data is as follows:

The code that creates the simulated data set is explained in detail in the previous article, so only the code itself is posted here:

    rm(list = ls())
    setwd("E:/cnblogs")

    # Create a data set of 10,000 purchase records.
    # Columns: user ID (uid), item name (pname), amount paid (amount), time of purchase (time)

    # Sample 10,000 values with replacement from 10000000 to 10002000 as user IDs
    uid <- sample(10000000:10002000, 10000, replace = TRUE)

    # Restrict the dates to 2016-04-01 10:01:01 through 2016-04-08 10:01:01
    start_time <- as.numeric(as.POSIXct("2016/04/01 10:01:01", format = "%Y/%m/%d %H:%M:%S"))
    end_time <- as.numeric(as.POSIXct("2016/04/08 10:01:01", format = "%Y/%m/%d %H:%M:%S"))
    time <- sample(start_time:end_time, 10000, replace = TRUE)

    # Combine the two into a single data frame, orders
    orders <- data.frame(uid, time)
    head(orders)

    # Use P1 to P20 as the names of the purchased items
    pname_list <- c(1:20)
    for (i in 1:20) {
      pname_list[i] <- paste('P', i, sep = "")
    }

    # Randomly assign the item names to the 10,000 rows
    orders$pname <- 'P1'
    for (i in 1:20) {
      orders[sample(1:nrow(orders), 1000, replace = TRUE), 'pname'] <- pname_list[i]
    }
    orders$pname <- as.factor(orders$pname)

    # Randomly assign the amount paid (1 to 50) to the 10,000 rows
    orders$amount <- 10
    for (i in 1:50) {
      orders[sample(1:nrow(orders), 1000, replace = TRUE), 'amount'] <- i
    }
    head(orders)
    summary(orders)

    # Write the data set to a local file
    write.table(orders, 'orders_test.txt', sep = '\t', row.names = FALSE, col.names = TRUE)

1. Convert the purchase records to a 0-1 matrix

The above is only the first step: creating the data set. The second step is to convert the purchase records to a 0-1 matrix in which each row represents a user, each column represents an item, and a 1 indicates that the user purchased that item.
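As a rough illustration of what such a 0-1 matrix looks like, the following base-R sketch builds one from a handful of made-up records. The data frame `records` and its values are hypothetical; only the column names uid and pname follow the article's data set. This is not the route the article takes (it relies on the arules transactions class later on), just a way to see the target structure.

```r
# Hypothetical purchase records (illustration only)
records <- data.frame(
  uid   = c(1, 1, 2, 2, 2, 3),
  pname = c("P1", "P2", "P1", "P1", "P3", "P2"),
  stringsAsFactors = FALSE
)

# Cross-tabulate users (rows) by items (columns); cells hold purchase counts
counts <- table(records$uid, records$pname)

# Reduce the counts to 0/1 flags: 1 means the user bought the item at least once
purchase_matrix <- (unclass(counts) > 0) * 1
print(purchase_matrix)
```

User 2 bought P1 twice, but the cell is still 1, because the matrix records only whether an item was bought.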

    # Read the data set
    payer <- read.table("orders_test.txt", sep = '\t', header = TRUE)
    head(payer)
    dim(payer)

    # Select uid, pname and time, and sort the items (pname) bought by each
    # user ID by purchase time, earliest first
    library(sqldf)
    payer2 <- sqldf("select uid, pname, time from payer group by uid, pname, time order by uid, time")

    # The data look like this
    head(payer2)

    # The records are now in chronological order, so drop the third column (time)
    payer3 <- payer2[, -3]

    # Convert the user ID to a factor for the split() call that follows
    payer3$uid <- as.factor(payer3$uid)

2. Get the item purchase path for each user

    # Group the prop names (pname) by uid
    trans.list <- split(payer3[, 'pname'], payer3[, 'uid'])

    # This already gives each user's purchase path, although a user may have
    # bought the same prop more than once
    head(trans.list)
    str(trans.list)  # 1,991 user purchase paths in total

    # Check that the purchases are in chronological order
    trans.list['10000003']  # view the props purchased by the user with uid = 10000003

Judging from this check, the data in trans.list are arranged in chronological order.
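The reason the order survives is that split() keeps rows in their existing order within each group, so sorting first is enough. Here is a minimal base-R sketch of the same idea, using order() instead of sqldf. The `records` data frame and its values are made up for illustration.

```r
# Hypothetical records, deliberately out of time order (illustration only)
records <- data.frame(
  uid   = c(2, 1, 1, 2, 1),
  pname = c("P3", "P2", "P1", "P1", "P4"),
  time  = c(50, 30, 10, 20, 40),
  stringsAsFactors = FALSE
)

# Sort by user, then by purchase time (earliest first)
sorted <- records[order(records$uid, records$time), ]

# split() preserves the row order inside each group,
# so each element is that user's purchase path in time order
paths <- split(sorted$pname, sorted$uid)
print(paths)
```

Either approach should give the same paths; the base-R version simply avoids the extra package dependency.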

    # Convert the data to the transactions form that the apriori() function accepts
    library(arules)
    trans <- as(trans.list, 'transactions')

    # Because some users bought the same prop more than once, the following error occurs:
    #   Error in asMethod(object) :
    #     can not coerce list with transactions with duplicated items

    # An extra step is therefore needed: delete the records in payer3 with a
    # duplicated uid and pname (so that the transactions conversion works)
    index <- duplicated(payer3[, c('uid', 'pname')])
    payer6 <- payer3[!index, ]
    trans.list <- split(payer6[, 'pname'], payer6[, 'uid'])
    head(trans.list)  # each user's purchase path with repeated props removed
    str(trans.list)

    # Convert to the transactions form that the apriori() function accepts
    arules <- as(trans.list, 'transactions')
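One detail of the deduplication step is worth seeing in isolation: duplicated() flags only the second and later occurrences, so the first purchase of each prop stays in its original position in the path. The sketch below uses a small made-up data frame (`records` is hypothetical) that is assumed to be sorted by time already.

```r
# Hypothetical records, already in time order; user 1 buys P1 twice (illustration only)
records <- data.frame(
  uid   = c(1, 1, 1, 2, 2),
  pname = c("P1", "P2", "P1", "P3", "P3"),
  stringsAsFactors = FALSE
)

# TRUE marks rows whose (uid, pname) pair has already appeared earlier
dup <- duplicated(records[, c("uid", "pname")])

# Keep only the first purchase of each prop per user
deduped <- records[!dup, ]
paths <- split(deduped$pname, deduped$uid)
print(paths)
```

Because the first occurrence is the one that is kept, the deduplicated path still reflects the order in which each prop was first bought.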
3. Execute the Apriori algorithm and remove redundant rules

    # Run the Apriori algorithm (this part is the same as in the previous
    # article and is not explained again here; see the previous article)
    rules <- apriori(arules, parameter = list(support = 0.01, confidence = 0.5))
    inspect(rules)

    # The rules can be sorted by lift
    sorted_lift <- sort(rules, by = 'lift')
    inspect(sorted_lift)
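For readers who want a feel for what the support and confidence thresholds and the lift column mean, here is a small base-R sketch that computes the three measures by hand for a rule "A → B". The `transactions` list and the helper `has()` are hypothetical and exist only for this illustration; apriori() computes these measures itself.

```r
# Hypothetical transactions (illustration only)
transactions <- list(
  c("A", "B"),
  c("A", "B", "C"),
  c("A"),
  c("B", "C"),
  c("C")
)

# Hypothetical helper: which transactions contain all the given items?
has <- function(items) sapply(transactions, function(tr) all(items %in% tr))

support_AB <- mean(has(c("A", "B")))   # share of transactions containing A and B
support_A  <- mean(has("A"))
support_B  <- mean(has("B"))

confidence_A_B <- support_AB / support_A   # how often B appears when A does
lift_A_B <- confidence_A_B / support_B     # confidence relative to B's base rate

print(c(support = support_AB, confidence = confidence_A_B, lift = lift_A_B))
```

A lift above 1 suggests A and B occur together more often than they would if they were independent.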

    # There are many rules, so redundant ones need to be removed: if rule 2 contains
    # the LHS and RHS of rule 1 (rule 2 is a super-rule of rule 1) and the lift of
    # rule 2 is less than or equal to that of rule 1, rule 2 is redundant with
    # respect to rule 1.

    # Build the subset matrix over all rules. Rows and columns are both the rules
    # and the values are TRUE or FALSE: entry [i, j] is TRUE when rule i is a
    # subset of rule j.
    # Note: newer arules releases may return a sparse matrix here; if the NA
    # assignment that follows fails, try is.subset(rules, rules, sparse = FALSE).
    subset.matrix <- is.subset(rules, rules)

    # Set the diagonal and everything below it to NA, leaving only the upper triangle
    subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA

    # R treats TRUE as 1. Sum each column (ignoring missing values); a sum of 1 or
    # more means the rule in that column contains an earlier rule as a subset,
    # so it should be deleted.
    redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
    which(redundant)

    # Remove the redundant rules
    rules.pruned <- rules[!redundant]
    inspect(rules.pruned)

    # Write the result to a local file
    # write(rules.pruned, "rules_pruned.txt", col.names = NA)
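The upper-triangle trick is easier to follow on a tiny hand-built matrix. The sketch below does not call arules at all. It assumes three made-up rules that are already sorted by lift (highest first), and a logical matrix `m` in which `m[i, j]` is TRUE when rule i is a subset of rule j. Both the rules and the matrix are hypothetical.

```r
# Three hypothetical rules, assumed to be sorted by lift, highest first
rule_names <- c("{A}=>{B}", "{A,C}=>{B}", "{D}=>{E}")

# m[i, j] is TRUE when rule i is a subset of rule j
m <- matrix(FALSE, nrow = 3, ncol = 3, dimnames = list(rule_names, rule_names))
diag(m) <- TRUE                       # every rule is a subset of itself
m["{A}=>{B}", "{A,C}=>{B}"] <- TRUE   # rule 1 is contained in rule 2

# Blank out the diagonal and the lower triangle, so each column is compared
# only against the rules that come before it
m[lower.tri(m, diag = TRUE)] <- NA

# A column sum of 1 or more means an earlier rule is a subset of this rule
redundant <- colSums(m, na.rm = TRUE) >= 1
print(redundant)
print(rule_names[!redundant])
```

Here only "{A,C}=>{B}" is flagged, because the earlier and more general rule "{A}=>{B}" is contained in it.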
4. Visualization of association rules

    # Visualize the association rules (this part is the same as in the previous
    # article and is not explained again here; see the previous article)
    library("arulesViz")

    # Scatter plot of the association rules: plot() draws it directly
    plot(rules)

    # interactive = TRUE makes the scatter plot interactive
    plot(rules, interactive = TRUE)

    # A presentation similar to a bubble chart
    plot(rules, method = "grouped")

    # Rules drawn with arrows and circles: vertices represent itemsets and
    # edges represent the relationships in the rules
    plot(rules.pruned, method = "graph")
