Original address: http://blog.csdn.net/wolfbloodbj/article/details/8836441
The mining task for association analysis can be decomposed into two steps: first find the frequent itemsets, then generate rules from those frequent itemsets.
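The two steps can be sketched in base R without any package. This is only a brute-force illustration on a hypothetical toy data set (the transactions, the thresholds `min_supp` and `min_conf`, and the helper `support()` are all assumptions made for the example); it is not how arules implements Apriori.

```r
# Hypothetical toy transactions (not taken from the article's data)
trans <- list(
  c("A", "B", "C"),
  c("A", "B"),
  c("A", "B"),
  c("A", "C"),
  c("B"),
  c("B")
)

# Support of an itemset: share of transactions that contain all of its items
support <- function(items) {
  mean(sapply(trans, function(tr) all(items %in% tr)))
}

# Step 1: find frequent itemsets (only sizes 1 and 2 here, by brute force)
min_supp <- 0.4
items <- sort(unique(unlist(trans)))
candidates <- c(as.list(items), combn(items, 2, simplify = FALSE))
frequent <- Filter(function(s) support(s) >= min_supp, candidates)

# Step 2: generate rules lhs => rhs from the frequent pairs
min_conf <- 0.7
pairs <- Filter(function(s) length(s) == 2, frequent)
rules <- do.call(rbind, lapply(pairs, function(p) {
  data.frame(
    lhs = p,
    rhs = rev(p),
    support = support(p),
    confidence = c(support(p) / support(p[1]), support(p) / support(p[2]))
  )
}))
rules <- rules[rules$confidence >= min_conf, ]
print(rules)
```

With these toy values only {A, B} is a frequent pair, and of its two candidate rules only A => B reaches the confidence threshold.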
############################ Association analysis: a practical case ############################
Background: In a movie store, a customer may buy several different movies in a single purchase, or across different time periods. We want to find useful information to improve the store's sales.
Questions raised: 1. What are the preferences of individual customers? That is, having bought one product, which other product (movie) is the customer likely to buy? 2. Is there an obvious way to segment the customers into user groups?
Data used: the rattle package, file dvdtrans.csv in its csv directory.
Data description: the original data contains only two fields (ID, Item): the user ID and the product name.
##### Code start #####
# Load the package
library(arules)
# Load the data
dvdtrans <- read.csv(system.file("csv", "dvdtrans.csv", package = "rattle"))  # for system.file(), see the preliminary knowledge
# Convert the data to transactions, the form that the apriori method in arules can process
data <- as(split(dvdtrans$Item, dvdtrans$ID), "transactions")
# Look at data attributes
# Generate association rules with apriori
rules <- apriori(data, parameter = list(support = 0.6, conf = 0.8))
# Display the rules with inspect
inspect(rules)
##### Code end #####
The example above only gives a first impression. Let's go on...
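The key step in the example above is split(), which groups the item column by customer ID into one basket per customer before the coercion to transactions. A minimal base R sketch of that grouping follows; the data frame `toy` and its movie names are hypothetical stand-ins in the same two-column layout, not the contents of dvdtrans.csv.

```r
# Hypothetical toy data in the same two-column layout (ID, Item)
toy <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 3),
  Item = c("Movie A", "Movie B", "Movie A", "Movie B", "Movie C", "Movie C"),
  stringsAsFactors = FALSE
)

# split() returns a named list: one character vector of items per ID
baskets <- split(toy$Item, toy$ID)
str(baskets)
```

A list of this shape, one vector of items per basket, is what the code above passes to as(..., "transactions").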
#################### Nutshell
##################################################################
Data used: Titanic
# Look at the data
str(Titanic)
# Transform the table into a data frame
df <- as.data.frame(Titanic)
head(df)
> head(df)
  Class  Sex   Age Survived Freq
1   1st Male Child       No    0
2   2nd Male Child       No    0
3   3rd Male Child       No   35
4  Crew Male Child       No    0
titanic.raw <- NULL
# Column by column, repeat each row's value Freq times; rows with Freq = 0 therefore add nothing
for (i in 1:4) {
  titanic.raw <- cbind(titanic.raw, rep(as.character(df[, i]), df$Freq))
}
# The first 35 rows are identical
> titanic.raw[1:36, ]
      [,1]  [,2]     [,3]    [,4]
 [1,] "3rd" "Male"   "Child" "No"
 [2,] "3rd" "Male"   "Child" "No"
 [3,] "3rd" "Male"   "Child" "No"
 [4,] "3rd" "Male"   "Child" "No"
...
[35,] "3rd" "Male"   "Child" "No"
[36,] "3rd" "Female" "Child" "No"
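The expansion relies on rep() accepting a vector of counts: each value is repeated as many times as its own frequency, and a count of 0 drops the value. A small sketch on a hypothetical frequency table (`freq` and its values are made up for illustration) shows the loop and, as an alternative that is not the article's method, a row-index version that gives the same rows.

```r
# Hypothetical frequency table: one row per combination, with a count
freq <- data.frame(
  Sex      = c("Male", "Female", "Male"),
  Survived = c("No", "No", "Yes"),
  Freq     = c(2, 0, 3),
  stringsAsFactors = FALSE
)

# Same pattern as the loop above: build the matrix column by column
raw <- NULL
for (i in 1:2) {
  raw <- cbind(raw, rep(freq[, i], freq$Freq))
}
print(raw)

# Alternative: repeat the row indices instead of each column
alt <- freq[rep(seq_len(nrow(freq)), freq$Freq), 1:2]
print(all(raw == as.matrix(alt)))
```

The row with Freq = 0 disappears, and the result has as many rows as the sum of the counts.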
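# Convert to a data frame
# stringsAsFactors = TRUE keeps the columns as factors (the default became FALSE in R 4.0.0)
titanic.raw <- as.data.frame(titanic.raw, stringsAsFactors = TRUE)
> head(titanic.raw)
   V1   V2    V3 V4
1 3rd Male Child No
2 3rd Male Child No
3 3rd Male Child No
4 3rd Male Child No
5 3rd Male Child No
6 3rd Male Child No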
# Add the column names after creating the data frame
names(titanic.raw) <- names(df)[1:4]
dim(titanic.raw)
summary(titanic.raw)
# After the conversion, each row represents one person, so the data can be used for association rule mining. What kind of data was it before the conversion? (Counts of people aggregated by class, sex, age and survival.)
With the apriori() function, the default settings are: 1) supp=0.1, which is the minimum support of rules; 2) conf=0.8, which is the minimum confidence of rules; and 3) maxlen=10, which is the maximum length of rules.
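For reference, the two thresholds can be written out with the usual definitions. The notation is introduced here only for illustration: T is the set of transactions, Z is any itemset, and X => Y is a rule with left-hand side X and right-hand side Y.

```latex
\mathrm{supp}(Z) = \frac{\left|\{\, t \in T : Z \subseteq t \,\}\right|}{|T|},
\qquad
\mathrm{supp}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y),
\qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}
```

Under the defaults, a rule is kept when its support is at least 0.1 and its confidence is at least 0.8.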
library(arules)
rules <- apriori(titanic.raw)  # apriori also accepts objects that are not of class transactions and converts them internally
rules  # mined with the default minimums (supp=0.1, conf=0.8); each rule contains at most 10 items (maxlen=10)
summary(rules)
inspect(rules)
quality(rules) <- quality(rules)
inspect(rules)
Translation: A common phenomenon in association rule mining is that many of the resulting rules are not interesting. Since we only care about rules whose right-hand side (rhs) indicates survival, we set rhs=c("Survived=No", "Survived=Yes") in the parameter appearance, so that only these two items can appear in the rhs of a rule. All other items can appear in the left-hand side (lhs), which is set with default="lhs".
The results above also show that the lhs of the first rule is the empty set. To exclude such rules, set minlen=2. In addition, the details of the mining progress are suppressed with verbose=FALSE. After association rule mining is complete, the rules are sorted by lift, from largest to smallest.
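To see what the sorting key measures, support, confidence and lift can be computed by hand for a single rule from the built-in Titanic table, without arules. The sketch below does this for the rule {Class=2nd, Age=Child} => {Survived=Yes}; the variable names are chosen for the example only.

```r
# Titanic is a built-in 4-dimensional table: Class x Sex x Age x Survived
n      <- sum(Titanic)                            # all people
n_lhs  <- sum(Titanic["2nd", , "Child", ])        # 2nd-class children
n_both <- sum(Titanic["2nd", , "Child", "Yes"])   # ... who survived
n_rhs  <- sum(Titanic[, , , "Yes"])               # all survivors

support    <- n_both / n             # share of all people matching lhs and rhs
confidence <- n_both / n_lhs         # share of lhs matches that also match rhs
lift       <- confidence / (n_rhs / n)  # confidence relative to the base survival rate

print(c(support = support, confidence = confidence, lift = lift))
```

A lift above 1 means the rhs occurs more often among the people matching the lhs than in the data as a whole.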
rules.better <- apriori(titanic.raw,
  parameter = list(minlen = 2, supp = 0.005, conf = 0.8),
  appearance = list(rhs = c("Survived=No", "Survived=Yes"), default = "lhs"),
  control = list(verbose = FALSE)
)
# Sort by lift
rules.sorted <- sort(rules.better, by = "lift")
inspect(rules.sorted)
> inspect(rules.sorted)
  lhs             rhs            support     confidence lift
1 {Class=2nd,
   Age=Child}  => {Survived=Yes} 0.010904134 1.0000000  3.095640
2 {Class=2nd,
Sex=female, &