Implementing Association Rules in R

Source: Internet
Author: User

Original address: http://blog.csdn.net/wolfbloodbj/article/details/8836441


The mining task for association analysis can be decomposed into two steps: first find the frequent itemsets, then generate rules from those frequent itemsets.
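The two steps can be sketched in base R on a toy example. This is only an illustration under stated assumptions: the five transactions, the thresholds `min_supp` and `min_conf`, and the helper `support()` are all made up for this sketch, and only itemsets of size two are considered. It is not how the arules package implements the search.

```r
# Toy transactions (hypothetical data, for illustration only)
trans <- list(
  c("A", "B", "C"),
  c("A", "B"),
  c("A", "B"),
  c("B", "C"),
  c("B")
)

# Support of an itemset: share of transactions that contain every item in it
support <- function(itemset) {
  mean(vapply(trans, function(tr) all(itemset %in% tr), logical(1)))
}

# Step 1: find frequent itemsets (only pairs here, to keep the sketch short)
min_supp <- 0.5
items <- sort(unique(unlist(trans)))
pairs <- combn(items, 2, simplify = FALSE)
pair_supp <- vapply(pairs, support, numeric(1))
frequent_pairs <- pairs[pair_supp >= min_supp]

# Step 2: generate both rules X => Y from each frequent pair,
# then keep the ones whose confidence = support(X, Y) / support(X) is high enough
min_conf <- 0.8
rules <- do.call(rbind, lapply(frequent_pairs, function(p) {
  data.frame(
    lhs = p,
    rhs = rev(p),
    support = support(p),
    confidence = support(p) / c(support(p[1]), support(p[2]))
  )
}))
rules <- rules[rules$confidence >= min_conf, ]
print(rules)
```

In this toy data only the pair {A, B} is frequent, and of its two candidate rules only A => B passes the confidence threshold, because every transaction containing A also contains B.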
############################ Association Analysis Case Practice ############################
Background: in a movie store, a customer can buy many films of different kinds and brands in a single shopping trip (or across different time periods). We want to find useful information to improve the store's sales.
Questions raised: 1. What are individual customers' preferences? That is, having bought one item, which other item (film) are they likely to buy? 2. Is there an obvious way to segment the customers into user groups?
Data used: the dvdtrans.csv file in the csv directory of the rattle package
Data description: the original data contains only two fields (ID, Item): the user ID and the product name.

##### Code start #####
# Load the package
library(arules)
# Load the data
dvdtrans <- read.csv(system.file("csv", "dvdtrans.csv", package = "rattle"))  # for system.file(), see the preliminaries
# Convert the data into transactions, the form that the arules apriori method can process
data <- as(split(dvdtrans$Item, dvdtrans$ID), "transactions")
# Look at data attributes

# Generate association rules with the apriori function
rules <- apriori(data, parameter = list(support = 0.6, conf = 0.8))
# Show the rules with the inspect function
inspect(rules)

##### Code End #####
The example above only gives a first impression. Read on...
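To see what the split() step in the conversion does before the result is coerced to transactions, here is a small base R sketch. The `purchases` table, its IDs and its titles are hypothetical stand-ins for the two-column (ID, Item) layout described above, not the contents of dvdtrans.csv.

```r
# Toy long-format purchase records (hypothetical IDs and titles)
purchases <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 3),
  Item = c("Gladiator", "Patriot", "Gladiator", "Patriot",
           "Sixth Sense", "Sixth Sense"),
  stringsAsFactors = FALSE
)

# split() groups the items by customer ID and returns a named list:
# one character vector (basket) per customer
baskets <- split(purchases$Item, purchases$ID)
print(baskets)

# Number of items in each customer's basket
lengths(baskets)
```

A list of this shape, one vector of item names per ID, is what `as(..., "transactions")` receives in the code above.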
#################### Nutshell

##################################################################

Data used: Titanic

# Look at the data
str(Titanic)

# Transform the table into a data frame
df <- as.data.frame(Titanic)

head(df)

> head(df)
  Class  Sex   Age Survived Freq
1   1st Male Child       No    0
2   2nd Male Child       No    0
3   3rd Male Child       No   35
4  Crew Male Child       No    0


titanic.raw <- NULL

# For each of the first four columns, repeat every value Freq times and bind the result on as a new column.
# Rows with Freq = 0 are repeated zero times, so they are not appended.
for (i in 1:4) {
  titanic.raw <- cbind(titanic.raw, rep(as.character(df[, i]), df$Freq))
}

# The first 35 rows are identical
> titanic.raw[1:36, ]
      [,1]  [,2]     [,3]    [,4]
 [1,] "3rd" "Male"   "Child" "No"
 [2,] "3rd" "Male"   "Child" "No"
 [3,] "3rd" "Male"   "Child" "No"
 [4,] "3rd" "Male"   "Child" "No"
...
[35,] "3rd" "Male"   "Child" "No"
[36,] "3rd" "Female" "Child" "No"


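# Transform to a data frame
# (stringsAsFactors = TRUE keeps the columns as factors, which R >= 4.0 no longer does by default)
titanic.raw <- as.data.frame(titanic.raw, stringsAsFactors = TRUE)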


> head(titanic.raw)
   V1   V2    V3 V4
1 3rd Male Child No
2 3rd Male Child No
3 3rd Male Child No
4 3rd Male Child No
5 3rd Male Child No
6 3rd Male Child No

# Add the column names after generating the data frame
names(titanic.raw) <- names(df)[1:4]
dim(titanic.raw)

summary(titanic.raw)

# After conversion, each row represents one person, so the data can be used for association rule mining. What form was the data in before conversion? (Counts of people cross-classified by class, sex, age and survival.)
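The same expansion from counts to one row per person can also be written without a loop. The sketch below is an alternative, not the method used above: it repeats row indices instead of column values, and the name `titanic.alt` is introduced here only for illustration. It also checks the point made in the comment above, namely that cross-tabulating the expanded rows gives back the original counts.

```r
df <- as.data.frame(Titanic)

# Repeat each row index Freq times; rows with Freq == 0 are repeated zero times
idx <- rep(seq_len(nrow(df)), df$Freq)
titanic.alt <- df[idx, 1:4]
rownames(titanic.alt) <- NULL

nrow(titanic.alt)   # one row per person
head(titanic.alt)

# Cross-tabulating the expanded data recovers the original contingency table
all(table(titanic.alt) == Titanic)
```

Because `as.data.frame(Titanic)` already stores the four variables as factors, the columns of `titanic.alt` stay factors and no separate conversion step is needed.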



With the apriori() function, the default settings are: 1) supp=0.1, which is the minimum support of rules; 2) conf=0.8, which is the minimum confidence of rules; and 3) maxlen=10, which is the maximum length of rules.
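To get a feel for what a minimum support of 0.1 means on this data, the support of each single item can be computed directly from the Titanic counts in base R. This is a hand calculation for illustration only. The variable name `item_support` is made up here, and the labels follow the `Variable=Level` convention that arules uses for items.

```r
df <- as.data.frame(Titanic)
n <- sum(df$Freq)   # total number of people

# Support of each single item: share of people with that value
item_support <- unlist(lapply(names(df)[1:4], function(v) {
  s <- vapply(split(df$Freq, df[[v]]), sum, numeric(1)) / n
  names(s) <- paste(v, names(s), sep = "=")
  s
}))
round(item_support, 3)

# Items that fall below the default minimum support of 0.1
names(item_support)[item_support < 0.1]
```

Only `Age=Child` falls below 0.1, so with the default settings no rule can involve children. This is one reason to lower the support threshold when rules about small groups are of interest.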

library(arules)

rules <- apriori(titanic.raw)  # apriori accepts objects that are not of class transactions and converts them internally

rules  # rules that meet the default thresholds (supp=0.1, conf=0.8); maxlen=10 limits each rule to at most 10 items, not the number of rules returned

summary(rules)

inspect(rules)

quality(rules) <- quality(rules)
inspect(rules)

Translation: A common phenomenon in association rule mining is that many of the resulting rules are not interesting. Since we only care about rules whose right-hand side (rhs) indicates survival, we set rhs=c("Survived=No", "Survived=Yes") in the appearance parameter so that only these two items can appear on the right-hand side of a rule. All other items can appear on the left-hand side (lhs), which is what the default="lhs" setting specifies.
The results above also show that the lhs of the first rule is an empty set. To exclude such rules, use minlen=2. In addition, the progress details printed by the algorithm are suppressed with verbose=FALSE. After mining is complete, the rules are sorted by lift in descending order.

rules.better <- apriori(titanic.raw,
    parameter = list(minlen = 2, supp = 0.005, conf = 0.8),
    appearance = list(rhs = c("Survived=No", "Survived=Yes"), default = "lhs"),
    control = list(verbose = FALSE)
)

# Sort by lift (descending)
rules.sorted <- sort(rules.better, by = "lift")

inspect(rules.sorted)

> inspect(rules.sorted)
  lhs             rhs            support     confidence lift
1 {Class=2nd,
   Age=Child}  => {Survived=Yes} 0.010904134 1.0000000  3.095640
2 {Class=2nd,
   Sex=Female,
...
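The first rule in the output above, {Class=2nd, Age=Child} => {Survived=Yes}, can be checked by hand from the Titanic counts. The sketch below uses only base R and the standard definitions of support, confidence and lift. The variable names are chosen for this illustration.

```r
df <- as.data.frame(Titanic)
n <- sum(df$Freq)   # total number of people

lhs <- df$Class == "2nd" & df$Age == "Child"
rhs <- df$Survived == "Yes"

# support: share of all people matching both sides of the rule
supp_rule <- sum(df$Freq[lhs & rhs]) / n
# confidence: share of people matching the lhs who also match the rhs
conf_rule <- sum(df$Freq[lhs & rhs]) / sum(df$Freq[lhs])
# lift: confidence divided by the overall share of people matching the rhs
lift_rule <- conf_rule / (sum(df$Freq[rhs]) / n)

c(support = supp_rule, confidence = conf_rule, lift = lift_rule)
```

All 24 second-class children survived, so the confidence is 1, and the lift of about 3.1 says that survival is roughly three times as common in this group as among all 2201 people.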
