Python Implementation of the Apriori Algorithm
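The Apriori algorithm is the pioneer of frequent pattern mining in data mining and has been popular since it was introduced in the 1990s. Its idea is simple and straightforward. First, mine the frequent patterns of length 1. Then, starting with k = 2, merge the frequent patterns of length k-1 into candidate patterns of length k, count how often each candidate occurs, and keep only those that reach the minimum frequency and whose subsets of length k-1 are all frequent. Repeat with k + 1 until no new frequent pattern is found. Note that, to avoid duplicates, two patterns are merged only when their first k-2 characters are identical and the (k-1)-th character of one is less than that of the other.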
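The merge rule and the subset check described above can be sketched in isolation. This is only an illustrative sketch: the function name `generate_candidates` and the representation of itemsets as sorted tuples are assumptions made for this example, not part of the original program.

```python
from itertools import combinations


def generate_candidates(frequent):
    """Join frequent (k-1)-itemsets (sorted tuples) into pruned k-candidates."""
    frequent = set(frequent)
    candidates = set()
    for a in frequent:
        for b in frequent:
            # Join step: same first k-2 items, and a's last item sorts first.
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                candidate = a + (b[-1],)
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(s in frequent for s in combinations(candidate, len(a))):
                    candidates.add(candidate)
    return candidates


# Hypothetical frequent 2-itemsets; only ('2', '3') and ('2', '5') share a prefix.
L2 = {('1', '3'), ('2', '3'), ('2', '5'), ('3', '5')}
print(generate_candidates(L2))  # {('2', '3', '5')}
```

Because each pair is joined in one direction only (smaller last item first), every candidate is generated exactly once. The candidates still have to be counted against the transactions before they are accepted as frequent.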
Below is a Python implementation of the algorithm:
__author__ = 'linfuyuan'

# Ported to Python 3 (input/print). Itemsets are stored as strings of sorted
# single-character items, so every item in the file must be one character long.
min_frequency = int(input('please input min_frequency:'))
file_name = input('please input the transaction file:')
transactions = []


def has_infrequent_subset(candidate, Lk):
    # Return True if some (k-1)-subset of candidate is missing from Lk.
    for i in range(len(candidate)):
        subset = candidate[:i] + candidate[i + 1:]
        if ''.join(sorted(subset)) not in Lk:
            return True
    return False


def countFrequency(candidate, transactions):
    # Count the transactions that contain every item of candidate.
    count = 0
    for transaction in transactions:
        if transaction.issuperset(candidate):
            count += 1
    return count


# Read one comma-separated transaction per line, skipping blank lines.
with open(file_name) as f:
    for line in f:
        line = line.strip()
        if line:
            transactions.append(set(line.split(',')))

# Count single items to build the frequent patterns of length 1.
currentFrequencySet = {}
for transaction in transactions:
    for item in transaction:
        currentFrequencySet[item] = currentFrequencySet.get(item, 0) + 1

Lk = set()
for (itemset, count) in currentFrequencySet.items():
    if count >= min_frequency:
        Lk.add(itemset)
print(', '.join(Lk))

# Merge frequent patterns level by level until no new pattern survives.
while len(Lk) > 0:
    newLk = set()
    for itemset1 in Lk:
        for itemset2 in Lk:
            # Merge only if all but the last character match and the last
            # character of itemset1 is less than that of itemset2.
            if itemset1[:-1] == itemset2[:-1] and itemset1[-1] < itemset2[-1]:
                newitemset = list(itemset1) + [itemset2[-1]]
                if (not has_infrequent_subset(newitemset, Lk)
                        and countFrequency(newitemset, transactions) >= min_frequency):
                    newLk.add(''.join(newitemset))
    print(', '.join(newLk))
    Lk = newLk
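Code implementing the basic Apriori algorithm in C
The key to implementing the Apriori algorithm is building its mathematical model. When I wrote it as a homework assignment, I designed the data structures as follows: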
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#define ITEM_NAME_LENGTH 20
#define MIN_SUPPORT 2
// Itemset structure: a linked list of item names
struct ITEMSET
{
    char itemName[ITEM_NAME_LENGTH];
    struct ITEMSET *next;
};
// Database structure: a linked list of transactions
struct TRANSACTION
{
    unsigned int tranID;
    struct ITEMSET *itemPoint;
    struct TRANSACTION *next;
};
// Large (frequent) itemset structure: an itemset with its support count
struct BIGITEMSET
{
    struct ITEMSET *itemPoint;
    unsigned int count;
    struct BIGITEMSET *next;
};
// The sample database (four transactions)
char *tran1[3]={"1","3","4"};
char *tran2[3]={"2","3","5"};
char *tran3[4]={"1","2","3","5"};
char *tran4[2]={"2","5"};
// Variable declarations
struct TRANSACTION *tranHead;
struct BIGITEMSET *bigHead;
struct BIGITEMSET *test;
struct BIGITEMSET *subSetHeadC1,*subSetHeadC2;
Once you truly understand the algorithm, writing the program is not hard.
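To make the sample data concrete, here is a minimal Python sketch that runs a level-wise Apriori pass over the same four transactions with the same minimum support of 2. It is not a translation of the C program, which is only shown as declarations. The function name `apriori`, the use of Python sets for transactions, and the sorted-tuple itemsets are assumptions made for this illustration.

```python
from itertools import combinations

# Same four transactions and minimum support as the C declarations.
TRANSACTIONS = [
    {"1", "3", "4"},
    {"2", "3", "5"},
    {"1", "2", "3", "5"},
    {"2", "5"},
]
MIN_SUPPORT = 2


def apriori(transactions, min_support):
    """Return a dict mapping each frequent itemset (sorted tuple) to its count."""
    def support(itemset):
        return sum(1 for t in transactions if t.issuperset(itemset))

    # Level 1: frequent single items.
    items = sorted({item for t in transactions for item in t})
    level = {(i,): support((i,)) for i in items}
    level = {s: c for s, c in level.items() if c >= min_support}

    result = {}
    while level:
        result.update(level)
        next_level = {}
        for a in level:
            for b in level:
                # Join itemsets that differ only in their last item.
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    cand = a + (b[-1],)
                    # Keep the candidate only if all its subsets are frequent.
                    if all(s in level for s in combinations(cand, len(a))):
                        count = support(cand)
                        if count >= min_support:
                            next_level[cand] = count
        level = next_level
    return result


frequent = apriori(TRANSACTIONS, MIN_SUPPORT)
for itemset in sorted(frequent, key=lambda s: (len(s), s)):
    print(itemset, frequent[itemset])
```

On this data, item 4 appears in only one transaction and is dropped at the first level, and the only frequent 3-itemset is ('2', '3', '5') with a count of 2.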
What program is used to implement the Apriori algorithm?
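You presumably mean which language. Even so, the question is not quite right: since it is an algorithm, it can be implemented in any language.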