A Python Implementation of the Apriori Algorithm

Last updated: 2014-11-09 Source: Internet

Uploader: User


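Apriori is the pioneering algorithm for frequent pattern mining in data mining; it was introduced by Agrawal and Srikant in 1994. The idea is simple. First, mine the frequent patterns of length 1. Then, starting with k = 2, merge the frequent patterns of length k-1 into candidate patterns of length k, count how often each candidate occurs, and keep only those candidates that meet the minimum frequency and whose subsets of length k-1 are all frequent as well. Repeat with k+1 until no frequent patterns remain.

To avoid generating the same candidate twice, two patterns are merged only when their first k-2 items are identical and the (k-1)-th item of one sorts before the (k-1)-th item of the other.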
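The join and prune steps described above can be sketched roughly as follows. This is a minimal illustration rather than the author's code: the function name `generate_candidates` and the choice of sorted tuples to represent itemsets are assumptions made for the example.

```python
from itertools import combinations


def generate_candidates(frequent):
    """Build length-k candidates from a set of frequent (k-1)-itemsets.

    Each itemset is a sorted tuple of items (an assumption of this sketch).
    """
    frequent = set(frequent)
    candidates = set()
    for a in frequent:
        for b in frequent:
            # Join step: same first k-2 items, and a's last item sorts before b's.
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                candidate = a + (b[-1],)
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(sub in frequent
                       for sub in combinations(candidate, len(candidate) - 1)):
                    candidates.add(candidate)
    return candidates


pairs = {('1', '3'), ('2', '3'), ('2', '5'), ('3', '5')}
print(generate_candidates(pairs))  # {('2', '3', '5')}
```

Only `('2', '3')` and `('2', '5')` share a prefix here, so they are the only pair that gets joined; the resulting candidate survives pruning because `('3', '5')` is also frequent.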

Below is a Python implementation of the algorithm:

__author__ = 'linfuyuan'

# Python 3. Itemsets are stored as sorted strings, so every item in the
# transaction file must be a single character.
min_frequency = int(input('please input min_frequency:'))
file_name = input('please input the transaction file:')
transactions = []


def has_infrequent_subset(candidate, Lk):
    # Return True if any (k-1)-item subset of the candidate is missing from Lk.
    for i in range(len(candidate)):
        subset = candidate[:i] + candidate[i + 1:]
        if ''.join(subset) not in Lk:
            return True
    return False


def countFrequency(candidate, transactions):
    # Count the transactions that contain every item of the candidate.
    count = 0
    for transaction in transactions:
        if transaction.issuperset(candidate):
            count += 1
    return count


# Read one comma-separated transaction per line, skipping blank lines.
with open(file_name) as f:
    for line in f:
        line = line.strip()
        if line:
            transactions.append(set(line.split(',')))

# Count how many transactions contain each single item.
currentFrequencySet = {}
for transaction in transactions:
    for item in transaction:
        currentFrequencySet[item] = currentFrequencySet.get(item, 0) + 1

# L1: the frequent itemsets of length 1.
Lk = set()
for (itemset, count) in currentFrequencySet.items():
    if count >= min_frequency:
        Lk.add(itemset)
print(', '.join(sorted(Lk)))

while len(Lk) > 0:
    newLk = set()
    for itemset1 in Lk:
        for itemset2 in Lk:
            # Join only if the first k-2 items match and the last item of
            # itemset1 sorts before the last item of itemset2.
            if itemset1[:-1] == itemset2[:-1] and itemset1[-1] < itemset2[-1]:
                newitemset = list(itemset1) + [itemset2[-1]]
                if (not has_infrequent_subset(newitemset, Lk)
                        and countFrequency(newitemset, transactions) >= min_frequency):
                    newLk.add(''.join(newitemset))
    print(', '.join(sorted(newLk)))
    Lk = newLk
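The listing reads one transaction per line, with items separated by commas. The sketch below shows that input format on a made-up four-line file and counts single items in the same spirit. The file contents and the helper name `read_transactions` are illustrative assumptions, and blank lines are skipped here.

```python
import os
import tempfile
from collections import Counter


def read_transactions(path):
    """Parse one comma-separated transaction per line, skipping blank lines."""
    transactions = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                transactions.append(set(line.split(',')))
    return transactions


# Hypothetical sample data: single-character items, because the listing
# joins the items of an itemset into one string.
sample = "a,c,d\nb,c,e\na,b,c,e\nb,e\n"
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)

transactions = read_transactions(tmp.name)
os.remove(tmp.name)

# Count how many transactions contain each item.
item_counts = Counter(item for t in transactions for item in t)
print(sorted(item_counts.items()))
# [('a', 2), ('b', 3), ('c', 3), ('d', 1), ('e', 3)]
```

With a minimum frequency of 2, item `d` would be dropped at this stage and the remaining four items would form the length-1 frequent patterns.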

C code implementing the basic Apriori algorithm

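The key to implementing Apriori is building its data model. When I wrote it as a homework assignment, I designed the following data structures: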
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define ITEM_NAME_LENGTH 20
#define MIN_SUPPORT 2
// Itemset structure: a linked list of item names
struct ITEMSET
{
    char itemName[ITEM_NAME_LENGTH];
    struct ITEMSET *next;
};
// Database structure: a linked list of transactions
struct TRANSACTION
{
    unsigned int tranID;
    struct ITEMSET *itemPoint;
    struct TRANSACTION *next;
};
// Large (frequent) itemset structure with its support count
struct BIGITEMSET
{
    struct ITEMSET *itemPoint;
    unsigned int count;
    struct BIGITEMSET *next;
};
// The transaction database
char *tran1[3]={"1","3","4"};
char *tran2[3]={"2","3","5"};
char *tran3[4]={"1","2","3","5"};
char *tran4[2]={"2","5"};
// Variable declarations
struct TRANSACTION *tranHead;
struct BIGITEMSET *bigHead;
struct BIGITEMSET *test;
struct BIGITEMSET *subSetHeadC1,*subSetHeadC2;
Once you truly understand the algorithm, writing the program is not difficult.
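To see what these structures are meant to compute, the sketch below runs a level-wise search over the same four transactions with the same minimum support of 2. It is written in Python for brevity and is not a translation of the C program. The names `support` and `apriori` are assumptions, and candidates are produced here by enumerating k-item combinations of the surviving items and pruning them, not by walking linked lists.

```python
from itertools import combinations

MIN_SUPPORT = 2
TRANSACTIONS = [
    {'1', '3', '4'},
    {'2', '3', '5'},
    {'1', '2', '3', '5'},
    {'2', '5'},
]


def support(itemset, transactions):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if t.issuperset(itemset))


def apriori(transactions, min_support):
    """Return {itemset tuple: support} for all frequent itemsets."""
    items = sorted({item for t in transactions for item in t})
    level = {(i,): support((i,), transactions) for i in items}
    level = {s: c for s, c in level.items() if c >= min_support}
    result = {}
    k = 2
    while level:
        result.update(level)
        # Only items that still appear in a frequent itemset can be extended.
        alive = sorted({i for s in level for i in s})
        next_level = {}
        for cand in combinations(alive, k):
            # Prune: every (k-1)-subset must be frequent.
            if all(sub in level for sub in combinations(cand, k - 1)):
                count = support(cand, transactions)
                if count >= min_support:
                    next_level[cand] = count
        level = next_level
        k += 1
    return result


frequent = apriori(TRANSACTIONS, MIN_SUPPORT)
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, count)
```

For this data the search finds four frequent single items (1, 2, 3, 5), four frequent pairs ({1,3}, {2,3}, {2,5}, {3,5}), and one frequent triple ({2,3,5}); item 4 is dropped in the first pass because it appears in only one transaction.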

What program is the Apriori algorithm implemented with?

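You presumably mean which programming language. Even so, the question is not well posed: since Apriori is an algorithm, it can be implemented in any language.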

