Apriori is a classic frequent-itemset mining algorithm in data mining. Its core idea is the downward-closure property: if an itemset is not frequent, then any itemset containing it must also be infrequent. The incremental Apriori algorithm implemented here is somewhat like a distributed Apriori, because we can treat the already-mined transaction set and the newly added transaction set as two separate datasets. We mine the new transaction set to obtain its frequent itemsets, then take the union with the existing frequent itemsets. An itemset that is frequent on both sides must be globally frequent; an itemset that is frequent on only one side needs its support counted on the other side before the check. In this way all global frequent itemsets are obtained without re-mining the existing transaction set, so efficiency necessarily improves.
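The merge step described above can be sketched in plain Java. This is a minimal illustration, not the article's actual implementation; all class and parameter names (`IncrementalMerge`, `oldCounts`, `newCounts`, etc.) are made up for the example, and the "recount on the other side" is simplified to a map lookup.

```java
import java.util.*;

// Sketch of the incremental merge: itemsets frequent on both sides are
// globally frequent; an itemset frequent on only one side must have its
// count on the other side added before checking the global minimum support.
public class IncrementalMerge {

    /** Merge old and new frequent-itemset counts into the global frequent sets. */
    static Map<Set<String>, Integer> merge(
            Map<Set<String>, Integer> oldFrequent,  // itemset -> count in existing data
            Map<Set<String>, Integer> newFrequent,  // itemset -> count in new data
            Map<Set<String>, Integer> oldCounts,    // fallback counts in existing data
            Map<Set<String>, Integer> newCounts,    // fallback counts in new data
            int minSupportCount) {                  // global minimum support count

        Map<Set<String>, Integer> global = new HashMap<>();
        Set<Set<String>> candidates = new HashSet<>();
        candidates.addAll(oldFrequent.keySet());
        candidates.addAll(newFrequent.keySet());

        for (Set<String> itemset : candidates) {
            // Take the known count from whichever side mined the itemset,
            // and fall back to a recount (here a plain lookup) on the other side.
            int oldCount = oldFrequent.getOrDefault(itemset,
                    oldCounts.getOrDefault(itemset, 0));
            int newCount = newFrequent.getOrDefault(itemset,
                    newCounts.getOrDefault(itemset, 0));
            int total = oldCount + newCount;
            if (total >= minSupportCount) {
                global.put(itemset, total);
            }
        }
        return global;
    }
}
```

Note that only the one-sided itemsets ever need a recount; itemsets frequent on both sides already have both counts available.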
As for HBase's coprocessor framework, you are probably familiar with it: it is modeled on the coprocessors of Google's Bigtable, with goals such as supporting incremental operations (in the spirit of Google's Percolator) and building secondary indexes, and it gives HBase something like a database's stored procedures. There are two types of coprocessor, Endpoint and Observer. An Endpoint must be deployed to every RegionServer in advance; the client then invokes it and aggregates the data returned by each RegionServer. An Observer is like a trigger in a database: once deployed to the RegionServers, it provides hooks such as preGet, postGet, prePut, postPut, preDelete, and postDelete, which fire when the corresponding operation occurs on a RegionServer.
- Today we use only the Endpoint type of coprocessor: each RegionServer computes all frequent itemsets of its own portion of the transaction set, and the client then takes the union of the frequent itemsets from every region. Itemsets whose summed counts already meet the minimum support requirement are confirmed as globally frequent; the remaining itemsets have their support counted in all regions in a second round, which finally yields all global frequent itemsets. The second step is to insert the new transactions incrementally, mark them with a timestamp, and then obtain all global frequent itemsets again in the same way.
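The client-side aggregation in the first round can be sketched as follows. This is an illustrative sketch only, assuming itemsets are encoded as strings and per-region results arrive as plain maps; the class and field names (`RegionAggregate`, `needRecount`, etc.) are invented for the example, and in the real system the second-round counts would come from the `getSepecialSupport` RPC rather than a local lookup.

```java
import java.util.*;

public class RegionAggregate {
    // First-round result: itemsets already confirmed globally frequent, plus
    // candidates that still need a second-round count from every region.
    static class Result {
        final Map<String, Integer> global = new HashMap<>();
        final Set<String> needRecount = new HashSet<>();
    }

    /** Aggregate per-region frequent itemsets (itemset -> local support count). */
    static Result aggregate(List<Map<String, Integer>> perRegion, int minSupportCount) {
        Result r = new Result();
        Map<String, Integer> summed = new HashMap<>();
        Map<String, Integer> seenIn = new HashMap<>();  // how many regions reported it
        for (Map<String, Integer> region : perRegion) {
            for (Map.Entry<String, Integer> e : region.entrySet()) {
                summed.merge(e.getKey(), e.getValue(), Integer::sum);
                seenIn.merge(e.getKey(), 1, Integer::sum);
            }
        }
        for (Map.Entry<String, Integer> e : summed.entrySet()) {
            if (e.getValue() >= minSupportCount) {
                // Partial counts already reach the threshold: globally frequent.
                r.global.put(e.getKey(), e.getValue());
            } else if (seenIn.get(e.getKey()) < perRegion.size()) {
                // Some regions did not report it locally frequent, so counts
                // are missing; a second round must ask every region for them.
                r.needRecount.add(e.getKey());
            }
            // Reported by every region yet still below threshold: infrequent.
        }
        return r;
    }
}
```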
It is worth mentioning that, starting from version 0.98, HBase uses Protocol Buffers (protobuf) as the standard for a coprocessor's remote communication, so the message formats must be defined in a .proto file. The following is the proto definition required by this algorithm:
```proto
package apriori;

option java_package = "dave.apriori.protos";
option java_outer_classname = "AprioriProtos";
option java_generic_services = true;
option java_generate_equals_and_hash = true;
option optimize_for = SPEED;

message AprioriRequest {
  required int32 length = 1;
  required float support = 2;
}

message AprioriResponse {
  message FrequentSet {
    required bytes fset = 3;
    required int32 support = 4;
  }
  required int32 count = 5;
  repeated FrequentSet fsets = 6;
}

message SpecialRequest {
  repeated bytes fsets = 7;
}

message SpecialResponse {
  repeated int32 supportCount = 8;
}

message HelloRequest {
  required bytes helloStr = 9;
}

message HelloResponse {
  required bytes helloResp = 10;
}

service Apriori {
  rpc getFrequentSet(AprioriRequest) returns (AprioriResponse);
  rpc getSepecialSupport(SpecialRequest) returns (SpecialResponse);
  rpc sayHello(HelloRequest) returns (HelloResponse);
}
```
This defines three services: one returns all frequent itemsets of a region, another returns the support count of a given itemset in that region, and the last, sayHello, is for testing.
After defining it, run `protoc --java_out=. Apriori.proto` to generate the corresponding Java files in the current directory, then import them into the project to write the server and client code.
The deployment process and source code have been uploaded; anyone who needs them can download them at http://download.csdn.net/detail/xanxus46/8801857
Realization of incremental Apriori algorithm using coprocessor of HBase