Association Analysis in Python with the Apriori Algorithm

Source: Internet
Author: User
Using the Apriori algorithm for association analysis

Apriori principle

If an item set is frequent, then all of its subsets are also frequent. For example, if {0,1} is frequent, then {0} and {1} are frequent as well.

Stated this way the principle is not obviously useful, but its contrapositive is.

If an item set is infrequent, then all of its supersets are also infrequent. For example, if {0} is infrequent, then any superset of {0}, such as {0,1}, is infrequent too, so it can be discarded without counting its support.

Important definitions

# Test item set:
    [['soy milk', 'lettuce'],
     ['lettuce', 'diaper', 'wine', 'beet'],
     ['soy milk', 'diaper', 'wine', 'orange juice'],
     ['lettuce', 'soy milk', 'diaper', 'wine'],
     ['lettuce', 'soy milk', 'diaper', 'orange juice']]
Support (support)

The support of an item set is the proportion of records in the dataset that contain it. In the test data above, {soy milk} appears in 4 of the 5 records, so its support is 4/5; {soy milk, diaper} appears in 3 of the 5 records, so its support is 3/5. Support is defined on item sets, so you can set a minimum support and keep only the item sets that meet it.

Confidence (confidence)

Confidence is defined for an association rule such as {diaper} → {wine}. The confidence of this rule is support({diaper, wine}) / support({diaper}). In the example above, the support of {diaper, wine} is 3/5 and the support of {diaper} is 4/5, so the confidence of {diaper} → {wine} is (3/5)/(4/5) = 3/4 = 0.75.

Python implementation
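
Before reaching for a library, the support and confidence figures above can be checked by hand. The following is a minimal sketch in plain Python and is not part of apyori; the helper names support and confidence are hypothetical, item names are written in lower case so that 'soy milk' always matches, and Fraction is used so the results stay exact.

```python
from fractions import Fraction

# Test item set from the text, with item names normalised to lower case.
transactions = [
    {"soy milk", "lettuce"},
    {"lettuce", "diaper", "wine", "beet"},
    {"soy milk", "diaper", "wine", "orange juice"},
    {"lettuce", "soy milk", "diaper", "wine"},
    {"lettuce", "soy milk", "diaper", "orange juice"},
]


def support(items, transactions):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= t)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(support({"soy milk"}, transactions))             # 4/5
print(support({"soy milk", "diaper"}, transactions))   # 3/5
print(confidence({"diaper"}, {"wine"}, transactions))  # 3/4
```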

Install the library

Open a command-line window and enter:

pip install apyori

If the command fails, try another installation method.

Test the installation:

from apyori import apriori

If the import succeeds, the installation worked; if it raises an error, the installation failed.

API

from apyori import apriori

data = [['soy milk', 'lettuce'],
        ['lettuce', 'diaper', 'wine', 'beet'],
        ['soy milk', 'diaper', 'wine', 'orange juice'],
        ['lettuce', 'soy milk', 'diaper', 'wine'],
        ['lettuce', 'soy milk', 'diaper', 'orange juice']]

result = list(apriori(transactions=data))

# Other parameters of apriori:
min_support - The minimum support of relations (float). Item sets whose support is below this value are filtered out.
min_confidence - The minimum confidence of relations (float). Rules whose confidence is below this value are filtered out.
min_lift - The minimum lift of relations (float). Lift is the confidence of a rule divided by the support of its right-hand side; rules whose lift is below this value are filtered out.
max_length - The maximum length of the relation (integer). Limits the number of items in an item set.
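
To make min_lift more concrete: lift is conventionally defined as confidence(A → B) / support(B). A value above 1 means A and B occur together more often than they would if they were independent, and a value below 1 means less often. The sketch below computes it on the test item set in plain Python. The helper names are hypothetical, and this is an illustration of the definition rather than apyori's own code.

```python
from fractions import Fraction

# Test item set from the text, item names in lower case.
transactions = [
    {"soy milk", "lettuce"},
    {"lettuce", "diaper", "wine", "beet"},
    {"soy milk", "diaper", "wine", "orange juice"},
    {"lettuce", "soy milk", "diaper", "wine"},
    {"lettuce", "soy milk", "diaper", "orange juice"},
]


def support(items):
    """Fraction of transactions containing every item in `items`."""
    items = set(items)
    return Fraction(sum(1 for t in transactions if items <= t), len(transactions))


def lift(antecedent, consequent):
    """confidence(antecedent -> consequent) divided by support(consequent)."""
    conf = support(set(antecedent) | set(consequent)) / support(antecedent)
    return conf / support(consequent)


print(lift({"diaper"}, {"wine"}))       # 5/4, above 1
print(lift({"soy milk"}, {"lettuce"}))  # 15/16, below 1
```

With min_lift set to 1, a filter based on this definition would keep the first rule and drop the second.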

The result above is a list. The attributes of its elements are described below as far as I could work them out, because I found no documentation for the library.

Each element of result describes one item set and has these attributes:
items - the item set, a frozenset object whose items can be iterated over.
support - the support of the item set, a float.
ordered_statistics - the association rules found within the item set. It can be iterated over, and each of its elements has these attributes:
    items_base - the item set on the left-hand side of the rule (the denominator in the confidence formula).
    confidence - the confidence of the rule with that left-hand side, a float.
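
Because the rules are nested inside each record, sorting them takes one extra step. The sketch below flattens the records into a single list ordered by confidence. It relies only on the attribute names items, support, ordered_statistics, items_base and confidence described above, plus items_add (the right-hand side of a rule), which is an assumption about apyori's record layout that you should verify against your installed version. Rule and flatten_rules are hypothetical names, not part of the library.

```python
from collections import namedtuple

# Hypothetical container for one flattened rule; not part of apyori.
Rule = namedtuple("Rule", ["antecedent", "consequent", "support", "confidence"])


def flatten_rules(records):
    """Flatten apriori() records into rules sorted by descending confidence."""
    rules = []
    for record in records:
        for stat in record.ordered_statistics:
            if not stat.items_base:
                continue  # skip entries with an empty left-hand side, if any
            rules.append(Rule(
                antecedent=tuple(sorted(stat.items_base)),
                consequent=tuple(sorted(stat.items_add)),  # assumed attribute
                support=record.support,
                confidence=stat.confidence,
            ))
    return sorted(rules, key=lambda r: r.confidence, reverse=True)
```

With apyori installed, flatten_rules(result)[:5] would give the five rules with the highest confidence.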

That is all for these brief notes. Although I have not found documentation for this library so far, you can inspect the attributes of each item set in PyCharm. The association rules cannot be sorted directly, but you do not have to write your own code to compute support, which is much better than implementing the whole calculation yourself.

Note

The Apriori algorithm is not suited to data with many distinct items. If a store sells N kinds of goods, there are 2^N - 1 possible non-empty item sets, so the amount of computation grows very quickly. It is advisable to analyze fewer than 10 kinds of goods.
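
The Apriori principle is what keeps the search from visiting all 2^N - 1 item sets: a candidate of size k is counted only if every one of its (k-1)-item subsets is already known to be frequent. The following is a minimal level-wise sketch in plain Python, written for illustration and not taken from apyori; frequent_itemsets and keep_frequent are hypothetical names.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search that builds size-k candidates from frequent (k-1)-sets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        """Count support for each candidate and keep those meeting min_support."""
        counted = {c: sum(1 for t in transactions if c <= t) / n for c in candidates}
        return {c: s for c, s in counted.items() if s >= min_support}

    # Level 1: every distinct item is a candidate.
    items = {item for t in transactions for item in t}
    level = keep_frequent({frozenset([i]) for i in items})

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-sets that contain exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
    return frequent


transactions = [
    {"soy milk", "lettuce"},
    {"lettuce", "diaper", "wine", "beet"},
    {"soy milk", "diaper", "wine", "orange juice"},
    {"lettuce", "soy milk", "diaper", "wine"},
    {"lettuce", "soy milk", "diaper", "orange juice"},
]

result = frequent_itemsets(transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

On the test item set with a minimum support of 0.5, this finds eight frequent item sets (four single items and four pairs). Only 13 candidates have their support counted, against the 2^6 - 1 = 63 non-empty item sets that six distinct items allow.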
