Using the Apriori algorithm for association analysis
Apriori principle
If an item set is frequent, then all of its subsets are also frequent. That is, if {0,1} is frequent, then {0} and {1} are also frequent.
Stated this way the principle does not look very useful, but its contrapositive is.
If an item set is infrequent, then all of its supersets are also infrequent. That is, if {0} is infrequent, then any superset of {0}, such as {0,1}, is also infrequent.
Important Definitions
# Test item set:
[['soy milk', 'lettuce'],
 ['lettuce', 'diaper', 'wine', 'beet'],
 ['soy milk', 'diaper', 'wine', 'orange juice'],
 ['lettuce', 'soy milk', 'diaper', 'wine'],
 ['lettuce', 'soy milk', 'diaper', 'orange juice']]
Support (support)
The support of an item set is the proportion of records in the dataset that contain it. In the test data above, {soy milk} appears in 4 of the 5 records, so its support is 4/5; 3 of the 5 records contain {soy milk, diaper}, so the support of {soy milk, diaper} is 3/5. Support is defined for item sets, so you can choose a minimum support and keep only the item sets that meet it.
Confidence (confidence)
Confidence is defined for an association rule such as {diaper} → {wine}. The confidence of this rule is support({diaper, wine}) / support({diaper}). In the example above, the support of {diaper, wine} is 3/5 and the support of {diaper} is 4/5, so the confidence of {diaper} → {wine} is (3/5) / (4/5) = 3/4 = 0.75.
Python Implementation
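Before installing anything, the support and confidence figures quoted above can be checked by hand. The following is a minimal sketch in plain Python; the helper names support and confidence are illustrative and are not part of any library, and the item names are written in lowercase so that identical items compare equal.

```python
from fractions import Fraction

# Test item set from the section above (item names normalized to lowercase).
transactions = [
    ["soy milk", "lettuce"],
    ["lettuce", "diaper", "wine", "beet"],
    ["soy milk", "diaper", "wine", "orange juice"],
    ["lettuce", "soy milk", "diaper", "wine"],
    ["lettuce", "soy milk", "diaper", "orange juice"],
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for record in records if wanted <= set(record))
    return Fraction(hits, len(records))


def confidence(antecedent, consequent, records):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, records) / support(antecedent, records)


print(support({"soy milk"}, transactions))             # 4/5
print(support({"soy milk", "diaper"}, transactions))   # 3/5
print(confidence({"diaper"}, {"wine"}, transactions))  # 3/4
```

Fraction is used instead of floats only so that the results print exactly as the fractions in the text.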
Install the library
Open a command-line window and enter:
pip install apyori
If this fails, try another installation method.
Test the installation:
from apyori import apriori
If the import succeeds, the installation worked; if it raises an error, the installation failed.
API
from apyori import apriori
data = [['soy milk', 'lettuce'],
        ['lettuce', 'diaper', 'wine', 'beet'],
        ['soy milk', 'diaper', 'wine', 'orange juice'],
        ['lettuce', 'soy milk', 'diaper', 'wine'],
        ['lettuce', 'soy milk', 'diaper', 'orange juice']]
result = list(apriori(transactions=data))
# Other apriori parameters:
min_support - the minimum support of relations (float). Item sets with lower support are filtered out.
min_confidence - the minimum confidence of relations (float). Rules with lower confidence are filtered out.
min_lift - the minimum lift of relations (float). Lift compares a rule's confidence with the support of its right-hand side.
max_length - the maximum length of the relation (integer), that is, the largest number of items in an item set.
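The min_lift parameter filters on lift. Under the usual definition, the lift of a rule A → B is its confidence divided by the support of B, so a value above 1 means A and B occur together more often than they would if they were independent. Below is a minimal sketch on the test item set; the helpers support and lift are illustrative names, not part of apyori.

```python
from fractions import Fraction

transactions = [
    ["soy milk", "lettuce"],
    ["lettuce", "diaper", "wine", "beet"],
    ["soy milk", "diaper", "wine", "orange juice"],
    ["lettuce", "soy milk", "diaper", "wine"],
    ["lettuce", "soy milk", "diaper", "orange juice"],
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for record in records if wanted <= set(record))
    return Fraction(hits, len(records))


def lift(antecedent, consequent, records):
    """confidence(antecedent -> consequent) / support(consequent)."""
    both = set(antecedent) | set(consequent)
    conf = support(both, records) / support(antecedent, records)
    return conf / support(consequent, records)


print(lift({"diaper"}, {"wine"}, transactions))        # 5/4
print(lift({"soy milk"}, {"lettuce"}, transactions))   # 15/16
```

With a threshold such as min_lift=1.0, a rule like {soy milk} → {lettuce}, whose lift is below 1, would be expected to be filtered out.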
The result above is a list. The following describes the attributes of its elements as far as I understand them, since I could not find documentation for the library.
Each element of result describes one item set:
items - the item set, a frozenset object that can be iterated.
support - the support of the item set, a float.
ordered_statistics - the association rules found within the item set, an iterable.
Each element of ordered_statistics describes one rule:
items_base - the set of items on the left-hand side of the rule, whose support is the denominator of the confidence.
confidence - the confidence of the rule with that left-hand side, a float.
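Given these attributes, the rules can be flattened into one list and sorted by confidence by hand. The sketch below assumes the record shape described above plus items_add, the right-hand side of each rule, which apyori also exposes but which these notes do not list. Stat and Record are hypothetical stand-ins that mimic that shape so the example runs without the library installed, and rules_by_confidence is an illustrative helper name.

```python
from collections import namedtuple

# Hypothetical stand-ins that mimic the shape of apyori's result records.
Stat = namedtuple("Stat", ["items_base", "items_add", "confidence"])
Record = namedtuple("Record", ["items", "support", "ordered_statistics"])


def rules_by_confidence(records):
    """Flatten records into (left side, right side, confidence) rows,
    highest confidence first."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            # Skip entries with an empty left-hand side, if any are present.
            if not stat.items_base:
                continue
            rows.append(
                (sorted(stat.items_base), sorted(stat.items_add), stat.confidence)
            )
    return sorted(rows, key=lambda row: row[2], reverse=True)


sample = [
    Record(frozenset({"diaper", "wine"}), 0.6, [
        Stat(frozenset({"diaper"}), frozenset({"wine"}), 0.75),
        Stat(frozenset({"wine"}), frozenset({"diaper"}), 1.0),
    ]),
]

for base, added, conf in rules_by_confidence(sample):
    print(base, "->", added, conf)
# ['wine'] -> ['diaper'] 1.0
# ['diaper'] -> ['wine'] 0.75
```

With the library installed, passing the result list from the apriori call instead of sample should work the same way, since the function only reads the attributes named above.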
That is all for these brief notes. Although I have not yet found documentation for this library, you can inspect the attributes of each item set in PyCharm. The association rules cannot be sorted directly, but you do not need to write your own algorithm to compute support, which is much better than implementing the whole calculation yourself.
Attention
The Apriori algorithm is not well suited to data with a large number of distinct items. If a store sells N kinds of goods, there are 2^N - 1 possible non-empty item sets, so the amount of computation grows very quickly. It is advisable to keep the number of kinds of goods analyzed below 10.
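To make the growth concrete, the sketch below compares the number of possible item sets with the number of candidates a level-wise search actually counts once it applies the Apriori principle. It is a simplified illustration, not apyori's implementation; the function name frequent_itemsets and the 0.5 support threshold are assumptions chosen for the example.

```python
from itertools import combinations

transactions = [
    ["soy milk", "lettuce"],
    ["lettuce", "diaper", "wine", "beet"],
    ["soy milk", "diaper", "wine", "orange juice"],
    ["lettuce", "soy milk", "diaper", "wine"],
    ["lettuce", "soy milk", "diaper", "orange juice"],
]


def frequent_itemsets(records, min_support):
    """Level-wise search: size-k candidates are built only from frequent
    size-(k-1) item sets, so supersets of infrequent sets are never counted."""
    records = [frozenset(r) for r in records]
    items = sorted(set().union(*records))
    candidates = [frozenset([item]) for item in items]
    frequent, counted = {}, 0
    size = 1
    while candidates:
        counted += len(candidates)
        level = {}
        for cand in candidates:
            sup = sum(1 for r in records if cand <= r) / len(records)
            if sup >= min_support:
                level[cand] = sup
        frequent.update(level)
        size += 1
        # Join frequent sets, then keep a candidate only if every
        # (size-1)-subset is itself frequent (the Apriori principle).
        joined = {a | b for a in level for b in level if len(a | b) == size}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent, counted


frequent, counted = frequent_itemsets(transactions, min_support=0.5)
n_items = len(set().union(*map(set, transactions)))
print(2 ** n_items - 1)  # 63 possible non-empty item sets
print(counted)           # 13 candidates actually counted
print(len(frequent))     # 8 frequent item sets
```

Pruning helps a great deal on this small example, but when many items are individually frequent the number of surviving candidates still grows quickly, which is why keeping the number of distinct items small matters.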