Key Quantities
Let
- Support defines the evidence of how frequent a set of items appears in the itemset. It is the probability
. - Thresholding the support lets us define which items appear frequently enough that rules about them are interesting.
- Support is a monotonically non-increasing value. Intuitively, larger item sets of a specific configuration will be less likely to appear than its subsets.
- Confidence defines how likely a rule is true based on how many times it is found to be true within the item set. It is the probability
- Thresholding the confidence lets us define which rules are interesting enough because the rule holds often.
- Lift is used to compare the expected confidence with the actual confidence. It is defined as the quantity
- A lift
implies no rule can be drawn since and are independent. - A lift
implies a dependency between and . - A lift
implies can substitute with or vice versa.
- A lift
- Conviction determines the ratio of the expected frequency that a rule makes an incorrect prediction if
and were independent, divided by the actual observed incorrect predictions. It is calculated as
Apriori Algorithm
- Identify frequent individual items and extend them to larger and larger item sets as long as they meet a support threshold.
- It relies on prior knowledge of frequent itemset properties.
ECLAT
- Equivalence Class Transformation.
- Rather than explore in a BFS manner as in the Apriori algorithm, it explores in a DFS manner, checking larger itemsets first and then using backtracking.
- Since the support is monotonically non-increasing, it helps save on checking the support of some subsets.
FP-Growth
- First, it counts the support of individual items in the itemset. Items that do not meet the minimum threshold are discarded.
- Then, it builds a trie out of incoming transactions. Each item in the trie is sorted by descending order of frequency in the dataset.
- Unlike, apriori, this has the advantage of finding all frequent item sets without needing to compare to the original itemset.. All nodes that remain on the tree are, themselves, frequent item sets.