FP-Growth Algorithm
10 Jan 2016Intro
The FP-Growth Algorithm is an alternative way to find frequent itemsets without using candidate generations, thus improving performance. For so much it uses a divide-and-conquer strategy. The core of this method is the usage of a special data structure named frequent-pattern tree (FP-tree), which retains the itemset association information.
In simple words, this algorithm works as follows: first it compresses the input database creating an FP-tree instance to represent frequent items. After this first step it divides the compressed database into a set of conditional databases, each one associated with one frequent pattern. Finally, each such database is mined separately. Using this strategy, the FP-Growth reduces the search costs looking for short patterns recursively and then concatenating them in the long frequent patterns, offering good selectivity.
FP-Tree structure
The frequent-pattern tree (FP-tree) is a compact structure that stores quantitative information about frequent patterns in a database [4].
Han defines the FP-tree as the tree structure defined below:
-
One root labeled as “null” with a set of item-prefix subtrees as children, and a frequent-item-header table (presented in the left side of Figure 1);
-
Each node in the item-prefix subtree consists of three fields:
-
Item-name: registers which item is represented by the node;
-
Count: the number of transactions represented by the portion of the path reaching the node;
-
Node-link: links to the next node in the FP-tree carrying the same item-name, or null if there is none.
-
-
Each entry in the frequent-item-header table consists of two fields:
-
Item-name: as the same to the node;
-
Head of node-link: a pointer to the first node in the FP-tree carrying the item-name.
-
Additionally the frequent-item-header table can have the count support for an item.
-
FP-tree construction
-
Input: A transaction database DB and a minimum support threshold.
-
Output: FP-tree, the frequent-pattern tree of DB.
-
Method: The FP-tree is constructed as follows.
-
Scan the transaction database DB once. Collect F, the set of frequent items, and the support of each frequent item. Sort F in support-descending order as FList, the list of frequent items.
-
Create the root of an FP-tree, T, and label it as “null”. For each transaction Trans in DB do the following:
-
Select the frequent items in Trans and sort them according to the order of FList. Let the sorted frequent-item list in Trans be [p | P], where p is the first element and P is the remaining list. Call insert tree([ p | P], T ).
-
The function insert tree([ p | P], T ) is performed as follows. If T has a child N such that N.item-name = p.item-name, then increment N ’s count by 1; else create a new node N , with its count initialized to 1, its parent link linked to T , and its node-link linked to the nodes with the same item-name via the node-link structure. If P is nonempty, call insert tree(P, N ) recursively.
-
-
FP-Growth Algorithm
After constructing the FP-Tree it’s possible to mine it to find the complete set of frequent patterns. To accomplish this job, Han presents a group of lemmas and properties, and thereafter describes the FP-Growth Algorithm as presented below.
-
Input: A database DB, represented by FP-tree constructed according to Algorithm 1, and a minimum support threshold ?.
-
Output: The complete set of frequent patterns.
-
Method: call FP-growth(FP-tree, null).
-
Procedure FP-growth(Tree, a) {
(01) if Tree contains a single prefix path then // Mining single prefix-path FP-tree {
(02) let P be the single prefix-path part of Tree;
(03) let Q be the multipath part with the top branching node replaced by a null root;
(04) for each combination (denoted as ß) of the nodes in the path P do
(05) generate pattern ß ∪ a with support = minimum support of nodes in ß;
(06) let freq pattern set(P) be the set of patterns so generated;
}
(07) else let Q be Tree;
(08) for each item ai in Q do { // Mining multipath FP-tree
(09) generate pattern ß = ai ∪ a with support = ai .support;
(10) construct ß’s conditional pattern-base and then ß’s conditional FP-tree Tree ß;
(11) if Tree ß ≠ Ø then
(12) call FP-growth(Tree ß , ß);
(13) let freq pattern set(Q) be the set of patterns so generated;
}
(14) return(freq pattern set(P) ∪ freq pattern set(Q) ∪ (freq pattern set(P) × freq pattern set(Q)))
}
Code
https://github.com/Flowerowl/FP-Growth
Ref
https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Frequent_Pattern_Mining/The_FP-Growth_Algorithm#FP-Tree_structure