Machine Learning Association Rule Mining Algorithms
Concepts of Association Rule Mining
Association Rule Mining is a data mining technique for finding co-occurrence relationships and patterns between variables in large datasets. The relationships it discovers are expressed as rules of the form antecedent -> consequent, where the antecedent and the consequent are each a set of items.
Several algorithms are used in Association Rule Mining, including Apriori, FP-Growth, and ECLAT. Of these, Apriori is among the most widely used.
The Apriori algorithm generates candidate itemsets and then prunes those that do not meet the minimum support threshold. The support threshold is a user-defined value that sets the minimum frequency an itemset must reach to be considered frequent.
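To make the support criterion concrete, here is a minimal sketch in plain Python. The function name `support` and the small basket list are hypothetical, chosen only for illustration. Support is taken here as the fraction of transactions that contain every item of the itemset.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Hypothetical baskets, for illustration only.
baskets = [{"pen", "ink"}, {"pen", "paper"}, {"pen", "ink", "paper"}, {"ink"}]

min_support = 0.5  # user-defined threshold
candidates = [{"pen"}, {"ink"}, {"paper"}, {"pen", "ink"}, {"ink", "paper"}]

# Pruning step: keep only the candidates whose support reaches the threshold.
frequent = [c for c in candidates if support(c, baskets) >= min_support]
print(frequent)
```

Here {"ink", "paper"} occurs in only one of the four baskets, so it is the candidate that gets pruned.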
Association Rule Mining Algorithm
- Define the problem and collect data.
- Set a minimum support threshold and a minimum confidence threshold.
- Identify all frequent itemsets that meet the minimum support threshold.
- For each frequent itemset, generate the association rules that meet the minimum confidence threshold.
- Evaluate the resulting rules, for example by checking that they still hold on held-out transactions.
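The two mining steps above (finding frequent itemsets, then generating rules) can be sketched in plain Python as follows. This is a simplified, unoptimised illustration of the level-wise Apriori idea, not a reference implementation. The function names `find_frequent_itemsets` and `generate_rules` and the sample baskets are assumptions made for this example.

```python
from itertools import combinations


def find_frequent_itemsets(transactions, min_support):
    """Level-wise search: return {itemset: support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every distinct item on its own.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep those that reach the threshold.
        level = {}
        for c in candidates:
            sup = sum(1 for t in transactions if c <= t) / n
            if sup >= min_support:
                level[c] = sup
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules above the threshold."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the dictionary.
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical baskets, for illustration only.
baskets = [{"tea", "sugar"}, {"tea", "sugar", "milk"}, {"tea"}, {"milk"}]
frequent = find_frequent_itemsets(baskets, min_support=0.5)
for antecedent, consequent, confidence in generate_rules(frequent, min_confidence=0.6):
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 2))
```

Storing the support of every frequent itemset in a dictionary lets the rule step compute confidence without scanning the transactions again.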
Example:
Consider a retail store that wants to analyse the buying patterns of its customers. The store has a transaction dataset containing the items bought by the customers. The dataset contains the following transactions:
Transaction 1: {Bread, Milk, Cheese}
Transaction 2: {Bread, Milk}
Transaction 3: {Milk, Eggs}
Transaction 4: {Bread, Eggs}
Transaction 5: {Bread, Milk, Eggs, Cheese}
Using the Apriori algorithm, we can find the frequent itemsets and generate association rules from the dataset. Let us assume a minimum support threshold of 40%, so an itemset is frequent if it appears in at least 2 of the 5 transactions.
Step 1: Find the frequent 1-itemsets
The frequent 1-itemsets are:
{Bread} (4)
{Milk} (4)
{Cheese} (2)
{Eggs} (3)
The number in parentheses is the support count of the itemset, that is, the number of transactions that contain it.
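The single-item counts can be reproduced with a few lines of Python. This is a small sketch using `collections.Counter`. The variable names are illustrative, and the transactions are the five listed above.

```python
from collections import Counter

transactions = [
    {"Bread", "Milk", "Cheese"},
    {"Bread", "Milk"},
    {"Milk", "Eggs"},
    {"Bread", "Eggs"},
    {"Bread", "Milk", "Eggs", "Cheese"},
]
min_support = 0.4

# Each transaction is a set, so an item is counted at most once per transaction.
item_counts = Counter(item for t in transactions for item in t)

# Keep the items whose share of transactions reaches the threshold.
frequent_items = {
    item: count
    for item, count in item_counts.items()
    if count / len(transactions) >= min_support
}
for item in sorted(frequent_items):
    print(item, frequent_items[item])
```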
Step 2: Find the frequent 2-itemsets
The frequent 2-itemsets are:
{Bread, Milk} (3)
{Bread, Cheese} (2)
{Milk, Cheese} (2)
{Milk, Eggs} (2)
{Bread, Eggs} (2)
The remaining pair, {Cheese, Eggs}, appears only in Transaction 5 and is therefore not frequent.
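A short sketch of this step, assuming the same five transactions: candidate pairs are built with `itertools.combinations` and each one is counted against the data. Sorting the items first means every unordered pair is produced exactly once. The variable names are illustrative.

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk", "Cheese"},
    {"Bread", "Milk"},
    {"Milk", "Eggs"},
    {"Bread", "Eggs"},
    {"Bread", "Milk", "Eggs", "Cheese"},
]
min_support = 0.4

# All four single items are frequent in this dataset, so every pair is a candidate.
items = sorted({item for t in transactions for item in t})

pair_support = {}
for pair in combinations(items, 2):
    count = sum(1 for t in transactions if set(pair) <= t)
    pair_support[pair] = count / len(transactions)

# Keep the pairs that reach the threshold.
frequent_pairs = {p: s for p, s in pair_support.items() if s >= min_support}
for pair, sup in frequent_pairs.items():
    print(pair, sup)
```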
Step 3: Find the frequent 3-itemsets
There is only one frequent 3-itemset:
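{Bread, Milk, Cheese} (2)
The other three candidates, {Bread, Milk, Eggs}, {Bread, Cheese, Eggs} and {Milk, Cheese, Eggs}, each appear only in Transaction 5 and fall below the threshold.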
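Apriori avoids counting a 3-itemset at all when one of its pairs is already known to be infrequent. The sketch below illustrates that pruning rule on the same five transactions. The helper `support` and the result lists are names chosen for this example.

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk", "Cheese"},
    {"Bread", "Milk"},
    {"Milk", "Eggs"},
    {"Bread", "Eggs"},
    {"Bread", "Milk", "Eggs", "Cheese"},
]
min_support = 0.4


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)


items = sorted({item for t in transactions for item in t})
frequent_pairs = {p for p in combinations(items, 2) if support(p) >= min_support}

pruned, counted, frequent_triples = [], [], []
for triple in combinations(items, 3):
    # Prune: a triple cannot be frequent if any of its pairs is infrequent.
    if not all(p in frequent_pairs for p in combinations(triple, 2)):
        pruned.append(triple)
        continue
    # Only the surviving candidates are counted against the data.
    counted.append(triple)
    if support(triple) >= min_support:
        frequent_triples.append(triple)

print("pruned without counting:", pruned)
print("counted:", counted)
print("frequent:", frequent_triples)
```

On larger datasets this pruning is what keeps the number of candidates manageable, since most combinations are discarded before any pass over the transactions.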
Step 4: Generate association rules
Using the frequent itemsets, we can generate association rules. Let us assume a minimum confidence threshold of 50%.
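The confidence of a rule X -> Y is the number of transactions that contain both X and Y divided by the number of transactions that contain X. The association rules that meet the 50% threshold are:
{Bread} -> {Milk} (3/4 = 75%)
{Milk} -> {Bread} (3/4 = 75%)
{Bread} -> {Cheese} (2/4 = 50%)
{Cheese} -> {Bread} (2/2 = 100%)
{Milk} -> {Cheese} (2/4 = 50%)
{Cheese} -> {Milk} (2/2 = 100%)
{Bread} -> {Eggs} (2/4 = 50%)
{Eggs} -> {Bread} (2/3 = 66.7%)
{Milk} -> {Eggs} (2/4 = 50%)
{Eggs} -> {Milk} (2/3 = 66.7%)
{Bread, Milk} -> {Cheese} (2/3 = 66.7%)
{Bread, Cheese} -> {Milk} (2/2 = 100%)
{Milk, Cheese} -> {Bread} (2/2 = 100%)
{Bread} -> {Milk, Cheese} (2/4 = 50%)
{Milk} -> {Bread, Cheese} (2/4 = 50%)
{Cheese} -> {Bread, Milk} (2/2 = 100%)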
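Confidence values like these can be checked with a small helper. The sketch below assumes the five transactions from the example. The function names `count` and `confidence` and the selection of rules are illustrative only.

```python
transactions = [
    {"Bread", "Milk", "Cheese"},
    {"Bread", "Milk"},
    {"Milk", "Eggs"},
    {"Bread", "Eggs"},
    {"Bread", "Milk", "Eggs", "Cheese"},
]
min_confidence = 0.5


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent):
    """Share of the transactions containing the antecedent that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)


# A few rules to check, given as (antecedent, consequent) pairs.
rules_to_check = [
    ({"Bread"}, {"Milk"}),
    ({"Eggs"}, {"Milk"}),
    ({"Bread", "Milk"}, {"Eggs"}),
]
for antecedent, consequent in rules_to_check:
    conf = confidence(antecedent, consequent)
    verdict = "keep" if conf >= min_confidence else "drop"
    print(sorted(antecedent), "->", sorted(consequent), f"{conf:.1%}", verdict)
```

Note that confidence is not symmetric: swapping antecedent and consequent changes the denominator, so each direction has to be checked separately.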
Python implementation of the Apriori algorithm for Association Rule Mining:
# Importing required libraries (apriori and association_rules come from the mlxtend package)
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
# Reading the dataset ('dataset.csv' is a placeholder file name)
data = pd.read_csv('dataset.csv')
# One-hot encoding the categorical variables (this assumes every column is categorical)
data_encoded = pd.get_dummies(data)
# Applying the Apriori algorithm with a minimum support of 0.01
frequent_itemsets = apriori(data_encoded, min_support=0.01, use_colnames=True)
# Generating association rules with a minimum lift of 1.5
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.5)
# Displaying the rules
print(rules)
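The code above filters rules by lift rather than confidence. Lift divides the confidence of a rule by the support of its consequent, so a value above 1 suggests the two sides occur together more often than they would if they were independent. The following plain-Python sketch computes it for the five example transactions used earlier. The function names are hypothetical and the code does not depend on mlxtend.

```python
transactions = [
    {"Bread", "Milk", "Cheese"},
    {"Bread", "Milk"},
    {"Milk", "Eggs"},
    {"Bread", "Eggs"},
    {"Bread", "Milk", "Eggs", "Cheese"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(antecedent, consequent):
    """Confidence of the rule divided by the support of its consequent."""
    confidence = support(antecedent | consequent) / support(antecedent)
    return confidence / support(consequent)


for antecedent, consequent in [({"Cheese"}, {"Bread"}), ({"Bread"}, {"Milk"})]:
    print(sorted(antecedent), "->", sorted(consequent), round(lift(antecedent, consequent), 4))
```

A rule can have high confidence and still have a lift close to 1 when its consequent is very common, which is why lift is often used as an additional filter.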
Advantages of Association Rule Mining:
- Association Rule Mining is an effective way to identify interesting relationships between variables in large datasets.
- It has applications in many areas, including market basket analysis, customer segmentation, and fraud detection.
- Association Rule Mining can help businesses identify cross-selling opportunities and make better decisions based on customer behaviour.
- It can also be used for exploratory data analysis to discover patterns and relationships that may not be apparent from simple descriptive statistics.
Disadvantages of Association Rule Mining:
- Association Rule Mining can be computationally intensive and time-consuming, especially for large datasets.
- The results of Association Rule Mining can be difficult to interpret and may require domain expertise to understand.
- The quality of the results depends heavily on the quality and completeness of the input data.
- Association Rule Mining can produce numerous spurious or irrelevant results, which may need to be filtered out manually.