Naive Bayes Algorithm with Python
Concepts of Naive Bayes
Naive Bayes is a classification algorithm based on Bayes'
theorem, which describes how the probability of a hypothesis is updated in
light of new evidence. It is called "naive" because it assumes that all
features are independent of one another, which may not hold in real-world
datasets. Despite this limitation, Naive Bayes is widely used in text
classification, spam filtering, and sentiment analysis.
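At its core, Bayes' theorem gives the posterior probability of a class given the observed features:
P(class | features) = P(features | class) * P(class) / P(features)
Under the naive independence assumption, the likelihood P(features | class) factorizes into a product of per-feature terms P(feature_i | class), which is what makes the model so cheap to train and apply.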
Naive Bayes Algorithm
- Define the problem and collect data.
- Choose a hypothesis class (e.g., Naive Bayes).
- Compute the prior probability and likelihood of each class based on the training data.
- Use Bayes' theorem to compute the posterior probability of each class given the input features.
- Classify the input by choosing the class with the highest posterior probability (see the from-scratch sketch after this list).
- Evaluate the model on a test dataset to estimate its performance.
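Before turning to scikit-learn, here is a minimal from-scratch sketch of the probability steps above, assuming Gaussian (normal) likelihoods for continuous features and NumPy arrays for X and y. It is illustrative only; the GaussianNB class used below performs the same computations internally, with more care.
import numpy as np

def fit_gaussian_nb(X, y):
    # Estimate the prior, per-feature mean, and per-feature std for each class
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}                # P(class)
    means = {c: X[y == c].mean(axis=0) for c in classes}          # per-feature means
    stds = {c: X[y == c].std(axis=0) + 1e-9 for c in classes}     # small offset avoids division by zero
    return classes, priors, means, stds

def predict_gaussian_nb(x, classes, priors, means, stds):
    # Choose the class with the highest log posterior (logs avoid numeric underflow)
    def log_posterior(c):
        # log P(class) + sum over features of the log Gaussian likelihood
        log_likelihood = -0.5 * np.sum(
            np.log(2 * np.pi * stds[c] ** 2) + ((x - means[c]) ** 2) / stds[c] ** 2
        )
        return np.log(priors[c]) + log_likelihood
    return max(classes, key=log_posterior)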
Python code
# Import libraries
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load the dataset
data = pd.read_csv('iris.csv')
# Create X and y arrays
X = data[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']].values
y = data['Species'].values
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Create the Naive Bayes model
model = GaussianNB()
# Fit the model to the training data
model.fit(X_train, y_train)
# Predict the test set results
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
In this example, we first load the dataset from a CSV file that contains four feature columns, "SepalLengthCm", "SepalWidthCm", "PetalLengthCm", and "PetalWidthCm", plus a label column, "Species". We build the X array from the four feature columns and the y array from the "Species" column.
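If the iris.csv file isn't at hand, the same X and y arrays can be built from scikit-learn's bundled copy of the iris dataset (note that its species labels come back as integers rather than strings):
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # shape (150, 4): sepal and petal length/width in cm
y = iris.target  # shape (150,): 0 = setosa, 1 = versicolor, 2 = virginica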
We split the data into training and testing sets using the
train_test_split() function. We create an instance of the GaussianNB class,
which implements Naive Bayes with Gaussian likelihoods for continuous features, and fit the model to
the training data using the fit() method.
We then use the predict() method to predict the test set results and evaluate the model's performance using the accuracy score.
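Because Naive Bayes is a probabilistic model, we can also inspect the posterior probabilities behind each prediction (the quantities from Bayes' theorem above) using predict_proba():
# Posterior probability of each class for the first five test samples;
# each row sums to 1 and the predicted class is the highest column.
print(model.classes_)
print(model.predict_proba(X_test[:5]).round(3))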
Naive Bayes Advantages and Disadvantages
Advantages of Naive Bayes:
Simple and easy to implement:
Naive Bayes is straightforward to implement and understand, making it a good choice for beginners.
Fast and efficient:
Training and prediction are both fast, making Naive Bayes well-suited for large datasets and real-time applications.
Good performance on small datasets:
Naive Bayes can perform well even with limited training data, since it only estimates simple per-class statistics.
Can handle high-dimensional data:
Naive Bayes scales well to high-dimensional inputs such as word counts, making it useful for text classification and spam filtering (see the sketch after this list).
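As an example of the high-dimensional case, here is a small spam-filtering sketch using MultinomialNB, the Naive Bayes variant suited to count features; the tiny corpus is made up purely for illustration:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win a free prize now", "meeting at noon tomorrow",
         "free cash offer", "project update attached"]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(texts)  # sparse word-count matrix, one column per word

clf = MultinomialNB()
clf.fit(X_counts, labels)
print(clf.predict(vectorizer.transform(["free prize offer"])))  # expected: ['spam']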
Disadvantages of Naive Bayes:
Assumes feature independence:
Naive Bayes assumes that all features are independent of each other, which may not be true in real-world datasets.
Limited expressiveness:
The independence assumption limits the model's expressiveness, so it may not capture complex relationships between features and labels.
Requires well-prepared data:
Naive Bayes expects clean, preprocessed input with no missing values, and it can be sensitive to outliers, so imputation or other cleaning is usually needed first (see the sketch after this list).
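One common remedy for missing values is to impute them before fitting. Here is a toy sketch (the rows are invented for illustration) using scikit-learn's SimpleImputer in a pipeline ahead of GaussianNB:
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

X_messy = np.array([[5.1, 3.5, 1.4, 0.2],
                    [6.2, np.nan, 4.5, 1.5],   # missing sepal width
                    [5.9, 3.0, 5.1, 1.8]])
y_messy = ['setosa', 'versicolor', 'virginica']

# Fill each missing value with its column's mean, then fit the classifier
pipeline = make_pipeline(SimpleImputer(strategy='mean'), GaussianNB())
pipeline.fit(X_messy, y_messy)
print(pipeline.predict([[5.0, 3.4, 1.5, 0.2]]))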
Main Contents (TOPICS of Machine Learning Algorithms)
CONTINUE TO (K-Nearest Neighbors algorithm)