Chapter 2: Naive Bayes Classification (Supervised Machine Learning Algorithm)

Sabita Rajbanshi
Machine Learning Community
4 min read · Aug 28, 2021


Naive Bayes is a classification algorithm used for binary and multi-class classification problems. It is based on a probabilistic rule called Bayes' Theorem. In this chapter, we are going to understand the mathematics behind Bayes' Theorem and the working mechanism of the Naive Bayes classifier.

Table of Contents

  1. The idea of Conditional Probability
  2. Bayes’ Theorem
  3. Naive Bayes Classifier
  4. Pros and Cons
  5. Real-World Applications of Naive Bayes

1. The idea of Conditional Probability:

Conditional probability: In simple terms, conditional probability is the likelihood of an event occurring given that another event has already occurred. P(A|B) denotes the probability of event A occurring given that B has already occurred. Two important definitions in conditional probability are:

i) Independent Events: Two events A and B are said to be independent if the probability of A given B equals the probability of A, i.e., P(A|B) = P(A).

For example, P(D1=6|D2=3) = P(D1=6): the probability of die 1 showing a 6 has nothing to do with the probability of die 2 showing a 3. In such a case, the two events are said to be independent.

ii) Mutually Exclusive Events: Two events A and B are said to be mutually exclusive if they cannot occur together, i.e., P(A|B) = 0. (Note that mutually exclusive events are not independent: knowing that B occurred tells us that A did not.)

For example, P(A = finished assignments | B = not finished assignments) = 0: events A and B cannot occur together.
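The dice example above can be checked directly by enumerating all outcomes. The sketch below (plain Python, no dataset assumed) computes conditional probabilities by counting and verifies both definitions:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event), computed by counting favourable outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

# Independence: P(D1=6 | D2=3) equals P(D1=6).
p_d1_6 = prob(lambda o: o[0] == 6)
p_d1_6_given_d2_3 = cond_prob(lambda o: o[0] == 6, lambda o: o[1] == 3)
print(p_d1_6, p_d1_6_given_d2_3)  # 1/6 1/6

# Mutually exclusive: die 1 cannot show both 6 and 3.
p_both = prob(lambda o: o[0] == 6 and o[0] == 3)
print(p_both)  # 0
```

Both conditional and unconditional probabilities come out to 1/6, confirming independence, while the mutually exclusive pair has joint probability 0.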

2. Bayes’ Theorem:

Bayes' Theorem finds the probability of an event occurring based on the probability of another event that has already occurred.

Mathematically,

P(A|B) = P(B|A) * P(A) / P(B)

The terms in this formula are defined as:

  • P(A|B) is the probability of A being true given that B is true. It is called the posterior
  • P(B|A) is the probability of B being true given that A is true. It is called the likelihood.
  • P(A) is the probability of A being true. It is called the prior.
  • P(B) is the probability of B being true. It is called the evidence.
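The formula is a one-line computation. A minimal sketch, using illustrative numbers (assumed here, not taken from any dataset):

```python
def posterior(prior, likelihood, evidence):
    """Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Illustrative values: P(A) = 0.64, P(B|A) = 0.33, P(B) = 0.35.
p = posterior(prior=0.64, likelihood=0.33, evidence=0.35)
print(round(p, 2))  # 0.6
```

The same three quantities (prior, likelihood, evidence) reappear in the weather example worked through below.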

3. Naive Bayes Classifier:

In the Naive Bayes algorithm, the term "Bayes" comes from Bayes' Theorem. Three common variants of the Naive Bayes classifier are:

  1. Gaussian Naive Bayes
  2. Multinomial Naive Bayes
  3. Bernoulli Naive Bayes
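The three variants differ in how they model the per-feature likelihood P(x|Y). A minimal sketch using scikit-learn (assuming it is installed; the tiny arrays here are made up for illustration):

```python
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

# Toy data, one layout per variant (values are illustrative only).
X_cont = [[1.0, 2.1], [1.2, 1.9], [3.5, 4.0], [3.7, 4.2]]  # continuous features
X_counts = [[3, 0], [2, 1], [0, 4], [1, 5]]                # count features
X_bin = [[1, 0], [1, 1], [0, 1], [0, 0]]                   # binary features
y = [0, 0, 1, 1]

gnb = GaussianNB().fit(X_cont, y)       # Gaussian: continuous values
mnb = MultinomialNB().fit(X_counts, y)  # Multinomial: counts, e.g. word counts
bnb = BernoulliNB().fit(X_bin, y)       # Bernoulli: binary presence/absence

print(gnb.predict([[1.1, 2.0]]))  # [0]
```

Gaussian Naive Bayes suits continuous features, Multinomial suits count data such as word frequencies in text, and Bernoulli suits binary on/off features.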

Let's consider a weather-forecasting example. Suppose we have weather conditions (i.e., features) Outlook, Temperature, Humidity, and Windy, and a corresponding target variable Play. Using this dataset, we need to decide whether the player should play or not.

Let X (independent features) = (Outlook, Temperature, Humidity, Windy) and Y (dependent feature) = Play.

Here, P(Y=Yes) and P(Y=No) can be calculated as:

P(Y=Yes) = (count of Yes)/(count of Yes + count of No) = 9/14

P(Y=No) = (count of No)/(count of Yes + count of No) = 5/14

Applying Bayes’ Theorem,

P(Yes|rainy) = P(rainy|Yes)*P(Yes)/P(rainy)

P(rainy|Yes) = 3/9 = 0.33

P(rainy) = 5/14 = 0.35

P(Yes) = 9/14 = 0.64

So, P(Yes|rainy) = 0.33 × 0.64 / 0.35 = 0.60

Similarly, P(No|rainy) = P(rainy|No)*P(No)/P(rainy)

P(rainy|No) = 2/5 = 0.4

P(rainy) = 5/14 = 0.35

P(No) = 5/14 = 0.35

So, P(No|rainy) = 0.4 × 0.35 / 0.35 = 0.4

Hence, since P(Yes|rainy) > P(No|rainy), we can say that on a rainy day the player can play the game.
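The calculation above can be reproduced in a few lines of Python, using the counts from the example (9 "Yes" days, 5 "No" days, 5 rainy days, of which 3 were "Yes" and 2 were "No"):

```python
# Class priors and rainy-day statistics from the worked example above.
p_yes, p_no = 9 / 14, 5 / 14
p_rainy = 5 / 14
p_rainy_given_yes = 3 / 9
p_rainy_given_no = 2 / 5

# Bayes' Theorem for each class; P(rainy) is the shared evidence term.
p_yes_given_rainy = p_rainy_given_yes * p_yes / p_rainy
p_no_given_rainy = p_rainy_given_no * p_no / p_rainy

print(round(p_yes_given_rainy, 2))  # 0.6
print(round(p_no_given_rainy, 2))   # 0.4
```

Working with exact fractions rather than the rounded intermediates (0.33, 0.35, 0.64) gives 0.6 and 0.4 exactly, and the two posteriors sum to 1 as expected.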

4. Pros and Cons:

Pros

  • Naive Bayes can be used for binary as well as multi-class classification.
  • It is popularly used for text classification problems (e.g., spam email classification).
  • It can be easily trained on small as well as large datasets.
  • The algorithm is easy and fast to use.

Cons

  • The main disadvantage of Naive Bayes is that it assumes all features are independent of each other, so it cannot learn relationships between features.

5. Real-World Applications of Naive Bayes:

  • It is used in medical data classification.
  • As Naive Bayes is an eager learner, it can be used for real-time predictions.
  • It is used in text classification, such as spam filtering and sentiment analysis.
  • It is used in recommendation systems.

Summary

In this article, we have learned that Naive Bayes uses probabilities and makes the naive (simple) assumption that all features are independent. It is a simple yet powerful algorithm.

Hope you can take helpful insights from this article.

Happy Learning!
