- The goal is to classify vectors of discrete-valued features via a generative approach. That is, let $x = (x_1, \dots, x_D)$ be a vector of $D$ discrete-valued features and $y \in \{1, \dots, C\}$ its class label. We define a generative classifier using the following equation:
$$p(y = c \mid x, \theta) \propto p(x \mid y = c, \theta)\, p(y = c \mid \theta)$$
- We refer to $p(x \mid y = c, \theta)$ as the class-conditional density (that is, the likelihood of generating $x$ given $y = c$).
- We refer to $p(y = c \mid \theta)$ as the class prior. The class prior is usually denoted $\pi$, so $p(y = c \mid \theta) = \pi_c$. It captures the prior assumptions about the label distribution.
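To make the classification rule concrete, here is a minimal sketch of scoring classes with Bayes' rule in log space; the inputs `log_prior` and `log_class_conditional` are illustrative placeholders for whatever prior and class-conditional model we choose, not names from the notes:

```python
import numpy as np

def classify_generative(x, log_prior, log_class_conditional):
    """Illustrative sketch: score each class c by log p(y=c) + log p(x | y=c).

    log_prior: array of shape (C,) holding log p(y = c).
    log_class_conditional: function (x, c) -> log p(x | y = c).
    """
    C = len(log_prior)
    scores = np.array([log_prior[c] + log_class_conditional(x, c) for c in range(C)])
    # Normalizing the scores recovers the posterior p(y = c | x).
    log_posterior = scores - np.logaddexp.reduce(scores)
    return int(np.argmax(scores)), np.exp(log_posterior)
```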
- Assume: the features are conditionally independent given the class label (this is technically a naive assumption). That is,
$$p(x \mid y = c, \theta) = \prod_{d=1}^{D} p(x_d \mid y = c, \theta_{dc})$$
where $c$ is the value of the label $y$. Note that the distributions on the RHS can be substituted with whatever we want (e.g., a Gaussian or a Bernoulli).
- We estimate $\theta$ from the dataset $\mathcal{D}$. Then, given $x$, compute $\hat{y} = \arg\max_c \, p(y = c \mid x, \hat{\theta})$, as in the sketch below.
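As an illustration of this estimate-then-predict recipe with Bernoulli class-conditionals, here is a hedged sketch; the data layout (`X` binary, `y` integer labels) and the function names are assumptions made for the example:

```python
import numpy as np

def fit_bernoulli_nb(X, y, n_classes):
    """Illustrative sketch: maximum likelihood estimates for naive Bayes with binary features.

    X: (N, D) array of 0/1 features; y: (N,) labels in {0, ..., C-1}.
    Returns class priors pi (C,) and Bernoulli parameters theta (C, D),
    where theta[c, d] estimates p(x_d = 1 | y = c).
    """
    pi = np.array([(y == c).mean() for c in range(n_classes)])
    theta = np.array([X[y == c].mean(axis=0) for c in range(n_classes)])
    return pi, theta

def predict_bernoulli_nb(x, pi, theta, eps=1e-12):
    """Return argmax_c [ log p(y=c) + sum_d log p(x_d | y=c) ] under the naive assumption."""
    theta = np.clip(theta, eps, 1 - eps)  # avoid log(0) for feature values never seen in a class
    log_joint = np.log(pi) + (x * np.log(theta) + (1 - x) * np.log(1 - theta)).sum(axis=1)
    return int(np.argmax(log_joint))
```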
Bayesian Naive Bayes
- The prior becomes:
$$p(\theta) = p(\pi) \prod_{d=1}^{D} \prod_{c=1}^{C} p(\theta_{dc})$$
This factored form follows from the product rule applied to a priori independent parameters: the joint prior is the product of the individual priors.
- Correspondingly, the posterior becomes
$$p(\theta \mid \mathcal{D}) = p(\pi \mid \mathcal{D}) \prod_{d=1}^{D} \prod_{c=1}^{C} p(\theta_{dc} \mid \mathcal{D})$$
where $\theta_{dc}$ is a parameter for the distribution of $x_d$ given $y = c$. Also, the conditionals on the RHS are chosen as needed (e.g., they can be Dirichlet or Beta).
- It is of the same form as the prior, except conditioned on the evidence seen in the dataset.
- We can analyze the likelihood as follows. For a single datapoint,
$$p(x_i, y_i \mid \theta) = p(y_i \mid \pi) \prod_{d=1}^{D} p(x_{id} \mid \theta_d) = \prod_{c} \pi_c^{\mathbb{I}(y_i = c)} \prod_{d=1}^{D} \prod_{c} p(x_{id} \mid \theta_{dc})^{\mathbb{I}(y_i = c)}$$
- We also have, for the whole dataset,
$$p(\mathcal{D} \mid \theta) = \prod_{c} \pi_c^{N_c} \prod_{d=1}^{D} \prod_{c} \prod_{i : y_i = c} p(x_{id} \mid \theta_{dc})$$
where $N_c$ is the number of examples with label $c$.
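Because the likelihood factorizes into per-class and per-feature terms, fitting reduces to collecting counts. A minimal sketch of these sufficient statistics for binary features (the function name and variable names are illustrative):

```python
import numpy as np

def naive_bayes_counts(X, y, n_classes):
    """Illustrative sketch: sufficient statistics implied by the factored likelihood.

    N_c[c]     = number of examples with label c
    N_dc[c, d] = number of examples with label c and x_d = 1 (binary features)
    """
    N_c = np.array([(y == c).sum() for c in range(n_classes)])
    N_dc = np.array([X[y == c].sum(axis=0) for c in range(n_classes)])
    return N_c, N_dc
```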
- At testing time, the goal is to compute
$$p(y = c \mid x, \mathcal{D}) \propto p(x \mid y = c, \mathcal{D})\, p(y = c \mid \mathcal{D})$$
Note that to actually compute the above in a Bayesian manner, we must integrate over $\pi$ and $\theta$ to get the marginal (posterior predictive) distribution:
$$p(y = c \mid x, \mathcal{D}) \propto \left[ \int p(y = c \mid \pi)\, p(\pi \mid \mathcal{D})\, d\pi \right] \prod_{d=1}^{D} \left[ \int p(x_d \mid y = c, \theta_{dc})\, p(\theta_{dc} \mid \mathcal{D})\, d\theta_{dc} \right]$$
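With conjugate priors (a Dirichlet on $\pi$ and a Beta on each $\theta_{dc}$ for binary features), these integrals have closed forms: the posterior predictive plugs in posterior means, which amounts to Laplace-style smoothing. A hedged sketch under that assumption; the hyperparameters `alpha` and `beta` and the uniform Dirichlet prior are choices made for the example, not specified in the notes:

```python
import numpy as np

def bayesian_bernoulli_nb_predict(x, X, y, n_classes, alpha=1.0, beta=1.0):
    """Illustrative sketch: posterior predictive argmax_c p(y = c | x, D) for binary features.

    Integrating out pi and theta with Dirichlet/Beta conjugate priors reduces
    to using posterior-mean estimates instead of MLEs.
    """
    N, D = X.shape
    log_scores = np.zeros(n_classes)
    for c in range(n_classes):
        Nc = (y == c).sum()
        # E[pi_c | D] under a uniform Dirichlet prior
        log_scores[c] = np.log((Nc + 1.0) / (N + n_classes))
        # E[theta_dc | D] under Beta(alpha, beta) priors
        theta_c = (X[y == c].sum(axis=0) + alpha) / (Nc + alpha + beta)
        log_scores[c] += (x * np.log(theta_c) + (1 - x) * np.log(1 - theta_c)).sum()
    return int(np.argmax(log_scores))
```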
Filtering
- Since naive Bayes assumes the features are conditionally independent, including many irrelevant or redundant features can hurt it, so we need to choose an appropriate subset of features.
- One way to select the features is through variable filtering, that is, by taking the top $K$ features that are most relevant for the problem.
- One way to do this is to measure the mutual information between feature $X_d$ and label $Y$:
$$I(X_d; Y) = \sum_{x_d} \sum_{y} p(x_d, y) \log \frac{p(x_d, y)}{p(x_d)\, p(y)}$$
and keep the features with the highest mutual information, as in the sketch below.
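A hedged sketch of mutual-information filtering for binary features, using empirical plug-in probabilities; the function name and the `top_k` parameter are illustrative:

```python
import numpy as np

def mutual_information_filter(X, y, top_k):
    """Illustrative sketch: rank binary features by empirical I(X_d; Y), keep the top_k indices.

    I(X_d; Y) = sum_{x_d, y} p(x_d, y) log [ p(x_d, y) / (p(x_d) p(y)) ]
    """
    N, D = X.shape
    classes = np.unique(y)
    mi = np.zeros(D)
    for d in range(D):
        for xv in (0, 1):
            for c in classes:
                p_joint = np.mean((X[:, d] == xv) & (y == c))
                p_x = np.mean(X[:, d] == xv)
                p_y = np.mean(y == c)
                if p_joint > 0:
                    mi[d] += p_joint * np.log(p_joint / (p_x * p_y))
    return np.argsort(mi)[::-1][:top_k]  # indices of the most informative features
```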