Working things out backwards with Bayes

In this post, I’ll delve into Bayesian Inference, which is used to determine the probability of cause and effect. That is: when I observe an event, with what certainty can I deduce the reason for it?

This is based on a probability theorem discovered by the English statistician  Thomas Bayes, though it would be wrong to attribute all the glory to him alone; Bayes’ notes were corrected, compiled and published posthumously by Richard Price (a Fellow of the Royal Society), and it was Pierre-Simon Laplace who understood the wide-ranging reach of (what is now called) Bayesian Logic.

Thomas Bayes

First, a little review of the basics: probability is all about measuring the likelihood of events occurring, which can be calculated using the following ratio:

\textbf{probability of event occurring} = \frac{\textbf{number of ways the event can occur}}{\textbf{number of all possible outcomes}}

In technical notation, the “probability of event occurring” is denoted P(\textbf{event}). So, for example, the probability of rolling a 1 on a 6-sided die would be calculated as follows:

P(\textbf{1}) = \frac{\textbf{number of sides on the die with value 1}}{\textbf{total number of sides on the die}} = \frac{\textbf{1}}{\textbf{6}} \approx \textbf{0.167}
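
As a quick sanity check, here’s how that ratio could be computed in a few lines of Python (a minimal sketch; the die is simply represented as a list of its faces):

    # Probability of rolling a 1 on a standard 6-sided die,
    # computed as (ways the event can occur) / (all possible outcomes).
    faces = [1, 2, 3, 4, 5, 6]
    favourable = [face for face in faces if face == 1]

    p_one = len(favourable) / len(faces)
    print(p_one)  # 0.16666666666666666, i.e. 1/6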

Taking another example, the probability of someone having an Apple iPhone would be expressed as follows:

P(\textbf{iPhone}) = \frac{\textbf{number of people who have an iPhone}}{\textbf{total number of people}}

Similarly, the probability of owning an iMac would be:

P(\textbf{iMac}) = \frac{\textbf{number of people who have an iMac}}{\textbf{total number of people}}
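
To make these two definitions concrete, here’s a small Python sketch using made-up survey numbers (the counts are purely illustrative, not real data):

    # Hypothetical survey counts -- purely illustrative.
    total_people = 1000
    people_with_iphone = 500
    people_with_imac = 200

    p_iphone = people_with_iphone / total_people  # P(iPhone) = 0.5
    p_imac = people_with_imac / total_people      # P(iMac)   = 0.2

    print(f"P(iPhone) = {p_iphone}")
    print(f"P(iMac)   = {p_imac}")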

This is all very interesting, but Bayesian logic allows us to take things a step further so that we can calculate the probability of something being true (e.g. that someone has an iPhone) if we already know something else is true (e.g. that they have an iMac). The probability of something (A) being true given that something else (B) is already true is denoted P(\textbf{A} |\textbf{B}), or in our case P(\textbf{iPhone} |\textbf{iMac}).

In probability, the denominator (the thing on the bottom of the fraction) is the entire population. When we were just considering the probability of having an iPhone the denominator was the total number of all people. However, now our denominator will be the probability of the person having an iMac.

The numerator (the stuff on the top of the fraction) is the subpopulation we’re trying to focus on. In our case, this is the population that has an iPhone and, having an iPhone, also has an iMac.

So Bayes’ theorem in full is written as follows:

P(\textbf{iPhone} |\textbf{iMac}) = \frac{P(\textbf{iMac} |\textbf{iPhone}) \times P(\textbf{iPhone})}{P(\textbf{iMac})}

It seems a little backwards that in order to determine the probability that someone who has an iMac will also have an iPhone, you need to know the probability that someone who has an iPhone also has an iMac. However, it is necessary, since our numerator is really capturing two distinct events (each with its own probability of occurrence): first, that the person has an iPhone (i.e. P(\textbf{iPhone})), and second, that given someone has an iPhone, they also have an iMac (i.e. P(\textbf{iMac} |\textbf{iPhone})).
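
To see the theorem in action, here’s a sketch that plugs in the illustrative values from above, plus an assumed value for P(\textbf{iMac} |\textbf{iPhone}):

    # Illustrative values only; P(iMac | iPhone) is assumed for the example.
    p_iphone = 0.5             # P(iPhone)
    p_imac = 0.2               # P(iMac)
    p_imac_given_iphone = 0.3  # P(iMac | iPhone)

    # Bayes' theorem: P(iPhone | iMac) = P(iMac | iPhone) * P(iPhone) / P(iMac)
    p_iphone_given_imac = p_imac_given_iphone * p_iphone / p_imac
    print(p_iphone_given_imac)  # 0.75

In other words, with these made-up numbers, knowing that someone has an iMac raises the probability that they also have an iPhone from 0.5 to 0.75.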

Unfortunately, there are quite a lot of different variables in this theorem, which makes it difficult to apply. However, it can be simplified (though at first it will appear that we’ve made it more complicated).

The total probability of having an iMac can be expressed as the probability of having an iMac given they already have an iPhone multiplied by the probability of having an iPhone, plus the probability of having an iMac given no iPhone multiplied by the probability of not having an iPhone.

This is written as follows:

P(\textbf{iMac}) = P(\textbf{iMac} |\textbf{iPhone}) \times P(\textbf{iPhone}) + P(\textbf{iMac} |\neg\textbf{iPhone}) \times P(\neg\textbf{iPhone})
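
As a quick check, plugging in the illustrative numbers from earlier (P(\textbf{iMac} |\textbf{iPhone}) = 0.3, P(\textbf{iPhone}) = 0.5), along with an assumed P(\textbf{iMac} |\neg\textbf{iPhone}) = 0.1, recovers the same total:

P(\textbf{iMac}) = 0.3 \times 0.5 + 0.1 \times 0.5 = 0.15 + 0.05 = 0.2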

This expansion might seem hyper-pedantic, but note two things:

  1. We can express the probability of having an iMac entirely in terms of iPhone-related probabilities, completely replacing P(\textbf{iMac}).
  2. Also, we know that P(\textbf{iPhone}) + P(\neg\textbf{iPhone}) = 1, so if we know one we can easily calculate the other.

Now we can replace the denominator of Bayes’ theorem as follows:

P(\textbf{iPhone} |\textbf{iMac}) = \frac{P(\textbf{iMac} |\textbf{iPhone}) \times P(\textbf{iPhone})}{P(\textbf{iMac} |\textbf{iPhone}) \times P(\textbf{iPhone}) + P(\textbf{iMac} |\neg\textbf{iPhone}) \times P(\neg\textbf{iPhone})}

This may look more complicated than the original equation, but we’ve removed the need to measure P(\textbf{iMac}) directly, and so it should (in theory) be easier to compute.
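
Putting it all together, here’s a final Python sketch (again with the illustrative values used throughout) that computes P(\textbf{iPhone} |\textbf{iMac}) using only the two conditional probabilities and P(\textbf{iPhone}), without ever measuring P(\textbf{iMac}) directly:

    # Illustrative values only.
    p_imac_given_iphone = 0.3      # P(iMac | iPhone)
    p_imac_given_not_iphone = 0.1  # P(iMac | not iPhone)
    p_iphone = 0.5                 # P(iPhone)
    p_not_iphone = 1 - p_iphone    # since P(iPhone) + P(not iPhone) = 1

    # Expanded form of Bayes' theorem: the denominator is built from the
    # same conditional probabilities, so P(iMac) never has to be known.
    numerator = p_imac_given_iphone * p_iphone
    denominator = numerator + p_imac_given_not_iphone * p_not_iphone

    p_iphone_given_imac = numerator / denominator
    print(p_iphone_given_imac)  # 0.75

Reassuringly, this matches the 0.75 we got from the original form of the theorem.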