Conditional probability: why is it defined like that?

So, you want to calculate the probability of an event knowing that another has happened. There is a formula for that, called conditional probability, but why is it the way it is? Let’s first write down the definition, which only makes sense when $\mathbb{P}(B) > 0$:

$\mathbb{P}(A | B) = \dfrac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}$
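To make the definition concrete, here is a minimal sketch that computes it by counting outcomes. The fair six-sided die, the events `A` and `B`, and the helper names `prob` and `cond_prob` are all assumptions chosen for illustration; they are not part of the original discussion.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (an assumed toy example).
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of omega) under the uniform distribution."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); only defined when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)

A = {2, 4, 6}   # the roll is even
B = {4, 5, 6}   # the roll is at least 4

print(prob(A))          # 1/2
print(cond_prob(A, B))  # 2/3
```

Knowing that the roll is at least 4 raises the odds of an even number from 1/2 to 2/3, because two of the three outcomes left in `B` are even.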

We need to ask: what does the occurrence of event $B$ tell us about the odds of event $A$? How much more (or less) likely does $A$ become if $B$ happens? Think in terms of how $B$ affects $A$.

If $A$ and $B$ are independent, then knowing something about $B$ will not tell us anything about $A$ that we did not know already. Formally, independence means $\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B)$, so the $\mathbb{P}(B)$ in the denominator cancels and $\mathbb{P}(A | B) = \mathbb{P}(A)$. This makes sense! Consider this example: how does me buying a copybook affect the likelihood that your grandma is going to buy a frying pan? It does not: the first event has no influence on the second, so the conditional probability of the second is just the same as its ordinary probability. Careful, though: independent does not mean disjoint. If $A \cap B$ is empty, then $\mathbb{P}(A | B) = 0$, so knowing that $B$ happened tells us a lot, namely that $A$ did not.
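As a sanity check on independence in the usual sense, $\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B)$, here is a small sketch with two fair dice. The dice and the events `A`, `B`, `C` are assumptions made up for this example. It also shows that disjoint events behave very differently from independent ones.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (first die, second die), all equally likely.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] % 2 == 0}   # first die is even
B = {w for w in omega if w[1] == 6}       # second die shows a six
C = {w for w in omega if w[0] % 2 == 1}   # first die is odd (disjoint from A)

# Independent events: the product rule holds, so conditioning changes nothing.
print(prob(A & B) == prob(A) * prob(B))   # True
print(prob(A & B) / prob(B))              # 1/2, same as P(A)

# Disjoint events: conditioning on C rules A out completely.
print(prob(A & C) / prob(C))              # 0
```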

(Figure: two sets with no intersection.)

If $A$ and $B$ are not independent, several things can happen, and that is where things get interesting. We know that $B$ happened, and we should now think as if $B$ were our whole universe. The idea is: we already know the odds of $A$, right? They are just $\mathbb{P}(A)$. But how do they change if we know that we do not really have to consider all possible outcomes, but just a subset of them? As an example, think of $\mathbb{P}(\text{drawing a red ball})$ versus $\mathbb{P}(\text{drawing a red ball})$ knowing that all balls are red. This makes a huge difference, right? (As an aside, that is what we mean when we say that probability is a measure of our ignorance.)

So anyway, now we ask: what is the probability of $A$? Well, it would just be $\mathbb{P}(A)$, but we must account for the fact that we now live inside $B$, and everything outside it is as if it did not exist. So $\mathbb{P}(A)$ actually becomes $\mathbb{P}(A \cap B)$: we only care about the part of $A$ that is inside $B$, because that is where we live now.

But there is a caveat. We are thinking as if $B$ were the whole universe but, in terms of probabilities, it actually is not, because nobody has told the probability distribution. In fact, $\mathbb{P}(A \cap B)$ is still computed in the bigger universe, where $B$ only weighs $\mathbb{P}(B)$ instead of 1. That is why we need a re-scaling factor: something that scales $\mathbb{P}(A \cap B)$ to make it numerically correct, to account for the fact that $B$ is our current universe. This is what the $\mathbb{P}(B)$ in the denominator does.
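The effect of the re-scaling can be seen directly on a small example. The sketch below uses a loaded die whose weights are invented purely for illustration: restricting to $B$ leaves probabilities that only add up to $\mathbb{P}(B)$, and dividing by $\mathbb{P}(B)$ is what turns them into a genuine distribution on $B$.

```python
from fractions import Fraction

# A loaded die; the weights are an assumption chosen for illustration.
p = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
     4: Fraction(2, 10), 5: Fraction(2, 10), 6: Fraction(3, 10)}

B = {4, 5, 6}                 # we learn that the roll is at least 4
p_B = sum(p[w] for w in B)    # P(B) = 7/10

# Restricting to B without re-scaling: the masses only add up to P(B).
restricted = {w: p[w] for w in B}

# Dividing by P(B) makes B the new universe: the masses add up to 1.
conditioned = {w: p[w] / p_B for w in B}

print(sum(restricted.values()))    # 7/10
print(sum(conditioned.values()))   # 1
print(conditioned[6])              # 3/7
```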

In fact, the factor $\frac{1}{\mathbb{P}(B)}$ accounts for how relevant the information that $B$ happened is. If $\mathbb{P}(B) = 1$, it means that, for probability purposes, $B = \Omega$: the switch of universe was only apparent! A further consequence is that $\mathbb{P}(A \cap B) = \mathbb{P}(A)$, because $A$ is basically inside $B$ (apart from silly null-measure caveats). In turn, this makes $\mathbb{P}(A | B) = \mathbb{P}(A)$. This makes sense: if $B$ is sure to happen, then what does it tell us about the odds of something else? As an example, if we are considering strings of digits ($\Omega$), what is the likelihood that a certain string is made of ones or twos, knowing that it is made out of digits ($B$)? It sounds tautological, and it certainly is.

What about a $\mathbb{P}(B)$ that is big, yet $< 1$? This is trickier, as it mostly depends on the interplay between $\mathbb{P}(A \cap B)$ and $\mathbb{P}(B)$. But you are not on a university-level website to read about inverse proportionality, are you?

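Another case worth inspecting is when $B \subset A$. In that case $A \cap B = B$, so

$\mathbb{P}(A | B) = \dfrac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)} = \dfrac{\mathbb{P}(B)}{\mathbb{P}(B)} = 1$

(Figure: $B$ inside $A$.)

Makes sense, right? If $B$ happened, and $B$ is inside $A$, then clearly $A$ must have happened as well. If I bought a red umbrella ($B$), what are the odds that I bought an umbrella at all ($A$)? Full, yep. Note that the direction matters: if instead $A \subset B$, then $A \cap B = A$ and we only get $\mathbb{P}(A | B) = \mathbb{P}(A) / \mathbb{P}(B)$, which is at least $\mathbb{P}(A)$ but need not be 1.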
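The direction of the inclusion is easy to get backwards, so here is a quick check by counting. The die and the events are assumptions picked for the example: $B$ (rolling a six) sits inside $A$ (rolling an even number), just as buying a red umbrella sits inside buying an umbrella.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (assumed toy example)

def prob(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), assuming P(B) > 0."""
    return prob(a & b) / prob(b)

A = {2, 4, 6}   # the roll is even
B = {6}         # the roll is a six, so B is a subset of A

print(cond_prob(A, B))   # 1   (a six is certainly even)
print(cond_prob(B, A))   # 1/3 (an even roll is only sometimes a six)
```

The second line is the reverse situation: conditioning on the bigger set gives $\mathbb{P}(B) / \mathbb{P}(A)$, here $(1/6) / (1/2) = 1/3$, not 1.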

Finally, let’s consider the case in which $\mathbb{P}(B)$ is very small. Suppose that $\mathbb{P}(B) = \epsilon$ and $\mathbb{P}(A \cap B) > 0$. An $\epsilon$ in the denominator makes the resulting fraction significantly bigger than $\mathbb{P}(A \cap B)$ itself (though never bigger than 1, since $\mathbb{P}(A \cap B) \leq \mathbb{P}(B)$).

(Figure: a small set $B$.)

The idea here is that if $B$ is very narrow, if it talks about something very unlikely, and it happened, this greatly influences the overall conditional probability. What are the odds that I get hospitalized in Japan as a 25-year-old man? Very low. What are the odds that today there is an earthquake in Japan? Very low. What are the odds that I get hospitalized in Japan, knowing that today an earthquake happened? Much higher. That’s the idea: the more unlikely $B$ is, the more $\mathbb{P}(A \cap B)$ gets magnified, and the further $\mathbb{P}(A | B)$ can move away from $\mathbb{P}(A)$. How far, and in which direction, depends on how much of $B$ is covered by $A$. In a sense, the narrower $B$ is, the more information we gain from knowing that it happened.
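A rough numerical sketch of the small-$B$ case, with numbers invented for illustration: the universe is the integers from 1 to 1000, $A$ is a fixed event of probability 1/20, and each candidate $B$ has probability 1/100. All the sets and names below are assumptions, not data.

```python
from fractions import Fraction

omega = set(range(1, 1001))   # 1000 equally likely outcomes (assumed)

def prob(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), assuming P(B) > 0."""
    return prob(a & b) / prob(b)

A = set(range(1, 51))            # outcomes 1..50, so P(A) = 1/20

B_inside = set(range(1, 11))     # 1..10: entirely inside A
B_half = set(range(46, 56))      # 46..55: half inside A
B_barely = set(range(50, 60))    # 50..59: a single outcome inside A

print(prob(A))                   # 1/20
print(cond_prob(A, B_inside))    # 1
print(cond_prob(A, B_half))      # 1/2
print(cond_prob(A, B_barely))    # 1/10
```

Each $B$ is equally unlikely, and each one moves the odds of $A$ well away from 1/20. How high they end up depends on the share of $B$ that lies inside $A$.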
