Probability – Quick Math Intuitions

Probability as a measure of ignorance
One of the most beautiful intuitions about probability measures came from Rovelli's book, which in turn took it from Bruno de Finetti.

What does a probability measure measure? Formally, it assigns numbers to the sets of the \sigma-algebra on which the measure space is built. But what does it really measure? Thinking about it, it is very difficult to define probability without using the words probable or possible.

Well, probability measures our ignorance about something.

When we make some claim with 90% probability, what we are really saying is that the knowledge we have allows us to make a prediction that is accurate to that degree. And the main point here is that different people may assign different probabilities to the very same claim! If you have ever seen weather forecasts for the same day disagree, you know what I am talking about. Different data or different models generate different knowledge, and thus different probability figures.

But we do not have to go that far to find reasonable examples. Let's consider a very simple one. Imagine you are on a train, and sitting in front of you is a girl wearing clothes branded Patagonia. What are the odds that the girl has actually been to Patagonia? Not very high, you would guess: Patagonia is just a brand that makes warm clothes, and it can be purchased in stores all around the world, probably more easily than in Patagonia itself! So you would probably say it is no more than 50% likely.

But now imagine a kid in the same scenario. If they see a girl with Patagonia clothes, they would immediately conclude that she has been to Patagonia (with probability 100% this time), because they lack a good amount of important information that you instead hold. And so the figure associated with \mathbb{P}(\text{The girl has been to Patagonia} | \text{The girl has a Patagonia jacket}) is quite different depending on the observer, or rather on the knowledge (or lack thereof) they possess. In this sense probability is a measure of our ignorance.
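
To make this concrete, here is a minimal sketch in Python with entirely made-up numbers: both observers apply Bayes' rule to the same claim, but they start from different assumptions about how likely a Patagonia jacket is on someone who has (or has not) been there, so they end up with different probabilities.

```python
# Hypothetical numbers only: two observers, same claim, different knowledge.
def posterior(prior, p_jacket_given_been, p_jacket_given_not):
    """P(been to Patagonia | wears a Patagonia jacket) via Bayes' rule."""
    evidence = prior * p_jacket_given_been + (1 - prior) * p_jacket_given_not
    return prior * p_jacket_given_been / evidence

# The adult knows the brand is sold everywhere: the jacket is weak evidence.
print(posterior(prior=0.01, p_jacket_given_been=0.30, p_jacket_given_not=0.05))  # ~0.06
# The kid believes only travellers own such a jacket: the jacket is conclusive.
print(posterior(prior=0.01, p_jacket_given_been=0.30, p_jacket_given_not=0.00))  # 1.0
```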

Conditional probability: why is it defined like that?
So, you want to calculate the probability of an event knowing that another one has happened. There is a formula for that, called conditional probability, but why is it the way it is? Let's first write down the definition of conditional probability:

    \[\mathbb{P}(A | B) = \dfrac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}\]

We need to ask: what does the fact that event B happened tell us about the odds of event A happening? How much more (or less) likely does A become if B happens? Think in terms of how B affects A.

If A and B are independent, then knowing something about B will not tell us anything at all about A, at least nothing that we did not know already. In this case \mathbb{P}(A \cap B) = \mathbb{P}(A)\mathbb{P}(B), and thus \mathbb{P}(A | B) = \frac{\mathbb{P}(A)\mathbb{P}(B)}{\mathbb{P}(B)} = \mathbb{P}(A). This makes sense! In fact, consider this example: how does my buying a copybook affect the likelihood that your grandma is going to buy a frying pan? It does not: the first event has no influence on the second, so the probability of the second event given the first is just its plain, unconditional probability.

If A and B are not independent, several things can happen, and that is where things get interesting. We know that B happened, and we should now think as if B were our whole universe. The idea is: we already know the odds of A, right? They are just \mathbb{P}(A). But how do they change once we know that we do not have to consider all possible outcomes, but just a subset of them? As an example, think of \mathbb{P}(\text{drawing a red ball}) versus \mathbb{P}(\text{drawing a red ball}) knowing that all the balls are red. This makes a huge difference, right? (As an aside, that is what we mean when we say that probability is a measure of our ignorance.)

So anyway, now we ask: what is the probability of A? Well, it would just be \mathbb{P}(A), but we must account for the fact that we now live inside B, and everything outside it is as if it did not exist. So \mathbb{P}(A) actually becomes \mathbb{P}(A \cap B): we only care about the part of A that is inside B, because that is where we live now.

But there is a caveat. We are thinking as if B were the whole universe but, in terms of probabilities, it is not, because nobody has informed the probability measure of the change. In fact, we compute \mathbb{P}(A \cap B) precisely because we still live in the bigger universe, but we need to account for the fact that B is our real universe now. That is why we need a re-scaling factor: something that scales \mathbb{P}(A \cap B) to make it numerically correct, accounting for the fact that B is our current universe. This is what the \mathbb{P}(B) in the denominator does.
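
Here is a minimal Monte Carlo sketch (standard-library Python, a hypothetical example of my own: a fair die, A = "roll at least 5", B = "roll is even") showing that restricting to the runs where B happened and re-scaling by \mathbb{P}(B) are the same thing:

```python
import random

n = 200_000
a_and_b = b = 0
for _ in range(n):
    roll = random.randint(1, 6)
    if roll % 2 == 0:        # B happened
        b += 1
        if roll >= 5:        # A happened too (only the roll of 6 qualifies)
            a_and_b += 1

print((a_and_b / n) / (b / n))  # P(A and B) / P(B), approx (1/6)/(1/2) = 1/3
print(a_and_b / b)              # frequency of A inside the world where B happened: same number
```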

In fact, the factor \frac{1}{\mathbb{P}(B)} accounts for how relevant the information that B happened is. If \mathbb{P}(B) = 1, it means that, for probability purposes, B = \Omega (up to a set of measure zero) – the switch of universe was only apparent! A further consequence is that \mathbb{P}(A \cap B) = \mathbb{P}(A), because A essentially lies inside B (again, up to null-measure caveats). In turn, this makes \mathbb{P}(A | B) = \mathbb{P}(A). This makes sense: if B is sure to happen, what does it tell us about the odds of anything else? As an example, if we are considering strings of digits (\Omega), what is the likelihood that a certain string is made of ones and twos, knowing that it is made of digits (B)? It sounds tautological, and it certainly is.

What about a \mathbb{P}(B) that is large, yet strictly less than 1? This case is trickier, as it mostly depends on the interplay between \mathbb{P}(A \cap B) and \mathbb{P}(B). But you are not on a university-level website to read about inverse proportionality, are you?

Another case worth inspecting is when B \subset A. In that case, A \cap B = B and

    \[\mathbb{P}(A | B) = \dfrac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)} = \dfrac{\mathbb{P}(B)}{\mathbb{P}(B)} = 1\]

Makes sense, right? If B happened, and B is contained in A, then clearly A must have happened as well. If I bought a red umbrella, what are the odds that I bought an umbrella? 100%, of course.

Finally, let's consider the case in which \mathbb{P}(B) is very small. Suppose that \mathbb{P}(B) = \epsilon and \mathbb{P}(A \cap B) > 0. An \epsilon in the denominator makes the resulting fraction much bigger than \mathbb{P}(A \cap B) itself.

(Figure: a very small event B.)

The idea here is that if B is very narrow, if it describes something very unlikely, and it happened, this greatly influences the overall conditional probability. What are the odds that I, a 25-year-old man, get hospitalized in Japan? Very low. What are the odds that today there is an earthquake in Japan? Very low. What are the odds that I get hospitalized in Japan, knowing that today an earthquake happened there? Quite high. That's the idea: the more unlikely B is, the higher \mathbb{P}(A | B) tends to be (provided A and B overlap substantially). In a sense, the narrower B is, the more information its happening carries.
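
As a purely illustrative computation (the numbers are entirely made up): if \mathbb{P}(A) = 0.001, \mathbb{P}(B) = 0.002 and \mathbb{P}(A \cap B) = 0.001, then

    \[\mathbb{P}(A | B) = \dfrac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)} = \dfrac{0.001}{0.002} = 0.5,\]

a 500-fold jump from the unconditional \mathbb{P}(A) = 0.001.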

On the meaning of hypothesis and p-value in statistical hypothesis testing
Statistical hypothesis testing is a really interesting topic. I'll briefly sum up what statistical hypothesis testing is about and what you do to test a hypothesis, but I will assume you are already familiar with it, so that I can quickly cover a couple of a-ha moments I had.


In statistical hypothesis testing, we

  • have some data, whatever it is, which we imagine as being values of some random variable;
  • make a hypothesis about the data, such as that the expected value of the random variable is \mu;
  • find a quantity computed from the data – often an affine transformation (a standardization) of the sample mean – whose distribution under the hypothesis is known: this is the test statistic;
  • run the test, i.e. say numerically how probable our observations were under the hypothesis we made.

I had a couple of a-ha moments I'd like to share.

There is a reason why this is called hypothesis testing and not hypothesis choice. There are indeed two hypotheses, the null and the alternative hypothesis. However, their roles are wildly different! 90% of what we do, both from a conceptual and a numerical point of view, has to do with the null hypothesis. They really are not symmetric. The question we are asking is “With the data I have, am I confident enough that my null hypothesis no longer stands?”, not at all “With the data I have, which of the two hypotheses is better?”

In fact, the alternative hypothesis is only relevant in determining what kind of alternative we have: whether it is one-sided (and which side) or two-sided. This affects the calculations. But other than that, the math does not really care about the specific value of the alternative. In other words, the following two tests are really equivalent, as the sketch below illustrates:

H_0\colon \mu = 5 \quad \text{vs.} \quad H_1\colon \mu = 7

H_0\colon \mu = 5 \quad \text{vs.} \quad H_1\colon \mu = 700
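
Here is a minimal sketch (assuming numpy and scipy are available; the sample is made up) of this asymmetry: a one-sided test against H_0\colon \mu = 5 never needs to know whether the alternative value is 7 or 700, only its direction.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=6.0, scale=2.0, size=30)  # hypothetical sample

# One-sided t-test of H0: mu = 5 against a "greater than" alternative.
# Neither 7 nor 700 appears anywhere: only the null value and the direction
# of the alternative enter the computation of the p-value.
t_stat, p_value = stats.ttest_1samp(data, popmean=5, alternative="greater")
print(t_stat, p_value)
```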

This accounts for why, when evaluating a p-value, we reject the null hypothesis only for very low figures. The way I first thought about it was: “Well, the p-value is, intuitively, a measure of how compatible the observed data are with the null hypothesis. Then, if I get something around 0.30, I should reject the null hypothesis and switch to the alternative, as it seems a better theory.” But this is a flawed argument indeed. To see if the alternative was really better, I should run a test using it as the null hypothesis! We reject only for very low p-values because that means the null hypothesis really is no longer any good, and should be thrown in the bin. Only then do we need to worry about finding another good theory that suits the data.

However, before throwing the current theory out of the window, we do not accept just any evidence against it: we want very strong evidence. We do not want to discard the current theory for another that could only be marginally better. It must be crushingly better!

Metaphysics on geometric distribution in probability theory
I realized the geometric distribution is not exactly about the time needed to get the first success in a sequence of trials. This is a very odd feeling. It is probably a feeling applied mathematicians get sometimes, when they feel they are doing the best they can, and yet the theory is not perfect.

This may be a naive post, I warn you, but I was really stunned when I realized this.

Geometric distribution is not about the first success

Let's jump to the point. We know (or at least, I was taught) that the geometric distribution is used to calculate the probability that the first success in a sequence of independent trials (each with success probability p) happens precisely at the k-th trial.

Remember that a geometric random variable X has distribution

\Pr(X=k)=(1-p)^{k-1}\,p\,

How can we relate the above distribution to the fact that it describes the first success? Well, we need to have one success, which explains the factor p. Moreover, we want exactly one success, so all other trials must be unsuccessful, which explains the factor (1-p)^{k-1}.

But hey, where is “first” ever written in that formula? Unless you do probability in a non-commutative ring (in which case, I don't know what you are doing), multiplication is commutative. So who can tell the order of the events in a Bernoulli process just by looking at the product?

In fact, (1-p)^{k-1}p could just as well refer to having unsuccessful outcomes for the first k-1 trials and then a success at the k-th trial, as to having a success at the very first attempt and then all failures. As it is, the same number is the probability of any specific sequence of k attempts that contains one (and only one) success.

Apparently, then, the geometric distribution is about the time of the first success, but the formula is not just about that. It covers many more cases, all equally likely: (1-p)^{k-1}p is the probability of any particular sequence of k Bernoulli trials containing exactly one success, wherever that success falls. (The probability that exactly one success occurs somewhere among the k trials is therefore k\,(1-p)^{k-1}p, since there are k such sequences.)
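
A minimal simulation sketch (plain Python, made-up parameters) checking this: every specific single-success sequence – success at trial 1, at trial 2, ..., at trial k, with failures elsewhere – is observed with the same frequency, matching (1-p)^{k-1}p.

```python
import random

p, k, n_sims = 0.3, 5, 200_000
counts = [0] * k  # counts[i]: runs with a success only at trial i+1

for _ in range(n_sims):
    trials = [random.random() < p for _ in range(k)]
    if sum(trials) == 1:                 # exactly one success in this run
        counts[trials.index(True)] += 1  # record where it happened

theory = (1 - p) ** (k - 1) * p
for i, c in enumerate(counts):
    print(f"success only at trial {i + 1}: {c / n_sims:.4f} (theory {theory:.4f})")
```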

The universe does not care about the order of events (in a Bernoulli process, at least). As long as we run k trials, regardless of when the single success happens, the probability is the same. This stuns me!

Random variables: what are they and why are they needed?
This article aims to provide some intuition for what random variables are and why they are useful and needed in probability theory.

Intuition for random variables

Informally speaking, random variables encode questions about the world in a numerical way.

How many heads can I get if I flip a coin 3 times?

How many people will vote for the Democrats in the US presidential election?

I want to make pizza. What is the possible overall cost of the ingredients, considering all the possible combinations of brands?

These are all examples of random variables. What a random variable does, in plain words, is take a set of possible world configurations and map them to a number. What I mean by world configurations will become clearer soon, when we talk about the sample space \Omega (which, appropriately, is also called the universe).

I just wanted to provide a very brief informal description of random variables, but stick with me and we will dive deeper into the matter with an example!

A simple random variable example

Suppose we flip a fair coin three times. If we write down all possible outcomes, we obtain the universe (or sample space) \Omega:

\Omega = \{ HHH, THH, HTH, HHT, TTH, THT, HTT, TTT \}

Here we have identified heads with H and tails with T. The first element corresponds to three heads, the following three elements correspond to two heads, the next three correspond to one head, and the last one to no heads.

Let’s take a second to notice that \Omega is made up of 1 + 3 + 3 + 1 = 8 items.

Now, what if I asked you how many heads you can get overall by flipping a coin three times? You would answer by exhibiting the following set (who wouldn't answer with a set, really!):

X = \{3, 2, 1, 0\}

Notice that X is made up of only 4 elements, whereas \Omega had 8: we have reduced the amount of data to handle. (Also, \Omega was made up of more complex data, because each of its 8 elements was made up of 3 letters.)

And lo! We have stumbled upon a random variable. We had a universe of possible configurations \Omega and, through a question, we have mapped them to numbers in a way that is relevant for our question. This is crucial, so I will say it once again: from \Omega, which contained a lot more information than we needed, we managed to extract only the part of the data that was relevant to our study.

In a way, every time you study a phenomenon through some data, you are using random variables to do it, because you only look at the data that is relevant and ignore what is not important for you at that moment. In our case, for example, we do not care in what order the heads came, we only care how many of them there were.

Of course, we can ask a variety of questions about the same phenomenon. In the case of the three coin flips, apart from “How many heads could we get?” we could also ask “How many tails could we get?”. It was a trivial phenomenon, so there is not much we can study about it, but think of a medical trial: there is a lot of data and several questions can be asked about it.

Why is a random variable useful?

At this point, a random variable just seems like a very useful concept, but one could argue that reducing the amount of data is not a good enough reason to introduce a new idea.

But random variables are defined in probability theory, so they must have something to do with probabilities! Imagine we were interested in the following question: “What is the likelihood of getting 2 heads when flipping a balanced coin 3 times?”. What is beautiful about random variables is that they work in perfect tune with the probability measure we have on \Omega!
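
Here is a minimal sketch (standard-library Python) of exactly this: enumerate \Omega, let the random variable map each outcome to its number of heads, and read the answer off the uniform measure on \Omega.

```python
from itertools import product
from collections import Counter

omega = ["".join(flips) for flips in product("HT", repeat=3)]  # the 8 outcomes
X = {outcome: outcome.count("H") for outcome in omega}         # the random variable

counts = Counter(X.values())
print(counts[2] / len(omega))  # P(X = 2) = 3/8 = 0.375
```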

As long as we talk about discrete cases (meaning the values are integers: we cannot get 1.5 heads), it may look like the concept of a random variable is superfluous, because we could always go look at \Omega and count how many cases satisfy our question and how many do not. However, this is impractical for huge amounts of data, not to mention that more often than not the universe \Omega is not even explicitly known. But most importantly, random variables are essential when dealing with continuous quantities and, above all, when asking more complex questions (which may involve combinations of more than one variable, for example).

Why can’t we do away with random variables?

Mathematically speaking, a random variable is a (measurable) function

X\colon \Omega \to \mathbb{R}

Having \mathbb{R} as the output space gives us a huge advantage: we can make use of all the calculus we know! We can compute integrals, which allows us to calculate the mean and variance of a phenomenon.[1]
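
For instance, here is a hedged sketch (assuming numpy and scipy are installed; the exponential density is just an illustrative choice) of computing the mean and variance of a continuous random variable as integrals against its density:

```python
import numpy as np
from scipy.integrate import quad

density = lambda x: np.exp(-x)  # density of an exponential(1) random variable on [0, inf)

mean, _ = quad(lambda x: x * density(x), 0, np.inf)
second_moment, _ = quad(lambda x: x ** 2 * density(x), 0, np.inf)
variance = second_moment - mean ** 2
print(mean, variance)  # both equal 1 for the exponential(1) distribution
```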

In a way, the abstract concept of a random variable is the price we have to pay to go beyond “How many heads can I get by flipping a coin 3 times?”.

That’s all for now, I hope this helps in understanding the use and importance of random variables!

Footnotes

1. See this great math.stackexchange answer as well.
