Probability and counting

Probability is a fraction

Possibility and probability… what’s the difference?

“Are you going to the grocery store?”

“Probably.”

“Possibly.”

“Probably” sounds more likely than “Possibly”, right?

Mathematicians are more precise in their language than that.

Possibility is a count; probability is a fraction.

Possibilities are counted: the number of possibilities is the number of specified outcomes that can happen. If you have a deck of playing cards (52 cards in the deck), then there are 52 possibilities for the card you might draw. There are four queens in the deck, so there are 4 possibilities for drawing a queen.

But what is the probability of drawing a queen from a well-shuffled deck in one draw? A probability is a fraction. The number of possible outcomes of interest (the draw of a queen) goes on top (the numerator). The total number of possible outcomes (the number of cards that could be drawn) goes on the bottom (the denominator). In this case, the probability of drawing a queen is 4/52, or 1/13, or about 0.077, or about 7.7%.

Probabilities are fractions, so they can be represented in all the ways fractions can… and then some. The above probability can also be expressed as “one in thirteen” or as odds of “1 to 12”. Odds compare the number of outcomes of interest with the number of all the other outcomes: here, 4 queens against 48 other cards.
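The same probability can be computed as an exact fraction, a decimal, a percent, and odds. Here is a small sketch using Python's standard `fractions` module. The variable names are illustrative choices, not anything from the text.

```python
from fractions import Fraction

# Counts from the text: 4 queens in a standard 52-card deck.
queens = 4
deck_size = 52

# Probability: outcomes of interest over all possible outcomes.
p_queen = Fraction(queens, deck_size)   # Fraction reduces 4/52 to 1/13
print(p_queen)                          # 1/13
print(round(float(p_queen), 3))         # 0.077
print(f"{float(p_queen):.1%}")          # 7.7%

# Odds: outcomes of interest against all the *other* outcomes (4 vs. 48).
odds_for = Fraction(queens, deck_size - queens)  # 4/48 reduces to 1/12
print(f"{odds_for.numerator} to {odds_for.denominator}")  # 1 to 12
```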

A probability, like an average, is an expected value. Suppose you shuffled your cards, drew one and put it back, shuffled again, drew one and put it back, and repeated that 98 more times (100 draws in all). You would expect to see 7 or 8 queens, because 100 × 1/13 is about 7.7. Would you actually see seven or eight queens? Maybe, or maybe not. There is a chance that you might not see any. There’s even a chance (albeit a very, very small one) that all 100 cards drawn will be queens. But if you kept drawing cards a huge number of times, on average you would see a queen about once in every thirteen draws.
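A quick simulation shows the gap between the expected count and what a single run of 100 draws produces. This is a minimal sketch under an assumption of its own: cards are numbered 0 to 51, and cards 0 to 3 stand for the four queens. The seed is fixed only so reruns match.

```python
import random

def count_queens(draws: int, rng: random.Random) -> int:
    """Draw `draws` cards with replacement from a 52-card deck; count queens.

    Cards are numbered 0-51; by an arbitrary labeling chosen for this
    sketch, cards 0-3 are the four queens.
    """
    return sum(1 for _ in range(draws) if rng.randrange(52) < 4)

rng = random.Random(2024)  # fixed seed so reruns give the same numbers

# Expected queens in 100 draws: 100 * (4/52), about 7.7.
expected_100 = 100 * 4 / 52
print(round(expected_100, 2))  # 7.69

# Five separate 100-draw experiments: the counts scatter around 7.7.
print([count_queens(100, rng) for _ in range(5)])

# Over many draws, the fraction of queens settles near 1/13 (about 0.077).
n = 200_000
long_run = count_queens(n, rng) / n
print(round(long_run, 3))
```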

The count

Counting is fundamental to mathematics. Addition, multiplication, all the operations of arithmetic, even geometry and calculus are about counting and measurement, and measurement is about counting units. Only a few areas of abstract mathematics divorce numbers from counting.

But there is a whole field of mathematics that is about counting and it’s called “combinatorics”. It’s very important to probability theory because it provides easy ways to come up with numbers to go in the tops and bottoms of the fractions used as probabilities.

Say you have two groups of items and you want to pair each item in the first group with one in the second group, and you want all the first items to be from the first group, and all the second items to be from the second group. The pairs would be called “ordered pairs.” How many ordered pairs would there be? Well, you could write all the pairs out, hope you think of them all, and then count them. That would be a serious chore. Or you could multiply the number of items in the first group by the number of items in the second group and you would have it.
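The product rule is easy to check by brute force. This sketch lists every ordered pair with `itertools.product` and compares the count with a simple multiplication. The two groups are hypothetical, made up for illustration.

```python
from itertools import product

# Two hypothetical groups, made up for illustration.
shirts = ["red", "blue", "green"]
pants = ["jeans", "khakis"]

# Write every ordered pair out (the "serious chore" way)...
pairs = list(product(shirts, pants))
print(pairs[:3])  # [('red', 'jeans'), ('red', 'khakis'), ('blue', 'jeans')]

# ...and compare with simply multiplying the group sizes.
print(len(pairs), len(shirts) * len(pants))  # 6 6
```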

Or, say, you have one group of items and you want to know how many different orders you can arrange them in. The factorial will do the job for you. If there are five items, you just multiply five times four times three times two times one (written 5!), and that’s your answer: 120.

Factorials make sense. If you have five items, there are five ways to choose the first item. For each of the possibilities for the first item, you have four possibilities left for the second item, so that’s 5×4 possible ordered pairs. For each of those pairs, you have three possibilities left: 5×4×3 possible triplets. For each triplet, you have two items left, so that’s 5×4×3×2 possibilities. Then there’s only one item left, so you’re out of options.
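The step-by-step reasoning above can be mirrored in a few lines. This sketch multiplies down from 5 to 1 and then checks the result two ways: with `math.factorial`, and by listing every ordering with `itertools.permutations`. The five item labels are placeholders.

```python
from itertools import permutations
from math import factorial

items = ["A", "B", "C", "D", "E"]  # five placeholder items

# Build the product exactly as the paragraph reasons:
# 5 choices for the first item, 4 for the second, and so on down to 1.
ways = 1
for remaining in range(len(items), 0, -1):
    ways *= remaining
print(ways)                             # 120

# The standard library agrees, and so does listing every ordering.
print(factorial(5))                     # 120
print(len(list(permutations(items))))   # 120
```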

Combinatorial mathematicians have come up with quick ways to count all sorts of things so combinatorics and probability theory go hand-in-hand.

Combining probabilities

Complications arise when probabilities must be combined. For instance, what is the probability that you might draw ten queens in ten draws of a well shuffled deck, replacing the drawn card each time?

Do you add the probabilities or multiply them? There are two good indicators. First, consider that the probability of drawing one queen is small, 1/13. Will the probability of drawing two queens in a row be greater or smaller? Intuitively, it should be smaller, and multiplying two fractions between 0 and 1 gives a smaller result. So you would multiply.

Second, the question is, “What is the probability of drawing a queen, and a queen, and a queen…?” That “and” is important, but it’s also confusing. In everyday speech, “and” in a statement about numbers usually means addition (“4 and 2 make 6”). That’s unfortunate, because in probability “and” usually means multiplication. In logic, when two statements are joined by “and”, the resulting conjunction is true only if both statements are true. There are four possibilities: the first statement is true and the second is true, the first is true and the second is false, and so on. Only one of the four makes the conjunction true.

“Or” is the other conjunction you might run into in logic or math. In those places, two statements joined by “or” form a “disjunction”. In logic, the resulting statement is true if either (or both) of the constituent statements is true. Three of the four possibilities make the disjunction true.
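The “one of four” and “three of four” counts can be read straight off a truth table. This sketch enumerates every combination of truth values for two statements and counts how often each connective comes out true. Python's `or` is inclusive, like the logical disjunction described above.

```python
from itertools import product

# Every combination of truth values for two statements p and q.
rows = list(product([True, False], repeat=2))

and_true = sum(1 for p, q in rows if p and q)  # conjunction
or_true = sum(1 for p, q in rows if p or q)    # inclusive disjunction

print(f"'and' is true in {and_true} of {len(rows)} cases")  # 1 of 4
print(f"'or' is true in {or_true} of {len(rows)} cases")    # 3 of 4
```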

In set theory, “and” denotes the intersection of two sets. If a = {1, 2, 3, 4, 5} and b = {4, 5, 6, 7}, then a and b = {4, 5}. “Or” denotes the union of two sets: a or b = {1, 2, 3, 4, 5, 6, 7}. “Or” always gives a result at least as large as “and”, so “and” should make you think of multiplication and “or” of addition.
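Python's built-in sets use `&` for intersection and `|` for union, so the example translates directly. This is just the text's two sets, nothing more.

```python
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7}

both = a & b     # intersection: "a and b"
either = a | b   # union: "a or b"

print(sorted(both))    # [4, 5]
print(sorted(either))  # [1, 2, 3, 4, 5, 6, 7]

# The union is always at least as large as the intersection.
print(len(either) >= len(both))  # True
```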

So, what is the probability of drawing 10 queens in 10 draws? It’s 1/13 multiplied by itself 10 times, (1/13)^10. That comes to about 7 trillionths (roughly 7.3 × 10^-12).
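Exact arithmetic makes the size of this number easy to see. This sketch raises 1/13 to the tenth power with `fractions.Fraction`, then converts to a float.

```python
from fractions import Fraction

p_queen = Fraction(1, 13)

# "A queen and a queen and ..." ten times, with replacement: multiply.
p_ten_queens = p_queen ** 10
print(p_ten_queens)                  # 1/137858491849
print(f"{float(p_ten_queens):.2e}")  # 7.25e-12
```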

To illustrate “or”, what is the probability of drawing a queen or a king or a jack in a single draw? The probability of drawing any particular rank in one draw is the same, 1/13. A single card can’t be two ranks at once, so the probabilities simply add: 1/13 + 1/13 + 1/13 = 3/13. Of course, if you add up the probabilities of drawing each of the 13 ranks, thirteen 1/13s give you 13/13, or 1.
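Adding works here because the events can't overlap. This sketch builds a full 52-card deck, counts the jacks, queens, and kings directly, and compares that count with the sum 1/13 + 1/13 + 1/13. The rank and suit labels are conventional choices, not something the text specifies.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]  # 52 cards

# Adding works because one card can't be two ranks at once:
# "queen", "king", and "jack" are mutually exclusive events.
p_by_adding = Fraction(1, 13) * 3

# Check by counting the face cards directly.
face = [card for card in deck if card[0] in {"J", "Q", "K"}]
p_by_counting = Fraction(len(face), len(deck))

print(len(deck), len(face))        # 52 12
print(p_by_adding, p_by_counting)  # 3/13 3/13

# Summing over all 13 ranks gives certainty.
print(sum(Fraction(1, 13) for _ in ranks))  # 1
```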

The probability of certainty is 1. There is no probability greater than 1. The probability of impossibility is 0. There is no probability less than 0.

But the complications don’t stop there, oh, no. Usually, cards don’t get placed back in the deck. Look what happens to the deck then.

What is the probability of drawing four queens in succession when drawn cards are not replaced? The first draw has a probability of 4/52. If a queen is drawn (if not, the game’s over), there are only 3 queens left in a deck of 51 cards, so the next probability is 3/51. Likewise, the next term is 2/50, and the last is 1/49. The product of those four terms, which answers our question, is about 3.7 millionths.
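The shrinking fractions can be multiplied exactly. The result can then be cross-checked against a combinatorial count: exactly one of the C(52, 4) possible four-card hands holds all four queens. The cross-check is an addition in this sketch, not part of the text's argument. It uses `math.prod` and `math.comb`, both available since Python 3.8.

```python
from fractions import Fraction
from math import comb, prod

# Without replacement, each draw leaves one fewer queen and one fewer card.
steps = [Fraction(4 - i, 52 - i) for i in range(4)]  # 4/52, 3/51, 2/50, 1/49
p_four_queens = prod(steps)

print(p_four_queens)                  # 1/270725
print(f"{float(p_four_queens):.2e}")  # 3.69e-06

# Cross-check: one of the C(52, 4) four-card hands holds all four queens.
print(p_four_queens == Fraction(1, comb(52, 4)))  # True
```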

Okay, just for completeness: in our first game, what is the probability that no queen will be drawn in 10 draws with replacement? (If you want to calculate the answer for 100 draws, be my guest.) The probability that no queen is drawn on a single draw is 12/13. Our answer, then, is 12/13 to the tenth power, or about 0.45, or 9/20.
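For the curious, here is the “be my guest” calculation. This short sketch raises 12/13 to the 10th and the 100th power.

```python
p_no_queen = 12 / 13  # chance that a single draw is not a queen

# Draws are replaced, so they are independent and the probabilities multiply.
results = {draws: p_no_queen ** draws for draws in (10, 100)}

for draws, p in results.items():
    print(f"{draws} draws: {p:.6f}")
# 10 draws comes out near 0.449; 100 draws near 0.000334
```

The 100-draw figure (roughly 3 chances in 10,000) matches the earlier point that seeing no queens at all in 100 draws is possible but unlikely.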

There are a lot more complications, which is why probability theory is a discipline unto itself. The take-away here is that each problem requires thought. Problems in probability can’t be approached in a purely mechanical way.

Continuous and discrete

In the most recent cohort of humans across the world, men have an average height of 178.4 centimeters with a standard deviation of 7.59 centimeters. Women have an average height of 165.7 centimeters with a standard deviation of 7.07 centimeters.

How do they know this? Did they measure everyone’s height? Nobody showed up at my door to measure me. Maybe they got my data from my doctor.

In fact, they have many sources of data – doctors, clinics, schools, etc. – but there has been no census that collected heights. Remember that a census collects data for a whole population and it generates parameters. A sample, on the other hand, collects data for a group selected from a population and what you get are statistics. That’s how the height statistics were obtained.

These statistics required a pretty big sample. But if you read the StatFiles on sampling and estimation, you’ll see that the samples aren’t nearly as big as most people would think. If you’re curious about how the data was collected, you might be interested in the Worldometer, which keeps up with demographic data (https://www.worldometers.info).

Once it was established that human height approximately follows a normal distribution, we could just refer to the normal distribution to find lots of probabilities. For instance, about 68% of the individuals in a normally distributed population are within one standard deviation of the mean. So the probability that a man is between 170.8 and 186.0 centimeters tall (178.4 ± 7.59, or roughly 170 to 186) is about 68%.
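Python's standard library can stand in for a normal table here. This sketch uses `statistics.NormalDist` (Python 3.8+) with the mean and standard deviation quoted above. It computes the share of men within one standard deviation of the mean.

```python
from statistics import NormalDist

men = NormalDist(mu=178.4, sigma=7.59)  # figures quoted in the text

low, high = men.mean - men.stdev, men.mean + men.stdev
share = men.cdf(high) - men.cdf(low)

print(round(low, 2), round(high, 2))  # 170.81 185.99
print(round(share, 4))                # 0.6827
```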

What’s the probability that a woman’s height will be in that range, 170 to 186 centimeters? There’s a (rather involved) formula for the cumulative probabilities of normal curves. A cumulative probability is the chance that a value from a distribution will be a particular value or less. For a continuous distribution like height, it’s the area under the curve to the left of that value. Most spreadsheets will calculate it for you, with a function that probably looks something like “NORMDIST”. You feed it the value and the mean and standard deviation of the distribution you’re working with, and it gives you the cumulative probability. (In Excel and Google Sheets, NORMDIST also takes a fourth argument, which should be TRUE to get the cumulative value.)

The probability that a randomly selected woman’s height will be 186 centimeters or less is 0.9979559799, or practically 100%. The probability that it will be 170 centimeters or less is about 0.73. I want the probability that her height is somewhere between these two values, so I subtract 0.73 from 0.998 (practically 1) to get about 0.27. So there’s about a 27% chance that a randomly selected woman is between 170 and 186 centimeters tall, roughly the range of men within one standard deviation of the average man’s height.
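The same subtraction can be done in code. In this sketch, `NormalDist.cdf` plays the role the spreadsheet's cumulative normal function plays in the text, using the women's mean and standard deviation quoted earlier.

```python
from statistics import NormalDist

women = NormalDist(mu=165.7, sigma=7.07)  # figures quoted in the text

p_up_to_186 = women.cdf(186)  # about 0.998
p_up_to_170 = women.cdf(170)  # about 0.73
p_between = p_up_to_186 - p_up_to_170

print(round(p_up_to_186, 4), round(p_up_to_170, 2), round(p_between, 2))
# 0.998 0.73 0.27
```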

When you read a report, read it carefully. “4 out of 5 doctors…” and “4 out of 5 doctors asked…” mean two different things. The first suggests a proportion estimated from a sample. The second may mean that only five doctors were asked. In both cases the question should arise: “How were the doctors selected?”

Just on condition

Let’s look back at that smiles data we explored in the section “Analyzing nominal data”.

The probability that I smile at someone is 77/157, or about 0.49. The probability that I smile at them and they smile at me is 48/157, or about 0.31. Now, what is the probability that someone will smile at me given that I smile at them? This is called a conditional probability: the probability that one thing will happen given that something else has already happened.

Conditional probabilities are very important in medicine. What is the probability that you will contract coronavirus disease 2019 (COVID-19) given that you have been exposed to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)? What is the probability that you will contract the flu if you take the current flu vaccine? What are the chances that you will continue having high blood pressure if you take a particular medication?

The shorthand for a conditional probability is P(y|x), which is read, “the probability that y will occur given x has occurred” or simply “the probability of y given x”. It can be calculated if you know the probability of x and y occurring together and the probability of x. And, it just so happens that a contingency table gives us all that!

To calculate a conditional probability, you can use the following formula:

P(y|x) = P(x and y)/P(x)

Suppose x and y are independent, meaning that knowing about x doesn’t tell us anything about y. Then P(x and y), the probability of both occurring, would just be the product of the probabilities of x and y. The probability of x in the numerator and the denominator would cancel, and all that would be left is the probability of y. In other words, if x and y are independent, the conditional probability of y given x is just the unconditional probability of y.
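The cancellation argument can be checked by brute force on a case where independence is obvious. The example is chosen for this sketch and doesn't come from the text. It uses two fair dice, with x = “the first die is even” and y = “the second die shows a six”. It enumerates all 36 outcomes and compares P(y|x) with P(y).

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
# (an example chosen for this sketch, not from the text).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event) -> Fraction:
    """Probability of an event: favorable outcomes over all outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def x(o):  # the first die shows an even number
    return o[0] % 2 == 0

def y(o):  # the second die shows a six
    return o[1] == 6

def x_and_y(o):
    return x(o) and y(o)

p_y_given_x = prob(x_and_y) / prob(x)  # P(y|x) = P(x and y) / P(x)

print(prob(x_and_y) == prob(x) * prob(y))  # True: P(x and y) = P(x)P(y)
print(p_y_given_x, prob(y))                # 1/6 1/6
```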

But that’s not true of our smiles data. The joint probability of a person smiling back at me and me smiling at them is in the upper left cell of the contingency table, 0.31. The probability that I smile at them is the marginal for the first row, 0.49. So the probability that a person will smile at me given that I smile at them is 0.31/0.49, or about 0.63. (Working from the unrounded counts, 48/77, gives about 0.62; the difference is just rounding.) That’s better than flipping a coin.
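Using the raw counts quoted earlier (157 people, 77 smiled at, 48 of those smiling back), the conditional probability falls out directly. This sketch shows that the 157s cancel. It also shows that the 0.63 comes from dividing rounded values.

```python
from fractions import Fraction

# Counts quoted in the text (from the earlier contingency table).
total = 157
i_smiled = 77     # row marginal: I smiled at them
both_smiled = 48  # upper-left cell: I smiled and they smiled back

p_x = Fraction(i_smiled, total)           # about 0.49
p_x_and_y = Fraction(both_smiled, total)  # about 0.31

p_y_given_x = p_x_and_y / p_x  # the 157s cancel, leaving 48/77
print(p_y_given_x, round(float(p_y_given_x), 3))  # 48/77 0.623

# Dividing the rounded values gives the 0.63 quoted in the text.
print(round(0.31 / 0.49, 2))  # 0.63
```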

Backing up (Bayesian probability)

If you start with P(x|y) and you also know the unconditional probabilities of x and of y, you can “go backwards” and find P(y|x):

P(y|x)=(P(x|y)P(y))/P(x)

That’s called Bayes’s Law. Let’s say that 1 percent of the population has a particular disease. If a person has that disease, there’s a 90% chance that they are also coughing. 80% of the population has a persistent cough, say from allergies, colds, etc. If a person has a persistent cough, what would you say the chances are that they have the disease?

You don’t have to guess. The answer is 0.9×0.01/0.8, a little over a 1% chance that a person has the disease given that they are coughing. Apply that to something like COVID-19 or the flu. Obviously, just because a person is coughing doesn’t mean that they have the disease behind the current epidemic, but wanna clear a checkout line quickly?
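Bayes’s Law is a one-line function. This sketch names it `bayes` (a hypothetical helper, not a library call) and plugs in the disease-and-cough figures from the example.

```python
def bayes(p_x_given_y: float, p_y: float, p_x: float) -> float:
    """Bayes's Law: P(y|x) = P(x|y) * P(y) / P(x)."""
    return p_x_given_y * p_y / p_x

# y = "has the disease", x = "has a persistent cough" (figures from the text)
p_disease_given_cough = bayes(p_x_given_y=0.90, p_y=0.01, p_x=0.80)
print(round(p_disease_given_cough, 5))  # 0.01125
```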

That’s how math can dispel common sense.

Bayes’s Law can be used to clear up misconceptions, but wait! There’s more!

An entire system of statistics has been constructed on Bayes’s Law and it’s (reasonably) called Bayesian statistics.

Let’s say that I guess that 50% of the people I smile at will smile back at me. That’s a prior probability, because I’ve made that guess before checking any observations. I’ve made a subjective assessment, perhaps based on prior experience.

Now I go out and make some counts. I find that I smiled at 49% of the people I observed, and 63% of those smiled back at me. Also, 43% of all the people I met smiled at me. I can update my prior assessment. The posterior probability (the updated assessment) is 0.49×0.63/0.43, or about 0.72. The chance that I had smiled at someone, given that they smiled at me, is much greater than the flip-of-a-coin probability that I originally surmised.

The amount of support that x provides for y is P(x|y)/P(x), or, in this case, 0.63/0.43, or about 1.47. If the ratio were 1, the probability that a person would smile at me given that I smile at them would be the same as the probability that they smile at me regardless. In other words, knowing whether I smiled at them would tell me nothing at all about whether they would smile.
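The update can be written as prior × support ratio, which is algebraically the same as Bayes’s Law. This sketch takes y = “I smiled at them” and x = “they smiled at me”. It uses the rounded percentages from the smiles example, so the result carries their rounding.

```python
# Figures from the text: y = "I smiled at them", x = "they smiled at me".
p_y = 0.49          # I smiled at 49% of the people observed
p_x_given_y = 0.63  # 63% of those smiled back
p_x = 0.43          # 43% of everyone smiled at me

support = p_x_given_y / p_x  # how much x supports y
posterior = p_y * support    # same as p_x_given_y * p_y / p_x

print(round(support, 2))    # 1.47
print(round(posterior, 2))  # 0.72
```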

One nice thing about this scenario is that, as more data comes in, I can continue to update my belief in the outcome.

The big difference between Bayesian statistics and traditional statistics (what is called “frequentist statistics”) is that Bayesian statistics brings a subjective element, the prior, into the picture.

What do you believe?