Simple Probability
I’m reading OpenIntro Statistics, Fourth Edition (free to download). This is a summary of chapters 3 and 4, covering topics such as probability, conditional probability, Bayes’ Theorem, random variables, and continuous distributions.
Definition of Probability
- Probability is the long-run proportion of times an outcome would occur if a random process were repeated infinitely.
- It ranges from 0 to 1 (or 0% to 100%).
Law of Large Numbers
- As the number of trials increases, the observed proportion of outcomes approaches the true probability.
- Example: With enough die rolls, the proportion of 1s will converge to 1/6.
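The Law of Large Numbers is easy to see in a quick simulation; here is a minimal sketch (the seed and trial counts are arbitrary choices, not from the text):

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

def proportion_of_ones(num_rolls):
    """Roll a fair die num_rolls times and return the proportion of 1s."""
    rolls = [random.randint(1, 6) for _ in range(num_rolls)]
    return rolls.count(1) / num_rolls

# The observed proportion drifts toward 1/6 ≈ 0.1667 as the number of trials grows.
for n in (100, 10_000, 1_000_000):
    print(n, round(proportion_of_ones(n), 4))
```

With only 100 rolls the proportion can stray noticeably from 1/6; by a million rolls it is very close.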
Disjoint (Mutually Exclusive) Events
- Two events are disjoint if they cannot both occur at the same time.
- Example: Rolling a 1 and a 2 on a single die roll are disjoint.
- For disjoint events A and B: \(P(A \text{ or } B) = P(A) + P(B)\)
Addition Rule for Multiple Disjoint Events
- If outcomes A₁, A₂, …, Aₖ are all disjoint: \(P(A₁ \text{ or } A₂ \text{ or } \cdots \text{ or } Aₖ) = P(A₁) + P(A₂) + \cdots + P(Aₖ)\)
- Example: \(P(1 \text{ or } 2) = P(1) + P(2) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}\)
Complement Rule
- The probability of not A: \(P(\text{not } A) = 1 - P(A)\)
- Example: \(P(\text{not 2}) = 1 - \frac{1}{6} = \frac{5}{6}\)
Complementary vs Disjoint Events
- Complementary events are a special case of disjoint events: they are disjoint and together cover the entire sample space.
- If A and B are complementary, then: \(P(A) + P(B) = 1\)
- All complementary events are disjoint, but not all disjoint events are complementary.
Example 1: Complementary Events
Let A = “roll a 2”, B = “not roll a 2” on a die:
- A = {2}, B = {1, 3, 4, 5, 6}
- A and B are disjoint (no common outcomes)
- A and B are complementary because they cover all outcomes.
Example 2: Disjoint but Not Complementary
Let A = “roll a 1”, B = “roll a 2”:
- A = {1}, B = {2}
- A and B are disjoint (cannot happen together)
- But they are not complementary because other outcomes (3–6) exist.
Probability of Independent Events
- For independent events A and B: \(P(A \text{ and } B) = P(A) \times P(B)\)
- Example: Probability both dice show 1: \(\frac{1}{6} \times \frac{1}{6} = \frac{1}{36}\)
General Addition Rule (For Any Two Events)
- If A and B are any events (disjoint or not): \(P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)\)
- This avoids double-counting overlapping outcomes.
Example Using a Deck of Cards
- Probability of drawing a diamond or a face card:
- Diamonds: 13/52
- Face cards: 12/52
- Diamond face cards: 3/52
\(P(\text{diamond or face}) = \frac{13}{52} + \frac{12}{52} - \frac{3}{52} = \frac{22}{52} = \frac{11}{26}\)
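As a check, the same computation can be done with Python’s `fractions` module, which keeps the arithmetic exact (no rounding):

```python
from fractions import Fraction

p_diamond = Fraction(13, 52)
p_face = Fraction(12, 52)
p_diamond_face = Fraction(3, 52)  # J, Q, K of diamonds: the overlap

# General Addition Rule: subtract the overlap so it isn't counted twice.
p_diamond_or_face = p_diamond + p_face - p_diamond_face
print(p_diamond_or_face)  # 11/26
```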
Independence
- Two random processes are independent if the outcome of one does not affect the outcome of the other.
- Example: Flipping a coin and rolling a die. Knowing the coin landed on heads gives no clue about the die’s result.
- Non-example: Stock prices – they often move together, so they are not independent.
Dice Example: Independence in Action
- Consider rolling a red die and a white die.
- The probability both show a 1: \(P(\text{red} = 1 \text{ and } \text{white} = 1) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}\)
Three Independent Dice
- Add a blue die, also independent of the others.
- Probability all three dice show a 1: \(P(\text{red} = 1 \text{ and } \text{white} = 1 \text{ and } \text{blue} = 1) = \frac{1}{6} \times \frac{1}{6} \times \frac{1}{6} = \frac{1}{216}\)
Multiplication Rule for Independent Processes
- For two independent events A and B: \(P(A \text{ and } B) = P(A) \times P(B)\)
- For multiple independent events A₁ through Aₖ: \(P(A_1 \text{ and } A_2 \text{ and } \cdots \text{ and } A_k) = P(A_1) \times P(A_2) \times \cdots \times P(A_k)\)
Guided Practice 3.22
About 9% of people are left-handed. Suppose 2 people are selected at random from the U.S. population. Because the sample size of 2 is very small relative to the population, it is reasonable to assume these two people are independent. (a) What is the probability that both are left-handed? (b) What is the probability that both are right-handed?
(a) The probability that two randomly selected people are both left-handed is found with the Multiplication Rule: 0.09 × 0.09 = 0.0081. This assumes the handedness of the first person has no effect on the second, which is reasonable given a large population.
(b) The probability that both are right-handed is 0.91 × 0.91 = 0.8281. This again assumes independence, plus the assumption that everyone is either right- or left-handed (no one is ambidextrous), so P(right-handed) = 1 − 0.09 = 0.91.
Extending the exercise to a random sample of 5 people:
(c) The probability that all 5 are right-handed is 0.91⁵ ≈ 0.6240 (rounded to four decimal places), assuming independence.
(d) The probability that all 5 are left-handed is 0.09⁵ = 0.0000059049 (about 0.000006), a very rare occurrence.
(e) The probability that not all of the people are right-handed is the complement of all five being right-handed: 1 − 0.91⁵ = 1 − 0.6240 = 0.3760
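These multiplication-rule answers can be reproduced in a few lines; a minimal sketch that, like the text, assumes independence and that everyone is either left- or right-handed:

```python
p_left = 0.09
p_right = 1 - p_left  # assumes no one is ambidextrous

p_both_left = p_left ** 2            # ≈ 0.0081
p_both_right = p_right ** 2          # ≈ 0.8281
p_five_right = p_right ** 5          # ≈ 0.6240
p_not_all_right = 1 - p_five_right   # ≈ 0.3760

print(round(p_both_left, 4), round(p_both_right, 4),
      round(p_five_right, 4), round(p_not_all_right, 4))
```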
Guided Practice 3.24
Suppose the variables handedness and sex are independent, i.e. knowing someone's sex provides no useful information about their handedness and vice-versa. Then we can compute whether a randomly selected person is right-handed and female using the Multiplication Rule:
P(right-handed and female) = P(right-handed) × P(female) = 0.91 × 0.50 = 0.455
Three people are selected at random.
(a) What is the probability that the first person is male and right-handed?
0.50 × 0.91 = 0.455
(b) What is the probability that the first two people are male and right-handed?
(0.50 × 0.91)² = 0.455² = 0.207
(c) What is the probability that the third person is female and left-handed?
0.50 × 0.09 = 0.045
(d) What is the probability that the first two people are male and right-handed and the third
person is female and left-handed?
0.455 × 0.455 × 0.045 = 0.0093 (approximately)
Marginal, Joint and Conditional Probability
mach_learn | truth: fashion | truth: not | Total |
---|---|---|---|
pred_fashion | 197 | 22 | 219 |
pred_not | 112 | 1491 | 1603 |
Total | 309 | 1513 | 1822 |
Marginal Probability
Marginal probabilities are the totals for each row or column, representing the probability of a single event regardless of other variables.
- $P(\text{fashion}) = \frac{309}{1822} \approx 0.1696$
- $P(\text{not}) = \frac{1513}{1822} \approx 0.8304$
- $P(\text{pred_fashion}) = \frac{219}{1822} \approx 0.1202$
- $P(\text{pred_not}) = \frac{1603}{1822} \approx 0.8798$
Joint Probability
Joint probabilities represent the probability of two events occurring together, corresponding to the individual cells in the table.
- $P(\text{pred_fashion} \land \text{fashion}) = \frac{197}{1822} \approx 0.1081$
- $P(\text{pred_fashion} \land \text{not}) = \frac{22}{1822} \approx 0.0121$
- $P(\text{pred_not} \land \text{fashion}) = \frac{112}{1822} \approx 0.0615$
- $P(\text{pred_not} \land \text{not}) = \frac{1491}{1822} \approx 0.8183$
Conditional Probability
Conditional probabilities describe the probability of one event given that another event has occurred.
The conditional probability of outcome A given condition B is computed as the following:
\[P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}\]
This reads as: the probability of event $A$ given $B$ is equal to the joint probability of $A$ and $B$ divided by the probability of $B$.
Examples:
- $P(\text{fashion} \, \mid \, \text{pred_fashion}) = \frac{197}{219} \approx 0.8995$
- $P(\text{not} \, \mid \, \text{pred_fashion}) = \frac{22}{219} \approx 0.1005$
- $P(\text{fashion} \, \mid \, \text{pred_not}) = \frac{112}{1603} \approx 0.0699$
- $P(\text{not} \, \mid \, \text{pred_not}) = \frac{1491}{1603} \approx 0.9301$
- $P(\text{pred_fashion} \, \mid \, \text{fashion}) = \frac{197}{309} \approx 0.6375$
- $P(\text{pred_not} \, \mid \, \text{fashion}) = \frac{112}{309} \approx 0.3625$
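A small script can recover the marginal and conditional probabilities directly from the table’s counts (the dictionary layout here is just one possible encoding of the table):

```python
# Counts from the classifier table: (prediction, truth) -> count
counts = {
    ("pred_fashion", "fashion"): 197,
    ("pred_fashion", "not"): 22,
    ("pred_not", "fashion"): 112,
    ("pred_not", "not"): 1491,
}
total = sum(counts.values())  # 1822

# Marginal: P(truth = fashion), summing over both prediction rows.
p_fashion = sum(v for (pred, truth), v in counts.items() if truth == "fashion") / total

# Conditional: P(fashion | pred_fashion) = joint count / row total.
row_total = counts[("pred_fashion", "fashion")] + counts[("pred_fashion", "not")]
p_fashion_given_pred = counts[("pred_fashion", "fashion")] / row_total

print(round(p_fashion, 4))             # ≈ 0.1696
print(round(p_fashion_given_pred, 4))  # ≈ 0.8995
```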
Marginal Probability Question
Subjective Identity (rows) \ Objective Position (columns) | Working Class | Upper Middle Class | Total |
---|---|---|---|
Poor | 0 | 0 | 0 |
Working Class | 8 | 0 | 8 |
Middle Class | 32 | 13 | 45 |
Upper Middle Class | 8 | 37 | 45 |
Upper Class | 0 | 0 | 0 |
Total | 48 | 50 | 98 |
What is the probability that a student’s objective social class position is upper middle class?
\[P(\text{Objective = Upper Middle Class}) = \frac{50}{98} \approx 0.51\]
Conditional Probability Question
What is the probability that a student who is objectively in the working class associates with upper middle class?
We are asked to find the probability that a student’s subjective identity is “upper middle class” given their objective class is “working class”:
\[P(\text{Subjective = Upper Middle Class} \mid \text{Objective = Working Class}) = \frac{8}{48} \approx 0.17\]
Conditional Probability Question 2
Given:
- 14.6% of Americans live below the poverty line → $P(\text{Below PL}) = 0.146$
- 20.7% of Americans speak a language other than English at home → $P(\text{Speak non-Eng}) = 0.207$
- 4.2% fall into both categories → $P(\text{Below PL and Speak non-Eng}) = 0.042$
What percent of Americans who live below the poverty line also speak a language other than English at home?
We are looking for the conditional probability:
$P(\text{Speak non-Eng} \mid \text{Below PL})$
Using the formula:
\[P(\text{Speak non-Eng} \mid \text{Below PL}) = \frac{P(\text{Below PL and Speak non-Eng})}{P(\text{Below PL})} = \frac{0.042}{0.146} \approx 0.2877\]
Approximately 28.8% of Americans who live below the poverty line speak a language other than English at home.
General Multiplication Rule
The General Multiplication Rule applies to events that might not be independent:
If $A$ and $B$ represent two outcomes or events, then:
\[P(A \text{ and } B) = P(A|B) \times P(B)\]
It is useful to think of $A$ as the outcome of interest and $B$ as the condition.
Relationship to Conditional Probability
This General Multiplication Rule is simply a rearrangement of the conditional probability formula:
\[P(A|B) = \frac{P(A \text{ and } B)}{P(B)}\]
Multiplying both sides by $P(B)$ gives:
\[P(A \text{ and } B) = P(A|B) \times P(B)\]
Example: Smallpox Inoculation
Consider the smallpox dataset. Suppose we are given:
- $P(\text{inoculated} = \text{no}) = 0.9608$
- $P(\text{result} = \text{lived} \mid \text{inoculated} = \text{no}) = 0.8588$
We want to find the probability that a resident was not inoculated and lived:
\[P(\text{result} = \text{lived and inoculated} = \text{no})\] \[= P(\text{result} = \text{lived} \mid \text{inoculated} = \text{no}) \times P(\text{inoculated} = \text{no})\] \[= 0.8588 \times 0.9608 = 0.8251\]
This shows how we can apply the General Multiplication Rule to compute joint probabilities when events are not necessarily independent.
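The smallpox joint probability above works out the same way as a two-line check:

```python
p_not_inoculated = 0.9608
p_lived_given_not_inoculated = 0.8588

# General Multiplication Rule: P(A and B) = P(A | B) * P(B)
p_joint = p_lived_given_not_inoculated * p_not_inoculated
print(round(p_joint, 4))  # 0.8251
```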
Independence Considerations in Conditional Probability
If two events are independent, then knowing the outcome of one should provide no information about the other. This idea can be verified using conditional probability.
Guided Practice 3.38
Let $X_1$ and $X_2$ represent the outcomes of rolling two dice.
(a) What is the probability that the first die, $X_1$, is 1?
⇒ $P(X_1 = 1) = \frac{1}{6}$
(b) What is the probability that both $X_1$ and $X_2$ are 1?
⇒ $P(X_1 = 1 \text{ and } X_2 = 1) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$
(c) Use the formula for conditional probability to compute $P(X_2 = 1 \mid X_1 = 1)$:
\[P(X_2 = 1 \mid X_1 = 1) = \frac{P(X_1 = 1 \text{ and } X_2 = 1)}{P(X_1 = 1)} = \frac{1/36}{1/6} = \frac{1}{6}\]
(d) What is $P(X_2 = 1)$? Is this different from part (c)?
⇒ $P(X_2 = 1) = \frac{1}{6}$
Conclusion:
No, the result is the same. This demonstrates that knowing $X_1 = 1$ does not affect the probability of $X_2 = 1$, meaning the two events are independent.
Even though the conditional probability equals $\frac{1}{6}$, the joint probability $\frac{1}{36}$ is much lower than either individual probability.
Conditional independence does not imply high joint probability; it simply means that knowing one event does not affect the probability of the other.
Guided Practice 3.39
Ron is watching a roulette table in a casino and notices that the last five outcomes were black. He figures that the chances of getting black six times in a row is very small (about 1/64) and puts his paycheck on red.
What is wrong with his reasoning?
Ron is falling victim to the gambler's fallacy: the mistaken belief that previous independent outcomes (such as the last five black spins) affect the probability of future outcomes. In a fair game of roulette, each spin is independent, so the probability of black or red remains approximately 18/38 (on a standard American roulette wheel) on every spin, regardless of prior results. Thus, the chance of red is the same as always, and the previous black spins do not increase its likelihood.
Tree Diagrams
Tree diagrams are a tool to organize outcomes and probabilities around the structure of the data. They are most useful when two or more processes occur in a sequence and each process is conditioned on its predecessors.
The smallpox data fit this description. We see the population as split by inoculation: yes and no. Following this split, survival rates were observed for each group. This structure is reflected in the tree diagram shown below. The first branch for inoculation is said to be the primary branch, while the other branches are secondary.
Tree diagrams are annotated with marginal and conditional probabilities. This tree diagram splits the smallpox data by inoculation into the yes and no groups with respective marginal probabilities 0.0392 and 0.9608. The secondary branches are conditioned on the first, so we assign conditional probabilities to these branches.
For example, the top branch is the probability that result = lived conditioned on the information that inoculated = yes. We may (and usually do) construct joint probabilities at the end of each branch in our tree by multiplying the numbers we come across as we move from left to right.
These joint probabilities are computed using the General Multiplication Rule:
\[P(\text{inoculated} = \text{yes and result} = \text{lived})\] \[= P(\text{inoculated} = \text{yes}) \times P(\text{result} = \text{lived} \mid \text{inoculated} = \text{yes})\] \[= 0.0392 \times 0.9754 = 0.0382\]
The four branch endpoints of the tree carry these joint probabilities:
- Lived, given inoculated (0.9754): $0.0392 \times 0.9754 = 0.0382$
- Died, given inoculated (0.0246): $0.0392 \times 0.0246 = 0.0010$
- Lived, given not inoculated (0.8589): $0.9608 \times 0.8589 = 0.8252$
- Died, given not inoculated (0.1411): $0.9608 \times 0.1411 = 0.1356$
Bayes’ Theorem
In many situations, we are given a conditional probability of the form $P(A \mid B)$ but we are interested in finding the reversed conditional probability $P(B \mid A)$. Bayes’ Theorem provides a way to compute this when direct calculation is not possible or when a tree diagram becomes too cumbersome.
\[P(B \mid A) = \frac{P(A \mid B) \cdot P(B)}{P(A)}\]This theorem is especially useful when we want to “invert” a conditional probability — i.e., to switch the condition and the outcome.
Example: Smallpox Inoculation and Survival
Suppose we want to compute the probability that someone was inoculated, given that they lived — in other words, $P(\text{inoculated = yes} \mid \text{lived})$.
We already have:
- $P(\text{lived} \mid \text{inoculated = yes}) = 0.9754$
- $P(\text{inoculated = yes}) = 0.0392$
- $P(\text{lived}) = 0.03824 + 0.82523 = 0.86347$
Using Bayes’ Theorem:
\[P(\text{inoculated = yes} \mid \text{lived}) = \frac{P(\text{lived} \mid \text{inoculated = yes}) \cdot P(\text{inoculated = yes})}{P(\text{lived})}\] \[= \frac{0.9754 \cdot 0.0392}{0.86347} \approx \frac{0.03824}{0.86347} \approx 0.0443\]
So, even though the probability of surviving given inoculation is very high (97.54%), the probability of having been inoculated given that the person survived is only about 4.43%. This is because the vast majority of the population was not inoculated. Bayes’ Theorem lets us work backwards to compute the reverse conditional probability.
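The same inversion can be scripted: compute the total probability of surviving over both branches of the tree, then apply Bayes’ Theorem.

```python
p_inoc = 0.0392
p_not_inoc = 1 - p_inoc  # 0.9608
p_lived_given_inoc = 0.9754
p_lived_given_not = 0.8589

# Total probability of surviving, summing over both tree branches.
p_lived = p_lived_given_inoc * p_inoc + p_lived_given_not * p_not_inoc

# Bayes' Theorem: invert the conditioning.
p_inoc_given_lived = p_lived_given_inoc * p_inoc / p_lived
print(round(p_inoc_given_lived, 4))  # ≈ 0.0443
```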
Bayes’ Theorem Question
The American Community Survey is an ongoing survey that provides data every year to help communities plan investments and services.
The 2010 American Community Survey estimates that:
- 14.6% of Americans live below the poverty line
- 20.7% speak a language other than English at home
- 4.2% fall into both categories
What percent of Americans live below the poverty line given that they speak a language other than English at home?
Using Bayes’ Theorem:
\[P(\text{Below PL} \mid \text{Speak non-Eng}) = \frac{P(\text{Below PL} \ \& \ \text{Speak non-Eng})}{P(\text{Speak non-Eng})} = \frac{0.042}{0.207} \approx 0.2\]
So, about 20% of Americans who speak a language other than English at home live below the poverty line.
Bayes’ Theorem Formula:
\[P(A \mid B) = \frac{P(A \ \& \ B)}{P(B)}\]
Mutually Exclusive vs Independent Events (Dice Example)
To explain the difference between mutually exclusive and independent events using a dice example, let’s first define these concepts and then illustrate them with scenarios involving a standard six-sided die.
- Mutually Exclusive Events: Two events are mutually exclusive if they cannot occur at the same time. The occurrence of one event means the other cannot happen in the same trial: \(P(A \text{ and } B) = 0\)
- Independent Events: Two events are independent if the occurrence of one does not affect the probability of the other occurring: \(P(A \text{ and } B) = P(A) \cdot P(B)\)
Example: Assume we are rolling a single fair six-sided die (with faces numbered 1 to 6) unless stated otherwise.
Mutually Exclusive Events
Example: Rolling an even number (2, 4, or 6) and rolling a 3 in a single roll of the die.
- Event A: Roll an even number, $\{2, 4, 6\}$, with \(P(A) = \frac{3}{6} = \frac{1}{2}\)
- Event B: Roll a 3, $\{3\}$, with \(P(B) = \frac{1}{6}\)
Explanation: These events are mutually exclusive because a single roll cannot result in both an even number and a 3 (since 3 is odd). The outcomes $\{2, 4, 6\}$ and $\{3\}$ have no overlap.
- Probability of both occurring: \(P(A \text{ and } B) = 0\)
- Probability of either occurring: \(P(A \text{ or } B) = P(A) + P(B) = \frac{1}{2} + \frac{1}{6} = \frac{2}{3}\)
Independent Events
Example: Rolling an even number on the first roll and rolling a number greater than 4 (i.e., 5 or 6) on a second roll of the same die.
- Event A: Roll an even number on the first roll, $\{2, 4, 6\}$, with \(P(A) = \frac{3}{6} = \frac{1}{2}\)
- Event B: Roll a number greater than 4 on the second roll, $\{5, 6\}$, with \(P(B) = \frac{2}{6} = \frac{1}{3}\)
Explanation: These events are independent because the outcome of the first roll does not affect the outcome of the second roll. The die has no memory.
- Probability of both occurring: \(P(A \text{ and } B) = P(A) \cdot P(B) = \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{6}\)
- Probability of either occurring: \(\begin{align*} P(A \text{ or } B) &= P(A) + P(B) - P(A \text{ and } B) \\ &= \frac{1}{2} + \frac{1}{3} - \frac{1}{6} \\ &= \frac{3}{6} + \frac{2}{6} - \frac{1}{6} = \frac{4}{6} = \frac{2}{3} \end{align*}\)
In summary:
- Mutually Exclusive: Cannot happen together (e.g., rolling a 3 and an even number in one roll is impossible).
- Independent: Can happen together, and one event doesn’t influence the other (e.g., first roll even, second roll > 4).
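Both definitions can be checked mechanically with event sets; a small sketch using exact fractions:

```python
from fractions import Fraction

even = {2, 4, 6}
three = {3}
greater_than_4 = {5, 6}

# Mutually exclusive: the event sets share no outcomes on a single roll,
# so the addition rule applies without an overlap term.
assert even & three == set()
p_even_or_three = Fraction(len(even | three), 6)  # 4/6 = 2/3

# Independent: two separate rolls, so the joint probability factors.
p_even = Fraction(len(even), 6)
p_gt4 = Fraction(len(greater_than_4), 6)
p_both = p_even * p_gt4  # (1/2)(1/3) = 1/6

print(p_even_or_three, p_both)
```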
Sampling from a Small Population
When sampling from a population, we typically assume the population is large, and sampling one observation doesn’t meaningfully affect the rest. However, when sampling without replacement from a small population, or when we sample more than 10% of the population, probabilities shift because each draw slightly alters the composition of the remaining population.
Example 3.47
A professor selects a student at random to answer a question. There are 15 students in class.
Probability that you are selected: \(\frac{1}{15} \approx 0.067\)
Example 3.48
The professor asks 3 questions and does not repeat students. What's the probability you're not picked for any of them?
Let's compute the probability of not being picked for each question, step by step:
First question: \(P(\text{not picked}) = \frac{14}{15}\)
Second question (given you weren't picked before): \(\frac{13}{14}\)
Third question (given you weren't picked in first two): \(\frac{12}{13}\)
To get the overall probability that you are not picked in all three questions, multiply:
\[\frac{14}{15} \times \frac{13}{14} \times \frac{12}{13} = \frac{12}{15} = 0.80\]
What Rule Justified the Multiplication?
This is an application of the General Multiplication Rule, which says:
The probability of multiple events all occurring is the product of each event’s probability, conditioned on the events that came before it.
Mathematically, for three events $A$, $B$, and $C$:
\[P(A \text{ and } B \text{ and } C) = P(A) \cdot P(B \mid A) \cdot P(C \mid A \text{ and } B)\]In Example 3.48:
- $A = Q_1 = \text{not picked}$
- $B = Q_2 = \text{not picked}$
- $C = Q_3 = \text{not picked}$
So:
\[P(\text{not picked all 3 times}) = P(A) \cdot P(B \mid A) \cdot P(C \mid A \text{ and } B)\]This rule allows us to break down complex, dependent events into smaller conditional steps that we can compute.
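The conditional chain from Example 3.48 translates directly into a loop; each question removes one student from the pool, which is exactly why the probabilities shift:

```python
from fractions import Fraction

class_size = 15
p_not_picked = Fraction(1)
remaining = class_size

# Multiply the conditional probabilities: 14/15 * 13/14 * 12/13.
for _ in range(3):
    p_not_picked *= Fraction(remaining - 1, remaining)
    remaining -= 1

print(p_not_picked, float(p_not_picked))  # 4/5 0.8
```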
Random Variables
A random variable is a numerical outcome of a random process. We typically represent it using capital letters like $X$, $Y$, or $Z$. For instance, in a statistics class, the amount of money a student spends on books is a random variable $X$.
Example 3.54: Bookstore Book Sales
Let's say:
- 20% of students buy no books: $0
- 55% buy only the textbook: $137
- 25% buy both books: $170
These percentages remain stable across terms. If there are 100 students, the expected number of books sold is:
- 20 students buy 0 books → 0 books
- 55 students buy 1 book → 55 books
- 25 students buy 2 books → 50 books
Total expected books sold: \(0 + 55 + 50 = 105\) books
Expectation
Let $X$ be the amount spent by a single student. The values it can take are:
- $x_1 = 0$, with $P(X = 0) = 0.20$
- $x_2 = 137$, with $P(X = 137) = 0.55$
- $x_3 = 170$, with $P(X = 170) = 0.25$
To compute the expected value, $E(X)$:
\[E(X) = \sum x_i \cdot P(X = x_i) = 0 \cdot 0.20 + 137 \cdot 0.55 + 170 \cdot 0.25 = 0 + 75.35 + 42.5 = 117.85\]
So the expected revenue from one student is:
\[\mu = E(X) = 117.85\]
Variability in Random Variables
To measure variability, we use variance and standard deviation. These indicate how much outcomes typically differ from the mean.
Let $X$ have outcomes $x_1, …, x_k$ with probabilities $P(X = x_1), …, P(X = x_k)$, and mean $\mu$.
The variance of $X$ is:
\[\sigma^2 = \sum_{j=1}^{k} (x_j - \mu)^2 \cdot P(X = x_j)\]The standard deviation is:
\[\sigma = \sqrt{\sigma^2}\]
Example 3.58: Compute Variance and Standard Deviation
We already have:
- $\mu = E(X) = 117.85$
Now build a table:
$i$ | $x_i$ | $P(X = x_i)$ | $x_i \cdot P(X = x_i)$ | $x_i - \mu$ | $(x_i - \mu)^2$ | $(x_i - \mu)^2 \cdot P(X = x_i)$ |
---|---|---|---|---|---|---|
1 | 0 | 0.20 | 0 | -117.85 | 13888.62 | 2777.72 |
2 | 137 | 0.55 | 75.35 | 19.15 | 366.72 | 201.70 |
3 | 170 | 0.25 | 42.50 | 52.15 | 2719.62 | 679.91 |
Total | | | 117.85 | | | Variance = 3659.33 |
So:
Variance: \(\sigma^2 = 3659.33\)
Standard Deviation: \(\sigma = \sqrt{3659.33} \approx 60.49\)
- Expected Value $E(X)$ gives the average expected outcome.
- Variance $\sigma^2$ and Standard Deviation $\sigma$ measure the spread of outcomes around the mean, indicating how much the values of a random variable deviate from the expected value on average. A larger variance or standard deviation means the outcomes are more dispersed and less predictable, while smaller values suggest the outcomes are more tightly clustered around the mean.
- This framework allows the bookstore to anticipate average revenue and understand volatility.
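The expectation, variance, and standard deviation for the bookstore’s revenue distribution can be computed in a few lines:

```python
# (value, probability) pairs for the amount one student spends
outcomes = [(0, 0.20), (137, 0.55), (170, 0.25)]

mu = sum(x * p for x, p in outcomes)               # expected value E(X)
var = sum((x - mu) ** 2 * p for x, p in outcomes)  # variance
sd = var ** 0.5                                    # standard deviation

print(round(mu, 2), round(var, 2), round(sd, 2))   # ≈ 117.85, 3659.33, 60.49
```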
Linear Combinations of Random Variables
In many real-world scenarios, a total quantity of interest is composed of multiple random variables. This total can often be represented as a linear combination of those variables. A linear combination is any sum of random variables where each variable may be multiplied by a constant.
For example, if $X_1, X_2, \dots, X_5$ represent the travel time on Monday through Friday, then the total weekly travel time $W$ is:
\[W = X_1 + X_2 + X_3 + X_4 + X_5\]This representation helps us understand the total quantity by analyzing the components individually.
Example 3.60
Scenario:
John travels to work five days a week. Let $X_i$ represent his commute time on day $i$, for $i = 1$ to $5$.
His total weekly travel time is:
\[W = X_1 + X_2 + X_3 + X_4 + X_5\]
This is a linear combination of five random variables.
Expectation of a Linear Combination
A key property of expectation is linearity. That is, for any random variables $X_1, X_2, \dots, X_n$:
\[E(X_1 + X_2 + \dots + X_n) = E(X_1) + E(X_2) + \dots + E(X_n)\]Even if the variables are dependent, this property still holds.
Example 3.61
Given:
Each daily commute has an expected value of 18 minutes, i.e., $E(X_i) = 18$.
To find:
Expected weekly commute time $E(W)$.
Since each $E(X_i) = 18$:
\[E(W) = E(X_1) + E(X_2) + E(X_3) + E(X_4) + E(X_5) = 5 \times 18 = 90\]
So John is expected to spend 90 minutes commuting in total per week.
- A linear combination of random variables lets us model total quantities that are sums (or scaled sums) of parts.
- The expected value of a sum is always the sum of the expected values, regardless of whether the variables are dependent or independent.
Guided Practice 3.63
Scenario:
Based on past auctions, Elena figures she should expect to make about $175 on the TV and pay about $23 for the toaster oven.
We let:
- $X$ = revenue from the TV = 175
- $Y$ = cost of the toaster oven = 23
Elena's net gain is:
\[E(X - Y) = E(X) - E(Y) = 175 - 23 = 152\]
So, Elena should expect to make $152 in total.
Guided Practice 3.64
Question:
Would you be surprised if John's weekly commute wasn't exactly 90 minutes or if Elena didn't make exactly $152?
Answer:
No, we wouldn't be surprised.
The expected value (90 minutes for John, $152 for Elena) represents the average over many repeated trials, not a guaranteed result for a single instance.
Due to variability (randomness), actual outcomes will often differ from expected values.
Two important concepts have been introduced:
- A final quantity can often be represented as a sum of random components, i.e., a linear combination.
- The expected value of a linear combination is the sum of the expected values of its components.
Linear Combinations of Random Variables and the Average Result
A linear combination of two random variables $X$ and $Y$ is:
\[aX + bY\]where $a$ and $b$ are constants. To compute the expected value of a linear combination, use:
\[E(aX + bY) = a \cdot E(X) + b \cdot E(Y)\]This rule holds regardless of whether $X$ and $Y$ are dependent or independent.
Example 3.65
Scenario:
Leonard has invested:
- $6,000 in Caterpillar Inc (CAT)
- $2,000 in Exxon Mobil Corp (XOM)
Let:
- $X$ = monthly return (as a decimal) of CAT
- $Y$ = monthly return of XOM
Then Leonard's portfolio change is:
\[6000 \times X + 2000 \times Y\]
A positive value indicates a gain, and a negative value indicates a loss.
Guided Practice 3.66
Suppose:
- $X = 0.020$ (CAT rises 2.0%)
- $Y = 0.002$ (XOM rises 0.2%)
Then the change in Leonard's portfolio is:
\[6000 \times 0.020 + 2000 \times 0.002 = 120 + 4 = 124\]
Answer:
Leonard should expect to make 124 dollars next month from his investments.
Variability in Linear Combinations of Random Variables
Quantifying the average outcome from a linear combination of random variables is helpful, but it is also important to understand the uncertainty or variability of that outcome. In the case of Leonard’s stock portfolio (from Guided Practice 3.66), we calculated an expected monthly gain of $124. However, this gain is not guaranteed.
Figure 3.22 illustrates the monthly changes in a portfolio like Leonard’s over a three-year period. The gains and losses vary considerably, underscoring the need to quantify volatility.
To do this, we use variance and standard deviation, just as we have in earlier sections. The variances for the monthly returns of the stocks in Leonard’s portfolio are shown in Figure 3.23. We assume the stock returns are independent.
To compute the variance of a linear combination of independent random variables, we use the following rule:
\[\text{Var}(aX + bY) = a^2 \cdot \text{Var}(X) + b^2 \cdot \text{Var}(Y)\]This rule assumes that $X$ and $Y$ are independent. If they are not, then the formula would require an additional covariance term, which is beyond the scope of this course.
Example: Leonard’s Portfolio
Leonard’s portfolio return is:
\[6000 \cdot X + 2000 \cdot Y\]Using the formula:
\[\text{Var}(6000X + 2000Y) = 6000^2 \cdot \text{Var}(X) + 2000^2 \cdot \text{Var}(Y)\]From Figure 3.23:
- $\text{Var}(X) = 0.0057$ for CAT
- $\text{Var}(Y) = 0.0021$ for XOM
Plugging in:
\[= 36,\!000,\!000 \cdot 0.0057 + 4,\!000,\!000 \cdot 0.0021\] \[= 205,\!200 + 8,\!400 = 213,\!600\]
Then:
\[\text{SD} = \sqrt{213,\!600} \approx 462\]
So, while Leonard expects to gain $124 each month, the standard deviation is about $462, indicating a high level of volatility.
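The portfolio variance formula is easy to script; this sketch uses the dollar amounts and variances from the text and assumes independent returns:

```python
amounts = {"CAT": 6000, "XOM": 2000}           # dollars invested
variances = {"CAT": 0.0057, "XOM": 0.0021}     # monthly return variances

# Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) for independent X, Y.
portfolio_var = sum(amounts[s] ** 2 * variances[s] for s in amounts)
portfolio_sd = portfolio_var ** 0.5

print(round(portfolio_var), round(portfolio_sd))  # 213600 462
```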
Summary Statistics from Historical Stock Data
Stock | Mean ($\bar{x}$) | SD ($s$) | Variance ($s^2$) |
---|---|---|---|
CAT | 0.0204 | 0.0757 | 0.0057 |
XOM | 0.0025 | 0.0455 | 0.0021 |
Variability of Linear Combinations (Summary)
For independent random variables $X$ and $Y$, the variance of a linear combination $aX + bY$ is:
\[\text{Var}(aX + bY) = a^2 \cdot \text{Var}(X) + b^2 \cdot \text{Var}(Y)\]The standard deviation is the square root of the variance:
\[\text{SD}(aX + bY) = \sqrt{\text{Var}(aX + bY)}\]Negative coefficients do not affect the variance, since they are squared.
Example 3.68: John's Weekly Commute
Suppose John's daily commute has a standard deviation of 4 minutes. His total commute time for the week is:
\[W = X_1 + X_2 + X_3 + X_4 + X_5\]
Each $X_i$ has:
\[\text{SD}(X_i) = 4 \quad \Rightarrow \quad \text{Var}(X_i) = 16\]
Assuming independence, the total variance is:
\[\text{Var}(W) = 16 + 16 + 16 + 16 + 16 = 80\]
So:
\[\text{SD}(W) = \sqrt{80} \approx 8.94 \text{ minutes}\]
Guided Practice 3.69
The computation in Example 3.68 assumes that daily commute times are independent. Do you think this is reasonable?
Possibly not. If there's heavy traffic on Monday, it might also affect Tuesday's commute due to road conditions, or John might leave at a different time on subsequent days. In reality, daily commute times might be slightly correlated.
Guided Practice 3.70
From Guided Practice 3.62, Elena has:
- $X$ = TV revenue, SD = $25
- $Y$ = toaster oven cost, SD = $8
- Net gain = $X - Y$
We compute:
\[\text{Var}(X - Y) = \text{Var}(X) + \text{Var}(Y) = 25^2 + 8^2 = 625 + 64 = 689\]
Then:
\[\text{SD}(X - Y) = \sqrt{689} \approx 26.25\]
So the standard deviation of Elena's net gain is about $26.25. Even though $Y$ has a negative coefficient, it does not affect the variance due to squaring.
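Both variability calculations follow the same pattern, so a short script covers John and Elena together (assuming independence in each case):

```python
from math import sqrt

# John's weekly commute: five independent days, each with SD 4 minutes.
daily_sd = 4
weekly_var = 5 * daily_sd ** 2   # variances add for independent variables
weekly_sd = sqrt(weekly_var)

# Elena's net gain X - Y: the negative coefficient squares away, so
# Var(X - Y) = Var(X) + Var(Y).
net_var = 25 ** 2 + 8 ** 2
net_sd = sqrt(net_var)

print(round(weekly_sd, 2), round(net_sd, 2))  # 8.94 26.25
```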
Continuous Distributions
Example 3.72
What proportion of the sample is between 180 cm and 185 cm tall (about 5'11" to 6'1")?
We can add up the counts of the two histogram bins that fall in this range and divide by the total sample size. The two bins in this region have:
- 195,307 people in the first bin
- 156,239 people in the second bin
Then:
\[\frac{195{,}307 + 156{,}239}{3{,}000{,}000} = \frac{351{,}546}{3{,}000{,}000} \approx 0.1172\]
This means that approximately 11.72% of the sample of 3 million falls between 180 cm and 185 cm in height. This proportion is also equal to the area under the histogram in that range.
We computed the proportion of individuals with heights 180 to 185 cm in Example 3.72 as a fraction:
\[\frac{\text{number of people between 180 and 185}}{\text{total sample size}}\]We found the number of people with heights between 180 and 185 cm by determining the fraction of the histogram’s area in this region. Similarly, we can use the area in the shaded region under the curve to find a probability (with the help of a computer):
\[P(\text{height between } 180 \text{ and } 185) = \text{area between 180 and 185} = 0.1157\]The probability that a randomly selected person is between 180 and 185 cm is $0.1157$. This is very close to the estimate from Example 3.72: $0.1172$.
Guided Practice 3.73
Three US adults are randomly selected. The probability a single adult is between 180 and 185 cm is 0.1157.
- (a) What is the probability that all three are between 180 and 185 cm tall?
- (b) What is the probability that none are between 180 and 185 cm?
$$P(\text{all three}) = 0.1157^3 \approx 0.0015$$
$$P(\text{none}) = (1 - 0.1157)^3 = 0.8843^3 \approx 0.6915$$
Guided Practice 3.75
Suppose a person's height is rounded to the nearest centimeter. Is there a chance that a random person's measured height will be 180 cm?
Yes. When height is rounded, there is a nonzero probability that someone's recorded height is 180 cm, even though the true height may not be exactly 180 cm. Rounding introduces discrete outcomes, so the area under the curve over an interval (e.g., 179.5 to 180.5 cm) now corresponds to a positive probability.
Guided Practice: Chickenpox Probability
The National Vaccine Information Center estimates that 90% of Americans have had the disease chickenpox by adulthood. What is the probability that exactly 92 out of 100 randomly sampled American adults had chickenpox during childhood?
- (a) Using the Binomial Formula
- (b) Using Python Code
Let $X$ be the number of adults (out of 100) who had chickenpox.
We want $P(X = 92)$:
$$ P(X = 92) = \binom{100}{92} (0.9)^{92} (0.1)^8 \approx 0.1148 $$
```python
from scipy.stats import binom

# Exact binomial probability: X ~ Binomial(n=100, p=0.9)
prob = binom.pmf(92, n=100, p=0.9)
print(round(prob, 4))  # Output: 0.1148
```
Result: $P(X = 92) \approx 0.1148$
Since $np = 100 \times 0.9 = 90 \geq 10$ and $n(1 - p) = 100 \times 0.1 = 10 \geq 10$, we can also approximate this probability using a normal distribution:
Mean: $\mu = np = 90$
Standard deviation: $\sigma = \sqrt{np(1 - p)} = \sqrt{9} = 3$
When using the normal approximation to estimate binomial probabilities, we apply a continuity correction. This adjustment is necessary because the binomial distribution is discrete (only whole number outcomes are possible), while the normal distribution is continuous (all real numbers are possible).
For example, if we want to approximate: using the normal distribution, we can't calculate the probability at a single point. Instead, we estimate the area under the normal curve over an interval that surrounds 92. This is done by adjusting both the lower and upper bounds by 0.5:
Apply continuity correction:
$$ Z_1 = \frac{91.5 - 90}{3} = 0.5, \quad Z_2 = \frac{92.5 - 90}{3} = 0.8333 $$
$$ \begin{align*} P(91.5 < X < 92.5) &= P(0.5 < Z < 0.8333) \\ &\approx \Phi(0.8333) - \Phi(0.5) \\ &\approx 0.7977 - 0.6915 = 0.1062 \end{align*} $$
Approximate probability: $P(X = 92) \approx 0.1062$
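Both the continuity-corrected normal approximation and the exact binomial value can be computed with only the standard library; a minimal sketch:

```python
from math import comb, erf, sqrt

n, p, k = 100, 0.9, 92
mu = n * p                     # 90.0
sigma = sqrt(n * p * (1 - p))  # 3.0

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Continuity correction: treat the discrete outcome 92 as the interval (91.5, 92.5).
approx = phi((k + 0.5 - mu) / sigma) - phi((k - 0.5 - mu) / sigma)

# Exact binomial probability for comparison.
exact = comb(n, k) * p**k * (1 - p)**(n - k)

print(round(approx, 4), round(exact, 4))  # 0.1062 0.1148
```

The approximation (0.1062) lands reasonably close to the exact value (0.1148); the gap reflects the skew of a binomial with p = 0.9.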