Simple Probability
I’m reading OpenIntro Statistics, Fourth Edition (free to download). This is a summary of chapters 3 and 4, covering topics such as probability, conditional probability, Bayes’ Theorem, random variables, and continuous distributions.
Definition of Probability
- Probability is the long-run proportion of times an outcome would occur if a random process were repeated infinitely.
- It ranges from 0 to 1 (or 0% to 100%).
Law of Large Numbers
- As the number of trials increases, the observed proportion of outcomes approaches the true probability.
- Example: With enough die rolls, the proportion of 1s will converge to 1/6.
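The Law of Large Numbers is easy to see in a quick simulation; here is a minimal sketch (the seed and trial counts are arbitrary choices, not from the text):

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

def proportion_of_ones(num_rolls):
    """Roll a fair die num_rolls times and return the proportion of 1s."""
    rolls = [random.randint(1, 6) for _ in range(num_rolls)]
    return rolls.count(1) / num_rolls

# The observed proportion drifts toward 1/6 ≈ 0.1667 as the number of trials grows.
for n in (100, 10_000, 1_000_000):
    print(n, round(proportion_of_ones(n), 4))
```

With only 100 rolls the proportion can stray noticeably from 1/6; by a million rolls it is very close.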
Disjoint (Mutually Exclusive) Events
- Two events are disjoint if they cannot both occur at the same time.
- Example: Rolling a 1 and a 2 on a single die roll are disjoint.
- For disjoint events A and B: \(P(A \text{ or } B) = P(A) + P(B)\)
Addition Rule for Multiple Disjoint Events
- If outcomes A₁, A₂, …, Aₖ are all disjoint: \(P(A₁ \text{ or } A₂ \text{ or } \cdots \text{ or } Aₖ) = P(A₁) + P(A₂) + \cdots + P(Aₖ)\)
- Example: \(P(1 \text{ or } 2) = P(1) + P(2) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}\)
Complement Rule
- The probability of not A: \(P(\text{not } A) = 1 - P(A)\)
- Example: \(P(\text{not 2}) = 1 - \frac{1}{6} = \frac{5}{6}\)
Complementary vs Disjoint Events
- Complementary events are a special case of disjoint events: they are disjoint and together cover the entire sample space.
- If A and B are complementary, then: \(P(A) + P(B) = 1\)
- All complementary events are disjoint, but not all disjoint events are complementary.
Example 1: Complementary Events
Let A = “roll a 2”, B = “not roll a 2” on a die:
- A = {2}, B = {1, 3, 4, 5, 6}
- A and B are disjoint (no common outcomes)
- A and B are complementary because they cover all outcomes.
Example 2: Disjoint but Not Complementary
Let A = “roll a 1”, B = “roll a 2”:
- A = {1}, B = {2}
- A and B are disjoint (cannot happen together)
- But they are not complementary because other outcomes (3–6) exist.
Probability of Independent Events
- For independent events A and B: \(P(A \text{ and } B) = P(A) \times P(B)\)
- Example: Probability both dice show 1: \(\frac{1}{6} \times \frac{1}{6} = \frac{1}{36}\)
General Addition Rule (For Any Two Events)
- If A and B are any events (disjoint or not): \(P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)\)
- This avoids double-counting overlapping outcomes.
Example Using a Deck of Cards
- Probability of drawing a diamond or a face card:
- Diamonds: 13/52
- Face cards: 12/52
- Diamond face cards: 3/52
\(P(\text{diamond or face}) = \frac{13}{52} + \frac{12}{52} - \frac{3}{52} = \frac{22}{52} = \frac{11}{26}\)
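As a check, the same computation can be done with Python’s `fractions` module, which keeps the arithmetic exact (no rounding):

```python
from fractions import Fraction

p_diamond = Fraction(13, 52)
p_face = Fraction(12, 52)
p_diamond_face = Fraction(3, 52)  # J, Q, K of diamonds: the overlap

# General Addition Rule: subtract the overlap so it isn't counted twice.
p_diamond_or_face = p_diamond + p_face - p_diamond_face
print(p_diamond_or_face)  # 11/26
```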
Independence
- Two random processes are independent if the outcome of one does not affect the outcome of the other.
- Example: Flipping a coin and rolling a die. Knowing the coin landed on heads gives no clue about the die’s result.
- Non-example: Stock prices – they often move together, so they are not independent.
Dice Example: Independence in Action
- Consider rolling a red die and a white die.
- The probability both show a 1: \(P(\text{red} = 1 \text{ and } \text{white} = 1) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}\)
Three Independent Dice
- Add a blue die, also independent of the others.
- Probability all three dice show a 1: \(P(\text{red} = 1 \text{ and } \text{white} = 1 \text{ and } \text{blue} = 1) = \frac{1}{6} \times \frac{1}{6} \times \frac{1}{6} = \frac{1}{216}\)
Multiplication Rule for Independent Processes
- For two independent events A and B: \(P(A \text{ and } B) = P(A) \times P(B)\)
- For multiple independent events A₁ through Aₖ: \(P(A_1 \text{ and } A_2 \text{ and } \cdots \text{ and } A_k) = P(A_1) \times P(A_2) \times \cdots \times P(A_k)\)
Guided Practice 3.22
About 9% of people are left-handed. Suppose 2 people are selected at random from the U.S. population. Because the sample size of 2 is very small relative to the population, it is reasonable to assume these two people are independent. (a) What is the probability that both are left-handed? (b) What is the probability that both are right-handed?
(a) The probability that two randomly selected people are both left-handed is found with the Multiplication Rule: 0.09 × 0.09 = 0.0081. This assumes the handedness of the first person has no effect on the second, which is reasonable given a large population.
(b) The probability that both are right-handed is 0.91 × 0.91 = 0.8281. This again assumes independence, plus the assumption that everyone is either right- or left-handed (no one is ambidextrous), so P(right-handed) = 1 − 0.09 = 0.91.
Extending the exercise to a random sample of 5 people:
(c) The probability that all 5 are right-handed is 0.91⁵ ≈ 0.6240 (rounded to four decimal places), assuming independence.
(d) The probability that all 5 are left-handed is 0.09⁵ = 0.0000059049 (about 0.000006), a very rare occurrence.
(e) The probability that not all of the people are right-handed is the complement of all five being right-handed: 1 − 0.91⁵ = 1 − 0.6240 = 0.3760
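These multiplication-rule answers can be reproduced in a few lines; a minimal sketch that, like the text, assumes independence and that everyone is either left- or right-handed:

```python
p_left = 0.09
p_right = 1 - p_left  # assumes no one is ambidextrous

p_both_left = p_left ** 2            # ≈ 0.0081
p_both_right = p_right ** 2          # ≈ 0.8281
p_five_right = p_right ** 5          # ≈ 0.6240
p_not_all_right = 1 - p_five_right   # ≈ 0.3760

print(round(p_both_left, 4), round(p_both_right, 4),
      round(p_five_right, 4), round(p_not_all_right, 4))
```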
Guided Practice 3.24
Suppose the variables handedness and sex are independent, i.e. knowing someone's sex provides no useful information about their handedness and vice-versa. Then we can compute whether a randomly selected person is right-handed and female using the Multiplication Rule:
P(right-handed and female) = P(right-handed) × P(female) = 0.91 × 0.50 = 0.455
Three people are selected at random.
(a) What is the probability that the first person is male and right-handed?
0.50 × 0.91 = 0.455
(b) What is the probability that the first two people are male and right-handed?
(0.50 × 0.91)² = 0.455² = 0.207
(c) What is the probability that the third person is female and left-handed?
0.50 × 0.09 = 0.045
(d) What is the probability that the first two people are male and right-handed and the third
person is female and left-handed?
0.455 × 0.455 × 0.045 = 0.0093 (approximately)
Marginal, Joint and Conditional Probability
mach_learn | truth: fashion | truth: not | Total |
---|---|---|---|
pred_fashion | 197 | 22 | 219 |
pred_not | 112 | 1491 | 1603 |
Total | 309 | 1513 | 1822 |
Marginal Probability
Marginal probabilities are the totals for each row or column, representing the probability of a single event regardless of other variables.
- $P(\text{fashion}) = \frac{309}{1822} \approx 0.1696$
- $P(\text{not}) = \frac{1513}{1822} \approx 0.8304$
- $P(\text{pred_fashion}) = \frac{219}{1822} \approx 0.1202$
- $P(\text{pred_not}) = \frac{1603}{1822} \approx 0.8798$
Joint Probability
Joint probabilities represent the probability of two events occurring together, corresponding to the individual cells in the table.
- $P(\text{pred_fashion} \land \text{fashion}) = \frac{197}{1822} \approx 0.1081$
- $P(\text{pred_fashion} \land \text{not}) = \frac{22}{1822} \approx 0.0121$
- $P(\text{pred_not} \land \text{fashion}) = \frac{112}{1822} \approx 0.0615$
- $P(\text{pred_not} \land \text{not}) = \frac{1491}{1822} \approx 0.8183$
Conditional Probability
Conditional probabilities describe the probability of one event given that another event has occurred.
The conditional probability of outcome A given condition B is computed as the following:
\[P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}\]
This reads as: the probability of event $A$ given $B$ is equal to the joint probability of $A$ and $B$ divided by the probability of $B$.
Examples:
- $P(\text{fashion} \, \mid \, \text{pred_fashion}) = \frac{197}{219} \approx 0.8995$
- $P(\text{not} \, \mid \, \text{pred_fashion}) = \frac{22}{219} \approx 0.1005$
- $P(\text{fashion} \, \mid \, \text{pred_not}) = \frac{112}{1603} \approx 0.0699$
- $P(\text{not} \, \mid \, \text{pred_not}) = \frac{1491}{1603} \approx 0.9301$
- $P(\text{pred_fashion} \, \mid \, \text{fashion}) = \frac{197}{309} \approx 0.6375$
- $P(\text{pred_not} \, \mid \, \text{fashion}) = \frac{112}{309} \approx 0.3625$
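A small script can recover the marginal and conditional probabilities directly from the table’s counts (the dictionary layout here is just one possible encoding of the table):

```python
# Counts from the classifier table: (prediction, truth) -> count
counts = {
    ("pred_fashion", "fashion"): 197,
    ("pred_fashion", "not"): 22,
    ("pred_not", "fashion"): 112,
    ("pred_not", "not"): 1491,
}
total = sum(counts.values())  # 1822

# Marginal: P(truth = fashion), summing over both prediction rows.
p_fashion = sum(v for (pred, truth), v in counts.items() if truth == "fashion") / total

# Conditional: P(fashion | pred_fashion) = joint count / row total.
row_total = counts[("pred_fashion", "fashion")] + counts[("pred_fashion", "not")]
p_fashion_given_pred = counts[("pred_fashion", "fashion")] / row_total

print(round(p_fashion, 4))             # ≈ 0.1696
print(round(p_fashion_given_pred, 4))  # ≈ 0.8995
```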
Marginal Probability Question
Subjective Identity (rows) \ Objective Position (columns) | Working Class | Upper Middle Class | Total |
---|---|---|---|
Poor | 0 | 0 | 0 |
Working Class | 8 | 0 | 8 |
Middle Class | 32 | 13 | 45 |
Upper Middle Class | 8 | 37 | 45 |
Upper Class | 0 | 0 | 0 |
Total | 48 | 50 | 98 |
What is the probability that a student’s objective social class position is upper middle class?
\[P(\text{Objective = Upper Middle Class}) = \frac{50}{98} \approx 0.51\]
Conditional Probability Question
What is the probability that a student who is objectively in the working class associates with upper middle class?
We are asked to find the probability that a student’s subjective identity is “upper middle class” given their objective class is “working class”:
\[P(\text{Subjective = Upper Middle Class} \mid \text{Objective = Working Class}) = \frac{8}{48} \approx 0.17\]
Conditional Probability Question 2
Given:
- 14.6% of Americans live below the poverty line → $P(\text{Below PL}) = 0.146$
- 20.7% of Americans speak a language other than English at home → $P(\text{Speak non-Eng}) = 0.207$
- 4.2% fall into both categories → $P(\text{Below PL and Speak non-Eng}) = 0.042$
What percent of Americans who live below the poverty line also speak a language other than English at home?
We are looking for the conditional probability:
$P(\text{Speak non-Eng} \mid \text{Below PL})$
Using the formula:
\[P(\text{Speak non-Eng} \mid \text{Below PL}) = \frac{P(\text{Below PL and Speak non-Eng})}{P(\text{Below PL})} = \frac{0.042}{0.146} \approx 0.2877\]
Approximately 28.8% of Americans who live below the poverty line speak a language other than English at home.
General Multiplication Rule
The General Multiplication Rule applies to events that might not be independent:
If $A$ and $B$ represent two outcomes or events, then:
\[P(A \text{ and } B) = P(A|B) \times P(B)\]
It is useful to think of $A$ as the outcome of interest and $B$ as the condition.
Relationship to Conditional Probability
This General Multiplication Rule is simply a rearrangement of the conditional probability formula:
\[P(A|B) = \frac{P(A \text{ and } B)}{P(B)}\]
Multiplying both sides by $P(B)$ gives:
\[P(A \text{ and } B) = P(A|B) \times P(B)\]
Example: Smallpox Inoculation
Consider the smallpox dataset. Suppose we are given:
- $P(\text{inoculated} = \text{no}) = 0.9608$
- $P(\text{result} = \text{lived} \mid \text{inoculated} = \text{no}) = 0.8588$
We want to find the probability that a resident was not inoculated and lived:
\[P(\text{result} = \text{lived and inoculated} = \text{no})\] \[= P(\text{result} = \text{lived} \mid \text{inoculated} = \text{no}) \times P(\text{inoculated} = \text{no})\] \[= 0.8588 \times 0.9608 = 0.8251\]
This shows how we can apply the General Multiplication Rule to compute joint probabilities when events are not necessarily independent.
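The smallpox joint probability above works out the same way as a two-line check:

```python
p_not_inoculated = 0.9608
p_lived_given_not_inoculated = 0.8588

# General Multiplication Rule: P(A and B) = P(A | B) * P(B)
p_joint = p_lived_given_not_inoculated * p_not_inoculated
print(round(p_joint, 4))  # 0.8251
```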
Independence Considerations in Conditional Probability
If two events are independent, then knowing the outcome of one should provide no information about the other. This idea can be verified using conditional probability.
Guided Practice 3.38
Let $X_1$ and $X_2$ represent the outcomes of rolling two dice.
(a) What is the probability that the first die, $X_1$, is 1?
⇒ $P(X_1 = 1) = \frac{1}{6}$
(b) What is the probability that both $X_1$ and $X_2$ are 1?
⇒ $P(X_1 = 1 \text{ and } X_2 = 1) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$
(c) Use the formula for conditional probability to compute $P(X_2 = 1 \mid X_1 = 1)$:
\[P(X_2 = 1 \mid X_1 = 1) = \frac{P(X_1 = 1 \text{ and } X_2 = 1)}{P(X_1 = 1)} = \frac{1/36}{1/6} = \frac{1}{6}\]
(d) What is $P(X_2 = 1)$? Is this different from part (c)?
⇒ $P(X_2 = 1) = \frac{1}{6}$
Conclusion:
No, the result is the same. This demonstrates that knowing $X_1 = 1$ does not affect the probability of $X_2 = 1$, meaning the two events are independent.
Even though the conditional probability equals $\frac{1}{6}$, the joint probability $\frac{1}{36}$ is much lower than either individual probability.
Conditional independence does not imply high joint probability; it simply means that knowing one event does not affect the probability of the other.
Guided Practice 3.39
Ron is watching a roulette table in a casino and notices that the last five outcomes were black. He figures that the chances of getting black six times in a row is very small (about 1/64) and puts his paycheck on red.
What is wrong with his reasoning?
Ron is falling victim to the gambler's fallacy: the mistaken belief that previous independent outcomes (such as the last five black spins) affect the probability of future outcomes. In a fair game of roulette, each spin is independent, so the probability of black or red remains approximately 18/38 (on a standard American roulette wheel) on every spin, regardless of prior results. Thus, the chance of red is the same as always, and the previous black spins do not increase its likelihood.
Tree Diagrams
Tree diagrams are a tool to organize outcomes and probabilities around the structure of the data. They are most useful when two or more processes occur in a sequence and each process is conditioned on its predecessors.
The smallpox data fit this description. We see the population as split by inoculation: yes and no. Following this split, survival rates were observed for each group. This structure is reflected in the tree diagram shown below. The first branch for inoculation is said to be the primary branch, while the other branches are secondary.
Tree diagrams are annotated with marginal and conditional probabilities. This tree diagram splits the smallpox data by inoculation into the yes and no groups with respective marginal probabilities 0.0392 and 0.9608. The secondary branches are conditioned on the first, so we assign conditional probabilities to these branches.
For example, the top branch is the probability that result = lived conditioned on the information that inoculated = yes. We may (and usually do) construct joint probabilities at the end of each branch in our tree by multiplying the numbers we come across as we move from left to right.
These joint probabilities are computed using the General Multiplication Rule:
\[P(\text{inoculated} = \text{yes and result} = \text{lived})\] \[= P(\text{inoculated} = \text{yes}) \times P(\text{result} = \text{lived} \mid \text{inoculated} = \text{yes})\] \[= 0.0392 \times 0.9754 = 0.0382\]
The four branch endpoints of the tree carry these joint probabilities:
- Lived, given inoculated (0.9754): $0.0392 \times 0.9754 = 0.0382$
- Died, given inoculated (0.0246): $0.0392 \times 0.0246 = 0.0010$
- Lived, given not inoculated (0.8589): $0.9608 \times 0.8589 = 0.8252$
- Died, given not inoculated (0.1411): $0.9608 \times 0.1411 = 0.1356$
Bayes’ Theorem
In many situations, we are given a conditional probability of the form $P(A \mid B)$ but we are interested in finding the reversed conditional probability $P(B \mid A)$. Bayes’ Theorem provides a way to compute this when direct calculation is not possible or when a tree diagram becomes too cumbersome.
\[P(B \mid A) = \frac{P(A \mid B) \cdot P(B)}{P(A)}\]This theorem is especially useful when we want to “invert” a conditional probability — i.e., to switch the condition and the outcome.
Example: Smallpox Inoculation and Survival
Suppose we want to compute the probability that someone was inoculated, given that they lived — in other words, $P(\text{inoculated = yes} \mid \text{lived})$.
We already have:
- $P(\text{lived} \mid \text{inoculated = yes}) = 0.9754$
- $P(\text{inoculated = yes}) = 0.0392$
- $P(\text{lived}) = 0.03824 + 0.82523 = 0.86347$
Using Bayes’ Theorem:
\[P(\text{inoculated = yes} \mid \text{lived}) = \frac{P(\text{lived} \mid \text{inoculated = yes}) \cdot P(\text{inoculated = yes})}{P(\text{lived})}\] \[= \frac{0.9754 \cdot 0.0392}{0.86347} \approx \frac{0.03824}{0.86347} \approx 0.0443\]
So, even though the probability of surviving given inoculation is very high (97.54%), the probability of having been inoculated given that the person survived is only about 4.43%. This is because the vast majority of the population was not inoculated. Bayes’ Theorem lets us work backwards to compute the reverse conditional probability.
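The same inversion can be scripted: compute the total probability of surviving over both branches of the tree, then apply Bayes’ Theorem.

```python
p_inoc = 0.0392
p_not_inoc = 1 - p_inoc  # 0.9608
p_lived_given_inoc = 0.9754
p_lived_given_not = 0.8589

# Total probability of surviving, summing over both tree branches.
p_lived = p_lived_given_inoc * p_inoc + p_lived_given_not * p_not_inoc

# Bayes' Theorem: invert the conditioning.
p_inoc_given_lived = p_lived_given_inoc * p_inoc / p_lived
print(round(p_inoc_given_lived, 4))  # ≈ 0.0443
```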
Bayes’ Theorem Question
The American Community Survey is an ongoing survey that provides data every year to help communities plan investments and services.
The 2010 American Community Survey estimates that:
- 14.6% of Americans live below the poverty line
- 20.7% speak a language other than English at home
- 4.2% fall into both categories
What percent of Americans live below the poverty line given that they speak a language other than English at home?
Using Bayes’ Theorem:
\[P(\text{Below PL} \mid \text{Speak non-Eng}) = \frac{P(\text{Below PL} \ \& \ \text{Speak non-Eng})}{P(\text{Speak non-Eng})} = \frac{0.042}{0.207} \approx 0.2\]
So, about 20% of Americans who speak a language other than English at home live below the poverty line.
Bayes’ Theorem Formula:
\[P(A \mid B) = \frac{P(A \ \& \ B)}{P(B)}\]
Mutually Exclusive vs Independent Events (Dice Example)
To explain the difference between mutually exclusive and independent events using a dice example, let’s first define these concepts and then illustrate them with scenarios involving a standard six-sided die.
- Mutually Exclusive Events: Two events are mutually exclusive if they cannot occur at the same time. The occurrence of one event means the other cannot happen in the same trial: \(P(A \text{ and } B) = 0\)
- Independent Events: Two events are independent if the occurrence of one does not affect the probability of the other occurring: \(P(A \text{ and } B) = P(A) \cdot P(B)\)
Example: Assume we are rolling a single fair six-sided die (with faces numbered 1 to 6) unless stated otherwise.
Mutually Exclusive Events
Example: Rolling an even number (2, 4, or 6) and rolling a 3 in a single roll of the die.
- Event A: Roll an even number, $\{2, 4, 6\}$, with \(P(A) = \frac{3}{6} = \frac{1}{2}\)
- Event B: Roll a 3, $\{3\}$, with \(P(B) = \frac{1}{6}\)
Explanation: These events are mutually exclusive because a single roll cannot result in both an even number and a 3 (since 3 is odd). The outcomes $\{2, 4, 6\}$ and $\{3\}$ have no overlap.
- Probability of both occurring: \(P(A \text{ and } B) = 0\)
- Probability of either occurring: \(P(A \text{ or } B) = P(A) + P(B) = \frac{1}{2} + \frac{1}{6} = \frac{2}{3}\)
Independent Events
Example: Rolling an even number on the first roll and rolling a number greater than 4 (i.e., 5 or 6) on a second roll of the same die.
- Event A: Roll an even number on the first roll, $\{2, 4, 6\}$, with \(P(A) = \frac{3}{6} = \frac{1}{2}\)
- Event B: Roll a number greater than 4 on the second roll, $\{5, 6\}$, with \(P(B) = \frac{2}{6} = \frac{1}{3}\)
Explanation: These events are independent because the outcome of the first roll does not affect the outcome of the second roll. The die has no memory.
- Probability of both occurring: \(P(A \text{ and } B) = P(A) \cdot P(B) = \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{6}\)
- Probability of either occurring: \(\begin{align*} P(A \text{ or } B) &= P(A) + P(B) - P(A \text{ and } B) \\ &= \frac{1}{2} + \frac{1}{3} - \frac{1}{6} \\ &= \frac{3}{6} + \frac{2}{6} - \frac{1}{6} = \frac{4}{6} = \frac{2}{3} \end{align*}\)
In summary:
- Mutually Exclusive: Cannot happen together (e.g., rolling a 3 and an even number in one roll is impossible).
- Independent: Can happen together, and one event doesn’t influence the other (e.g., first roll even, second roll > 4).
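Both definitions can be checked mechanically with event sets; a small sketch using exact fractions:

```python
from fractions import Fraction

even = {2, 4, 6}
three = {3}
greater_than_4 = {5, 6}

# Mutually exclusive: the event sets share no outcomes on a single roll,
# so the addition rule applies without an overlap term.
assert even & three == set()
p_even_or_three = Fraction(len(even | three), 6)  # 4/6 = 2/3

# Independent: two separate rolls, so the joint probability factors.
p_even = Fraction(len(even), 6)
p_gt4 = Fraction(len(greater_than_4), 6)
p_both = p_even * p_gt4  # (1/2)(1/3) = 1/6

print(p_even_or_three, p_both)
```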
Sampling from a Small Population
When sampling from a population, we typically assume the population is large, and sampling one observation doesn’t meaningfully affect the rest. However, when sampling without replacement from a small population, or when we sample more than 10% of the population, probabilities shift because each draw slightly alters the composition of the remaining population.
Example 3.47
A professor selects a student at random to answer a question. There are 15 students in class.
Probability that you are selected: \(\frac{1}{15} \approx 0.067\)
Example 3.48
The professor asks 3 questions and does not repeat students. What's the probability you're not picked for any of them?
Let's compute the probability of not being picked for each question, step by step:
First question: \(P(\text{not picked}) = \frac{14}{15}\)
Second question (given you weren't picked before): \(\frac{13}{14}\)
Third question (given you weren't picked in first two): \(\frac{12}{13}\)
To get the overall probability that you are not picked in all three questions, multiply:
\[\frac{14}{15} \times \frac{13}{14} \times \frac{12}{13} = \frac{12}{15} = 0.80\]
What Rule Justified the Multiplication?
This is an application of the General Multiplication Rule, which says:
The probability of multiple events all occurring is the product of each event’s probability, conditioned on the events that came before it.
Mathematically, for three events $A$, $B$, and $C$:
\[P(A \text{ and } B \text{ and } C) = P(A) \cdot P(B \mid A) \cdot P(C \mid A \text{ and } B)\]In Example 3.48:
- $A = Q_1 = \text{not picked}$
- $B = Q_2 = \text{not picked}$
- $C = Q_3 = \text{not picked}$
So:
\[P(\text{not picked all 3 times}) = P(A) \cdot P(B \mid A) \cdot P(C \mid A \text{ and } B)\]This rule allows us to break down complex, dependent events into smaller conditional steps that we can compute.
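The conditional chain from Example 3.48 translates directly into a loop; each question removes one student from the pool, which is exactly why the probabilities shift:

```python
from fractions import Fraction

class_size = 15
p_not_picked = Fraction(1)
remaining = class_size

# Multiply the conditional probabilities: 14/15 * 13/14 * 12/13.
for _ in range(3):
    p_not_picked *= Fraction(remaining - 1, remaining)
    remaining -= 1

print(p_not_picked, float(p_not_picked))  # 4/5 0.8
```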
Random Variables
A random variable is a numerical outcome of a random process. We typically represent it using capital letters like $X$, $Y$, or $Z$. For instance, in a statistics class, the amount of money a student spends on books is a random variable $X$.
Example 3.54: Bookstore Book Sales
Let's say:
- 20% of students buy no books: $0
- 55% buy only the textbook: $137
- 25% buy both books: $170
These percentages remain stable across terms. If there are 100 students, the expected number of books sold is:
- 20 students buy 0 books → 0 books
- 55 students buy 1 book → 55 books
- 25 students buy 2 books → 50 books
Total expected books sold: \(0 + 55 + 50 = 105\) books
Expectation
Let $X$ be the amount spent by a single student. The values it can take are:
- $x_1 = 0$, with $P(X = 0) = 0.20$
- $x_2 = 137$, with $P(X = 137) = 0.55$
- $x_3 = 170$, with $P(X = 170) = 0.25$
To compute the expected value, $E(X)$:
\[E(X) = \sum x_i \cdot P(X = x_i) = 0 \cdot 0.20 + 137 \cdot 0.55 + 170 \cdot 0.25 = 0 + 75.35 + 42.5 = 117.85\]
So the expected revenue from one student is:
\[\mu = E(X) = 117.85\]
Variability in Random Variables
To measure variability, we use variance and standard deviation. These indicate how much outcomes typically differ from the mean.
Let $X$ have outcomes $x_1, …, x_k$ with probabilities $P(X = x_1), …, P(X = x_k)$, and mean $\mu$.
The variance of $X$ is:
\[\sigma^2 = \sum_{j=1}^{k} (x_j - \mu)^2 \cdot P(X = x_j)\]The standard deviation is:
\[\sigma = \sqrt{\sigma^2}\]
Example 3.58: Compute Variance and Standard Deviation
We already have:
- $\mu = E(X) = 117.85$
Now build a table:
$i$ | $x_i$ | $P(X = x_i)$ | $x_i \cdot P(X = x_i)$ | $x_i - \mu$ | $(x_i - \mu)^2$ | $(x_i - \mu)^2 \cdot P(X = x_i)$ |
---|---|---|---|---|---|---|
1 | 0 | 0.20 | 0 | -117.85 | 13888.62 | 2777.72 |
2 | 137 | 0.55 | 75.35 | 19.15 | 366.72 | 201.70 |
3 | 170 | 0.25 | 42.50 | 52.15 | 2719.62 | 679.91 |
Total | | | 117.85 | | | Variance = 3659.33 |
So:
Variance: \(\sigma^2 = 3659.33\)
Standard Deviation: \(\sigma = \sqrt{3659.33} \approx 60.49\)
- Expected Value $E(X)$ gives the average expected outcome.
- Variance $\sigma^2$ and Standard Deviation $\sigma$ measure the spread of outcomes around the mean, indicating how much the values of a random variable deviate from the expected value on average. A larger variance or standard deviation means the outcomes are more dispersed and less predictable, while smaller values suggest the outcomes are more tightly clustered around the mean.
- This framework allows the bookstore to anticipate average revenue and understand volatility.
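The expectation, variance, and standard deviation for the bookstore’s revenue distribution can be computed in a few lines:

```python
# (value, probability) pairs for the amount one student spends
outcomes = [(0, 0.20), (137, 0.55), (170, 0.25)]

mu = sum(x * p for x, p in outcomes)               # expected value E(X)
var = sum((x - mu) ** 2 * p for x, p in outcomes)  # variance
sd = var ** 0.5                                    # standard deviation

print(round(mu, 2), round(var, 2), round(sd, 2))   # ≈ 117.85, 3659.33, 60.49
```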
Linear Combinations of Random Variables
In many real-world scenarios, a total quantity of interest is composed of multiple random variables. This total can often be represented as a linear combination of those variables. A linear combination is any sum of random variables where each variable may be multiplied by a constant.
For example, if $X_1, X_2, \dots, X_5$ represent the travel time on Monday through Friday, then the total weekly travel time $W$ is:
\[W = X_1 + X_2 + X_3 + X_4 + X_5\]This representation helps us understand the total quantity by analyzing the components individually.
Example 3.60
Scenario:
John travels to work five days a week. Let $X_i$ represent his commute time on day $i$, for $i = 1$ to $5$.
His total weekly travel time is:
\[W = X_1 + X_2 + X_3 + X_4 + X_5\]
This is a linear combination of five random variables.
Expectation of a Linear Combination
A key property of expectation is linearity. That is, for any random variables $X_1, X_2, \dots, X_n$:
\[E(X_1 + X_2 + \dots + X_n) = E(X_1) + E(X_2) + \dots + E(X_n)\]Even if the variables are dependent, this property still holds.
Example 3.61
Given:
Each daily commute has an expected value of 18 minutes, i.e., $E(X_i) = 18$.
To find:
Expected weekly commute time $E(W)$.
Since each $E(X_i) = 18$:
\[E(W) = E(X_1) + E(X_2) + E(X_3) + E(X_4) + E(X_5) = 5 \times 18 = 90\]
So John is expected to spend 90 minutes commuting in total per week.
- A linear combination of random variables lets us model total quantities that are sums (or scaled sums) of parts.
- The expected value of a sum is always the sum of the expected values, regardless of whether the variables are dependent or independent.
Guided Practice 3.63
Scenario:
Based on past auctions, Elena figures she should expect to make about $175 on the TV and pay about $23 for the toaster oven.
We let:
- $X$ = revenue from the TV = 175
- $Y$ = cost of the toaster oven = 23
Elena's net gain is:
\[E(X - Y) = E(X) - E(Y) = 175 - 23 = 152\]
So, Elena should expect to make $152 in total.
Guided Practice 3.64
Question:
Would you be surprised if John's weekly commute wasn't exactly 90 minutes or if Elena didn't make exactly $152?
Answer:
No, we wouldn't be surprised.
The expected value (90 minutes for John, $152 for Elena) represents the average over many repeated trials, not a guaranteed result for a single instance.
Due to variability (randomness), actual outcomes will often differ from expected values.
Two important concepts have been introduced:
- A final quantity can often be represented as a sum of random components, i.e., a linear combination.
- The expected value of a linear combination is the sum of the expected values of its components.
Linear Combinations of Random Variables and the Average Result
A linear combination of two random variables $X$ and $Y$ is:
\[aX + bY\]where $a$ and $b$ are constants. To compute the expected value of a linear combination, use:
\[E(aX + bY) = a \cdot E(X) + b \cdot E(Y)\]This rule holds regardless of whether $X$ and $Y$ are dependent or independent.
Example 3.65
Scenario:
Leonard has invested:
- $6,000 in Caterpillar Inc (CAT)
- $2,000 in Exxon Mobil Corp (XOM)
Let:
- $X$ = monthly return (as a decimal) of CAT
- $Y$ = monthly return of XOM
Then Leonard's portfolio change is:
\[6000 \times X + 2000 \times Y\]
A positive value indicates a gain, and a negative value indicates a loss.
Guided Practice 3.66
Suppose:
- $X = 0.020$ (CAT rises 2.0%)
- $Y = 0.002$ (XOM rises 0.2%)
Then the change in Leonard's portfolio is:
\[6000 \times 0.020 + 2000 \times 0.002 = 120 + 4 = 124\]
Answer:
Leonard should expect to make 124 dollars next month from his investments.
Variability in Linear Combinations of Random Variables
Quantifying the average outcome from a linear combination of random variables is helpful, but it is also important to understand the uncertainty or variability of that outcome. In the case of Leonard’s stock portfolio (from Guided Practice 3.66), we calculated an expected monthly gain of $124. However, this gain is not guaranteed.
Figure 3.22 illustrates the monthly changes in a portfolio like Leonard’s over a three-year period. The gains and losses vary considerably, underscoring the need to quantify volatility.
To do this, we use variance and standard deviation, just as we have in earlier sections. The variances for the monthly returns of the stocks in Leonard’s portfolio are shown in Figure 3.23. We assume the stock returns are independent.
To compute the variance of a linear combination of independent random variables, we use the following rule:
\[\text{Var}(aX + bY) = a^2 \cdot \text{Var}(X) + b^2 \cdot \text{Var}(Y)\]This rule assumes that $X$ and $Y$ are independent. If they are not, then the formula would require an additional covariance term, which is beyond the scope of this course.
Example: Leonard’s Portfolio
Leonard’s portfolio return is:
\[6000 \cdot X + 2000 \cdot Y\]Using the formula:
\[\text{Var}(6000X + 2000Y) = 6000^2 \cdot \text{Var}(X) + 2000^2 \cdot \text{Var}(Y)\]From Figure 3.23:
- $\text{Var}(X) = 0.0057$ for CAT
- $\text{Var}(Y) = 0.0021$ for XOM
Plugging in:
\[= 36,\!000,\!000 \cdot 0.0057 + 4,\!000,\!000 \cdot 0.0021\] \[= 205,\!200 + 8,\!400 = 213,\!600\]
Then:
\[\text{SD} = \sqrt{213,\!600} \approx 462\]
So, while Leonard expects to gain $124 each month, the standard deviation is about $462, indicating a high level of volatility.
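The portfolio variance formula is easy to script; this sketch uses the dollar amounts and variances from the text and assumes independent returns:

```python
amounts = {"CAT": 6000, "XOM": 2000}           # dollars invested
variances = {"CAT": 0.0057, "XOM": 0.0021}     # monthly return variances

# Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) for independent X, Y.
portfolio_var = sum(amounts[s] ** 2 * variances[s] for s in amounts)
portfolio_sd = portfolio_var ** 0.5

print(round(portfolio_var), round(portfolio_sd))  # 213600 462
```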
Summary Statistics from Historical Stock Data
Stock | Mean ($\bar{x}$) | SD ($s$) | Variance ($s^2$) |
---|---|---|---|
CAT | 0.0204 | 0.0757 | 0.0057 |
XOM | 0.0025 | 0.0455 | 0.0021 |
Variability of Linear Combinations (Summary)
For independent random variables $X$ and $Y$, the variance of a linear combination $aX + bY$ is:
\[\text{Var}(aX + bY) = a^2 \cdot \text{Var}(X) + b^2 \cdot \text{Var}(Y)\]The standard deviation is the square root of the variance:
\[\text{SD}(aX + bY) = \sqrt{\text{Var}(aX + bY)}\]Negative coefficients do not affect the variance, since they are squared.
Example 3.68: John's Weekly Commute
Suppose John's daily commute has a standard deviation of 4 minutes. His total commute time for the week is:
\[W = X_1 + X_2 + X_3 + X_4 + X_5\]
Each $X_i$ has:
\[\text{SD}(X_i) = 4 \quad \Rightarrow \quad \text{Var}(X_i) = 16\]
Assuming independence, the total variance is:
\[\text{Var}(W) = 16 + 16 + 16 + 16 + 16 = 80\]
So:
\[\text{SD}(W) = \sqrt{80} \approx 8.94 \text{ minutes}\]
Guided Practice 3.69
The computation in Example 3.68 assumes that daily commute times are independent. Do you think this is reasonable?
Possibly not. If there's heavy traffic on Monday, it might also affect Tuesday's commute due to road conditions, or John might leave at a different time on subsequent days. In reality, daily commute times might be slightly correlated.
Guided Practice 3.70
From Guided Practice 3.62, Elena has:
- $X$ = TV revenue, SD = $25
- $Y$ = toaster oven cost, SD = $8
- Net gain = $X - Y$
We compute:
\[\text{Var}(X - Y) = \text{Var}(X) + \text{Var}(Y) = 25^2 + 8^2 = 625 + 64 = 689\]
Then:
\[\text{SD}(X - Y) = \sqrt{689} \approx 26.25\]
So the standard deviation of Elena's net gain is about $26.25. Even though $Y$ has a negative coefficient, it does not affect the variance due to squaring.
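Both variability calculations follow the same pattern, so a short script covers John and Elena together (assuming independence in each case):

```python
from math import sqrt

# John's weekly commute: five independent days, each with SD 4 minutes.
daily_sd = 4
weekly_var = 5 * daily_sd ** 2   # variances add for independent variables
weekly_sd = sqrt(weekly_var)

# Elena's net gain X - Y: the negative coefficient squares away, so
# Var(X - Y) = Var(X) + Var(Y).
net_var = 25 ** 2 + 8 ** 2
net_sd = sqrt(net_var)

print(round(weekly_sd, 2), round(net_sd, 2))  # 8.94 26.25
```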
Continuous Distributions
Example 3.72
What proportion of the sample is between 180 cm and 185 cm tall (about 5'11" to 6'1")?
We can add up the counts of the two histogram bins that fall in this range and divide by the total sample size. The two bins in this region have:
- 195,307 people in the first bin
- 156,239 people in the second bin
Then:
\[\frac{195{,}307 + 156{,}239}{3{,}000{,}000} = \frac{351{,}546}{3{,}000{,}000} \approx 0.1172\]
This means that approximately 11.72% of the sample of 3 million falls between 180 cm and 185 cm in height. This proportion is also equal to the area under the histogram in that range.
We computed the proportion of individuals with heights 180 to 185 cm in Example 3.72 as a fraction:
\[\frac{\text{number of people between 180 and 185}}{\text{total sample size}}\]We found the number of people with heights between 180 and 185 cm by determining the fraction of the histogram’s area in this region. Similarly, we can use the area in the shaded region under the curve to find a probability (with the help of a computer):
\[P(\text{height between } 180 \text{ and } 185) = \text{area between 180 and 185} = 0.1157\]The probability that a randomly selected person is between 180 and 185 cm is $0.1157$. This is very close to the estimate from Example 3.72: $0.1172$.
Guided Practice 3.73
Three US adults are randomly selected. The probability a single adult is between 180 and 185 cm is 0.1157.
- (a) What is the probability that all three are between 180 and 185 cm tall?
- (b) What is the probability that none are between 180 and 185 cm?
$$P(\text{all three}) = 0.1157^3 \approx 0.0015$$
$$P(\text{none}) = (1 - 0.1157)^3 = 0.8843^3 \approx 0.6915$$
Guided Practice 3.75
Suppose a person's height is rounded to the nearest centimeter. Is there a chance that a random person's measured height will be 180 cm?
Yes. When height is rounded, there is a nonzero probability that someone's recorded height is 180 cm, even though the true height may not be exactly 180 cm. Rounding introduces discrete outcomes, so the area under the curve over an interval (e.g., 179.5 to 180.5 cm) now corresponds to a positive probability.
Guided Practice: Chickenpox Probability
The National Vaccine Information Center estimates that 90% of Americans have had the disease chickenpox by adulthood. What is the probability that exactly 92 out of 100 randomly sampled American adults had chickenpox during childhood?
- (a) Using the Binomial Formula
- (b) Using Python Code
Let $X$ be the number of adults (out of 100) who had chickenpox.
We want $P(X = 92)$:
$$ P(X = 92) = \binom{100}{92} (0.9)^{92} (0.1)^8 \approx 0.1148 $$
```python
from scipy.stats import binom

# Exact binomial probability: X ~ Binomial(n=100, p=0.9)
prob = binom.pmf(92, n=100, p=0.9)
print(round(prob, 4))  # Output: 0.1148
```
Result: $P(X = 92) \approx 0.1148$
Since $np = 100 \times 0.9 = 90 \geq 10$ and $n(1 - p) = 100 \times 0.1 = 10 \geq 10$, we can also approximate this probability using a normal distribution:
Mean: $\mu = np = 90$
Standard deviation: $\sigma = \sqrt{np(1 - p)} = \sqrt{9} = 3$
When using the normal approximation to estimate binomial probabilities, we apply a continuity correction. This adjustment is necessary because the binomial distribution is discrete (only whole number outcomes are possible), while the normal distribution is continuous (all real numbers are possible).
For example, if we want to approximate: using the normal distribution, we can't calculate the probability at a single point. Instead, we estimate the area under the normal curve over an interval that surrounds 92. This is done by adjusting both the lower and upper bounds by 0.5:
Apply continuity correction:
$$ Z_1 = \frac{91.5 - 90}{3} = 0.5, \quad Z_2 = \frac{92.5 - 90}{3} = 0.8333 $$
$$ \begin{align*} P(91.5 < X < 92.5) &= P(0.5 < Z < 0.8333) \\ &\approx \Phi(0.8333) - \Phi(0.5) \\ &\approx 0.7977 - 0.6915 = 0.1062 \end{align*} $$
Approximate probability: $P(X = 92) \approx 0.1062$
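Both the continuity-corrected normal approximation and the exact binomial value can be computed with only the standard library; a minimal sketch:

```python
from math import comb, erf, sqrt

n, p, k = 100, 0.9, 92
mu = n * p                     # 90.0
sigma = sqrt(n * p * (1 - p))  # 3.0

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Continuity correction: treat the discrete outcome 92 as the interval (91.5, 92.5).
approx = phi((k + 0.5 - mu) / sigma) - phi((k - 0.5 - mu) / sigma)

# Exact binomial probability for comparison.
exact = comb(n, k) * p**k * (1 - p)**(n - k)

print(round(approx, 4), round(exact, 4))  # 0.1062 0.1148
```

The approximation (0.1062) lands reasonably close to the exact value (0.1148); the gap reflects the skew of a binomial with p = 0.9.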