Experts in LLMs

[Knowledge check] Standard distributions

Q1. Variance of the Bernoulli Distribution

1. Expected Value $E[X]$

The expected value (mean) is defined as \(E[X] = \sum_{i=1}^{k} x_i \cdot p_i\). It represents the long-run average value of $X$ if the experiment is repeated many times. Here,

  • $x_i$ are the possible values of $X$.
  • $p_i = P(X = x_i)$ is the probability of each value occurring.
  • The probabilities must sum to 1: $p_1 + p_2 + \dots + p_k = 1$.

Consider a random variable $X$ representing the result of rolling a fair six-sided die. Each face of the die is equally likely, so the probability of each outcome is:

\[P(X = x_i) = \frac{1}{6}, \quad \text{for } i = 1, 2, \dots, 6.\]

Using the expectation formula:

\[E[X] = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5\]
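The die calculation can be checked directly in Python; this is just the expectation formula above applied to the six equally likely faces:

```python
# Expected value of a discrete random variable: E[X] = sum(x_i * p_i).
outcomes = [1, 2, 3, 4, 5, 6]   # faces of a fair six-sided die
prob = 1 / 6                    # each face is equally likely

expected_value = sum(x * prob for x in outcomes)
print(expected_value)  # → 3.5
```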

2. Expected Value of $X$ for a Bernoulli Distribution

In a Bernoulli distribution, the random variable $X$ takes two values:

\[X = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } 1 - p\end{cases}\]

So, the expectation is:

\[E[X] = (1 \cdot p) + (0 \cdot (1 - p)) = p\]

3. Expected Value of $X^2$ for a Bernoulli Distribution

  1. Compute $X^2$:
    • $X = 1 \implies X^2 = 1$
    • $X = 0 \implies X^2 = 0$
    • Thus, $X^2 = X$.
  2. Apply expectation formula: \(E[X^2] = (1^2 \cdot p) + (0^2 \cdot (1 - p)) = p\)

  3. Result:
    \(E[X^2] = p\)

This result makes sense because, for a Bernoulli variable, squaring does not change the value, meaning $E[X^2]$ is the same as $E[X]$.
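A minimal sketch of steps 2 and 3, enumerating the two Bernoulli outcomes (the value `p = 0.3` is an arbitrary illustration, not from the text):

```python
# For a Bernoulli(p) variable, X = 1 with probability p and X = 0 with probability 1 - p.
p = 0.3  # example success probability (arbitrary choice)

e_x  = 1 * p + 0 * (1 - p)        # E[X]   = p
e_x2 = 1**2 * p + 0**2 * (1 - p)  # E[X^2] = p, since squaring 0 or 1 changes nothing

print(e_x, e_x2)  # both equal p: 0.3 0.3
```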

4. Variance Formula

Variance is defined as:

\[\text{Var}(X) = E[X^2] - (E[X])^2\]

Substituting the values:

\[\text{Var}(X) = p - p^2 = p(1 - p)\]

5. Interpretation

  • The variance is highest when $p = 0.5$, meaning maximum uncertainty.
  • The variance is $0$ when $p = 0$ or $p = 1$, meaning no uncertainty (constant outcomes).
  • The formula $p(1 - p)$ represents the spread of probability between success $(1)$ and failure $(0)$.

Thus, the variance of a Bernoulli distribution is:

\[\text{Var}(X) = p(1 - p)\]
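The formula can be sanity-checked by simulation; here `p = 0.5` (the maximum-uncertainty case from the interpretation above) and the sample size are arbitrary choices for illustration:

```python
import random

random.seed(0)
p = 0.5  # maximum-uncertainty case: Var(X) = p(1 - p) = 0.25

# Analytic variance of Bernoulli(p)
analytic = p * (1 - p)

# Empirical variance from simulated Bernoulli draws
n = 100_000
draws = [1 if random.random() < p else 0 for _ in range(n)]
mean = sum(draws) / n
empirical = sum((x - mean) ** 2 for x in draws) / n

print(analytic)             # 0.25
print(round(empirical, 2))  # ≈ 0.25
```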


Probability Density Function (PDF) of a Univariate Gaussian Distribution

The probability density function (PDF) of a univariate Gaussian (normal) distribution with mean $\mu$ and variance $\sigma^2$ is given by:

\[f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)\]

Explanation:

  • $x$: The random variable.
  • $\mu$: The mean (expected value) of the distribution, representing its center.
  • $\sigma^2$: The variance, measuring the spread of the distribution.
  • $\sigma = \sqrt{\sigma^2}$: The standard deviation.
  • $\exp(\cdot)$: The exponential function, which guarantees the density is strictly positive.

Intuition:

  1. Normalization Factor $\frac{1}{\sqrt{2\pi\sigma^2}}$:
    • Ensures the total area under the curve is 1, making it a valid probability distribution.
  2. Exponential Term $\exp(-\frac{(x - \mu)^2}{2\sigma^2})$:
    • Determines how likely $x$ is based on its distance from $\mu$.
    • Values closer to $\mu$ have higher probability.
    • Larger variance ($\sigma^2$) results in a wider distribution.

Key Properties:

  • Bell-shaped curve: Symmetric around $\mu$.
  • 68-95-99.7 Rule: About 68% of values fall within $\mu \pm \sigma$, 95% within $\mu \pm 2\sigma$, and 99.7% within $\mu \pm 3\sigma$.
  • Peaks at $x = \mu$ and decreases as $x$ moves away from $\mu$.

This function models many natural phenomena like heights, test scores, and measurement errors.
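The PDF formula and the 68-95-99.7 rule can both be checked numerically; the rule follows from the standard normal CDF, which can be written with the error function as $P(|X - \mu| \le k\sigma) = \operatorname{erf}(k/\sqrt{2})$:

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """PDF of a univariate Gaussian, as in the formula above."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Peak of the standard normal at x = mu: 1/sqrt(2*pi) ≈ 0.3989
print(round(gaussian_pdf(0.0), 4))  # 0.3989

# 68-95-99.7 rule: P(|X - mu| <= k*sigma) = erf(k / sqrt(2))
for k in (1, 2, 3):
    print(k, round(math.erf(k / math.sqrt(2)), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```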


Effect of Adding a Constant on Variance

For any random variable $X$ and constant $a$:

\[\text{Var}(X + a) = \text{Var}(X)\]

Key Insight:

  • Variance is unaffected by additive constants because it measures spread (deviation from the mean), not location.
  • Shifting all values by $a$ shifts the mean by $a$, but the deviations $(X + a) - E[X + a] = X - E[X]$ remain unchanged.

Proof:

\(\begin{align*} \text{Var}(X + a) &= E\left[ \left( (X + a) - E[X + a] \right)^2 \right] \\ &= E\left[ \left( X + a - E[X] - a \right)^2 \right] \\ &= E\left[ \left( X - E[X] \right)^2 \right] \\ &= \text{Var}(X) \end{align*}\)

Intuition:

Adding $a$ moves the entire distribution left/right without changing its shape or variability.
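The shift invariance is easy to confirm empirically; the distribution, shift `a = 10`, and sample size below are arbitrary illustrative choices:

```python
import random

random.seed(1)

def variance(xs):
    """Population variance: mean squared deviation from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

xs = [random.gauss(0, 2) for _ in range(50_000)]
shifted = [x + 10 for x in xs]  # add the constant a = 10 to every value

# The two variances agree (up to floating-point noise): Var(X + a) = Var(X)
print(round(variance(xs), 2), round(variance(shifted), 2))
```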

Effect of Multiplying by a Constant on Variance

For a random variable $X$ and constant $a$:

\[\text{Var}(aX) = a^2 \text{Var}(X)\]

Derivation:

  1. Definition of Variance: \(\text{Var}(aX) = E\left[ \left( aX - E[aX] \right)^2 \right]\)

  2. Linearity of Expectation ($E[aX] = aE[X]$): \(\text{Var}(aX) = E\left[ \left( aX - aE[X] \right)^2 \right] = E\left[ a^2(X - E[X])^2 \right]\)

  3. Factor out $a^2$ (constants move outside expectations): \(\text{Var}(aX) = a^2 E\left[ (X - E[X])^2 \right] = a^2 \text{Var}(X)\)

Intuition:

  • Multiplying by $a$ scales both the values and their deviations from the mean by $a$.
  • Squaring these deviations introduces the $a^2$ term.

Example:

If $\text{Var}(X) = 4$ and $a = 3$: \(\text{Var}(3X) = 9 \times 4 = 36\)

Key Property:

Variance is not linear: it scales quadratically with multiplicative constants.
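The quadratic scaling can likewise be checked by simulation; `a = 3` matches the worked example above, while the distribution and sample size are arbitrary:

```python
import random

random.seed(2)

def variance(xs):
    """Population variance: mean squared deviation from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

a = 3
xs = [random.gauss(0, 2) for _ in range(50_000)]  # Var(X) ≈ 4
scaled = [a * x for x in xs]

# Ratio of variances equals a^2: Var(aX) / Var(X) = 9
print(round(variance(scaled) / variance(xs), 2))  # → 9.0
```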

Contact: psymbio@gmail.com