Chapter 2 - Probability Univariate Models
https://faculty.washington.edu/fm1/394/Materials/2-3indep.pdf
2.9 Exercises
2.1 Conditional Independence a. Let $H \in \{1, \ldots, K\}$ be a discrete random variable, and let $e_1$ and $e_2$ be the observed values of two other random variables $E_1$ and $E_2$. Suppose we wish to calculate the vector
\[\vec{P}(H \mid e_1, e_2) = \big(P(H = 1 \mid e_1, e_2), \ldots, P(H = K \mid e_1, e_2)\big)\]Which of the following sets of numbers are sufficient for the calculation?
i. $P(e_1, e_2),\; P(H),\; P(e_1 \mid H),\; P(e_2 \mid H)$
ii. $P(e_1, e_2),\; P(H),\; P(e_1, e_2 \mid H)$
iii. $P(e_1 \mid H),\; P(e_2 \mid H),\; P(H)$
b. Now suppose we assume $E_1 \perp E_2 \mid H$ (i.e., $E_1$ and $E_2$ are conditionally independent given $H$). Which of the above 3 sets are sufficient now?
a) Using Bayes' theorem, $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$, we get:
\[P(H \mid e_1, e_2) = \frac{P(e_1, e_2 \mid H) P(H)}{P(e_1, e_2)}\]Therefore only set (ii) is sufficient for the calculation: set (i) provides $P(e_1 \mid H)$ and $P(e_2 \mid H)$, but without a conditional independence assumption these do not determine the joint likelihood $P(e_1, e_2 \mid H)$; set (iii) has the same problem.
b) Under the assumption $E_1 \perp E_2 \mid H$, we have:
\[P(e_1,e_2 \mid H)=P(e_1 \mid H)\,P(e_2 \mid H).\]Therefore sets (i) and (ii) are sufficient.
Set (iii) also becomes sufficient: $P(H)$ gives the prior $P(H = k)$, the joint likelihood can be formed as the product $P(e_1, e_2 \mid H = k) = P(e_1 \mid H = k)\,P(e_2 \mid H = k)$, and the normalizer can be computed as \(P(e_1,e_2)=\sum_{k=1}^K P(e_1 \mid H=k)\,P(e_2 \mid H=k)\,P(H=k).\)
Answer: (i), (ii), and (iii) are all sufficient under $E_1 \perp E_2 \mid H$.
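The claim in (b) is easy to check numerically. A minimal sketch, using an arbitrary toy model (K = 3 hidden states, made-up probability tables): it computes the posterior from set (ii)'s ingredients and from set (iii)'s ingredients, and confirms they agree under conditional independence.

```python
import numpy as np

# Made-up tables for a toy model, chosen arbitrarily just for the check.
p_H = np.array([0.5, 0.3, 0.2])            # prior P(H = k)
p_e1_given_H = np.array([0.9, 0.4, 0.1])   # P(E1 = e1 | H = k)
p_e2_given_H = np.array([0.2, 0.7, 0.5])   # P(E2 = e2 | H = k)

# Under E1 ⊥ E2 | H the joint likelihood factorizes.
p_e1e2_given_H = p_e1_given_H * p_e2_given_H
p_e1e2 = np.sum(p_e1e2_given_H * p_H)      # normalizer P(e1, e2)

# Set (ii): Bayes' rule with the joint likelihood and normalizer given.
posterior_ii = p_e1e2_given_H * p_H / p_e1e2

# Set (iii): same ingredients, but the normalizer is derived by summing.
unnorm = p_e1_given_H * p_e2_given_H * p_H
posterior_iii = unnorm / unnorm.sum()

assert np.allclose(posterior_ii, posterior_iii)
print(posterior_iii)
```

The only extra work set (iii) requires is computing the normalizer by summing over $k$, which is exactly the formula above.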
2.2 [Pairwise independence does not imply mutual independence] We say that two random variables are pairwise independent if \(p(X_2 \mid X_1) = p(X_2)\) and hence \(p(X_2, X_1) = p(X_1)p(X_2 \mid X_1) = p(X_1)p(X_2).\) We say that n random variables are mutually independent if \(p(X_i \mid X_S) = p(X_i) \quad \forall S \subseteq \{1,\ldots,n\}\setminus\{i\}\) and hence \(p(X_{1:n})=\prod_{i=1}^n p(X_i).\) Show that pairwise independence between all pairs of variables does not necessarily imply mutual independence. It suffices to give a counterexample.
I did a little Google search, but I am still not really equipped to come up with counterexamples from scratch, so I asked a couple of LLMs to help with the ideation. The natural starting point is three events $X, Y, Z$, since that is the simplest setting in which pairwise and mutual independence can come apart (and we want as few events as possible here).
On this page: https://en.wikipedia.org/wiki/Independence_(probability_theory), scroll down to "Pairwise and mutual independence" and the diagram "Pairwise independent, but not mutually independent, events".
\[P(A) = \frac{9+6+4+1}{40} = \frac{1}{2}\]\(P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{10/40}{20/40} = \frac{1}{2} = P(A)\) (and so on, checking pairwise independence for the other pairs)
\(P(A \mid B \cap C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} = \frac{\frac{4}{40}}{\frac{4}{40} + \frac{1}{40}} = \frac{4}{5} \neq P(A)\) (and so on, showing that mutual independence fails)
(If you are wondering why I have restated almost the same thing as the article, with a few things added and a few things subtracted, it was just to explain the calculation to myself.)
But back to the crux of the problem: that Venn diagram seems so perfectly crafted. How did they come up with the numbers to show this so beautifully?
[I am leaving this with no solution of my own beyond the obvious XOR one shown in the solutions to the book: https://probml.github.io/pml-book/solns-public.pdf (page 4).]
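The XOR counterexample from the book's solutions is easy to verify by brute force. A minimal sketch with exact arithmetic: $X_1, X_2$ are independent fair bits and $X_3 = X_1 \oplus X_2$; every pair of variables is independent, but the triple does not factorize.

```python
import itertools
from fractions import Fraction

# X1, X2 independent fair bits; X3 = X1 XOR X2.
# Each of the four (x1, x2) outcomes has probability 1/4.
p = {}
for x1, x2 in itertools.product([0, 1], repeat=2):
    p[(x1, x2, x1 ^ x2)] = Fraction(1, 4)

def marg(i, v):
    """Marginal P(X_i = v)."""
    return sum(q for k, q in p.items() if k[i] == v)

def marg2(i, vi, j, vj):
    """Pairwise marginal P(X_i = vi, X_j = vj)."""
    return sum(q for k, q in p.items() if k[i] == vi and k[j] == vj)

# Pairwise independence: every pairwise marginal factorizes.
for i, j in [(0, 1), (0, 2), (1, 2)]:
    for vi, vj in itertools.product([0, 1], repeat=2):
        assert marg2(i, vi, j, vj) == marg(i, vi) * marg(j, vj)

# Mutual independence fails: e.g. P(X1=1, X2=1, X3=1) = 0,
# while the product of the marginals is 1/8.
assert p.get((1, 1, 1), Fraction(0)) == 0
assert marg(0, 1) * marg(1, 1) * marg(2, 1) == Fraction(1, 8)
```

Intuitively, any two of the three bits tell you nothing about each other, but any two together determine the third exactly.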
2.3 Let $X,Y,Z$ be discrete random variables. In the text we said $X \perp Y \mid Z$ iff \(p(x,y\mid z) = p(x\mid z)\,p(y\mid z)\) for all $x,y,z$ such that $p(z)>0$. Now prove the following alternative definition: $X \perp Y \mid Z$ iff there exist functions $g$ and $h$ such that \(p(x,y\mid z) = g(x,z)\,h(y,z)\) for all $x,y,z$ such that $p(z)>0$.
[need to review]
Proof:
($\Rightarrow$) If $X \perp Y \mid Z$ then by definition \(p(x,y\mid z)=p(x\mid z)\,p(y\mid z).\)
This is of the required product form with the choices
\[g(x,z)=p(x\mid z),\qquad h(y,z)=p(y\mid z).\]($\Leftarrow$) Now assume there exist functions $g(x,z)$ and $h(y,z)$ with \(p(x,y\mid z)=g(x,z)\,h(y,z)\)
for all $x,y$ and for every $z$ with $p(z)>0$.
Fix such a $z$. Sum the equation over $y$ to get the marginal $p(x\mid z)$: \(p(x\mid z)=\sum_y p(x,y\mid z)=\sum_y g(x,z)\,h(y,z)=g(x,z)\sum_y h(y,z).\)
Define \(c(z):=\sum_y h(y,z).\)
Because $p(z)>0$, the conditional distribution $p(\cdot,\cdot\mid z)$ is a valid probability distribution over $(x,y)$. Summing the factorization over both $x$ and $y$ gives \(1=\Big(\sum_x g(x,z)\Big)\Big(\sum_y h(y,z)\Big),\) so $c(z)\neq 0$. Thus we can write \(g(x,z)=\frac{p(x\mid z)}{c(z)}.\)
Substitute this back into the factorization: \(p(x,y\mid z)=\frac{p(x\mid z)}{c(z)}\,h(y,z) = p(x\mid z)\,\frac{h(y,z)}{c(z)}.\)
But summing this equation over $x$ gives \(p(y\mid z)=\sum_x p(x,y\mid z)=\frac{h(y,z)}{c(z)}\sum_x p(x\mid z)=\frac{h(y,z)}{c(z)},\) so the second factor is exactly $p(y\mid z)$. Therefore \(p(x,y\mid z)=p(x\mid z)\,p(y\mid z)\)
for all $x,y$ and for every $z$ with $p(z)>0$, which is exactly $X\perp Y\mid Z$.
This establishes the equivalence.
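The ($\Leftarrow$) direction can also be sanity-checked numerically. A minimal sketch, using arbitrary random nonnegative factors $g(x,z)$ and $h(y,z)$ (purely illustrative): normalize their product into a conditional joint $p(x,y\mid z)$ and verify that it equals the product of its conditional marginals.

```python
import numpy as np

# Arbitrary nonnegative factors g(x, z) and h(y, z); values are random
# and carry no meaning beyond exercising the factorization criterion.
rng = np.random.default_rng(42)
X, Y, Z = 4, 3, 2
g = rng.random((X, Z))
h = rng.random((Y, Z))

# Build p(x, y | z) proportional to g(x,z) h(y,z), normalized over (x, y)
# separately for each z.
joint = np.einsum('xz,yz->xyz', g, h)
joint /= joint.sum(axis=(0, 1), keepdims=True)

# Conditional marginals p(x | z) and p(y | z).
p_x_given_z = joint.sum(axis=1)   # shape (X, Z)
p_y_given_z = joint.sum(axis=0)   # shape (Y, Z)

# Per the proof, the product of the conditionals recovers the joint,
# i.e. X ⊥ Y | Z holds.
product = np.einsum('xz,yz->xyz', p_x_given_z, p_y_given_z)
assert np.allclose(joint, product)
```

This mirrors the proof exactly: the normalizer for each $z$ factors into $\big(\sum_x g(x,z)\big)\big(\sum_y h(y,z)\big)$, which is why the marginals multiply back to the joint.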