Introduction.
Last time, we studied how to handle univariate and multivariate data.
In this article, we will discuss events, probabilities, and random variables. The descriptions are mathematical and abstract, but what they express is really just common sense, so it is fine to read them without thinking too hard.
Event.
As an example, consider a situation where you roll a die once. The possible outcomes are 1, 2, 3, 4, 5, or 6. At this time,
- Each possible outcome is called a sample point.
- A set of sample points $\Omega = \{ 1, 2, 3, 4, 5, 6 \}$ is called a sample space.
- A subset of the sample space is called an event.
\begin{align*} \newcommand{\mat}[1]{\begin{pmatrix} #1 \end{pmatrix}} \newcommand{\f}[2]{\frac{#1}{#2}} \newcommand{\pd}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\d}[2]{\frac{{\rm d}#1}{{\rm d}#2}} \newcommand{\T}{\mathsf{T}} \newcommand{\(}{\left(} \newcommand{\)}{\right)} \newcommand{\{}{\left\{} \newcommand{\}}{\right\}} \newcommand{\[}{\left[} \newcommand{\]}{\right]} \newcommand{\dis}{\displaystyle} \newcommand{\eq}[1]{{\rm Eq}(\ref{#1})} \newcommand{\n}{\notag\\} \newcommand{\t}{\ \ \ \ } \newcommand{\tt}{\t\t\t\t} \newcommand{\argmax}{\mathop{\rm arg\, max}\limits} \newcommand{\argmin}{\mathop{\rm arg\, min}\limits} \def\l<#1>{\left\langle #1 \right\rangle} \def\us#1_#2{\underset{#2}{#1}} \def\os#1^#2{\overset{#2}{#1}} \newcommand{\case}[1]{\{ \begin{array}{ll} #1 \end{array} \right.} \newcommand{\s}[1]{{\scriptstyle #1}} \definecolor{myblack}{rgb}{0.27,0.27,0.27} \definecolor{myred}{rgb}{0.78,0.24,0.18} \definecolor{myblue}{rgb}{0.0,0.443,0.737} \definecolor{myyellow}{rgb}{1.0,0.82,0.165} \definecolor{mygreen}{rgb}{0.24,0.47,0.44} \newcommand{\c}[2]{\textcolor{#1}{#2}} \newcommand{\ub}[2]{\underbrace{#1}_{#2}} \end{align*}
Probabilities and random variables.
The probability that the event $A$ occurs is written $P(A)$. $P(A)$ must satisfy the following properties.
\begin{align*} 0 \le P(A) \le 1, \t P(\Omega) = 1, \t P(A \cup B) = P(A) + P(B) \ \ \text{when } A \cap B = \emptyset \end{align*}
A random variable is a variable that takes various values, each with a fixed probability. For example, the outcome of a die roll is a random variable. If the die is fair, every outcome occurs with the same probability, $1/6$. Writing the outcome of the roll as $X$, this can be expressed as follows.
\begin{align*} P(X = x) = \f{1}{6}, \t x=1, 2, \dots, 6 \end{align*}
A random variable that takes discrete values, such as the outcome of a die roll, is called a discrete random variable.
On the other hand, a random variable whose value varies continuously, such as height or weight, is called a continuous random variable.
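As a quick illustration (a minimal sketch; NumPy, the random seed, and the sample size of 100,000 rolls are arbitrary choices), the following simulates a fair die and checks that the empirical frequency of each face approaches $1/6 \approx 0.167$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulate n rolls of a fair die: the outcome X is a discrete random variable.
rolls = rng.integers(1, 7, size=n)

# The empirical frequency of each face should approach P(X = x) = 1/6.
for x in range(1, 7):
    freq = np.mean(rolls == x)
    print(f"P(X = {x}) ~ {freq:.3f}  (theory: {1/6:.3f})")
```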
Conditional probability and Bayes’ theorem.
The probability that the event $A$ occurs under the condition that the event $B$ has occurred is called the conditional probability of $A$ given $B$, and is expressed as follows.
\begin{align*} P(A|B) = \f{P(A \cap B)}{P(B)} \end{align*}
Here, when the conditional probability of $A$ given $B$ is not affected by $B$, that is, when the following equation holds, the events $A, B$ are said to be independent.
\begin{align*} P(A|B) = \f{P(A \cap B)}{P(B)} = P(A) \end{align*}
By rearranging the above equation, the independence of $A$ and $B$ can also be expressed as follows.
\begin{align*} P(A \cap B) = P(A)P(B) \end{align*}
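As a concrete example (the events here are chosen purely for illustration), roll a fair die once and let $A$ be "the outcome is even" and $B$ be "the outcome is at most 2". Then $P(A) = 1/2$, $P(B) = 1/3$, and $P(A \cap B) = P(\{2\}) = 1/6 = P(A)P(B)$, so $A$ and $B$ are independent. A rough empirical check in Python:

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)

A = rolls % 2 == 0   # event A: the outcome is even
B = rolls <= 2       # event B: the outcome is at most 2

p_a = np.mean(A)
p_b = np.mean(B)
p_ab = np.mean(A & B)

# If A and B are independent, P(A and B) should be close to P(A) P(B).
print(f"P(A)P(B) = {p_a * p_b:.3f}, P(A and B) = {p_ab:.3f}")
```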
From the definition of conditional probability above, multiplying both sides by $P(B)$ gives $P(A \cap B) = P(A|B)P(B)$; swapping the roles of $A$ and $B$ likewise gives $P(A \cap B) = P(B|A)P(A)$. Equating the two and dividing by $P(B)$ yields Bayes' theorem.
\begin{align*} P(A|B) = \f{P(B|A)P(A)}{P(B)} \end{align*}
Bayes' theorem also extends to the following form when the events $A_1, A_2, \dots, A_n$ partition the sample space.
\begin{align*} P(A_i|B) &= \f{P(B|A_i)P(A_i)}{P(B)} \n &= \f{P(B|A_i)P(A_i)}{\sum_{j=1}^{n} P(B|A_j)P(A_j)} \end{align*}
The transformation in the last line uses the law of total probability, $P(B) = \sum_{j=1}^{n} P(B|A_j)P(A_j)$.
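As a sanity check of the extended form (a hypothetical example; the partition and event are chosen only for illustration), take one roll of a fair die with $A_1$ = "the outcome is at most 3", $A_2$ = "the outcome is at least 4", and $B$ = "the outcome is even". Direct counting gives $P(A_1|B) = 1/3$, and the formula reproduces this value:

```python
from fractions import Fraction as F

# Partition of the sample space: A1 = {1,2,3}, A2 = {4,5,6}; event B = {2,4,6}.
p_a   = {"A1": F(1, 2), "A2": F(1, 2)}   # P(A1), P(A2)
p_b_a = {"A1": F(1, 3), "A2": F(2, 3)}   # P(B|A1), P(B|A2)

# Law of total probability: P(B) = sum_j P(B|Aj) P(Aj)
p_b = sum(p_b_a[a] * p_a[a] for a in p_a)

# Extended Bayes' theorem: P(A1|B) = P(B|A1) P(A1) / P(B)
p_a1_b = p_b_a["A1"] * p_a["A1"] / p_b
print(p_a1_b)   # 1/3, matching direct counting: B = {2,4,6}, A1 and B = {2}
```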
When we condition on multiple random variables, Bayes' theorem can be written as follows.
\begin{align*} P(X|Y, Z) = \f{P(Y|X, Z)P(X|Z)}{P(Y|Z)} \end{align*}
The proof is as follows.
Proof.
Given the joint distribution $P(X, Y, Z)$, from the definition of conditional probability,
\begin{align*} P(X, Y, Z) &= P(X|Y, Z)P(Y, Z) \n &=P(X|Y, Z)P(Y|Z)P(Z) \end{align*}
On the other hand, \begin{align*} P(X, Y, Z) &= P(Y|X, Z)P(X, Z) \n &=P(Y|X, Z)P(X|Z)P(Z) \end{align*}
Comparing the two,
\begin{align*} P(X|Y, Z)P(Y|Z)P(Z) &= P(Y|X, Z)P(X|Z)P(Z) \n \therefore P(X | Y, Z) &= \f{P(Y|X, Z) P(X|Z)}{P(Y|Z)} \end{align*}
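The identity can also be verified numerically. The sketch below builds an arbitrary joint distribution $P(X, Y, Z)$ over binary variables (the table values are random and purely illustrative) and compares both sides:

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary joint distribution P(X, Y, Z) over binary X, Y, Z.
p_xyz = rng.random((2, 2, 2))
p_xyz /= p_xyz.sum()

# Marginals needed for both sides of the identity.
p_yz = p_xyz.sum(axis=0)        # P(Y, Z)
p_xz = p_xyz.sum(axis=1)        # P(X, Z)
p_z  = p_xyz.sum(axis=(0, 1))   # P(Z)

x, y, z = 1, 0, 1
lhs = p_xyz[x, y, z] / p_yz[y, z]      # P(X|Y,Z)
p_y_xz = p_xyz[x, y, z] / p_xz[x, z]   # P(Y|X,Z)
p_x_z = p_xz[x, z] / p_z[z]            # P(X|Z)
p_y_z = p_yz[y, z] / p_z[z]            # P(Y|Z)
rhs = p_y_xz * p_x_z / p_y_z

print(np.isclose(lhs, rhs))   # True: P(X|Y,Z) = P(Y|X,Z) P(X|Z) / P(Y|Z)
```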
Expected value and variance of the probability distribution.
For random variables, the mean is usually referred to as the expected value rather than the average. The definitions of the expected value and variance differ slightly between discrete and continuous random variables.
The expected value $E[X]$ and the variance $V[X]$ of a random variable $X$ are defined as follows, where $\mu$ denotes the expected value $E[X]$ and, in the continuous case, $f(x)$ denotes the probability density function of $X$.
Discrete type.
\begin{align*} E[X] &= \sum_{x} x P(X = x) \n V[X] &= \sum_{x} (x - \mu)^2 P(X = x) \end{align*}
Continuous type.
\begin{align*} E[X] &= \int_{-\infty}^{\infty} x f(x) dx \n V[X] &= \int_{-\infty}^{\infty} (x - \mu)^2 f(x) dx \end{align*}
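For instance, for the fair die above with $P(X = x) = 1/6$, the discrete definitions give
\begin{align*} E[X] &= \sum_{x=1}^{6} x \cdot \f{1}{6} = \f{21}{6} = \f{7}{2} \n V[X] &= \sum_{x=1}^{6} \left( x - \f{7}{2} \right)^2 \cdot \f{1}{6} = \f{35}{12} \approx 2.92 \end{align*}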
There is also an important formula for the variance:
\begin{align*} V[X] = E[X^2] - E[X]^2 \end{align*}
This can generally be proven as follows:
Proof.
Hereafter, $E[X] = \mu$.
\begin{align*} V[X] &= E[(X - \mu)^2] \n &= E[X^2 - 2\mu X + \mu^2] \n &= E[X^2] - 2 \mu E[X] + \mu^2 \n &= E[X^2] - 2 \mu^2 + \mu^2 \n &= E[X^2] - E[X]^2 \end{align*}
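As a numerical sanity check (a minimal sketch using simulated rolls of a fair die; the seed and sample size are arbitrary), both expressions for the variance give nearly the same value, close to the theoretical $35/12 \approx 2.92$:

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000).astype(float)

mu = rolls.mean()                       # E[X]
v_def = np.mean((rolls - mu) ** 2)      # V[X] = E[(X - mu)^2]
v_alt = np.mean(rolls ** 2) - mu ** 2   # V[X] = E[X^2] - E[X]^2

print(v_def, v_alt)   # both close to 35/12 for a fair die
```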
Next time: ▼ Transformation of random variables and moment-generating functions.