About events, probabilities and random variables.




Last time, we studied how to handle univariate and multivariate data.

Previous article: 【Multivariate Data】 Scatter Plots and Correlation Coefficients
It covers scatter plots and scatter plot matrices as a basic way to handle multivariate data, and correlation coefficients, rank correlation coefficients, and variance-covariance matrices as ways of summarizing it.

In this article, we will discuss events, probabilities, and random variables. The descriptions are mathematical and abstract, but what they say is largely common sense, so there is no need to overthink them on a first reading.


As an example, consider rolling a die once. The possible outcomes are 1, 2, 3, 4, 5, or 6. In this setting,

  • A possible result is called a sample point.
  • The set of all sample points, $\Omega = \{ 1, 2, 3, 4, 5, 6 \}$, is called the sample space.
  • A subset of the sample space is called an event.
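The three terms above map naturally onto sets. The following is a minimal Python sketch for the single die roll; the names `omega`, `even`, and `is_event` are illustrative choices and do not come from the original text.

```python
# Sample space for one roll of a six-sided die.
omega = frozenset({1, 2, 3, 4, 5, 6})

# An event is any subset of the sample space, e.g. "the roll is even".
even = frozenset({2, 4, 6})


def is_event(candidate, sample_space):
    """Return True if `candidate` is a subset of `sample_space`."""
    return candidate <= sample_space


print(is_event(even, omega))            # {2, 4, 6} is an event
print(is_event(frozenset({7}), omega))  # 7 is not a sample point
print(2 ** len(omega))                  # number of distinct events (all subsets)
```

Since every subset counts as an event, the empty set and $\Omega$ itself are events too.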

\begin{align*} \newcommand{\mat}[1]{\begin{pmatrix} #1 \end{pmatrix}} \newcommand{\f}[2]{\frac{#1}{#2}} \newcommand{\pd}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\d}[2]{\frac{{\rm d}#1}{{\rm d}#2}} \newcommand{\T}{\mathsf{T}} \newcommand{\(}{\left(} \newcommand{\)}{\right)} \newcommand{\{}{\left\{} \newcommand{\}}{\right\}} \newcommand{\[}{\left[} \newcommand{\]}{\right]} \newcommand{\dis}{\displaystyle} \newcommand{\eq}[1]{{\rm Eq}(\ref{#1})} \newcommand{\n}{\notag\\} \newcommand{\t}{\ \ \ \ } \newcommand{\tt}{\t\t\t\t} \newcommand{\argmax}{\mathop{\rm arg\, max}\limits} \newcommand{\argmin}{\mathop{\rm arg\, min}\limits} \def\l<#1>{\left\langle #1 \right\rangle} \def\us#1_#2{\underset{#2}{#1}} \def\os#1^#2{\overset{#2}{#1}} \newcommand{\case}[1]{\{ \begin{array}{ll} #1 \end{array} \right.} \newcommand{\s}[1]{{\scriptstyle #1}} \definecolor{myblack}{rgb}{0.27,0.27,0.27} \definecolor{myred}{rgb}{0.78,0.24,0.18} \definecolor{myblue}{rgb}{0.0,0.443,0.737} \definecolor{myyellow}{rgb}{1.0,0.82,0.165} \definecolor{mygreen}{rgb}{0.24,0.47,0.44} \newcommand{\c}[2]{\textcolor{#1}{#2}} \newcommand{\ub}[2]{\underbrace{#1}_{#2}} \end{align*}

Probabilities and random variables.

The probability that the event $A$ occurs is written $P(A)$. $P(A)$ must satisfy the following properties.

Definition of probability.
  1. $0 \leq P(A) \leq 1.$
  2. If $A$ is the certain event, that is $A = \Omega$, then $P(A) = 1.$
  3. If the events $A$ and $B$ are mutually exclusive (they cannot occur at the same time, $A \cap B = \emptyset$), then $P(A\cup B) = P(A) + P(B). $
    *More generally, if $A_1, A_2, \dots$ is an infinite sequence of mutually exclusive events (no two of which can occur at the same time), then \begin{align*} P\(\bigcup^\infty_{i=1} A_i\) = \sum^\infty_{i=1}P(A_i). \end{align*}
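As a quick sanity check, the three properties can be verified for a concrete probability assignment. The sketch below assumes a fair die (each sample point has probability $1/6$); the helper `prob` and the events `a` and `b` are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction

omega = frozenset({1, 2, 3, 4, 5, 6})


def prob(event):
    """P(event) for a fair die: each sample point has probability 1/6 (assumption)."""
    return Fraction(len(event & omega), len(omega))


a = frozenset({1, 2})  # "the roll is 1 or 2"
b = frozenset({5, 6})  # "the roll is 5 or 6", disjoint from a

print(0 <= prob(a) <= 1)                 # property 1
print(prob(omega) == 1)                  # property 2
print(prob(a | b) == prob(a) + prob(b))  # property 3 (a and b are disjoint)
```

Exact fractions are used instead of floats so that the equalities hold without rounding error.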

A random variable is a variable whose value is determined by the outcome of a random experiment, so that each value (or range of values) occurs with a certain probability. For example, the outcome of a die roll is a random variable. If the die is fair, every face is equally likely, with probability $1/6$. Writing $X$ for the outcome of the roll, this can be expressed as follows.

\begin{align*} P(X = x) = \f{1}{6}, \t x=1, 2, \dots, 6 \end{align*}

A random variable that takes only separate, countable values, such as the outcome of a die roll, is called a discrete random variable.

On the other hand, a random variable whose value varies continuously, such as height or weight, is called a continuous random variable.

Conditional probability and Bayes’ theorem.

The probability that the event $A$ occurs, given that the event $B$ has occurred, is called the conditional probability of $A$ given $B$. For $P(B) > 0$, it is defined as follows.

\begin{align*} P(A|B) = \f{P(A \cap B)}{P(B)} \end{align*}

When the conditional probability of $A$ given $B$ does not depend on $B$, that is, when the following equation holds, the events $A$ and $B$ are said to be independent.

\begin{align*} P(A|B) = \f{P(A \cap B)}{P(B)} = P(A) \end{align*}

Multiplying both sides by $P(B)$ shows that the independence of $A$ and $B$ can equivalently be written as follows.

\begin{align*} P(A \cap B) = P(A)P(B) \end{align*}
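The definitions of conditional probability and independence can be tried out on the die example. This is a minimal sketch assuming a fair die; `prob`, `cond_prob`, and the three events are names made up for the illustration.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))


def prob(event):
    """P(event) for a fair die (assumption)."""
    return Fraction(len(event), len(omega))


def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); requires P(b) > 0."""
    return prob(a & b) / prob(b)


even = frozenset({2, 4, 6})
at_most_4 = frozenset({1, 2, 3, 4})
at_most_3 = frozenset({1, 2, 3})

# Knowing "the roll is at most 4" does not change the chance of an even roll.
print(cond_prob(even, at_most_4))                              # 1/2
print(prob(even & at_most_4) == prob(even) * prob(at_most_4))  # True

# Knowing "the roll is at most 3" does change it, so these are not independent.
print(cond_prob(even, at_most_3))                              # 1/3
print(prob(even & at_most_3) == prob(even) * prob(at_most_3))  # False
```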

Multiplying the definition of conditional probability by its denominator gives $P(A \cap B) = P(A|B)P(B)$; swapping the roles of $A$ and $B$ gives $P(A \cap B) = P(B|A)P(A)$. Equating the two right-hand sides and dividing by $P(B)$ yields Bayes’ theorem.

Bayes’ theorem.

\begin{align*} P(A|B) = \f{P(B|A)P(A)}{P(B)}. \end{align*}

Bayes’ theorem can also be extended to the following form.

Extension of Bayes’ theorem.

If the sample space $\Omega$ is the union of mutually exclusive events $A_1, A_2, \dots, A_n$, then Bayes’ theorem can be written in the following form.

\begin{align*} \Omega = \bigcup^n_{i=1} A_i, \t A_i \cap A_j = \emptyset \t (i \neq j) \end{align*}

\begin{align*} P(A_i|B) = \f{P(B|A_i)P(A_i)}{P(B)} = \f{P(B|A_i)P(A_i)}{\sum^n_{j=1}P(B|A_j)P(A_j)} \end{align*}

The last equality uses the law of total probability.

Law of total probability.

If $A_1, A_2, \dots, A_n$ are mutually exclusive with $P(A_i) > 0 \ (i=1, \dots, n)$, and the event $C$ satisfies

\begin{align*} C \subset \bigcup^n_{i=1} A_i, \end{align*}

then the probability of the event $C$ is:

\begin{align*} P(C) &= \sum^n_{i=1} P(C \cap A_i) \n &= \sum^n_{i=1}P(C|A_i)P(A_i). \end{align*}
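To see the extended form of Bayes’ theorem and the total-probability sum working together, here is a small numerical sketch. The three priors and likelihoods are hypothetical numbers chosen only for illustration (think of three machines $A_1, A_2, A_3$ producing an item, and $B$ being "the item is defective").

```python
from fractions import Fraction as F

# Hypothetical inputs: P(A_i) and P(B | A_i) for three mutually exclusive causes.
prior = [F(1, 2), F(3, 10), F(1, 5)]
likelihood = [F(1, 100), F(2, 100), F(5, 100)]

# Total probability: P(B) = sum_i P(B | A_i) P(A_i)
p_b = sum(l * p for l, p in zip(likelihood, prior))

# Bayes' theorem: P(A_i | B) = P(B | A_i) P(A_i) / P(B)
posterior = [l * p / p_b for l, p in zip(likelihood, prior)]

print(p_b)             # 21/1000
print(posterior)       # posteriors 5/21, 2/7, 10/21
print(sum(posterior))  # 1
```

Although the third cause has the smallest prior, it ends up with the largest posterior because its likelihood is the highest.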

When conditioning on more than one random variable, Bayes’ theorem can be written as follows.

Bayes’ theorem with multiple variables.

The following equation holds for the random variables $X$, $Y$, and $Z$.

\begin{align*} P(X | Y, Z) = \f{P(Y|X, Z) P(X|Z)}{P(Y|Z)} \end{align*}

The proof is as follows.


Starting from the joint distribution $P(X, Y, Z)$, the definition of conditional probability gives

\begin{align*} P(X, Y, Z) &= P(X|Y, Z)P(Y, Z) \n &=P(X|Y, Z)P(Y|Z)P(Z) \end{align*}

On the other hand, \begin{align*} P(X, Y, Z) &= P(Y|X, Z)P(X, Z) \n &=P(Y|X, Z)P(X|Z)P(Z) \end{align*}

Comparing the two expressions and dividing both sides by $P(Y|Z)P(Z)$ (assumed to be nonzero) gives

\begin{align*} P(X|Y, Z)P(Y|Z)P(Z) &= P(Y|X, Z)P(X|Z)P(Z) \n \therefore P(X | Y, Z) &= \f{P(Y|X, Z) P(X|Z)}{P(Y|Z)} \end{align*}
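The identity can also be checked numerically on a small joint distribution. The sketch below uses a hypothetical joint table over three binary variables (weights 1 to 8, normalised to sum to 1); the helper names `p` and `check` are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution P(X, Y, Z) over binary variables:
# the eight states get weights 1..8, normalised by their sum (36).
states = list(product([0, 1], repeat=3))
joint = {s: Fraction(w, 36) for s, w in zip(states, range(1, 9))}


def p(x=None, y=None, z=None):
    """Probability of the given values; an argument left as None is summed out."""
    return sum(
        pr for (xi, yi, zi), pr in joint.items()
        if (x is None or xi == x)
        and (y is None or yi == y)
        and (z is None or zi == z)
    )


def check(x, y, z):
    """Return True if P(X|Y,Z) == P(Y|X,Z) P(X|Z) / P(Y|Z) for these values."""
    lhs = p(x, y, z) / p(y=y, z=z)           # P(X | Y, Z)
    p_y_given_xz = p(x, y, z) / p(x=x, z=z)  # P(Y | X, Z)
    p_x_given_z = p(x=x, z=z) / p(z=z)       # P(X | Z)
    p_y_given_z = p(y=y, z=z) / p(z=z)       # P(Y | Z)
    return lhs == p_y_given_xz * p_x_given_z / p_y_given_z


print(all(check(*s) for s in states))  # True
```

All weights are strictly positive, so none of the conditional probabilities divides by zero.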

Expected value and variance of the probability distribution.

For random variables, the mean is usually called the expected value rather than the average. Expectation and variance are defined with a sum for discrete random variables and with an integral for continuous ones.

The expected value $E[X]$ and the variance $V[X]$ of a random variable $X$ are defined as follows, where $\mu = E[X]$ and, in the continuous case, $f(x)$ denotes the probability density function of $X$.

Discrete type.

\begin{align*} E[X] &= \sum_{x} x P(X = x) \n V[X] &= \sum_{x} (x - \mu)^2 P(X = x) \end{align*}
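Applied to the fair die from earlier, the discrete definitions can be evaluated directly. This is a minimal sketch; the dictionary `pmf` and the variable names are illustrative.

```python
from fractions import Fraction

# P(X = x) = 1/6 for x = 1, ..., 6 (fair die)
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E[X] = sum_x x P(X = x)
mu = sum(x * p for x, p in pmf.items())

# V[X] = sum_x (x - mu)^2 P(X = x)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

print(mu)   # 7/2
print(var)  # 35/12
```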

Continuous type.

\begin{align*} E[X] &= \int_{-\infty}^{\infty} x f(x) dx \n V[X] &= \int_{-\infty}^{\infty} (x - \mu)^2 f(x) dx \end{align*}
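For the continuous case, the integrals can be approximated numerically. The sketch below assumes, purely as an example, that $X$ is uniformly distributed on $[0, 1]$, so the density is 1 on that interval and 0 elsewhere; the midpoint-rule helper `integrate` is a hand-written stand-in, not a library function. Because the density vanishes outside $[0, 1]$, integrating over that interval is enough.

```python
def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def f(x):
    """Density of the uniform distribution on [0, 1] (example assumption)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


# E[X] = integral of x f(x) dx
mu = integrate(lambda x: x * f(x), 0.0, 1.0)

# V[X] = integral of (x - mu)^2 f(x) dx
var = integrate(lambda x: (x - mu) ** 2 * f(x), 0.0, 1.0)

print(mu, var)  # close to 1/2 and 1/12
```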

There is also an important formula for the variance:

Variance formula.

\begin{align*} V[X] = E[X^2] - E[X]^2 \end{align*}

This can be proven as follows, using the linearity of expectation.


Let $\mu = E[X]$.

\begin{align*} V[X] &= E[(X - \mu)^2] \n &= E[X^2 - 2\mu X + \mu^2] \n &= E[X^2] -2 \mu E[X] + \mu^2 \n &= E[X^2] -2 \mu^2 + \mu^2 \n &= E[X^2] - E[X]^2. \end{align*}
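The variance formula can be confirmed on any concrete distribution by computing the variance both ways. The probability mass function below is a hypothetical, deliberately non-uniform example.

```python
from fractions import Fraction as F

# Hypothetical pmf on the values 0, 1, 2.
pmf = {0: F(1, 2), 1: F(1, 3), 2: F(1, 6)}


def expect(g):
    """E[g(X)] = sum_x g(x) P(X = x)."""
    return sum(g(x) * p for x, p in pmf.items())


mu = expect(lambda x: x)

var_by_definition = expect(lambda x: (x - mu) ** 2)  # E[(X - mu)^2]
var_by_formula = expect(lambda x: x ** 2) - mu ** 2  # E[X^2] - E[X]^2

print(mu)                                   # 2/3
print(var_by_definition, var_by_formula)    # 5/9 5/9
```

In practice the second form is often more convenient, since $E[X]$ and $E[X^2]$ can be accumulated in a single pass over the values.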

Next time: Transformation of random variables and the moment-generating function.