All knowledge degenerates into probability.

What is a random variable?

In a raffle with 20 tickets, 6 tickets are drawn for prizes. The first prize winner gets $\$20$, each of the two second prize winners gets $\$10$, and each of the three third prize winners gets $\$5$.

What is the sample space for the possible outcomes for a given ticket? The sample space depends on what outcomes are of interest. Thus it could be S={you win money, you do not win money} or S={first prize, second prize, third prize, no prize}.

It is likely that a ticket holder is interested not just in whether or not they will win but in how much money they win. Thus, a relevant sample space is S={20,10,5,0}.

A diagram labeled “S” showing four prize categories and their corresponding point values. The categories are “First prize” with a value of 20, “Second prize” with 10, “Third prize” with 5, and “No prize” with 0. Each label inside the box is connected by a line to its value on the right.

The sample spaces S={first prize, second prize, third prize, no prize} and S={20,10,5,0} refer to the same outcomes in different ways. In the second sample space, prize status is indicated by the amount of money associated with each prize. Alternatively, we could define a variable X such that
\[ X = \begin{cases} 20 & \text{if first prize} \\ 10 & \text{if second prize} \\ 5 & \text{if third prize}\\ 0 & \text{if no prize}\\ \end{cases} \]
A variable like this, which assigns a real number to each outcome in a sample space, is called a random variable. Using random variables makes it possible to speak mathematically about the experimental outcomes. In the case of the raffle, it is possible to describe how much money purchasers of the raffle tickets will win on average, how much that is likely to vary, or to show how much someone might expect to win if they bought 3 raffle tickets or if the size of the prizes was doubled.
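As a rough illustration of the mapping idea, the sketch below writes X as a Python dictionary and applies it to all 20 tickets to find the average winnings per ticket. The names `X`, `tickets`, and `winnings` are chosen here for illustration, and the ticket counts (1 first, 2 second, 3 third, 14 with no prize) are read off the raffle description above.

```python
from fractions import Fraction

# The random variable X: assigns a real number to each outcome.
X = {
    "first prize": 20,
    "second prize": 10,
    "third prize": 5,
    "no prize": 0,
}

# Outcomes for the 20 tickets: 1 first, 2 second, 3 third, 14 with no prize.
tickets = (
    ["first prize"] * 1
    + ["second prize"] * 2
    + ["third prize"] * 3
    + ["no prize"] * 14
)

# Apply X to every ticket, then average the resulting dollar amounts.
winnings = [X[outcome] for outcome in tickets]
average_winnings = Fraction(sum(winnings), len(tickets))

print(average_winnings)         # 11/4
print(float(average_winnings))  # 2.75
```

Because the outcomes are now numbers, ordinary arithmetic applies: the total prize money is $\$55$, so the average over 20 tickets is $\$2.75$.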

NOTATION
A random variable is usually denoted with a capital letter from near the end of the alphabet. X, Y, and Z are common.

Observed values of a random variable are denoted by the corresponding lower-case letters. Thus $x$, or $x_1, x_2, \ldots, x_n$, denotes specific values that the random variable X can take on.

The notation P(X = x) means 'the probability that random variable X assumes value x'.
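For instance, applying this notation to the raffle above, and assuming each of the 20 tickets is equally likely to be drawn for any given prize, the probabilities for a single ticket would be:

\[ P(X = 20) = \frac{1}{20}, \quad P(X = 10) = \frac{2}{20}, \quad P(X = 5) = \frac{3}{20}, \quad P(X = 0) = \frac{14}{20} \]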

$\rightarrow$ A random variable maps each outcome in a sample space to a real number.

Discrete and Continuous Random Variables

Random variables can be either discrete or continuous. Discrete random variables often describe things that can be counted. Continuous random variables often describe things that can be measured.

$\rightarrow$ A discrete random variable has countably many (possibly infinitely many) possible values.

$\rightarrow$ A continuous random variable has uncountably many (always infinitely many) possible values.

Example: Which of the following describe discrete random variables? Which describe continuous random variables?
W: The outcome of a die roll.
X: The number of 6's in 10 die rolls.
Y: The weight of a randomly chosen die.
Z: The length of time a rolled die tumbles.

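Answer: W and X are discrete random variables, since each takes values in a finite set that can be listed. Y and Z are continuous, since a weight or a length of time can take any value in an interval.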



The Distribution of a Random Variable

A random variable's distribution indicates the values that the variable can take on and the probabilities of those values. For instance, consider the variable: the outcome of a die roll. The distribution of this variable would indicate that the possible values are the numbers 1 through 6 and that these outcomes are all equally likely, each with probability 1/6. This information can be presented in a graph, table, or formula.

In the case of a discrete random variable, such as the outcome of a die roll, a table displays the distribution by listing the possible outcomes with their probabilities. A graph shows bars at each of the possible outcomes with height corresponding to the probability of that outcome.

A probability distribution graph showing six vertical orange bars of equal height at positions 1 through 6, each with a probability of 1/6. The x-axis ranges from 0 to 7, representing possible outcomes of rolling a fair six-sided die.

A table showing the probability distribution for rolling a fair six-sided die. The top row lists outcomes 1 through 6, and the bottom row lists the probability for each outcome as 1/6.

The table and the graph both show the distribution of the outcome of a die roll.
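The same table can be sketched in code. The snippet below is a minimal illustration, not a standard API: `die` and the helper `P` are names invented here, and the die is assumed to be fair.

```python
from fractions import Fraction

# Distribution of a fair die roll as a table: outcome -> probability.
die = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

def P(x):
    """Return P(X = x); values that are not possible outcomes get probability 0."""
    return die.get(x, Fraction(0))

# Print the table one row per outcome.
for outcome, prob in die.items():
    print(outcome, prob)

print(P(3))  # 1/6
print(P(7))  # 0
```

Using `Fraction` rather than floating-point numbers keeps the probabilities exact, so they can be compared and summed without rounding error.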


$\rightarrow$ The distribution of a random variable describes the possible outcomes and their associated probabilities.

Since a continuous random variable has uncountably many possible values, its distribution cannot be displayed in a table. It is usually described with a formula or shown in a graph.

Example: A machine that fills 15 oz chip bags has been calibrated to slightly overfill. The distribution of the weight, W, of the chips in the bag (in ounces) might be described by a formula such as $\small{f(w) = 8.5-\frac{w}{2}}$ for $\small{15 \leq w \leq 17}$. The graph of this distribution is shown.

A simple line graph with orange segments connecting data points. The x-axis ranges from 14 to 18. The line remains flat from 14 to 15, rises steeply between 15 and 16, then slopes downward to reach zero at 17, and stays flat through 18.

Notice that in each of the above examples, the probability (or, in the continuous case, the value of the formula) is 0 everywhere except at the possible values. The probabilities are all non-negative, and the total probability is 1.
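For the chip-bag example, these properties can be checked numerically. The sketch below assumes the formula $f(w) = 8.5 - \frac{w}{2}$ on $15 \leq w \leq 17$ and treats the total probability as the area under the graph. The helper `midpoint_integral` is a hypothetical name for a simple midpoint-rule approximation, not a library function.

```python
def f(w):
    """Chip-weight formula from the example: 8.5 - w/2 on [15, 17], 0 elsewhere."""
    return 8.5 - w / 2 if 15 <= w <= 17 else 0.0

def midpoint_integral(g, a, b, n=10_000):
    """Approximate the area under g between a and b with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# Non-negative everywhere on a grid of weights from 14 to 18 ounces.
grid = [14 + k * 0.01 for k in range(401)]
non_negative = all(f(w) >= 0 for w in grid)

# Total area under the graph, including the regions where f is 0.
total = midpoint_integral(f, 14, 18)

print(non_negative)
print(round(total, 6))
```

The area can also be found by hand: the graph over $15 \leq w \leq 17$ is a triangle with base 2 and height $f(15) = 1$, so its area is $\frac{1}{2} \cdot 2 \cdot 1 = 1$.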