All knowledge degenerates into probability.


Continuous Random Variables

Amusement park patrons, wanting to go on a log ride, might not have to wait in line at all, they might have to wait for hours, or the wait could be anywhere in between. For a random log rider, the wait time can be indicated by a continuous random variable.

Three people seated in a log-shaped boat on a water ride at an amusement park, surrounded by trees and wooden structures as the ride moves along a winding water channel.

A continuous random variable maps the outcomes of a chance process to an interval or intervals of real numbers. It has uncountably many possible values. Variables corresponding to outcomes that are measured, rather than counted, are continuous.



Suppose the wait time for the log ride at an amusement park is 2 hours or less. Let W be a random variable that indicates the wait time of a randomly chosen customer.

W can assume any value between 0 and 2. The wait could be 0.5 hours or 0.6 hours or 0.55 hours, etc.; there are infinitely many possibilities.

W is a continuous random variable.

The Distribution of a Continuous Random Variable

Graphs and formulas describe the distribution of a continuous random variable. These indicate the observable values of the random variable and associated probabilities. As with discrete random variables, it is sometimes convenient to work with probabilities directly and sometimes to work with cumulative probabilities.


The formulas and graphs below all describe the same possible distribution of W, the wait time for the log ride of a randomly chosen amusement park visitor.

Two ways of describing one possible distribution of W are shown here in both formula and graphical forms.

Probability Density Function
Formula: $\small{f(x) = 1-\frac{x}{2}, \text{ for } 0\leq x \leq 2}$
Graph: Line graph showing a decreasing linear relationship between time and probability. The x-axis is labeled “Time (hours)” from 0 to 2.2, and the y-axis ranges from 0 to 1. The green line starts at a value of 1 when time is 0 and decreases steadily to 0 at 2 hours, forming a straight downward slope.

Cumulative Distribution Function
Formula: $\small{F(x) = x-\frac{x^2}{4}, \text{ for } 0\leq x \leq 2}$
Graph: Line graph showing a cumulative relationship between time and probability. The x-axis is labeled “Time (hours)” from 0 to 2.2, and the y-axis ranges from 0 to 1. The green curve starts at 0, rises smoothly and concavely upward, and approaches 1 around 2 hours, representing a cumulative increase over time.


The Probability Density Function

The probability density function (pdf) describes the distribution of a continuous random variable. The probability that a random variable assumes a value in a given interval is computed by finding the area under the function over that interval. The pdf is usually indicated in function notation.

→ A probability density function (pdf) is a function that describes the distribution of a continuous random variable.


For a continuous random variable $X$ that can take values between $l$ and $u$, we denote the pdf of $X$ by $f(x)$, for $l \leq x \leq u$. The interval from $l$ to $u$ is called the support of the random variable, that is, the set of values where the density is non-zero. If no specific interval is indicated, it is assumed that the support is all real numbers.


NOTATION: The probability density function is denoted 'f(x), l ≤ x ≤ u'.

→ The support of a random variable consists of the interval or intervals where the density function is non-zero.




Suppose the distribution of W, the wait time for the log ride of a randomly chosen park visitor, is described by the probability density function $\small{f(x) = 1-\frac{x}{2}, \text{ for } 0\leq x \leq 2}$

Line graph showing a decreasing linear relationship between time and probability. The x-axis is labeled “Time (hours)” from 0 to 2.2, and the y-axis ranges from 0 to 1. The green line starts at a value of 1 when time is 0 and decreases steadily to 0 at 2 hours, forming a straight downward slope.

From the graph, we can see that while all the wait times are between 0 and 2 hours, shorter wait times are more likely.

Notice that the pdf is 0 except between 0 and 2. That means that no wait will be less than 0 hours and none will be longer than 2 hours.


A pdf satisfies these properties:
  1. $f(x) \geq 0$ for all $x$.
  2. The total area under $f$ is 1, that is, $\int_l^u f(x)\,dx = 1$.

Suppose W, the wait time for the log ride of a randomly chosen park visitor, has the probability density function $\small{f(w) = 1-\frac{w}{2}, \text{ for } 0\leq w \leq 2}$

Show that $f$ satisfies the above properties.
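One way to check the two standard pdf properties, non-negativity and a total area of 1, is numerically. The sketch below is an illustration rather than a formal proof: the helper `integrate` is a hypothetical midpoint-rule routine, and the grid used for the non-negativity check is an arbitrary choice.

```python
# Numerical check of the pdf properties for f(w) = 1 - w/2 on [0, 2].

def f(w):
    """Wait-time density from the example; zero outside the support [0, 2]."""
    return 1 - w / 2 if 0 <= w <= 2 else 0.0

def integrate(func, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return sum(func(a + (i + 0.5) * h) for i in range(n)) * h

# Property 1: the density is never negative (checked on a grid from -0.5 to 2.5).
grid = [i / 1000 for i in range(-500, 2501)]
nonnegative = all(f(w) >= 0 for w in grid)

# Property 2: the total area under the density over its support is 1.
total_area = integrate(f, 0, 2)

print(nonnegative, round(total_area, 6))
```

By hand, the same conclusions follow from noting that $1 - \frac{w}{2} \geq 0$ whenever $w \leq 2$, and that the region under the line is a triangle with base 2 and height 1.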




To participate in a wrestling tournament, high school wrestlers must measure below a specified weight for their weight class. Let X denote the difference between a randomly chosen wrestler's weight and the indicated weight for their class. If $f(x) = -A(x^2+2x-3) \text{ for } -2\leq x \leq 1$, what is the value of $A$ that makes this a pdf?

If $f(x)$ is a pdf then $\int_{-2}^1 -A(x^2+2x-3)\,dx = 1$.

$\int_{-2}^1 -A(x^2+2x-3)\,dx = 9A = 1$, so $A=\frac{1}{9}$.
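The normalizing constant can also be recovered with exact arithmetic. This is a minimal sketch that assumes Python's standard `fractions` module; the helper name `antiderivative` is illustrative and not part of the original solution.

```python
from fractions import Fraction

def antiderivative(x):
    """Antiderivative of -(x^2 + 2x - 3), the density with A factored out."""
    x = Fraction(x)
    return -(x**3 / 3 + x**2 - 3 * x)

# Area under -(x^2 + 2x - 3) over the support [-2, 1]; the pdf is A times this.
area_without_A = antiderivative(1) - antiderivative(-2)

# A must scale the total area to exactly 1.
A = 1 / area_without_A

print(area_without_A, A)
```

Working with `Fraction` avoids rounding, so the result can be compared directly with the value found by hand.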

Finding Probabilities with the Probability Density Function

The probability that a continuous random variable takes on a value in a given interval is equal to the area under the pdf over that interval. Thus, to find the probability that a continuous random variable takes on a value over a given interval, integrate over that interval.

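$P(a\leq X \leq b)$ indicates the probability that $X$ takes on a value between $a$ and $b$, and is computed as $\int_a^b f(x)\,dx$.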


Use the applet to explore finding a probability with a pdf.

Drag the points labelled 'a' and 'b' to change the interval of interest.

'b' will adjust with 'a' so move 'a' first.

What happens when a < 0 or b > 2?



Find the probability that the waiting time for an amusement park visitor to ride the log ride is between 1 and 1.5 hours.

The probability density function for wait time is $\small{f(w) = 1-\frac{w}{2}, \text{ for } 0\leq w \leq 2}$

$\small{\begin{array}{rcl} P(1 \leq W \leq 1.5) &=& \int_1^{1.5}\left(1 - \frac{w}{2}\right)dw \\ &=& \left.w - \frac{w^2}{4}\right]_1^{1.5} \\ &=& (1.5 - \frac{1.5^2}{4}) - (1 - \frac{1^2}{4}) \\ &=& (1.5 - 0.5625) - (1 - 0.25) \\ &=& 0.1875\end{array}}$

The probability that someone waits between 1 and 1.5 hours for the log ride is 0.1875.
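The same area can be approximated numerically. The sketch below assumes a simple midpoint rule; the function name `prob_between` and the number of subintervals are illustrative choices.

```python
def f(w):
    """Wait-time density: 1 - w/2 on [0, 2], zero elsewhere."""
    return 1 - w / 2 if 0 <= w <= 2 else 0.0

def prob_between(a, b, n=10_000):
    """Approximate P(a <= W <= b) as the area under f from a to b (midpoint rule)."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

p = prob_between(1, 1.5)
print(round(p, 4))

# Any part of an interval outside the support contributes no area.
print(prob_between(2, 3))
```

Because the density is zero outside the support, extending an interval below 0 or above 2 does not change the probability.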

Use the applet to verify this answer!


To participate in a wrestling tournament, high school wrestlers must measure below a specified weight for their weight class. Let X denote the difference between a randomly chosen wrestler's weight and the indicated weight for their class. $f(x) = -\frac{1}{9}(x^2+2x-3) \text{ for } -2\leq x \leq 1$. What is the probability that a wrestler exceeds the stated weight?

A wrestler exceeds the given weight if the difference between their weight and that indicated weight is greater than 0.

$P(X > 0) = \int_{0}^1 -\frac{1}{9}(x^2+2x-3)\,dx = \frac{5}{27} \approx 0.19$.

The probability that a wrestler is over the weight is 0.19, that is, about 19% of wrestlers are over the weight.
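For a polynomial density the integral can be evaluated exactly from an antiderivative. This sketch assumes Python's standard `fractions` module; the helper name `anti` is illustrative.

```python
from fractions import Fraction

def anti(x):
    """Antiderivative of the density -(1/9)(x^2 + 2x - 3)."""
    x = Fraction(x)
    return -(x**3 / 3 + x**2 - 3 * x) / 9

# P(X > 0) is the area under the density from 0 to the top of the support, 1.
p_over = anti(1) - anti(0)

print(p_over, round(float(p_over), 3))
```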

To find the probability that a mature tree is taller than 150 feet, that is, that $X$ is greater than 1.5, integrate the pdf from 1.5 to 3, the upper end of the support (the pdf is in terms of hundreds of feet).

$\small{P(X > 1.5) = \int_{1.5}^{3} (-0.75x^2+3x-2.25)\,dx = 0.844}$

Bell-shaped curve outlined in purple with the right portion of the area under the curve shaded light purple. A vertical line near x = 1.6 marks the left boundary of the shaded region. Inside the shaded area is the label “c = 0.844,” indicating the probability or cumulative area to the right of that point. The x-axis ranges from approximately 0.8 to 3.2.

A Comment on $P(X = x)$

If X is a continuous random variable then $P(X=x) = 0$.

The probability that a continuous random variable takes on a value in an interval, $(a,b)$, is the area under the density function between $a$ and $b$ and is found with integration, $P(a \leq X \leq b) = \int_a^b f(x)dx$. For a single value, $a$, $P(X = a) = \int_a^a f(x)dx = 0$.

Another way to think of this is that a continuous random variable has infinitely many possible values, so the probability that X takes on any one specific value is 0; informally, $P(X=a) = \frac{1}{\infty} = 0$. Thus, if X is a continuous random variable, $P(X \leq a)$ and $P(X \lt a)$ are equal.
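A numerical illustration of this idea, using the log ride density $f(w) = 1 - \frac{w}{2}$ on $0 \leq w \leq 2$: as an interval around a point shrinks, the probability of landing in it shrinks toward 0. The point `a = 1.0` and the list of half-widths are arbitrary choices made for this sketch.

```python
def f(w):
    """Wait-time density: 1 - w/2 on [0, 2], zero elsewhere."""
    return 1 - w / 2 if 0 <= w <= 2 else 0.0

def prob_between(lo, hi):
    """Area under f between lo and hi, assuming 0 <= lo <= hi <= 2.

    f is a straight line on its support, so the trapezoid formula is exact.
    """
    return (hi - lo) * (f(lo) + f(hi)) / 2

a = 1.0
half_widths = [10.0 ** -k for k in range(1, 7)]
probs = [prob_between(a - eps, a + eps) for eps in half_widths]

for eps, p in zip(half_widths, probs):
    print(f"P({a - eps:.6f} <= W <= {a + eps:.6f}) = {p:.6f}")

# An interval of zero width has zero area, whatever the height of the density.
point_prob = prob_between(a, a)
print(point_prob)
```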

The Cumulative Distribution Function

To find a probability with a probability density function, integrate over the interval of interest. To find the probability over a different interval, integrate again. Like the probability density function, the cumulative distribution function describes the distribution of a random variable. The difference is that the cdf is obtained by integrating the pdf, so the integration is done once up front and probabilities can then be found by plugging values into the function and evaluating.

→ The cumulative distribution function (cdf) of a continuous random variable $X$ is a function that describes the distribution of $X$ in terms of cumulative probabilities.


The notation $F(x)$ denotes the cdf of $X$. Using '$F$' for the cdf and '$f$' for the pdf suggests the relationship between the functions. Just as in calculus, we use '$F$' to denote an antiderivative of '$f$'.

As with the pdf, it is important to always indicate the support when reporting a cdf. If no support is given, assume that the support is all real numbers.

NOTATION: Both 'F(x)' and 'P(X ≤ x)' denote the cumulative distribution function of X. F(x) = P(X ≤ x).



W, the wait time for the log ride of a randomly chosen amusement park visitor, has cumulative distribution function $\small{F(w) = w-\frac{w^2}{4}, \text{ for } 0\leq w \leq 2}$

Line graph showing a cumulative relationship between time and probability. The x-axis is labeled “Time (hours)” from 0 to 2.2, and the y-axis ranges from 0 to 1. The green curve starts at 0, rises smoothly and concavely upward, and approaches 1 around 2 hours, representing a cumulative increase over time.

The height of the graph at point w corresponds to the probability that the wait time is no longer than w hours.

The cdf is 0 below 0 since no wait time will be less than 0 hours. It is 1 above 2 since 2 hours is the longest wait time according to this model.

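A cdf satisfies these properties:
  1. $0 \leq F(x) \leq 1$ for all $x$.
  2. $F$ is non-decreasing: if $a \leq b$ then $F(a) \leq F(b)$.
  3. $F(x)$ approaches 0 as $x \to -\infty$ and approaches 1 as $x \to \infty$. For a bounded support, $F(x) = 0$ below the support and $F(x) = 1$ above it.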

Finding the CDF from the PDF

The cdf of $X$ is the integral of the pdf of $X$. Suppose $X$ has pdf $f(x)$ for $l \leq x \leq u$, and write $F(y) = P(X \leq y)$, using '$y$' as the argument so that $x$ can remain the variable of integration. Since $X$ takes no values below $l$, for $l \leq y \leq u$,
$$\small{\begin{array}{rcl} F(y) & = & P(X \leq y)\\ & = & P(l \leq X \leq y)\\ & = & \int_l^yf(x)dx\end{array}}$$



Suppose W, the wait time for the log ride of a randomly chosen park visitor, has the probability density function $\small{f(w) = 1-\frac{w}{2}, \text{ for } 0\leq w \leq 2}$. Find the cdf of W.

$\small{\begin{array}{rcl} F(y) &=& P(W \leq y) \\ &=& P(0 \leq W \leq y) \\ &=& \int_0^{y}\left(1 - \frac{w}{2}\right)dw \\ &=& \left.w - \frac{w^2}{4}\right]_0^{y} \\ &=& (y - \frac{y^2}{4}) - (0 - \frac{0^2}{4}) \\ &=& y - \frac{y^2}{4}\end{array}}$

Changing the variable back to w, $\small{F(w) = w - \frac{w^2}{4} \text{ for } 0 \leq w \leq 2}$.
Line graph showing a cumulative relationship between time and probability. The x-axis is labeled “Time (hours)” from 0 to 2.2, and the y-axis ranges from 0 to 1. The green curve starts at 0, rises smoothly and concavely upward, and approaches 1 around 2 hours, representing a cumulative increase over time.
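A derived cdf can be checked by comparing it with a numerical integral of the pdf at a few points. The sketch below assumes a midpoint rule; the names `cdf_numeric` and `cdf_closed` and the chosen test points are illustrative.

```python
def f(w):
    """Wait-time density: 1 - w/2 on [0, 2], zero elsewhere."""
    return 1 - w / 2 if 0 <= w <= 2 else 0.0

def cdf_numeric(y, n=10_000):
    """Approximate F(y) as the integral of f from 0 to y (midpoint rule)."""
    if y <= 0:
        return 0.0
    y = min(y, 2.0)  # no area is added beyond the top of the support
    h = y / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

def cdf_closed(y):
    """Closed-form cdf: 0 below 0, y - y^2/4 on [0, 2], 1 above 2."""
    if y <= 0:
        return 0.0
    if y >= 2:
        return 1.0
    return y - y**2 / 4

points = (-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0)
for y in points:
    print(y, round(cdf_numeric(y), 6), round(cdf_closed(y), 6))
```

The two columns agree at every point, including points outside the support, where the cdf is 0 on the left and 1 on the right.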

Finding Probabilities with the CDF

Finding $P(X\leq x)$ with the probability density function requires integration. Finding $P(X\leq x)$ with the cumulative distribution function only requires plugging in the value of $x$ and evaluating.


W, the wait time for the log ride of a randomly chosen park visitor, has the probability density function $\small{f(w) = 1-\frac{w}{2}, \text{ for } 0\leq w \leq 2}$ and cumulative distribution function $\small{F(w) = w - \frac{w^2}{4} \text{ for } 0 \leq w \leq 2}$. Find the probability that the wait is less than 45 minutes (0.75 hours) using the pdf, then find it again using the cdf.

Using the pdf, $\small{f(w) = 1-\frac{w}{2}, \text{ for } 0\leq w \leq 2}$

$\small{\begin{array}{rcl} P(W \leq 0.75) &=& P(0 \leq W \leq 0.75)\\ &=& \int_0^{0.75} \left(1 - \frac{w}{2}\right) dw \\ &=& \left.w - \frac{w^2}{4}\right]_0^{0.75} \\ &=& (0.75 - \frac{0.75^2}{4}) - (0 - \frac{0^2}{4}) \\ &=& (0.75 - 0.1406) \\ &=& 0.6094\end{array}}$

Using the cdf, $\small{F(w) = w - \frac{w^2}{4} \text{ for } 0 \leq w \leq 2}$

$\small{\begin{array}{rcl} P(W \leq 0.75) &=& F(0.75)\\ &=& 0.75 - \frac{0.75^2}{4} \\ &=& 0.6094\end{array}}$


Use the applet to explore finding a probability with a cdf.

Drag the point $w$ to use the applet to find $P(W \leq w)$.

Investigate what happens if w is less than 0 or greater than 2.

The cdf can also be used to find the probability that a random variable $X$ takes on a value in an interval from $a$ to $b$: $P(a \leq X \leq b) = F(b)-F(a)$.


$P(X \leq a) = F(a)$
$P(a \leq X \leq b) = F(b)-F(a)$
$P(X > a) = 1-F(a)$
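These three rules translate directly into code. The sketch below applies them to the log ride cdf $F(w) = w - \frac{w^2}{4}$ on $0 \leq w \leq 2$; the endpoints `a = 1.0` and `b = 1.5` are arbitrary values chosen for illustration.

```python
def F(w):
    """cdf of the wait time W: 0 below 0, w - w^2/4 on [0, 2], 1 above 2."""
    if w < 0:
        return 0.0
    if w > 2:
        return 1.0
    return w - w**2 / 4

a, b = 1.0, 1.5

p_at_most_a = F(a)         # P(W <= a)
p_between = F(b) - F(a)    # P(a <= W <= b)
p_greater_a = 1 - F(a)     # P(W > a)

print(p_at_most_a, p_between, p_greater_a)
```

No integration is needed at this stage: each probability is one or two evaluations of the cdf.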




Use the applet to find $P(a \leq X \leq b)$ via the pdf or the cdf.

Drag the endpoints of the interval in either graph. The endpoints in both graphs will change together.




To participate in a wrestling tournament, high school wrestlers must measure below a specified weight for their weight class. Let X denote the difference between a randomly chosen wrestler's weight and the indicated weight for their class. $f(x) = -\frac{1}{9}(x^2+2x-3) \text{ for } -2\leq x \leq 1$.
  1. Find the cdf of X.
  2. Use the cdf to find the probability that a wrestler's weight is within one half pound of the specified weight.
1. Find the cdf of X.

$\small{\begin{array}{rcl} F(y) &=& P(X \leq y)\\ &=& \int_{-2}^y -\frac{1}{9}(x^2+2x-3)dx \\ &=& \left.-\frac{1}{9}\left(\frac{x^3}{3}+x^2-3x\right)\right]_{-2}^{y} \\ &=& -\frac{y^3}{27}-\frac{y^2}{9}+\frac{y}{3} + \frac{22}{27}\end{array}}$

for $-2 \leq y \leq 1$. As a check, $F(-2) = 0$ and $F(1) = 1$.

2. Use the cdf to find the probability that a wrestler's weight is within one half pound of the specified weight.

A wrestler is within one half pound of the specified weight if the difference between their weight and that indicated weight is between -0.5 and 0.5.

$\small{P(-0.5 \leq X \leq 0.5) = F(0.5) - F(-0.5) = \frac{35}{108} \approx 0.324}$.

The probability that a wrestler is within one half pound of the specified weight is about 0.324, that is, about 32% of wrestlers are within one half pound of the weight.
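A useful check on any hand-derived cdf is that it equals 0 at the bottom of the support and 1 at the top. The sketch below builds the wrestler cdf by integrating the density term by term with exact fractions, then evaluates the interval probability. It assumes Python's standard `fractions` module; the helper name `anti` is illustrative.

```python
from fractions import Fraction

def anti(x):
    """Antiderivative of the density -(1/9)(x^2 + 2x - 3)."""
    x = Fraction(x)
    return -(x**3 / 3 + x**2 - 3 * x) / 9

def F(y):
    """cdf on the support [-2, 1]: area under the density from -2 to y."""
    return anti(y) - anti(-2)

# Endpoint checks: no probability below -2, all of it at or below 1.
print(F(-2), F(1))

# Probability that the weight difference is within half a pound of 0.
half = Fraction(1, 2)
p_within_half = F(half) - F(-half)
print(p_within_half, round(float(p_within_half), 3))
```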
  1. To find the cdf of $X$, integrate $f(x)$ from 1 to $y$: $\small{\int_1^y (-0.75x^2+3x-2.25)\,dx = -0.25y^3+1.5y^2-2.25y+1}$, so $\small{F(x) = -0.25x^3+1.5x^2-2.25x+1, \text{ for } 1 \leq x \leq 3}$.

  2. $\small{P(X \leq 1.75) = F(1.75) = -0.25(1.75^3)+1.5(1.75^2)-2.25(1.75)+1 = 0.316}$


  3. $P(1.25 \leq X \leq 2.75)$
    $\small{\begin{array}{rcl} &=& F(2.75) - F(1.25)\\ &=& [-0.25(2.75^3)+1.5(2.75^2)-2.25(2.75)+1] - [-0.25(1.25^3)+1.5(1.25^2)-2.25(1.25)+1]\\ &=& 0.914\end{array}}$

Smooth red curve showing a cumulative distribution function that increases gradually. The x-axis ranges from 0 to about 3.4 and is labeled “Time (hours).” The y-axis ranges from 0 to 1. The curve starts near 0 at x = 0, rises slowly, increases more steeply between x ≈ 1.4 and x ≈ 2.6, and levels off near 1 beyond x = 3.


Percentiles

A percentile is a value $T_p$ such that the random variable $X$ takes on a value less than or equal to $T_p$ $p\%$ of the time, that is, $F(T_p) = \frac{p}{100}$. Dividing $p$ by 100 expresses the percentage as a probability.

To find the pth percentile, solve the equation $F(T_p) = \frac{p}{100}$.

The pth percentile of the distribution of a random variable is the value, $T_p$ such that $P(X\leq T_p) = \frac{p}{100}$




W, the wait time for the log ride of a randomly chosen park visitor, has cumulative distribution function $\small{F(w) = w - \frac{w^2}{4} \text{ for } 0 \leq w \leq 2}$. Find and interpret the 30th percentile.

Let $T_{30}$ denote the 30th percentile. Then $F(T_{30}) = 0.3$. To simplify notation, use T in place of $T_{30}$ in the computations.

$\small{\begin{array}{rcl} F(T) &=& 0.3\\ T- \frac{T^2}{4}&=& 0.3 \\ 0&=& \frac{T^2}{4} - T + 0.3\end{array}}$

Solving the resulting quadratic equation, T=3.673 or T=0.327. 3.673 is not within the support, $0 \leq w \leq 2$. Thus $T_{30} = 0.327.$

30% of patrons will wait 0.327 hours or less for the log ride.
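The quadratic can be solved with the quadratic formula, keeping only the root that lies in the support. This is a minimal sketch using Python's standard `math` module; the variable names are illustrative.

```python
import math

p = 0.30  # target cumulative probability for the 30th percentile

# F(T) = T - T^2/4 = p rearranges to (1/4) T^2 - T + p = 0.
a, b, c = 0.25, -1.0, p
disc = b * b - 4 * a * c
roots = [(-b - math.sqrt(disc)) / (2 * a), (-b + math.sqrt(disc)) / (2 * a)]

# Only a root inside the support 0 <= w <= 2 can be a percentile.
T30 = next(r for r in roots if 0 <= r <= 2)

print([round(r, 3) for r in roots], round(T30, 3))
```

In minutes, the selected root is `T30 * 60`, which may be an easier unit for interpreting a wait time.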

To verify the solution with the cdf applet, find the value of w such that F(w) = 0.3.

$F(x) = -0.25x^3+1.5x^2-2.25x+1$

$F(T_{70}) = 0.7$

$F(T_{70}) = -0.25T_{70}^3+1.5T_{70}^2-2.25T_{70}+1 = 0.7$.

Solving this gives three possible solutions: 0.147, 2.273, and 3.579. Only one of those, 2.273, is within the support, $1 \leq x \leq 3$, so $T_{70} = 2.273$.

70% of the trees in the Redwood forest have a height of 227.3 feet or less.
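When the percentile equation is a cubic, a numerical root finder restricted to the support avoids sorting through extraneous roots. The sketch below uses bisection, which works here because a cdf is non-decreasing on its support. The function name `percentile`, the tolerance, and the default bounds are illustrative assumptions.

```python
def F(x):
    """cdf of tree height (hundreds of feet) on the support 1 <= x <= 3."""
    return -0.25 * x**3 + 1.5 * x**2 - 2.25 * x + 1

def percentile(p, lo=1.0, hi=3.0, tol=1e-10):
    """Solve F(T) = p/100 by bisection over the support [lo, hi]."""
    target = p / 100
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < target:
            lo = mid   # the percentile lies to the right of mid
        else:
            hi = mid   # the percentile lies at or to the left of mid
    return (lo + hi) / 2

T70 = percentile(70)
print(round(T70, 4), round(F(T70), 6))
```

Because the search never leaves the interval from 1 to 3, the roots of the cubic that fall outside the support are never considered.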