Welcome to Calculus. I'm Professor Ghrist. We're about to begin lecture 44 on expectation and variance. >> Do you think that it's a coincidence that we use the term probability density function? By no means. In this lesson, we'll explore the analogy between masses and probabilities, and we'll consider some standard concepts from statistics: those of expectation, variance, and standard deviation. >> Let's say that you're given a probability density function, rho of x, over a domain. What might you be able to say about a point chosen at random with respect to that density? Well, you might want to know: what outcome is most likely? What do I expect to see on average? On the other hand, since obtaining any particular value is very, very unlikely, you might want some way to quantify your uncertainty about that expected value. These ways of characterizing a PDF are the focus of this lecture. We begin with two standard definitions from probability: the expectation and the variance. The expectation is also called the mean, the expected value, or the first moment. It is defined as E = the integral over the domain of x dP. In the one-dimensional setting, this is the integral of x times rho(x) dx. The expectation is telling you something about the most likely value or the average value. Now, it may occur at the peak of the probability density function, or it may not; it depends on how that density is distributed over the domain. The variance is sometimes called the second central moment. It is denoted V, and it is defined as the integral over the domain of quantity (x - E) squared dP. We can interpret x - E as a distance from the mean. And so in the one-dimensional case, we can write this as the integral of quantity (x - E) squared rho(x) dx. What happens when we expand out that multiplication and split this into three integrals? We get some very nice simplification. Notice that the middle integral and the last integral are things we've seen before.
The middle integral, that of x dP, gives the expectation. The final integral, the integral of dP over the domain, is, of course, one. Thus, we can simplify the formula for the variance to the integral over the domain of x squared dP, minus the expectation squared. That is an important formula, and you're going to want to remember it. Maybe you've seen something like it before. It will help us understand just how to visualize or interpret this variance. By now, you've probably noticed that there's a relationship between what we've done in probability and what we've done with mass. If we consider a probability density function rho over a domain, let's say an interval from a to b, you've noticed that we've used the same symbol, rho, for the PDF and for a mass density in a more physical context. Indeed, the mass element, rho(x) dx, is precisely what we use as the probability element. In that case, what is the mass? Well, it's the integral of the mass element, and that corresponds to the probability, P. Now, let's consider the expectation that we have just defined, E. What is that? That's the integral of x dP. Now, of course, by the definition of a PDF, this is really the same thing as the integral of x dP over the integral of 1 dP. And so what we're really doing is computing the average x value with respect to the probability measure. That is something that we have seen before in the mass context: that is the centroid, x bar. So you can think of the expectation as a probabilistic centroid. Well, if that's the case, then how do we interpret the variance, that is, the integral of quantity (x - E) squared dP? Well, you no doubt recognize that that is a moment of inertia, or rather, the moment of inertia about the centroid. Now, that's a very nice physical interpretation. That means we can think of the variance as something like a measure of resistance to rotating the domain about its mean.
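That simplification, V = the integral of x squared dP minus E squared, is easy to check numerically. Here is a minimal sketch, using a hypothetical density rho(x) = 2x on [0, 1] chosen just for illustration (it is not one of the lecture's examples):

```python
# Check that integrating (x - E)^2 dP agrees with (integral of x^2 dP) - E^2,
# for the hypothetical density rho(x) = 2x on [0, 1].

def integrate(f, a, b, n=100_000):
    """Midpoint Riemann sum of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

rho = lambda x: 2 * x                        # integrates to 1 on [0, 1], so it is a PDF
E = integrate(lambda x: x * rho(x), 0, 1)    # expectation: 2/3

V_direct = integrate(lambda x: (x - E) ** 2 * rho(x), 0, 1)
V_formula = integrate(lambda x: x ** 2 * rho(x), 0, 1) - E ** 2

print(round(E, 4), round(V_direct, 4), round(V_formula, 4))
```

Both variance computations agree, at about 1/18, which is what the algebraic simplification promises.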
If that's the case, then what is the probabilistic interpretation of the radius of gyration, that distance at which one could focus all of the mass and have the same moment of inertia? Well, that is a well-known object in probability called the standard deviation. Denoted by sigma, it is defined to be the square root of the variance, and it is a measure of how the mass, if you will, is distributed about the centroid. It's a measure of the spread of the density about the expectation. Now, with this in hand, let's compute some of these values. In the case of a simple PDF, let's look at a uniform density over the interval from a to b. What's the expectation? Well, it's the integral of x dP, that is, the integral of x times rho(x) dx as x goes from a to b. Rho of x, being simply 1 over the length, b minus a, gives us a very simple integral. When we evaluate it and do a little bit of algebra, you won't be surprised to find that the expectation is one-half (a + b). The average value is right in the middle of the domain. What's the variance in this case? Well, we have to integrate quantity (x - E) squared times rho(x) dx. Using the formula that we derived a little bit ago, we can take advantage of the fact that we already know the expectation, and so we're reduced to integrating x squared dx. That much is simple enough, but you'll find when you go to do this that there's a little bit of algebra involved in order to simplify. I claim that, with a little bit of work, one gets a variance of one-twelfth (b - a) squared. Now, how do I know that I haven't made a mistake here? According to our physical interpretation, this variance should be related to the moment of inertia of a region about a centroidal axis. Recall that for a rectangle, we have the moment of inertia as one-twelfth M, the mass, times l squared. In the probabilistic context, the mass is one and the length is b minus a. So I'm confident that we got this one right.
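Another way to gain confidence is a quick numerical check of the uniform-density formulas E = (a + b)/2 and V = (b - a) squared over 12, here on an arbitrarily chosen interval:

```python
# Numerical check of the uniform density on [a, b]:
# E should be (a + b)/2 and V should be (b - a)^2 / 12.

def integrate(f, a, b, n=100_000):
    """Midpoint Riemann sum of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

a, b = 2.0, 10.0                # an arbitrary interval for the check
rho = lambda x: 1 / (b - a)     # uniform PDF

E = integrate(lambda x: x * rho(x), a, b)
V = integrate(lambda x: x * x * rho(x), a, b) - E ** 2

print(round(E, 6), (a + b) / 2)           # expectation vs closed form: 6.0
print(round(V, 6), (b - a) ** 2 / 12)     # variance vs closed form: 16/3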
The standard deviation is the square root of the variance, that is, the square root of one-twelfth (b - a) squared. In other words, (b - a) over 2 root 3. How about a different density function: an exponential density over the domain from 0 to infinity? Well, the expected value is the integral of x dP. In this case, with an exponential density, we have to integrate x times alpha e to the minus alpha x dx, as x goes from 0 to infinity. This is going to take a little bit of work, but not too much. Using integration by parts, we get -x e to the minus alpha x, evaluated from 0 to infinity, plus the integral from 0 to infinity of e to the minus alpha x dx. The former term vanishes upon substitution, and we're left with an easy enough integral. We obtain -1 over alpha e to the minus alpha x, evaluated from 0 to infinity. Evaluating, we obtain 1 over alpha. Now, that's a very, very simple expectation. You're probably going to want to remember this, since exponential density functions are fairly common. The variance, in this case, is going to be a bit more work. We're going to have to integrate x squared dP and then subtract off the expectation squared. This integral is, again, going to involve integration by parts, but we're going to have to do it twice, since there is an x squared in front of the exponential term. I hope you'll trust me when I claim that this variance reduces to 1 over alpha squared. This means that the standard deviation is equal to 1 over alpha. Let's put this to use in a simple example. Let's say that a light bulb is advertised as having an average lifespan of 2,000 hours. What are the odds that your light bulb fails within 400 hours of use? You might think, well, that's very unlikely, since they say the average lifespan is 2,000 hours. But let's see. If we assume a probability density function that is exponential and of the form alpha e to the minus alpha t, where t is time, then what can we say about alpha? Well, we know that the expected value is 2,000 hours.
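If you don't want to take the claim V = 1 over alpha squared entirely on trust, here is a minimal numerical sketch for the exponential density, with an arbitrary alpha and the improper integrals truncated where the tail is negligible:

```python
import math

# Sanity check for the exponential density alpha * e^(-alpha * x) on [0, infinity):
# E should be 1/alpha and V should be 1/alpha^2.

def integrate(f, a, b, n=200_000):
    """Midpoint Riemann sum of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

alpha = 0.5                     # an arbitrary rate constant for the check
cutoff = 40 / alpha             # truncate the improper integral; e^(-40) is negligible
rho = lambda x: alpha * math.exp(-alpha * x)

E = integrate(lambda x: x * rho(x), 0, cutoff)
V = integrate(lambda x: x * x * rho(x), 0, cutoff) - E ** 2

print(round(E, 4))   # matches 1/alpha = 2.0
print(round(V, 4))   # matches 1/alpha^2 = 4.0
```

With both expectation and variance confirmed, we can return to the light bulb.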
And knowing that that expectation is 1 over alpha for an exponential PDF, what can we say? Well, we know that alpha is 1 over 2,000. And to compute the probability of failure within 400 hours, we want the probability of landing in the interval from 0 to 400. That is simply the integral of alpha e to the minus alpha t dt, as t goes from 0 to 400. Now, that integral is easy enough. It yields -e to the minus alpha t. Evaluating from 0 to 400, we obtain 1 - e to the minus one-fifth, because negative alpha times 400 is minus one-fifth. Now, that evaluates to about 0.18, meaning that there's nearly a 1 in 5 chance of a failure within the first 400 hours. Recall from our previous lesson the ubiquity of Gaussian or normal densities, and their form, which looks worse than it is. Note that this is an even function in x, so the expectation is 0. One of the reasons for the coefficient out in front is so that the variance and the standard deviation will both be 1. Now, this formula for a Gaussian is not the most general possible. There is a general Gaussian of the form 1 over sigma square root of 2 pi, times e to the minus (x - mu) squared over 2 sigma squared. This general Gaussian allows you to tune the mean and the standard deviation: the expectation is mu, and the variance is precisely sigma squared. One of the reasons for using general Gaussians is that many random variables are normal, that is, they have a Gaussian density. If you look at the sizes of objects such as people or snowflakes, they tend to be distributed according to a Gaussian. This is often used in statistics. Now, Gaussians can be hard to work with, given their form, but one can keep in mind a few properties. If you have a Gaussian of expectation mu and standard deviation sigma, then you're likely to be within one standard deviation of the mean: 68% of the mass of that density function lies within distance sigma.
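That within-one-sigma figure, and its cousins at two and three sigma, can be computed from the error function: for any Gaussian, the fraction of mass within k standard deviations of the mean is erf(k over root 2). A quick sketch:

```python
import math

# Fraction of a Gaussian's mass within k standard deviations of the mean.
# For any mu and sigma, this fraction is erf(k / sqrt(2)).

for k in (1, 2, 3):
    mass = math.erf(k / math.sqrt(2))
    print(f"within {k} sigma: {mass:.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```

These are the figures behind the familiar 68-95-99.7 rule of thumb.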
If you are within two standard deviations, then the amount of mass goes up to about 95%, and almost all of the mass, more than 99%, lies within a distance of three standard deviations of the expectation. Keeping these figures in mind might help you if you ever take a statistics class. Note, however, that there's much more to statistics and to probability than simply remembering a couple of numbers and the general shape of a Gaussian. It's a large and beautiful subject, one that you'll be able to handle with your knowledge of calculus. >> And so we've come to the end of chapter four. It's worth taking a moment to reflect on all the applications that we've covered, from simple area under a curve all the way up through probability. In our next and last chapter, we're going to tie the entire course together by reconsidering what calculus means in the context of discrete functions.