Welcome to Calculus. I'm Professor Ghrist. We're about to begin lecture 44 on expectation and variance. >> Do you think that it's a coincidence that we use the term probability density function? By no means. In this lesson, we'll explore the analogy between masses and probabilities, and we'll consider some standard concepts from statistics: those of expectation, variance, and standard deviation. >> Let's say that you're given a probability density function, rho of x, over a domain. What might you be able to say about a point chosen at random with respect to that density? Well, you might want to know: what outcome is most likely? What do I expect to see on average? On the other hand, since obtaining any particular value is very, very unlikely, you might want some way to quantify your uncertainty about that expected value. These ways of characterizing a PDF are the focus of this lecture. We begin with two standard definitions from probability: the expectation and the variance. The expectation is also called the mean, the expected value, or the first moment. It is defined as E = the integral over the domain of x dP. In the one-dimensional setting, this is the integral of x times rho(x) dx. The expectation is telling you something about the most likely value or the average value. Now, it may occur at the peak of the probability density function, or it may not; it depends on how that density is distributed over the domain. The variance is sometimes called the second central moment. It is denoted V, and it is defined as the integral over the domain of quantity (x - E) squared dP. We can interpret x - E as a distance from the mean. And so in the one-dimensional case, we can write this as the integral of quantity (x - E) squared rho(x) dx. What happens when we expand out that multiplication and split this into three integrals? We get some very nice simplification. Notice that the middle integral and the last integral are things we've seen before.
The middle integral, that of x dP, gives the expectation. The final integral, the integral of dP over the domain, is, of course, one. Thus, we can simplify the formula for the variance to the integral over the domain of x squared dP, minus the expectation squared. That is an important formula, and you're going to want to remember it. Maybe you've seen something like it before. It will help us understand just how to visualize or interpret this variance. By now, you've probably noticed that there's a relationship between what we've done in probability and what we've done with mass. If we consider a probability density function rho over a domain, let's say an interval from a to b, you've noticed that we've used the same symbol, rho, for the PDF and for a mass density in a more physical context. Indeed, the mass element, rho(x) dx, is precisely what we use as the probability element. In that case, what is the mass? Well, it's the integral of the mass element, and that corresponds to the probability, P. Now, let's consider the expectation that we have just defined, E. What is that? That's the integral of x dP. Now, of course, by the definition of a PDF, this is really the same thing as the integral of x dP over the integral of 1 dP. And so what we're really doing is computing the average x value with respect to the probability measure. That is something that we have seen before in the mass context: that is the centroid, x bar. So you can think of the expectation as a probabilistic centroid. Well, if that's the case, then how do we interpret the variance, that is, the integral of quantity (x - E) squared dP? Well, you no doubt recognize that that is a moment of inertia, or rather, the moment of inertia about the centroid. Now, that's a very nice physical interpretation. That means we can think of the variance as something like a measure of resistance to rotating the domain about its mean.
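That simplification, V = the integral of x squared dP minus E squared, is easy to check numerically. Here is a minimal sketch, using a hypothetical density rho(x) = 2x on [0, 1] chosen just for illustration (it is not one of the lecture's examples):

```python
# Check that integrating (x - E)^2 dP agrees with (integral of x^2 dP) - E^2,
# for the hypothetical density rho(x) = 2x on [0, 1].

def integrate(f, a, b, n=100_000):
    """Midpoint Riemann sum of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

rho = lambda x: 2 * x                        # integrates to 1 on [0, 1], so it is a PDF
E = integrate(lambda x: x * rho(x), 0, 1)    # expectation: 2/3

V_direct = integrate(lambda x: (x - E) ** 2 * rho(x), 0, 1)
V_formula = integrate(lambda x: x ** 2 * rho(x), 0, 1) - E ** 2

print(round(E, 4), round(V_direct, 4), round(V_formula, 4))
```

Both variance computations agree, at about 1/18, which is what the algebraic simplification promises.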
If that's the case, then what is the probabilistic interpretation of the radius of gyration, that distance at which one could focus all of the mass and have the same moment of inertia? Well, that is a well-known object in probability called the standard deviation. Denoted by sigma, it is defined to be the square root of the variance, and it is a measure of how the mass, if you will, is distributed about the centroid. It's a measure of the spread of the density about the expectation. Now, with this in hand, let's compute some of these values. In the case of a simple PDF, let's look at a uniform density over the interval from a to b. What's the expectation? Well, it's the integral of x dP, that is, the integral of x times rho(x) dx as x goes from a to b. Rho of x, being simply 1 over the length, b minus a, gives us a very simple integral. When we evaluate it and do a little bit of algebra, you won't be surprised to find that the expectation is one-half (a + b). The average value is right in the middle of the domain. What's the variance in this case? Well, we have to integrate quantity (x - E) squared times rho(x) dx. Using the formula that we derived a little bit ago, we can take advantage of the fact that we already know the expectation, and so we're reduced to integrating x squared dx. That much is simple enough, but you'll find when you go to do this that there's a little bit of algebra involved in order to simplify. I claim that, with a little bit of work, one gets a variance of one-twelfth (b - a) squared. Now, how do I know that I haven't made a mistake here? According to our physical interpretation, this variance should be related to the moment of inertia of a region about a centroidal axis. Recall that for a rectangle, we have the moment of inertia as one-twelfth M, the mass, times l squared. In the probabilistic context, the mass is one and the length is b minus a. So I'm confident that we got this one right.
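Another way to gain confidence is a quick numerical check of the uniform-density formulas E = (a + b)/2 and V = (b - a) squared over 12, here on an arbitrarily chosen interval:

```python
# Numerical check of the uniform density on [a, b]:
# E should be (a + b)/2 and V should be (b - a)^2 / 12.

def integrate(f, a, b, n=100_000):
    """Midpoint Riemann sum of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

a, b = 2.0, 10.0                # an arbitrary interval for the check
rho = lambda x: 1 / (b - a)     # uniform PDF

E = integrate(lambda x: x * rho(x), a, b)
V = integrate(lambda x: x * x * rho(x), a, b) - E ** 2

print(round(E, 6), (a + b) / 2)           # expectation vs closed form: 6.0
print(round(V, 6), (b - a) ** 2 / 12)     # variance vs closed form: 16/3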
The standard deviation is the square root of the variance, that is, the square root of one-twelfth (b - a) squared. In other words, (b - a) over 2 root 3. How about a different density function: an exponential density over the domain from 0 to infinity? Well, the expected value is the integral of x dP. In this case, with an exponential density, we have to integrate x times alpha e to the minus alpha x dx, as x goes from 0 to infinity. This is going to take a little bit of work, but not too much. Using integration by parts, we get -x e to the minus alpha x, evaluated from 0 to infinity, plus the integral from 0 to infinity of e to the minus alpha x dx. The former term vanishes upon substitution, and we're left with an easy enough integral. We obtain -1 over alpha e to the minus alpha x, evaluated from 0 to infinity. Evaluating, we obtain 1 over alpha. Now, that's a very, very simple expectation. You're probably going to want to remember this, since exponential density functions are fairly common. The variance, in this case, is going to be a bit more work. We're going to have to integrate x squared dP and then subtract off the expectation squared. This integral is, again, going to involve integration by parts, but we're going to have to do it twice, since there is an x squared in front of the exponential term. I hope you'll trust me when I claim that this variance reduces to 1 over alpha squared. This means that the standard deviation is equal to 1 over alpha. Let's put this to use in a simple example. Let's say that a light bulb is advertised as having an average lifespan of 2,000 hours. What are the odds that your light bulb fails within 400 hours of use? You might think, well, that's very unlikely, since they say the average lifespan is 2,000 hours. But let's see. If we assume a probability density function that is exponential and of the form alpha e to the minus alpha t, where t is time, then what can we say about alpha? Well, we know that the expected value is 2,000 hours.
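If you don't want to take the claim V = 1 over alpha squared entirely on trust, here is a minimal numerical sketch for the exponential density, with an arbitrary alpha and the improper integrals truncated where the tail is negligible:

```python
import math

# Sanity check for the exponential density alpha * e^(-alpha * x) on [0, infinity):
# E should be 1/alpha and V should be 1/alpha^2.

def integrate(f, a, b, n=200_000):
    """Midpoint Riemann sum of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

alpha = 0.5                     # an arbitrary rate constant for the check
cutoff = 40 / alpha             # truncate the improper integral; e^(-40) is negligible
rho = lambda x: alpha * math.exp(-alpha * x)

E = integrate(lambda x: x * rho(x), 0, cutoff)
V = integrate(lambda x: x * x * rho(x), 0, cutoff) - E ** 2

print(round(E, 4))   # matches 1/alpha = 2.0
print(round(V, 4))   # matches 1/alpha^2 = 4.0
```

With both expectation and variance confirmed, we can return to the light bulb.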
And knowing that that expectation is 1 over alpha for an exponential PDF, what can we say? Well, we know that alpha is 1 over 2,000. And to compute the probability of failure within 400 hours, we want the probability of landing in the interval from 0 to 400. That is simply the integral of alpha e to the minus alpha t dt, as t goes from 0 to 400. Now, that integral is easy enough. It yields -e to the minus alpha t. Evaluating from 0 to 400, we obtain 1 - e to the minus one-fifth, because negative alpha times 400 is minus one-fifth. Now, that evaluates to about 0.18, meaning that there's nearly a 1 in 5 chance of a failure within the first 400 hours. Recall from our previous lesson the ubiquity of Gaussian or normal densities, and their form, which looks worse than it is. Note that this is an even function in x, so the expectation is 0. One of the reasons for the coefficient out in front is so that the variance and the standard deviation will both be 1. Now, this formula for a Gaussian is not the most general possible. There is a general Gaussian of the form 1 over sigma square root of 2 pi, times e to the minus (x - mu) squared over 2 sigma squared. This general Gaussian allows you to tune the mean and the standard deviation: the expectation is mu, and the variance is precisely sigma squared. One of the reasons for using general Gaussians is that many random variables are normal, that is, they have a Gaussian density. If you look at the sizes of objects such as people or snowflakes, they tend to be distributed according to a Gaussian. This is often used in statistics. Now, Gaussians can be hard to work with, given their form, but one can keep in mind a few properties. If you have a Gaussian of expectation mu and standard deviation sigma, then you're likely to be within one standard deviation of the mean: 68% of the mass of that density function lies within distance sigma.
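That within-one-sigma figure, and its cousins at two and three sigma, can be computed from the error function: for any Gaussian, the fraction of mass within k standard deviations of the mean is erf(k over root 2). A quick sketch:

```python
import math

# Fraction of a Gaussian's mass within k standard deviations of the mean.
# For any mu and sigma, this fraction is erf(k / sqrt(2)).

for k in (1, 2, 3):
    mass = math.erf(k / math.sqrt(2))
    print(f"within {k} sigma: {mass:.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```

These are the figures behind the familiar 68-95-99.7 rule of thumb.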
If you are within two standard deviations, then the amount of mass goes up to about 95%, and almost all of the mass, more than 99%, lies within a distance of three standard deviations of the expectation. Keeping these figures in mind might help you if you ever take a statistics class. Note, however, that there's much more to statistics and to probability than simply remembering a couple of numbers and the general shape of a Gaussian. It's a large and beautiful subject, one that you'll be able to handle with your knowledge of calculus. >> And so we've come to the end of chapter four. It's worth taking a moment to reflect on all the applications that we've covered, from simple area under a curve all the way up through probability. In our next and last chapter, we're going to tie the entire course together by reconsidering what calculus means in the context of discrete functions.