5.5 The mean value theorem

If $f\colon I\to\mathbb{R}$ is differentiable, then the derivative $f'\colon I\to\mathbb{R}$ encodes a lot of information about the original function $f$. Here we discuss a number of important results in this vein, which relate properties of $f'$ (and possibly higher-order derivatives) to properties of $f$.

Stationary points

Let $I\subseteq\mathbb{R}$ be an open interval and suppose $f\colon I\to\mathbb{R}$ is differentiable. Intuitively, it is clear that the tangent to the graph of $f$ should be horizontal at a point $x_M\in I$ where the function attains its maximum: see Figure 5.10. The same holds for a point where $f$ attains its minimum. Thus, our intuition tells us that the derivative (which corresponds to the gradient of the tangent line) should vanish wherever we encounter an extreme value of the function. (Footnote 3: But we are not content with intuition: we need formal proof!) We will make this precise in Theorem 5.34 below. In fact, this observation should still hold if, rather than only considering a maximum $x_M$ for $f$, we consider a local maximum, defined as follows.

Figure 5.10: The function $f\colon(a,b)\to\mathbb{R}$ has a maximum at $x_M$ and a local maximum at $x_L$. At both these points the tangent line to the graph is horizontal (they are stationary points of $f$).
Definition 5.33.

Let $I\subseteq\mathbb{R}$ be an open interval, $f\colon I\to\mathbb{R}$ be a function and $a\in I$.

  1. We say $x_L\in I$ is a local maximum for $f$ if there exists some open interval $J\subseteq I$ such that $x_L\in J$ and $f(x)\leq f(x_L)$ for all $x\in J$;

  2. We say $x_\ell\in I$ is a local minimum for $f$ if there exists some open interval $J\subseteq I$ such that $x_\ell\in J$ and $f(x)\geq f(x_\ell)$ for all $x\in J$.

We illustrate the concept of a local maximum in Figure 5.10. Note that if $x_M$ is a maximum for $f$, then it is also a local maximum (since we can just take $J:=I$). Similarly, if $x_m$ is a minimum for $f$, then it is also a local minimum.

The following theorem makes precise the intuitive link between local extrema and derivatives. The key to this link is considering the zeros of the derivative $f'$, which we call the stationary points of the function $f$.

Theorem 5.34 (Stationary point theorem).

Let $I\subseteq\mathbb{R}$ be an open interval and $f\colon I\to\mathbb{R}$ be differentiable.

  1. If $x_L\in I$ is a local maximum for $f$, then $f'(x_L)=0$.

  2. If $x_\ell\in I$ is a local minimum for $f$, then $f'(x_\ell)=0$.

In particular, the maximum and minimum of $f$ (if they exist) must occur at stationary points of $f$.

Proof.

We shall only prove (1), since (2) can be proved using a similar argument, or can be derived from (1) by replacing $f$ with $-f$.

Suppose $f$ has a local maximum at $x_L\in I$, so that there exists some open interval $J\subseteq I$ with $x_L\in J$ such that $f(x)\leq f(x_L)$ for all $x\in J$. In particular, if $h\in\mathbb{R}\setminus\{0\}$ is such that $x_L+h\in J$, then $f(x_L+h)-f(x_L)\leq 0$. Consequently, the difference quotient satisfies

\[
\frac{f(x_L+h)-f(x_L)}{h}\geq 0\quad\text{if $h<0$}\qquad\text{and}\qquad\frac{f(x_L+h)-f(x_L)}{h}\leq 0\quad\text{if $h>0$.}
\]

Since $f$ is differentiable at $x_L$, the left and right derivatives exist at $x_L$ and are equal to $f'(x_L)$. Furthermore,

\[
f'(x_L)=\lim_{h\to 0-}\frac{f(x_L+h)-f(x_L)}{h}\geq 0\qquad\text{and}\qquad f'(x_L)=\lim_{h\to 0+}\frac{f(x_L+h)-f(x_L)}{h}\leq 0.
\]

From these two inequalities we conclude that $f'(x_L)=0$, as required. ∎
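
As a numerical illustration of the sign argument in the proof (it is not part of the proof), the following minimal Python sketch evaluates one-sided difference quotients at a local maximum. The choice of $\cos$ with $x_L=0$ and the helper name `difference_quotient` are assumptions made for this example only.

```python
import math


def difference_quotient(f, x, h):
    """Return (f(x + h) - f(x)) / h for a nonzero step h."""
    return (f(x + h) - f(x)) / h


# Assumed example: cos has a local maximum at x_L = 0.
f = math.cos
x_L = 0.0

# Step sizes 1e-1, 1e-2, ..., 1e-6.
steps = [10.0 ** (-k) for k in range(1, 7)]

# Quotients for negative steps (h < 0) and positive steps (h > 0).
left = [difference_quotient(f, x_L, -h) for h in steps]
right = [difference_quotient(f, x_L, h) for h in steps]

for h, q_left, q_right in zip(steps, left, right):
    print(f"|h| = {h:.0e}: left quotient = {q_left:.3e}, right quotient = {q_right:.3e}")
```

The left quotients are non-negative and the right quotients are non-positive, and both shrink towards zero as the step decreases, which is consistent with $f'(x_L)=0$.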

Exercise 5.35.

Sketch a figure to illustrate the ideas behind the proof of Theorem 5.34.

Theorem 5.34 is a useful tool for finding the maximum and minimum values of a function, since it limits the possibilities of where these values can occur. However, the following simple example shows that stationary points do not always correspond to local extrema.

Example 5.36.

The converse of Theorem 5.34 does not hold. For instance, the function $f\colon\mathbb{R}\to\mathbb{R}$ given by $f(x):=x^3$ is differentiable with $f'(x)=3x^2$ for all $x\in\mathbb{R}$ and therefore has a stationary point at $x=0$. However, since $x^3<0$ for all $x<0$ and $x^3>0$ for all $x>0$, every open interval containing $0$ contains points where $f$ is strictly smaller than $f(0)=0$ and points where $f$ is strictly larger than $f(0)=0$. Hence $0$ is neither a local maximum nor a local minimum for $f$.
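
The example can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the derivative `f_prime` is entered by hand, and the sample radii are an arbitrary choice. It confirms that $0$ is a stationary point of $x\mapsto x^3$, while points arbitrarily close to $0$ on either side give values below and above $f(0)$.

```python
def f(x):
    """The function from the example, f(x) = x^3."""
    return x ** 3


def f_prime(x):
    """Its derivative 3x^2, entered by hand rather than computed."""
    return 3 * x ** 2


# Assumed sample radii: 1e-1, 1e-2, ..., 1e-5.
radii = [10.0 ** (-k) for k in range(1, 6)]

is_stationary = f_prime(0.0) == 0.0
# For each radius r, check f(-r) < f(0) < f(r).
below = [f(-r) < f(0.0) for r in radii]
above = [f(r) > f(0.0) for r in radii]

print("stationary at 0:", is_stationary)
print("smaller values to the left of 0:", all(below))
print("larger values to the right of 0:", all(above))
```

A finite sample like this cannot replace the argument in the example; it only makes the behaviour near $0$ concrete.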

So, extreme values must occur at stationary points, but stationary points are not always extrema. We can use information from second-order derivatives to try to further diagnose whether a stationary point is an extremum; we shall return to this topic later.

Rolle’s theorem

As a simple consequence of the stationary point theorem, we deduce the following result.

Theorem 5.37 (Rolle’s Theorem).

Suppose $a$, $b\in\mathbb{R}$ with $a<b$. If $f\colon[a,b]\to\mathbb{R}$ is continuous on $[a,b]$, differentiable on $(a,b)$ and $f(a)=f(b)=0$, then there exists some $c\in(a,b)$ such that $f'(c)=0$.

Suppose $f\colon\mathbb{R}\to\mathbb{R}$ is differentiable. Rolle’s theorem tells us that between every pair of zeros of $f$ there is a stationary point of $f$, where the tangent line is horizontal. We illustrate this in Figure 5.11.

Figure 5.11: Rolle’s theorem: between any two zeros of $f$, there exists a stationary point of $f$.
Proof (of Theorem 5.37).

By the extreme value theorem (Theorem 4.106), the function $f$ attains its minimum value $y_m$ and maximum value $y_M$ somewhere on $[a,b]$. If $y_m=y_M$, then $f$ is constant. In this case, we see from Definition 5.2 that $f'(c)=0$ for all $c\in(a,b)$ and so the claim follows. Thus, we may assume $y_m<y_M$.

Since $f(a)=0$ and $y_M$ is the maximum value of $f$, we know $y_M\geq 0$. Consider the case $y_M>0$. Then the value $y_M$ is not attained by the function at either of the endpoints $a$ or $b$ of the interval, since $f(a)=f(b)=0$. However, we know that the value $y_M$ is attained by $f$ somewhere in the interval $[a,b]$, and so there exists some $x_M\in(a,b)$ such that $f(x_M)=y_M$. Thus, $x_M$ is a maximum for $f$ and the stationary point theorem (Theorem 5.34) implies that $f'(c)=0$ for $c:=x_M$, as required.

It remains to consider the case $y_M=0$. Since $y_m<y_M$, it follows that $y_m<0$. We can now use exactly the same argument as in the previous case to show that there exists some minimum $x_m\in(a,b)$ for $f$ and therefore $f'(c)=0$ for $c:=x_m$, as required. ∎

Exercise 5.38.

Show that each hypothesis of Rolle’s theorem is necessary as follows.

  (i) Show there exists some $f\colon[0,1]\to\mathbb{R}$ which is differentiable on $(0,1)$ and satisfies $f(0)=f(1)$ for which $f'(c)\neq 0$ for all $c\in(0,1)$.

  (ii) Show that there exists some $f\colon[-1,1]\to\mathbb{R}$ which is continuous on $[-1,1]$, differentiable on $(-1,1)\setminus\{0\}$, and satisfies $f(-1)=f(1)$ but for which $f'(c)\neq 0$ for all $c\in(-1,1)\setminus\{0\}$.

Note that Rolle’s theorem asserts the existence of a stationary point, but does not say anything about uniqueness. In particular, there can be more than just one stationary point.

Exercise 5.39.

Consider the function $f\colon[-3,3]\to\mathbb{R}$ given by $f(x):=x^3-9x$. Observe that $f$ is continuous on $[-3,3]$, differentiable on $(-3,3)$ and $f(3)=f(-3)=0$. Sketch the graph of $f$ and show that $f$ has precisely two stationary points in $(-3,3)$.

Exercise 5.40.

Sketch an example of a function satisfying the hypotheses of Rolle’s theorem with precisely $5$ stationary points. Here we are interested in a conceptual drawing: you do not need to derive a formula for the function, merely illustrate the concept.

The mean value theorem

Our next step is to prove an important and far-reaching upgrade of Rolle’s theorem.

Theorem 5.41 (Mean value theorem).

Suppose $a,b\in\mathbb{R}$ with $a<b$. If $f\colon[a,b]\to\mathbb{R}$ is continuous on $[a,b]$ and differentiable on $(a,b)$, then there exists some $c\in(a,b)$ such that

\[
\frac{f(b)-f(a)}{b-a}=f'(c). \tag{5.18}
\]
Exercise 5.42.

Show that the mean value theorem implies Rolle’s theorem as a special case.

Before discussing the proof of Theorem 5.41, it’s helpful to spend some time developing intuition for what the result is telling us. We shall actually discuss three different interpretations of the mean value theorem: one here and two others in later sections. (Footnote 4: Unfortunately, none of our interpretations will fully explain why Theorem 5.41 is called the ‘mean value’ theorem. The answer is that the expression (5.18) corresponds to the average (or ‘mean’) rate of change for the function $f$. However, this interpretation relies on integration theory and, in particular, the fundamental theorem of calculus. You will investigate these topics if you take the year 2 course Further Analysis and Several Variable Calculus.)

MVT Interpretation 1: Parallel lines.

There is a simple geometric interpretation of the mean value theorem, which is illustrated in Figure 5.12. The figure shows the secant line $S$ through the points $(a,f(a))$ and $(b,f(b))$ on the graph of $f$. The gradient of $S$ is given by

\[
\frac{f(b)-f(a)}{b-a},
\]

which corresponds to the left-hand side of (5.18). The mean value theorem tells us the following: there exists some $c\in(a,b)$ such that the gradient of $S$ is equal to the gradient $f'(c)$ of the tangent line $T$ to the graph of $f$ at $(c,f(c))$. In particular, the secant $S$ and the tangent line $T$ are parallel.

Figure 5.12: The mean value theorem.
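
To make the parallel-lines picture concrete, here is a minimal Python sketch that locates such a point $c$ numerically for one assumed example, $f(x)=x^3$ on $[0,2]$. The helper `mean_value_point` is hypothetical, and it uses bisection, which relies on extra assumptions that the mean value theorem itself does not need: the derivative is supplied by hand, is continuous, and $f'-{}$(secant gradient) changes sign on $[a,b]$.

```python
import math


def mean_value_point(f_prime, slope, lo, hi, tol=1e-12):
    """Find c in [lo, hi] with f_prime(c) close to slope, by bisection.

    Assumes f_prime is continuous and f_prime - slope changes sign on [lo, hi].
    """
    g_lo = f_prime(lo) - slope
    g_hi = f_prime(hi) - slope
    if g_lo * g_hi > 0:
        raise ValueError("f_prime - slope does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = f_prime(mid) - slope
        if g_lo * g_mid <= 0:
            hi = mid  # sign change lies in [lo, mid]
        else:
            lo, g_lo = mid, g_mid  # sign change lies in [mid, hi]
    return (lo + hi) / 2


# Assumed example: f(x) = x^3 on [a, b] = [0, 2], with f'(x) = 3x^2.
a, b = 0.0, 2.0
f = lambda x: x ** 3
f_prime = lambda x: 3 * x ** 2

secant_gradient = (f(b) - f(a)) / (b - a)  # gradient of the secant S
c = mean_value_point(f_prime, secant_gradient, a, b)

print("secant gradient:", secant_gradient)
print("c:", c, "tangent gradient f'(c):", f_prime(c))
```

In this example the secant gradient is $4$, and solving $3c^2=4$ by hand gives $c=2/\sqrt{3}$, which the bisection reproduces to within the tolerance.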

In Exercise 5.42, we saw that the mean value theorem implies Rolle’s theorem. In fact, we can also use Rolle’s theorem to prove the mean value theorem (so the two results are equivalent)!

Sketch proof (of Theorem 5.41).

We shall only sketch the details of the proof here: you can fill in the details for yourself (see Exercise 5.43 below)!

The graph of the linear polynomial

\[
x\mapsto f(a)+\frac{f(b)-f(a)}{b-a}(x-a)
\]

is precisely the secant line through $(a,f(a))$ and $(b,f(b))$. The idea behind the proof is to subtract this linear polynomial from $f$ in order to reduce to a situation where Rolle’s theorem can be applied. More precisely, consider

\[
h\colon[a,b]\to\mathbb{R},\qquad h(x):=f(x)-f(a)-\frac{f(b)-f(a)}{b-a}(x-a)\qquad\text{for all $x\in[a,b]$.} \tag{5.19}
\]

If we compare the graphs of $f$ and $h$ as in Figure 5.13, then intuitively the graph of $h$ is formed by sliding the graph of $f$ so that the secant line through $(a,f(a))$ and $(b,f(b))$ becomes the horizontal axis. This is exactly the situation where Rolle’s theorem applies.

Rolle’s theorem applied to $h$ tells us there exists some $c\in(a,b)$ such that $h'(c)=0$. Applying this to the definition (5.19) of $h$ and rearranging, we obtain the conclusion of the mean value theorem. ∎

Figure 5.13: Reducing the mean value theorem to Rolle’s theorem. The top curve is the graph of $f$ and the bottom curve is the graph of the transformed function $h$, as defined in (5.19).
Exercise 5.43.

Prove the mean value theorem by applying Rolle’s theorem to $h$ as in (5.19).