5.5 The mean value theorem

If $f\colon I\to\mathbb{R}$ is differentiable, then the derivative $f^{\prime}\colon I\to\mathbb{R}$ encodes a lot of information about the original function $f$ . Here we discuss a number of important results in this vein, which relate properties of $f^{\prime}$ (and possibly higher-order derivatives) to properties of $f$ .

Stationary points

Let $I\subseteq\mathbb{R}$ be an open interval and suppose $f\colon I\to\mathbb{R}$ is differentiable. Intuitively, it is clear that the tangent to the graph of $f$ should be horizontal at a point $x_{M}\in I$ where the function attains its maximum: see Figure 5.10. The same holds for a point where $f$ attains its minimum. Thus, our intuition tells us that the derivative (which corresponds to the gradient of the tangent line) should vanish wherever we encounter an extreme value of the function. But we are not content with intuition: we need formal proof! We will make this precise in Theorem 5.34 below. In fact, this observation should still hold if, rather than a maximum $x_{M}$ for $f$, we consider a local maximum, defined as follows.

Figure 5.10: The function $f\colon(a,b)\to\mathbb{R}$ has a maximum at $x_{M}$ and a local maximum at $x_{L}$. At both these points the tangent line to the graph is horizontal (they are stationary points of $f$).

Definition 5.33.

Let $I\subseteq\mathbb{R}$ be an open interval and $f\colon I\to\mathbb{R}$ be a function.

(1) We say $x_{L}\in I$ is a local maximum for $f$ if there exists some open interval $J\subseteq I$ such that $x_{L}\in J$ and $f(x)\leq f(x_{L})$ for all $x\in J$;
(2) We say $x_{\ell}\in I$ is a local minimum for $f$ if there exists some open interval $J\subseteq I$ such that $x_{\ell}\in J$ and $f(x)\geq f(x_{\ell})$ for all $x\in J$.

We illustrate the concept of a local maximum in Figure 5.10. Note that if $x_{M}$ is a maximum for $f$ , then it is also a local maximum (since we can just take $J:=I$ ). Similarly, if $x_{m}$ is a minimum for $f$ , then it is also a local minimum.
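As a small worked illustration of Definition 5.33 (the function and interval here are our own choice, not taken from the text), consider $f\colon\mathbb{R}\to\mathbb{R}$ given by $f(x):=x^{3}-3x$, together with $x_{L}:=-1$ and $J:=(-2,0)$. The following computation checks that $-1$ is a local maximum for $f$ which is not a maximum.

```latex
\[
\begin{aligned}
f(-1) &= (-1)^{3}-3(-1) = 2,\\
f(x)-f(-1) &= x^{3}-3x-2 = (x+1)^{2}(x-2) \leq 0
  \quad\text{for all $x\in(-2,0)$, since $x-2<0$ there},\\
f(3) &= 27-9 = 18 > 2 = f(-1).
\end{aligned}
\]
```

The second line gives $f(x)\leq f(-1)$ for all $x\in J$, so $-1$ is a local maximum for $f$; the third line shows that $-1$ is not a maximum for $f$ on $\mathbb{R}$.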

The following theorem makes precise the intuitive link between local extrema and derivatives. The key to this link is considering the zeros of the derivative $f^{\prime}$ , which we call the stationary points of the function $f$ .

Theorem 5.34 (Stationary point theorem).

Let $I\subseteq\mathbb{R}$ be an open interval and $f\colon I\to\mathbb{R}$ be differentiable.

(1) If $x_{L}\in I$ is a local maximum for $f$, then $f^{\prime}(x_{L})=0$.
(2) If $x_{\ell}\in I$ is a local minimum for $f$, then $f^{\prime}(x_{\ell})=0$.

In particular, the maximum and minimum of $f$ (if they exist) must occur at stationary points of $f$ .

Proof.

We shall only prove (1), since (2) can be proved using a similar argument, or can be derived from (1) by replacing $f$ with $-f$ .

Suppose $f$ has a local maximum at $x_{L}\in I$ , so that there exists some open interval $J\subseteq I$ with $x_{L}\in J$ such that $f(x)\leq f(x_{L})$ for all $x\in J$ . In particular, if $h\in\mathbb{R}\setminus\{0\}$ is such that $x_{L}+h\in J$ , then $f(x_{L}+h)-f(x_{L})\leq 0$ . Consequently, the difference quotient satisfies

\frac{f(x_{L}+h)-f(x_{L})}{h}\geq 0\quad\text{if $h<0$}\qquad\text{and}\qquad\frac{f(x_{L}+h)-f(x_{L})}{h}\leq 0\quad\text{if $h>0$.}

Since $f$ is differentiable at $x_{L}$ , the left and right derivatives exist at $x_{L}$ and are equal to $f^{\prime}(x_{L})$ . Furthermore,

f^{\prime}(x_{L})=\lim_{h\to 0-}\frac{f(x_{L}+h)-f(x_{L})}{h}\geq 0\qquad\text{and}\qquad f^{\prime}(x_{L})=\lim_{h\to 0+}\frac{f(x_{L}+h)-f(x_{L})}{h}\leq 0.

From these two inequalities we conclude that $f^{\prime}(x_{L})=0$ , as required. ∎
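To see the sign pattern from the proof in a concrete case, here is a minimal worked example; the function $f(x):=1-x^{2}$ on $I:=\mathbb{R}$, with maximum at $x_{L}:=0$, is our own choice for illustration.

```latex
\[
\begin{aligned}
\frac{f(0+h)-f(0)}{h} &= \frac{(1-h^{2})-1}{h} = -h
  \qquad\text{for all $h\in\mathbb{R}\setminus\{0\}$},\\
-h &> 0 \quad\text{if $h<0$}, \qquad -h < 0 \quad\text{if $h>0$},\\
f^{\prime}(0) &= \lim_{h\to 0-}(-h) = \lim_{h\to 0+}(-h) = 0.
\end{aligned}
\]
```

Here the left-hand difference quotients are positive and the right-hand ones are negative, so the only value the common limit can take is $0$.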

Exercise 5.35.

Sketch a figure to illustrate the ideas behind the proof of Theorem 5.34.

Theorem 5.34 is a useful tool for finding the maximum and minimum values of a function, since it limits the possibilities of where these values can occur. However, the following simple example shows that stationary points do not always correspond to local extrema.

Example 5.36.

The converse of Theorem 5.34 does not hold. For instance, the function $f\colon\mathbb{R}\to\mathbb{R}$ given by $f(x):=x^{3}$ is differentiable with $f^{\prime}(x)=3x^{2}$ for all $x\in\mathbb{R}$ and therefore has a stationary point at $x=0$. However, $f(x)<0=f(0)$ for all $x<0$ and $f(x)>0=f(0)$ for all $x>0$. Since every open interval $J$ containing $0$ contains both negative and positive numbers, we see that $0$ is neither a local maximum nor a local minimum for $f$.

So, extreme values must occur at stationary points, but stationary points are not always extrema. We can use information from second-order derivatives to try to further diagnose whether a stationary point is an extremum; we shall return to this topic later.

Rolle’s theorem

As a simple consequence of the stationary point theorem, we deduce the following result.

Theorem 5.37 (Rolle’s Theorem).

Suppose $a$ , $b\in\mathbb{R}$ with $a<b$ . If $f\colon[a,b]\to\mathbb{R}$ is continuous on $[a,b]$ , differentiable on $(a,b)$ and $f(a)=f(b)=0$ , then there exists some $c\in(a,b)$ such that $f^{\prime}(c)=0$ .

Suppose $f\colon\mathbb{R}\to\mathbb{R}$ is differentiable. Rolle’s theorem tells us that between every pair of zeros of $f$ there is a stationary point of $f$ , where the tangent line is horizontal. We illustrate this in Figure 5.11.

Figure 5.11: Rolle’s theorem: between any two zeros of $f$, there exists a stationary point of $f$.

Proof (of Theorem 5.37).

By the extreme value theorem (Theorem 4.106), the function $f$ attains its minimum value $y_{m}$ and maximum value $y_{M}$ somewhere on $[a,b]$. If $y_{m}=y_{M}$, then $f$ is constant. In this case, we see from Definition 5.2 that $f^{\prime}(c)=0$ for all $c\in(a,b)$ and so the claim follows. Thus, we may assume $y_{m}<y_{M}$.

Since $f(a)=0$ and $y_{M}$ is the maximum value of $f$ , we know $y_{M}\geq 0$ . Consider the case $y_{M}>0$ . Then the value $y_{M}$ is not attained by the function at either of the endpoints $a$ or $b$ of the interval, since $f(a)=f(b)=0$ . However, we know that the value $y_{M}$ is attained by $f$ somewhere in the interval $[a,b]$ , and so there exists some $x_{M}\in(a,b)$ such that $f(x_{M})=y_{M}$ . Thus, $x_{M}$ is a maximum for $f$ and the stationary point theorem (Theorem 5.34) implies that $f^{\prime}(c)=0$ for $c:=x_{M}$ , as required.

It remains to consider the case $y_{M}=0$. Since $y_{m}<y_{M}$, it follows that $y_{m}<0$. We can now use exactly the same argument as in the previous case to show that there exists some minimum $x_{m}\in(a,b)$ for $f$ and therefore $f^{\prime}(c)=0$ for $c:=x_{m}$, as required. ∎
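The following short computation traces the first case of the proof for one concrete function. The choice $f(x):=x(1-x)$ on $[0,1]$ is ours and is meant only as an illustration.

```latex
\[
\begin{aligned}
f(0) &= f(1) = 0,\\
f(x) &= \tfrac{1}{4}-\bigl(x-\tfrac{1}{2}\bigr)^{2} \leq \tfrac{1}{4}
  \quad\text{for all $x\in[0,1]$, with equality only at $x_{M}=\tfrac{1}{2}$},\\
f^{\prime}(x) &= 1-2x, \qquad\text{so}\qquad f^{\prime}\bigl(\tfrac{1}{2}\bigr) = 0.
\end{aligned}
\]
```

In the notation of the proof, $y_{M}=\tfrac{1}{4}>0$ is attained at the interior point $x_{M}=\tfrac{1}{2}\in(0,1)$, and $c:=x_{M}$ is the stationary point whose existence the theorem guarantees.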

Exercise 5.38.

Show that each hypothesis of Rolle’s theorem is necessary as follows.

(i) Show there exists some $f\colon[0,1]\to\mathbb{R}$ which is differentiable on $(0,1)$ and satisfies $f(0)=f(1)$ for which $f^{\prime}(c)\neq 0$ for all $c\in(0,1)$.
(ii) Show that there exists some $f\colon[-1,1]\to\mathbb{R}$ which is continuous on $[-1,1]$, differentiable on $(-1,1)\setminus\{0\}$, and satisfies $f(-1)=f(1)$ but for which $f^{\prime}(c)\neq 0$ for all $c\in(-1,1)\setminus\{0\}$.

Note that Rolle’s theorem asserts the existence of a stationary point, but does not say anything about uniqueness. In particular, there can be more than just one stationary point.

Exercise 5.39.

Consider the function $f\colon[-3,3]\to\mathbb{R}$ given by $f(x):=x^{3}-9x$ . Observe that $f$ is continuous on $[-3,3]$ , differentiable on $(-3,3)$ and $f(3)=f(-3)=0$ . Sketch the graph of $f$ and show that $f$ has precisely two stationary points in $(-3,3)$ .

Exercise 5.40.

Sketch an example of a function satisfying the hypotheses of Rolle’s theorem with precisely $5$ stationary points. Here we are interested in a conceptual drawing: you do not need to derive a formula for the function, merely illustrate the concept.

The mean value theorem

Our next step is to prove an important and far-reaching upgrade of Rolle’s theorem.

Theorem 5.41 (Mean value theorem).

Suppose $a,b\in\mathbb{R}$ with $a<b$ . If $f\colon[a,b]\to\mathbb{R}$ is continuous on $[a,b]$ and differentiable on $(a,b)$ then there exists some $c\in(a,b)$ such that

\frac{f(b)-f(a)}{b-a}=f^{\prime}(c). \tag{5.18}

Exercise 5.42.

Show that the mean value theorem implies Rolle’s theorem as a special case.

Before discussing the proof of Theorem 5.41, it’s helpful to spend some time developing intuition for what the result is telling us. We shall actually discuss three different interpretations of the mean value theorem: one here and two others in later sections. Unfortunately, none of our interpretations will fully explain why Theorem 5.41 is called the ‘mean value’ theorem. The answer is that the left-hand side of (5.18) corresponds to the average (or ‘mean’) value of the rate of change $f^{\prime}$ over $[a,b]$. However, this interpretation relies on integration theory and, in particular, the fundamental theorem of calculus. You will investigate these topics if you take the year 2 course Further Analysis and Several Variable Calculus.

MVT Interpretation 1: Parallel lines.

There is a simple geometric interpretation of the mean value theorem, which is illustrated in Figure 5.12. The figure shows the secant line $S$ through the points $(a,f(a))$ and $(b,f(b))$ on the graph of $f$ . The gradient of $S$ is given by

\frac{f(b)-f(a)}{b-a},

which corresponds to the left-hand side of (5.18). The mean value theorem tells us the following: there exists some $c\in(a,b)$ such that the gradient of $S$ is equal to the gradient $f^{\prime}(c)$ of the tangent line $T$ to the graph of $f$ at $(c,f(c))$. In particular, the secant $S$ and the tangent line $T$ are parallel.

Figure 5.12: The mean value theorem.
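For a concrete instance of the parallel-lines picture, one can locate the point $c$ explicitly. The example $f(x):=x^{3}$ on $[a,b]:=[0,3]$ is our own choice and is only a sketch of how the statement reads in a simple case.

```latex
\[
\begin{aligned}
\frac{f(3)-f(0)}{3-0} &= \frac{27-0}{3} = 9
  &&\text{(gradient of the secant $S$)},\\
f^{\prime}(c) &= 3c^{2} = 9 \iff c^{2} = 3
  &&\text{(gradient of the tangent $T$ at $(c,f(c))$)},\\
c &= \sqrt{3} \in (0,3).
\end{aligned}
\]
```

So the tangent line to the graph at $(\sqrt{3},3\sqrt{3})$ is parallel to the secant through $(0,0)$ and $(3,27)$. The other root $c=-\sqrt{3}$ is discarded because it lies outside $(0,3)$.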

In Exercise 5.42, we saw that the mean value theorem implies Rolle’s theorem. In fact, we can also use Rolle’s theorem to prove the mean value theorem (so the two results are equivalent)!

Sketch proof (of Theorem 5.41).

We shall only sketch the details of the proof here: you can fill in the details for yourself (see Exercise 5.43 below)!

The graph of the linear polynomial

x\mapsto f(a)+\frac{f(b)-f(a)}{b-a}(x-a)

is precisely the secant line through $(a,f(a))$ and $(b,f(b))$ . The idea behind the proof is to subtract this linear polynomial from $f$ in order to reduce to a situation where Rolle’s theorem can be applied. More precisely, consider

h\colon[a,b]\to\mathbb{R},\qquad h(x):=f(x)-f(a)-\frac{f(b)-f(a)}{b-a}(x-a)\qquad\text{for all $x\in[a,b]$.} \tag{5.19}

If we compare the graphs of $f$ and $h$ as in Figure 5.13, then intuitively the graph of $h$ is formed by sliding the graph of $f$ so that the secant line through $(a,f(a))$ and $(b,f(b))$ becomes the horizontal axis. This is exactly the situation where Rolle’s theorem applies.

Rolle’s theorem applied to $h$ tells us there exists some $c\in(a,b)$ such that $h^{\prime}(c)=0$ . Applying this to the definition (5.19) of $h$ and rearranging, we obtain the conclusion of the mean value theorem. ∎

Figure 5.13: Reducing the mean value theorem to Rolle’s theorem. The top curve is the graph of $f$ and the bottom curve is the graph of the transformed function $h$, as defined in (5.19).
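Before attempting the general argument, it may help to see the construction (5.19) in one specific case. The choice $f(x):=x^{2}$ on $[a,b]:=[0,2]$ below is ours, for illustration only; it is not a substitute for the general proof.

```latex
\[
\begin{aligned}
h(x) &= f(x)-f(0)-\frac{f(2)-f(0)}{2-0}\,(x-0) = x^{2}-2x,\\
h(0) &= 0, \qquad h(2) = 4-4 = 0,\\
h^{\prime}(x) &= 2x-2, \qquad\text{so}\qquad h^{\prime}(c)=0 \iff c=1\in(0,2),\\
f^{\prime}(1) &= 2 = \frac{f(2)-f(0)}{2-0}.
\end{aligned}
\]
```

In this case $h$ vanishes at both endpoints, Rolle’s theorem applies to $h$, and the resulting stationary point $c=1$ of $h$ is exactly a point where (5.18) holds for $f$.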

Exercise 5.43.

Prove the mean value theorem by applying Rolle’s theorem to $h$ as in (5.19).