This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Introduction to Functional Analysis

Vladimir V. Kisil
School of Mathematics, University of Leeds, Leeds LS2 9JT, UK
email: kisilv@maths.leeds.ac.uk
Web: http://v-v-kisil.scienceontheweb.net/

February 16, 2025

Abstract: This is lecture notes for several courses on Functional Analysis at School of Mathematics of University of Leeds. They are based on the notes of Dr. Matt Daws, Prof. Jonathan R. Partington, Dr. David Salinger, and Prof. Alex Strohmaier used in the previous years. Some sections are borrowed from the textbooks, which I used since being a student myself. However all misprints, omissions, and errors are only my responsibility. I am very grateful to numerous students (Filipa Soares de Almeida, Eric Borgnet, Pasc Gavruta, …—just mention few of them) for pointing out some of them. Please let me know if you find more.
The notes are available also for download in PDF.
The suggested textbooks are []. The other nice books with many interesting problems are [].
Exercises with stars are not a part of mandatory material but are nevertheless worth to hear about. And they are not necessarily difficult, try to solve them!

0 Motivating Example: Fourier Series
- 0.1 Fourier series: basic notions
  - 0.1.1 2π-periodic functions
  - 0.1.2 Integrating the complex exponential function
- 0.2 The vibrating string
  - 0.2.1 Separation of variables
  - 0.2.2 Principle of Superposition
- 0.3 Historic: Joseph Fourier
1 Basics of Metric Spaces
- 1.1 Metric Spaces
- 1.2 Useful properties of metric spaces
  - 1.2.1 Cauchy sequences and completeness
  - 1.2.2 Compactness
2 Basics of Linear Spaces
- 2.1 Banach spaces (basic definitions only)
- 2.2 Hilbert spaces
- 2.3 Subspaces
- 2.4 Linear spans
3 Orthogonality
- 3.1 Orthogonal System in Hilbert Space
- 3.2 Bessel’s inequality
- 3.3 The Riesz–Fischer theorem
- 3.4 Construction of Orthonormal Sequences
- 3.5 Orthogonal complements
4 Duality of Linear Spaces
- 4.1 Dual space of a normed space
- 4.2 Self-duality of Hilbert space
5 Fourier Analysis
- 5.1 Fourier series
- 5.2 Fejér’s theorem
- 5.3 Parseval’s formula
- 5.4 Some Application of Fourier Series
6 Operators
- 6.1 Linear operators
- 6.2 Orthoprojections
- 6.3 B(H) as a Banach space (and even algebra)
- 6.4 Adjoints
- 6.5 Hermitian, unitary and normal operators
7 Spectral Theory
- 7.1 The spectrum of an operator on a Hilbert space
- 7.2 The spectral radius formula
- 7.3 Spectrum of Special Operators
8 Compactness
- 8.1 Compact operators
- 8.2 Hilbert–Schmidt operators
9 Compact normal operators
- 9.1 Spectrum of normal operators
- 9.2 Compact normal operators
10 Integral equations
11 Banach and Normed Spaces
- 11.1 Normed spaces
- 11.2 Bounded linear operators
- 11.3 Dual Spaces
- 11.4 Hahn–Banach Theorem
- 11.5 C(X) Spaces
12 Measure Theory
- 12.1 Basic Measure Theory
- 12.2 Extension of Measures
- 12.3 Complex-Valued Measures and Charges
- 12.4 Constructing Measures, Products
13 Integration
- 13.1 Measurable functions
- 13.2 Lebesgue Integral
- 13.3 Properties of the Lebesgue Integral
- 13.4 Integration on Product Measures
- 13.5 Absolute Continuity of Measures
14 Functional Spaces
- 14.1 Integrable Functions
- 14.2 Dense Subspaces in L_p
- 14.3 Continuous functions
- 14.4 Riesz Representation Theorem
15 Fourier Transform
- 15.1 Convolutions on Commutative Groups
- 15.2 Characters of Commutative Groups
- 15.3 Fourier Transform on Commutative Groups
- 15.4 The Schwartz space of smooth rapidly decreasing functions
- 15.5 Fourier Integral
16 Advances of Metric Spaces
- 16.1 The Stone–Weierstrass Theorem
- 16.2 Contraction mappings and fixed point theorems
- 16.3 The Baire Category Theorem and Applications
- 16.4 Semi-norms and locally convex topological vector spaces
A Tutorial Problems
- A.1 Tutorial problems I
- A.2 Tutorial problems II
- A.3 Tutorial Problems III
- A.4 Tutorial Problems IV
- A.5 Tutorial Problems V
- A.6 Tutorial Problems VI
- A.7 Tutorial Problems VII
B Solutions of Tutorial Problems
- B.1 Solution of Tuitorial Problem I
- B.2 Solutions of Tutorial Problems II
- B.3 Solutions of Tutorial Problems III
- B.4 Solutions of Tutorial Problems IV
- B.5 Solutions of Tutorial Problems V
- B.6 Solutions of Tutorial Problems VI
- B.7 Solutions of Tutorial Problems VII
C Course in the Nutshell
- C.1 Some useful results and formulae (1)
- C.2 Some useful results and formulae (2)
D Supplementary Sections
- D.1 Reminder from Complex Analysis

Notations and Assumptions

ℤ₊, ℝ₊ denotes non-negative integers and reals.
x,y,z,… denotes vectors.
λ,µ,ν,… denotes scalars.
ℜ z, ℑ z stand for real and imaginary parts of a complex number z.

Integrability conditions

In this course, the functions we consider will be real or complex valued functions defined on the real line which are locally Riemann integrable. This means that they are Riemann integrable on any finite closed interval [a,b]. (A complex valued function is Riemann integrable iff its real and imaginary parts are Riemann-integrable.) In practice, we shall be dealing mainly with bounded functions that have only a finite number of points of discontinuity in any finite interval. We can relax the boundedness condition to allow improper Riemann integrals, but we then require the integral of the absolute value of the function to converge.

We mention this right at the start to get it out of the way. There are many fascinating subtleties connected with Fourier analysis, but those connected with technical aspects of integration theory are beyond the scope of the course. It turns out that one needs a “better” integral than the Riemann integral: the Lebesgue integral, and I commend the module, Linear Analysis 1, which includes an introduction to that topic which is available to MM students (or you could look it up in Real and Complex Analysis by Walter Rudin). Once one has the Lebesgue integral, one can start thinking about the different classes of functions to which Fourier analysis applies: the modern theory (not available to Fourier himself) can even go beyond functions and deal with generalized functions (distributions) such as the Dirac delta function which may be familiar to some of you from quantum theory.

From now on, when we say “function”, we shall assume the conditions of the first paragraph, unless anything is stated to the contrary.

0 Motivating Example: Fourier Series

0.1 Fourier series: basic notions

Before proceed with an abstract theory we consider a motivating example: Fourier series.

0.1.1 2π-periodic functions

In this part of the course we deal with functions (as above) that are periodic.

We say a function f:ℝ→ℂ is periodic with period T>0 if f(x+T)= f(x) for all x∈ ℝ. For example, sinx, cosx, e^ix(=cos x+i sinx) are periodic with period 2π. For k∈ R∖{0}, sinkx, coskx, and e^ikx are periodic with period 2π/|k|. Constant functions are periodic with period T, for any T>0. We shall specialize to periodic functions with period 2π: we call them 2π-periodic functions, for short. Note that cosnx, sinnx and e^inx are 2π-periodic for n∈ℤ. (Of course these are also 2π/|n|-periodic.)

Any half-open interval of length T is a fundamental domain of a periodic function f of period T. Once you know the values of f on the fundamental domain, you know them everywhere, because any point x in ℝ can be written uniquely as x=w+nT where n∈ ℤ and w is in the fundamental domain. Thus f(x) = f(w+(n−1)T +T)=⋯ =f(w+T) =f(w).

For 2π-periodic functions, we shall usually take the fundamental domain to be ]−π, π]. By abuse of language, we shall sometimes refer to [−π, π] as the fundamental domain. We then have to be aware that f(π)=f(−π).

0.1.2 Integrating the complex exponential function

We shall need to calculate ∫_a^b e^ikx d x, for k∈ℝ. Note first that when k=0, the integrand is the constant function 1, so the result is b−a. For non-zero k, ∫_a^b e^ikx d x= ∫_a^b (coskx+isinkx) d x = (1/k)[ (sinkx − icoskx)]_a^b = (1/ik)[(coskx+isinkx)]_a^b = (1/ik)[e^ikx]_a^b = (1/ik)(e^ikb−e^ika). Note that this is exactly the result you would have got by treating i as a real constant and using the usual formula for integrating e^ax. Note also that the cases k=0 and k≠0 have to be treated separately: this is typical.

Definition 1 Let f:ℝ→ℂ be a 2π-periodic function which is Riemann integrable on [−π, π]. For each n∈ℤ we define the Fourier coefficient f(n) by

f(n) =

2π

∫

−π

f(x) e^−inx d x .

Remark 2

f(n) is a complex number whose modulus is the amplitude and whose argument is the phase (of that component of the original function).
If f and g are Riemann integrable on an interval, then so is their product, so the integral is well-defined.
The constant before the integral is to divide by the length of the interval.
We could replace the range of integration by any interval of length 2π, without altering the result, since the integrand is 2π-periodic.
Note the minus sign in the exponent of the exponential. The reason for this will soon become clear.

Example 3

f(x) = c then f(0) =c and f(n) =0 when n≠0.
f(x) = e^ikx, where k is an integer. f(n) = δ_nk.

f is 2π periodic and f(x) = x on ]−π, π]. (Diagram) Then f(0) = 0 and, for n≠0,

f(n) =

2π

∫

−π

xe^−inx d x =

⎡
⎢
⎢
⎣

−xe^−inx

2π in

⎤
⎥
⎥
⎦

−π

2π

∫

−π

e^inx d x =

(−1)ⁿi

Proposition 4 (Linearity) If f and g are 2π-periodic functions and c and d are complex constants, then, for all n∈ℤ,

(c f + d g6) (n) = cf(n) + dĝ(n) .

Corollary 5 If p(x) is a trigonometric polynomial , p(x)= ∑_−k^k c_ne^inx, then p(n) = c_n for |n|≤ k and =0, for |n|≥ k.

p(x) =

∑

n∈ℤ

p(n)e^inx .

This follows immediately from Ex. 3(2) and Prop.4.

Remark 6

This corollary explains why the minus sign is natural in the definition of the Fourier coefficients.
The first part of the course will be devoted to the question of how far this result can be extended to other 2π-periodic functions, that is, for which functions, and for which interpretations of infinite sums is it true that
f(x) =

∑

n∈ℤ

f(n)e^inx . (1)

Definition 7 ∑_n∈ℤ f(n)e^inx is called the Fourier series of the 2π-periodic function f.

For real-valued functions, the introduction of complex exponentials seems artificial: indeed they can be avoided as follows. We work with (1) in the case of a finite sum: then we can rearrange the sum as

f(0) +

∑

n>0

(f(n) e^inx +f(−n)e^−inx)

f(0) +

∑

n>0

[(f(n)+f(−n))cosnx +i(f(n)−f(−n))sin nx]

a₀

∑

n>0

(a_ncosnx +b_nsinnx)

Here

a_n

(f(n)+f(−n)) =

2π

∫

−π

f(x)(e^−inx+e^inx) d x

∫

−π

f(x)cosnx d x

for n>0 and

b_n =i((f(n)−f(−n))=

∫

−π

f(x)sin nx d x

for n>0. a₀ = 1/π∫_−π^πf(x) d x, the constant chosen for consistency.

The a_n and b_n are also called Fourier coefficients: if it is necessary to distinguish them, we may call them Fourier cosine and sine coefficients, respectively.

We note that if f is real-valued, then the a_n and b_n are real numbers and so ℜ f(n) = ℜ f(−n), ℑ f(−n) = −ℑf(n): thus f(−n) is the complex conjugate of f(n). Further, if f is an even function then all the sine coefficients are 0 and if f is an odd function, all the cosine coefficients are zero. We note further that the sine and cosine coefficients of the functions coskx and sinkx themselves have a particularly simple form: a_k=1 in the first case and b_k=1 in the second. All the rest are zero.

For example, we should expect the 2π-periodic function whose value on ]−π,π] is x to have just sine coefficients: indeed this is the case: a_n=0 and b_n=i(f(n)−f(−n)) = (−1)ⁿ⁺¹2/n for n>0.

The above question can then be reformulated as “to what extent is f(x) represented by the Fourier series a₀/2 + ∑_n>0(a_ncosx + b_nsinx)?” For instance how well does ∑(−1)ⁿ⁺¹(2/n)sinnx represent the 2π-periodic sawtooth function f whose value on ]−π, π] is given by f(x) = x. The easy points are x=0, x=π, where the terms are identically zero. This gives the ‘wrong’ value for x=π, but, if we look at the periodic function near π, we see that it jumps from π to −π, so perhaps the mean of those values isn’t a bad value for the series to converge to. We could conclude that we had defined the function incorrectly to begin with and that its value at the points (2n+1)π should have been zero anyway. In fact one can show (ref. ) that the Fourier series converges at all other points to the given values of f, but I shan’t include the proof in this course. The convergence is not at all uniform (it can’t be, because the partial sums are continuous functions, but the limit is discontinuous.) In particular we get the expansion

= 2(1−1/3+1/5−⋯)

which can also be deduced from the Taylor series for tan⁻¹.

0.2 The vibrating string

In this subsection we shall discuss the formal solutions of the wave equation in a special case which Fourier dealt with in his work.

We discuss the wave equation

∂²y

∂ x²

K²

∂²y

∂ t²

, (2)

subject to the boundary conditions

y(0, t) = y(π, t) = 0, (3)

for all t≥0, and the initial conditions

y(x,0)	=	F(x),
y_t(x,0)	=	0.

This is a mathematical model of a string on a musical instrument (guitar, harp, violin) which is of length π and is plucked, i.e. held in the shape F(x) and released at time t=0. The constant K depends on the length, density and tension of the string. We shall derive the formal solution (that is, a solution which assumes existence and ignores questions of convergence or of domain of definition).

0.2.1 Separation of variables

We first look (as Fourier and others before him did) for solutions of the form y(x,t) = f(x)g(t). Feeding this into the wave equation (2) we get

f^′′(x) g(t) =

K²

f(x) g^′′(t)

and so, dividing by f(x)g(t), we have

f^′′(x)

f(x)

K²

g^′′(t)

g(t)

. (4)

The left-hand side is an expression in x alone, the right-hand side in t alone. The conclusion must be that they are both identically equal to the same constant C, say.

We have f^′′(x) −Cf(x) =0 subject to the condition f(0) = f(π) =0. Working through the method of solving linear second order differential equations tells you that the only solutions occur when C = −n² for some positive integer n and the corresponding solutions, up to constant multiples, are f(x) = sinnx.

Returning to equation (4) gives the equation g^′′(t)+K²n²g(t) =0 which has the general solution g(t) = a_ncosKnt + b_nsinKnt. Thus the solution we get through separation of variables, using the boundary conditions but ignoring the initial conditions, are

y_n(x,t) = sinnx(a_n cosKnt + b_n sinKnt) ,

for n≥ 1.

0.2.2 Principle of Superposition

To get the general solution we just add together all the solutions we have got so far, thus

y(x,t) =

∞

∑

n=1

sinnx(a_n cosKnt + b_n sin Knt) (5)

ignoring questions of convergence. (We can do this for a finite sum without difficulty because we are dealing with a linear differential equation: the iffy bit is to extend to an infinite sum.)

We now apply the initial condition y(x,0) = F(x) (note F has F(0) =F(π) =0). This gives

F(x) =

∞

∑

n=1

a_nsinnx .

We apply the reflection trick: the right-hand side is a series of odd functions so if we extend F to a function G by reflection in the origin, giving

G(x):=

⎧
⎨
⎩

F(x)	, if 0≤ x≤π;
−F(−x)	, if −π<x<0.

we have

G(x) =

∞

∑

n=1

a_nsinnx ,

for −π≤ x ≤ π.

If we multiply through by sinrx and integrate term by term, we get

a_r =

∫

−π

G(x)sinrx d x

so, assuming that this operation is valid, we find that the a_n are precisely the sine coefficients of G. (Those of you who took Real Analysis 2 last year may remember that a sufficient condition for integrating term-by -term is that the series which is integrated is itself uniformly convergent.)

If we now assume, further, that the right-hand side of (5) is differentiable (term by term) we differentiate with respect to t, and set t=0, to get

0=y_t(x,0) =

∞

∑

n=1

b_n K n sinnx. (6)

This equation is solved by the choice b_n=0 for all n, so we have the following result

Proposition 8 (Formal) Assuming that the formal manipulations are valid, a solution of the differential equation (2) with the given boundary and initial conditions is

y(x,t) =

∞

∑

a_n sinnx cosKnt ,(2.11)

where the coefficients a_n are the Fourier sine coefficients

a_n =

∫

−π

G(x)sinnx d x

of the 2π periodic function G, defined on ]−π, π] by reflecting the graph of F in the origin.

Remark 9 This leaves us with the questions

For which F are the manipulations valid?
Is this the only solution of the differential equation? (which I’m not going to try to answer.)
Is b_n=0 all n the only solution of (6)? This is a special case of the uniqueness problem for trigonometric series.

0.3 Historic: Joseph Fourier

Joseph Fourier, Civil Servant, Egyptologist, and mathematician, was born in 1768 in Auxerre, France, son of a tailor. Debarred by birth from a career in the artillery, he was preparing to become a Benedictine monk (in order to be a teacher) when the French Revolution violently altered the course of history and Fourier’s life. He became president of the local revolutionary committee, was arrested during the Terror, but released at the fall of Robespierre.

Fourier then became a pupil at the Ecole Normale (the teachers’ academy) in Paris, studying under such great French mathematicians as Laplace and Lagrange. He became a teacher at the Ecole Polytechnique (the military academy).

He was ordered to serve as a scientist under Napoleon in Egypt. In 1801, Fourier returned to France to become Prefect of the Grenoble region. Among his most notable achievements in that office were the draining of some 20 thousand acres of swamps and the building of a new road across the alps.

During that time he wrote an important survey of Egyptian history (“a masterpiece and a turning point in the subject”).

In 1804 Fourier started the study of the theory of heat conduction, in the course of which he systematically used the sine-and-cosine series which are named after him. At the end of 1807, he submitted a memoir on this work to the Academy of Science. The memoir proved controversial both in terms of his use of Fourier series and of his derivation of the heat equation and was not accepted at that stage. He was able to resubmit a revised version in 1811: this had several important new features, including the introduction of the Fourier transform. With this version of his memoir, he won the Academy’s prize in mathematics. In 1817, Fourier was finally elected to the Academy of Sciences and in 1822 his 1811 memoir was published as “Théorie de la Chaleur”.

For more details see Fourier Analysis by T.W. Körner, 475-480 and for even more, see the biography by J. Herivel Joseph Fourier: the man and the physicist.

What is Fourier analysis. The idea is to analyse functions (into sine and cosines or, equivalently, complex exponentials) to find the underlying frequencies, their strengths (and phases) and, where possible, to see if they can be recombined (synthesis) into the original function. The answers will depend on the original properties of the functions, which often come from physics (heat, electronic or sound waves). This course will give basically a mathematical treatment and so will be interested in mathematical classes of functions (continuity, differentiability properties).

1 Basics of Metric Spaces

1.1 Metric Spaces

1.1.1 Metric spaces: definition and examples

In Analysis and Calculus the definition of convergence was based on the notion of a distance between points, namely the standard distance between two real numbers is given by

d(x,y)=|x−y|.

Similarly, the distance between two points in the plane, given by

d(x,y)=d((x₁,x₂),(y₁,y₂))=

√

(x₁−y₁)²+(x₂−y₂)²

A metric space formalises this notion. This will give us the flexibility to talk about distances on function spaces, for example, or introduce other notions of distance on spaces.

Definition 1 (Metric Space) A metric space (X,d) is a set X together with a function d: X × X → ℝ that satisfies the following properties

d(x,y) ≥ 0; and d(x,y)=0 ⇐⇒ x=y (positive definite);
d(x,y)=d(y,x) (symmetric);
d(x,z) ≤ d(x,y)+d(y,z) (triangle inequality).

The function d is called the metric. The word distance will be used interchangeably with the same meaning.

In this course we are assuming that metric spaces are non-empty.

Example 2

X=ℝ. The standard metric is given by d₁(x,y)=|x−y|. There are many other metrics on ℝ, for example
d(x,y)=|e^x−e^y|;

d(x,y)=
⎧
⎨
⎩
|x−y| if |x−y| ≤ 1,

1 if |x−y| ≥ 1.
Let X be any set whatsoever, then we can define the discrete metric
d(x,y) =
⎧
⎨
⎩
1 if x ≠ y,

0 if x=y.
X=ℝ^m. The standard metric is the Euclidean metric: if x=(x₁,x₂,…,x_m) and y=(y₁,y₂,…,y_m) then
d₂(x,y)= √

(x₁−y₁)²+(x₂−y₂)²+…+(x_m−y_m)²

.

This is linked to the inner-product (scalar product), x ·y=x₁ y₁+ x₂ y₂+…+x_m y_m, since it is just √(x−y).(x−y). We will study inner products more carefully later, so for the moment we won’t prove the (well-known) fact that it is indeed a metric.
Other possible metrics include
d_∞(x,y)=max{|x₁−y₁|,|x₂−y₂|,…,|x_m−y_m|}.

Another metric on ℝ^m comes from the generalisation of our first example:
d₁(x,y)=|x₁−y₁|+|x₂−y₂|+…+|x_m−y_m| .

These metrics d₁, d₂, d_∞ are all translation-invariant (i.e., d(x+z,y+z)=d(x,y)), and positively homogeneous (i.e., d(kx,ky)=|k|d(x,y)), see Ex. 8 for further discussion.
Take X=C[a,b]. Here are three metrics similar to above ones:
d₂(f,g)=

√

b

∫

a

| f(x)−g(x) |²   d x

.

Again, this is linked to the idea of an inner product, so we will delay proving that it is a metric.
d₁(f,g)=
b

∫

a

|f(x)−g(x)|   d x,

the area between two graphs
d_∞(f,g)=max{ |f(x)−g(x)|: a ≤ x ≤ b},

the maximum vertical separation between two graphs.

Example 3 On C[0,1] take f(x)=x and g(x)=x² and calculate

d₂(f,g)

⎛
⎜
⎜
⎝

∫

(x−x²)²   d x

⎞
⎟
⎟
⎠

1/2

√

1/30

d₁(f,g)

∫

|x−x²|   d x = 1/6, and

d_∞(f,g)

max

x ∈ [0,1]

|x−x²|=1/4.

Remark 4 Any subset of a metric space is again a metric space its own right, by restricting the distance function to the subset.

Example 5

The interval [a,b] with d(x,y)=|x−y| is a subspace of ℝ.
The unit circle {(x₁,x₂) ∈ ℝ²: x₁²+x₂²=1 } with d₂(x,y)=√(x₁−y₁)²+(x₂−y₂)² is a subspace of ℝ².
The space of polynomials P is a metric space with any of the metrics inherited from C[a,b] above.

Definition 6 A normed space (V,||· ||) is a real vector space V with a map ||·||: V → ℝ (called norm) satisfying

||v|| ≥ 0, and (||v||=0 ⇔ v=0),
||λ v|| = |λ | || v|| ,
||v+w|| ≤ ||v||+ ||w||.

Exercise 7 Prove that V is a metric space with metric d(v,w):=||v−w||.

Exercise 8

Write norms ||·||₁, ||·||₂, ||·||_∞ on ℝ^m which produces metrics d₁, d₂, d_∞ from Ex. 2.2(3).
Hint: see (11) and (9) below.

Show, that the following are norms on the vector space V=C[a,b]:

|| f ||₁

∫

|f(x)|   d x,

|| f ||₂

∫

|f(x)|²   d x,

|| f ||_∞

sup

x∈[a,b]

|f(x)|.

Furthermore, these norms generate the respective metrics d₁, d₂ and d_∞ from Ex. 2(2(4)) as indicated in the previous exercise.

Definition 9 An inner product space (V,⟨·, ·⟩) is a real vector space V with a map ⟨·, ·⟩: V × V → ℝ (called inner product) satisfying

⟨ λ v,w ⟩ = λ ⟨ v,w ⟩,
⟨ v₁ +v₂ ,w ⟩ = ⟨ v₁,w ⟩+⟨ v₂,w ⟩,
⟨ v,w ⟩ = ⟨ w,v ⟩,
⟨ v,v ⟩ ≥ 0, and (⟨ v,v ⟩ =0 ⇔ v=0).

Exercise 10

Prove that the Cauchy–Schwarz inequality | ⟨ v,w ⟩ |² ≤ ⟨ v,v ⟩ ⟨ w,w ⟩ holds.
Hint: start by considering the expression ⟨ v + λ w ,v + λ w⟩ ≥ 0 and analyse the discriminant of the quadratic expression for λ.
Then prove that V is a normed space with norm || v||:= ⟨ v,v ⟩^1/2.
Which of the above norms ||·||₁, ||·||₂, ||·||_∞ from Ex. 8 can be obtained from an inner product as described in the previous item?

There is a natural name for a class of maps, which preserve metrics:

Definition 11 (Isometry) Let (X, d_X) and (Y, d_Y) be two metric spaces. A map φ: X → Y is an isometry if

d_Y(φ(x₁), φ(x₂)) = d_X(x₁, x₂) for all x₁, x₂ ∈ X.

A metric space (X,d_X) is isometric to a metric space (Y,d_Y) if there is an isometry bijection between X and Y.

1.1.2 Open and closed sets

Definition 12 (Open and closed balls) Let (X,d) be a metric space, let x ∈ X and let r>0. The open ball centred at x, with radius r, is the set

B_r(x)={y ∈ X: d(x,y)<r },

and the closed ball is the set

B_r(x)

={y ∈ X: d(x,y) ≤ r }.

Trivial but useful observations are:

x ∈ B_r(x) ⊂ B_r(x) for all x∈ X and r>0, so neither ball is empty and every point is covered by all balls it centres;
B_r(x) ⊂ B_r+ε(x) for all x∈ X, r>0 and whatever small ε>0.

Note, that in ℝ with the usual metric the open ball is B_r(x)=(x−r,x+r), an open interval, and the closed ball is B_r(x)=[x−r,x+r], a closed interval.

For the d₂ metric on ℝ², the unit ball, B₁(0), is disc centred at the origin, excluding the boundary. You may like to think about what you get for other metrics on ℝ². What are balls in the discrete metric, Ex. 2.2(2)?

Definition 13 (Open sets) A subset U of a metric space (X,d) is said to be open , if for each point x ∈ U there is an r>0 such that the open ball B_r(x) is contained in U (“room to swing a cat").

Clearly X itself is an open set, that is the whole metric space is open in itself. Also the empty set ∅ is also considered to be open in a trivial way.

Remark 14 Note that the property “be open” of a set depends on the metric space. For example if we consider the set [0,1] it is open in the metric space [0,1] with the standard metric, but not open in the set ℝ with standard metric.

Proposition 15 Every “open ball" B_r(x) is an open set.

Proof. For if y ∈ B_r(x), choose δ=r−d(x,y). We claim that B_δ(y) ⊂ B_r(x).

If z ∈ B_δ(y), i.e., d(z,y)<δ, then by the triangle inequality

d(z,x) ≤ d(z,y)+d(y,x) < δ + d(x,y) = r.

So z ∈ B_r(x). □

Definition 16 (Closed set) A subset F of (X,d) is said to be closed , if its complement X ∖ F is open.

Note that closed does not mean “not open". In a metric space the sets ∅ and X are both open and closed. In ℝ we have:

(a,b) is open.
[a,b] is closed, since its complement (−∞,a) ∪ (b,∞) is open.
[a,b) is not open, since there is no open ball B(a,r) contained in the set. Nor it is closed, since its complement (−∞,a) ∪ [b,∞) isn’t open (no ball centred at b can be contained in the set).

Remark 17 As it can be seen from the definitions the property of a subset F to be open or closed depends from the surrounding space X. For example:

The interval [0,1) is open as a subset of the space [0,2] and is not open as a subset of ℝ (both are taken with the usual metric).
The same interval [0,1) is closed as a subset of the space (−1,1) and is not open as subset ℝ (again, both are taken with the usual metric).

Example 18 If we take the discrete metric,

d(x,y)=

⎧
⎨
⎩

1	if x ≠ y,
0	if x=y,

then each point {x}=B_1/2(x) so is an open set. Hence every set U is open, since for x ∈ U we have B_1/2(x) ⊆ U. Hence, by taking complements, every set is also closed.

Theorem 19 In a metric space, every one-point set {x₀} is closed.

Proof. We need to show that the set U={x ∈ X: x ≠ x₀} is open, so take a point x ∈ U. Now d(x,x₀)>0, and the ball B_r(x) is contained in U for every 0<r< d(x,x₀). □

Theorem 20 Let (U_α)_{α ∈ A} be any collection of open subsets of a metric space (X,d) (not necessarily finite!). Then ∪_{α ∈ A} U_α is open. Let U and V be open subsets of a metric space (X,d). Then U ∩ V is open. Hence (by induction) any finite intersection of open subsets is open.

Proof. If x ∈ ∪_{α ∈ A} U_α then there is an α with x ∈ U_α. Now U_α is open, so B_r(x) ⊂ U_α for some r>0. Then B_r(x) ⊂ ∪_{α ∈ A} U_α so the union is open.

If now U and V are open and x ∈ U ∩ V, then ∃ r>0 and s>0 such that B_r(x) ⊂ U and B(x,s) ⊂ V, since U and V are open. Then B(x,t) ⊂ U ∩ V if t ≤ min(r,s). □

Remark 21 Here we used a common property, which is helpful to remember: the minimum of a finite set of positive numbers is always positive (bigger than 0). However, the infimum of an infinite set of positive numbers can be zero, e.g. inf{1/n: n∈ ℕ}=0. Therefore, a transition from a given infinite set to a suitable finite set will be a reacquiring theme in our course, cf. compact set later in the course.

Thereafter, the collection of open sets is preserved by arbitrary unions and finite intersections.

However, an arbitrary intersection of open sets is not always open; for example (−1/n,1/n) is open for each n=1,2,3,…, but ∩_n=1^∞(−1/n,1/n)= {0}, which is not an open set.

For closed sets we swap union and intersection.

Theorem 22 Let (F_α)_{α ∈ A} be any collection of closed subsets of a metric space (X,d) (not necessarily finite!). Then ∩_{α ∈ A} F_α is closed. Let F and G be closed subsets of a metric space (X,d). Then F ∪ G is closed. Hence (by induction) any finite union of closed subsets is closed.

Proof.

To prove this we recall de Morgan’s laws. We use the notation S^c for the complement X ∖ S of a set S ⊂ X.

x ∉

∪

A_α

⇐⇒

x ∉A_α for all α, so (∪A_α)^c = ∩A_α^c.

x ∉

∩

A_α

⇐⇒

x ∉A_α for some α, so (∩A_α)^c = ∪A_α^c.

Write U_α= F_α^c =X ∖ F_α which is open. So ∪_{α ∈ A} U_α is open by Theorem 20. Now, by de Morgan’s laws, (∩_{α ∈ A} F_α)^c = ∪_{α ∈ A} F_α^c. This is just ∪_{α ∈ A} U_α. Since the complement of ∩_{α ∈ A} F_α is open, it is closed.

Similarly, the complement of F ∪ G is F^c ∩ G^c, which is the intersection of two open sets and hence open by Theorem 20. Hence F ∪ G is closed. □

Infinite unions of closed sets do not need to be closed. An example is

∞

∪

n=1

[

,∞)=(0,∞),

which is open but not closed in ℝ with standard metric.

Definition 23 (Closure of a set) The closure of S, written S, is the smallest closed set containing S, and is contained in all other closed sets containing S.

The above smallest closed set containing S does exist, because we can define

= ∩{F: F ⊃ S and F closed },

the intersection of all closed sets containing S. There is at least one closed set containing S, namely X itself.

Example 24 In the metric space ℝ the closure of S=[0,1) is [0,1]. This is closed, and there is nothing smaller that is closed and contains S.

Exercise 25 Give an example of an open ball B_r(x) and the respective closed ball B_r(x) with the same centre and radius in a metric space X, such that B_r(x) is not the closure of B_r(x). Note the slight discontent on our notations, which shall not mislead us in future.

Definition 26 (Dense subset) Let X be a metric space. A subset S⊂ X is dense in X if S=X.

More generally, we say that a subset S of a set Y⊂ X is dense in Y if every point of Y is the limit of some convergent sequence consisting of elements of S.

Theorem 27 The set ℚ of rationals is dense in ℝ, with the usual metric.

Proof. Suppose that F is a closed subset of ℝ which contains ℚ: we claim that it F=ℝ.

For U=ℝ ∖ F is open and contains no points of ℚ. But an open set U (unless it is empty) must contain an interval B_r(x) for some x ∈ U, and hence a rational number within it.

Our only conclusion is that U=∅ and F=ℝ, so that ℚ=ℝ. □

Definition 28 (Neighbourhood) We say that V is a neighbourhood (nbh) of x if there is an open set U such that x ∈ U ⊆ V; this means that ∃ δ>0 s.t. B_δ(x) ⊆ V. Thus, a set is open precisely when it is a neighbourhood of each of its points.

Example 29 The half-open interval [0,1) is a neighbourhood of every point in it except for 0.

Theorem 30 For a subset S of a metric space X, we have x∈ S iff V ∩ S ≠ ∅ for all nhds V of x (i.e., all neighbourhoods of x meet S).

Proof. If there is a neighbourhood of x that doesn’t meet S, then there is an open subset U with x ∈ U and U ∩ S=∅.

But then X ∖ U is a closed set containing S and so S ⊂ X ∖ U, and then x ∉ S because x ∈ U.

Conversely, if every neighbourhood of x does meet S, then x ∈ S, as otherwise X ∖ S is as open neighbourhood of x that doesn’t meet S. □

Definition 31 (Interior) The interior of S, intS, is the largest open set contained in S, and can be written as

intS = ∪{ U: U ⊂ S and U open }.

the union of all open sets contained in S. There is at least open set within S, namely ∅.

We see that S is open exactly when S=intS, otherwise intS is smaller.

Example 32

In the metric space ℝ we have int[0,1)=(0,1); clearly this is open and there is no larger open set contained in [0,1).
intℚ = ∅. For any non-empty open set must contain an interval B_r(x) and then it contains an irrational number, so isn’t contained in ℚ.

Proposition 33 intS=X ∖ (X∖ S).

Proof. By De Morgan’s laws,

intS

∪{ U: U ⊂ S and U open }

X ∖ ∩{U^c: U ⊂ S and U open }

X ∖ ∩{F: F ⊃ (X ∖ S) and F closed }

X ∖ (

X∖ S

This is because U ⊂ S if and only if U^c= (X ∖ U) ⊃ (X ∖ S). Also F=U^c is closed precisely when U is open. That is, there is a correspondence between open sets contained in S and closed sets containing its complement. □

1.1.3 Convergence and continuity

Let (x_n) be a sequence in a metric space (X,d), i.e., x₁,x₂,…. (Sometimes we may start counting at x₀.)

Definition 34 (Convergence) We say x_n → x (i.e., x_n converges to x) if d(x_n,x) → 0 as n → ∞.

In other words: x_n → x if for any ε>0 there exists N∈ℕ such that for all n>N we have d(x,x_n) < ε.

This is the usual notion of convergence if we think of points in ℝ^d with the Euclidean metric.

Theorem 35 Let (x_n) be a sequence in a metric space (X,d). Then the following are equivalent:

x_n → x;
for every open U with x ∈ U, there exists an N>0 such that (n>N) x_n ∈ U;
for every ε>0 there exists an N>0 such that (n>N) x_n ∈ B_ε(x).

Proof. 35(1) ⇒ 35(2) If x_n → x and x ∈ U, then there is a ball B_ε(x) ⊂ U, since U is open. But x_n → x so d(x_n,x) < ε for n sufficiently large, i.e., x_n ∈ U for n sufficiently large.

35(2) ⇒ 35(3) is obvious.

Finally, 35(3) ⇒ 35(1). If the 35(3) condition works for a given ε>0 and large n the inclusion x_n ∈ B_ε(x) implies d(x_n,x)<ε. □

Theorem 36 Let S be a subset of the metric space X. Then x ∈ S if and only if there is a sequence (x_n) of points of S with x_n → x.

Proof. If x ∈ S, then for each n we have B_1/n(x) ∩ S ≠ ∅ by Theorem 30. So choose x_n ∈ B_1/n(x) ∩ S. Clearly d(x_n,x) → 0, i.e., x_n → x.

Conversely, if x ∉S, then there is a neighbourhood U of x with U ∩ S=∅. Now no sequence in S can get into U so it cannot converge to x. □

This can also be phrased as follows, characterising closed set in terms of sequences.

Corollary 37 (Closedness under taking limits) A subset Y ⊂ X of a metric space (X,d) is closed if and only if for every sequence (x_n) in Y that is convergent in X its limit is also in Y.

Hence, the closure S is obtained from S by adding all possible limit points of sequences in S.

Example 38

Take (ℝ²,d₁), where d₁(x,y)=|x₁−y₁|+|x₂−y₂|, where x=(x₁,x₂) and y=(y₁,y₂), and consider the sequence (1/n,2n+1/n+1). We guess its limit is (0,2). To see if this is right, look at
d₁ ⎛
⎜
⎜
⎝ ⎛
⎜
⎜
⎝
1

n

,
2n+1

n+1

⎞
⎟
⎟
⎠ ,(0,2) ⎞
⎟
⎟
⎠ = ⎪
⎪
⎪
⎪
1

n

⎪
⎪
⎪
⎪ + ⎪
⎪
⎪
⎪
2n+1

n+1

−2 ⎪
⎪
⎪
⎪ =
1

n

+
1

n+1

→ 0

as n → ∞. So the limit is (0,2).
In C[0,1] let f_n(t)=tⁿ and f(t)=0 for 0 ≤ t ≤ 1. Does f_n → f, (a) in d₁, and (b) in d_∞?
(a)
d₁(f_n,f)=
1

∫

0

tⁿ   dt =
1

n+1

→ 0

as n → ∞. So f_n → f in d₁.
(b)
d_∞(f_n,f)=max{tⁿ: 0 ≤ t ≤ 1}=1 ¬→0

as n → ∞. So f_n ¬→f in d_∞.
Note: Say g_n → g pointwise on [a,b] as n → ∞ if g_n(x) → g(x) for all x ∈ [a,b]. If we define g(x)= {
0 for 0 ≤ x < 1,

1 for x=1,

then f_n → g pointwise on [0,1]. But g ∉C[0,1], as it is not continuous at 1.
Take the discrete metric
d₀(x,y)=
⎧
⎨
⎩
1 if x ≠ y,

0 if x=y.

Then x_n → x ⇐⇒ d₀(x_n,x) → 0. But since d₀(x_n,x)=0 or 1, this happens if and only if d₀(x_n,x)=0 for n sufficiently large. That is, there is an n₀ such that x_n=x for all n ≥ n₀.
All convergent sequences in this metric are eventually constant. So, for example d₀(1/n,0) ¬→0.

A result on convergence in ℝ^m.

Proposition 39 Take ℝ² with any of the metrics d₁, d₂ and d_∞. Then a sequence x_n=(a_n,b_n) converges to x=(a,b) if and only if a_n → a and b_n → b.

Proof. A useful observation is that for any x_n and x:

d₁(x_n,x) ≥ d₂(x_n,x) ≥ d_∞(x_n,x).

If a_n → a and b_n → b, then for any ε>0 there are N_a and N_b such that for N> N_a we have | a_n−a |<ε/2 and for n>N_b | b_n−b |<ε/2. Thus for any n > N=max(N_a, N_b):

ε >

⎪
⎪

a_n−a

⎪
⎪

b_n−b

⎪
⎪

= d₁(x_n,x) ≥ d₂(x_n,x) ≥ d_∞(x_n,x) ,

which shows the convergence in all three metrics.

To show the opposite, WLOG assume towards a contradiction that a_n ¬→a, that is, there exits ε>0 such that for any N there exists n>N such that | a_n−a |>ε. Then:

d₁(x_n,x) ≥ d₂(x_n,x) ≥ d_∞(x_n,x)= max{

⎪
⎪

a_n−a

⎪
⎪

b_n−b

⎪
⎪

a_n−a

⎪
⎪

>ε

showing the divergence in all three norms.

□

A similar result holds for ℝ^m in general.

Now let’s look at continuous functions again.

Theorem 40 If f_n → f in (C[a,b],d_∞), then f_n → f in (C[a,b],d₁).

Informally speaking, d_∞ convergence is stronger than d₁ convergence.

Proof. d_∞(f_n,f)=max{|f_n(x)−f(x)|:   a ≤ x ≤ b} → 0 as n → ∞, so, given ε>0 there is an N so that d_∞(f_n,f)<ε for n ≥ N. It follows that if n ≥ N then

d₁(f_n,f) =

∫

|f_n(x)−f(x)|   d x ≤

∫

ε   d x = ε(b−a),

so d₁(f_n,f) → 0 as n → ∞. □

Remark 41 It is also true that if d_∞(f_n,f) → 0 then f_n → f point-wise on [a,b]. The converse is false, cf. 38(38(2)).

Now we look at continuous functions between general metric spaces.

Definition 42 (Continuity) Let f: (X,d_X) → (Y,d_Y) be a map between metric spaces. We say that f is continuous at x ∈ X if for each ε>0 there is a δ_ε,x>0 such that d_Y(f(x′),f(x)) < ε for all x′∈ X whenever d_X(x′,x) < δ_ε,x.

Another way of saying the same is that for every ε>0 there exists a δ>0 such that

f(B_δ(x)) ⊂ B_ε(f(x)).

The map f is continuous, if it is continuous at all points of X.

Theorem 43 (Sequential continuity) For f as above, f is continuous at a if and only if, whenever a sequence x_n → a, then f(x_n) → f(a).

In short, f is continuous at a if and only if f permutes with the limit:

⎛
⎜
⎜
⎝

lim

n→ ∞

x_n

⎞
⎟
⎟
⎠

lim

n→ ∞

⎛
⎝

x_n

⎞
⎠

(7)

for any sequence x_n → a.

Proof. Same proof as in real analysis, more or less. If f is continuous at a and x_n → a, then for each ε>0 we have a δ>0 such that d_Y(f(x),f(a)) < ε whenever d_X(x,a) < δ.

Then there’s an n₀ with d(x_n,a)<δ for all n ≥ n₀, and so d(f(x_n),f(a))<ε for all n ≥ n₀. Thus f(x_n) → f(x).

Conversely, if f is not continuous at a, then there is an ε for which no δ will do, so we can find x_n with d(x_n,a)<1/n, but d(f(x_n),f(a)) ≥ ε. Then x_n → a but f(x_n) ¬→f(a). □

But there is a nicer way to define continuity. For a mapping f: X → Y and a set U ⊂ Y, let f⁻¹(U) be the set, called pre-image or inverse image

f⁻¹(U)={ x ∈ X: f(x) ∈ U }.

This makes sense even if f⁻¹ is not defined as a function.

Theorem 44 (Continuity and open sets) A function f: X → Y is continuous if and only if f⁻¹(U) is open in X for every open subset U ⊂ Y. In short: the inverse image of an open set is open.

Proof. Suppose that f is continuous, that U⊂ Y is open, and that x₀ ∈ f⁻¹(U), so f(x₀) ∈ U. Now there is a ball B_ε(f(x₀)) ⊂ U, since U is open, and then by continuity there is a δ>0 such that d_Y(f(x),f(x₀)) < ε whenever d_X(x,x₀) < δ. This means that for d(x,x₀)<δ, f(x) ∈ U and so x ∈ f⁻¹(U). That is, f⁻¹(U) is open.

Conversely, if the inverse image of an open set is open, and x₀ ∈ X, let ε>0 be given. We know that B_ε(f(x₀)) is open, so f⁻¹(B(f(x₀),ε)) is open, and contains x₀. So it contains some B_δ(x₀) with δ>0.

But now if d(x,x₀)<δ, we have x ∈ B_δ(x₀) ⊂ f⁻¹(B_ε(f(x₀))) so f(x) ∈ B_ε(f(x₀)) and we have d(f(x),f(x₀))<ε. □

Remark 45 Note that for f continuous we do not expect f(U) to be open for all open subsets of X, for example f: ℝ → ℝ, f ≡ 0, then f(ℝ)={0}, not open.

Example 46 Let X=ℝ with the discrete metric, and Y any metric space. Then all functions f: X → Y are continuous! Indeed, in either way:

Because the inverse image of an open set is an open set, since all sets are open.
Because whenever x_n → x₀ we have x_n=x₀ for n large, so obviously f(x_n) → f(x₀).

Exercise 47 Which functions from a metric space X to the discrete metric space are continuous? Which function from the discrete metric space to ℝ are continuous?

Proposition 48 Let X and Y be metric spaces.

A function f : X → Y is continuous if and only if f⁻¹(F) is closed whenever F is a closed subset of Y.
If f: X → Y and g: Y → Z are continuous, then so is the composition g ∘ f: X → Z defined by (g ∘ f)(x) = g(f(x)).

Proof.

We can do this by complements, as if F is closed, then U=F^c is open, and f⁻¹(F)=f⁻¹(U)^c (a point is mapped into F if and only if it isn’t mapped into U).
Then f⁻¹(F) is always closed when F is closed ⇐⇒ f⁻¹(U) is always open when U is open.
Take U ⊂ Z open; then (g ′ f)⁻¹(U) = f⁻¹(g⁻¹(U)); for these are the points which map under f into g⁻¹(U) so that they map under g ′ f into U.
Now g⁻¹(U) is open in Y, as g is continuous, and then f⁻¹(g⁻¹(U)) is open in X since f is continuous.

□

In many cases we may need a stronger notion.

Definition 49 (Uniform continuity) A function f: (X,d_X) → (Y, d_Y) is called uniformly continuous if for each ε>0 there exists δ_ε>0 such that whenever x,x′∈ X satisfy d_X(x,x′)≤δ_ε, we have that d_Y(f(x),f(x′))≤ε.

Note, that here the same δ_ε shall work for all x∈ X. Thus any uniformly continuous function is continuous at every point. On the other hand the function f(x)=1/x on (0,1) is continuous but not uniformly continuous.

1.2 Useful properties of metric spaces

Metric spaces may or may not have some useful properties which we are discussing in the following subsections: completeness and compactness.

1.2.1 Cauchy sequences and completeness

Recall that if (X,d) is a metric space, then a sequence (x_n) of elements of X converges to x∈ X if d(x_n,x) → 0, i.e., if given ε>0 there exists N such that d(x_n,x)< ε whenever n ≥ N. Thus, to show that a sequence is convergent from the definition we need to present its limit x which may not belong to the sequence (x_n). It would be convenient to deduce convergence of (x_n) just through its own properties without a reference to extraneous x. This is possible for complete metric spaces studied in this subsection.

Often we think of convergent sequences as ones where x_n and x_m are close together when n and m are large. This is almost, but not quite, the same thing in a general metric space.

Definition 50 (Cauchy Sequence) A sequence (x_n) in a metric space (X,d) is a Cauchy sequence if for any ε>0 there is an N such that d(x_n,x_m)<ε for all n, m ≥ N.

Example 51 Take x_n=1/n in ℝ with the usual metric. Now d(x_n,x_m)=|1/n−1/m|. Suppose that n and m are both at least as big as N; then d(x_n,x_m) ≤ 1/N. Hence if ε>0 and we take N>1/ε, we have d(x_n,x_m)≤ 1/N <ε whenever n and m are both ≥ N.

In fact all convergent sequences are Cauchy sequences, by the following result.

Theorem 52 Suppose that (x_n) is a convergent sequence in a metric space (X,d), i.e., there is a limit point x such that d(x_n,x) → 0. Then (x_n) is a Cauchy sequence.

Proof. Take ε>0. Then there is an N such that d(x_n,x)<ε/2 whenever n ≥ N. Now suppose both n≥ N and m ≥ N. Then

d(x_n,x_m) ≤ d(x_n,x)+d(x,x_m) = d(x_n,x)+d(x_m,x) < ε/2+ε/2=ε,

and we are done. □

Proposition 53 Every subsequence of a Cauchy sequence is a Cauchy sequence.

Proof. If (x_n) is Cauchy and (x_{n_k}) is a subsequence, then given ε>0 there is an N such that d(x_n,x_m) < ε whenever n, m ≥ N. Now there is a K such that n_k ≥ N whenever k ≥ K. So d(x_{n_k},x_{n_l})<ε whenever k, l ≥ K. □

Does every Cauchy sequence converge?

Example 54

(X,d)=ℚ, as a subspace of ℝ with the usual metric. Take x₀=2 and define x_n+1=x_n/2+1/x_n. The sequence continues 3/2, 17/12, 577/408,… and indeed the sequence converges in ℝ as x_n → x where x=x/2+1/x, i.e., x²=2. But this isn’t in ℚ.
Thus (x_n) is Cauchy in ℝ, since it converges to √2 when we think of it as a sequence in ℝ. So it is Cauchy in ℚ, but doesn’t converge to a point of ℚ.
Easier. Take (X,d)=(0,1). Then (1/n) is a Cauchy sequence in X (since it is Cauchy in ℝ, as seen above), and has no limit in X.

In each case there are “points missing from X”.

Definition 55 (Completeness) A metric space (X,d) is complete if every Cauchy sequence in X converges to a limit in X.

Theorem 56 The metric space ℝ is complete.

Remark 57 In parts of the literature ℝ is simply defined as the completion of ℚ. In this case one does not have to prove that ℝ is complete, but it is complete by construction. One then has to work a bit to show that it is also a field.

This is a result from the first year. Since its proof depends on the definition of ℝ we will not demonstrate it here.

Example 58

Open intervals in ℝ are not complete; closed intervals are complete.

What about C[a,b] with d₁, d₂ or d_∞?

Following our consideration in Ex. 38.38(2), define f_n in C[0,2] by

f_n(x)=

⎧
⎨
⎩

xⁿ	for 0 ≤ x ≤ 1,
1	for 1 ≤ x ≤ 2.

[DIAGRAM]

Then

d₁(f_n,f_m)

∫

|f_n(x)−f_m(x)|   d x

∫

|xⁿ−x^m|   d x

∫

(x^m−xⁿ)   d x if n ≥ m

m+1

−

n+1

≤

m+1

→ 0,

and hence (f_n) is Cauchy in (C[0,2],d₁). Does the sequence converge?

If there is an f ∈ C[0,2] with f_n → f as n → ∞, then ∫₀² |f_n(x)−f(x)|   d x → 0, so ∫₀¹ and ∫₁² both tend to zero. So f_n → f in (C[0,1],d₁), which means that f(x)=0 on [0,1] (from an example we did earlier). Likewise, f=1 on [1,2], which doesn’t give a continuous limit.

Similarly, (C[a,b],d₁) is incomplete in general. Also it is incomplete in the d₂ metric, as the same example shows (a similar calculation with squares of functions). We will see later that it is complete in the d_∞ metric.

Remark 59 Note that ℝ² is also complete with any of the metrics d₁, d₂ and d_∞; since a Cauchy/ convergent sequence (v_n)=(x_n,y_n) in ℝ² is just one in which both (x_n) and (y_n) are Cauchy/ convergent sequences in ℝ (cf. Prop. 39).

Similar arguments show that ℝ^k is also complete for k=1,2,3,…, and (with the same proof as for Corollary) all closed subsets of ℝ^k are complete.

If a metric space (X,d) is not complete one can always pass to its abstract completion in the following sense.

Proposition 60 (Abstract completion) Any metric space (X,d) is isometric to a dense subspace of a complete metric space, which is called its abstract completion if (X,d).

Proof.[Sketch of proof] We describe a metric space (X^′,d^′) in which X is isometric to a dense subset. Consider the space X′ of Cauchy sequences of X. We define an equivalence relation ∼ on X′ by

(x_n) ∼ (y_n) ⇔ d(x_n,y_n) → 0.

The set X^′ is defined to be the set of equivalence classes [(x_n)]. It has a well defined metric given by

d^′([(x_n)],[(y_n)]):=

lim

n → ∞

d(x_n,y_n).

One checks easily that this is metric and is well defined (does not depend on the chosen representative x_n of [(x_n)]). Now there is an injective map X → X^′ defined by sending x to the constant sequence (x,x,x,…). This map is an isometry. We can therefore think of (X,d) as a subset of (X^′,d^′). This subset is dense because every Cauchy sequence can be approximated by a sequence of constant sequences. So the only difficult bit in this construction is to show that (X^′,d^′) is complete. We will sketch the construction of a limit here. It turns out that it verifies completeness on a dense set.

Lemma 61 Suppose that (X,d) is a metric space and let Y ⊂ X be a dense set with the property that every Cauchy sequence in Y has a limit in X. Then (X,d) is complete.

Proof. Let (x_n) be a Cauchy sequence in X. Now replace x_n with another sequence y_n in Y such that d(x_n,y_n)<1/n. Then, by the triangle inequality, y_n is again a Cauchy sequence and converges, by assumption, to some x ∈ X. Then also x_n converges to x. □

Let us turn to the proof of completeness of X′. Suppose that (x_n) is a Cauchy sequence in X. Then, in X′ this sequence has the form ((x₁,x₁,…),(x₂,x₂,…),(x₃,x₃,…),…). This sequence has a limit, namely, (x_n) itself. □

Exercise 62 (Extension by continuity) Let (X,d) be a metric space and X₁ be a dense subset of X. Let f: X₁ → Y be a uniformly continuous function to a complete metric space (Y,d′). Show that there is a unique function f′: X → Y which satisfies two properties:

restriction of f′ to X₁ coincides with f, that is f′(x)=f(x) for all x∈ X₁;
f′ is continuous on X.

Furthermore, it can be shown that f′ is uniformly continuous on X. We will call f′ the extension of f by continuity and will often keep the same letter f to denote f′.

There are many important consequences of Ex. 62, in particular the following.

Corollary 63 All abstract completions of a metric space (X,d) are isometric, in other words, the abstract completions is unique up to isometry.

1.2.2 Compactness

Accordingly to a dictionary: compact—closely and firmly united or packed together. For a metric space a meaning of “closely and firmly united” can be defined in several different forms—through open coverings or convergent subsequences—and we will see that these interpretations are equivalent.

An open cover of a metric space (X,d) is a family of open sets (U_α)_{α ∈ I} such that

∪

α ∈ I

U_α=X.

A subcover of a cover is a subset I′ ⊂ I of the index set such that (U_α)_{α ∈ I′} is still a cover.

Definition 64 (Compactness) A metric space (X,d) is called compact if every open cover has a finite subcover.

Informally: a space is compact if any infinite open covering is excessive and can be reduced just to a finite one. An example of a compact set is [0,1] and example of non-compact—all reals or the open interval (0,1). An importance of this concept is clarified by Rem. 21.

Definition 65 (Sequential Compactness) A metric space (X,d) is called sequentially compact if every sequence (x_n)_{n ∈ ℕ} in X has a convergent subsequence.

The limit of a convergent sequence is called the accumulation point of {x_n}. It is instructive to compare the definitions of:

x is the limit of {x_n}:	∀ ε>0 ∃ N ∀ n>N: d(x, x_n) < ε;
x is an accumulation point of {x_n}:	∀ ε>0 ∀ N ∃ n>N: d(x, x_n) < ε.

Thereafter, x is not an accumulation point of {x_n} if for some ε>0 and some N for all subsequent n>N we have d(x, x_n)>ε.

Informally: a space is sequentially compact if there is no room to place infinite number of points sufficiently apart from each other to avoid their condensation to a limit. Taking the sequence x_n=n shows that the set of all reals is not sequentially compact. On the other hand, we know from previous years that bounded closed set in ℝⁿ every sequence has a convergent subsequence. Therefore, bounded closed sets in ℝⁿ are sequentially compact.

Exercise 66 What are compact sets in a discrete metric space? What are sequentially compact sets in a discrete metric space?

Lemma 67 Let (X,d) be a sequentially compact metric space. Then for every ε >0 there exist finitely many points x₁,…,x_n such that {B_ε(x_i)∣ i=1,…,n} is a cover.

Proof. Suppose this were not the case. Then there would exist an ε>0 such that for any finite number of points x₁,…,x_n the collection of balls B_ε(x_i) does not cover, i.e.

∪

i=1

B_ε(x_i) ≠ X.

Starting with n=1 and then inductively adding points that are in the complement of ∪_i=1ⁿ B_ε(x_i) we end up with an infinite sequence of points x_i such that d(x_i,x_k) ≥ ε. This sequence cannot have a Cauchy subsequence (required for convergence) in contradiction with the sequential compactness of X. □

Theorem 68 A metric space (X,d) is compact if and only if it is sequentially compact.

Proof. We show the two directions separately.

Compactness implies sequential compactness: Suppose that X is compact and let (x_i)_{i ∈ ℕ} be a sequence. We want to show that it has a convergent subsequence. Suppose (x_i) did not have a convergent subsequence. Then no point x is an accumulation point. Therefore, for each x ∈ X there exists an ε(x)>0 such that only finitely many i ∈ ℕ for which x_i ∈ B_ε(x). Since (B_ε(x))_{x ∈ X} is an open cover it has a finite subcover, that is a finite number of balls with a finite number of x_i in each. This contradicts to the infinite number of elements in the sequence (x_i).

Sequential compactness implies compactness: This implication is quite tricky. The proof is again by contradiction. Let us assume our space is sequentially compact and there exists a cover U_α that does not have a finite subcover. By the above lemma there are finitely many points x₁,…,x_N₁ such that B₁(x_i) is a cover. Each of the balls B₁(x_i) is covered by U_α as well. Since our cover does not have a finite subcover one of the balls B₁(x_i) does not have a finite subcover. Denote the relevant point x_i by z₁.

Again there are finitely many points x′₁,…,x′_N₂ such that B_1/2(x′_i) is a cover of X. The collection of sets B₁(z₁) ∩ B_1/2(x′_i), with i=1,…,N₂ is also a covering of B₁(z₁). In the same way as before there is at least one of the x′_i (which we will again call z₂), such that B₁(z₁) ∩ B_1/2(z₂) can not be covered by a finite subcover of U_α. Continuing like this we construct a sequence of points z_i such that none of the sets

B₁(z₁) ⋂ B

(z₂) ⋂ … ⋂ B

(z_N)

can be covered by a finite subcover of U_α.

By assumption the sequence (z_i) has a convergent subsequence. Say z is a limit point of that subsequence. Since U_α is an open cover the point z is contained in one of the U_α and of course that means that an open ball B_ε(z) around z is contained in U_α for some ε>0.

Now we show that there exits an N ∈ ℕ such that B_1/N(z_N) is a subset of U_α (this will be the desired contradiction!). Indeed, choose N large enough so that d(z_N,z) + 1/N<ε. Then x ∈ B_1/N(z_N) implies that d(x,z) ≤ d(z_N,z) + d(x,z_N) < d(z_N,z) + 1/N<ε. This means in particular that

B₁(z₁) ⋂ B

(z₂) ⋂ … ⋂ B

(z_N)

is a subset of U_α. Thus, there is a subcover of the set B₁(z₁) ∩ … ∩ B_1/N(z_N) consisting of one element U_α. This is a contradiction as we constructed the sequence of balls in such a way that these sets cannot be covered by a finite number of the U_α. □

Definition 69 (Boundedness) A subset A ⊂ X of a metric space is called bounded if there exists x₀ ∈ X and C>0 such that for all x ∈ A we have d(x₀,x) ≤ C.

Remark 70 One can easily see, using the triangle inequality, that the reference point x₀ can be chosen as any point in X. This means if A ⊂ X is bounded and x₀ ∈ X, then there exist a C>0 such that d(x₀,x) ≤ C for any x ∈ A.

Theorem 71 Suppose that A ⊂ X is a compact subset of a metric space. Then A is closed and bounded.

Proof. First we show A is bounded. Choose any x₀ ∈ X and note that the set B_n(x₀) indexed by n ∈ ℕ is an open cover of A. Hence, there exists a finite sub-cover< B_n₁(x₀),…,B_{n_N}(x₀). Hence, A ⊂ B_C(x₀), where C= max{n₁,…,c_N}. Hence, A is bounded.

Next assume that (x_k) is a sequence in A that converges in X. Since A is compact there exists a subsequence that converges in A. Hence, the limit of x_k must also be in A. Therefore, A is closed. □

The converse of this statement is not correct in general. It is however famously correct in ℝ^m.

Theorem 72 (Heine–Borel) A subset K ⊂ ℝ^m is compact if and only if it is closed and bounded.

Proof. We just need to combine the above statements. We have already shown that compactness implies closedness and boundedness. If K is closed and bounded we know from Analysis that it is sequentially compact. Therefore it is compact. □

As an illustration of further nice properties of compact spaces we mention the following result:

Exercise 73

Any continuous function on a compact set is bounded.
Any continuous function f: K→ X from a compact space K to a metric space X is uniformly continuous.

Remark 74 Note that there are two different sorts of properties of metric spaces:

the first sort of absolute properties can be verified on a metric space itself;
the second sort of relative properties is meaningful only for subsets of another metric spaces. Such a property may be true for X as a subspace of X but false if X is considered as a subspace of a different space Z.

Completeness and compactness are of the first sort, closedness is of the second, cf. Rem 17.

2 Basics of Linear Spaces

A person is solely the concentration of an infinite set of interrelations with another and others, and to separate a person from these relations means to take away any real meaning of the life.

Vl. Soloviev

A space around us could be described as a three dimensional Euclidean space. To single out a point of that space we need a fixed frame of references and three real numbers, which are coordinates of the point. Similarly to describe a pair of points from our space we could use six coordinates; for three points—nine, end so on. This makes it reasonable to consider Euclidean (linear) spaces of an arbitrary finite dimension, which are studied in the courses of linear algebra.

The basic properties of Euclidean spaces are determined by its linear and metric structures. The linear space (or vector space) structure allows to add and subtract vectors associated to points as well as to multiply vectors by real or complex numbers (scalars).

The metric space structure assign a distance—non-negative real number—to a pair of points or, equivalently, defines a length of a vector defined by that pair. A metric (or, more generally a topology) is essential for definition of the core analytical notions like limit or continuity. The importance of linear and metric (topological) structure in analysis sometime encoded in the formula:

Analysis = Algebra + Geometry . (8)

On the other hand we could observe that many sets admit a sort of linear and metric structures which are linked each other. Just few among many other examples are:

The set of convergent sequences;
The set of continuous functions on [0,1].

It is a very mathematical way of thinking to declare such sets to be spaces and call their elements points.

But shall we lose all information on a particular element (e.g. a sequence {1/n}) if we represent it by a shapeless and size-less “point” without any inner configuration? Surprisingly not: all properties of an element could be now retrieved not from its inner configuration but from interactions with other elements through linear and metric structures. Such a “sociological” approach to all kind of mathematical objects was codified in the abstract category theory.

Another surprise is that starting from our three dimensional Euclidean space and walking far away by a road of abstraction to infinite dimensional Hilbert spaces we are arriving just to yet another picture of the surrounding space—that time on the language of quantum mechanics.

The distance from Manchester to Liverpool is 35 miles—just about the mileage in the opposite direction!

A tourist guide to England

2.1 Banach spaces (basic definitions only)

The following definition generalises the notion of distance known from the everyday life.

Definition 1 A metric (or distance function) d on a set M is a function d: M× M →ℝ₊ from the set of pairs to non-negative real numbers such that:

d(x,y)≥0 for all x, y ∈ M, d(x,y)=0 implies x=y .
d(x,y)=d(y,x) for all x and y in M.
d(x,y)+d(y,z)≥ d(x,z) for all x, y, and z in M (triangle inequality ).

Exercise 2 Let M be the set of UK’s cities are the following function are metrics on M:

d(A,B) is the price of 2nd class railway ticket from A to B.
d(A,B) is the off-peak driving time from A to B.

The following notion is a useful specialisation of metric adopted to the linear structure.

Definition 3 Let V be a (real or complex) vector space. A norm on V is a real-valued function, written ||x||, such that

||x||≥ 0 for all x∈ V, and ||x||=0 implies x=0.
||λ x|| = | λ | ||x|| for all scalar λ and vector x.
||x+y||≤ ||x||+||y|| (triangle inequality ).

A vector space with a norm is called a normed space .

The connection between norm and metric is as follows:

Proposition 4 If ||·|| is a norm on V, then it gives a metric on V by d(x,y)=||x−y||.

(a) (b)
Figure 1: Triangle inequality in metric (a) and normed (b) spaces.

Proof. This is a simple exercise to derive items 1(1)–1(3) of Definition 1 from corresponding items of Definition 3. For example, see the Figure 1 to derive the triangle inequality. □

An important notions known from real analysis are limit and convergence. Particularly we usually wish to have enough limiting points for all “reasonable” sequences.

Definition 5 A sequence {x_k} in a metric space (M,d) is a Cauchy sequence , if for every є>0, there exists an integer n such that k,l>n implies that d(x_k,x_l)<є.

Definition 6 (M,d) is a complete metric space if every Cauchy sequence in M converges to a limit in M.

For example, the set of integers ℤ and reals ℝ with the natural distance functions are complete spaces, but the set of rationals ℚ is not. The complete normed spaces deserve a special name.

Definition 7 A Banach space is a complete normed space.

Exercise^* 8 A convenient way to define a norm in a Banach space is as follows. The unit ball U in a normed space B is the set of x such that ||x||≤ 1. Prove that:

U is a convex set , i.e. x, y∈ U and λ∈ [0,1] the point λ x +(1−λ)y is also in U.
||x||=inf{ λ∈ℝ₊  ∣  λ⁻¹x ∈ U}.
U is closed if and only if the space is Banach.

(i) (ii) (iii)
Figure 2: Different unit balls defining norms in ℝ² from Example 9.

Example 9 Here is some examples of normed spaces.

l₂ⁿ is either ℝⁿ or ℂⁿ with norm defined by
⎪⎪
⎪⎪ (x₁,…,x_n) ⎪⎪
⎪⎪ ₂ = √

⎪
⎪ x₁ ⎪
⎪ ²+ ⎪
⎪ x₂ ⎪
⎪ ²+ ⋯+ ⎪
⎪ x_n ⎪
⎪ ²

. (9)
l₁ⁿ is either ℝⁿ or ℂⁿ with norm defined by
⎪⎪
⎪⎪ (x₁,…,x_n) ⎪⎪
⎪⎪ ₁ =
⎪
⎪ x₁ ⎪
⎪ + ⎪
⎪ x₂ ⎪
⎪ + ⋯+ ⎪
⎪ x_n ⎪
⎪

. (10)
l_∞ⁿ is either ℝⁿ or ℂⁿ with norm defined by
⎪⎪
⎪⎪ (x₁,…,x_n) ⎪⎪
⎪⎪ _∞ = max(
⎪
⎪ x₁ ⎪
⎪ , ⎪
⎪ x₂ ⎪
⎪ , ⋯, ⎪
⎪ x_n ⎪
⎪

). (11)
Let X be a topological space, then C_b(X) is the space of continuous bounded functions f: X→ℂ with norm ||f||_∞=sup_X | f(x) |.
Let X be any set, then l_∞(X) is the space of all bounded (not necessarily continuous) functions f: X→ℂ with norm ||f||_∞=sup_X | f(x) |.

All these normed spaces are also complete and thus are Banach spaces. Some more examples of both complete and incomplete spaces shall appear later.

—We need an extra space to accommodate this product!

A manager to a shop assistant

2.2 Hilbert spaces

Although metric and norm capture important geometric information about linear spaces they are not sensitive enough to represent such geometric characterisation as angles (particularly orthogonality). To this end we need a further refinements.

From courses of linear algebra known that the scalar product ⟨ x,y ⟩= x₁ y₁ + ⋯ + x_n y_n is important in a space ℝⁿ and defines a norm ||x||²=⟨ x,x ⟩. Here is a suitable generalisation:

Definition 10 A scalar product (or inner product ) on a real or complex vector space V is a mapping V× V → ℂ, written ⟨ x,y ⟩, that satisfies:

⟨ x,x ⟩ ≥ 0 and ⟨ x,x ⟩ =0 implies x=0.
⟨ x,y ⟩ = ⟨ y,x ⟩ in complex spaces and ⟨ x,y ⟩ = ⟨ y,x ⟩ in real ones for all x, y∈ V.
⟨ λ x,y ⟩=λ ⟨ x,y ⟩, for all x, y∈ V and scalar λ. (What is ⟨ x,λ y ⟩?).
⟨ x+y,z ⟩=⟨ x,z ⟩ + ⟨ y,z ⟩, for all x, y, and z∈ V. (What is ⟨ x, y+z ⟩?).

Last two properties of the scalar product is oftenly encoded in the phrase: “it is linear in the first variable if we fix the second and anti-linear in the second if we fix the first”.

Definition 11 An inner product space V is a real or complex vector space with a scalar product on it.

Example 12 Here is some examples of inner product spaces which demonstrate that expression ||x||=√⟨ x,x ⟩ defines a norm.

The inner product for ℝⁿ was defined in the beginning of this section. The inner product for ℂⁿ is given by ⟨ x,y ⟩=∑₁ⁿ x_j ȳ_j. The norm ||x||=√∑₁ⁿ | x_j |² makes it l₂ⁿ from Example 9(1).
The extension for infinite vectors: let l₂ be
l₂={ sequences {x_j}₁^∞ ∣ 
∞

∑

1

⎪
⎪ x_j ⎪
⎪ ² < ∞}. (12)

Let us equip this set with operations of term-wise addition and multiplication by scalars, then l₂ is closed under them. Indeed it follows from the triangle inequality and properties of absolutely convergent series. From the standard Cauchy–Bunyakovskii–Schwarz inequality follows that the series ∑₁^∞x_jȳ_j absolutely converges and its sum defined to be ⟨ x,y ⟩.
Let C_b[a,b] be a space of continuous functions on the interval [a,b]∈ℝ. As we learn from Example 9(4) a normed space it is a normed space with the norm ||f||_∞=sup_[a,b]| f(x) |. We could also define an inner product:
⟨ f,g ⟩=
b

∫

a

f(x)ḡ(x) d x and ⎪⎪
⎪⎪ f ⎪⎪
⎪⎪ ₂= ⎛
⎜
⎜
⎝
b

∫

a

⎪
⎪ f(x) ⎪
⎪ ² d x ⎞
⎟
⎟
⎠
1/2

. (13)

Now we state, probably, the most important inequality in analysis.

Theorem 13 (Cauchy–Schwarz–Bunyakovskii inequality) For vectors x and y in an inner product space V let us define ||x||=√⟨ x,x ⟩ and ||y||=√⟨ y,y ⟩ then we have

⎪
⎪

⟨ x,y ⟩

⎪
⎪

≤

⎪⎪
⎪⎪

, (14)

with equality if and only if x and y are scalar multiple each other.

Proof. For simplicity we start from a real vector space. Let we have two vectors u and v and want to define an inner product on the two-dimensional vector space spanned by them. That is we need to know a value of ⟨ au+bv, cu+dv ⟩ for all possible scalars a, b, c, d.

By the linearity ⟨ au+bv, cu+dv ⟩ = ac⟨ u,u ⟩ + (bc+ad)⟨ u,v ⟩ + db⟨ v,v ⟩, thus everything is defined as soon as we know three inner products ⟨ u,u ⟩, ⟨ u,v ⟩ and ⟨ v,v ⟩. First of all we need to demand ⟨ u,u ⟩ ≥ 0 and ⟨ v,v ⟩ ≥ 0.

Furthermore, they shall be such that ⟨ au+bv, au+bv ⟩ ≥ 0 for all scalar a and b. If a=0, that is reduced to the previous case ⟨ v,v ⟩ ≥ 0. If a is non-zero we note ⟨ au+bv, au+bv ⟩ = a² ⟨ u+(b/a)v, u+(b/a)v ⟩ and letting λ = b/a we reduce our consideration to the quadratic expression

⟨ u+λ v, u+λ v ⟩ = λ ²⟨ v,v ⟩+2λ ⟨ u,v ⟩+⟨ u,u ⟩.

The graph of this function of λ is an upward parabolabecause ⟨ v,v ⟩ ≥ 0. Thus, it will be non-negative for all λ if its lowest value is non-negative. From the theory of quadratic expressions, the latter is achieved at λ =−⟨ u,v ⟩/⟨ v,v ⟩ and is equal to

⟨ u,v ⟩²

⟨ v,v ⟩²

⟨ v,v ⟩ − 2

⟨ u,v ⟩

⟨ v,v ⟩

⟨ u,v ⟩+⟨ u,u ⟩=−

⟨ u,v ⟩²

⟨ v,v ⟩

+⟨ u,u ⟩

If −⟨ u,v ⟩²/⟨ v,v ⟩+⟨ u,u ⟩ ≥ 0 then ⟨ v,v ⟩⟨ u,u ⟩ ≥ ⟨ u,v ⟩².

Therefore, the Cauchy-Schwarz inequality is necessary and sufficient condition for the non-negativity of the inner product defined by the three values ⟨ u,u ⟩, ⟨ u,v ⟩ and ⟨ v,v ⟩.

After the previous discussion it is easy to get the result for complex vector space as well. For any x, y∈ V and any t∈ℝ we have:

0< ⟨ x+t y,x+t y ⟩= ⟨ x,x ⟩+2t ℜ ⟨ y,x ⟩+t²⟨ y,y ⟩),

Thus, the discriminant of this quadratic expression in t is non-positive: (ℜ ⟨ y,x ⟩)²−||x||²||y||²≤ 0, that is | ℜ ⟨ x,y ⟩ |≤||x||||y||. Replacing y by e^iαy for an arbitrary α∈[−π,π] we get | ℜ (e^iα⟨ x,y ⟩) | ≤||x||||y||, this implies the desired inequality.

□

Corollary 14 Any inner product space is a normed space with norm ||x||=√⟨ x,x ⟩ (hence also a metric space, Prop. 4).

Proof. Just to check items 3(1)–3(3) from Definition 3. □

Again complete inner product spaces deserve a special name

Definition 15 A complete inner product space is Hilbert space .

The relations between spaces introduced so far are as follows:

Hilbert spaces	⇒	Banach spaces	⇒	Complete metric spaces
⇓		⇓		⇓
inner product spaces	⇒	normed spaces	⇒	metric spaces.

How can we tell if a given norm comes from an inner product?

Figure 3: To the parallelogram identity.

Theorem 16 (Parallelogram identity) In an inner product space H we have for all x and y∈ H (see Figure 3):

⎪⎪
⎪⎪

x+y

⎪⎪
⎪⎪

²+

⎪⎪
⎪⎪

x−y

⎪⎪
⎪⎪

²=2

⎪⎪
⎪⎪

²+2

⎪⎪
⎪⎪

². (15)

Proof. Just by linearity of inner product:

⟨ x+y,x+y ⟩+⟨ x−y,x−y ⟩=2⟨ x,x ⟩+2⟨ y,y ⟩,

because the cross terms cancel out. □

Exercise 17 Show that (15) is also a sufficient condition for a norm to arise from an inner product. Namely, for a norm on a complex Banach space satisfying to (15) the formula

⟨ x,y ⟩

⎛
⎝

⎪⎪
⎪⎪

x+y

⎪⎪
⎪⎪

²−

⎪⎪
⎪⎪

x−y

⎪⎪
⎪⎪

²+i

⎪⎪
⎪⎪

x+iy

⎪⎪
⎪⎪

² −i

⎪⎪
⎪⎪

x−iy

⎪⎪
⎪⎪

⎞
⎠

(16)

∑

i^k

⎪⎪
⎪⎪

x+i^ky

⎪⎪
⎪⎪

defines an inner product. What is a suitable formula for a real Banach space?

Divide and rule!

Old but still much used recipe

2.3 Subspaces

To study Hilbert spaces we may use the traditional mathematical technique of analysis and synthesis: we split the initial Hilbert spaces into smaller and probably simpler subsets, investigate them separately, and then reconstruct the entire picture from these parts.

As known from the linear algebra, a linear subspace is a subset of a linear space is its subset, which inherits the linear structure, i.e. possibility to add vectors and multiply them by scalars. In this course we need also that subspaces inherit topological structure (coming either from a norm or an inner product) as well.

Definition 18 By a subspace of a normed space (or inner product space) we mean a linear subspace with the same norm (inner product respectively). We write X⊂ Y or X ⊆ Y.

Example 19

C_b(X) ⊂ l_∞(X) where X is a metric space.
Any linear subspace of ℝⁿ or ℂⁿ with any norm given in Example 9(1)–9(3).
Let c₀₀ be the space of finite sequences, i.e. all sequences (x_n) such that exist N with x_n=0 for n>N. This is a subspace of l₂ since ∑₁^∞| x_j |² is a finite sum, so finite.

We also wish that the both inhered structures (linear and topological) should be in agreement, i.e. the subspace should be complete. Such inheritance is linked to the property be closed.

A subspace need not be closed—for example the sequence

x=(1, 1/2, 1/3, 1/4, …)∈ l₂ because ∑1/k² < ∞

and x_n=(1, 1/2,…, 1/n, 0, 0,…)∈ c₀₀ converges to x thus x∈ c₀₀ ⊂ l₂.

Proposition 20

Any closed subspace of a Banach/Hilbert space is complete, hence also a Banach/Hilbert space.
Any complete subspace is closed.
The closure of subspace is again a subspace.

Proof.

This is true in any metric space X: any Cauchy sequence from Y has a limit x ∈ X belonging to Ȳ, but if Y is closed then x ∈ Y.
Let Y is complete and x∈ Ȳ, then there is sequence x_n→ x in Y and it is a Cauchy sequence. Then completeness of Y implies x∈ Y.
If x, y∈ Ȳ then there are x_n and y_n in Y such that x_n→ x and y_n→ y. From the triangle inequality:
⎪⎪
⎪⎪ (x_n+y_n)−(x+y) ⎪⎪
⎪⎪ ≤ ⎪⎪
⎪⎪ x_n−x ⎪⎪
⎪⎪ + ⎪⎪
⎪⎪ y_n−y ⎪⎪
⎪⎪ → 0,

so x_n+y_n→ x+y and x+y∈ Ȳ. Similarly x∈Ȳ implies λ x ∈Ȳ for any λ.

□

Hence c₀₀ is an incomplete inner product space, with inner product ⟨ x,y ⟩=∑₁^∞x_k ȳ_k (this is a finite sum!) as it is not closed in l₂.

(a) (b)
Figure 4: Jump function on (b) as a L₂ limit of continuous functions from (a).

Similarly C[0,1] with inner product norm ||f||=(∫₀¹ | f(t) |² dt)^1/2 is incomplete—take the large space X of functions continuous on [0,1] except for a possible jump at 1/2 (i.e. left and right limits exists but may be unequal and f(1/2)=lim_t→1/2+ f(t). Then the sequence of functions defined on Figure 4(a) has the limit shown on Figure 4(b) since:

⎪⎪
⎪⎪

f−f_n

⎪⎪
⎪⎪

∫

−

⎪
⎪

f−f_n

⎪
⎪

² dt <

→ 0.

Obviously f∈C[0,1]∖C[0,1].

Exercise 21 Show alternatively that the sequence of function f_n from Figure 4(a) is a Cauchy sequence in C[0,1] but has no continuous limit.

Similarly the space C[a,b] is incomplete for any a<b if equipped by the inner product and the corresponding norm:

⟨ f,g ⟩

∫

f(t)ḡ(t) d t

(17)

⎪⎪
⎪⎪

₂

⎛
⎜
⎜
⎝

∫

⎪
⎪

f(t)

⎪
⎪

²  d t

⎞
⎟
⎟
⎠

1/2

(18)

Definition 22 Define a Hilbert space L₂[a,b] to be the smallest complete inner product space containing space C[a,b] with the restriction of inner product given by (17).

It is practical to realise L₂[a,b] as a certain space of “functions” with the inner product defined via an integral. There are several ways to do that and we mention just two:

Elements of L₂[a,b] are equivalent classes of Cauchy sequences f⁽ⁿ⁾ of functions from C[a,b].
Let integration be extended from the Riemann definition to the wider Lebesgue integration (see Section 13). Let L be a set of square integrable in Lebesgue sense functions on [a,b] with a finite norm (18). Then L₂[a,b] is a quotient space of L with respect to the equivalence relation f∼ g ⇔ ||f−g||₂=0 .
Example 23 Let the Cantor function on [0,1] be defined as follows:
f(t)= ⎧
⎨
⎩
1, t∈ ℚ;

0, t∈ ℝ∖ℚ.

This function is not integrable in the Riemann sense but does have the Lebesgue integral. The later however is equal to 0 and as an L₂-function the Cantor function equivalent to the function identically equal to 0.
The third possibility is to map L₂(ℝ) onto a space of “true” functions but with an additional structure. For example, in quantum mechanics it is useful to work with the Segal–Bargmann space of analytic functions on ℂ with the inner product []:
⟨ f₁,f₂ ⟩= ∫

ℂ

f₁(z) f₂(z)e
− ⎪
⎪ z ⎪
⎪ ²

 d z.

Theorem 24 The sequence space l₂ is complete, hence a Hilbert space.

Proof. Take a Cauchy sequence x⁽ⁿ⁾∈l₂, where x⁽ⁿ⁾=(x₁⁽ⁿ⁾, x₂⁽ⁿ⁾, x₃⁽ⁿ⁾, … ). Our proof will have three steps: identify the limit x; show it is in l₂; show x⁽ⁿ⁾→ x.

If x⁽ⁿ⁾ is a Cauchy sequence in l₂ then x_k⁽ⁿ⁾ is also a Cauchy sequence of numbers for any fixed k:
⎪
⎪ x_k⁽ⁿ⁾−x_k^(m) ⎪
⎪ ≤ ⎛
⎜
⎜
⎝
∞

∑

k=1

⎪
⎪ x_k⁽ⁿ⁾−x_k^(m) ⎪
⎪ ² ⎞
⎟
⎟
⎠
1/2

= ⎪⎪
⎪⎪ x⁽ⁿ⁾−x^(m) ⎪⎪
⎪⎪ → 0.

Let x_k be the limit of x_k⁽ⁿ⁾.
For a given є>0 find n₀ such that ||x⁽ⁿ⁾−x^(m)||<є for all n,m>n₀. For any K and m:

K

∑

k=1

⎪
⎪ x_k⁽ⁿ⁾−x_k^(m) ⎪
⎪ ² ≤ ⎪⎪
⎪⎪ x⁽ⁿ⁾−x^(m) ⎪⎪
⎪⎪ ²<є².

Let m→ ∞ then ∑_k=1^K | x_k⁽ⁿ⁾−x_k |² ≤ є².
Let K→ ∞ then ∑_k=1^∞| x_k⁽ⁿ⁾−x_k |² ≤ є². Thus x⁽ⁿ⁾−x∈l₂ and because l₂ is a linear space then x = x⁽ⁿ⁾−(x⁽ⁿ⁾−x) is also in l₂.
We saw above that for any є >0 there is n₀ such that ||x⁽ⁿ⁾−x||<є for all n>n₀. Thus x⁽ⁿ⁾→ x.

Consequently l₂ is complete. □

All good things are covered by a thick layer of chocolate (well, if something is not yet–it certainly will)

2.4 Linear spans

As was explained into introduction 2, we describe “internal” properties of a vector through its relations to other vectors. For a detailed description we need sufficiently many external reference points.

Let A be a subset (finite or infinite) of a normed space V. We may wish to upgrade it to a linear subspace in order to make it subject to our theory.

Definition 25 The linear span of A, write Lin(A), is the intersection of all linear subspaces of V containing A, i.e. the smallest subspace containing A, equivalently the set of all finite linear combination of elements of A. The closed linear span of A write CLin(A) is the intersection of all closed linear subspaces of V containing A, i.e. the smallest closed subspace containing A.

Exercise^* 26

Show that if A is a subset of finite dimension space then Lin(A)=CLin(A).
Show that for an infinite A spaces Lin(A) and CLin(A)could be different. (Hint: use Example 19(3).)

Proposition 27 Lin(A)=CLin(A).

Proof. Clearly Lin(A) is a closed subspace containing A thus it should contain CLin(A). Also Lin(A)⊂ CLin(A) thus Lin(A)⊂ CLin(A)=CLin(A). Therefore Lin(A)= CLin(A). □

Consequently CLin(A) is the set of all limiting points of finite linear combination of elements of A.

Example 28 Let V=C[a,b] with the sup norm ||·||_∞. Then:
Lin{1,x,x²,…}={all polynomials}
CLin{1,x,x²,…}=C[a,b] by the Weierstrass approximation theorem proved later.

Remark 29 Note, that the relation P ⊂ CLin(Q) between two sets P and Q is transitive: if P ⊂ CLin(Q) and Q ⊂ CLin(R) then P ⊂ CLin(R). This observation is often used in the following way. To show that P ⊂ CLin(R) we introduce some intermediate sets Q₁, …, Q_n such that P ⊂ CLin(Q₁), Q_j ⊂ CLin(Q_j+1) and Q_n ⊂ CLin(R), see the proof of Weierstrass Approximation Thm. 17 or § 14.2 for an illustration.

The following simple result will be used later many times without comments.

Lemma 30 (about Inner Product Limit) Suppose H is an inner product space and sequences x_n and y_n have limits x and y correspondingly. Then ⟨ x_n,y_n ⟩→⟨ x,y ⟩ or equivalently:

lim

n→∞

⟨ x_n,y_n ⟩=⟨

lim

n→∞

x_n,

lim

n→∞

y_n ⟩.

Proof. Obviously by the Cauchy–Schwarz inequality:

⎪
⎪

⟨ x_n,y_n ⟩−⟨ x,y ⟩

⎪
⎪

⟨ x_n−x,y_n ⟩+⟨ x,y_n−y ⟩

⎪
⎪

≤

⎪
⎪

⟨ x_n−x,y_n ⟩

⎪
⎪

⟨ x,y_n−y ⟩

⎪
⎪

≤

⎪⎪
⎪⎪

x_n−x

⎪⎪
⎪⎪

y_n

⎪⎪
⎪⎪

y_n−y

⎪⎪
⎪⎪

→ 0,

since ||x_n−x||→ 0, ||y_n−y||→ 0, and ||y_n|| is bounded. □

3 Orthogonality

Pythagoras is forever!

The catchphrase from TV commercial of Hilbert Spaces course

As was mentioned in the introduction the Hilbert spaces is an analog of our 3D Euclidean space and theory of Hilbert spaces similar to plane or space geometry. One of the primary result of Euclidean geometry which still survives in high school curriculum despite its continuous nasty de-geometrisation is Pythagoras’ theorem based on the notion of orthogonality¹.

So far we was concerned only with distances between points. Now we would like to study angles between vectors and notably right angles. Pythagoras’ theorem states that if the angle C in a triangle is right then c²=a²+b², see Figure 5 .

Figure 5: The Pythagoras’ theorem c²=a²+b²

It is a very mathematical way of thinking to turn this property of right angles into their definition, which will work even in infinite dimensional Hilbert spaces.

Look for a triangle, or even for a right triangle

A universal advice in solving problems from elementary geometry.

3.1 Orthogonal System in Hilbert Space

In inner product spaces it is even more convenient to give a definition of orthogonality not from Pythagoras’ theorem but from an equivalent property of inner product.

Definition 1 Two vectors x and y in an inner product space are orthogonal if ⟨ x,y ⟩=0, written x ⊥ y.

An orthogonal sequence (or orthogonal system ) e_n (finite or infinite) is one in which e_n ⊥ e_m whenever n≠ m.

An orthonormal sequence (or orthonormal system ) e_n is an orthogonal sequence with ||e_n||=1 for all n.

Exercise 2

Show that if x ⊥ x then x=0 and consequently x ⊥ y for any y∈ H.
Show that if all vectors of an orthogonal system are non-zero then they are linearly independent.

Example 3 These are orthonormal sequences:

Basis vectors (1,0,0), (0,1,0), (0,0,1) in ℝ³ or ℂ³.
Vectors e_n=(0,…,0,1,0,…) (with the only 1 on the nth place) in l₂. (Could you see a similarity with the previous example?)
Functions e_n(t)=1/(√2π) e^int , n∈ℤ in C[0,2π]:
⟨ e_n,e_m ⟩=
2π

∫

0

1

2π

e^inte^−imtdt = ⎧
⎨
⎩
1, n=m;

0, n≠ m.

(19)

Exercise 4 Let A be a subset of an inner product space V and x⊥ y for any y∈ A. Prove that x⊥ z for all z∈CLin(A).

Theorem 5 (Pythagoras’) If x ⊥ y then ||x+y||²=||x||²+||y||². Also if e₁, …, e_n is orthonormal then

⎪⎪
⎪⎪
⎪⎪
⎪⎪

∑

a_k e_k

⎪⎪
⎪⎪
⎪⎪
⎪⎪

²=⟨

∑

a_k e_k,

∑

a_k e_k ⟩=

∑

⎪
⎪

a_k

⎪
⎪

².

Proof. A one-line calculation. □

The following theorem provides an important property of Hilbert spaces which will be used many times. Recall, that a subset K of a linear space V is convex if for all x, y∈ K and λ∈ [0,1] the point λ x +(1−λ)y is also in K. Particularly any subspace is convex and any unit ball as well (see Exercise 8(1)).

Theorem 6 (about the Nearest Point) Let K be a non-empty convex closed subset of a Hilbert space H. For any point x∈ H there is the unique point y∈ K nearest to x.

Proof. Let d=inf_{y∈ K} d(x,y), where d(x,y)—the distance coming from the norm ||x||=√⟨ x,x ⟩ and let y_n a sequence points in K such that lim_n→
∞d(x,y_n)=d. Then y_n is a Cauchy sequence. Indeed from the parallelogram identity for the parallelogram generated by vectors x−y_n and x−y_m we have:

⎪⎪
⎪⎪

y_n−y_m

⎪⎪
⎪⎪

²=2

⎪⎪
⎪⎪

x−y_n

⎪⎪
⎪⎪

²+2

⎪⎪
⎪⎪

x−y_m

⎪⎪
⎪⎪

²−

⎪⎪
⎪⎪

2x−y_n−y_m

⎪⎪
⎪⎪

².

Note that ||2x−y_n−y_m||²=4||x−y_n+y_m/2||²≥ 4d² since y_n+y_m/2∈ K by its convexity. For sufficiently large m and n we get ||x−y_m||²≤ d +є and ||x−y_n||²≤ d +є, thus ||y_n−y_m||≤ 4(d²+є)−4d²=4є, i.e. y_n is a Cauchy sequence.

Let y be the limit of y_n, which exists by the completeness of H, then y∈ K since K is closed. Then d(x,y)=lim_{n→ ∞}d(x,y_n)=d. This show the existence of the nearest point. Let y′ be another point in K such that d(x,y′)=d, then the parallelogram identity implies:

⎪⎪
⎪⎪

y−y′

⎪⎪
⎪⎪

²=2

⎪⎪
⎪⎪

x−y

⎪⎪
⎪⎪

²+2

⎪⎪
⎪⎪

x−y′

⎪⎪
⎪⎪

²−

⎪⎪
⎪⎪

2x−y−y′

⎪⎪
⎪⎪

²≤ 4d²−4d²=0.

This shows the uniqueness of the nearest point. □

Exercise^* 7 The essential rôle of the parallelogram identity in the above proof indicates that the theorem does not hold in a general Banach space.

Show that in ℝ² with either norm ||·||₁ or ||·||_∞ form Example 9 the nearest point could be non-unique;
Could you construct an example (in Banach space) when the nearest point does not exists?

Liberte, Egalite, Fraternite!

A longstanding ideal approximated in the real life by something completely different

3.2 Bessel’s inequality

For the case then a convex subset is a subspace we could characterise the nearest point in the term of orthogonality.

Theorem 8 (on Perpendicular) Let M be a subspace of a Hilbert space H and a point x∈ H be fixed. Then z∈ M is the nearest point to x if and only if x−z is orthogonal to any vector in M.

(i) (ii)

Figure 6: (i) A smaller distance for a non-perpendicular direction; and

(ii) Best approximation from a subspace

Proof. Let z is the nearest point to x existing by the previous Theorem. We claim that x−z orthogonal to any vector in M, otherwise there exists y∈ M such that ⟨ x−z,y ⟩≠ 0. Then

⎪⎪
⎪⎪

x−z−є y

⎪⎪
⎪⎪

x−z

⎪⎪
⎪⎪

²−2є ℜ⟨ x−z,y ⟩+є²

⎪⎪
⎪⎪

x−z

⎪⎪
⎪⎪

²,

if є is chosen to be small enough and such that є ℜ⟨ x−z,y ⟩ is positive, see Figure 6(i). Therefore we get a contradiction with the statement that z is closest point to x.

On the other hand if x−z is orthogonal to all vectors in H₁ then particularly (x−z)⊥ (z−y) for all y∈ H₁, see Figure 6(ii). Since x−y=(x−z)+(z−y) we got by the Pythagoras’ theorem:

⎪⎪
⎪⎪

x−y

⎪⎪
⎪⎪

²=

⎪⎪
⎪⎪

x−z

⎪⎪
⎪⎪

² +

⎪⎪
⎪⎪

z−y

⎪⎪
⎪⎪

².

So ||x−y||²≥ ||x−z||² and the are equal if and only if z=y. □

Exercise 9 The above proof does not work if ⟨ x−z,y ⟩ is an imaginary number, what to do in this case?

Consider now a basic case of approximation: let x∈ H be fixed and e₁, …, e_n be orthonormal and denote H₁=Lin{e₁,…,e_n}. We could try to approximate x by a vector y=λ₁ e₁+⋯ +λ_n e_n ∈ H₁.

Corollary 10 The minimal value of ||x−y|| for y∈ H₁ is achieved when y=∑₁ⁿ⟨ x,e_i ⟩ e_i.

Proof. Let z=∑₁ⁿ⟨ x,e_i ⟩ e_i, then ⟨ x−z,e_i ⟩=⟨ x,e_i ⟩−⟨ z,e_i ⟩=0. By the previous Theorem z is the nearest point to x. □

Figure 7: Best approximation by three trigonometric polynomials

Example 11

In ℝ³ find the best approximation to (1,0,0) from the plane V:{x₁+x₂+x₃=0}. We take an orthonormal basis e₁=(2^−1/2, −2^−1/2,0), e₂=(6^−1/2, 6^−1/2, −2· 6^−1/2) of V (Check this!). Then:

z=⟨ x,e₁ ⟩e₁+⟨ x,e₂ ⟩e₂=

⎛
⎜
⎜
⎝

,−

⎞
⎟
⎟
⎠

⎛
⎜
⎜
⎝

,−

⎞
⎟
⎟
⎠

⎛
⎜
⎜
⎝

,−

⎞
⎟
⎟
⎠

In C[0,2π] what is the best approximation to f(t)=t by functions a+be^it+ce^−it? Let

e₀=

√

2π

, e₁=

√

2π

e^it, e₋₁=

√

2π

e^−it.

We find:

⟨ f,e₀ ⟩

2π

∫

√

2π

dt=

⎡
⎢
⎢
⎢
⎢
⎣

t²

√

2π

⎤
⎥
⎥
⎥
⎥
⎦

2π

√

π^3/2;

⟨ f,e₁ ⟩

2π

∫

t e^−it

√

2π

dt=i

√

2π

(Check this!)

⟨ f,e₋₁ ⟩

2π

∫

t e^it

√

2π

dt=−i

√

2π

(Why we may not check this one?)

Then the best approximation is (see Figure 7):

f₀(t)

⟨ f,e₀ ⟩e₀+⟨ f,e₁ ⟩e₁+⟨ f,e₋₁ ⟩e₋₁

√

π^3/2

√

2π

+ie^it−ie^−it=π−2sint.

Corollary 12 (Bessel’s inequality) If (e_i) is orthonormal then

⎪⎪
⎪⎪

²≥

∑

i=1

⎪
⎪

⟨ x,e_i ⟩

⎪
⎪

².

Proof. Let z= ∑₁ⁿ⟨ x,e_i ⟩e_i then x−z⊥ e_i for all i therefore by Exercise 4 x−z⊥ z. Hence:

⎪⎪
⎪⎪

²+

⎪⎪
⎪⎪

x−z

⎪⎪
⎪⎪

≥

⎪⎪
⎪⎪

²=

∑

i=1

⎪
⎪

⟨ x,e_i ⟩

⎪
⎪

².

□

—Did you say “rice and fish for them”?

A student question

3.3 The Riesz–Fischer theorem

When (e_i) is orthonormal we call ⟨ x,e_n ⟩ the nth Fourier coefficient of x (with respect to (e_i), naturally).

Theorem 13 (Riesz–Fisher) Let (e_n)₁^∞ be an orthonormal sequence in a Hilbert space H. Then ∑₁^∞λ_n e_n converges in H if and only if ∑₁^∞| λ_n |² < ∞. In this case ||∑₁^∞λ_n e_n||²=∑₁^∞| λ_n |².

Proof. Necessity: Let x_k=∑₁^k λ_n e_n and x=lim_{k→ ∞} x_k. So ⟨ x,e_n ⟩=lim_k→
∞⟨ x_k,e_n ⟩=λ_n for all n. By the Bessel’s inequality for all k

⎪⎪
⎪⎪

²≥

∑

⎪
⎪

⟨ x,e_n ⟩

⎪
⎪

²=

∑

⎪
⎪

λ_n

⎪
⎪

²,

hence ∑₁^k | λ_n |² converges and the sum is at most ||x||².

Sufficiency: Consider ||x_k−x_m||=||∑_m^k λ_n e_n||=(∑_m^k | λ_n |²)^1/2 for k>m. Since ∑_m^k | λ_n |² converges x_k is a Cauchy sequence in H and thus has a limit x. By the Pythagoras’ theorem ||x_k||²=∑₁^k | λ_n |² thus for k→ ∞ ||x||²=∑₁^∞| λ_n |² by the Lemma about inner product limit. □

Observation: the closed linear span of an orthonormal sequence in any Hilbert space looks like l₂, i.e. l₂ is a universal model for a Hilbert space.

By Bessel’s inequality and the Riesz–Fisher theorem we know that the series ∑₁^∞⟨ x,e_i ⟩ e_i converges for any x∈ H. What is its limit?

Let y=x− ∑₁^∞⟨ x,e_i ⟩ e_i, then

⟨ y,e_k ⟩=⟨ x,e_k ⟩−

∞

∑

⟨ x,e_i ⟩ ⟨ e_i,e_k ⟩=⟨ x,e_k ⟩−⟨ x,e_k ⟩ =0 for all k. (20)

Definition 14 An orthonormal sequence (e_i) in a Hilbert space H is complete if the identities ⟨ y,e_k ⟩=0 for all k imply y=0.

A complete orthonormal sequence is also called orthonormal basis in H.

Theorem 15 (on Orthonormal Basis) Let e_i be an orthonormal basis in a Hilber space H. Then for any x∈ H we have

∞

∑

n=1

⟨ x,e_n ⟩e_n and

⎪⎪
⎪⎪

²=

∞

∑

n=1

⎪
⎪

⟨ x,e_n ⟩

⎪
⎪

².

Proof. By the Riesz–Fisher theorem, equation (20) and definition of orthonormal basis. □

There are constructive existence theorems in mathematics.

An example of pure existence statement

3.4 Construction of Orthonormal Sequences

Natural questions are: Do orthonormal sequences always exist? Could we construct them?

Theorem 16 (Gram–Schmidt) Let (x_i) be a sequence of linearly independent vectors in an inner product space V. Then there exists orthonormal sequence (e_i) such that

Lin{x₁,x₂,…,x_n}=Lin{e₁,e₂,…,e_n}, for all n.

Proof. We give an explicit algorithm working by induction. The base of induction: the first vector is e₁=x₁/||x₁||. The step of induction: let e₁, e₂, …, e_n are already constructed as required. Let y_n+1=x_n+1−∑_i=1ⁿ⟨ x_n+1,e_i ⟩e_i. Then by (20) y_n+1 ⊥ e_i for i=1,…,n. We may put e_n+1=y_n+1/||y_n+1|| because y_n+1≠ 0 due to linear independence of x_k’s. Also

Lin{e₁,e₂,…,e_n+1}	=	Lin{e₁,e₂,…,y_n+1}
	=	Lin{e₁,e₂,…,x_n+1}
	=	Lin{x₁,x₂,…,x_n+1}.

So (e_i) are orthonormal sequence. □

Example 17 Consider C[0,1] with the usual inner product (17) and apply orthogonalisation to the sequence 1, x, x², …. Because ||1||=1 then e₁(x)=1. The continuation could be presented by the table:

e₁(x)=1

y₂(x)=x−⟨ x,1 ⟩1=x−

⎪⎪
⎪⎪

y₂

⎪⎪
⎪⎪

²=

∫

(x−

)² d x=

, e₂(x)=

√

(x−

)

y₃(x)=x²−⟨ x²,1 ⟩1−⟨ x²,x−

⟩(x−

)· 12 , …, e₃=

y₃

⎪⎪
⎪⎪

y₃

⎪⎪
⎪⎪

… … …

Figure 8: Five first Legendre P_i and Chebyshev T_i polynomials

Example 18 Many famous sequences of orthogonal polynomials, e.g. Chebyshev, Legendre, Laguerre, Hermite, can be obtained by orthogonalisation of 1, x, x², …with various inner products.

Legendre polynomials in C[−1,1] with inner product
⟨ f,g ⟩=
1

∫

−1

f(t)

g(t)

 d t. (21)
Chebyshev polynomials in C[−1,1] with inner product
⟨ f,g ⟩=
1

∫

−1

f(t)

g(t)

dt

√

1−t²

(22)
Laguerre polynomials in the space of polynomials P[0,∞) with inner product
⟨ f,g ⟩=
∞

∫

0

f(t)

g(t)

e^−t d t.

See Figure 8 for the five first Legendre and Chebyshev polynomials. Observe the difference caused by the different inner products (21) and (22). On the other hand note the similarity in oscillating behaviour with different “frequencies”.

Another natural question is: When is an orthonormal sequence complete?

Proposition 19 Let (e_n) be an orthonormal sequence in a Hilbert space H. The following are equivalent:

(e_n) is an orthonormal basis.
CLin((e_n))=H.
||x||²=∑₁^∞| ⟨ x,e_n ⟩ |² for all x∈ H.

Proof. Clearly 19(1) implies 19(2) because x=∑₁^∞⟨ x,e_n ⟩e_n in CLin((e_n)) and ||x||²=∑₁^∞⟨ x,e_n ⟩e_n by Theorem 15. The same theorem tells that 19(1) implies 19(3).

If (e_n) is not complete then there exists x∈ H such that x≠ 0 and ⟨ x,e_k ⟩=0 for all k, so 19(3) fails, consequently 19(3) implies 19(1).

Finally if ⟨ x,e_k ⟩=0 for all k then ⟨ x,y ⟩=0 for all y∈Lin((e_n)) and moreover for all y∈CLin((e_n)), by the Lemma on continuity of the inner product. But then x∉CLin((e_n)) and 19(2) also fails because ⟨ x,x ⟩=0 is not possible. Thus 19(2) implies 19(1). □

Corollary 20 A separable Hilbert space (i.e. one with a countable dense set) can be identified with either l₂ⁿ or l₂, in other words it has an orthonormal basis (e_n) (finite or infinite) such that

∞

∑

n=1

⟨ x,e_n ⟩e_n and

⎪⎪
⎪⎪

²=

∞

∑

n=1

⎪
⎪

⟨ x,e_n ⟩

⎪
⎪

².

Proof. Take a countable dense set (x_k), then H=CLin((x_k)), delete all vectors which are a linear combinations of preceding vectors, make orthonormalisation by Gram–Schmidt the remaining set and apply the previous proposition. □

Most pleasant compliments are usually orthogonal to our real qualities.

An advise based on observations

3.5 Orthogonal complements

Orthogonality allow us split a Hilbert space into subspaces which will be “independent from each other” as much as possible.

Definition 21 Let M be a subspace of an inner product space V. The orthogonal complement , written M^⊥, of M is

M^⊥={x∈ V: ⟨ x,m ⟩=0 ∀ m∈ M}.

Theorem 22 If M is a closed subspace of a Hilbert space H then M^⊥ is a closed subspace too (hence a Hilbert space too).

Proof. Clearly M^⊥ is a subspace of H because x, y∈ M^⊥ implies ax+by∈ M^⊥:

⟨ ax+by,m ⟩= a⟨ x,m ⟩+ b⟨ y,m ⟩=0.

Also if all x_n∈ M^⊥ and x_n→ x then x∈ M^⊥ due to inner product limit Lemma. □

Theorem 23 Let M be a closed subspace of a Hilber space H. Then for any x∈ H there exists the unique decomposition x=m+n with m∈ M, n∈ M^⊥ and ||x||²=||m||²+||n||². Thus H=M⊕ M^⊥ and (M^⊥)^⊥=M.

Proof. For a given x there exists the unique closest point m in M by the Theorem on nearest point and by the Theorem on perpendicular (x−m)⊥ y for all y∈ M.

So x= m + (x−m)= m+n with m∈ M and n∈ M^⊥. The identity ||x||²=||m||²+||n||² is just Pythagoras’ theorem and M∩ M^⊥={0} because null vector is the only vector orthogonal to itself.

Finally (M^⊥)^⊥=M. We have H=M⊕ M^⊥=(M^⊥)^⊥⊕ M^⊥, for any x∈(M^⊥)^⊥ there is a decomposition x=m+n with m∈ M and n∈ M^⊥, but then n is orthogonal to itself and therefore is zero. □

4 Duality of Linear Spaces

Everything has another side

Orthonormal basis allows to reduce any question on Hilbert space to a question on sequence of numbers. This is powerful but sometimes heavy technique. Sometime we need a smaller and faster tool to study questions which are represented by a single number, for example to demonstrate that two vectors are different it is enough to show that there is a unequal values of a single coordinate. In such cases linear functionals are just what we needed.

–Is it functional?
–Yes, it works!

4.1 Dual space of a normed space

Definition 1 A linear functional on a vector space V is a linear mapping α: V→ ℂ (or α: V→ ℝ in the real case), i.e.

α(ax+by)=aα(x)+bα(y), for all x,y∈ V and a,b∈ℂ.

Exercise 2 Show that α(0) is necessarily 0.

We will not consider any functionals but linear, thus below functional always means linear functional.

Example 3

Let V=ℂⁿ and c_k, k=1,…,n be complex numbers. Then α((x₁,…,x_n))=c₁x₁+⋯+c₂x₂ is a linear functional.
On C[0,1] a functional is given by α(f)=∫₀¹ f(t) d t.
On a Hilbert space H for any x∈ H a functional α_x is given by α_x(y)=⟨ y,x ⟩.

Theorem 4 Let V be a normed space and α is a linear functional. The following are equivalent:

α is continuous (at any point of V).
α is continuous at point 0.
sup{| α(x) |: ||x||≤ 1}< ∞, i.e. α is a bounded linear functional .

Proof. Implication 4(1) ⇒ 4(2) is trivial.

Show 4(2) ⇒ 4(3). By the definition of continuity: for any є>0 there exists δ>0 such that ||v||<δ implies | α(v)−α(0) |<є . Take є=1 then | α(δ x) |<1 for all x with norm less than 1 because ||δ x||< δ. But from linearity of α the inequality | α(δ x) |<1 implies | α(x) |<1/δ<∞ for all ||x||≤ 1.

4(3) ⇒ 4(1). Let mentioned supremum be M. For any x, y∈ V such that x≠ y vector (x−y)/||x−y|| has norm 1. Thus | α ((x−y)/||x−y||) |<M. By the linearity of α this implies that | α (x)−α(y) |<M||x−y||. Thus α is continuous. □

Definition 5 The dual space X^* of a normed space X is the set of continuous linear functionals on X. Define a norm on it by

⎪⎪
⎪⎪

sup

⎪⎪
⎪⎪

= 1

⎪
⎪

α(x)

⎪
⎪

. (23)

Exercise 6

Show that the chain of inequalities:

⎪⎪
⎪⎪

≤

sup

⎪⎪
⎪⎪

≤ 1

⎪
⎪

α(x)

⎪
⎪

≤

sup

x ≠ 0

⎪
⎪

α(x)

⎪
⎪

⎪⎪
⎪⎪

≤

⎪⎪
⎪⎪

Deduce that any of the mentioned supremums deliver the norm of α. Which of them you will prefer if you need to show boundedness of α? Which of them is better to use if boundedness of α is given?

Show that | α(x) |≤ ||α||·||x|| for all x∈ X, α ∈ X^*.

The important observations is that linear functionals form a normed space as follows:

Exercise 7

Show that X^* is a linear space with natural (point-wise) operations.
Show that (23) defines a norm on X^*.

Furthermeore, X^* is always complete, regardless of properties of X!

Theorem 8 X^* is a Banach space with the defined norm (even if X was incomplete).

Proof. Due to Exercise 7 we only need to show that X^* is complete. Let (α_n) be a Cauchy sequence in X^*, then for any x∈ X scalars α_n(x) form a Cauchy sequence, since | α_m(x)−α_n(x) |≤||α_m−α_n||·||x||. Thus the sequence has a limit and we define α by α(x)=lim_n→∞α_n(x). Clearly α is a linear functional on X. We should show that it is bounded and α_n→ α. Given є>0 there exists N such that ||α_n−α_m||<є for all n, m≥ N. If ||x||≤ 1 then | α_n(x)−α_m(x) |≤ є, let m→∞ then | α_n(x)−α(x) |≤ є, so

⎪
⎪

α(x)

⎪
⎪

≤

⎪
⎪

α_n(x)

⎪
⎪

+є≤

⎪⎪
⎪⎪

α_n

⎪⎪
⎪⎪

+ є,

i.e. ||α|| is finite and ||α_n−α||≤ є, thus α_n→α. □

Definition 9 The kernel of linear functional α, write kerα, is the set all vectors x∈ X such that α(x)=0.

Exercise 10 Show that

kerα is a subspace of X.
If α≢0 then obviously kerα ≠ X. Furthermore, if X has at least two linearly independent vectors then kerα ≠ {0}, thus kerα is a proper subspace of X.
If α is continuous then kerα is closed.

Study one and get any other for free!

Hilbert spaces sale

4.2 Self-duality of Hilbert space

Lemma 11 (Riesz–Fréchet) Let H be a Hilbert space and α a continuous linear functional on H, then there exists the unique y∈ H such that α(x)=⟨ x,y ⟩ for all x∈ H. Also ||α||_H^*=||y||_H.

Proof. Uniqueness: if ⟨ x,y ⟩=⟨ x,y′ ⟩ ⇔ ⟨ x,y−y′ ⟩=0 for all x∈ H then y−y′ is self-orthogonal and thus is zero (Exercise 2(1)).

Existence: we may assume that α≢0 (otherwise take y=0), then M=kerα is a closed proper subspace of H. Since H=M⊕ M^⊥, there exists a non-zero z∈ M^⊥, by scaling we could get α(z)=1. Then for any x∈ H:

x=(x−α(x)z)+α(x)z, with x−α(x)z∈ M, α(x)z∈ M^⊥.

Because ⟨ x,z ⟩=α(x)⟨ z,z ⟩=α(x)||z||² for any x∈ H we set y=z/||z||².

Equality of the norms ||α||_H^*=||y||_H follows from the Cauchy–Bunyakovskii–Schwarz inequality in the form α(x)≤ ||x||·||y|| and the identity α(y/||y||)=||y||. □

Example 12 On L₂[0,1] let α(f)=⟨ f,t² ⟩=∫₀¹ f(t)t² d t. Then

⎪⎪
⎪⎪

t²

⎪⎪
⎪⎪

⎛
⎜
⎜
⎝

∫

(t²)² d t

⎞
⎟
⎟
⎠

1/2

√

5 Fourier Analysis

All bases are equal, but some are more equal then others.

As we saw already any separable Hilbert space posses an orthonormal basis (infinitely many of them indeed). Are they equally good? This depends from our purposes. For solution of differential equation which arose in mathematical physics (wave, heat, Laplace equations, etc.) there is a proffered choice. The fundamental formula: d/dx e^ax=ae^ax reduces the derivative to a multiplication by a. We could benefit from this observation if the orthonormal basis will be constructed out of exponents. This helps to solve differential equations as was demonstrated in Subsection 0.2.

7.40pm Fourier series: Episode II

Today’s TV listing

5.1 Fourier series

Now we wish to address questions stated in Remark 9. Let us consider the space L₂[−π,π]. As we saw in Example 3(3) there is an orthonormal sequence e_n(t)=(2π)^−1/2e^int in L₂[−π,π]. We will show that it is an orthonormal basis, i.e.

f(t)∈ L₂[−π,π] ⇔ f(t)=

∞

∑

k=−∞

⟨ f,e_k ⟩e_k(t),

with convergence in L₂ norm. To do this we show that CLin{e_k:k∈ℤ}=L₂[−π,π].

Let CP[−π,π] denote the continuous functions f on [−π,π] such that f(π)=f(−π). We also define f outside of the interval [−π,π] by periodicity.

Lemma 1 The space CP[−π,π] is dense in L₂[−π,π].

Figure 9: A modification of continuous function to periodic

Proof. Let f∈L₂[−π,π]. Given є>0 there exists g∈ C[−π,π] such that ||f−g||<є/2. From continuity of g on a compact set follows that there is M such that | g(t) |<M for all t∈[−π,π].

We can now replace g by periodic g′, which coincides with g on [−π,π−δ] for an arbitrary δ>0 and has the same bounds: | g′(t) |<M, see Figure 9. Then

⎪⎪
⎪⎪

g−g′

⎪⎪
⎪⎪

₂²=

∫

π−δ

⎪
⎪

g(t)−g′(t)

⎪
⎪

² d t ≤ (2M)²δ.

So if δ<є²/(4M)² then ||g−g′||<є/2 and ||f−g′||<є. □

Now if we could show that CLin{e_k: k ∈ ℤ} includes CP[−π,π] then it also includes L₂[−π,π].

Notation 2 Let f∈CP[−π,π],write

f_n=

∑

k=−n

⟨ f,e_k ⟩ e_k , for n=0,1,2,… (24)

the partial sum of the Fourier series for f.

We want to show that ||f−f_n||₂→ 0. To this end we define nth Fejér sum by the formula

F_n=

f₀+f₁+⋯+f_n

n+1

, (25)

and show that

⎪⎪
⎪⎪

F_n−f

⎪⎪
⎪⎪

_∞ → 0.

Then we conclude

⎪⎪
⎪⎪

F_n−f

⎪⎪
⎪⎪

₂=

⎛
⎜
⎜
⎝

∫

−π

⎪
⎪

F_n(t)−f

⎪
⎪

⎞
⎟
⎟
⎠

1/2

≤ (2π)^1/2

⎪⎪
⎪⎪

F_n−f

⎪⎪
⎪⎪

_∞→ 0.

Since F_n∈Lin((e_n)) then f∈CLin((e_n)) and hence f=∑_−∞^∞⟨ f,e_k ⟩e_k.

Remark 3 It is not always true that ||f_n−f||_∞→ 0 even for f∈CP[−π,π].

Exercise 4 Find an example illustrating the above Remark.

The summation method used in (25) us useful not only in the context of Fourier series but for many other cases as well. In such a wider framework the method is known as .

It took 19 years of his life to prove this theorem

5.2 Fejér’s theorem

Proposition 5 (Fejér, age 19) Let f∈CP[−π,π]. Then

F_n(x)

2π

∫

−π

f(t) K_n(x−t) d t, where

(26)

K_n(t)

n+1

∑

k=0

∑

m=−k

e^imt,

(27)

is the Fejér kernel .

Proof. From notation (24):

f_k(x)

∑

m=−k

⟨ f,e_m ⟩ e_m(x)

∑

m=−k

∫

−π

f(t)

e^−imt

√

2π

 d t 

e^imx

√

2π

∫

−π

f(t)

∑

m=−k

e^im(x−t) d t.

Then from (25):

F_n(x)

n+1

∑

k=0

f_k(x)

n+1

2π

∑

k=0

∫

−π

f(t)

∑

m=−k

e^im(x−t) d t

2π

∫

−π

f(t)

n+1

∑

k=0

∑

m=−k

e^im(x−t) d t,

which finishes the proof. □

Lemma 6 The Fejér kernel is 2π-periodic, K_n(0)=n+1 and can be expressed as:

K_n(t)=

n+1

sin²

(n+1)t

sin²

, for t∉2πℤ. (28)

1

z⁻¹ 1 z

z⁻² z⁻¹ 1 z z²

⋮ ⋮ ⋮ ⋮ ⋮ ⋱

Table 1: Counting powers in rows and columns

Proof. Let z=e^it, then:

K_n(t)

n+1

∑

k=0

(z^−k+⋯+1+z+⋯+z^k)

n+1

∑

j=−n

(n+1−

⎪
⎪

)

z^j,

by switch from counting in rows to counting in columns in Table 1. Let w=e^it/2, i.e. z=w², then

K_n(t)

n+1

(w⁻²ⁿ+2w⁻²ⁿ⁺²+⋯+(n+1)+nw²+⋯+w²ⁿ)

n+1

(w⁻ⁿ+w⁻ⁿ⁺²+⋯+wⁿ⁻²+wⁿ)²

(29)

n+1

⎛
⎜
⎜
⎝

w⁻ⁿ⁻¹−wⁿ⁺¹

w⁻¹−w

⎞
⎟
⎟
⎠

Could you sum a geometric progression?

n+1

⎛
⎜
⎜
⎜
⎜
⎜
⎝

2isin

(n+1)t

2isin

⎞
⎟
⎟
⎟
⎟
⎟
⎠

if w≠ ± 1. For the value of K_n(0) we substitute w=1 into (29). □

Figure 10: A family of Fejér kernels with the parameter m running from 0 to 9 is on the left picture. For a comparison unregularised Fourier kernels are on the right picture.

The first eleven Fejér kernels are shown on Figure 10, we could observe that:

Lemma 7 Fejér’s kernel has the following properties:

K_n(t)≥0 for all t∈ ℝ and n∈ℕ.
∫_−π^πK_n(t) d t=2π.
For any δ∈ (0,π)

−δ

∫

−π

+
π

∫

δ

K_n(t) d t → 0 as n→ ∞.

Proof. The first property immediately follows from the explicit formula (28). In contrast the second property is easier to deduce from expression with double sum (27):

∫

−π

K_n(t) d t

∫

−π

n+1

∑

k=0

∑

m=−k

e^imt d t

n+1

∑

k=0

∑

m=−k

∫

−π

e^imt d t

n+1

∑

k=0

2π

2π,

since the formula (19).

Finally if | t |>δ then sin²(t/2)≥ sin²(δ/2)>0 by monotonicity of sinus on [0,π/2], so:

0≤ K_n(t) ≤

(n+1) sin²(δ/2)

implying:

0≤

∫

δ≤

⎪
⎪

≤ π

K_n(t)  d t ≤

1(π−δ)

(n+1) sin²(δ/2)

→ 0 as n→ 0.

Therefore the third property follows from the squeeze rule. □

Theorem 8 (Fejér Theorem) Let f∈CP[−π,π]. Then its Fejér sums F_n (25) converges in supremum norm to f on [−π,π] and hence in L₂ norm as well.

Proof. Idea of the proof: if in the formula (26)

F_n(x)=

2π

∫

−π

f(t) K_n(x−t) d t,

t is long way from x, K_n is small (see Lemma 7 and Figure 10), for t near x, K_n is big with total “weight” 2π, so the weighted average of f(t) is near f(x).

Here are details. Using property 7(2) and periodicity of f and K_n we could express trivially

f(x)= f(x)

2π

x+π

∫

x−π

K_n(x−t)  dt =

2π

x+π

∫

x−π

f(x) K_n(x−t) d t.

Similarly we rewrite (26) as

F_n(x)=

2π

x+π

∫

x−π

f(t) K_n(x−t) d t,

then

⎪
⎪

f(x)−F_n(x)

⎪
⎪

2π

⎪
⎪
⎪
⎪

x+π

∫

x−π

(f(x)−f(t)) K_n(x−t) d t

⎪
⎪
⎪
⎪

≤

2π

x+π

∫

x−π

⎪
⎪

f(x)−f(t)

⎪
⎪

K_n(x−t) d t.

Given є>0 split into three intervals: I₁=[x−π,x−δ], I₂=[x−δ,x+δ], I₃=[x+δ,x+π], where δ is chosen such that | f(t)−f(x) |<є/2 for t∈ I₂, which is possible by continuity of f. So

2π

∫

I₂

⎪
⎪

f(x)−f(t)

⎪
⎪

K_n(x−t) d t≤

2π

∫

I₂

K_n(x−t) d t <

And

2π

∫

I₁⋃ I₃

⎪
⎪

f(x)−f(t)

⎪
⎪

K_n(x−t)  dt

≤

⎪⎪
⎪⎪

_∞

2π

∫

I₁⋃ I₃

K_n(x−t)  dt

⎪⎪
⎪⎪

_∞

∫

δ<

⎪
⎪

<π

K_n(u)  du

if n is sufficiently large due to property 7(3) of K_n. Hence | f(x)−F_n(x) |<є for a large n independent of x. □

Remark 9 The above properties 7(1)–7(3) and their usage in the last proof can be generalised to the concept of approximation of the identity . See § 15.4 for a further example.

We almost finished the demonstration that e_n(t)=(2π)^−1/2e^int is an orthonormal basis of L₂[−π,π]:

Corollary 10 (Fourier series) Let f∈ L₂[−π,π], with Fourier series

∞

∑

n=−∞

⟨ f,e_n ⟩e_n=

∞

∑

n=−∞

c_ne^int where c_n=

⟨ f,e_n ⟩

√

2π

√

2π

∫

−π

f(t)e^−int d t.

Then the series ∑_−∞^∞⟨ f,e_n ⟩e_n=∑_−∞^∞c_ne^int converges in L₂[−π,π] to f, i.e

lim

k→ ∞

⎪⎪
⎪⎪
⎪⎪
⎪⎪

f−

∑

n=−k

c_ne^int

⎪⎪
⎪⎪
⎪⎪
⎪⎪

₂=0.

Proof. This follows from the previous Theorem, Lemma 1 about density of CP in L₂, and Theorem 15 on orthonormal basis. □

Remark 11 There is a reason why we had used the Fejér kernel and the Cezàro summation F_n (25) instead of plain partial sums f_n (24) of the Fourier series. It can be shown that point-wise convergence f_n → f does not hold for every continuous function f, cf. Cor. 41.

5.3 Parseval’s formula

The following result first appeared in the framework of L₂[−π,π] and only later was understood to be a general property of inner product spaces.

Theorem 12 (Parseval’s formula) If f, g∈L₂[−π,π] have Fourier series f=∑_n=−∞^∞c_ne^int and g=∑_n=−∞^∞d_ne^int, then

⟨ f,g ⟩=

∫

−π

f(t)

g(t)

 d t=2π

∞

∑

−∞

c_n

d_n

. (30)

More generally if f and g are two vectors of a Hilbert space H with an orthonormal basis (e_n)_−∞^∞ then

⟨ f,g ⟩=

∞

∑

k=−∞

c_n

d_n

, where c_n=⟨ f,e_n ⟩, d_n=⟨ g,e_n ⟩,

are the Fourier coefficients of f and g.

Proof. In fact we could just prove the second, more general, statement—the first one is its particular realisation. Let f_n=∑_k=−nⁿ c_ke_k and g_n=∑_k=−nⁿ d_ke_k will be partial sums of the corresponding Fourier series. Then from orthonormality of (e_n) and linearity of the inner product:

⟨ f_n,g_n ⟩=⟨

∑

k=−n

c_ke_k,

∑

k=−n

d_ke_k ⟩=

∑

k=−n

c_k

d_k

This formula together with the facts that f_k→ f and g_k→ g (following from Corollary 10) and Lemma about continuity of the inner product implies the assertion. □

Corollary 13 A integrable function f belongs to L₂[−π,π] if and only if its Fourier series is convergent and then ||f||²=2π∑_−∞^∞| c_k |².

Proof. The necessity, i.e. implication f∈L₂ ⇒ ⟨ f,f ⟩=||f||²=2π∑| c_k |² , follows from the previous Theorem. The sufficiency follows by Riesz–Fisher Theorem. □

Remark 14 The actual rôle of the Parseval’s formula is shadowed by the orthonormality and is rarely recognised until we meet the wavelets or coherent states. Indeed the equality (30) should be read as follows:

Theorem 15 (Modified Parseval) The map W: H → l₂ given by the formula [Wf](n)=⟨ f,e_n ⟩ is an isometry for any orthonormal basis (e_n).

We could find many other systems of vectors (e_x), x∈ X (very different from orthonormal bases) such that the map W: H → L₂(X) given by the simple universal formula

[Wf](x)=⟨ f,e_x ⟩ (31)

will be an isometry of Hilbert spaces. The map (31) is oftenly called wavelet transform and most famous is the Cauchy integral formula in complex analysis. The majority of wavelets transforms are linked with group representations , see our postgraduate course Wavelets in Applied and Pure Maths.

Heat and noise but not a fire?

Answer:

5.4 Some Application of Fourier Series

We are going to provide now few examples which demonstrate the importance of the Fourier series in many questions. The first two (Example 16 and Theorem 17) belong to pure mathematics and last two are of more applicable nature.

Example 16 Let f(t)=t on [−π,π]. Then

⟨ f,e_n ⟩=

∫

−π

te^−int d t=

⎧
⎪
⎪
⎨
⎪
⎪
⎩

(−1)ⁿ

2π i

n≠ 0

n=0

(check!),

so f(t)∼ ∑_−∞^∞(−1)ⁿ (i/n) e^int. By a direct integration:

⎪⎪
⎪⎪

₂²=

∫

−π

t² d t=

2π³

On the other hand by the previous Corollary:

⎪⎪
⎪⎪

₂²=2π

∑

n≠ 0

⎪
⎪
⎪
⎪

(−1)ⁿi

⎪
⎪
⎪
⎪

²=4π

∞

∑

n=1

n²

Thus we get a beautiful formula

∞

∑

n²

π²

Here is another important result.

Theorem 17 (Weierstrass Approximation Theorem) For any function f∈C[a,b] and any є>0 there exists a polynomial p such that ||f−p||_∞<є.

Proof. Change variable: t=2π(x−a+b/2)/(b−a) this maps x∈[a,b] onto t∈[−π,π]. Let P denote the subspace of polynomials in C[−π,π]. Then e^int∈$P_^$ for any n∈ℤ since Taylor series converges uniformly in [−π,π]. Consequently P contains the closed linear span in (supremum norm) of e^int, any n∈ℤ, which is CP[−π,π] by the Fejér theorem. Thus $P_^$⊇ CP[−π,π] and we extend that to non-periodic function as follows (why we could not make use of Lemma 1 here, by the way?).

For any f∈C[−π,π] let λ=(f(π)−f(−π))/(2π) then f₁(t)=f(t)−λ t∈ CP[−π,π] and could be approximated by a polynomial p₁(t) from the above discussion. Then f(t) is approximated by the polynomial p(t)=p₁(t)+λ t. □

It is easy to see, that the rôle of exponents e^int in the above prove is rather modest: they can be replaced by any functions which has a Taylor expansion. The real glory of the Fourier analysis is demonstrated in the two following examples.

Figure 11: The dynamics of a heat equation:

x—coordinate on the rod,

t—time,

T—temperature.

Example 18 The modern history of the Fourier analysis starts from the works of Fourier on the heat equation. As was mentioned in the introduction to this part, the exceptional role of Fourier coefficients for differential equations is explained by the simple formula ∂_x e^inx= ine^inx. We shortly review a solution of the heat equation to illustrate this.

Let we have a rod of the length 2π. The temperature at its point x∈[−π,π] and a moment t∈[0,∞) is described by a function u(t,x) on [0,∞)×[−π,π]. The mathematical equation describing a dynamics of the temperature distribution is:

∂ u(t,x)

∂ t

∂² u(t,x)

∂ x²

or, equivalently,

⎛
⎝

∂_t−∂_x²

⎞
⎠

u(t,x)=0. (32)

For any fixed moment t₀ the function u(t₀,x) depends only from x∈[−π,π] and according to Corollary 10 could be represented by its Fourier series:

u(t₀,x)=

∞

∑

n=−∞

⟨ u,e_n ⟩e_n=

∞

∑

n=−∞

c_n(t₀)e^inx,

where

c_n(t₀)=

⟨ u,e_n ⟩

√

2π

√

2π

∫

−π

u(t₀,x)e^−inx d x,

with Fourier coefficients c_n(t₀) depending from t₀. We substitute that decomposition into the heat equation (32) to receive:

⎛
⎝

∂_t−∂_x²

⎞
⎠

u(t,x)

⎛
⎝

∂_t−∂_x²

⎞
⎠

∞

∑

n=−∞

c_n(t)e^inx

∞

∑

n=−∞

⎛
⎝

∂_t−∂_x²

⎞
⎠

c_n(t)e^inx

∞

∑

n=−∞

(c′_n(t)+n²c_n(t))e^inx=0 .

(33)

Since function e^inx form a basis the last equation (33) holds if and only if

c′_n(t)+n²c_n(t)=0 for all n and t. (34)

Equations from the system (34) have general solutions of the form:

c_n(t)=c_n(0)e^−n²t for all t∈[0,∞), (35)

producing a general solution of the heat equation (32) in the form:

u(t,x)=

∞

∑

n=−∞

c_n(0)e^−n²te^inx =

∞

∑

n=−∞

c_n(0)e^−n²t+inx, (36)

where constant c_n(0) could be defined from boundary condition. For example, if it is known that the initial distribution of temperature was u(0,x)=g(x) for a function g(x)∈L₂[−π,π] then c_n(0) is the n-th Fourier coefficient of g(x).

The general solution (36) helps produce both the analytical study of the heat equation (32) and numerical simulation. For example, from (36) obviously follows that

the temperature is rapidly relaxing toward the thermal equilibrium with the temperature given by c₀(0), however never reach it within a finite time;
the “higher frequencies” (bigger thermal gradients) have a bigger speed of relaxation; etc.

The example of numerical simulation for the initial value problem with g(x)=2cos(2*u) + 1.5sin(u). It is clearly illustrate our above conclusions.

Figure 12: Two oscillation with unharmonious frequencies and the appearing dissonance. Click to listen the blue and green pure harmonics and red dissonance.

Figure 13: Graphics of G5 performed on different musical instruments (click on picture to hear the sound). Samples are taken from Sound Library.

Figure 14: Fourier series for G5 performed on different musical instruments (same order and colour as on the previous Figure)

(a) (b)
(c)
Figure 15: Limits of the Fourier analysis: different frequencies separated in time

Example 19 Among the oldest periodic functions in human culture are acoustic waves of musical tones. The mathematical theory of musics (including rudiments of the Fourier analysis!) is as old as mathematics itself and was highly respected already in Pythagoras’ school more 2500 years ago.

The earliest observations are that

The musical sounds are made of pure harmonics (see the blue and green graphs on the Figure 12), in our language cos and sin functions form a basis;
Not every two pure harmonics are compatible, to be their frequencies should make a simple ratio. Otherwise the dissonance (red graph on Figure 12) appears.

The musical tone, say G5, performed on different instruments clearly has something in common and different, see Figure 13 for comparisons. The decomposition into the pure harmonics, i.e. finding Fourier coefficient for the signal, could provide the complete characterisation, see Figure 14.

The Fourier analysis tells that:

All sound have the same base (i.e. the lowest) frequencies which corresponds to the G5 tone, i.e. 788 Gz.
The higher frequencies, which are necessarily are multiples of 788 Gz to avoid dissonance, appears with different weights for different instruments.

The Fourier analysis is very useful in the signal processing and is indeed the fundamental tool. However it is not universal and has very serious limitations. Consider the simple case of the signals plotted on the Figure 15(a) and (b). They are both made out of same two pure harmonics:

On the first signal the two harmonics (drawn in blue and green) follow one after another in time on Figure 15(a);
They just blended in equal proportions over the whole interval on Figure 15(b).

This appear to be two very different signals. However the Fourier performed over the whole interval does not seems to be very different, see Figure 15(c). Both transforms (drawn in blue-green and pink) have two major pikes corresponding to the pure frequencies. It is not very easy to extract differences between signals from their Fourier transform (yet this should be possible according to our study).

Even a better picture could be obtained if we use windowed Fourier transform , namely use a sliding “window” of the constant width instead of the entire interval for the Fourier transform. Yet even better analysis could be obtained by means of wavelets already mentioned in Remark 14 in connection with Plancherel’s formula. Roughly, wavelets correspond to a sliding window of a variable size—narrow for high frequencies and wide for low.

6 Operators

All the space’s a stage,
and all functionals and operators merely players!

All our previous considerations were only a preparation of the stage and now the main actors come forward to perform a play. The vectors spaces are not so interesting while we consider them in statics, what really make them exciting is the their transformations. The natural first steps is to consider transformations which respect both linear structure and the norm.

6.1 Linear operators

Definition 1 A linear operator T between two normed spaces X and Y is a mapping T:X→ Y such that T(λ v + µ u)=λ T(v) + µ T(u). The kernel of linear operator kerT and image are defined by

kerT ={x∈ X: Tx=0} Im T={y∈ Y: y=Tx, for some x∈ X}.

Exercise 2 Show that kernel of T is a linear subspace of X and image of T is a linear subspace of Y.

As usual we are interested also in connections with the second (topological) structure:

Definition 3 A norm of linear operator is defined:

⎪⎪
⎪⎪

=sup{

⎪⎪
⎪⎪

_Y:

⎪⎪
⎪⎪

_X≤ 1}. (37)

T is a bounded linear operator if ||T||=sup{||Tx||: ||x||}<∞.

Exercise 4 Show that ||Tx||≤ ||T||·||x|| for all x∈ X.

Example 5 Consider the following examples and determine kernel and images of the mentioned operators.

On a normed space X define the zero operator to a space Y by Z: x→ 0 for all x∈ X. Its norm is 0.
On a normed space X define the identity operator by I_X: x→ x for all x∈ X. Its norm is 1.
On a normed space X any linear functional define a linear operator from X to ℂ, its norm as operator is the same as functional.
The set of operators from ℂⁿ to ℂ^m is given by n× m matrices which acts on vector by the matrix multiplication. All linear operators on finite-dimensional spaces are bounded.
On l₂, let S(x₁,x₂,…)=(0,x₁,x₂,…) be the right shift operator . Clearly ||Sx||=||x|| for all x, so ||S||=1.

On L₂[a,b], let w(t)∈ C[a,b] and define multiplication operator M_wf by (M_w f)(t)=w(t)f(t). Now:

⎪⎪
⎪⎪

M_w f

⎪⎪
⎪⎪

∫

⎪
⎪

w(t)

⎪
⎪

f(t)

⎪
⎪

² d t

≤

K²

∫

⎪
⎪

f(t)

⎪
⎪

² d t, where K=

⎪⎪
⎪⎪

_∞=

sup

[a,b]

⎪
⎪

w(t)

⎪
⎪

so ||M_w||≤ K.

Exercise 6 Show that for multiplication operator in fact there is the equality of norms ||M_w||₂= ||w(t)||_∞.

Theorem 7 Let T: X → Y be a linear operator. The following conditions are equivalent:

T is continuous on X;
T is continuous at the point 0.
T is a bounded linear operator.

Proof. Proof essentially follows the proof of similar Theorem 4. □

6.2 Orthoprojections

Here we will use orthogonal complement, see § 3.5, to introduce a class of linear operators—orthogonal projections. Despite of (or rather due to) their extreme simplicity these operators are among most frequently used tools in the theory of Hilbert spaces.

Corollary 8 (of Thm. 23, about Orthoprojection) Let M be a closed linear subspace of a hilbert space H. There is a linear map P_M from H onto M (the orthogonal projection or orthoprojection) such that

P_M²=P_M, kerP_M=M^⊥, P_M^⊥=I−P_M. (38)

Proof. Let us define P_M(x)=m where x=m+n is the decomposition from the previous theorem. The linearity of this operator follows from the fact that both M and M^⊥ are linear subspaces. Also P_M(m)=m for all m∈ M and the image of P_M is M. Thus P_M²=P_M. Also if P_M(x)=0 then x⊥ M, i.e. kerP_M=M^⊥. Similarly P_M^⊥(x)=n where x=m+n and P_M+P_M^⊥=I. □

Example 9 Let (e_n) be an orthonormal basis in a Hilber space and let S⊂ ℕ be fixed. Let M=CLin{e_n: n∈ S} and M^⊥=CLin{e_n:n∈ ℕ∖ S}. Then

∞

∑

k=1

a_k e_k =

∑

k∈ S

a_k e_k +

∑

k∉S

a_k e_k.

Remark 10 In fact there is a one-to-one correspondence between closed linear subspaces of a Hilber space H and orthogonal projections defined by identities (38).

6.3 B(H) as a Banach space (and even algebra)

Theorem 11 Let B(X,Y) be the space of bounded linear operators from X and Y with the norm defined above. If Y is complete, then B(X,Y) is a Banach space.

Proof. The proof repeat proof of the Theorem 8, which is a particular case of the present theorem for Y=ℂ, see Example 5(3). □

Theorem 12 Let T∈ B(X,Y) and S∈B(Y,Z), where X, Y, and Z are normed spaces. Then ST∈B(X,Z) and ||ST||≤||S||||T||.

Proof. Clearly (ST)x=S(Tx)∈ Z, and

⎪⎪
⎪⎪

STx

⎪⎪
⎪⎪

≤

⎪⎪
⎪⎪

≤

⎪⎪
⎪⎪

which implies norm estimation if ||x||≤1. □

Corollary 13 Let T∈ B(X,X)=B(X), where X is a normed space. Then for any n≥ 1, Tⁿ∈B(X) and ||Tⁿ||≤ ||T||ⁿ.

Proof. It is induction by n with the trivial base n=1 and the step following from the previous theorem. □

Remark 14 Some texts use notations L(X,Y) and L(X) instead of ours B(X,Y) and B(X).

Definition 15 Let T∈ B(X,Y). We say T is an invertible operator if there exists S∈ B(Y,X) such that

ST= I_X and TS=I_Y.

Such an S is called the inverse operator of T.

Exercise 16 Show that

for an invertible operator T:X→ Y we have ker T={0} and ℑ T=Y.
the inverse operator is unique (if exists at all). (Assume existence of S and S′, then consider operator STS′.)

Example 17 We consider inverses to operators from Exercise 5.

The zero operator is never invertible unless the pathological spaces X=Y={0}.
The identity operator I_X is the inverse of itself.
A linear functional is not invertible unless it is non-zero and X is one dimensional.
An operator ℂⁿ→ ℂ^m is invertible if and only if m=n and corresponding square matrix is non-singular, i.e. has non-zero determinant.
The right shift S is not invertible on l₂ (it is one-to-one but is not onto). But the left shift operator T(x₁,x₂,…)=(x₂,x₃,…) is its left inverse , i.e. TS=I but TS≠I since ST(1,0,0,…)=(0,0,…). T is not invertible either (it is onto but not one-to-one), however S is its right inverse .
Operator of multiplication M_w is invertible if and only if w⁻¹∈C[a,b] and inverse is M_w⁻¹. For example M_1+t is invertible L₂[0,1] and M_t is not.

6.4 Adjoints

Theorem 18 Let H and K be Hilbert Spaces and T∈B(H,K). Then there exists operator T^*∈ B(K,H) such that

⟨ Th,k ⟩_K=⟨ h,T^*k ⟩_H for all h∈ H, k∈ K.

Such T^* is called the adjoint operator of T. Also T^**=T and ||T^*||=||T||.

Proof. For any fixed k∈ K the expression h:→ ⟨ Th,k ⟩_K defines a bounded linear functional on H. By the Riesz–Fréchet lemma there is a unique y∈ H such that ⟨ Th,k ⟩_K=⟨ h,y ⟩_H for all h∈ H. Define T^* k =y then T^* is linear:

⟨ h,T^*(λ₁k₁+λ₂k₂) ⟩_H	=	⟨ Th,λ₁k₁+λ₂k₂ ⟩_K
	=	λ₁⟨ Th,k₁ ⟩_K+λ₂⟨ Th,k₂ ⟩_K
	=	λ₁⟨ h,T^k₁ ⟩_H+λ₂⟨ h,T^k₂ ⟩_K
	=	⟨ h,λ₁T^k₁+λ₂T^k₂ ⟩_H

So T^*(λ₁k₁+λ₂k₂)=λ₁T^*k₁+λ₂T^*k₂. T^** is defined by ⟨ k,T^**h ⟩=⟨ T^*k,h ⟩ and the identity ⟨ T^**h,k ⟩=⟨ h,T^*k ⟩=⟨ Th,k ⟩ for all h and k shows T^**=T. Also:

⎪⎪
⎪⎪

T^* k

⎪⎪
⎪⎪

⟨ T^*k,T^*k ⟩=⟨ k,TT^*k ⟩

≤

⎪⎪
⎪⎪

TT^*k

⎪⎪
⎪⎪

≤

⎪⎪
⎪⎪

T^*k

⎪⎪
⎪⎪

which implies ||T^*k||≤||T||·||k||, consequently ||T^*||≤||T||. The opposite inequality follows from the identity ||T||=||T^**||. □

Exercise 19

For operators T₁ and T₂ show that
(T₁T₂)^*=T₂^*T₁^*, (T₁+T₂)^*=T₁^*+T₂^* (λ T)^*=λT^*.
If A is an operator on a Hilbert space H then (kerA)^⊥= Im A^*.

6.5 Hermitian, unitary and normal operators

Definition 20 An operator T: H→ H is a Hermitian operator or self-adjoint operator if T=T^*, i.e. ⟨ Tx,y ⟩=⟨ x,Ty ⟩ for all x, y∈ H.

Example 21

On l₂ the adjoint S^* to the right shift operator S is given by the left shift S^*=T, indeed:
⟨ Sx,y ⟩ = ⟨ (0,x₁,x₂,…),(y₁,y₂,…) ⟩

= x₁ȳ₂+x₂y_3+⋯=⟨ (x₁,x₂,…),(y₂,y₃,…) ⟩

= ⟨ x,Ty ⟩.

Thus S is not Hermitian.
Let D be diagonal operator on l₂ given by
D(x₁,x₂,…)=(λ₁ x₁, λ₂ x₂, …).

where (λ_k) is any bounded complex sequence. It is easy to check that ||D||=||(λ_n)||_∞=sup_k| λ_k | and
D^* (x₁,x₂,…)=(λ₁ x₁, λ₂ x₂, …),

thus D is Hermitian if and only if λ_k∈ℝ for all k.
If T: ℂⁿ→ ℂⁿ is represented by multiplication of a column vector by a matrix A, then T^* is multiplication by the matrix A^*—transpose and conjugate to A.

Exercise 22 Show that for any bounded operator T operators T_r=1/2(T+ T^*), T_i=1/2i(T− T^*), T^*T and TT^* are Hermitians. Note, that any operator is the linear combination of two hermitian operators: T=T_r+i T_i (cf. z= ℜ z + i ℑ z for z∈ℂ).

To appreciate the next Theorem the following exercise is useful:

Exercise 23 Let H be a Hilbert space. Show that

For x∈ H we have ||x||= sup { | ⟨ x,y ⟩ | for all y∈ H such that ||y||=1}.
For T∈B(H) we have
⎪⎪
⎪⎪ T ⎪⎪
⎪⎪ = sup {  ⎪
⎪ ⟨ Tx,y ⟩ ⎪
⎪ for all x,y∈ H such that ⎪⎪
⎪⎪ x ⎪⎪
⎪⎪ = ⎪⎪
⎪⎪ y ⎪⎪
⎪⎪ =1}. (39)

The next theorem says, that for a Hermitian operator T the supremum in (39) may be taken over the “diagonal” x=y only.

Theorem 24 Let T be a Hermitian operator on a Hilbert space. Then

⎪⎪
⎪⎪

sup

⎪⎪
⎪⎪

= 1

⎪
⎪

⟨ Tx,x ⟩

⎪
⎪

Proof. If Tx=0 for all x∈ H, both sides of the identity are 0. So we suppose that ∃ x∈ H for which Tx≠ 0.

We see that | ⟨ Tx,x ⟩ |≤ ||Tx||||x|| ≤ ||T||||x²||, so sup_{||x|| =1} | ⟨ Tx,x ⟩ |≤ ||T||. To get the inequality the other way around, we first write s:=sup_{||x|| =1} | ⟨ Tx,x ⟩ |. Then for any x∈ H, we have | ⟨ Tx,x ⟩ |≤ s||x²||.

We now consider

⟨ T(x+y),x+y ⟩ =⟨ Tx,x ⟩ +⟨ Tx,y ⟩+⟨ Ty,x ⟩ +⟨ Ty,y ⟩ = ⟨ Tx,x ⟩ +2ℜ ⟨ Tx,y ⟩ +⟨ Ty,y ⟩

(because T being Hermitian gives ⟨ Ty,x ⟩=⟨ y,Tx ⟩ =⟨ Tx,y ⟩) and, similarly,

⟨ T(x−y),x−y ⟩ = ⟨ Tx,x ⟩ −2ℜ ⟨ Tx,y ⟩ +⟨ Ty,y ⟩.

Subtracting gives

4ℜ ⟨ Tx,y ⟩

= ⟨ T(x+y),x+y ⟩−⟨ T(x−y),x−y ⟩

≤ s(

⎪⎪
⎪⎪

x+y

⎪⎪
⎪⎪

² +

⎪⎪
⎪⎪

x−y

⎪⎪
⎪⎪

²)

= 2s(

⎪⎪
⎪⎪

² +

⎪⎪
⎪⎪

²),

by the parallelogram identity.

Now, for x∈ H such that Tx≠ 0, we put y=||Tx||⁻¹||x|| Tx. Then ||y|| =||x|| and when we substitute into the previous inequality, we get

⎪⎪
⎪⎪

=4ℜ⟨ Tx,y ⟩ ≤ 4s

⎪⎪
⎪⎪

x²

⎪⎪
⎪⎪

So ||Tx||≤ s||x|| and it follows that ||T||≤ s, as required. □

Definition 25 We say that U:H→ H is a unitary operator on a Hilbert space H if U^*=U⁻¹, i.e. U^*U=UU^*=I.

Example 26

If D:l₂→l₂ is a diagonal operator such that D e_k=λ_k e_k, then D^* e_k=λ_k e_k and D is unitary if and only if | λ_k |=1 for all k.
The shift operator S satisfies S^*S=I but SS^*≠ I thus S is not unitary.

Theorem 27 For an operator U on a complex Hilbert space H the following are equivalent:

U is unitary;
U is surjection and an isometry , i.e. ||Ux||=||x|| for all x∈ H;
U is a surjection and preserves the inner product, i.e. ⟨ Ux,Uy ⟩=⟨ x,y ⟩ for all x, y∈ H.

Proof. 27(1)⇒27(2). Clearly unitarity of operator implies its invertibility and hence surjectivity. Also

⎪⎪
⎪⎪

²=⟨ Ux,Ux ⟩=⟨ x,U^*Ux ⟩=⟨ x,x ⟩=

⎪⎪
⎪⎪

².

27(2)⇒27(3). Using the polarisation identity (cf. polarisation in equation (16)):

4⟨ Tx,y ⟩

⟨ T(x+y),x+y ⟩+i⟨ T(x+iy),x+iy ⟩

−⟨ T(x−y),x−y ⟩−i⟨ T(x−iy),x−iy ⟩.

∑

k=0

i^k⟨ T(x+i^ky),x+i^ky ⟩

Take T=U^*U and T=I, then

4⟨ U^*Ux,y ⟩

∑

k=0

i^k⟨ U^*U(x+i^ky),x+i^ky ⟩

∑

k=0

i^k⟨ U(x+i^ky),U(x+i^ky) ⟩

∑

k=0

i^k⟨ (x+i^ky),(x+i^ky) ⟩

4⟨ x,y ⟩.

27(3)⇒27(1). Indeed ⟨ U^*U x,y ⟩=⟨ x,y ⟩ implies ⟨ (U^*U−I)x,y ⟩=0 for all x,y∈ H, then U^*U=I. Since U is surjective, for any y∈ H there is x∈ H such that y=Ux. Then, using the already established fact U^*U=I we get

UU^* y = UU^*(Ux) = U(U^*U)x = Ux= y.

Thus we have UU^*=I as well and U is unitary. □

Definition 28 A normal operator T is one for which T^*T=TT^*.

Example 29

Any self-adjoint operator T is normal, since T^*=T.
Any unitary operator U is normal, since U^*U=I=UU^*.
Any diagonal operator D is normal , since D e_k=λ_k e_k, D^* e_k=λ_k e_k, and DD^*e_k=D^*D e_k=| λ_k |² e_k.
The shift operator S is not normal.
A finite matrix is normal (as an operator on l₂ⁿ) if and only if it has an orthonormal basis in which it is diagonal.

Remark 30 Theorems 24 and 27(2) draw similarity between those types of operators and multiplications by complex numbers. Indeed Theorem 24 said that an operator which significantly change direction of vectors (“rotates”) cannot be Hermitian, just like a multiplication by a real number scales but do not rotate. On the other hand Theorem 27(2) says that unitary operator just rotate vectors but do not scale, as a multiplication by an unimodular complex number. We will see further such connections in Theorem 17.

7 Spectral Theory

Beware of ghosts² in this area!

As we saw operators could be added and multiplied each other, in some sense they behave like numbers, but are much more complicated. In this lecture we will associate to each operator a set of complex numbers which reflects certain (unfortunately not all) properties of this operator.

The analogy between operators and numbers become even more deeper since we could construct functions of operators (called functional calculus ) in a way we build numeric functions. The most important functions of this sort is called resolvent (see Definition 5). The methods of analytical functions are very powerful in operator theory and students may wish to refresh their knowledge of complex analysis before this part.

7.1 The spectrum of an operator on a Hilbert space

An eigenvalue of operator T∈B(H) is a complex number λ such that there exists a nonzero x∈ H, called eigenvector with property Tx=λ x, in other words x∈ker(T−λ I).

In finite dimensions T−λ I is invertible if and only if λ is not an eigenvalue. In infinite dimensions it is not the same: the right shift operator S is not invertible but 0 is not its eigenvalue because Sx=0 implies x=0 (check!).

Definition 1 The resolvent set ρ(T) of an operator T is the set

ρ (T)={λ∈ℂ: T−λ I is invertible}.

The spectrum of operator T∈B(H), denoted σ(T), is the complement of the resolvent set ρ(T):

σ(T)={λ∈ℂ: T−λ I is not invertible}.

Example 2 If H is finite dimensional the from previous discussion follows that σ(T) is the set of eigenvalues of T for any T.

Even this example demonstrates that spectrum does not provide a complete description for operator even in finite-dimensional case. For example, both operators in ℂ² given by matrices (

0	0
0	0

) and (

0	0
1	0

) have a single point spectrum {0}, however are rather different. The situation became even worst in the infinite dimensional spaces.

Theorem 3 The spectrum σ(T) of a bounded operator T is a nonempty compact (i.e. closed and bounded) subset of ℂ.

For the proof we will need several Lemmas.

Lemma 4 Let A∈B(H). If ||A||<1 then I−A is invertible in B(H) and inverse is given by the Neumann series (C. Neumann, 1877):

(I−A)⁻¹=I+A+A²+A³+…=

∞

∑

k=0

A^k. (40)

Proof. Define the sequence of operators B_n=I+A+⋯+A^N—the partial sums of the infinite series (40). It is a Cauchy sequence, indeed:

⎪⎪
⎪⎪

B_n−B_m

⎪⎪
⎪⎪

A^m+1+A^m+2+⋯+Aⁿ

⎪⎪
⎪⎪

(if n<m)

≤

⎪⎪
⎪⎪

A^m+1

⎪⎪
⎪⎪

A^m+2

⎪⎪
⎪⎪

+⋯+

⎪⎪
⎪⎪

Aⁿ

⎪⎪
⎪⎪

≤

⎪⎪
⎪⎪

^m+1+

⎪⎪
⎪⎪

^m+2+⋯+

⎪⎪
⎪⎪

ⁿ

≤

⎪⎪
⎪⎪

^m+1

1−

⎪⎪
⎪⎪

<є

for a large m. By the completeness of B(H) there is a limit, say B, of the sequence B_n. It is a simple algebra to check that (I−A)B_n=B_n(I−A)=I−Aⁿ⁺¹, passing to the limit in the norm topology, where Aⁿ⁺¹→ 0 and B_n→ B we get:

(I−A)B=B(I−A)=I ⇔ B=(I−A)⁻¹.

□

Definition 5 The resolventof an operator T is the operator valued function defined on the resolvent set by the formula:

R(λ,T)=(T−λ I)⁻¹. (41)

Corollary 6

If | λ |>||T|| then λ∈ ρ(T), hence the spectrum is bounded.
The resolvent set ρ(T) is open, i.e for any λ ∈ ρ(T) then there exist є>0 such that all µ with | λ−µ |<є are also in ρ(T), i.e. the resolvent set is open and the spectrum is closed.

Both statements together imply that the spectrum is compact.

Proof.

If | λ |>||T|| then ||λ⁻¹T||<1 and the operator T−λ I=−λ(I−λ⁻¹T) has the inverse
R(λ,T)= (T−λ I)⁻¹=−
∞

∑

k=0

λ^−k−1T^k. (42)

by the previous Lemma.
Indeed:
T−µ I = T−λ I + (λ−µ)I

= (T−λ I)(I+(λ−µ)(T−λ I)⁻¹).

The last line is an invertible operator because T−λ I is invertible by the assumption and I+(λ−µ)(T−λ I)⁻¹ is invertible by the previous Lemma, since ||(λ−µ)(T−λ I)⁻¹||<1 if є<||(T−λ I)⁻¹||.

□

Exercise 7

Prove the first resolvent identity :
R(λ,T)−R(µ,T)=(λ−µ)R(λ,T)R(µ,T) (43)
Use the identity (43) to show that (T−µ I)⁻¹→ (T−λ I)⁻¹ as µ→ λ.
Use the identity (43) to show that for z∈ρ(t) the complex derivative d/dz R(z,T) of the resolvent R(z,T) is well defined, i.e. the resolvent is an analytic function operator valued function of z.

Lemma 8 The spectrum is non-empty.

Proof. Let us assume the opposite, σ(T)=∅ then the resolvent function R(λ,T) is well defined for all λ∈ℂ. As could be seen from the von Neumann series (42) ||R(λ,T)||→ 0 as λ→ ∞. Thus for any vectors x, y∈ H the function f(λ)=⟨ R(λ,T)x,y) ⟩ is analytic (see Exercise 7(3)) function tensing to zero at infinity. Then by the Liouville theorem from complex analysis R(λ,T)=0, which is impossible. Thus the spectrum is not empty. □

Proof.[Proof of Theorem 3] Spectrum is nonempty by Lemma 8 and compact by Corollary 6. □

Remark 9 Theorem 3 gives the maximal possible description of the spectrum, indeed any non-empty compact set could be a spectrum for some bounded operator, see Problem 23.

7.2 The spectral radius formula

The following definition is of interest.

Definition 10 The spectral radius of T is

r(T)=sup{

⎪
⎪

: λ∈ σ(T)}.

From the Lemma 6(1) immediately follows that r(T)≤||T||. The more accurate estimation is given by the following theorem.

Theorem 11 For a bounded operator T we have

r(T)=

lim

n→∞

⎪⎪
⎪⎪

Tⁿ

⎪⎪
⎪⎪

^1/n. (44)

We start from the following general lemma:

Lemma 12 Let a sequence (a_n) of positive real numbers satisfies inequalities: 0≤ a_m+n≤ a_m+a_n for all m and n. Then there is a limit lim_n→∞(a_n/n) and its equal to inf_n(a_n/n).

Proof. The statements follows from the observation that for any n and m=nk+l with 0≤ l≤ n we have a_m≤ ka_n+la₁ thus, for big m we got a_m/m≤ a_n/n +la₁/m ≤ a_n/n+є. □

Proof.[Proof of Theorem 11] The existence of the limit lim_n→∞||Tⁿ||^1/n in (44) follows from the previous Lemma since by the Lemma 12 log||T^n+m||≤ log||Tⁿ||+log||T^m||. Now we are using some results from the complex analysis. The Laurent series for the resolvent R(λ,T) in the neighbourhood of infinity is given by the von Neumann series (42). The radius of its convergence (which is equal, obviously, to r(T)) by the Hadamard theorem is exactly lim_n→∞||Tⁿ||^1/n. □

Corollary 13 There exists λ∈σ(T) such that | λ |=r(T).

Proof. Indeed, as its known from the complex analysis the boundary of the convergence circle of a Laurent (or Taylor) series contain a singular point, the singular point of the resolvent is obviously belongs to the spectrum. □

Example 14 Let us consider the left shift operator S^*, for any λ∈ℂ such that | λ | <1 the vector (1,λ,λ²,λ³,…) is in l₂ and is an eigenvector of S^* with eigenvalue λ, so the open unit disk | λ |<1 belongs to σ(S^*). On the other hand spectrum of S^* belongs to the closed unit disk | λ |≤ 1 since r(S^*)≤ ||S^*||=1. Because spectrum is closed it should coincide with the closed unit disk, since the open unit disk is dense in it. Particularly 1∈σ(S^*), but it is easy to see that 1 is not an eigenvalue of S^*.

Proposition 15 For any T∈B(H) the spectrum of the adjoint operator is σ(T^*)={λ: λ∈ σ(T)}.

Proof. If (T−λ I)V=V(T−λ I)=I the by taking adjoints V^*(T^*−λI)=(T^*−λI)V^*=I. So λ ∈ ρ(T) implies λ∈ρ(T^*), using the property T^**=T we could invert the implication and get the statement of proposition. □

Example 16 In continuation of Example 14 using the previous Proposition we conclude that σ(S) is also the closed unit disk, but S does not have eigenvalues at all!

7.3 Spectrum of Special Operators

Theorem 17

If U is a unitary operator then σ(U)⊆ {| z |=1}.
If T is Hermitian then σ(T)⊆ ℝ.

Proof.

If | λ |>1 then ||λ⁻¹U||<1 and then λ I−U=λ(I−λ⁻¹U) is invertible, thus λ∉σ(U). If | λ |<1 then ||λ U^*||<1 and then λ I−U=U (λ U^*−I) is invertible, thus λ∉σ(U). The remaining set is exactly {z:| z |=1}.
Without lost of generality we could assume that ||T||<1, otherwise we could multiply T by a small real scalar. Let us consider the Cayley transform which maps real axis to the unit circle:
U=(T−iI)(T+iI)⁻¹.

Straightforward calculations show that U is unitary if T is Hermitian. Let us take λ∉ℝ and λ≠ −i (this case could be checked directly by Lemma 4). Then the Cayley transform µ=(λ−i)(λ+i)⁻¹ of λ is not on the unit circle and thus the operator
U−µ I=(T−iI)(T+iI)⁻¹−(λ−i)(λ+i)⁻¹I= 2i(λ+i)⁻¹(T−λ I)(T+iI)⁻¹,

is invertible, which implies invertibility of T−λ I. So λ∉ℝ.

□

The above reduction of a self-adjoint operator to a unitary one (it can be done on the opposite direction as well!) is an important tool which can be applied in other questions as well, e.g. in the following exercise.

Exercise 18

Show that an operator U: f(t) ↦ e^itf(t) on L₂[0,2π] is unitary and has the entire unit circle {| z |=1} as its spectrum .
Find a self-adjoint operator T with the entire real line as its spectrum.

8 Compactness

It is not easy to study linear operators “in general” and there are many questions about operators in Hilbert spaces raised many decades ago which are still unanswered. Therefore it is reasonable to single out classes of operators which have (relatively) simple properties. Such a class of operators more closed to finite dimensional ones will be studied here.

These operators are so compact that we even can fit them in our course

8.1 Compact operators

Let us recall some topological definition and results.

Definition 1 A compact set in a metric space is defined by the property that any its covering by a family of open sets contains a subcovering by a finite subfamily.

In the finite dimensional vector spaces ℝⁿ or ℂⁿ there is the following equivalent definition of compactness (equivalence of 2(1) and 2(2) is known as Heine–Borel theorem ):

Theorem 2 If a set E in ℝⁿ or ℂⁿ has any of the following properties then it has other two as well:

E is bounded and closed;
E is compact;
Any infinite subset of E has a limiting point belonging to E.

Exercise^* 3 Which equivalences from above are not true any more in the infinite dimensional spaces?

Definition 4 Let X and Y be normed spaces, T∈B(X,Y) is a finite rank operator if Im T is a finite dimensional subspace of Y. T is a compact operator if whenever (x_i)₁^∞ is a bounded sequence in X then its image (T x_i)₁^∞ has a convergent subsequence in Y.

The set of finite rank operators is denote by F(X,Y) and the set of compact operators—by K(X,Y)

Exercise 5 Show that both F(X,Y) and K(X,Y) are linear subspaces of B(X,Y).

We intend to show that F(X,Y)⊂K(X,Y).

Lemma 6 Let Z be a finite-dimensional normed space. Then there is a number N and a mapping S: l₂^N → Z which is invertible and such that S and S⁻¹ are bounded.

Proof. The proof is given by an explicit construction. Let N=dimZ and z₁, z₂, …, z_N be a basis in Z. Let us define

S: l₂^N → Z by S(a₁,a₂,…,a_N)=

∑

k=1

a_k z_k,

then we have an estimation of norm:

⎪⎪
⎪⎪

⎪⎪
⎪⎪
⎪⎪
⎪⎪

∑

k=1

a_k z_k

⎪⎪
⎪⎪
⎪⎪
⎪⎪

≤

∑

k=1

⎪
⎪

a_k

⎪
⎪

⎪⎪
⎪⎪

z_k

⎪⎪
⎪⎪

≤

⎛
⎜
⎜
⎝

∑

k=1

⎪
⎪

a_k

⎪
⎪

⎞
⎟
⎟
⎠

1/2

⎛
⎜
⎜
⎝

∑

k=1

⎪⎪
⎪⎪

z_k

⎪⎪
⎪⎪

⎞
⎟
⎟
⎠

1/2

So ||S||≤ (∑₁^N ||z_k||²)^1/2 and S is continuous.

Clearly S has the trivial kernel, particularly ||Sa||>0 if ||a||=1. By the Heine–Borel theorem the unit sphere in l₂^N is compact, consequently the continuous function a↦ ||∑₁^N a_k z_k|| attains its lower bound, which has to be positive. This means there exists δ>0 such that ||a||=1 implies ||Sa||>δ , or, equivalently if ||z||<δ then ||S⁻¹ z||<1. The later means that ||S⁻¹||≤ δ⁻¹ and boundedness of S⁻¹. □

Corollary 7 For any two metric spaces X and Y we have F(X,Y)⊂ K(X,Y).

Proof. Let T∈F(X,Y), if (x_n)₁^∞ is a bounded sequence in X then ((Tx_n)₁^∞⊂ Z=Im T is also bounded. Let S: l₂^N→ Z be a map constructed in the above Lemma. The sequence (S⁻¹T x_n)₁^∞ is bounded in l₂^N and thus has a limiting point, say a₀. Then Sa₀ is a limiting point of (T x_n)₁^∞. □

There is a simple condition which allows to determine which diagonal operators are compact (particularly the identity operator I_X is not compact if dimX =∞):

Proposition 8 Let T is a diagonal operator and given by identities T e_n=λ_n e_n for all n in a basis e_n. T is compact if and only if λ_n→ 0.

Figure 16: Distance between scales of orthonormal vectors

Proof. If λ_n↛0 then there exists a subsequence λ_{n_k} and δ>0 such that | λ_{n_k} |>δ for all k. Now the sequence (e_{n_k}) is bounded but its image T e_{n_k}=λ _{n_k} e_{n_k} has no convergent subsequence because for any k≠ l:

⎪⎪
⎪⎪

λ _{n_k}e_{n_k}−λ _{n_l}e_{n_l}

⎪⎪
⎪⎪

= (

⎪
⎪

λ _{n_k}

⎪
⎪

² +

⎪
⎪

λ _{n_l}

⎪
⎪

²)^1/2≥

√

δ ,

i.e. T e_{n_k} is not a Cauchy sequence, see Figure 16. For the converse, note that if λ_n→ 0 then we can define a finite rank operator T_m, m≥ 1—m-“truncation” of T by:

T_m e_n =

⎧
⎨
⎩

Te_n=λ_n e_n,	1≤ n≤ m;
0 ,	n>m.

(45)

Then obviously

(T−T_m) e_n =

⎧
⎨
⎩

0,	1≤ n≤ m;
λ_n e_n ,	n>m,

and ||T−T_m||=sup_n>m| λ_n |→ 0 if m→ ∞. All T_m are finite rank operators (so are compact) and T is also compact as their limit—by the next Theorem. □

Theorem 9 Let T_m be a sequence of compact operators convergent to an operator T in the norm topology (i.e. ||T−T_m||→ 0) then T is compact itself. Equivalently K(X,Y) is a closed subspace of B(X,Y).

Figure 17: The є/3 argument to estimate | f(x)−f(y) |.

T₁x₁⁽¹⁾ T₁x₂⁽¹⁾ T₁x₃⁽¹⁾ … T₁x_n⁽¹⁾ … → a₁

T₂x₁⁽²⁾ T₂x₂⁽²⁾ T₂x₃⁽²⁾ … T₂x_n⁽²⁾ … → a₂

T₃x₁⁽³⁾ T₃x₂⁽³⁾ T₃x₃⁽³⁾ … T₃x_n⁽³⁾ … → a₃

… … … … … …

T_nx₁⁽ⁿ⁾ T_nx₂⁽ⁿ⁾ T_nx₃⁽ⁿ⁾ … T_nx_n⁽ⁿ⁾ … → a_n

… … … … … … ↓

↘

a

Table 2: The “diagonal argument”.

Proof. Take a bounded sequence (x_n)₁^∞. From compactness

of T₁	⇒ ∃	subsequence (x_n⁽¹⁾)₁^∞ of (x_n)₁^∞	s.t.	(T₁x_n⁽¹⁾)₁^∞ is convergent.
of T₂	⇒ ∃	subsequence (x_n⁽²⁾)₁^∞ of (x_n⁽¹⁾)₁^∞	s.t.	(T₂x_n⁽²⁾)₁^∞ is convergent.
of T₃	⇒ ∃	subsequence (x_n⁽³⁾)₁^∞ of (x_n⁽²⁾)₁^∞	s.t.	(T₃x_n⁽³⁾)₁^∞ is convergent.
…	…	…	…	…

Could we find a subsequence which converges for all T_m simultaneously? The first guess “take the intersection of all above sequences (x_n^(k))₁^∞” does not work because the intersection could be empty. The way out is provided by the diagonal argument (see Table 2): a subsequence (T_m x_k^(k))₁^∞ is convergent for all m, because at latest after the term x_m^(m) it is a subsequence of (x_k^(m))₁^∞.

We are claiming that a subsequence (T x_k^(k))₁^∞ of (T x_n)₁^∞ is convergent as well. We use here є/3 argument (see Figure 17): for a given є>0 choose p∈ℕ such that ||T−T_p||<є/3. Because (T_p x_k^(k))→ 0 it is a Cauchy sequence, thus there exists n₀>p such that ||T_p x_k^(k)−T_p x_l^(l)||< є/3 for all k, l>n₀. Then:

⎪⎪
⎪⎪

T x_k^(k)−T x_l^(l)

⎪⎪
⎪⎪

(T x_k^(k)−T_p x_k^(k))+(T_p x_k^(k)−T_p x_l^(l))+(T_p x_l^(l)−T x_l^(l))

⎪⎪
⎪⎪

≤

⎪⎪
⎪⎪

T x_k^(k)−T_p x_k^(k)

⎪⎪
⎪⎪

T_p x_k^(k)−T_p x_l^(l)

⎪⎪
⎪⎪

T_p x_l^(l)−T x_l^(l)

⎪⎪
⎪⎪

≤

Thus T is compact. □

8.2 Hilbert–Schmidt operators

Definition 10 Let T: H→ K be a bounded linear map between two Hilbert spaces. Then T is said to be Hilbert–Schmidt operator if there exists an orthonormal basis in H such that the series ∑_k=1^∞||T e_k||² is convergent.

Example 11

Let T: l₂→ l₂ be a diagonal operator defined by Te_n=e_n/n, for all n≥ 1. Then ∑ ||Te_n||²=∑n⁻²=π²/6 (see Example 16) is finite.
The identity operator I_H is not a Hilbert–Schmidt operator, unless H is finite dimensional.

A relation to compact operator is as follows.

Theorem 12 All Hilbert–Schmidt operators are compact. (The opposite inclusion is false, give a counterexample!)

Proof. Let T∈ B(H,K) have a convergent series ∑ ||T e_n||² in an orthonormal basis (e_n)₁^∞ of H. We again (see (45)) define the m-truncation of T by the formula

T_m e_n =

⎧
⎨
⎩

Te_n,	1≤ n≤ m;
0 ,	n>m.

(46)

Then T_m(∑₁^∞a_k e_k)=∑₁^m a_k e_k and each T_m is a finite rank operator because its image is spanned by the finite set of vectors Te₁, …, Te_n. We claim that ||T−T_m||→ 0. Indeed by linearity and definition of T_m:

(T−T_m)

⎛
⎜
⎜
⎝

∞

∑

n=1

a_n e_n

⎞
⎟
⎟
⎠

∞

∑

n=m+1

a_n (Te_n).

Thus:

⎪⎪
⎪⎪
⎪⎪
⎪⎪

(T−T_m)

⎛
⎜
⎜
⎝

∞

∑

n=1

a_n e_n

⎞
⎟
⎟
⎠

⎪⎪
⎪⎪
⎪⎪
⎪⎪

∞

∑

n=m+1

a_n (Te_n)

⎪⎪
⎪⎪
⎪⎪
⎪⎪

(47)

≤

∞

∑

n=m+1

⎪
⎪

a_n

⎪
⎪

⎪⎪
⎪⎪

(Te_n)

⎪⎪
⎪⎪

≤

⎛
⎜
⎜
⎝

∞

∑

n=m+1

⎪
⎪

a_n

⎪
⎪

⎞
⎟
⎟
⎠

1/2

⎛
⎜
⎜
⎝

∞

∑

n=m+1

⎪⎪
⎪⎪

(Te_n)

⎪⎪
⎪⎪

⎞
⎟
⎟
⎠

1/2

≤

⎪⎪
⎪⎪
⎪⎪
⎪⎪

∞

∑

n=1

a_n e_n

⎪⎪
⎪⎪
⎪⎪
⎪⎪

⎛
⎜
⎜
⎝

∞

∑

n=m+1

⎪⎪
⎪⎪

(Te_n)

⎪⎪
⎪⎪

⎞
⎟
⎟
⎠

1/2

(48)

so ||T−T_m||→ 0 and by the previous Theorem T is compact as a limit of compact operators. □

Corollary 13 (from the above proof) For a Hilbert–Schmidt operator

⎪⎪
⎪⎪

≤

⎛
⎜
⎜
⎝

∞

∑

n=m+1

⎪⎪
⎪⎪

(Te_n)

⎪⎪
⎪⎪

⎞
⎟
⎟
⎠

1/2

Proof. Just consider difference of T and T₀=0 in (47)–(48). □

Example 14 An integral operator T on L₂[0,1] is defined by the formula:

(T f)(x)=

∫

K(x,y)f(y) d y, f(y)∈L₂[0,1], (49)

where the continuous on [0,1]×[0,1] function K is called the kernel of integral operator .

Theorem 15 Integral operator (49) is Hilbert–Schmidt.

Proof. Let (e_n)_−∞^∞ be an orthonormal basis of L₂[0,1], e.g. (e^{2π i
nt})_n∈ℤ. Let us consider the kernel K_x(y)=K(x,y) as a function of the argument y depending from the parameter x. Then:

(T e_n)(x)=

∫

K(x,y)e_n(y) d y=

∫

K_x(y)e_n(y) d y= ⟨ K_x,ē_n ⟩.

So ||T e_n||²= ∫₀¹| ⟨ K_x,ē_n ⟩ |² d x. Consequently:

∞

∑

−∞

⎪⎪
⎪⎪

T e_n

⎪⎪
⎪⎪

∞

∑

−∞

∫

⎪
⎪

⟨ K_x,ē_n ⟩

⎪
⎪

² d x

∫

∞

∑

⎪
⎪

⟨ K_x,ē_n ⟩

⎪
⎪

² d x

(50)

∫

⎪⎪
⎪⎪

K_x

⎪⎪
⎪⎪

² d x

∫

⎪
⎪

K(x,y)

⎪
⎪

² d x d y < ∞

Exercise 16 Justify the exchange of summation and integration in (50).

□

Remark 17 The definition 14 and Theorem 15 work also for any T: L₂[a,b] → L₂[c,d] with a continuous kernel K(x,y) on [c,d]×[a,b].

Definition 18 Define Hilbert–Schmidt norm of a Hilbert–Schmidt operator A by ||A||_HS²=∑_n=1^∞||Ae_n||² (it is independent of the choice of orthonormal basis (e_n)₁^∞, see Question 27).

Exercise^* 19 Show that set of Hilbert–Schmidt operators with the above norm is a Hilbert space and find the an expression for the inner product.

Example 20 Let K(x,y)=x−y, then

(Tf)(x)=

∫

(x−y)f(y) d y =x

∫

f(y) d y −

∫

yf(y) d y

is a rank 2 operator. Furthermore:

⎪⎪
⎪⎪

_HS²

∫

(x−y)² d x d y =

∫

⎡
⎢
⎢
⎣

(x−y)³

⎤
⎥
⎥
⎦

x=0

 d y

∫

(1−y)³

y³

 d y=

⎡
⎢
⎢
⎣

−

(1−y)⁴

y⁴

⎤
⎥
⎥
⎦

On the other hand there is an orthonormal basis such that

Tf=

√

⟨ f,e₁ ⟩e₁−

√

⟨ f,e₂ ⟩e₂,

and ||T||=1/√12 and ∑₁² ||Te_k||²=1/6 and we get ||T||≤ ||T||_HS in agreement with Corollary 13.

9 Compact normal operators

Recall from Section 6.5 that an operator T is normal if TT^*=T^*T; Hermitian (T^*=T) and unitary (T^*=T⁻¹) operators are normal.

9.1 Spectrum of normal operators

Theorem 1 Let T∈B(H) be a normal operator then

kerT =kerT^*, so ker(T−λ I) =ker (T^*−λI) for all λ∈ℂ
Eigenvectors corresponding to distinct eigenvalues are orthogonal.
||T||=r(T).

Proof.

Obviously:
x∈kerT ⇔ ⟨ Tx,Tx ⟩=0 ⇔ ⟨ T^*Tx,x ⟩=0

⇔ ⟨ TT^*x,x ⟩=0 ⇔ ⟨ T^*x,T^*x ⟩=0

⇔ x∈kerT^*.

The second part holds because normalities of T and T−λ I are equivalent.
If Tx=λ x, Ty=µ y then from the previous statement T^* y =µy. If λ≠µ then the identity
λ⟨ x,y ⟩=⟨ Tx,y ⟩ =⟨ x,T^*y ⟩=µ⟨ x,y ⟩

implies ⟨ x,y ⟩=0.
Let S=T^*T, then S is Hermitian (check!). Consequently, inequality
⎪⎪
⎪⎪ Sx ⎪⎪
⎪⎪ ²=⟨ Sx,Sx ⟩=⟨ S²x,x ⟩≤ ⎪⎪
⎪⎪ S² ⎪⎪
⎪⎪ ⎪⎪
⎪⎪ x ⎪⎪
⎪⎪ ²

implies ||S||²≤ ||S²||. But the opposite inequality follows from the Theorem 12, thus we have the equality ||S²||=||S||² and more generally by induction: ||S^{2^m}||=||S||^{2^m} for all m.
Now we claim ||S||=||T||². From Theorem 12 and 18 we get ||S||=||T^*T||≤ ||T||². On the other hand if ||x||=1 then
⎪⎪
⎪⎪ T^*T ⎪⎪
⎪⎪ ≥ ⎪
⎪ ⟨ T^*Tx,x ⟩ ⎪
⎪ =⟨ Tx,Tx ⟩= ⎪⎪
⎪⎪ Tx ⎪⎪
⎪⎪ ²

implies the opposite inequality ||S||≥||T||². Only now we use normality of T to obtain (T^{2^m})^*T^{2^m}=(T^*T)^{2^m} and get the equality
⎪⎪
⎪⎪ T^{2^m} ⎪⎪
⎪⎪ ²= ⎪⎪
⎪⎪ (T^*T)^{2^m} ⎪⎪
⎪⎪ = ⎪⎪
⎪⎪ T^*T ⎪⎪
⎪⎪ ^{2^m} = ⎪⎪
⎪⎪ T ⎪⎪
⎪⎪ ^{2^m+1}.

Thus:
r(T)=

lim

m→∞

⎪⎪
⎪⎪ T^{2^m} ⎪⎪
⎪⎪ ^{1/2^m}=

lim

m→∞

⎪⎪
⎪⎪ T ⎪⎪
⎪⎪ ^{2^m+1/2^m+1} = ⎪⎪
⎪⎪ T ⎪⎪
⎪⎪ .

by the spectral radius formula (44).

□

Example 2 It is easy to see that normality is important in 1(3), indeed the non-normal operator T given by the matrix (

0	1
0	0

) in ℂ has one-point spectrum {0}, consequently r(T)=0 but ||T||=1.

Lemma 3 Let T be a compact normal operator then

The set of of eigenvalues of T is either finite or a countable sequence tending to zero.
All the eigenspaces, i.e. ker(T−λ I), are finite-dimensional for all λ≠ 0.

Remark 4 This Lemma is true for any compact operator, but we will not use that in our course.

Proof.

Let H₀ be the closed linear span of eigenvectors of T. Then T restricted to H₀ is a diagonal compact operator with the same set of eigenvalues λ_n as in H. Then λ_n→ 0 from Proposition 8 .
Exercise 5 Use the proof of Proposition 8 to give a direct demonstration.
Proof.[Solution] Or straightforwardly assume opposite: there exist an δ>0 and infinitely many eigenvalues λ_n such that | λ_n |>δ. By the previous Theorem there is an orthonormal sequence v_n of corresponding eigenvectors T v_n=λ_n v_n. Now the sequence (v_n) is bounded but its image T v_n=λ _n e_n has no convergent subsequence because for any k≠ l:
⎪⎪
⎪⎪ λ _kv_k−λ _le_l ⎪⎪
⎪⎪ = ( ⎪
⎪ λ _k ⎪
⎪ ² + ⎪
⎪ λ_l ⎪
⎪ ²)^1/2≥ √

2

δ ,

i.e. T e_{n_k} is not a Cauchy sequence, see Figure 16. □
Similarly if H₀=ker(T−λ I) is infinite dimensional, then restriction of T on H₀ is λ I—which is non-compact by Proposition 8. Alternatively consider the infinite orthonormal sequence (v_n), Tv_n=λ v_n as in Exercise 5.

□

Lemma 6 Let T be a compact normal operator. Then all non-zero points λ∈ σ(T) are eigenvalues and there exists an eigenvalue of modulus ||T||.

Proof. Assume without lost of generality that T≠ 0. Let λ∈σ(T), without lost of generality (multiplying by a scalar) λ=1.

We claim that if 1 is not an eigenvalue then there exist δ>0 such that

⎪⎪
⎪⎪

(I−T)x

⎪⎪
⎪⎪

≥ δ

⎪⎪
⎪⎪

. (51)

Otherwise there exists a sequence of vectors (x_n) with unit norm such that (I−T)x_n→ 0. Then from the compactness of T for a subsequence (x_{n_k}) there is y∈ H such that Tx_{n_k} → y, then x_n→ y implying Ty=y and y≠ 0—i.e. y is eigenvector with eigenvalue 1.

Now we claim Im (I−T) is closed, i.e. y∈Im(I−T) implies y∈Im(I−T). Indeed, if (I−T)x_n → y, then there is a subsequence (x_{n_k}) such that Tx_{n_k}→ z implying x_{n_k}→ y+z, then (I−T)(z+y)=y by continuity of I−T.

Finally I−T is injective, i.e ker(I−T)={0}, by (51). By the property 1(1), ker(I−T^*)={0} as well. But because always ker(I−T^*)=Im(I−T)^⊥ (by 19(2)) we got surjectivity, i.e. Im(I−T)^⊥={0}, of I−T. Thus (I−T)⁻¹ exists and is bounded because (51) implies ||y||>δ ||(I−T)⁻¹y||. Thus 1∉σ(T).

The existence of eigenvalue λ such that | λ |=||T|| follows from combination of Lemma 13 and Theorem 1(3). □

9.2 Compact normal operators

Theorem 7 (The spectral theorem for compact normal operators) Let T be a compact normal operator on a Hilbert space H. Then there exists an orthonormal sequence (e_n) of eigenvectors of T and corresponding eigenvalues (λ_n) such that:

Tx=

∑

λ_n ⟨ x,e_n ⟩ e_n, for all x∈ H. (52)

If (λ_n) is an infinite sequence it tends to zero.

Conversely, if T is given by a formula (52) then it is compact and normal.

Proof. Suppose T≠ 0. Then by the previous Theorem there exists an eigenvalue λ₁ such that | λ₁ |=||T|| with corresponding eigenvector e₁ of the unit norm. Let H₁=Lin(e₁)^⊥. If x∈ H₁ then

⟨ Tx,e₁ ⟩=⟨ x,T^*e₁ ⟩=⟨ x,λ₁ e₁ ⟩=λ₁⟨ x,e₁ ⟩=0, (53)

thus Tx∈ H₁ and similarly T^* x ∈ H₁. Write T₁=T|_H₁ which is again a normal compact operator with a norm does not exceeding ||T||. We could inductively repeat this procedure for T₁ obtaining sequence of eigenvalues λ₂, λ₃, …with eigenvectors e₂, e₃, …. If T_n=0 for a finite n then theorem is already proved. Otherwise we have an infinite sequence λ_n→ 0. Let

∑

⟨ x,e_k ⟩e_k +y_n ⇒

⎪⎪
⎪⎪

²=

∑

⎪
⎪

⟨ x,e_k ⟩

⎪
⎪

² +

⎪⎪
⎪⎪

y_n

⎪⎪
⎪⎪

² , y_n∈ H_n,

from Pythagoras’s theorem. Then ||y_n||≤ ||x|| and ||T y_n||≤ ||T_n||||y_n||≤ | λ_n |||x||→ 0 by Lemma 3. Thus

T x =

lim

n→ ∞

⎛
⎜
⎜
⎝

∑

⟨ x,e_n ⟩ Te_n + Ty_n

⎞
⎟
⎟
⎠

∞

∑

λ_n⟨ x,e_n ⟩ e_n

Conversely, if T x = ∑₁^∞λ_n⟨ x,e_n ⟩ e_n then

⟨ Tx,y ⟩=

∞

∑

λ_n⟨ x,e_n ⟩ ⟨ e_n,y ⟩ =

∞

∑

⟨ x,e_n ⟩ λ_n

⟨ y,e_n ⟩

thus T^* y = ∑₁^∞λ_n⟨ y,e_n ⟩ e_n. Then we got the normality of T: T^*Tx=TT^*x= ∑₁^∞| λ_n |²⟨ y,e_n ⟩ e_n. Also T is compact because it is a uniform limit of the finite rank operators T_nx=∑₁ⁿ λ_n⟨ x,e_n ⟩e_n. □

Corollary 8 Let T be a compact normal operator on a separable Hilbert space H, then there exists a orthonormal basis g_k such that

Tx=

∞

∑

λ_n⟨ x,g_n ⟩ g_n,

and λ_n are eigenvalues of T including zeros.

Proof. Let (e_n) be the orthonormal sequence constructed in the proof of the previous Theorem. Then x is perpendicular to all e_n if and only if its in the kernel of T. Let (f_n) be any orthonormal basis of kerT. Then the union of (e_n) and (f_n) is the orthonormal basis (g_n) we have looked for. □

Exercise 9 Finish all details in the above proof.

Corollary 10 (Singular value decomposition) If T is any compact operator on a separable Hilbert space then there exists orthonormal sequences (e_k) and (f_k) such that Tx=∑_k µ_k ⟨ x,e_k ⟩ f_k where (µ_k) is a sequence of positive numbers such that µ_k→ 0 if it is an infinite sequence.

Proof. Operator T^*T is compact and Hermitian (hence normal). From the previous Corollary there is an orthonormal basis (e_k) such that T^*T x= ∑_n λ_n⟨ x,e_k ⟩e_k for some positive λ_n=||T e_n||². Let µ_n=||Te_n|| and f_n=Te_n/µ_n. Then f_n is an orthonormal sequence (check!) and

Tx=

∑

⟨ x,e_n ⟩ Te_n =

∑

⟨ x,e_n ⟩ µ_n f_n.

□

Corollary 11 A bounded operator in a Hilber space is compact if and only if it is a uniform limit of the finite rank operators.

Proof. Sufficiency follows from 9.
Necessity: by the previous Corollary Tx =∑_n ⟨ x,e_n ⟩ µ_n f_n thus T is a uniform limit of operators T_m x=∑_n=1^m ⟨ x,e_n ⟩ µ_n f_n which are of finite rank. □

10 Integral equations

In this lecture we will study the Fredholm equation defined as follows. Let the integral operator with a kernel K(x,y) defined on [a,b]×[a,b] be defined as before:

(Tφ)(x)=

∫

K(x,y)φ(y) d y. (54)

The Fredholm equation of the first and second kinds correspondingly are:

Tφ=f and φ −λ Tφ=f, (55)

for a function f on [a,b]. A special case is given by Volterra equation by an operator integral operator (54) T with a kernel K(x,y)=0 for all y>x which could be written as:

(Tφ)(x)=

∫

K(x,y)φ(y) d y. (56)

We will consider integral operators with kernels K such that ∫_a^b∫_a^b K(x,y) d x d y<∞, then by Theorem 15 T is a Hilbert–Schmidt operator and in particular bounded.

As a reason to study Fredholm operators we will mention that solutions of differential equations in mathematical physics (notably heat and wave equations) requires a decomposition of a function f as a linear combination of functions K(x,y) with “coefficients” φ. This is an continuous analog of a discrete decomposition into Fourier series.

Using ideas from the proof of Lemma 4 we define Neumann series for the resolvent:

(I−λ T)⁻¹=I+λ T + λ²T²+⋯, (57)

which is valid for all λ<||T||⁻¹.

Example 1 Solve the Volterra equation

φ(x)−λ

∫

y φ(y) d y=x², on L₂[0,1].

In this case I−λ T φ = f, with f(x)=x² and:

K(x,y)=

⎧
⎨
⎩

y,	0≤ y ≤ x;
0,	x< y ≤ 1.

Straightforward calculations shows:

(Tf)(x)

∫

y· y² d y=

x⁴

(T²f)(x)

∫

y⁴

 d y=

x⁶

, …

and generally by induction:

(Tⁿf)(x) =

∫

y²ⁿ

2ⁿ⁻¹n!

 d y=

x²ⁿ⁺²

2ⁿ(n+1)!

Hence:

φ(x)

∞

∑

λⁿTⁿ f =

∞

∑

λⁿx²ⁿ⁺²

2ⁿ(n+1)!

∞

∑

λⁿ⁺¹x²ⁿ⁺²

2ⁿ⁺¹(n+1)!

(e^λ x²/2−1) for all λ ∈ ℂ∖ {0},

because in this case r(T)=0. For the Fredholm equations this is not always the case, see Tutorial problem 29.

Among other integral operators there is an important subclass with separable kernel , namely a kernel which has a form:

K(x,y)=

∑

j=1

g_j(x)h_j(y). (58)

In such a case:

(Tφ)(x)

∫

∑

j=1

g_j(x)h_j(y)φ(y) d y

∑

j=1

g_j(x)

∫

h_j(y)φ(y) d y,

i.e. the image of T is spanned by g₁(x), …, g_n(x) and is finite dimensional, consequently the solution of such equation reduces to linear algebra.

Example 2 Solve the Fredholm equation (actually find eigenvectors of T):

φ(x)

2π

∫

cos(x+y)φ(y) d y

2π

∫

(cosxcosy − sinx siny)φ(y) d y.

Clearly φ (x) should be a linear combination φ(x)=Acos x+Bsinx with coefficients A and B satisfying to:

2π

∫

cosy (Acosy+Bsiny) d y,

−λ

2π

∫

siny (Acosy+Bsiny) d y.

Basic calculus implies A=λπ A and B=−λπ B and the only nonzero solutions are:

λ=π⁻¹	A ≠ 0	B = 0
λ=−π⁻¹	A = 0	B ≠ 0

We develop some Hilbert–Schmidt theory for integral operators.

Theorem 3 Suppose that K(x,y) is a continuous function on [a,b]×[a,b] and K(x,y)=K(y,x) and operator T is defined by (54). Then

T is a self-adjoint Hilbert–Schmidt operator.
All eigenvalues of T are real and satisfy ∑_n λ_n²<∞.
The eigenvectors v_n of T can be chosen as an orthonormal basis of L₂[a,b], are continuous for nonzero λ_n and
Tφ=
∞

∑

n=1

λ_n ⟨ φ,v_n ⟩v_n where φ=
∞

∑

n=1

⟨ φ,v_n ⟩v_n

Proof.

The condition K(x,y)=K(y,x) implies the Hermitian property of T:

⟨ Tφ,ψ ⟩

∫

⎛
⎜
⎜
⎝

∫

K(x,y)φ(y) d y

⎞
⎟
⎟
⎠

ψ(x) d x

∫

K(x,y)φ(y) ψ(x) d x d y

∫

φ(y)

⎛
⎜
⎜
⎝

∫

K(y,x) ψ(x)

 d x

⎞
⎟
⎟
⎠

⟨ φ,Tψ ⟩.

The Hilbert–Schmidt property (and hence compactness) was proved in Theorem 15.

Spectrum of T is real as for any Hermitian operator, see Theorem 17(2) and finiteness of ∑_n λ_n² follows from Hilbert–Schmidt property
The existence of orthonormal basis consisting from eigenvectors (v_n) of T was proved in Corollary 8. If λ_n≠ 0 then:
v_n(x₁)−v_n(x₂) = λ_n⁻¹((Tv_n)(x₁)−(Tv_n)(x₂))

=
1

λ_n

b

∫

a

(K(x₁,y)−K(x₂,y))v_n(y) d y

and by Cauchy–Schwarz-Bunyakovskii inequality:
⎪
⎪ v_n(x₁)−v_n(x₂) ⎪
⎪ ≤
1

⎪
⎪ λ_n ⎪
⎪

⎪⎪
⎪⎪ v_n ⎪⎪
⎪⎪ ₂
b

∫

a

⎪
⎪ K(x₁,y)−K(x₂,y) ⎪
⎪  d y

which tense to 0 due to (uniform) continuity of K(x,y).

□

Theorem 4 Let T be as in the previous Theorem. Then if λ≠ 0 and λ⁻¹∉σ(T), the unique solution φ of the Fredholm equation of the second kind φ−λ T φ=f is

φ=

∞

∑

⟨ f,v_n ⟩

1−λ λ_n

v_n. (59)

Proof. Let φ=∑₁^∞a_n v_n where a_n=⟨ φ,v_n ⟩, then

φ−λ Tφ=

∞

∑

a_n(1−λ λ_n) v_n =f=

∞

∑

⟨ f,v_n ⟩v_n

if and only if a_n=⟨ f,v_n ⟩/(1−λ λ_n) for all n. Note 1−λ λ_n≠ 0 since λ⁻¹∉σ(T).

Because λ_n→ 0 we got ∑₁^∞| a_n |² by its comparison with ∑₁^∞| ⟨ f,v_n ⟩ |²=||f||², thus the solution exists and is unique by the Riesz–Fisher Theorem. □

See Exercise 30 for an example.

Theorem 5 (Fredholm alternative) Let T∈K(H) be compact normal and λ∈ℂ∖ {0}. Consider the equations:

φ−λ Tφ	=	0	(60)
φ−λ Tφ	=	f	(61)

then either

the only solution to (60) is φ=0 and (61) has a unique solution for any f∈ H; or
there exists a nonzero solution to (60) and (61) can be solved if and only if f is orthogonal all solutions to (60).

Proof.

If φ=0 is the only solution of (60), then λ⁻¹ is not an eigenvalue of T and then by Lemma 6 is neither in spectrum of T. Thus I−λ T is invertible and the unique solution of (61) is given by φ=(I−λ T)⁻¹ f.
A nonzero solution to (60) means that λ⁻¹∈σ(T). Let (v_n) be an orthonormal basis of eigenvectors of T for eigenvalues (λ_n). By Lemma 3(2) only a finite number of λ_n is equal to λ⁻¹, say they are λ₁, …, λ_N, then
(I−λ T)φ=
∞

∑

n=1

(1−λ λ_n)⟨ φ,v_n ⟩v_n =
∞

∑

n=N+1

(1−λ λ_n)⟨ φ,v_n ⟩v_n.

If f=∑₁^∞⟨ f,v_n ⟩v_n then the identity (I−λ T)φ=f is only possible if ⟨ f,v_n ⟩=0 for 1≤ n≤ N. Conversely from that condition we could give a solution
φ=
∞

∑

n=N+1

⟨ f,v_n ⟩

1−λ λ_n

v_n +φ₀, for any φ₀∈Lin(v₁,…,v_N),

which is again in H because f∈ H and λ_n→ 0.

□

Example 6 Let us consider

(Tφ)(x)=

∫

(2xy−x−y+1)φ(y) d y.

Because the kernel of T is real and symmetric T=T^*, the kernel is also separable:

(Tφ)(x)=x

∫

(2y−1)φ(y) d y+

∫

(−y+1)φ(y) d y,

and T of the rank 2 with image of T spanned by 1 and x. By direct calculations:

↦

x +

or T is given by the matrix

⎛
⎜
⎜
⎜
⎜
⎜
⎜
⎜
⎝

⎞
⎟
⎟
⎟
⎟
⎟
⎟
⎟
⎠

According to linear algebra decomposition over eigenvectors is:

λ₁=

with vector

⎛
⎜
⎝

⎞
⎟
⎠

λ₂=

with vector

⎛
⎜
⎜
⎜
⎜
⎝

−

⎞
⎟
⎟
⎟
⎟
⎠

with normalisation v₁(y)=1, v₂(y)=√12(y−1/2) and we complete it to an orthonormal basis (v_n) of L₂[0,1]. Then

If λ≠ 2 or 6 then (I−λ T)φ = f has a unique solution (cf. equation (59)):

∑

n=1

⟨ f,v_n ⟩

1−λ λ_n

v_n +

∞

∑

n=3

⟨ f,v_n ⟩ v_n

∑

n=1

⟨ f,v_n ⟩

1−λ λ_n

v_n +

⎛
⎜
⎜
⎝

f−

∑

n=1

⟨ f,v_n ⟩ v_n)

⎞
⎟
⎟
⎠

∑

n=1

λλ_n

1−λ λ_n

⟨ f,v_n ⟩v_n.

If λ=2 then the solutions exist provided ⟨ f,v₁ ⟩=0 and are:
φ=f+
λλ₂

1−λ λ₂

⟨ f,v₂ ⟩v₂+Cv₁=f+
1

2

⟨ f,v₂ ⟩v₂+Cv₁, C∈ℂ.
If λ=6 then the solutions exist provided ⟨ f,v₂ ⟩=0 and are:
φ=f+
λλ₁

1−λ λ₁

⟨ f,v₁ ⟩v₁+Cv₂=f−
3

2

⟨ f,v₂ ⟩v₂+Cv₂, C∈ℂ.

11 Banach and Normed Spaces

We will work with either the field of real numbers ℝ or the complex numbers ℂ. To avoid repetition, we use K to denote either ℝ or ℂ.

11.1 Normed spaces

Recall, see Defn. 3, a norm on a vector space V is a map ||·||:V→[0,∞) such that

||u||=0 only when u=0;
||λ u|| = | λ | ||u|| for λ∈K and u∈ V;
||u+v|| ≤ ||u|| + ||v|| for u,v∈ V.

Note, that the second and third conditions imply that linear operations—multiplication by a scalar and addition of vectors respectively—are continuous in the topology defined by the norm.

A norm induces a metric, see Defn. 1, on V by setting d(u,v)=||u−v||. When V is complete, see Defn. 6, for this metric, we say that V is a Banach space.

Theorem 1 Every finite-dimensional normed vector space is a Banach space.

We will use the following simple inequality:

Lemma 2 (Young’s inequality) Let two real numbers 1<p,q<∞ are related through 1/p+1/q=1 then

⎪
⎪

≤

⎪
⎪

, (62)

for any complex a and b.

Proof.[First proof: analytic] Obviously, it is enough to prove inequality for positive reals a=| a | and b=| b |. If p>1 then 0<1/p < 1. Consider the function φ(t)=t^m−mt for an 0<m<1. From its derivative φ(t)=m(t^m−1−1) we find the only critical point t=1 on [0,∞), which is its maximum for m=1/p<1. Thus write the inequality φ(t)≤ φ(1) for t=a^p/b^q and m=1/p. After a transformation we get a· b^−q/p−1≤ 1/p(a^pb^−q−1) and multiplication by b^q with rearrangements lead to the desired result. □

Proof.[Second proof: geometric] Consider the plane with coordinates (x,y) and take the curve y=x^p−1 which is the same as x=y^q−1. Comparing areas on the figure:

we see that S₁+S₂≥ ab for any positive reals a and b. Elementary integration shows:

S₁=

∫

x^p−1 d x=

a^p

, S₂=

∫

y^q−1 d y=

b^q

This finishes the demonstration. □

Remark 3 You may notice, that the both proofs introduced some specific auxiliary functions related to x^p/p. It is a fruitful generalisation to conduct the proofs for more functions and derive respective forms of Young’s inequality.

Proposition 4 (Hölder’s Inequality) For 1<p<∞, let q∈(1,∞) be such that 1/p + 1/q = 1. For n≥1 and u,v∈Kⁿ, we have that

∑

j=1

⎪
⎪

u_j v_j

⎪
⎪

≤

⎛
⎜
⎜
⎝

∑

j=1

⎪
⎪

u_j

⎪
⎪

⎞
⎟
⎟
⎠

⎛
⎜
⎜
⎝

∑

j=1

⎪
⎪

v_j

⎪
⎪

⎞
⎟
⎟
⎠

Proof. For reasons become clear soon we use the notation ||u||_p=( ∑_j=1ⁿ | u_j |^p )^1/p and ||v||_q= ( ∑_j=1ⁿ | v_j |^q )^1/q and define for 1≤ i ≤ n:

a_i=

u_i

⎪⎪
⎪⎪

and b_i=

v_i

⎪⎪
⎪⎪

Summing up for 1≤ i ≤ n all inequalities obtained from (62):

⎪
⎪

a_i b_i

⎪
⎪

≤

⎪
⎪

a_i

⎪
⎪

b_i

⎪
⎪

we get the result. □

Using Hölder inequality we can derive the following one:

Proposition 5 (Minkowski’s Inequality) For 1<p<∞, and n≥ 1, let u,v∈Kⁿ. Then

⎛
⎜
⎜
⎝

∑

j=1

⎪
⎪

u_j+v_j

⎪
⎪

⎞
⎟
⎟
⎠

1/p

≤

⎛
⎜
⎜
⎝

∑

j=1

⎪
⎪

u_j

⎪
⎪

⎞
⎟
⎟
⎠

1/p

⎛
⎜
⎜
⎝

∑

j=1

⎪
⎪

v_j

⎪
⎪

⎞
⎟
⎟
⎠

1/p

Proof. For p>1 we have:

∑

⎪
⎪

u_k+v_k

⎪
⎪

^p =

∑

⎪
⎪

u_k

⎪
⎪

u_k+v_k

⎪
⎪

^p−1 +

∑

⎪
⎪

v_k

⎪
⎪

u_k+v_k

⎪
⎪

^p−1. (63)

By Hölder inequality

∑

⎪
⎪

u_k

⎪
⎪

u_k+v_k

⎪
⎪

^p−1 ≤

⎛
⎜
⎜
⎝

∑

⎪
⎪

u_k

⎪
⎪

⎞
⎟
⎟
⎠

⎛
⎜
⎜
⎝

∑

⎪
⎪

u_k+v_k

⎪
⎪

^q(p−1)

⎞
⎟
⎟
⎠

Adding a similar inequality for the second term in the right hand side of (63) and division by (∑₁ⁿ | u_k+v_k |^q(p−1))^1/q yields the result. □

Minkowski’s inequality shows that for 1≤ p<∞ (the case p=1 is easy) we can define a norm ||·||_p on Kⁿ by

⎪⎪
⎪⎪

_p =

⎛
⎜
⎜
⎝

∑

j=1

⎪
⎪

u_j

⎪
⎪

⎞
⎟
⎟
⎠

1/p

( u =(u₁,⋯,u_n)∈Kⁿ ).

See, Figure 2 for illustration of various norms of this type defined in ℝ².

We can define an infinite analogue of this. Let 1≤ p<∞, let l_p be the space of all scalar sequences (x_n) with ∑_n | x_n |^p < ∞. A careful use of Minkowski’s inequality shows that l_p is a vector space. Then l_p becomes a normed space for the ||·||_p norm. Note also, that l₂ is the Hilbert space introduced before in Example 12(2).

Recall that a Cauchy sequence, see Defn. 5, in a normed space is bounded: if (x_n) is Cauchy then we can find N with ||x_n−x_m||<1 for all n,m≥ N. Then ||x_n|| ≤ ||x_n−x_N|| + ||x_N|| < ||x_N||+1 for n≥ N, so in particular, ||x_n|| ≤ max( ||x₁||,||x₂||,⋯,||x_N−1||,||x_N||+1).

Theorem 6 For 1≤ p<∞, the space l_p is a Banach space.

Remark 7 Most completeness proofs (in particular, all completeness proof in this course) are similar to the next one, see also Thm. 24. The general scheme of those proofs has three steps:

For a general Cauchy sequence we build “limit” in some point-wise sense.
At this stage it is not clear either the constructed “limit” is at our space at all, that is shown on the second step.
From the construction it does not follows that the “limit” is really the limit in the topology of our space, that is the third step of the proof.

Proof. We repeat the proof of Thm. 24 changing 2 to p. Let (x⁽ⁿ⁾) be a Cauchy-sequence in l_p; we wish to show this converges to some vector in l_p.

For each n, x⁽ⁿ⁾∈l_p so is a sequence of scalars, say (x_k⁽ⁿ⁾)_k=1^∞. As (x⁽ⁿ⁾) is Cauchy, for each є>0 there exists N_є so that ||x⁽ⁿ⁾ − x^(m)||_p ≤ є for n,m≥ N_є.

For k fixed,

⎪
⎪

x_k⁽ⁿ⁾ − x_k^(m)

⎪
⎪

≤

⎛
⎜
⎜
⎝

∑

⎪
⎪

x_j⁽ⁿ⁾ − x_j^(m)

⎪
⎪

⎞
⎟
⎟
⎠

1/p

⎪⎪
⎪⎪

x⁽ⁿ⁾ − x^(m)

⎪⎪
⎪⎪

_p ≤ є,

when n,m≥ N_є. Thus the scalar sequence (x_k⁽ⁿ⁾)_n=1^∞ is Cauchy in K and hence converges, to x_k say. Let x=(x_k), so that x is a candidate for the limit of (x⁽ⁿ⁾).

Firstly, we check that x−x⁽ⁿ⁾∈l_p for some n. Indeed, for a given є>0 find n₀ such that ||x⁽ⁿ⁾−x^(m)||<є for all n,m>n₀. For any K and m:

∑

k=1

⎪
⎪

x_k⁽ⁿ⁾−x_k^(m)

⎪
⎪

^p ≤

⎪⎪
⎪⎪

x⁽ⁿ⁾−x^(m)

⎪⎪
⎪⎪

^p<є^p.

Let m→ ∞ then ∑_k=1^K | x_k⁽ⁿ⁾−x_k |^p ≤ є^p.
Let K→ ∞ then ∑_k=1^∞| x_k⁽ⁿ⁾−x_k |^p ≤ є^p. Thus x⁽ⁿ⁾−x∈l_p and because l_p is a linear space then x = x⁽ⁿ⁾−(x⁽ⁿ⁾−x) is also in l_p.

Finally, we saw above that for any є >0 there is n₀ such that ||x⁽ⁿ⁾−x||<є for all n>n₀. Thus x⁽ⁿ⁾→ x. □

For p=∞, there are two analogies to the l_p spaces. First, we define l_∞ to be the vector space of all bounded scalar sequences, with the sup-norm (||·||_∞-norm):

⎪⎪
⎪⎪

(x_n)

⎪⎪
⎪⎪

_∞ =

sup

n∈ℕ

⎪
⎪

x_n

⎪
⎪

( (x_n)∈ l_∞ ). (64)

Second, we define c₀ to be the space of all scalar sequences (x_n) which converge to 0. We equip c₀ with the sup norm (64). This is defined, as if x_n→0, then (x_n) is bounded. Hence c₀ is a subspace of l_∞, and we can check (exercise!) that c₀ is closed.

Theorem 8 The spaces c₀ and l_∞ are Banach spaces.

Proof. This is another variant of the previous proof of Thm. 6. We do the l_∞ case. Again, let (x⁽ⁿ⁾) be a Cauchy sequence in l_∞, and for each n, let x⁽ⁿ⁾=(x_k⁽ⁿ⁾)_k=1^∞. For є>0 we can find N such that ||x⁽ⁿ⁾−x^(m)||_∞ < є for n,m≥ N. Thus, for any k, we see that | x_k⁽ⁿ⁾ − x_k^(m) | < є when n,m≥ N. So (x_k⁽ⁿ⁾)_n=1^∞ is Cauchy, and hence converges, say to x_k∈K. Let x=(x_k).

Let m≥ N, so that for any k, we have that

⎪
⎪

x_k − x_k^(m)

⎪
⎪

lim

n→∞

⎪
⎪

x_k⁽ⁿ⁾ − x_k^(m)

⎪
⎪

≤ є.

As k was arbitrary, we see that sup_k | x_k−x_k^(m) | ≤ є. So, firstly, this shows that (x−x^(m))∈l_∞, and so also x = (x−x^(m)) + x^(m) ∈ l_∞. Secondly, we have shown that ||x−x^(m)||_∞ ≤ є when m≥ N, so x^(m)→ x in norm. □

Example 9 We can also consider a Banach space of functions L_p[a,b] with the norm

⎪⎪
⎪⎪

_p=

⎛
⎜
⎜
⎜
⎜
⎝

∫

⎪
⎪

f(t)

⎪
⎪

^p d t

⎞
⎟
⎟
⎟
⎟
⎠

1/p

See the discussion after Defn. 22 for a realisation of such spaces.

11.2 Bounded linear operators

Recall what a linear map is, see Defn. 1. A linear map is often called an operator. A linear map T:E→ F between normed spaces is bounded if there exists M>0 such that ||T(x)|| ≤ M ||x|| for x∈ E, see Defn. 3. We write B(E,F) for the set of operators from E to F. For the natural operations, B(E,F) is a vector space. We norm B(E,F) by setting

⎪⎪
⎪⎪

= sup

⎧
⎪
⎨
⎪
⎩

⎪⎪
⎪⎪

T(x)

⎪⎪
⎪⎪

: x∈ E, x≠0

⎫
⎪
⎬
⎪
⎭

. (65)

Exercise 10 Show that

The expression (65) is a norm in the sense of Defn. 3.

We equivalently have

⎪⎪
⎪⎪

= sup

⎧
⎨
⎩

⎪⎪
⎪⎪

T(x)

⎪⎪
⎪⎪

: x∈ E,

⎪⎪
⎪⎪

≤1

⎫
⎬
⎭

= sup

⎧
⎨
⎩

⎪⎪
⎪⎪

T(x)

⎪⎪
⎪⎪

: x∈ E,

⎪⎪
⎪⎪

⎫
⎬
⎭

Proposition 11 For a linear map T:E→ F between normed spaces, the following are equivalent:

T is continuous (for the metrics induced by the norms on E and F);
T is continuous at 0;
T is bounded.

Proof. Proof essentially follows the proof of similar Theorem 4. See also discussion about usefulness of this theorem there. □

Theorem 12 Let E be a normed space, and let F be a Banach space. Then B(E,F) is a Banach space.

Proof. In the essence, we follows the same three-step procedure as in Thms. 24, 6 and 8. Let (T_n) be a Cauchy sequence in B(E,F). For x∈ E, check that (T_n(x)) is Cauchy in F, and hence converges to, say, T(x), as F is complete. Then check that T:E→ F is linear, bounded, and that ||T_n−T||→ 0. □

We write B(E) for B(E,E). For normed spaces E, F and G, and for T∈B(E,F) and S∈B(F,G), we have that ST=S∘ T∈B(E,G) with ||ST|| ≤ ||S|| ||T||.

For T∈B(E,F), if there exists S∈B(F,E) with ST=I_E, the identity of E, and TS=I_F, then T is said to be invertible, and write T=S⁻¹. In this case, we say that E and F are isomorphic spaces, and that T is an isomorphism.

If ||T(x)||=||x|| for each x∈ E, we say that T is an isometry. If additionally T is an isomorphism, then T is an isometric isomorphism, and we say that E and F are isometrically isomorphic.

11.3 Dual Spaces

Let E be a normed vector space, and let E^* (also written E′) be B(E,K), the space of bounded linear maps from E to K, which we call functionals, or more correctly, bounded linear functionals, see Defn. 1. Notice that as K is complete, the above theorem shows that E^* is always a Banach space.

Theorem 13 Let 1<p<∞, and again let q be such that 1/p+1/q=1. Then the map l_q→(l_p)^*: u↦φ_u, is an isometric isomorphism, where φ_u is defined, for u=(u_j)∈l_q, by

φ_u(x) =

∞

∑

j=1

u_j x_j

⎛
⎝

x=(x_j)∈l_p

⎞
⎠

Proof. By Hölder’s inequality, we see that

⎪
⎪

φ_u(x)

⎪
⎪

≤

∞

∑

j=1

⎪
⎪

u_j

⎪
⎪

x_j

⎪
⎪

≤

⎛
⎜
⎜
⎝

∞

∑

j=1

⎪
⎪

u_j

⎪
⎪

⎞
⎟
⎟
⎠

1/q

⎛
⎜
⎜
⎝

∞

∑

j=1

⎪
⎪

x_j

⎪
⎪

⎞
⎟
⎟
⎠

1/p

⎪⎪
⎪⎪

_p.

So the sum converges, and hence φ_u is defined. Clearly φ_u is linear, and the above estimate also shows that ||φ_u|| ≤ ||u||_q. The map u↦ φ_u is also clearly linear, and we’ve just shown that it is norm-decreasing.

Now let φ∈(l_p)^*. For each n, let e_n = (0,⋯,0,1,0,⋯) with the 1 in the nth position. Then, for x=(x_n)∈l_p,

⎪⎪
⎪⎪
⎪⎪
⎪⎪

x −

∑

k=1

x_k e_k

⎪⎪
⎪⎪
⎪⎪
⎪⎪

_p =

⎛
⎜
⎜
⎝

∞

∑

k=n+1

⎪
⎪

x_k

⎪
⎪

⎞
⎟
⎟
⎠

1/p

→ 0,

as n→∞. As φ is continuous, we see that

φ(x) =

lim

n→∞

∑

k=1

φ(x_ke_k) =

∞

∑

k=1

x_k φ(e_k).

Let u_k=φ(e_k) for each k. If u=(u_k)∈l_q then we would have that φ=φ_u.

Let us fix N∈ℕ, and define

x_k =

⎧
⎪
⎨
⎪
⎩

if u_k=0 or k>N;

u_k

⎪
⎪

u_k

⎪
⎪

^q−2,

if u_k≠0 and k≤ N.

Then we see that

∞

∑

k=1

⎪
⎪

x_k

⎪
⎪

^p =

∑

k=1

⎪
⎪

u_k

⎪
⎪

^p(q−1) =

∑

k=1

⎪
⎪

u_k

⎪
⎪

^q,

as p(q−1) = q. Then, by the previous paragraph,

φ(x) =

∞

∑

k=1

x_k u_k =

∑

k=1

⎪
⎪

u_k

⎪
⎪

^q.

Hence

⎪⎪
⎪⎪

≥

⎪
⎪

φ(x)

⎪
⎪

⎪⎪
⎪⎪

⎛
⎜
⎜
⎝

∑

k=1

⎪
⎪

u_k

⎪
⎪

⎞
⎟
⎟
⎠

1−1/p

⎛
⎜
⎜
⎝

∑

k=1

⎪
⎪

u_k

⎪
⎪

⎞
⎟
⎟
⎠

1/q

By letting N→∞, it follows that u∈l_q with ||u||_q ≤ ||φ||. So φ=φ_u and ||φ|| = ||φ_u|| ≤ ||u||_q. Hence every element of (l_p)^* arises as φ_u for some u, and also ||φ_u|| = ||u||_q. □

Loosely speaking, we say that l_q = (l_p)^*, although we should always be careful to keep in mind the exact map which gives this.

Corollary 14 (Riesz–Frechet Self-duality Lemma 11) l₂ is self-dual: l₂=l₂^*.

Similarly, we can show that c₀^*=l₁ and that (l₁)^*=l_∞ (the implementing isometric isomorphism is giving by the same summation formula).

11.4 Hahn–Banach Theorem

Mathematical induction is a well known method to prove statements depending from a natural number. The mathematical induction is based on the following property of natural numbers: any subset of ℕ has the least element. This observation can be generalised to the transfinite induction described as follows.

A poset is a set X with a relation ≼ such that a≼ a for all a∈ X, if a≼ b and b≼ a then a=b, and if a≼ b and b≼ c, then a≼ c. We say that (X,≼) is total if for every a,b∈ X, either a≼ b or b≼ a. For a subset S⊆ X, an element a∈ X is an upper bound for S if s≼ a for every s∈ S. An element a∈ X is maximal if whenever b∈ X is such that a≼ b, then also b≼ a.

Then Zorn’s Lemma tells us that if X is a non-empty poset such that every total subset has an upper bound, then X has a maximal element. Really this is an axiom which we have to assume, in addition to the usual axioms of set-theory. Zorn’s Lemma is equivalent to the axiom of choice and Zermelo’s theorem.

Theorem 15 (Hahn–Banach Theorem) Let E be a normed vector space, and let F⊆ E be a subspace. Let φ∈ F^*. Then there exists ψ∈ E^* with ||ψ||≤||φ|| and ψ(x)=φ(x) for each x∈ F.

Proof. We do the real case. An “extension” of φ is a bounded linear map φ_G:G→ℝ such that F⊆ G⊆ E, φ_G(x)=φ(x) for x∈ F, and ||φ_G||≤||φ||. We introduce a partial order on the pairs (G, φ_G) of subspaces and functionals as follows: (G₁, φ_G₁)≼ (G₂, φ_G₂) if and only if G₁⊆ G₂ and φ_G₁(x)=φ_G₂(x) for all x∈ G₁. A Zorn’s Lemma argument shows that a maximal extension φ_G:G→ℝ exists. We shall show that if G≠E, then we can extend φ_G, a contradiction.

Let x∉G, so an extension φ₁ of φ to the linear span of G and x must have the form

φ₁(x′+ax) = φ(x) + a α (x′∈ G, a∈ℝ),

for some α∈ℝ. Under this, φ₁ is linear and extends φ, but we also need to ensure that ||φ₁||≤||φ||. That is, we need

⎪
⎪

φ(x′) + aα

⎪
⎪

≤

⎪⎪
⎪⎪

x′+ax

⎪⎪
⎪⎪

(x′∈ G, a∈ℝ). (66)

It is straightforward for a=0, otherwise to simplify proof put −a y=x′ in (66) and divide both sides of the identity by a. Thus we need to show that there exist such α that

⎪
⎪

α−φ(y)

⎪
⎪

≤

⎪⎪
⎪⎪

x−y

⎪⎪
⎪⎪

for all y∈ G, a∈ℝ,

φ(y)−

⎪⎪
⎪⎪

x−y

⎪⎪
⎪⎪

≤ α ≤ φ(y)+

⎪⎪
⎪⎪

x−y

⎪⎪
⎪⎪

For any y₁ and y₂ in G we have:

φ(y₁)−φ(y₂)≤

⎪⎪
⎪⎪

y₁−y₂

⎪⎪
⎪⎪

≤

⎪⎪
⎪⎪

(

⎪⎪
⎪⎪

x−y₂

⎪⎪
⎪⎪

x−y₁

⎪⎪
⎪⎪

Thus

φ(y₁)−

⎪⎪
⎪⎪

x−y₁

⎪⎪
⎪⎪

≤ φ(y₂)+

⎪⎪
⎪⎪

x−y₂

⎪⎪
⎪⎪

As y₁ and y₂ were arbitrary,

sup

y∈ G

(φ(y) −

⎪⎪
⎪⎪

y+x

⎪⎪
⎪⎪

) ≤

inf

y∈ G

(φ(y) +

⎪⎪
⎪⎪

y+x

⎪⎪
⎪⎪

Hence we can choose α between the inf and the sup.

The complex case follows by “complexification”. □

The Hahn-Banach theorem tells us that a functional from a subspace can be extended to the whole space without increasing the norm. In particular, extending a functional on a one-dimensional subspace yields the following.

Corollary 16 Let E be a normed vector space, and let x∈ E. Then there exists φ∈ E^* with ||φ||=1 and φ(x)=||x||.

Another useful result which can be proved by Hahn-Banach is the following.

Corollary 17 Let E be a normed vector space, and let F be a subspace of E. For x∈ E, the following are equivalent:

x∈ F the closure of F;
for each φ∈ E^* with φ(y)=0 for each y∈ F, we have that φ(x)=0.

Proof. 17(1)⇒17(2) follows because we can find a sequence (y_n) in F with y_n→ x; then it’s immediate that φ(x)=0, because φ is continuous. Conversely, we show that if 17(1) doesn’t hold then 17(2) doesn’t hold (that is, the contrapositive to 17(2)⇒17(1)).

So, x∉F. Define ψ:{F,x}→K by

ψ(y+tx) = t (y∈ F, t∈K).

This is well-defined, for y, y′∈ F if y+tx=y′+t′x then either t=t′, or otherwise x = (t−t′)⁻¹(y′−y) ∈ F which is a contradiction. The map ψ is obviously linear, so we need to show that it is bounded. Towards a contradiction, suppose that ψ is not bounded, so we can find a sequence (y_n+t_nx) with ||y_n+t_nx||≤1 for each n, and yet | ψ(y_n+t_nx) |=| t_n |→∞. Then || t_n⁻¹ y_n + x || ≤ 1/| t_n | → 0, so that the sequence (−t_n⁻¹y_n), which is in F, converges to x. So x is in the closure of F, a contradiction. So ψ is bounded. By Hahn-Banach theorem, we can find some φ∈ E^* extending ψ. For y∈ F, we have φ(y)=ψ(y)=0, while φ(x)=ψ(x)=1, so 17(2) doesn’t hold, as required. □

We define E^** = (E^*)^* to be the bidual of E, and define J:E→ E^** as follows. For x∈ E, J(x) should be in E^**, that is, a map E^*→K. We define this to be the map φ↦φ(x) for φ∈ E^*. We write this as

J(x)(φ) = φ(x) (x∈ E, φ∈ E^*).

The Corollary 16 shows that J is an isometry; when J is surjective (that is, when J is an isomorphism), we say that E is reflexive. For example, l_p is reflexive for 1<p<∞. On the other hand c₀ is not reflexive.

11.5 C(X) Spaces

This section is not examinable. Standard facts about topology will be used in later sections of the course.

All our topological spaces are assumed Hausdorff. Let X be a compact space, and let C_K(X) be the space of continuous functions from X to K, with pointwise operations, so that C_K(X) is a vector space. We norm C_K(X) by setting

⎪⎪
⎪⎪

_∞ =

sup

x∈ X

⎪
⎪

f(x)

⎪
⎪

(f∈ C_K(X)).

Theorem 18 Let X be a compact space. Then C_K(X) is a Banach space.

Let E be a vector space, and let ||·||₍₁₎ and ||·||₍₂₎ be norms on E. These norms are equivalent if there exists m>0 with

m⁻¹

⎪⎪
⎪⎪

₍₂₎ ≤

⎪⎪
⎪⎪

₍₁₎ ≤ m

⎪⎪
⎪⎪

₍₂₎ (x∈ E).

Theorem 19 Let E be a finite-dimensional vector space with basis {e₁,…,e_n}, so we can identify E with Kⁿ as vector spaces, and hence talk about the norm ||·||₂ on E. If ||·|| is any norm on E, then ||·|| and ||·||₂ are equivalent.

Corollary 20 Let E be a finite-dimensional normed space. Then a subset X⊆ E is compact if and only if it is closed and bounded.

Lemma 21 Let E be a normed vector space, and let F be a closed subspace of E with E≠F. For 0<θ<1, we can find x₀∈ E with ||x₀||≤1 and ||x₀−y||>θ for y∈ F.

Theorem 22 Let E be an infinite-dimensional normed vector space. Then the closed unit ball of E, the set {x∈ E : ||x||≤ 1}, is not compact.

Proof. Use the above lemma to construct a sequence (x_n) in the closed unit ball of E with, say, ||x_n−x_m||≥1/2 for each n≠m. Then (x_n) can have no convergent subsequence, and so the closed unit ball cannot be compact. □

12 Measure Theory

The presentation in this section is close to [].

12.1 Basic Measure Theory

The following object will be the cornerstone of our construction.

Definition 1 Let X be a set. A σ-algebra R on X is a collection of subsets of X, written R⊆ 2^X, such that

X∈R;
if A,B∈R, then A∖ B∈R;
if (A_n) is any sequence in R, then ∪_n A_n∈R.

Note, that in the third condition we admit any countable unions. The usage of “σ” in the names of σ -algebra and σ-ring is a reference to this. If we replace the condition by

if (A_n)₁^m is any finite family in R, then ∪_n=1^m A_n∈R;

then we obtain definitions of an algebra.

For a σ-algebra R and A,B∈R, we have

A ⋂ B = X∖

⎛
⎝

X∖(A⋂ B)

⎞
⎠

= X ∖

⎛
⎝

(X∖ A)⋃(X∖ B)

⎞
⎠

∈R.

Similarly, R is closed under taking (countably) infinite intersections.

If we drop the first condition from the definition of (σ-)algebra (but keep the above conclusion from it!) we got a (σ-)ring, that is a (σ-)ring is closed under (countable) unions, (countable) intersections and subtractions of sets.

Exercise 2

Use the above comments to write in full the three missing definitions: of set algebra, set ring and set σ-ring.
Show that the empty set belongs to any non-empty ring.

Sets A_k are pairwise disjoint if A_n∩ A_m=∅ for n≠m. We denote the union of pairwise disjoint sets by ⊔, e.g. A ⊔ B ⊔ C.

It is easy to work with a vector space through its basis. For a ring of sets the following notion works as a helpful “basis”.

Definition 3 A semiring S of sets is a collection such that

it is closed under intersection;
for A, B∈ S we have A∖ B=C₁⊔ … ⊔ C_N with C_k∈ S.

Again, any non-empty semiring contain the empty set.

Example 4 The following are semirings but not rings:

The collection of intervals [a,b) on the real line;
The collection of all rectangles { a≤ x < b, c≤ y <d } on the plane.

As the intersection of a family of σ-algebras is again a σ-algebra, and the power set 2^X is a σ-algebra, it follows that given any collection D⊆ 2^X, there is a σ-algebra R such that D⊆R, such that if S is any other σ-algebra, with D⊆S, then R⊆S. We call R the σ-algebra generated by D.

Exercise 5 Let S be a semiring. Show that

The collection of all finite disjoint unions ⊔_k=1ⁿ A_k, where A_k∈ S, is a ring. We call it the ring R(S) generated by the semiring S.
Any ring containing S contains R(S) as well.
The collection of all finite (not necessarily disjoint!) unions ∪_k=1ⁿ A_k, where A_k∈ S, coincides with R(S).

We introduce the symbols +∞, −∞, and treat these as being “extended real numbers”, so −∞ < t < ∞ for t∈ℝ. We define t+∞ = ∞, t∞ = ∞ if t>0 and so forth. We do not (and cannot, in a consistent manner) define ∞ − ∞ or 0·∞.

Definition 6 A measure is a map µ:R→[0,∞] defined on a (semi-)ring (or σ-algebra) R, such that if A=⊔_n A_n for A∈R and a finite subset (A_n) of R, then µ (A) = ∑_n µ(A_n). This property is called additivity of a measure.

The additivity property of a measure is rather demanding. For example, let us consider the decomposition [0,1)=[0,1/2) ⊔ [1/2,1) = [0,1/3) ⊔ [1/3,2/3) ⊔ [2/3,1), then additivity puts measures of those five intervals into equations:

µ([0,

)) + µ( [

,1) ) = µ([0.1)) = µ([0,

)) + µ([

,1)).

Similar equations appear from any other (out of infinitely many) decomposition of [0,1), thus measures of various intervals are highly interconnected and very far from being arbitrary.

Exercise 7 Show that the following two conditions are equivalent:

µ(∅)=0.
There is a set A∈R such that µ(A)<∞.

The first condition often (but not always) is included in the definition of a measure.

In analysis we are interested in infinities and limits, thus the following extension of additivity is very important.

Definition 8 In terms of the previous definition we say that µ is countably additive (or σ-additive) if for any countable infinite family (A_n) of pairwise disjoint sets from R such that A=⊔_n A_n∈R we have µ(A) = ∑_n µ(A_n). If the sum diverges, then as it will be the sum of positive numbers, we can, without problem, define it to be +∞.

Note, that this property may be stated as a sort of continuity of an additive measure, cf. (7):

⎛
⎜
⎜
⎝

lim

n→∞

⊔

k=1

A_k

⎞
⎟
⎟
⎠

lim

n→∞

⎛
⎜
⎜
⎝

⊔

k=1

A_k

⎞
⎟
⎟
⎠

Example 9

Fix a point a∈ℝ and define a measure µ by the condition µ(A)=1 if a∈ A and µ(A)=0 otherwise.
For the ring obtained in Exercise 5 from semiring S in Example 4(1) define µ([a,b))=b−a on S. This is a measure, and we will show its σ-additivity.
For ring obtained in Exercise 5 from the semiring in Example 4(2), define µ(V)=(b−a)(d−c) for the rectangle V={ a≤ x < b, c≤ y <d } S. It will be again a σ-additive measure.
Let X=ℕ and R=2^ℕ, we define µ(A)=0 if A is a finite subset of X=ℕ and µ(A)=+∞ otherwise. Let A_n={n}, then µ(A_n)=0 and µ(⊔_n A_n)=µ(ℕ)=+∞≠ ∑_n µ(A_n)=0. Thus, this measure is not σ-additive.

We will see further examples of measures which are not σ-additive in Section 12.4.

Definition 10 A measure µ is finite if µ(A)<∞ for all A∈ R.

A measure µ is σ-finite if X is a union of countable number of sets X_k, such that for any A∈ R and any k∈ ℕ the intersection A∩ X_k is in R and µ(A∩ X_k)<∞.

Exercise 11 Modify the example 9(1) to obtain

a measure which is not finite, but is σ-finite. (Hint: let the measure count the number of integer points in a set).
a measure which is not σ-finite. (Hint: assign µ(A)=+∞ if a∈ A.)

Proposition 12 Let µ be a σ-additive measure on a σ-algebra R. Then:

If A,B∈R with A⊆ B, then µ(A)≤µ(B) [we call this property “monotonicity of a measure”];
If A,B∈R with A⊆ B and µ(B)<∞, then µ(B∖ A) = µ(B) − µ(A);
If (A_n) is a sequence in R, with A₁ ⊆ A₂ ⊆ A₃ ⊆⋯. Then

lim

n→∞

µ(A_n) = µ ⎛
⎝ ⋃_n A_n ⎞
⎠ .
If (A_n) is a sequence in R, with A₁ ⊇ A₂ ⊇ A₃ ⊇⋯. If µ(A_m)<∞ for some m, then

lim

n→∞

µ(A_n) = µ ⎛
⎝ ⋂_n A_n ⎞
⎠ . (67)

Proof. The two first properties are easy to see. For the third statement, define A=∪_n A_n, B₁=A₁ and B_n=A_n∖ A_n−1, n>1. Then A_n=⊔_k=1ⁿ B_n and A=⊔_k=1^∞B_n. Using the σ-additivity of measures µ(A)=∑_k=1^∞µ(B_k) and µ(A_n)=∑_k=1ⁿ µ(B_k). From the theorem in real analysis that any monotonic sequence of real numbers converges (recall that we admit +∞ as limits’ value) we have µ(A)=∑_k=1^∞µ(B_k)=lim_{n→ ∞} ∑_k=1ⁿ µ(B_k) = lim_{n→ ∞} µ(A_n). The last statement can be shown similarly. □

Exercise 13 Let a measure µ on ℕ be defined by µ(A)=0 for finite A and µ(A) = ∞ for infinite A. Check that µ is additive but not σ-additive. Therefore give an example that µ does not satisfies 12(3).

12.2 Extension of Measures

From now on we consider only finite measures, an extension to σ-finite measures will be done later.

Proposition 14 Any measure µ′ on a semiring S is uniquely extended to a measure µ on the generated ring R(S), see Ex. 5. If the initial measure was σ-additive, then the extension is σ-additive as well.

Proof. If an extension exists it shall satisfy µ(A)=∑_k=1ⁿ µ′(A_k), where A_k∈ S. We need to show for this definition two elements:

Consistency, i.e. independence of the value from a presentation of A∈ R(S) as A=⊔_k=1ⁿ A_k, where A_k∈ S. For two different presentation A=⊔_j=1ⁿ A_j and A=⊔_k=1^m B_k define C_jk=A_j∩ B_k, which will be pair-wise disjoint. By the additivity of µ′ we have µ′(A_j)=∑_kµ′(C_jk) and µ′(B_k)=∑_jµ′(C_jk). Then

∑

j

µ′(A_j)=

∑

j

∑

k

µ′(C_jk) =

∑

k

∑

j

µ′(C_jk)=

∑

k

µ′(B_k).
Additivity. For A=⊔_k=1ⁿ A_k, where A_k∈ R(S) we can present A_k=⊔_j=1^n(k) C_jk, C_jk∈ S. Thus A=⊔_k=1ⁿ ⊔_j=1^n(k) C_jk and:
µ(A)=
n

∑

k=1

n(k)

∑

j=1

µ′(C_jk)=
n

∑

k=1

µ(A_k).

Finally, show the σ-additivity. For a set A=⊔_k=1^∞A_k, where A and A_k∈ R(S), find presentations A=⊔_j=1ⁿ B_j, B_j∈ S and A_k=⊔_l=1^m(k) B_lk, B_lk∈ S. Define C_jlk=B_j ∩ B_lk∈ S, then B_j=⊔_k=1^∞⊔_l=1^m(k) C_jlk and A_k= ⊔_j=1ⁿ ⊔_l=1^m(k) C_jlk. Then, from σ-additivity of µ′:

µ(A)

∑

j=1

µ′(B_j)=

∑

j=1

∞

∑

k=1

m(k)

∑

l=1

µ′(C_jlk)=

∞

∑

k=1

∑

j=1

m(k)

∑

l=1

µ′(C_jlk) =

∞

∑

k=1

µ(A_k),

where we changed the summation order in series with non-negative terms. □

As the next step one may want to extend a measure from a semiring to corresponding σ-ring, however it can be done even for a larger family. The procedure recall the famous story on Baron Munchausen saves himself from being drowned in a swamp by pulling on his own hair. Indeed, initially we knew measure for elements of semiring S or their finite disjoint unions from R(S). For an arbitrary set A we may assign a measure from an element of R(S) which “approximates” A. But how to measure such approximation? Well, to this end we use the measure on R(S) again (pulling on his own hair)!

The following extension of a measure by continuity requires the measure to be continuous, thus for the rest of this subsection we assume that µ is σ-additive. To start with we introduce the following notion.

Definition 15 Let S be a semi-ring of subsets in X, and µ be a σ-additive measure defined on S. An outer measure µ^* on X is a map µ^*:2^X→[0,∞] defined by:

µ^*(A)=inf

⎧
⎪
⎨
⎪
⎩

∑

µ(A_k), such that A⊆ ⋃_k A_k, A_k∈ S

⎫
⎪
⎬
⎪
⎭

Proposition 16 An outer measure has the following properties:

µ^*(∅)=0;
µ^*(A)=µ(A) for any A∈ S.
if A⊆ B then µ^*(A)≤µ^*(B), this is called monotonicity of the outer measure;
if (A_n) is any sequence in 2^X, then µ^*(∪_n A_n) ≤ ∑_n µ^*(A_n).

Proof. Demonstration is not difficult. For example for 16(2): the inequality µ^*(A) ≤ µ(A) is trivial since A∈ S is its own covering by S. To show the opposite inequality µ^*(A) ≥ µ(A) assume towards a contradiction µ^*(A) < µ(A), that would mean there exists a covering ∪_k=1^∞A_k ⊃ A such that ∑_k=1^∞µ(A_k) < µ(A) . Let B₁ = A₁ ∩ A and B_k = (A_k ∩ A) ∖ ⊔_j=1^k−1 B_j for k>1. Then, B_k are pairwise disjoint elements of R(S) and ⊔_k=1^∞B_k = A. But ∑_k=1^∞µ(B_k) ≤ ∑_k=1^∞µ(A_k) < µ(A) , which contradicts the σ-additivity of µ on R(S) established in Prop. 14. □

The final condition 16(4) says that an outer measure is countably sub-additive. Note, that an outer measure may be not a measure in the sense of Defn. 6 due to a lack of additivity.

Example 17 The Lebesgue outer measure on ℝ is defined out of the measure from Example 9(2), that is, for A⊆ℝ, as

µ^*(A) = inf

⎧
⎪
⎨
⎪
⎩

∞

∑

j=1

(b_j−a_j) : A⊆ ⋃_j=1^∞[a_j,b_j)

⎫
⎪
⎬
⎪
⎭

We make this definition, as intuitively, the “length”, or measure, of the interval [a,b) is (b−a).

For example, for outer Lebesgue measure we have µ^*(A)=0 for any countable set, which follows, as clearly µ^*({x})=0 for any x∈ℝ.

Lemma 18 Let a<b. Then µ^*([a,b])=b−a.

Proof. For є>0, as [a,b] ⊆ [a,b+є), we have that µ^*([a,b])≤ (b−a)+є. As є>0, was arbitrary, µ^*([a,b]) ≤ b−a.

To show the opposite inequality we observe that [a,b)⊂[a,b] and µ^*[a,b) =µ[a,b) =b−a by Prop. 16(2) so µ^*[a,b]≥ b−a by 16(3).

□

Our next aim is to construct measures from outer measures. We use the notation A▵ B=(A∪ B)∖ (A∩ B) for symmetric difference of sets.

Definition 19 Given an outer measure µ^* defined by a σ-additive measure µ on a semiring S, we define A⊆ X to be Lebesgue measurable if for any ε >0 there is a finite union B of elements in S (in other words: B∈R(S) by Lem. 5(3)), such that µ^*(A▵ B)<ε .

Figure 18: Approximating area by refined simple sets arrangements.

See Fig. 18 for an illustration of the concept of measurable sets. Obviously all elements of S and R(S) are measurable.

Exercise 20

Define a function of pairs of Lebesgue measurable sets A and B as the outer measure of the symmetric difference of A and B:
d(A,B)=µ^*(A▵ B). (68)

Show that d is a metric on the collection of equivalence classes with respect to the equivalence relation: A∼ B if d(A,B)=0. Hint: to show the triangle inequality use the inclusion:
A▵ B ⊆ (A▵ C) ⋃ (C▵ B)
Let a sequence (ε_n)→ 0 be monotonically decreasing. For a Lebesgue measurable A there exists a sequence (A_n)⊂ R(S) such that d(A,A_n)< ε_n for each n. Show that (A_n) is a Cauchy sequence for the distance d (68).

An alternative definition of a measurable set is due to Carathéodory.

Definition 21 Given an outer measure µ^*, we define E⊆ X to be Carathéodory measurable if

µ^*(A) = µ^*(A⋂ E) + µ^*(A∖ E),

for any A⊆ X.

As µ^* is sub-additive, this is equivalent to

µ^*(A) ≥ µ^*(A⋂ E) + µ^*(A∖ E) (A⊆ X),

as the other inequality is automatic.

Exercise^* 22

Show that for a Lebesgue measurable set A and any ε>0 there exist two elements B₁ and B₂ of the ring R(S) such that B₁⊂ A ⊂ B₂ and µ(B₂∖ B₁) < ε, cf. areas shadowed in darker and lighter colours on Fig. 18.
Hint: For a set B∈ R(S) such that µ^*(A▵ B)<ε/2 from Defn. 19 shall exists C∈ R(S) such that C ⊃ A▵ B and µ(C) < µ^*(A▵ B)+ε/2. Put B₁ = B ∖ C and B₂ = B ∪ C.
Let µ(X)<∞ show that A is Lebesgue measurable if and only if µ(X) = µ^*(A)+µ^*(X∖ A).
Show that measurability by Lebesgue and Carathéodory are equivalent.

Suppose now that the ring R(S) is an algebra (i.e., contains the maximal element X). Then, the outer measure of any set is finite, and the following theorem holds:

Theorem 23 (Lebesgue) Let µ^* be an outer measure on X defined by a σ-additive measure µ on a semiring S, and let L be the collection of all Lebesgue measurable sets for µ^*. Then L is a σ-algebra, and if µ′ is the restriction of µ^* to L, then µ′ is a measure. Furthermore, µ′ is σ-additive on L.

Proof.[Sketch of proof] Clearly, R(S)⊂ L. Now we show that µ^*(A)=µ(A) for a set A∈ R(S). If A⊂ ∪_k A_k for A_k ∈ S, then µ(A)≤ ∑_k µ(A_k), taking the infimum we get µ(A)≤µ^*(A). For the opposite inequality, any A∈ R(S) has a disjoint representation A=⊔_k A_k, A_k∈ S, thus µ^*(A)≤ ∑_k µ^*(A_k)=∑_k µ(A_k)=µ(A) by Prop. 16(4) and 16(2).

Now we will show that R(S) with the distance d (68) is a (possibly incomplete) metric space, with the measure µ being uniformly continuous functions. Measurable sets make the completion of R(S) (cf. Ex. 20(2)) with µ being continuation of µ^* to the completion by continuity, cf. Ex. 62.

Then, by the definition, Lebesgue measurable sets make the closure of R(S) with respect to this distance.

We can check that measurable sets form an algebra. To this end we need to make estimations, say, of µ^*((A₁∩ A₂)▵ (B₁∩ B₂)) in terms of µ^*(A_i▵ B_i). A demonstration for any finite number of sets is performed through mathematical inductions. The above two-sets case provide both: the base and the step of the induction.

Now, we show that L is σ-algebra. Let A_k∈ L and A=∪_k A_k. Then for any ε>0 there exists B_k∈ R(S), such that µ^*(A_k▵ B_k)<ε/2^k. Define B=∪_k B_k. Then

⎛
⎝

⋃_k A_k

⎞
⎠

▵

⎛
⎝

⋃_k B_k

⎞
⎠

⊂ ⋃_k

⎛
⎝

A_k ▵ B_k

⎞
⎠

implies µ^*(A▵ B)<ε.

We cannot stop at this point since B=∪_k B_k may be not in R(S). Thus, define B′₁=B₁ and B′_k=B_k∖ ∪_i=1^k−1 B_i, so B′_k are pair-wise disjoint. Then B=⊔_k B′_k and B′_k∈R(S). From the convergence of the series there is N such that ∑_k=N^∞µ(B′_k)<ε . Let B′=∪_k=1^N B′_k, which is in R(S). Then µ^*(B▵ B′)≤ ε and, thus, µ^*(A▵ B′)≤ 2ε.

To check that µ^* is measure on L we use the following

Lemma 24 | µ^*(A)−µ^*(B) |≤ µ^*(A▵ B), that is µ^* is uniformly continuous in the metric d(A,B) (68).

Proof.[Proof of the Lemma] Use inclusions A⊂ B∪(A▵ B) and B⊂ A∪(A▵ B). □

To show additivity take A_1,2∈L , A=A₁⊔ A₂, B_1,2∈R(S) and µ^*(A_i▵ B_i)<ε. Then µ^*(A▵(B₁∪ B₂))<2ε and | µ^*(A) − µ^*(B₁∪ B₂) |<2ε. Thus µ^*(B₁∪ B₂)=µ(B₁∪ B₂)=µ (B₁) +µ (B₂)−µ (B₁∩ B₂), but µ (B₁∩ B₂)=d(B₁∩ B₂,∅)=d(B₁∩ B₂,A₁∩ A₂)<2ε. Therefore

⎪
⎪

µ^*(B₁⋃ B₂)−µ (B₁) −µ (B₂)

⎪
⎪

<2ε.

Combining everything together we get (this is a sort of ε/3-argument):

⎪
⎪

µ^*(A)−µ^*(A₁)−µ^*(A₂)

⎪
⎪

⎪ ⎪	µ^(A)−µ^(B₁⋃ B₂) +µ^*(B₁⋃ B₂) −(µ (B₁) +µ (B₂))

+µ (B₁) +µ (B₂)−µ^*(A₁)−µ^*(A₂)

⎪
⎪

≤

⎪
⎪

µ^*(A)−µ^*(B₁⋃ B₂)

⎪
⎪

µ^*(B₁⋃ B₂)−(µ (B₁) +µ (B₂))

⎪
⎪

µ (B₁) +µ (B₂)−µ^*(A₁)−µ^*(A₂)

⎪
⎪

≤

6ε.

Thus µ^* is additive on L.

Check the countable additivity for A=⊔_k A_k. The inequality µ^*(A)≤ ∑_kµ^*(A_k) follows from countable sub-additivity. The opposite inequality is the limiting case of the finite inequality µ^*(A)≥ µ^*(⊔_k=1^N A_k)=∑_k=1^Nµ^*(A_k) following from monotonicity and additivity of µ^*. □

Corollary 25 Let E⊆ℝ be open or closed. Then E is Lebesgue measurable.

Proof. As σ-algebras are closed under taking complements, we need only show that open sets are Lebesgue measurable. For the latter we will use a common trick, using the density and the countability of the rationals.

Intervals (a,b) are Lebesgue measurable because they are countable unions of measurable half-open intervals from the semiring, e.g.:

(0,1) =

∞

⊔

k=1

⎡
⎢
⎢
⎣

k+1

⎞
⎟
⎟
⎠

Now let U⊆ℝ be open. For each x∈ U, there exists a_x<b_x with x∈(a_x,b_x)⊆ U. By making a_x slightly larger, and b_x slightly smaller, we can ensure that a_x,b_x∈ℚ. Thus U = ∪_x (a_x, b_x). Each interval is measurable, and there are at most a countable number of them (endpoints make a countable set) thus U is the countable (or finite) union of Lebesgue measurable sets, and hence U is Lebesgue measurable itself. □

We perform now an extension of finite measure to σ-finite one . Let µ be a σ-additive and σ-finite measure defined on a semiring in X=⊔_k X_k, such that the restriction of µ to every X_k is finite. Consider the Lebesgue extension µ_k of µ defined within X_k. A set A⊂ X is measurable if every intersection A∩ X_k is µ_k measurable. For a such measurable set A we define its measure by the identity:

µ(A)=

∑

µ_k(A⋂ X_k).

We call a measure µ defined on L complete if whenever E⊆ X is such that there exists F∈L with µ(F)=0 and E⊆ F, we have that E∈L. Measures constructed from outer measures by the above theorem are always complete. On the example sheet, we saw how to form a complete measure from a given measure. We call sets like E null sets: complete measures are useful, because it is helpful to be able to say that null sets are in our σ-algebra. Null sets can be quite complicated. For the Lebesgue measure, all countable subsets of ℝ are null, but then so is the Cantor set, which is uncountable.

Definition 26 If we have a property P(x) which is true except possibly x∈ A and µ(A)=0, we say P(x) is almost everywhere or a.e..

12.3 Complex-Valued Measures and Charges

We start from the following observation.

Exercise 27 Let µ₁ and µ₂ be measures on a same σ-algebra. Define µ₁+µ₂ and λµ₁, λ>0 by (µ₁+µ₂)(A)=µ₁(A)+µ₂(A) and (λµ₁)(A)=λ(µ₁(A)). Then µ₁+µ₂ and λµ₁ are measures on the same σ-algebra as well.

In view of this, it will be helpful to extend the notion of a measure to obtain a linear space.

Definition 28 Let X be a set, and R be a σ-ring. A real- (complex-) valued function ν on R is called a charge (or signed measure ) if it is countably additive as follows: for any A_k∈R the identity A=⊔_k A_k implies the series ∑_k ν(A_k) is absolute convergent and has the sum ν(A).

In the following “charge” means “real charge”.

Example 29 Any linear combination of σ-additive measures on ℝ with real (complex) coefficients is real (complex) charge.

The opposite statement is also true:

Theorem 30 Any real (complex) charge ν has a representation ν=µ₁−µ₂ (ν=µ₁−µ₂+iµ₃−iµ₄), where µ_k are σ-additive measures.

To prove the theorem we need the following definition.

Definition 31 The variation of a charge on a set A is | ν |(A)=sup ∑_k| ν(A_k) | for all disjoint splitting A=⊔_k A_k.

Example 32 If ν=µ₁−µ₂, then | ν |(A)≤ µ₁(A)+µ₂(A). The inequality becomes an identity for disjunctive measures on A (that is there is a partition A=A₁⊔ A₂ such that µ₂(A₁)=µ₁(A₂)=0).

The relation of variation to charge is as follows:

Theorem 33 For any charge ν the function | ν | is a σ-additive measure.

Finally to prove the Thm. 30 we use the following

Proposition 34 For any charge ν the function | ν |−ν is a σ-additive measure as well.

From the Thm. 30 we can deduce

Corollary 35 The collection of all charges on a σ-algebra R is a linear space which is complete with respect to the distance:

d(ν₁,ν₂)=

sup

A∈R

⎪
⎪

ν₁(A)−ν₂(A)

⎪
⎪

The following result is also important:

Theorem 36 (Hahn Decomposition) Let ν be a charge. There exist A,B∈L, called a Hahn decomposition of (X,ν), with A∩ B=∅, A∪ B= X and such that for any E∈L,

ν (A⋂ E) ≥ 0, ν(B⋂ E)≤ 0.

This need not be unique.

Proof.[Sketch of proof] We only sketch this. We say that A∈L is positive if

ν(E⋂ A)≥0 (E∈L),

and similiarly define what it means for a measurable set to be negative. Suppose that ν never takes the value −∞ (the other case follows by considering the charge −ν).

Let β = infν(B₀) where we take the infimum over all negative sets B₀. If β=−∞ then for each n, we can find a negative B_n with ν(B_n)≤ −n. But then B=∪_n B_n would be negative with ν(B)≤ −n for any n, so that ν(B)=−∞ a contradiction.

So β>−∞ and so for each n we can find a negative B_n ν(B_n) < β+1/n. Then we can show that B = ∪_n B_n is negative, and argue that ν(B) ≤ β. As B is negative, actually ν(B) = β.

There then follows a very tedious argument, by contradiction, to show that A=X∖ B is a positive set. Then (A,B) is the required decomposition. □

12.4 Constructing Measures, Products

Consider the semiring S of intervals [a,b). There is a simple description of all measures on it. For a measure µ define

F_µ(t)=

⎧
⎪
⎨
⎪
⎩

µ([0,t))	if t>0,
0	if t=0,
−µ([t,0))	if t<0,

(69)

F_µ is monotonic and any monotonic function F defines a measure µ on S by the by µ([a,b))=F(b)−F(a). The correspondence is one-to-one with the additional assumption F(0)=0.

Theorem 37 The above measure µ is σ-additive on S if and only if F is continuous from the left: F(t−0)=F(t) for all t∈ℝ.

Proof. Necessity: F(t)−F(t−0)=lim_ε→
0µ([t−ε,t))=µ(lim_ε→
0[t−ε,t))=µ(∅)=0, by the continuity of a σ-additive measure, see 12(4).

For sufficiency assume [a,b)=⊔_k [a_k,b_k). The inequality µ([a,b))≥ ∑_k µ([a_k,b_k)) follows from additivity and monotonicity. For the opposite inequality take δ_k s.t. F(b)−F(b−δ)<ε and F(a_k)−F(a_k−δ_k)<ε/2^k (use left continuity of F). Then the interval [a,b−δ] is covered by (a_k−δ_k,b_k), due to compactness of [a,b−δ] there is a finite subcovering. Thus µ([a,b−δ ))≤∑_j=1^N µ([a_{k_j}−δ_{k_j},b_{k_j})) and µ([a,b))≤∑_j=1^N µ([a_{k_j},b_{k_j}))+2ε . □

Exercise 38

Give an example of function discontinued from the left at 1 and show that the resulting measure is additive but not σ-additive.
Check that, if a function F is continuous at point a then µ({a})=0.

Example 39

Take F(t)=t, then the corresponding measure is the Lebesgue measure on ℝ.
Take F(t) be the integer part of t, then µ counts the number of integer within the set.
Define the Cantor function as follows α(x)=1/2 on (1/3,2/3); α(x)=1/4 on (1/9,2/9); α(x)=3/4 on (7/9,8/9), and so for. This function is monotonic and can be continued to [0,1] by continuity, it is know as Cantor ladder. The resulting measure has the following properties:
- The measure of the entire interval is 1.
- Measure of every point is zero.
- The measure of the Cantor set is 1, while its Lebesgue measure is 0.

Another possibility to build measures is their product. In particular, it allows to expand various measures defined through (69) on the real line to ℝⁿ.

Definition 40 Let X and Y be spaces, and let S and T be semirings on X and Y respectively. Then S× T is the semiring consisting of { A× B : A∈ S, B∈ T } (“generalised rectangles”). Let µ and ν be measures on S and T respectively. Define the product measure µ×ν on S× T by the rule (µ× ν)(A× B)=µ(A) ν(B).

Example 41 The measure from Example 9(3) on the semiring of half-open rectangles is the product of two copies of pre-Lebesgue measures from Example 9(2) on the semiring of half-open intervals.

13 Integration

We now come to the main use of measure theory: to define a general theory of integration.

13.1 Measurable functions

From now on, by a measure space we shall mean a triple (X,L,µ), where X is a set, L is a σ-algebra on X, and µ is a σ-additive measure defined on L. We say that the members of L are measurable, or L-measurable, if necessary to avoid confusion.

Definition 1 A function f:X→ℝ is measurable if

E_c(f)={x∈ X: f(x)<c} that is E_c(f)=f⁻¹((−∞,c))

is in L (that is E_c(f) is a measurable set) for any c∈ℝ.

A complex-valued function is measurable if its real and imaginary parts are measurable.

Lemma 2 The following are equivalent:

A function f is measurable;
For any a<b the set f⁻¹((a,b)) is measurable;
For any open set U⊂ ℝ the set f⁻¹(U) is measurable.

Proof. To show 2(1) ⇒  2(2) we note that

f⁻¹((a,b)) = E_b(f)∖

⎛
⎜
⎜
⎝

∩

E_a+1/n(f)

⎞
⎟
⎟
⎠

For 2(2) ⇒  2(3) use that any open set U⊂ ℝ is a union of countable set of intervals (a,b), cf. proof of Cor. 25.

The final implication 2(3) ⇒  2(1) directly follows from openness of (−∞,a). □

Corollary 3 Let f: X → ℝ be measurable and g: ℝ → ℝ be continuous, then the composition g(f(x)) is measurable.

Proof. The preimage of the open set (−∞,c) under a continuous g is an open set, say U. The preimage of U under f is measurable by Lem. 2(3). Thus, the preimage of (−∞,c) under the composition g ∘ f is measurable, thereafter g ∘ f is a measurable function. □

Theorem 4 Let f,g:X→ℝ be measurable. Then af (a∈ℝ), f+g, fg, max(f,g) and min(f,g) are all measurable. That is measurable functions form an algebra and this algebra is closed under convergence a.e.

Proof. Use Cor. 3 to show measurability of λ f, | f | and f². The measurability of a sum f₁ + f₂ follows from the relation

E_c(f₁+f₂)=⋃_r∈ℚ (E_r(f₁)⋂ E_c−r(f₂)).

Next use the following identities:

f₁f₂

(f₁+f₂)²−(f₁−f₂)²

max(f₁,f₂)

(f₁+f₂)+

⎪
⎪

f₁−f₂

⎪
⎪

If (f_n) is a non-increasing sequence of measurable functions converging to f. Than E_c(f)=∪_n E_c(f_n).

Moreover any limit can be replaced by two monotonic limits:

lim

n→ ∞

f_n(x)=

lim

n→ ∞

lim

k→ ∞

max (f_n(x), f_n+1(x),…,f_n+k(x)). (70)

Finally if f₁ is measurable and f₂=f₁ almost everywhere, then f₂ is measurable as well. □

We can define several types of convergence for measurable functions.

Definition 5 We say that sequence (f_n) of functions converges

uniformly to f (notated f_n⇉ f) if

sup

x∈ X

⎪
⎪ f_n(x)−f(x) ⎪
⎪ → 0;
almost everywhere to f (notated f_n→^a.e.f) if
f_n(x)→ f(x) for all x∈ X∖ A, µ(A)=0;
in measure µ to f (notated f_n→^µf) if for all ε>0
µ({x∈ X: ⎪
⎪ f_n(x)−f(x) ⎪
⎪ >ε }) → 0. (71)

Clearly uniform convergence implies both convergences a.e and in measure.

Theorem 6 On finite measures convergence a.e. implies convergence in measure.

Proof. Define A_n(ε)={x∈ X: | f_n(x)−f(x) |≥ ε}. Let B_n(ε)=∪_{k≥ n} A_k(ε). Clearly B_n(ε)⊃ B_n+1(ε), let B(ε)=∩₁^∞B_n(ε). If x∈ B(ε) then f_n(x)↛f(x). Thus µ(B(ε))=0, but µ(B(ε))=lim_n→
∞µ(B_n(ε)), cf. (67). Since A_n(ε)⊂ B_n(ε) we see that µ(A_n(ε))→ 0 as required for (71) □

Note, that the construction of sets B_n(ε) is just another implementation of the “two monotonic limits” trick (70) for sets.

Exercise 7 Present examples of sequences (f_n) and functions f such that:

f_n→ ^µf but not f_n→ ^a.e.f.
f_n→ ^a.e.f but not f_n⇉ f.

However we can slightly “fix” either the set or the sequence to “upgrade” the convergence as shown in the following two theorems.

Theorem 8 (Egorov) If f_n→ ^a.e.f on a finite measure set X then for any σ>0 there is E_σ⊂ X with µ(E_σ)<σ and f_n⇉ f on X∖ E_σ.

Proof. We use A_n(ε) and B_n(ε) from the proof of Thm. 6. Observe that | f(x)−f_k(x) |< ε uniformly for all x ∈ X∖ B_n(ε) and k>n. For every ε>0 we seen that µ(B_n(ε))→ 0, thus for each k there is N(k) such that µ(B_N(k)(1/k))<σ/2^k. Put E_σ=∪_k B_N(k)(1/k). □

Theorem 9 If f_n→ ^µf then there is a subsequence (n_k) such that f_{n_k}→ ^a.e.f for k→ ∞.

Proof. In the notations of two previous proofs: for every natural k take n_k such that µ(A_{n_k}(1/k))< 1/2^k, which is possible since µ(A_n(ε))→ 0. Define C_m=∪_k=m^∞A_{n_k}(1/k) and C=∩ C_m. Then, µ(C_m)=1/2^m−1 and, thus, µ(C)=0 by (67). If x∉C then there is such N that x∉A_{n_k}(1/k) for all k>N. That means that | f_{n_k}(x)−f(x) |<1/k for all such k, i.e f_{n_k}(x)→ f(x). Thus, we have the point-wise convergence everywhere except the zero-measure set C. □

It is worth to note, that we can use the last two theorem subsequently and upgrade the convergence in measure to the uniform convergence of a subsequence on a subset.

Exercise 10 For your counter examples from Exercise 7, find

a subsequence f_{n_k} of the sequence from 7(1) which converges to f a.e.;
a subset such that sequence from 7(2) converges uniformly.

Exercise 11 Read about Luzin’s C-property.

13.2 Lebesgue Integral

First we define a sort of “basis” for the space of integral functions.

Definition 12 For A⊆ X, we define χ_A to be the indicator function of A, by

χ_A(x) =

⎧
⎨
⎩

1	: x∈ A,
0	: x∉A.

Then, if χ_A is measurable, then χ_A⁻¹( (1/2,3/2) ) = A ∈ L; conversely, if A∈L, then X∖ A∈L, and we see that for any U⊆ℝ open, χ_A⁻¹(U) is either ∅, A, X∖ A, or X, all of which are in L. So χ_A is measurable if and only if A∈L.

Definition 13 A measurable function f:X→ℝ is simple if it attains only a countable number of values.

Lemma 14 A function f:X→ℝ is simple if and only if

f =

∞

∑

k=1

t_k χ_{A_k} (72)

for some (t_k)_k=1^∞⊆ℝ and A_k∈L. That is, simple functions are linear combinations of indicator functions of measurable sets.

Moreover in the above representation the sets A_k can be pair-wise disjoint and all t_k≠ 0 pair-wise different. In this case the representation is unique.

In most cases there is no need to consider the above mentioned unique representation. But it is helpful and sufficient to restrict representations of simple functions to ones with pair-wise disjoint sets A_i and do not ask t_k being pair-wise different. In particular, it is easy to show that

Corollary 15 The collection of simple functions forms a vector space: this wasn’t clear from the original definition.

Proof. Closedness under multiplication by a scalar is obvious. Consider the addition. Let

f =

∞

∑

k=1

t_k χ_{A_k}, g =

∞

∑

j=1

r_j χ_{B_j}, where A_i⋂ A_j = B_i ⋂ B_j =∅ for all i≠ j.

Then, their sum is

f + g =

∞

∑

k,j=1

(t_k+ r_j) χ_{C_kj} where C_kj=A_k⋂ B_j.

Note, that C_kj are also pair-wise disjoint. □

Definition 16 A simple function in the form (72) with disjoint A_k is called summable if the following series converges:

∞

∑

k=1

⎪
⎪

t_k

⎪
⎪

µ(A_k) if f has a representation f =

∞

∑

k=1

t_k χ_{A_k} . (73)

It is another combinatorial exercise to show that this definition is independent of the way we write f.

Definition 17 We define the integral of a simple function f=∑_k t_k χ_{A_k} (72) over a measurable set A by setting

∫

f  d µ =

∞

∑

k=1

t_k µ(A_k⋂ A).

Clearly the series converges for any simple summable function f. Moreover

Lemma 18 The value of integral of a simple summable function is independent from its representation by the sum of indicators (72). In particular, we can evaluate the integral taking the canonical representation over pair-wise disjoint sets having pair-wise different values.

Proof. This is another slightly tedious combinatorial exercise. You need to prove that the integral of a simple function is well-defined, in the sense that it is independent of the way we choose to write the simple function. As usual, for two representations

f =

∞

∑

k=1

t_k χ_{A_k} =

∞

∑

j=1

r_j χ_{B_j} where A_i⋂ A_j = B_i ⋂ B_j =∅ for all i≠ j

we consider their refined representation

∞

∑

k,j=1

t_k χ_{C_kj} with C_kj=A_k⋂ B_j.

Note, that C_kj is non-empty here if and only if t_k=r_j. □

Exercise 19 Let f be the function on [0,1] which take the value 1 in all rational points and 0—everywhere else. Find the value of the Lebesgue integral ∫_[0,1] f dµ with respect to the Lebesgue measure on [0,1]. Show that the Riemann upper- and lower sums for f converges to different values, so f is not Riemann-integrable.

Remark 20 The previous exercise shows that the Lebesgue integral does not have those problems of the Riemann integral related to discontinuities. Indeed, most of function which are not Riemann-integrable are integrable in the sense of Lebesgue. The only reason, why a measurable function is not integrable by Lebesgue is divergence of the series (73). Therefore, we prefer to speak that the function is summable rather than integrable. However, those terms are used interchangeably in the mathematical literature.

We will denote by S(X) the collection of all simple summable functions on X.

Proposition 21 Let f, g:X→ ℝ be in S(X) (that is simple summable), let a, b∈ ℝ and A is a measurable set. Then:

∫_A af+bg  d µ = a∫_A f  d µ + b∫_A g  d µ, that is S(X) is a linear space;
The correspondence f→ ∫_A f d µ is a linear functional on S(X);
If f≥ 0 the correspondence A → ∫_A f d µ is a measure, for a general function f this map is a charge;
If f≤ g then ∫_A f  d µ ≤ ∫_A g  d µ, that is integral is monotonic ;
The function
d₁(f,g)= ∫

X

⎪
⎪ f−g ⎪
⎪  d µ (74)

has all properties of a metric (distance) on S(X) probably except separation, but see the next item.
For f≥ 0 we have ∫_X f  d µ=0 if and only if µ( { x∈ X : f(x)≠0 } ) = 0. Therefore for the function d₁ (74):
d₁(f,g)=0 if and only if f
a.e.

=

g.
The integral is uniformly continuous with respect the above metric d₁ (74):
⎪
⎪
⎪
⎪
⎪
⎪ ∫

A

f d µ− ∫

A

g d µ ⎪
⎪
⎪
⎪
⎪
⎪ ≤ d₁(f,g).

Proof. The proof is almost obvious, for example the Property 21(1) easily follows from Lem. 18.

We will outline 21(3) only. Let f is an indicator function of a set B, then A→ ∫_A f d µ=µ(A∩ B) is a σ-additive measure (and thus—a charge). By the Cor. 35 the same is true for finite linear combinations of indicator functions and their limits in the sense of distance d₁. □

We can identify functions which has the same values a.e. Then S(X) becomes a metric space with the distance d₁ (74). The space may be incomplete and we may wish to look for its completion. However, if we will simply try to assign a limiting point to every Cauchy sequence in S(X), then the resulting space becomes so huge that it will be impossible to realise it as a space of functions on X.

Exercise 22 Use ideas of Ex. 7(1) to present a sequence of simple functions which has the Cauchy property in metric d₁ (74) but does not have point-wise limits anywhere.

To reduce the number of Cauchy sequences in S(X) eligible to have a limit, we shall ask an additional condition. A convenient reduction to functions on X appears if we ask both the convergence in d₁ metric and the point-wise convergence on X a.e.

Definition 23 A function f is summable by a measure µ if there is a sequence (f_n)⊂S(X) such that

the sequence (f_n) is a Cauchy sequence in S(X);
f_n→^a.e. f.

Clearly, if a function is summable, then any equivalent function, that is equal a.e., is summable as well. Set of equivalent classes of summable functions will be denoted by L₁(X).

Lemma 24 If the measure µ is finite then any bounded measurable function is summable.

Proof. Define E_kn(f)={x∈ X: k/n≤ f(x)< (k+1)/n} and f_n=∑_k k/n χ_{E_kn} (note that the sum is finite due to boundedness of f).

Since | f_n(x)−f(x) |<1/n we have uniform convergence (thus convergence a.e.) and (f_n) is the Cauchy sequence: d₁(f_n,f_m)=∫_X| f_n−f_m | d µ≤ (1/n+1/m)µ(X). □

Remark 25 This Lemma can be extended to the space of essentially bounded functions L_∞(X), that is functions which are bounded a.e. In other words, L_∞(X)⊂L₁(X) for finite measures.

Another simple result, which is useful on many occasions is as follows.

Lemma 26 If the measure µ is finite and f_n⇉ f then d₁(f_n,f)→ 0.

Corollary 27 For a convergent sequence f_n→^a.e. f, which admits the uniform bound | f_n(x) |<M for all n and x, we have d₁(f_n,f)→ 0.

Proof. For any ε>0, by the Egorov’s theorem 8 we can find E, such that

µ(E)< ε/2M; and
from the uniform convergence on X∖ E there exists N such that for any n>N we have | f(x)−f_n(x) |<ε /2µ(X).

Combining this we found that for n>N, d₁(f_n,f)< M ε/2M + µ(X) ε /2µ(X) < ε . □

Exercise 28 Convergence in the metric d₁ and a.e. do not imply each other:

Give an example of f_n→^a.e. f such that d₁(f_n ,f)↛0.
Give an example of the sequence (f_n) and function f in L₁(X) such that d₁(f_n ,f)→ 0 but f_n does not converge to f a.e.

To build integral we need the following

Lemma 29 Let (f_n) and (g_n) be two Cauchy sequences in S(X) with the same limit a.e., then d₁(f_n,g_n)→ 0.

Proof. Let φ_n=f_n−g_n, then this is a Cauchy sequence with zero limit a.e. Assume the opposite to the statement: there exist δ>0 and sequence (n_k) such that ∫_x| φ_{n_k} | d µ>δ. Rescaling-renumbering we can obtain ∫_x| φ_n | d µ>1.

Take quickly convergent subsequence using the Cauchy property:

d₁(φ_{n_k},φ_{n_k+1})≤ 1/2^k+2.

Renumbering agian assume d₁(φ_k,φ_k+1)≤ 1/2^k+2.

Since φ₁ is a simple, take the canonical presentation φ₁=∑_k t_k χ_{A_k}, then ∑_k | t_k | µ(A_k)=∫_X | φ₁ | d µ≥ 1. Thus, there exists N, such that ∑_k=1^N | t_k | µ(A_k)≥ 3/4. Put A=⊔_k=1^N A_k and C=max_{1≤ k ≤
N}| t_k |=max_{x∈ A}| φ₁(x) |.

By the Egorov’s Theorem 8 there is E⊂ A such that µ(E)<1/(4C) and φ_n⇉ 0 on B=A∖ E. Then

∫

⎪
⎪

φ₁

⎪
⎪

 d µ=

∫

⎪
⎪

φ₁

⎪
⎪

 d µ−

∫

⎪
⎪

φ₁

⎪
⎪

 d µ≥

−

· C=

By the triangle inequality for d₁:

⎪
⎪
⎪
⎪
⎪
⎪

∫

⎪
⎪

φ_n

⎪
⎪

 d µ−

∫

⎪
⎪

φ_n+1

⎪
⎪

 d µ

⎪
⎪
⎪
⎪
⎪
⎪

≤ d₁(φ_n,φ_n+1)≤

2ⁿ⁺²

we get

∫

⎪
⎪

φ_n

⎪
⎪

 d µ≥

∫

⎪
⎪

φ₁

⎪
⎪

 d µ−

n−1

∑

k=1

⎪
⎪
⎪
⎪
⎪
⎪

∫

⎪
⎪

φ_n

⎪
⎪

 d µ−

∫

⎪
⎪

φ_n+1

⎪
⎪

 d µ

⎪
⎪
⎪
⎪
⎪
⎪

≥

−

n−1

∑

2^k+2

But this contradicts to the fact ∫_B | φ_n | d µ → 0, which follows from the uniform convergence φ_n⇉ 0 on B. □

It follows from the Lemma that we can use any Cauchy sequence of simple functions for the extension of integral.

Corollary 30 The functional I_A(f)=∫_A f(x) d µ(x), defined on any A∈ L on the space of simple functions S(X) can be extended by continuity to the functional on L₁(X,µ).

Definition 31 For an arbitrary summable f∈L₁(X), we define the Lebesgue integral

∫

f  d µ =

lim

n→ ∞

∫

f_n  d µ,

where the Cauchy sequence f_n of summable simple functions converges to f a.e.

Theorem 32

L₁(X) is a linear space.
For any measurable set A⊂ X the correspondence f↦ ∫_A f  d µ is a linear functional on L₁(X).
For any a non-negative f∈L₁(X) the value ν(A)=∫_A f  d µ is a measure, for a general function f the above correspondence A↦ ν(A) is charge.
d₁(f,g)=∫_A | f−g |  d µ is a distance on L₁(X).

Proof. The proof is follows from Prop. 21 and continuity of extension. □

Summing up: we build L₁(X) as a completion of S(X) with respect to the distance d₁ such that elements of L₁(X) are associated with (equivalence classes of) measurable functions on X.

13.3 Properties of the Lebesgue Integral

The space L₁ was defined from dual convergence—in d₁ metric and point-wise a.e. Can we get the continuity of the integral from the convergence almost everywhere alone? No, in general. However, we will state now some results on continuity of the integral under convergence a.e. with some additional assumptions. Finally, we show that L₁(X) is closed in d₁ metric.

Theorem 33 (Lebesgue on dominated convergence) Let (f_n) be a sequence of µ-summable functions on X, and there is φ∈L₁(X) such that | f_n(x) |≤ φ(x) for all x∈ X, n∈ℕ.

If f_n→^a.e. f, then f∈L₁(X) and for any measurable A:

lim

n→∞

∫

f_n  d µ =

∫

f  d µ.

Proof. For any measurable A the expression ν(A)=∫_A φ  d µ defines a finite measure on X due to non-negativeness of φ and Thm. 32.

Lemma 34 (Change of variable) If g is measurable and bounded then f=φ g is µ-summable and for any µ-measurable set A we have

∫

f  d µ=

∫

g  d ν. (75)

Proof.[Proof of the Lemma] Let M be the set of all g such that the Lemma is true. M includes any indicator functions g=χ_B of a measurable B:

∫

f  d µ=

∫

φχ_B  d µ =

∫

A⋂ B

φ  d µ =ν(A⋂ B)=

∫

χ_B d ν=

∫

g d ν.

Thus M contains also finite linear combinations of indicators. For any n∈ℕ and a bounded g two functions g₋(x)=1/n[ng(x)] and g₊(x)=g₋+1/n are finite linear combinations of indicators and are in M. Since g₋(x)≤ g(x)≤ g₊(x) we have

∫

g₋ d ν=

∫

φ g₋ d µ≤

∫

φ g d µ≤

∫

φ g₊ d µ=

∫

g₊ d ν.

By squeeze rule for n→ ∞ we have the middle term tenses to ∫_Ag d ν, that is g∈ M.

Note, that formula (75) is a change of variable in the Lebesgue integral of the type: ∫f(sinx) cosx d x = ∫f(sinx)  d (sinx). □

For the proof of the theorem define:

g_n(x)

⎧
⎨
⎩

f_n(x)/φ(x),	if φ(x)≠ 0,
0,	if φ(x)= 0,

g(x)

⎧
⎨
⎩

f(x)/φ(x),	if φ(x)≠ 0,
0,	if φ(x)= 0.

Then g_n is bounded by 1 and g_n→^a.e. g. To show the theorem it will be enough to show lim_{n→ ∞}∫_A g_n d ν=∫_A g d ν. For the uniformly bounded functions on the finite measure set this can be derived from the Egorov’s Thm. 8, see an example of this in the proof of Lemma 29. □

Note, that in the above proof summability of φ was used to obtain the finiteness of the measure ν, which is required for Egorov’s Thm. 8.

Exercise 35 Give an example of f_n→^a.e. f such that ∫_X f_n  d µ ≠ ∫_X f  d µ. For such an example, try to find a function φ such that | f_n | ≤ φ for all n and check either φ is summable.

Exercise 36 (Chebyshev’s inequality) Show that: if f is non-negative and summable, then

µ{x∈ X: f(x)>c} <

∫

f d µ. (76)

Theorem 37 (B. Levi’s, on monotone convergence) Let (f_n) be monotonically increasing sequence of µ-summable functions on X. Define f(x)=lim_n→∞ f_n(x) (allowing the value +∞).

If all integrals ∫_X f_n d µ are bounded by the same value C<∞ then f is summable and ∫_X f d µ=lim_n→∞∫_X f_n d µ.
If lim_n→∞∫_X f_n d µ=+∞ then function f is not summable.

Proof. Replacing f_n by f_n−f₁ and f by f−f₁ we can assume f_n≥ 0 and f≥ 0. Let E be the set where f is infinite, then

E=⋂_N⋃_n E_Nn, where E_Nn={x∈ X: f_n(x)≥ N}.

By Chebyshev’s inequality (76) we have

Nµ(E_Nn) <

∫

E_Nn

f_n d µ ≤

∫

f_n d µ≤ C,

then µ(E_Nn)≤ C/N . Thus µ(E)=lim_N→∞lim_n→∞ µ(E_Nn)=0.

Thus f is finite a.e.

Lemma 38 Let f be a measurable non-negative function attaining only finite values. f is summable if and only if sup∫_A f d µ<∞, where the supremum is taken over all finite-measure set A such that f is bounded on A.

Proof.[Proof of the Lemma] Necessity: if f is summable then for any set A⊂ X we have ∫_A f d µ≤ ∫_X f d µ<∞, thus the supremum is finite.

Sufficiency: let sup∫_A f d µ=M<∞, define B={x∈ X: f(x)=0} and A_k={x∈ X: 2^k≤ f(x)<2^k+1, k∈ℤ}, by (76) we have µ(A_k)<M/2^k and X=B⊔(⊔_{k∈ ℤ} A_k). Define

g(x)

⎧
⎨
⎩

2^k,	if x∈ A_k,
0,	if x∈ B,

f_n(x)

⎧
⎨
⎩

f(x),	if x∈ ⊔_−nⁿ A_n,
0,	otherwise.

Then g(x)≤ f(x) < 2g(x). Function g is a simple function, its summability follows from the estimate ∫_{⊔_−nⁿ A_k} g d µ≤∫_{⊔_−nⁿ A_k} f d µ≤ M which is valid for any n, taking n→ ∞ we get summability of g. Furthermore, f_n →^a.e. f and f_n(x)≤ f(x) <2g(x), so we use the Lebesgue Thm. 33 on dominated convergence to obtain the conclusion. □

Let A be a finite measure set such that f is bounded on A, then

∫

f d µ

Cor. 27

lim

n→∞

∫

f_n d µ≤

lim

n→∞

∫

f_n d µ≤ C.

This show summability of f by the previous Lemma. The rest of statement and (contrapositive to) the second part follows from the Lebesgue Thm. 33 on dominated convergence. □

Now we can extend this result dropping the monotonicity assumption.

Lemma 39 (Fatou) If a sequence (f_n) of µ-summable non-negative functions is such that:

∫_X f_n d µ≤ C for all n;
f_n →^a.e. f,

then f is µ-summable and ∫_X f d µ≤ C.

Proof.Let us replace the limit f_n→ f by two monotonic limits . Define:

g_kn(x)

min(f_n(x),…,f_n+k(x)),

g_n(x)

lim

k→ ∞

g_kn(x).

Then g_n is a non-decreasing sequence of functions and lim_{n→ ∞} g_n(x)=f(x) a.e. Since g_n≤ f_n, from monotonicity of integral we get ∫_X g_n d µ≤ C for all n. Then Levi’s Thm. 37 implies that f is summable and ∫_X f d µ≤ C. □

Remark 40 Note that the price for dropping monotonicity from Thm. 37 to Lem. 39 is that the limit ∫_X f_n d µ → ∫_X f d µ may not hold any more.

Exercise 41 Give an example such that under the Fatou’s lemma condition we get lim_n→∞∫_X f_n d µ ≠ ∫_X f d µ.

Now we can show that L₁(X) is complete:

Theorem 42 L₁(X) is a Banach space.

Proof. It is clear that the distance function d₁ indeed defines a norm ||f||₁=d₁(f,0). We only need to demonstrate the completeness. We again utilise the three-step procedure from Rem. 7.

Take a Cauchy sequence (f_n) and building a subsequence if necessary, assume that it is quickly convergent that is d₁(f_n,f_n+1)≤ 1/2^k. Put

φ₁=f₁ and φ_n=f_n−f_n−1 for n>1. Then f_n=

∑

k=1

φ_k .

The sequence ψ_n(x)=∑₁ⁿ | φ_k(x) | is monotonic, integrals ∫_X ψ_n d µ are bounded by the same constant ||f₁||₁+1. Thus, by the B. Levi’s Thm. 37 and its proof, ψ_n→ ψ for a summable essentially bounded function ψ. Therefore, the series ∑φ_k(x) converges as well to a value f(x) of a function f. But, this means that f_n →^a.e. f (the first step is completed).

We also notice | f_n(x) |≤| ψ(x) |. Thus by the Lebesgue Thm. 33 on dominated convergence f∈ L₁(X) (the second step is completed).

Furthermore,

0≤

lim

n→ ∞

∫

⎪
⎪

f_n−f

⎪
⎪

 d µ≤

lim

n→ ∞

∞

∑

k=n

⎪⎪
⎪⎪

φ_k

⎪⎪
⎪⎪

=0.

That is, f_n→ f in the norm of L₁(X). (That completes the third step and the whole proof). □

The next important property of the Lebesgue integral is its absolute continuity.

Theorem 43 (Absolute continuity of Lebesgue integral) Let f∈L₁(X). Then for any ε>0 there is a δ>0 such that | ∫_A f d µ |<ε if µ(A)<δ.

Proof. If f is essentially bounded by M, then it is enough to set δ=ε/M. In general let:

A_n

{x∈ X: n≤

⎪
⎪

f(x)

⎪
⎪

< n+1},

B_n

⊔₀ⁿ A_k,

C_n

X∖ B_n.

Then ∫_X| f | d µ=∑₀^∞∫_{A_k}| f | d µ, thus there is an N such that ∑_N^∞∫_{A_k}| f | d µ=∫_{C_N}| f | d µ<ε/2. Now put δ =ε/2N+2, then for any A⊂ X with µ(A)<δ:

⎪
⎪
⎪
⎪
⎪
⎪

∫

f d µ

⎪
⎪
⎪
⎪
⎪
⎪

≤

∫

⎪
⎪

 d µ=

∫

A⋂ B_N

⎪
⎪

 d µ+

∫

A⋂ C_N

⎪
⎪

 d µ <

=ε.

□

13.4 Integration on Product Measures

It is well-known geometrical interpretation of an integral in calculus as the “area under the graph”. If we advance from “area” to a “measure” then the Lebesgue integral can be treated as theory of measures of very special shapes created by graphs of functions. This shapes belong to the product spaces of the function domain and its range. We introduced product measures in Defn. 40, now we will study them in same details using the Lebesgue integral. We start from the following

Theorem 44 Let X and Y be spaces, and let S and T be semirings on X and Y respectively and µ and ν be measures on S and T respectively. If µ and ν are σ-additive, then the product measure ν× µ from Defn. 40 is σ-additive as well.

Proof. For any C=A× B∈ S× T let us define f_C(x)=χ_A(x)ν(B). Then

(µ×ν)(C)=µ(A)ν(B)=

∫

f_C d µ.

If the same set C has a representation C=⊔_k C_k for C_k∈ S× T, then σ-additivity of ν implies f_C=∑_k f_{C_k}. By the Lebesgue theorem 33 on dominated convergence:

∫

f_C d µ=

∑

∫

f_{C_k} d µ.

Thus

(µ×ν)(C)=

∑

(µ×ν)(C_k).

□

The above correspondence C↦ f_C can be extended to the ring R(S× T) generated by S× T by the formula:

f_C=

∑

f_{C_k}, for C=⊔_k C_k∈ R(S× T).

We have the uniform continuity of this correspondence:

⎪⎪
⎪⎪

f_C₁−f_C₂

⎪⎪
⎪⎪

₁≤ (µ×ν)(C₁▵ C₂)=d₁(C₁,C₂)

because from the representation C₁=A₁⊔ B and C₂=A₂⊔ B, where B=C₁∩ C₂ one can see that f_C₁−f_C₂=f_A₁−f_A₂, f_C₁▵
C₂=f_A₁+f_A₂ together with | f_A₁−f_A₂ |≤ f_A₁+f_A₂ for non-negative functions.

Thus the map C↦ f_C can be extended to the map of σ-algebra L(X× Y) of µ×ν-measurable set to L₁(X) by the formula f_{lim_n C_n}=lim_n f_{C_n}.

Exercise 45 Describe topologies where two limits from the last formula are taken.

The following lemma provides the geometric interpretation of the function f_C as the size of the slice of the set C along x=const.

Lemma 46 Let C∈L(X× Y). For almost every x∈ X the set C_x={y∈ Y: (x,y)∈ C} is ν-measurable and ν(C_x)=f_C(x).

Proof. For sets from the ring R(S× T) it is true by the definition. If C⁽ⁿ⁾ is a monotonic sequence of sets, then ν(lim_n C_x⁽ⁿ⁾)=lim_n ν(C_x⁽ⁿ⁾) by σ-additivity of measures. Thus the property ν(C_x)=f_x(C) is preserved by monotonic limits. The following result of the separate interest:

Lemma 47 Any measurable set can be received (up to a set of zero measure) from elementary sets by two monotonic limits .

Proof.[Proof of Lem. 47] Let C be a measurable set, put C_n∈R(S× T) to approximate C up to 2⁻ⁿ in µ×ν. Let C′=∩_n=1^∞∪_{k =1}^∞C_n+k, then

(µ× ν)

⎛
⎝

C∖ ⋃_k=1^∞C_n+k

⎞
⎠

=0 and (µ× ν)

⎛
⎝

⋃_k=1^∞C_n+k∖ C

⎞
⎠

=2¹⁻ⁿ.

Then (µ×ν)(C′▵ C)≤ 2¹⁻ⁿ for any n∈ℕ. □

Coming back to Lem. 46 we notice that (in the above notations) f_C=f_C′ almost everywhere. Then:

f_C(x)

a.e

f_C′(x)=ν(C′_x)=ν(C_x).

□

The following theorem generalizes the meaning of the integral as “area under the graph”.

Theorem 48 Let µ and ν are σ-finite measures and C be a µ×ν measurable set X× Y. We define C_x={y∈ Y: (x,y)∈ C}. Then for µ-almost every x∈ X the set C_x is ν-measurable, function f_C(x)=ν(C_x) is µ-measurable and

(µ×ν)(C)=

∫

f_C d µ, (77)

where both parts may have the value +∞.

Proof. If C has a finite measure, then the statement is reduced to Lem. 46 and a passage to limit in (77).

If C has an infinite measure, then there exists a sequence of C_n⊂ C, such that ∪_n C_n=C and (µ×ν)(C_n)→ ∞. Then f_C(x)=lim_n f_{C_n} (x) and

∫

f_{C_n} d µ=(µ×ν)(C_n)→ +∞.

Thus f_C is measurable and non-summable. □

This theorem justify the well-known technique to calculation of areas (volumes) as integrals of length (areas) of the sections.

Remark 49

The role of spaces X and Y in Theorem 48 is symmetric, thus we can swap them in the conclusion.
The Theorem 48 can be extended to any finite number of measure spaces. For the case of three spaces (X,µ), (Y,ν), (Z,λ) we have:
(µ×ν×λ )(C)= ∫

X× Y

λ(C_xy) d (µ×ν)(x,y)= ∫

Z

(µ×ν)(C_z) d λ(z), (78)

where
C_xy = {z∈ Z: (x,y,z)∈ C},

C_z = {(x,y)∈ X× Y: (x,y,z) ∈ C}.

Theorem 50 (Fubini) Let f(x,y) be a summable function on the product of spaces (X,µ) and (Y,ν). Then:

For µ-almost every x∈ X the function f(x,y) is summable on Y and f_Y(x)=∫_Y f(x,y) d ν(y) is a µ-summable on X.
For ν-almost every y∈ Y the function f(x,y) is summable on X and f_X(y)=∫_X f(x,y) d µ(x) is a ν-summable on Y.

There are the identities:

∫

X× Y

f(x,y) d (µ×ν)(x,y)

∫

⎛
⎜
⎜
⎜
⎜
⎝

∫

f(x,y) d ν(y)

⎞
⎟
⎟
⎟
⎟
⎠

dµ(x)

(79)

∫

⎛
⎜
⎜
⎜
⎜
⎝

∫

f(x,y) d µ(x)

⎞
⎟
⎟
⎟
⎟
⎠

dν(y).

For a non-negative functions the existence of any repeated integral in (79) implies summability of f on X× Y.

Proof. From the decomposition f=f₊−f₋ we can reduce our consideration to non-negative functions. Let us consider the product of three spaces (X,µ), (Y,ν), (ℝ,λ), with λ=dz being the Lebesgue measure on ℝ. Define

C={(x,y,z)∈ X× Y× ℝ: 0≤ z≤ f(x,y)}.

Using the relation (78) we get:

C_xy

{z∈ ℝ: 0≤ z≤ f(x,y)}, λ(C_xy)=f(x,y)

C_x

{(y,z)∈ Y× ℝ: 0≤ z≤ f(x,y)}, (ν× λ)(C_x)=

∫

f(x,y) d ν(y).

the theorem follows from those relations. □

Exercise 51

Show that the first three conclusions of the Fubini Theorem may fail if f is not summable.
Show that the fourth conclusion of the Fubini Theorem may fail if f has values of different signs.

13.5 Absolute Continuity of Measures

Here, we consider another topic in the measure theory which benefits from the integration theory.

Definition 52 Let X be a set with σ-algebra R and σ-finite measure µ and finite charge ν on R. The charge ν is absolutely continuous with respect to µ if µ(A)=0 for A∈ R implies ν(A)=0. Two charges ν₁ and ν₂ are equivalent if two conditions | ν₁ |(A)=0 and | ν₂ |(A)=0 are equivalent.

The above definition seems to be not justifying “absolute continuity” name, but this will become clear from the following important theorem.

Theorem 53 (Radon–Nikodym) Any charge ν which absolutely continuous with respect to a measure µ has the form

ν(A)=

∫

f d µ,

where f is a function from L₁. The function f∈L₁ is uniquely defined by the charge ν.

Proof.[Sketch of the proof] First we will assume that ν is a measure. Let D be the collection of measurable functions g:X→[0,∞) such that

∫

g  d µ ≤ ν(E) (E∈L).

Let α = sup_g∈D ∫_X g  d µ ≤ ν(X) < ∞. So we can find a sequence (g_n) in D with ∫_X g_n  d µ → α.

We define f₀(x) = sup_n g_n(x). We can show that f₀=∞ only on a set of µ-measure zero, so if we adjust f₀ on this set, we get a measurable function f:X→[0,∞). There is now a long argument to show that f is as required.

If ν is a charge, we can find f by applying the previous operation to the measures ν₊ and ν₋ (as it is easy to verify that ν₊,ν₋⋘µ).

We show that f is essentially unique. If g is another function inducing ν, then

∫

f−g  d µ = ν(E) − ν(E) = 0 (E∈L).

Let E = {x∈ X : f(x)−g(x)≥ 0}, so as f−g is measurable, E∈L. Then ∫_E f−g  d µ =0 and f−g≥0 on E, so by our result from integration theory, we have that f−g=0 almost everywhere on E. Similarly, if F = {x∈ X : f(x)−g(x)≤ 0}, then F∈L and f−g=0 almost everywhere on F. As E∪ F=X, we conclude that f=g almost everywhere. □

Corollary 54 Let µ be a measure on X, ν be a finite charge, which is absolutely continuous with respect to µ. For any ε>0 there exists δ>0 such that µ(A)<δ implies | ν |(A)<ε .

Proof. By the Radon–Nikodym theorem there is a function f∈L₁(X,µ) such that ν(A)=∫_A f d µ. Then | ν |(A)=∫_A | f | d µ ad we get the statement from Theorem 43 on absolute continuity of the Lebesgue integral. □

14 Functional Spaces

In this section we describe various Banach spaces of functions on sets with measure.

14.1 Integrable Functions

Let (X,L,µ) be a measure space. For 1≤ p<∞, we define L_p(µ) to be the space of measurable functions f:X→K such that

∫

⎪
⎪

^p  d µ < ∞.

We define ||·||_p : L_p(µ)→[0,∞) by

⎪⎪
⎪⎪

_p =

⎛
⎜
⎜
⎜
⎜
⎝

∫

⎪
⎪

^p  d µ

⎞
⎟
⎟
⎟
⎟
⎠

1/p

(f∈ L_p(µ)).

Notice that if f=0 almost everywhere, then | f |^p=0 almost everywhere, and so ||f||_p=0. However, there can be non-zero functions such that f=0 almost everywhere. So ||·||_p is not a norm on L_p(µ).

Exercise 1 Find a measure space (X,µ) such that l_p=L_p(µ), that is the space of sequences l_p is a particular case of function spaces considered in this section. It also explains why the following proofs are referencing to Section 11 so often.

Lemma 2 (Integral Hölder inequality) Let 1<p<∞, let q∈(1,∞) be such that 1/p + 1/q=1. For f∈ L_p(µ) and g∈ L_q(µ), we have that fg is summable, and

∫

⎪
⎪

 d µ ≤

⎪⎪
⎪⎪

_q. (80)

Proof. Recall that we know from Lem. 2 that

⎪
⎪

≤

⎪
⎪

(a,b∈K).

Now we follow the steps in proof of Prop. 4. Define measurable functions a,b:X→K by setting

a(x) =

f(x)

⎪⎪
⎪⎪

, b(x) =

g(x)

⎪⎪
⎪⎪

(x∈ X).

So we have that

⎪
⎪

a(x) b(x)

⎪
⎪

≤

⎪
⎪

f(x)

⎪
⎪

⎪⎪
⎪⎪

_p^p

⎪
⎪

g(x)

⎪
⎪

⎪⎪
⎪⎪

_q^q

(x∈ X).

By integrating, we see that

∫

⎪
⎪

 d µ ≤

⎪⎪
⎪⎪

_p^p

∫

⎪
⎪

^p  d µ +

⎪⎪
⎪⎪

_q^q

∫

⎪
⎪

^q  d µ =

= 1.

Hence, by the definition of a and b,

∫

⎪
⎪

≤

⎪⎪
⎪⎪

_q,

as required. □

Lemma 3 Let f,g∈ L_p(µ) and let a∈K. Then:

||af||_p = | a | ||f||_p;
|| f+g ||_p ≤ ||f||_p + ||g||_p.

In particular, L_p is a vector space.

Proof. Part 3(1) is easy. For 3(2), we need a version of Minkowski’s Inequality, which will follow from the previous lemma. We essentially repeat the proof of Prop. 5.

Notice that the p=1 case is easy, so suppose that 1<p<∞. We have that

∫

⎪
⎪

f+g

⎪
⎪

^p  d µ

∫

⎪
⎪

f+g

⎪
⎪

^p−1

⎪
⎪

f+g

⎪
⎪

 d µ

≤

∫

⎪
⎪

f+g

⎪
⎪

^p−1

⎛
⎝

⎪
⎪

⎞
⎠

 d µ

∫

⎪
⎪

f+g

⎪
⎪

^p−1

⎪
⎪

 d µ +

∫

⎪
⎪

f+g

⎪
⎪

^p−1

⎪
⎪

 d µ.

Applying the lemma, this is

≤

⎪⎪
⎪⎪

⎛
⎜
⎜
⎜
⎜
⎝

∫

⎪
⎪

f+g

⎪
⎪

^q(p−1)  d µ

⎞
⎟
⎟
⎟
⎟
⎠

1/q

⎪⎪
⎪⎪

⎛
⎜
⎜
⎜
⎜
⎝

∫

⎪
⎪

f+g

⎪
⎪

^q(p−1)  d µ

⎞
⎟
⎟
⎟
⎟
⎠

1/q

As q(p−1)=p, we see that

⎪⎪
⎪⎪

f+g

⎪⎪
⎪⎪

_p^p ≤

⎛
⎝

⎪⎪
⎪⎪

_p +

⎪⎪
⎪⎪

⎞
⎠

⎪⎪
⎪⎪

f+g

⎪⎪
⎪⎪

_p^p/q.

As p−p/q = 1, we conclude that

⎪⎪
⎪⎪

f+g

⎪⎪
⎪⎪

_p ≤

⎪⎪
⎪⎪

_p +

⎪⎪
⎪⎪

_p,

as required.

In particular, if f,g∈ L_p(µ) then af+g∈ L_p(µ), showing that L_p(µ) is a vector space. □

We define an equivalence relation ∼ on the space of measurable functions by setting f∼ g if and only if f=g almost everywhere. We can check that ∼ is an equivalence relation (the slightly non-trivial part is that ∼ is transitive).

Proposition 4 For 1≤ p<∞, the collection of equivalence classes L_p(µ) / ∼ is a vector space, and ||·||_p is a well-defined norm on L_p(µ) / ∼.

Proof. We need to show that addition, and scalar multiplication, are well-defined on L_p(µ)/∼. Let a∈K and f₁,f₂,g₁,g₂∈ L_p(µ) with f₁∼ f₂ and g₁∼ g₂. Then it’s easy to see that af₁+g₁ ∼ af₂+g₂; but this is all that’s required!

If f ∼ g then | f |^p = | g |^p almost everywhere, and so ||f||_p = ||g||_p. So ||·||_p is well-defined on equivalence classes. In particular, if f∼ 0 then ||f||_p=0. Conversely, if ||f||_p=0 then ∫_X | f |^p d µ=0, so as | f |^p is a positive function, we must have that | f |^p=0 almost everywhere. Hence f=0 almost everywhere, so f∼ 0. That is,

⎧
⎨
⎩

f∈ L_p(µ) : f∼ 0

⎫
⎬
⎭

⎧
⎨
⎩

f∈ L_p(µ) :

⎪⎪
⎪⎪

_p=0

⎫
⎬
⎭

It follows from the above lemma that this is a subspace of L_p(µ).

The above lemma now immediately shows that ||·||_p is a norm on L_p(µ)/∼. □

Definition 5 We write L_p(µ) for the normed space (L_p(µ)/∼ , ||·||_p).

We will abuse notation and continue to write members of L_p(µ) as functions. Really they are equivalence classes, and so care must be taken when dealing with L_p(µ). For example, if f∈ L_p(µ), it does not make sense to talk about the value of f at a point.

Theorem 6 Let (f_n) be a Cauchy sequence in L_p(µ). There exists f∈ L_p(µ) with ||f_n−f||_p→ 0. In fact, we can find a subsequence (n_k) such that f_{n_k}→ f pointwise, almost everywhere.

Proof. Consider first the case of a finite measure space X. We again follow the three steps scheme from Rem. 7. Let f_n be a Cauchy sequence in L_p(µ). From the Hölder inequality (80) we see that ||f_n−f_m||₁≤ ||f_n−f_m||_p (µ(X))^1/q. Thus, f_n is also a Cauchy sequence in L₁(µ). Thus by the Theorem 42 there is the limit function f∈ L₁(µ). Moreover, from the proof of that theorem we know that there is a subsequence f_{n_k} of f_n convergent to f almost everywhere. Thus in the Cauchy sequence inequality

∫

⎪
⎪

f_{n_k} −f_{n_m}

⎪
⎪

^p d µ <ε

we can pass to the limit m→ ∞ by the Fatou Lemma 39 and conclude:

∫

⎪
⎪

f_{n_k} −f

⎪
⎪

^p d µ <ε.

So, f_{n_k} converges to f in L_p(µ), then f_n converges to f in L_p(µ) as well.

For a σ-finite measure µ we represent X=⊔_k X_k with µ(X_k)<+∞ for all k. The restriction (f_n^(k)) of a Cauchy sequence (f_n)⊂L_p(X,µ) to every X_k is a Cauchy sequence in L_p(X_k,µ). By the previous paragraph there is the limit f^(k)∈ L_p(X_k,µ). Define a function f∈L_p(X,µ) by the identities f(x)=f^(k) if x∈ X_k. By the additivity of integral, the Cauchy condition on (f_n) can be written as:

∫

⎪
⎪

f_n−f_m

⎪
⎪

^p d µ=

∞

∑

k=1

∫

X_k

⎪
⎪

f_n^(k)−f_m^(k)

⎪
⎪

^p d µ<ε.

It implies for any M:

∑

k=1

∫

X_k

⎪
⎪

f_n^(k)−f_m^(k)

⎪
⎪

^p d µ<ε.

In the last inequality we can pass to the limit m→ ∞:

∑

k=1

∫

X_k

⎪
⎪

f_n^(k)−f^(k)

⎪
⎪

^p d µ<ε.

Since the last inequality is independent of M we conclude:

∫

⎪
⎪

f_n−f

⎪
⎪

^p d µ=

∞

∑

k=1

∫

X_k

⎪
⎪

f_n^(k)−f^(k)

⎪
⎪

^p d µ<ε.

Thus we conclude that f_n→ f in L_p(X,µ). □

Corollary 7 L_p(µ) is a Banach space.

Example 8 If p=2 then L_p(µ)=L₂(µ) can be equipped with the inner product:

⟨ f,g ⟩ =

∫

fḡ d µ. (81)

The previous Corollary implies that L₂(µ) is a Hilbert space, see a preliminary discussion in Defn. 22.

Proposition 9 Let (X,L,µ) be a measure space, and let 1≤ p<∞. We can define a map Φ:L_q(µ) → L_p(µ)^* by setting Φ(f)=F, for f∈ L_q(µ), 1/p+1/q=1, where

F:L_p(µ)→K, g ↦

∫

fg  d µ (g∈L_p(µ)).

Proof. This proof very similar to proof of Thm. 13. For f∈ L_q(µ) and g∈ L_p(µ), it follows by the Hölder’s Inequality (80), that fg is summable, and

⎪
⎪
⎪
⎪
⎪
⎪

∫

fg  d µ

⎪
⎪
⎪
⎪
⎪
⎪

≤

∫

⎪
⎪

 d µ ≤

⎪⎪
⎪⎪

_p.

Let f₁,f₂∈ L_q(µ) and g₁,g₂∈ L_p(µ) with f₁∼ f₂ and g₁∼ g₂. Then f₁g₁ = f₂g₁ almost everywhere and f₂g₁ = f₂g₂ almost everywhere, so f₁g₁ = f₂g₂ almost everywhere, and hence

∫

f₁g₁  d µ =

∫

f₂g₂  d µ.

So Φ is well-defined.

Clearly Φ is linear, and we have shown that ||Φ(f)|| ≤ ||f||_q.

Let f∈ L_q(µ) and define g:X→K by

g(x) =

⎧
⎪
⎨
⎪
⎩

f(x)

⎪
⎪

f(x)

⎪
⎪

^q−2

: f(x)≠0,

: f(x)=0.

Then | g(x) | = | f(x) |^q−1 for all x∈ X, and so

∫

⎪
⎪

^p  d µ =

∫

⎪
⎪

^p(q−1)  d µ =

∫

⎪
⎪

^q  d µ,

so ||g||_p = ||f||_q^q/p, and so, in particular, g∈L_p(µ). Let F=Φ(f), so that

F(g) =

∫

fg  d µ =

∫

⎪
⎪

^q  d µ =

⎪⎪
⎪⎪

_q^q.

Thus ||F|| ≥ ||f||_q^q / ||g||_p = ||f||_q. So we conclude that ||F|| = ||f||_q, showing that Φ is an isometry. □

Proposition 10 Let (X,L,µ) be a finite measure space, let 1≤ p<∞, and let F∈L_p(µ)^*. Then there exists f∈L_q(µ), 1/p+1/q=1 such that

F(g) =

∫

fg  d µ (g∈L_p(µ)).

Proof.[Sketch of the proof] As µ(X)<∞, for E∈L, we have that ||χ_E||_p = µ(E)^1/p < ∞. So χ_E∈L_p(µ), and hence we can define

ν(E) = F(χ_E) (E∈L).

We proceed to show that ν is a signed (or complex) measure. Then we can apply the Radon-Nikodym Theorem 53 to find a function f:X→K such that

F(χ_E) = ν(E) =

∫

f  d µ (E∈L).

There is then a long argument to show that f∈ L_q(µ), which we skip here. Finally, we need to show that

∫

fg  d µ = F(g)

for all g∈ L_p(µ), and not just for g=χ_E. That follows for simple functions with a finite set of values by linearity of the Lebesgue integral and F. Then, it can be extended by continuity to the entire space L_p(µ) in view in the following Prop. 14. □

Proposition 11 For 1<p<∞, we have that L_p(µ)^* = L_q(µ) isometrically, under the identification of the above results.

Remark 12

For p=q=2 we obtain a special case of the Riesz–Frechét theorem 11 about self-duality of the Hilbert space L₂(µ).
Note that L_∞^* is not isomorphic to L₁, except finite-dimensional situation. Moreover if µ is not a point measure L₁ is not a dual to any Banach space.

Exercise 13 Let µ be a measure on the real line.

Show that the space L_∞(ℝ,µ) is either finite-dimensional or non-separable.
Show that for p≠ q neither L_p(ℝ,µ) nor L_q(ℝ,µ) contains the other space.

14.2 Dense Subspaces in L_p

We note that f∈L_p(X) if and only if | f |^p is summable, thus we can use all results from Section 13 to investigate L_p(X).

Proposition 14 Let (X,L,µ) be a finite measure space, and let 1≤ p<∞. Then the collection of simple bounded functions attained only a finite number of values is dense in L_p(µ).

Proof.Let f∈L_p(µ), and suppose for now that f≥0. For each n∈ℕ, let

f_n = min(n,

⌊ n f ⌋).

Then each f_n is simple, f_n ↑ f (i.e. monotonically converges from below), and | f_n−f |^p→0 pointwise. For each n, we have that

0 ≤ f_n ≤ f 0 ≤ f−f_n ≤ f,

so that | f−f_n |^p ≤ | f |^p for all n. As ∫| f |^p  d µ<∞, we can apply the Dominated Convergence Theorem to see that

lim

∫

⎪
⎪

f_n−f

⎪
⎪

^p  d µ = 0,

that is, ||f_n−f||_p → 0.

The general case follows by taking positive and negative parts, and if K=ℂ, by taking real and imaginary parts first. □

Corollary 15 Let µ be the Lebesgue measure on the real line. The collection of simple bounded functions with compact supports attained only a finite number of values is dense in L_p(ℝ,µ).

Proof. Let f ∈ L_p(ℝ,µ), since ∫_ℝ | f |^p d µ = ∑_k=−∞^∞ ∫_[k,k+1) | f |^p d µ there exists N such that ∑_k=−∞^−N + ∑_N^k=∞ ∫_[k,k+1) | f |^p d µ < ε . By the previous Proposition, the restriction of f to [−N,N] can be ε-approximated by a simple bounded function f₁ with support in [−N,N] attained only a finite number of values. Therefore f₁ will be also (2ε)-approximation to f as well. □

Definition 16 A function f:ℝ→ ℂ is called step function if it a linear combination of a finite number of indicator functions of half-open disjoint intervals: f=∑_k c_k χ_{[a_k,b_k)}.

The regularity of the Lebesgue measure allows to make a stronger version of Prop. 14.

Lemma 17 The space of step functions is dense in L_p(ℝ).

Proof. By Prop. 14, for a given f∈L_p(ℝ) and ε>0 there exists a simple function f₀=∑_k=1ⁿ c_k χ_{A_k} such that ||f−f₀||_p<ε/2. Let M=||f₀||_∞ < ∞. By measurability of the set A_k there is C_k=⊔_j^m_k [a_{j_k},b_{j_k}) a disjoint finite union of half-open intervals such that µ(C_k▵ A_k)<ε/2n³ M. Since A_k and A_j are disjoint for k≠ j we also obtain by the triangle inequality: µ(C_j ∩ A_k)<ε/2n³ M and µ(C_j ∩ C_k)<2ε/2n³ M. We define a step function

f₁=

∑

k=1

c_k χ_{C_k}=

∑

k=1

m_k

∑

c_k χ_{[a_{j_k},b_jk)}.

Clearly

f₁(x)=c_k for all x ∈ A_k∖ ((C_k▵ A_k) ⋃(⋃_j≠ k C_j)).

Thus:

µ({x ∈ ℝ  ∣  f₀(x)≠ f₁(x)}) ≤ n· n·

2n³ M

2 n M

Then ||f₀−f₁||_p≤ nM· ε/2 n M=ε/2 because ||f₁||_∞< nM. Thus ||f−f₁||_p<ε. □

Corollary 18 The collection of continuous function belonging to L_p(ℝ) is dense in L_p(ℝ).

Proof. In view of Rem. 29 and the previous Lemma it is enough to show that the characteristic function of an interval [a,b] can be approximated by a continuous function in L_p(ℝ). The idea of such approximation is illustrated by Fig. 4 and we skip the technical details. □

We will establish denseness of the subspace of smooth function in § 15.4.

Exercise 19 Show that every f∈L₁(ℝ) is continuous on average , that is for any ε>0 there is δ>0 such that for all t such that | t |<δ we have:

∫

ℝ

⎪
⎪

f(x)−f(x+t)

⎪
⎪

 d x < ε . (82)

Here is an alternative demonstration of a similar result, it essentially encapsulate all the above separate statements. Let ([0,1],L,µ) be the restriction of Lebesgue measure to [0,1]. We often write L_p([0,1]) instead of L_p(µ).

Proposition 20 For 1≤ p<∞, we have that C_K([0,1]) is dense in L_p([0,1]).

Proof. As [0,1] is a finite measure space, and each member of C_K([0,1]) is bounded, it is easy to see that each f∈ C_K([0,1]) is such that ||f||_p<∞. So it makes sense to regard C_K([0,1]) as a subspace of L_p(µ). If C_K([0,1]) is not dense in L_p(µ), then we can find a non-zero F∈L_p([0,1])^* with F(f)=0 for each f∈ C_K([0,1]). This was a corollary of the Hahn-Banach theorem 15.

So there exists a non-zero g∈ L_q([0,1]) with

∫

[0,1]

fg  d µ = 0 (f∈ C_K([0,1])).

Let a<b in [0,1]. By approximating χ_(a,b) by a continuous function, we can show that ∫_(a,b) g  d µ = ∫ g χ_(a,b)  d µ = 0.

Suppose for now that K=ℝ. Let A = { x∈[0,1] : g(x)≥0 } ∈ L. By the definition of the Lebesgue (outer) measure, for є>0, there exist sequences (a_n) and (b_n) with A ⊆ ∪_n (a_n,b_n), and ∑_n (b_n−a_n) ≤ µ(A) + є.

For each N, consider ∪_n=1^N (a_n,b_n). If some (a_i,b_i) overlaps (a_j,b_j), then we could just consider the larger interval (min(a_i,a_j), max(b_i,b_j)). Formally by an induction argument, we see that we can write ∪_n=1^N (a_n,b_n) as a finite union of some disjoint open intervals, which we abusing notations still denote by (a_n,b_n). By linearity, it hence follows that for N∈ℕ, if we set B_N = ⊔_n=1^N (a_n,b_n), then

∫

g χ_{B_N}  d µ =

∫

g χ_{(a₁,b₁)⊔⋯⊔(a_N,b_N)}  d µ = 0.

Let B=∪_n (a_n,b_n), so A⊆ B and µ(B) ≤ ∑_n (b_n−a_n) ≤ µ(A)+є. We then have that

⎪
⎪

∫

g χ_{B_N}  d µ −

∫

g χ_B  d µ

⎪
⎪

∫

g χ_{B∖ (a₁,b₁)⊔⋯⊔(a_N,b_N)}  d µ

⎪
⎪

We now apply Hölder’s inequality to get

⎛
⎝

∫

χ_{B∖ (a₁,b₁)⋃⋯⋃(a_N,b_N)}  d µ

⎞
⎠

^1/p

⎪⎪
⎪⎪

= µ(B∖ (a₁,b₁)⊔⋯⊔(a_N,b_N))^1/p

⎪⎪
⎪⎪

≤

⎛
⎜
⎜
⎝

∞

∑

n=N+1

(b_n−a_n)

⎞
⎟
⎟
⎠

1/p

⎪⎪
⎪⎪

_q.

We can make this arbitrarily small by making N large. Hence we conclude that

∫

g χ_B  d µ=0.

Then we apply Hölder’s inequality again to see that

⎪
⎪

∫

gχ_A  d µ

⎪
⎪

∫

gχ_A  d µ −

∫

gχ_B  d µ

⎪
⎪

∫

g χ_B∖ A  d µ

⎪
⎪

≤

⎪⎪
⎪⎪

_q µ(B∖ A)^1/p ≤

⎪⎪
⎪⎪

_q є^1/p.

As є>0 was arbitrary, we see that ∫_A g  d µ=0. As g is positive on A, we conclude that g=0 almost everywhere on A.

A similar argument applied to the set {x∈[0,1] : g(x)≤0} allows us to conclude that g=0 almost everywhere. If K=ℂ, then take real and imaginary parts. □

14.3 Continuous functions

Let K be a compact (always assumed Hausdorff) topological space.

Definition 21 The Borel σ-algebra , B(K), on K, is the σ-algebra generated by the open sets in K (recall what this means from Section 11.5). A member of B(K) is a Borel set .

Notice that if f:K→K is a continuous function, then clearly f is B(K)-measurable (the inverse image of an open set will be open, and hence certainly Borel). So if µ:B(K)→K is a finite real or complex charge (for K=ℝ or K=ℂ respectively), then f will be µ-summable (as f is bounded) and so we can define

φ_µ:C_K(K) → K, φ_µ(f) =

∫

f  d µ (f∈ C_K(K)).

Clearly φ_µ is linear. Suppose for now that µ is positive, so that

⎪
⎪

φ_µ(f)

⎪
⎪

≤

∫

⎪
⎪

 d µ ≤

⎪⎪
⎪⎪

_∞ µ(K) (f∈ C_K(K)).

So φ_µ∈ C_K(K)^* with ||φ_µ||≤ µ(K).

The aim of this section is to show that all of C_K(K)^* arises in this way. First we need to define a class of measures which are in a good agreement with the topological structure.

Definition 22 A measure µ:B(K)→[0,∞) is regular if for each A∈B(K), we have

µ(A)

= sup

⎧
⎨
⎩

µ(E) : E⊆ A and E is compact

⎫
⎬
⎭

= inf

⎧
⎨
⎩

µ(U) : A⊆ U and U is open

⎫
⎬
⎭

A charge ν=ν₊−ν₋ is regular if ν₊ and ν₋ are regular measures. A complex measure is regular if its real and imaginary parts are regular.

Note the similarity between this notion and definition of outer measure.

Example 23

Many common measures on the real line, e.g. the Lebesgue measure, point measures, etc., are regular.
An example of the measure µ on [0,1] which is not regular:
µ(∅)=0, µ(
{
1

2

}

)=1, µ(A)=+∞,

for any other subset A⊂[0,1].
Another example of a σ-additive measure µ on [0,1] which is not regular:
µ(A)= ⎧
⎨
⎩
0, if A is at most countable;

+∞ otherwise.

The following subspace of the space of all simple functions is helpful.

As we are working only with compact spaces, for us, “compact” is the same as “closed”. Regular measures somehow interact “well” with the underlying topology on K.

We let M_ℝ(K) and M_ℂ(K) be the collection of all finite, regular real or complex charges (that is, signed or complex measures) on B(K).

Exercise 24 Check that, M_ℝ(K) and M_ℂ(K) are real or complex, respectively, vector spaces for the obvious definition of addition and scalar multiplication.

Recall, Defn. 31, that for µ∈ M_K(K) we define the variation of µ

⎪⎪
⎪⎪

= sup

⎧
⎪
⎨
⎪
⎩

∞

∑

n=1

⎪
⎪

µ(A_n)

⎪
⎪

⎫
⎪
⎬
⎪
⎭

where the supremum is taken over all sequences (A_n) of pairwise disjoint members of B(K), with ⊔_n A_n=K. Such (A_n) are called partitions.

Proposition 25 The variation ||·|| is a norm on M_K(K).

Proof. If µ=0 then clearly ||µ||=0. If ||µ||=0, then for A∈B(K), let A₁=A, A₂=K∖ A and A₃=A₄=⋯=∅. Then (A_n) is a partition, and so

0 =

∞

∑

n=1

⎪
⎪

µ(A_n)

⎪
⎪

µ(A)

⎪
⎪

µ(K∖ A)

⎪
⎪

Hence µ(A)=0, and so as A was arbitrary, we have that µ=0.

Clearly ||aµ|| = | a |||µ|| for a∈K and µ∈ M_K(K).

For µ,λ∈ M_K(K) and a partition (A_n), we have that

∑

⎪
⎪

(µ+λ)(A_n)

⎪
⎪

∑

⎪
⎪

µ(A_n)+λ(A_n)

⎪
⎪

≤

∑

⎪
⎪

µ(A_n)

⎪
⎪

∑

⎪
⎪

λ(A_n)

⎪
⎪

≤

⎪⎪
⎪⎪

As (A_n) was arbitrary, we see that ||µ+λ|| ≤ ||µ|| + ||λ||. □

To get a handle on the “regular” condition, we need to know a little more about C_K(K).

Theorem 26 (Urysohn’s Lemma) Let K be a compact space, and let E,F be closed subsets of K with E∩ F=∅. There exists f:K→[0,1] continuous with f(x)=1 for x∈ E and f(x)=0 for x∈ F (written f(E)={1} and f(F)={0}).

Proof. See a book on (point set) topology. □

Lemma 27 Let µ:B(K)→[0,∞) be a regular measure. Then for U⊆ K open, we have

µ(U) = sup

⎧
⎪
⎪
⎨
⎪
⎪
⎩

∫

f  d µ : f∈ C_ℝ(K), 0≤ f≤χ_U

⎫
⎪
⎪
⎬
⎪
⎪
⎭

Proof. If 0≤ f≤χ_U, then

0 =

∫

0  d µ ≤

∫

f  d µ ≤

∫

χ_U  d µ = µ(U).

Conversely, let F=K∖ U, a closed set. Let E⊆ U be closed. By Urysohn Lemma 26, there exists f:K→[0,1] continuous with f(E)={1} and f(F)={0}. So χ_E ≤ f ≤ χ_U, and hence

µ(E) ≤

∫

f  d µ ≤ µ(U).

As µ is regular,

µ(U) = sup

⎧
⎨
⎩

µ(E) : E⊆ U closed

⎫
⎬
⎭

≤ sup

⎧
⎪
⎪
⎨
⎪
⎪
⎩

∫

f  d µ : 0≤ f≤χ_U

⎫
⎪
⎪
⎬
⎪
⎪
⎭

≤ µ(U).

Hence we have equality throughout. □

The next result tells that the variation coincides with the norm on real charges viewed as linear functionals on C_ℝ(K).

Lemma 28 Let µ∈ M_ℝ(K). Then

⎪⎪
⎪⎪

φ_µ

⎪⎪
⎪⎪

:= sup

⎧
⎪
⎪
⎨
⎪
⎪
⎩

⎪
⎪
⎪
⎪
⎪
⎪

∫

f  d µ

⎪
⎪
⎪
⎪
⎪
⎪

: f∈ C_ℝ(K),

⎪⎪
⎪⎪

_∞≤ 1

⎫
⎪
⎪
⎬
⎪
⎪
⎭

Proof. Let (A,B) be a Hahn decomposition (Thm. 36) for µ. For f∈ C_ℝ(K) with ||f||_∞≤ 1, we have that

⎪
⎪
⎪
⎪
⎪
⎪

∫

f  d µ

⎪
⎪
⎪
⎪
⎪
⎪

≤

⎪
⎪
⎪
⎪
⎪
⎪

∫

f  d µ

⎪
⎪
⎪
⎪
⎪
⎪

∫

f  d µ

⎪
⎪
⎪
⎪
⎪
⎪

∫

f  d µ₊

⎪
⎪
⎪
⎪
⎪
⎪

∫

f  d µ₋

⎪
⎪
⎪
⎪
⎪
⎪

≤

∫

⎪
⎪

 d µ₊ +

∫

⎪
⎪

 d µ₋ ≤

⎪⎪
⎪⎪

_∞

⎛
⎝

µ(A) − µ(B)

⎞
⎠

≤

⎪⎪
⎪⎪

_∞

⎪⎪
⎪⎪

using the fact that µ(B)≤0 and that (A,B) is a partition of K.

Conversely, as µ is regular, for є>0, there exist closed sets E and F with E⊆ A, F⊆ B, and with µ₊(E)> µ₊(A)−є and µ₋(F)>µ₋(B)−є. By Urysohn Lemma 26, there exists f:K→[0,1] continuous with f(E)={1} and f(F)={0}. Let g=2f−1, so g is continuous, g takes values in [−1,1], and g(E)={1}, g(F)={−1}. Then

∫

g  d µ

∫

1  d µ +

∫

−1  d µ +

∫

K∖ (E⋃ F)

g  d µ

= µ(E) − µ(F) +

∫

A∖ E

g  d µ +

∫

B∖ F

g  d µ

As E⊆ A, we have µ(E) = µ₊(E), and as F⊆ B, we have −µ(F)=µ₋(F). So

∫

g  d µ

> µ₊(A)−є + µ₋(B) − є +

∫

A∖ E

g  d µ +

∫

B∖ F

g  d µ

≥

⎪
⎪

µ(A)

⎪
⎪

µ(B)

⎪
⎪

− 2є −

⎪
⎪

µ(A∖ E)

⎪
⎪

−

⎪
⎪

µ(B∖ F)

⎪
⎪

≥

⎪
⎪

µ(A)

⎪
⎪

µ(B)

⎪
⎪

− 4є.

As є>0 was arbitrary, we see that ||φ_µ|| ≥ | µ(A) |+| µ(B) |=||µ||. □

Thus, we know that M_ℝ(K) is isometrically embedded in C_ℝ(K)^*.

14.4 Riesz Representation Theorem

To facilitate an approach to the key point of this Subsection we will require some more definitions.

Definition 29 A functional F on C₍K) is positive if for any non-negative function f≥ 0 we have F(f)≥0.

Lemma 30 Any positive linear functional F on C(X) is continuous and ||F||=F(1), where 1 is the function identically equal to 1 on X.

Proof. For any function f such that ||f||_∞≤ 1 the function 1−f is non negative thus: F(1)−F(f)=F(1−f)>0, Thus F(1)>F(f), that is F is bounded and its norm is F(1). □

So for a positive functional you know the exact place where to spot its norm, while a linear functional can attain its norm in an generic point (if any) of the unit ball in C(X). It is also remarkable that any bounded linear functional can be represented by a pair of positive ones.

Lemma 31 Let λ be a continuous linear functional on C(X). Then there are positive functionals λ₊ and λ₋ on C(X), such that λ=λ₊−λ₋.

Proof. First, for f∈ C_ℝ(K) with f≥0, we define

λ₊(f)

= sup

⎧
⎨
⎩

λ(g) : g∈ C_ℝ(K), 0≤ g≤ f

⎫
⎬
⎭

≥0,

λ₋(f)

= λ₊(f) − λ(f) = sup

⎧
⎨
⎩

λ(g)−λ(f): g∈ C_ℝ(K), 0≤ g≤ f

⎫
⎬
⎭

= sup

⎧
⎨
⎩

λ(h): h∈ C_ℝ(K), 0≤ h+f≤ f

⎫
⎬
⎭

= sup

⎧
⎨
⎩

λ(h): h∈ C_ℝ(K), −f ≤ h ≤ 0

⎫
⎬
⎭

≥ 0.

In a sense, this is similar to the Hahn decomposition (Thm. 36).

We can check that

λ₊(tf) = tλ₊(f), λ₋(tf) = tλ₋(f) (t≥0, f≥0).

For f₁,f₂≥ 0, we have that

λ₊(f₁+f₂)

= sup

⎧
⎨
⎩

λ(g): 0≤ g ≤ f₁+f₂

⎫
⎬
⎭

= sup

⎧
⎨
⎩

λ(g₁+g₂): 0≤ g₁+g₂ ≤ f₁+f₂

⎫
⎬
⎭

≥ sup

⎧
⎨
⎩

λ(g₁) + λ(g₂): 0≤ g₁≤ f₁, 0 ≤ g₂ ≤ f₂

⎫
⎬
⎭

= λ₊(f₁) + λ₊(f₂).

Conversely, if 0≤ g≤ f₁+f₂, then set g₁ = min(g,f₁), so 0≤ g₁ ≤ f₁. Let g₂ = g−g₁ so g₁≤ g implies that 0≤ g₂. For x∈ K, if g₁(x)=g(x) then g₂(x) = 0 ≤ f₂(x); if g₁(x)=f₁(x) then f₁(x)≤ g(x) and so g₂(x) = g(x)−f₁(x) ≤ f₂(x). So 0 ≤ g₂ ≤ f₂, and g = g₁ + g₂. So in the above displayed equation, we really have equality throughout, and so λ₊(f₁+f₂) = λ₊(f₁) + λ₊(f₂). As λ is additive, it is now immediate that λ₋(f₁+f₂) = λ₋(f₁) + λ₋(f₂)

For f∈ C_ℝ(K) we put f₊(x)=max(f(x),0) and f₋(x)=−min(f(x),0). Then f_±≥ 0 and f=f₊−f₋. We define:

λ₊(f) = λ₊(f₊) − λ₊(f₋), λ₋(f) = λ₋(f₊) − λ₋(f₋).

As when we were dealing with integration, we can check that λ₊ and λ₋ become linear functionals; by the previous Lemma they are bounded. □

Finally, we need a technical definition.

Definition 32 For f∈ C_ℝ(K), we define the support of f, written supp(f), to be the closure of the set {x∈ K : f(x)≠0}.

Theorem 33 (Riesz Representation) Let K be a compact (Hausdorff) space, and let λ∈ C_K(K)^*. There exists a unique µ∈ M_K(K) such that

λ(f) =

∫

f  d µ ( f∈ C_K(K) ).

Furthermore, ||λ|| = ||µ||.

Proof. Let us show uniqueness. If µ₁,µ₂∈ M_K(K) both induce λ then µ = µ₁−µ₂ induces the zero functional on C_K(K). So for f∈ C_ℝ(K),

= ℜ

∫

f  d µ =

∫

f  d µ_r

= ℑ

∫

f  d µ =

∫

f  d µ_i.

So µ_r and µ_i both induce the zero functional on C_ℝ(K). By Lemma 28, this means that ||µ_r|| = ||µ_i||=0, showing that µ = µ_r + iµ_i = 0, as required.

Existence is harder, and we shall only sketch it here. Firstly, we shall suppose that K=ℝ and that λ is positive.

Motivated by the above Lemmas 27 and 28, for U⊆ K open, we define

µ^*(U) = sup

⎧
⎨
⎩

λ(f): f∈ C_ℝ(K), 0≤ f≤χ_U, supp(f)⊆ U

⎫
⎬
⎭

For A⊆ K general, we define

µ^*(A) = inf

⎧
⎨
⎩

µ^*(U): U⊆ K is open, A⊆ U

⎫
⎬
⎭

We then proceed to show that

µ^* is an outer measure: this requires a technical topological lemma, where we make use of the support condition in the definition.
We then check that every open set in µ^*-measurable.
As B(K) is generated by open sets, and the collection of µ^*-measurable sets is a σ-algebra, it follows that every member of B(K) is µ^*-measurable.
By using results from Section 12, it follows that if we let µ be the restriction of µ^* to B(K), then µ is a measure on B(K).
We then check that this measure is regular.
Finally, we show that µ does induce the functional λ. Arguably, it is this last step which is the hardest (or least natural to prove).

If λ is not positive, then by Lemma 31 represent it as λ=λ₊−λ₋ for positive λ_±. As λ₊ and λ₋ are positive functionals, we can find µ₊ and µ₋ positive measures in M_ℝ(K) such that

λ₊(f) =

∫

f  d µ₊, λ₋(f) =

∫

f  d µ₋ (f∈ C_ℝ(K)).

Then if µ = µ₊ − µ₋, we see that

λ(f) = λ₊(f) − λ₋(f) =

∫

f  d µ (f∈ C_ℝ(K)).

Finally, if K=ℂ, then we use the same “complexification” trick from the proof of the Hahn-Banach Theorem 15. Namely, let λ∈ C_ℂ(K)^*, and define λ_r, λ_i∈ C_ℝ(K)^* by

λ_r(f) = ℜ λ(f), λ_i(f) = ℑ λ(f) ( f∈ C_ℝ(K) ).

These are both clearly ℝ-linear. Notice also that | λ_r(f) | = | ℜ λ(f) | ≤ | λ(f) | ≤ ||λ|| ||f||_∞, so λ_r is bounded; similarly λ_i.

By the real version of the Riesz Representation Theorem, there exist charges µ_r and µ_i such that

ℜ λ(f) = λ_r(f) =

∫

f  d µ_r, ℑλ(f) = λ_i(f) =

∫

f  d µ_i (f∈ C_ℝ(K) ).

Then let µ=µ_r+iµ_i, so for f∈ C_ℂ(K),

∫

f  d µ

∫

f  d µ_r + i

∫

f  d µ_i

∫

ℜ (f)  d µ_r + i

∫

ℑ(f)  d µ_r + i

∫

ℜ (f)  d µ_i −

∫

ℑ(f)  d µ_i

= λ_r(ℜ (f)) + iλ_r(ℑ(f)) + iλ_i(ℜ (f)) − λ_i(ℑ(f))

= ℜ λ(ℜ (f)) + iℜ λ(ℑ(f)) +iℑλ(ℜ (f)) − ℑλ(ℑ(f))

= λ(ℜ (f) + iℑ(f)) = λ(f),

as required. □

Notice that we have not currently proved that ||µ|| = ||λ|| in the case K=ℂ. See a textbook for this.

15 Fourier Transform

In this section we will briefly present a theory of Fourier transform focusing on commutative group approach. We mainly follow footsteps of [5]*Ch. IV.

15.1 Convolutions on Commutative Groups

Let G be a commutative group, we will use + sign to denote group operation, respectively the inverse elements of g∈ G will be denoted −g. We assume that G has a Hausdorff topology such that operations (g₁,g₂)↦ g₁+g₂ and g↦ −g are continuous maps. We also assume that the topology is locally compact, that is the group neutral element has a neighbourhood with a compact closure.

Example 1 Our main examples will be as follows:

G=ℤ the group of integers with operation of addition and the discrete topology (each point is an open set).
G=ℝ the group of real numbers with addition and the topology defined by open intervals.
G=T the group of Euclidean rotations the unit circle in ℝ² with the natural topology. Another realisations of the same group:
- Unimodular complex numbers under multiplication.
- Factor group ℝ/ℤ, that is addition of real numbers modulo 1.
There is a homomorphism between two realisations given by z=e^{2πi t}, t∈[0,1), | z |=1.

We assume that G has a regular Borel measure which is invariant in the following sense.

Definition 2 Let µ be a measure on a commutative group G, µ is called invariant (or Haar measure ) if for any measurable X and any g∈ G the sets g+X and −X are also measurable and µ(X)=µ(g+X)=µ(−X).

Such an invariant measure exists if and only if the group is locally compact, in this case the measure is uniquely defined up to the constant factor.

Exercise 3 Check that in the above three cases invariant measures are:

G=ℤ, the invariant measure of X is equal to number of elements in X.
G=ℝ the invariant measure is the Lebesgue measure.
G=T the invariant measure coincides with the Lebesgue measure.

Definition 4 A convolution of two functions on a commutative group G with an invariant measure µ is defined by:

(f₁*f₂)(x)=

∫

f₁(x−y) f₂(y) d µ(y)=

∫

f₁(y) f₂(x−y) d µ(y). (83)

Theorem 5 If f₁, f₂∈L₁(G,µ), then the integrals in (83) exist for almost every x∈ G, the function f₁*f₂ is in L₁(G,µ) and ||f₁*f₂||≤ ||f₁||·||f₂||.

Proof. If f₁, f₂∈L₁(G,µ) then by Fubini’s Thm. 50 the function φ(x,y)=f₁(x)*f₂(y) is in L₁(G× G, µ× µ) and ||φ||=||f₁||· ||f₂||.

Let us define a map τ: G× G → G× G such that τ(x,y)=(x+y,y). It is measurable (send Borel sets to Borel sets) and preserves the measure µ×µ. Indeed, for an elementary set C=A× B⊂ G× G we have:

(µ×µ)(τ(C))

∫

G× G

χ_τ(C)(x,y) d µ(x) d µ(y)

∫

G× G

χ_C(x−y,y) d µ(x) d µ(y)

∫

⎛
⎜
⎜
⎜
⎜
⎝

∫

χ_C(x−y,y) d µ(x)

⎞
⎟
⎟
⎟
⎟
⎠

dµ(y)

∫

µ(A+y) d µ(y)=µ(A)× µ(B)=(µ×µ)(C).

We used invariance of µ and Fubini’s Thm. 50. Therefore we have an isometric isomorphism of L₁(G× G,µ× µ) into itself by the formula:

Tφ(x,y)=φ(τ(x,y))=φ(x−y,y).

If we apply this isomorphism to the above function φ(x,y)=f₁(x)*f₂(y) we shall obtain the statement. □

Definition 6 Denote by S(k) the map S(k): f↦ k*f which we will call convolution operator with the kernel k.

Corollary 7 If k∈L₁(G) then the convolution S(k) is a bounded linear operator on L₁(G).

Theorem 8 Convolution is a commutative, associative and distributive operation. In particular S(f₁)S(f₂)=S(f₂)S(f₁)=S(f₁*f₂).

Proof. Direct calculation using change of variables. □

It follows from Thm. 5 that convolution is a closed operation on L₁(G) and has nice properties due to Thm. 8. We fix this in the following definition.

Definition 9 L₁(G) equipped with the operation of convolution is called convolution algebra L₁(G).

The following operators of special interest.

Definition 10 An operator of shift T(a) acts on functions by T(a): f(x)↦ f(x+a).

Lemma 11 An operator of shift is an isometry of L_p(G), 1≤ p≤∞.

Theorem 12 Operators of shifts and convolutions commute:

T(a)(f₁*f₂)=T(a)f₁*f₂=f₁*T(a)f₂,

T(a)S(f)=S(f)T(a)=S(T(a)f).

Proof. Just another calculation with a change of variables. □

Remark 13 Note that operator of shifts T(a) provide a representation of the group G by linear isometric operators in L_p(G), 1≤ p≤ ∞. A map f↦ S(f) is a representation of the convolution algebra

There is a useful relation between support of functions and their convolutions.

Lemma 14 For any f₁, f₂∈L₁(G) we have:

supp(f₁*f₂)⊂supp(f₁)+supp(f₂).

Proof. If x∉supp(f₁)+supp(f₂) then for any y∈supp(f₂) we have x−y∉supp(f₁). Thus for such x convolution is the integral of the identical zero. □

Exercise 15 Suppose that the function f₁ is compactly supported and k times continuously differentiate in ℝ, and that the function f₂ belongs to L₁(ℝ). Prove that the convolution f₁* f₂ has continuous derivatives up to order k.
[Hint: Express the derivative d/d x as the limit of operators (T(h)−I)/h when h→ 0 and use Thm. 12.]

15.2 Characters of Commutative Groups

Our purpose is to map the commutative algebra of convolutions to a commutative algebra of functions with point-wise multiplication. To this end we first represent elements of the group as operators of multiplication.

Definition 16 A character χ: G→ T is a continuous homomorphism of an abelian topological group G to the group T of unimodular complex numbers under multiplications:

χ(x+y)=χ(x)χ(y).

Note, that a character is an eigenfunction for a shift operator T(a) with the eigenvalue χ(a). Furthermore, if a function f on G is an eigenfunction for all shift operators T(a), a∈ G then the collection of respective eigenvalues λ(a) is a homomorphism of G to ℂ and f(a)=α λ(a) for some α∈ℂ. Moreover, if T(a) act by isometries on the space containing f(a) then λ(a) is a homomorphism to T.

Lemma 17 The product of two characters of a group is again a character of the group. If χ is a character of G then χ⁻¹=χ is a character as well.

Proof. Let χ₁ and χ₂ be characters of G. Then:

χ₁(gh)χ₂(gh)	=	χ₁(g)χ₁(h)χ₂(g)χ₂(h)
	=	(χ₁(g)χ₂(g))(χ₁(h)χ₂(h))∈T.

□

Definition 18 The dual group Ĝ is collection of all characters of G with operation of multiplication.

The dual group becomes a topological group with the uniform convergence on compacts: for any compact subset K⊂ G and any ε>0 there is N∈ℕ such that | χ_n(x)−χ(x) |<ε for all x∈ K and n>N.

Exercise 19 Check that

The sequence f_n(x)=xⁿ does not converge uniformly on compacts if considered on [0,1]. However it does converges uniformly on compacts if considered on (0,1).
If X is a compact set then the topology of uniform convergence on compacts and the topology uniform convergence on X coincide.

Example 20 If G=ℤ then any character χ is defined by its values χ(1) since

χ(n)=[χ(1)]ⁿ. (84)

Since χ(1) can be any number on T we see that $ℤ^_$ is parametrised by T.

Theorem 21 The group $ℤ^_$ is isomorphic to T.

Proof. The correspondence from the above example is a group homomorphism. Indeed if χ_z is the character with χ_z(1)=z, then χ_z₁χ_z₂=χ_{z₁ z₂}. Since ℤ is discrete, every compact consists of a finite number of points, thus uniform convergence on compacts means point-wise convergence. The equation (84) shows that χ_{z_n}→ χ_z if and only if χ_{z_n}(1)→ χ_z(1), that is z_n→ z. □

Theorem 22 The group $T^_$ is isomorphic to ℤ.

Proof. For every n∈ℤ define a character of T by the identity

χ_n(z)=zⁿ, z∈T. (85)

We will show that these are the only characters in Cor. 26. The isomorphism property is easy to establish. The topological isomorphism follows from discreteness of $T^_$. Indeed due to compactness of T for n≠ m:

max

z∈T

⎪
⎪

χ_n(z)−χ_m(z)

⎪
⎪

²=

max

z∈T

⎪
⎪

1−ℜ z^m−n

⎪
⎪

²=2²=4.

Thus, any convergent sequence (n_k) have to be constant for sufficiently large k, that corresponds to a discrete topology on ℤ. □

The two last Theorem are an illustration to the following general statement.

Principle 23 (Pontryagin’s duality) For any locally compact commutative topological group G the natural map G→ Ĝ, such that it maps g∈ G to a character f_g on Ĝ by the formula:

f_g(χ)=χ(g), χ∈Ĝ, (86)

is an isomorphism of topological groups.

Remark 24

The principle is not true for commutative group which are not locally compact.
Note the similarity with an embedding of a vector space into the second dual.

In particular, the Pontryagin’s duality tells that the collection of all characters contains enough information to rebuild the initial group.

Theorem 25 The group $ℝ^_$ is isomorphic to ℝ.

Proof. For λ∈ℝ define a character χ_λ∈$ℝ^_$ by the identity

χ_λ(x)=e^{2π i λ x}, x∈ℝ. (87)

Moreover any smooth character of the group G=(ℝ, +) has the form (87). Indeed, let χ be a smooth character of ℝ. Put c=χ′(t)|_t=0∈ ℂ. Then χ′(t)=cχ(t) and χ(t)=e^ct. We also get c∈ iℝ and any such c defines a character. Then the multiplication of characters is: χ₁(t)χ₂(t)=e^c₁te^c₂t=e^(c₂+c₁)t. So we have a group isomorphism.

For a generic character we can apply first the smoothing technique and reduce to the above case.

Let us show topological homeomorphism. If λ_n→ λ then χ_{λ_n}→ χ_λ uniformly on any compact in ℝ from the explicit formula of the character. Reverse, let χ_{λ_n}→ χ_λ uniformly on any interval. Then χ_{λ_n−λ}(x)→ 1 uniformly on any compact, in particular, on [0,1]. But

sup

[0,1]

⎪
⎪

χ_{λ_n−λ}(x)− 1

⎪
⎪

sup

[0,1]

⎪
⎪

sinπ (λ_n−λ)x

⎪
⎪

⎧
⎪
⎨
⎪
⎩

⎪
⎪

λ_n−λ

⎪
⎪

≥ 1/2,

sinπ

⎪
⎪

λ_n−λ

⎪
⎪

λ_n−λ

⎪
⎪

≤ 1/2.

Thus λ_n→ λ. □

Corollary 26 Any character of the group T has the form (85).

Proof. Let χ∈$T^_$, consider χ₁(t)=χ(e^{2π
i t}) which is a character of ℝ. Thus χ₁(t)=e^{2π i λ t} for some λ∈ℝ. Since χ₁(1)=1 then λ=n∈ℤ. Thus χ₁(t)=e^{2π i n
t}, that is χ(z)=zⁿ for z=e^{2π i t}. □

Remark 27 Although $ℝ^_$ is isomorphic to ℝ there is no a canonical form for this isomorphism (unlike for ℝ→ $ℝ^_$). Our choice is convenient for the Poisson formula below, however some other popular definitions are λ→ e^{i λ x} or λ→ e^{−i λ x}.

We can unify the previous three Theorem into the following statement.

Theorem 28 Let G=ℝⁿ× ℤ^k× T^l be the direct product of groups. Then the dual group is Ĝ=ℝⁿ× T^k× ℤ^l.

15.3 Fourier Transform on Commutative Groups

Definition 29 Let G be a locally compact commutative group with an invariant measure µ. For any f∈L₁(G) define the Fourier transform f by

f(χ)=

∫

f(x) χ(x) d µ(x), χ∈Ĝ. (88)

That is the Fourier transform f is a function on the dual group Ĝ.

Example 30

If G=ℤ, then f∈L₁(Z) is a two-sided summable sequence (c_n)_n∈ℤ. Its Fourier transform is the function f(z)=∑_n=−∞^∞c_n zⁿ on T. Sometimes f(z) is called generating function of the sequence (c_n).
If G=T, then the Fourier transform of f∈L₁(T) is its Fourier coefficients , see Section 5.1.
If G=ℝ, the Fourier transform is also the function on ℝ given by the Fourier integral :
f(λ)= ∫

ℝ

f(x)  e^{−2πiλ x} d x. (89)

The important properties of the Fourier transform are captured in the following statement.

Theorem 31 Let G be a locally compact commutative group with an invariant measure µ. The Fourier transform maps functions from L₁(G) to continuous bounded functions on Ĝ. Moreover, a convolution is transformed to point-wise multiplication:

(f₁*f₂)^ (χ)=f₁(χ)·f₂(χ), (90)

a shift operator T(a), a∈ G is transformed in multiplication by the character f_a∈Ĝ:

(T(a)f)^ (χ)=f_a(χ)·f(χ), f_a(χ)=χ(a) (91)

and multiplication by a character χ∈Ĝ is transformed to the shift T(χ⁻¹):

(χ· f)^ (χ₁)=T(χ⁻¹)f(χ₁)=f(χ⁻¹χ₁). (92)

Proof. Let f∈L₁(G). For any ε>0 there is a compact K⊂ G such that ∫_{G∖ K} | f | d µ<ε. If χ_n→ χ in Ĝ, then we have the uniform convergence of χ_n→ χ on K, so there is n(ε) such that for k>n(ε) we have | χ_k(x)−χ(x) | <ε for all x∈ K. Then

⎪
⎪

f(χ_n)−f(χ)

⎪
⎪

≤

∫

⎪
⎪

f(x)

⎪
⎪

χ_n(x)− χ(x)

⎪
⎪

 d µ(x)+

∫

G∖ K

⎪
⎪

f(x)

⎪
⎪

χ_n(x)− χ(x)

⎪
⎪

 d µ(x)

≤

⎪⎪
⎪⎪

+2ε.

Thus f is continuous. Its boundedness follows from the integral estimations. Algebraic maps (90)–(92) can be obtained by changes of variables under integration. For example, using Fubini’s Thm. 50 and invariance of the measure:

(f₁*f₂)^ (χ )

∫

f₁(s) f₂(t−s) d s χ(t) d t

∫

f₁(s) χ(s) f₂(t−s) χ(t−s) d s d t

f₁(χ)f₂(χ).

□

15.4 The Schwartz space of smooth rapidly decreasing functions

We say that a function f is rapidly decreasing if lim_{x→ ±∞} | x^kf(x) |=0 for any k∈ℕ.

Definition 32 The Schwartz space denoted by S or space of rapidly decreasing functions on Rn is the space of infinitely differentiable functions such that:

S =

⎧
⎪
⎨
⎪
⎩

f∈ C^∞(ℝ):

sup

x∈ ℝ

⎪
⎪

x^αf^(β)(x)

⎪
⎪

<∞ ∀ α ,β ∈ ℕ

⎫
⎪
⎬
⎪
⎭

. (93)

Example 33 An example of a rapidly decreasing function is the Gaussian e^{−π x²}.

It is worth to notice that S⊂ L_p(ℝ) for any 1<p<∞. Moreover, S is dense in L_p(ℝ), for p=1 this can be shown in the following steps (other values of p can be done similarly but require some more care). First we will show that S is an ideal of the convolution algebra L₁(ℝ).

Exercise 34 For any g∈ S and f ∈ L₁(ℝ) with compact support their convolution f*g belongs to S. [Hint: smoothness follows from Ex. 15.]

Define the family of functions g_t(x) for t>0 in S by scaling the Gaussian:

g_t(x) =

e^{−π (x/t)²}.

Exercise 35 Show that g_t(x) satisfies the following properties, cf. Lem 7:

g_t(x)>0 for all x∈ℝ and t>0.
∫_ℝ g_t(x) d x=1 for all t>0. [Hint: use the table integral ∫_ℝ e^−π
x² d x=1.]
For any ε>0 and any δ>0 there exists T>0 such that for all positive t< T we have:
0<
−δ

∫

−∞

+
∞

∫

δ

g_t(x) d x < ε.

It is easy to see, that the above properties 35(1)–35(3) are not unique to the Gaussian and a wide class have them. Such a family a family of functions is known as approximation of the identity [9] due to the next property (94).

Exercise 36

Let f be a continuous function with compact support, then

lim

t→ 0

⎪⎪
⎪⎪ f−g_t*f ⎪⎪
⎪⎪ ₁=0 . (94)

[Hint: use the proof of Thm. 8.]
The Schwartz space S is dense in L₁(ℝ). [Hint: use Prop. 20, Ex. 34 and (94).]

15.5 Fourier Integral

We recall the formula (89):

Definition 37 We define the Fourier integral of a function f∈ L₁(ℝ) by

f(λ)=

∫

ℝ

f(x) e^{−2π i λ
x} d x. (95)

We already know that f is a bounded continuous function on ℝ, a further property is:

Lemma 38 If a sequence of functions (f_n)⊂L₁(ℝ) converges in the metric L₁(ℝ), then the sequence (f_n) converges uniformly on the real line.

Proof. This follows from the estimation:

⎪
⎪

f_n(λ)−f_m(λ)

⎪
⎪

≤

∫

ℝ

⎪
⎪

f_n(x)−f_m(x)

⎪
⎪

 d x.

□

Lemma 39 The Fourier integral f of f∈L₁(ℝ) has zero limits at −∞ and +∞.

Proof. Take f the indicator function of [a,b]. Then f(λ )=1/−2π i λ (e^{−2π i a} −e^{−2π i
b}), λ≠ 0. Thus lim_λ→
±∞ f(λ)=0. By continuity from the previous Lemma this can be extended to the closure of step functions , which is the space L₁(ℝ) by Lem. 17. □

Lemma 40 If f is absolutely continuous on every interval and f′∈L₁(ℝ), then

(f′)^=2πi λ f.

More generally:

(f^(k))^=(2πi λ)^kf. (96)

Proof. A direct demonstration is based on integration by parts, which is possible because assumption in the Lemma.

It may be also interesting to mention that the operation of differentiation D can be expressed through the shift operatot T_a:

lim

Δ t → 0

T_Δ t− I

Δ t

. (97)

By the formula (91), the Fourier integral transforms 1/Δ t(T_{Δ t}− I) into 1/Δ t(χ_λ(Δ t)− 1). Providing we can justify that the Fourier integral commutes with the limit, the last operation is multiplication by χ′_λ(0)=2πi λ. □

Corollary 41 If f^(k)∈L₁(ℝ) then

⎪
⎪

(f^(k))^

⎪
⎪

2π λ

⎪
⎪

→ 0 as λ → ∞,

that is f decrease at infinity faster than | λ |^−k.

Lemma 42 Let f(x) and xf(x) are both in L₁(ℝ), then f is differentiable and

f′=(−2 π i xf)^.

More generally

f^(k)=((−2π i x)^k f)^. (98)

Proof. There are several strategies to prove this results, all having their own merits:

The most straightforward uses the differentiation under the integration sign.
We can use the intertwining property (92) of the Fourier integral and the connection of derivative with shifts (97).
Using the inverse Fourier integral (see below), we regard this Lemma as the dual to the Lemma 40.

□

Corollary 43 The Fourier transform of a smooth rapidly decreasing function is a smooth rapidly decreasing function.

Corollary 44 The Fourier integral of the Gaussian e^{−π x²} is e^{−π λ ²}.

Proof.[12] Note that the Gaussian g(x)=e^{−π x²} is a unique (up to a factor) solution of the equation g′+2π x g=0. Then, by Lemmas 40 and 42, its Fourier transform shall satisfy to the equation 2πi λ ĝ+ iĝ′=0. Thus, ĝ=c· e^{−π λ
²} with a constant factor c, its value 1 can be found from the classical integral ∫_ℝ e^−π
x² d x=1 which represents ĝ(0). □

The relation (96) and (98) allows to reduce many partial differential equations to algebraic one, see § 0.2 and 5.4. To convert solutions of algebraic equations into required differential equations we need the inverse of the Fourier transform.

Definition 45 We define the inverse Fourier transform on L₁(ℝ):

f(λ)=

∫

ℝ

f(x) e^{2π i λ x} d x. (99)

We can notice the formal correspondence f(λ)=f(−λ)=f(λ), which is a manifestation of the group duality $ℝ^_$=ℝ for the real line. This immediately generates analogous results from Lem. 38 to Cor. 44 for the inverse Fourier transform.

Theorem 46 The Fourier integral and the inverse Fourier transform are inverse maps. That is, if g=f then f=ǧ.

Proof.[Sketch of a proof] The exact meaning of the statement depends from the spaces which we consider as the domain and the range. Various variants and their proofs can be found in the literature. For example, in [5, § IV.2.3], it is proven for the Schwartz space S of smooth rapidly decreasing functions.

The outline of the proof is as follows. Using the intertwining relations (96) and (98), we conclude the composition of Fourier integral and the inverse Fourier transform commutes both with operator of multiplication by x and differentiation. Then we need a result, that any operator commuting with multiplication by x is an operator of multiplication by a function f. For this function, the commutation with differentiation implies f′=0, that is f=const. The value of this constant can be evaluated by a Fourier transform on a single function, say the Gaussian e^−π
x² from Cor. 44. □

The above Theorem states that the Fourier integral is an invertible map. For the Hilbert space L₂(ℝ) we can show a stronger property—its unitarity.

Theorem 47 (Plancherel identity) The Fourier transform extends uniquely to a unitary map L₂(ℝ)→ L₂(ℝ):

∫

ℝ

⎪
⎪

² d x=

∫

ℝ

⎪
⎪

² d λ. (100)

Proof. The proof will be done in three steps: first we establish the identity for smooth rapidly decreasing functions, then for L₂ functions with compact support and finally for any L₂ function.

Take f₁ and f₂∈S be smooth rapidly decreasing functions and g₁ and g₂ be their Fourier transform. Then (using Fubini’s Thm. 50):

∫

ℝ

f₁(t) f₂(t) d t

∫

ℝ

∫

ℝ

g₁(λ )  e^{2π i
λ t} d λ  f₂(t) d t

∫

ℝ

g₁(λ )

∫

ℝ

  e^{2π i
λ t}  f₂(t) d t d λ

∫

ℝ

g₁(λ )  ḡ₂(λ ) d λ

Put f₁=f₂=f (and therefore g₁=g₂=f) we get the identity ∫| f |² d x=∫| f |² d λ.

The same identity (100) can be obtained from the property (f₁f₂)^=f₁*f₂, cf. (90), or explicitly:

∫

ℝ

f₁(x) f₂(x) e^{−2π iλ x} d x=

∫

ℝ

f₁(t) f₂(λ−t) d t.

Now, substitute λ=0 and f₂=f₁ (with its corollary f₂(t)=f₁(−t)) and obtain (100).

Next let f∈L₂(ℝ) with a support in (−a,a) then f∈L₁(ℝ) as well, thus the Fourier transform is well-defined. Let f_n∈S be a sequence with support on (−a,a) which converges to f in L₂ and thus in L₁. The Fourier transform g_n converges to g uniformly and is a Cauchy sequence in L₂ due to the above identity. Thus g_n→ g in L₂ and we can extend the Plancherel identity by continuity to L₂ functions with compact support.
The final bit is done for a general f∈L₂ the sequence
f_n(x)= ⎧
⎪
⎨
⎪
⎩
f(x),
if ⎪
⎪ x ⎪
⎪ <n,

0, otherwise;

of truncations to the interval (−n,n). For f_n the Plancherel identity is established above, and f_n→ f in L₂(ℝ). We also build their Fourier images g_n and see that this is a Cauchy sequence in L₂(ℝ), so g_n→ g.

If f∈L₁∩L₂ then the above g coincides with the ordinary Fourier transform on L₁. □

We note that Plancherel identity and the Parseval’s identity (30) are cousins—they both states that the Fourier transform L₂(G)→ L₂(Ĝ) is an isometry for G=ℝ and G=T respectively. They may be combined to state the unitarity of the Fourier transform on L₂(G) for the group G=ℝⁿ× ℤ^k× T^l cf. Thm. 28.

Proofs of the following statements are not examinable Thms. 23, 36, 53, 33, 46, Props. 14, 20.

16 Advances of Metric Spaces

16.1 The Stone–Weierstrass Theorem

Density in metric spaces is an important concept since it allows to approximate any element by an element from the dense set. Furthermore, we can extend a uniformly continuous function from a dense subset by continuity, see Ex. 62. Thus, it is convenient to have some supply of manageable dense subsets of common metric spaces.

A famous case of density is the Theorem of Stone–Weierstrass, which in the original and most known form says that any continuous function on a compact interval can be uniformly approximated by a sequence of polynomials. Polynomials have many nice properties which make this dense subset particularly useful: easy computation, derivation and integration, etc. Yet, we will prove here a more general version of the Stone–Weierstrass theorem that applies to general compact metric spaces.

Theorem 1 (Stone–Weierstrass) Suppose that X is a compact metric space and let C(X,ℝ) be the Banach space of real valued continuous functions on X with norm || · ||_∞. Suppose that A ⊂ C(X,ℝ) is a unital subalgebra of C(X,ℝ), i.e.

A is a linear subspace,
1 ∈ A,
A · A ⊂ A, or in other words f,g ∈ A implies that also f · g ∈ A.

Suppose furthermore that A separates points, i.e. for any two x,y ∈ X with x ≠ y there exists a function f ∈ A such that f(x) ≠ f(y). Then, A is dense in C(X,ℝ).

This is an interestingly sounding theorem: it states that a subset of C(X,ℝ) which is closed under algebraic operations and separates points automatically has topological property—it is dense. Its consequences are striking. Before we prove this theorem let’s look at some of them.

Corollary 2 [Weierstrass approximation theorem] The space of polynomials ℝ[x] is dense in C([a,b],ℝ) for any compact interval [a,b] in the in the || · ||_∞ norm.

In other words, any continuous function can be approximated with arbitrary accuracy by a polynomial.

Corollary 3 The space of polynomials ℝ[x₁,…,x_n] is dense in C(K,ℝ) for any compact subset K of ℝⁿ in the || · ||_∞ norm.

This is the higher dimensional version of the above theorem and states that a continuous functions of n-variables can be approximated by polynomials in n variables.

Corollary 4 Let C(S¹,ℝ) be the space of continuous functions on the unit circle, or, equivalently, the space of 2 π-periodic real valued functions on ℝ. Then the finite linear span of the set

∪

m ∈ ℕ

{1, sin(m x), cos(m x)}

is dense in C(S¹,ℝ).

The Stone–Weierstrass theorem is actually a consequence of the following theorem by Stone. This is a good illustration to inventor’s paradox stated by Polya [13].

Here is some notation first for two functions f and g:

	f ∧ g = min{f,g},
	f ∨ g = max{f,g}

Note that of f and g are continuous, then so are f ∧ g and f ∨ g (demonstrate this!).

Theorem 5 (Stone’s Theorem) Let X be a compact metric space and suppose that there is a subset A of C(X,ℝ) such that

A is closed under the operations ∧ and ∨, this means f,g ∈ A implies f ∧ g ∈ A and f ∨ g ∈ A.
for any pair of points x ≠y and numbers a,b ∈ ℝ there is a function f ∈ A such that f(x)=a and f(y)=b.

Then, A is dense in C(X,ℝ) in the topology induced by the norm || · ||_∞ (the uniform topology).

Proof. We need to prove that any function g can be approximated by elements in A. For each two points x,y choose a function f_x,y ∈ A such that f_x,y(x)=g(x) and f_x,y(y)=g(y). Such a function exists by our hypothesis for every pair of points. Now, for an ε>0 the sets

O_x,y={z ∈ X ∣ f_x,y(z) < g(z) + ε}

are open and form a cover of X even if we fix x. This is because O_x,y contains both x and y together with some neighbourhoods of these points. Now find a finite subcover for each fixed x. That is there are finitely many points y₁, …, y_n such that O_{x,y_i} is an open cover. Now define the function

f _x= f_x,y₁ ∧ … ∧ f_{x,y_n}.

By hypothesis f_x is in A for any x ∈ X and it has the property that

f_x(z) < g(z) + ε

but now for all z. Moreover, f_x(x) = g(x). Again, the sets

O_x={z ∈ X ∣ f_x(z) > g(z) − ε}

make an open cover and therefore there is a finite subcover. This means there are finitely many points x₁,…,x_k such that O_{x_i} is an open cover of X. Now the function

f = f_x₁ ∨ … ∨ f_{x_k}

is in A and satisfies

g(x)−ε < f(x) < g(x) + ε

for all x, or in other words || f −g ||_∞ < ε. □

It may be not obvious why conditions of Stone’s theorem 5 are more general than in Thm. 1. This will be seen from the following proof. We employ what Polya called leading particular case [14, § 4.4]—we will show that the particular algebra of polynomials on [0,1] approximate the particular function √x and then reduce the general situation to it.

Proof.[Sketch of proof of the Stone Weierstrass theorem 1.] First we observe that if B is the closure in C(X,ℝ) of A from Thm. 1, then B will also be a unital point separating subalgebra of C(X,ℝ) (exercise!).

Step 1: If f is non-negative and in B, so is √f. To see this note that it is enough to show this for 0≤ f < 1 because in case f ≠0 we can compute √f by

√

|| f||_∞

√

2 || f ||_∞

Now the Taylor series ∑_k=0^∞a_n x^k for √1−x

√

1−x

∞

∑

k=0

a_k x^k = 1−

x−

x²−

x³−

128

x⁴−

256

x⁵−…

converges absolutely and uniformly on any interval [0,1−δ). Therefore, the series

∞

∑

k=0

a_k (1−f−δ)^k

converges in the Banach space B because all the partial sums are actually in B as B is a subalgebra. The limit of this sequence is, of course, √f+δ. If we let δ go to zero, we can see that also √f ∈ B. This works because

√

f(x) + δ

−

√

f(x)

| = δ (

√

f(x) + δ

√

f(x)

)⁻¹ ≤

√

so that the approximation is uniform.

Step 2: Since | f | = √f² we have that f ∈ B implies | f | ∈ B. Since moreover,

f ∧ g =

f + g

−

| f −g |

f ∨ g =

f + g

| f −g |

we conclude from this that B is closed under the operations ∧ and ∨.

Step 3: Assume that x ≠ y are points in X and assume that a,b are real numbers. Then, by assumption there is an element f in B such that

f(x) ≠ f(y).

Since B is a subspace that contains the constant functions, the function

g(z)

= a + (b−a)

f(x)−f(z)

f(x)−f(y)

b f(x)−af(y)

f(x)−f(y)

−

b−a

f(x)−f(y)

f(z)

is also in B and it satisfies g(x)=a and g(y)=b.

Final Step: As we can see all the conditions of Stone’s theorem are satisfied and therefore B is dense in C(X,ℝ). Since B is closed in C(X,ℝ) this means that B=C(X,ℝ). Thus, A is dense in C(X,ℝ). □

We can extend the result from real scalars to complex one through identities ℜ z = 1/2(z +z) and ℑ z = 1/2(z −z).

Corollary 6 [Stone–Weierstrass (complex version)] Suppose that X is a compact metric space and let C(X,ℂ) be the complex Banach space of complex valued continuous functions on X with norm || · ||_∞. Suppose that A ⊂ C(X,ℂ) is a unital *-subalgebra of C(X,ℂ), i.e.

A is a linear subspace,
1 ∈ A,
A · A ⊂ A, or in other words f,g ∈ A implies that also f · g ∈ A.
if f ∈ A the also f ∈ A.

Suppose furthermore that A separates points, i.e. for any two x,y ∈ X with x ≠ y there exists a function f ∈ A such that f(x) ≠ f(y). Then, A is dense in C(X,ℂ).

Using this complex version of the theorem (or simply the Euler identity e^{i φ} = cosφ +i sinφ) we obtain the complex version of Cor. 4:

Corollary 7 The linear span of the set { e^{i m ϕ} ∣ m ∈ ℤ} is dense in C(S¹,ℂ).

Note that we need both positive and negative values of m in e^{i m ϕ}, the set { e^{i m ϕ} ∣ m ∈ ℕ₀} is not dense in C(S¹,ℂ).

The Stone–Weierstrass also says something about the separability of certain Banach spaces. Remember what it means for a topological space to be separable.

Definition 8 (Separable Metric Space) A metric space X is called separable if there exists a countable dense subset of X.

In other words, a separable metric space consists of accumulation points of a single sequence.

Suppose K ⊂ ℝⁿ is compact. Then the space of polynomials with real coefficients and n-variables is dense in the space of continuous functions C(K). Of course every polynomial with real coefficients may be approximated by one with rational coefficients. Thus the set of rational polynomials ℚ[x₁,…,x_n] is dense in C(K). However, the space of rational polynomials is a countable set. In this way one obtains

Corollary 9 Let K be a compact subset of ℝⁿ then the Banach space C(K) is separable.

The following statement shows that continuous functions make only a tiny fraction of all bounded functions:

Exercise 10 Let X be an infinite set, show that the space B(X) of bounded functions on X is not separable. (Hint: present a set of disjoint balls of radius 1/2 parametrised by all real numbers.)

16.2 Contraction mappings and fixed point theorems

16.2.1 The Banach fixed point theorem

An important tool in numerical Analysis, but also in constructions of solutions of differential equations are fixed point approximations. In order to understand this, suppose that (X,d) is a metric space and f: X → X a self-map. Then a point x ∈ X is called fixed point of f if f(x) = x. For example the function cos defines a self-map on the interval [0,1], and by starting with x₁=0 and inductively computing x_n+1 = cosx_n one converges to the value roughly 0.739085 which is a fixed point of cos, i.e. solves the equation cos(x) = x. Under certain conditions one can show that such sequences always converge to a fixed point. This is the statement of the Banach fixed point theorem (contraction mapping principle).

Definition 11 (Contraction Mapping) Let (X,d) be a metric space. Then a map f: X → X is called contraction if there exists a constant C<1 such that

d(f(x),f(y)) ≤ C d(x,y).

Note that any contraction is (uniformly) continuous.

Theorem 12 (Banach Fixed Point Theorem) Suppose that f: X → X is a contraction on a complete metric space (X,d). Then f has a unique fixed point y. Moreover, for any x ∈ X the sequence (x_n) defined recursively by

x_n+1 = f(x_n), x₁ = x,

converges to y.

Proof. Let us start with uniqueness. If x,y are both fixed points in X, then since f is a contraction:

d(x,y) ≤ C d(x,y)

for some constant C<1. Hence, d(x,y)=0 and therefore x=y.

To prove the remaining claims we start with any x in X and we will show that the sequence x_n defined by x₁=x and x_n+1=f(x_n) converges. Since f is continuous the limit of (x_n) must be a fixed point. Since (X,d) is complete we only need to show that (x_n) is Cauchy. To see this note that

d(x_n+1,x_n) ≤ C d(x_n,x_n−1)

and therefore inductively,

d(x_n+1,x_n) ≤ Cⁿ⁻¹ d(x₂,x₁).

By the triangle inequality we have for any n,m>0

d(x_N+m,x_N) ≤ (C^N−1 + C^N + … C^N+m−2) d(x₂,x₁) ≤ C^N−1

1−C

d(x₂,x₁) .

Since C<1 this can be made arbitrarily small by choosing N large enough. □

Corollary 13 Suppose that (X,d) is a complete metric space and f: X → X a map such that fⁿ is a contraction for some n ∈ ℕ. Then f has a unique fixed point.

Proof. Since fⁿ is a contraction it has a unique fixed point x ∈ X, i.e.


f ∘ f … ∘ f
◥	▼	◤

n−times

(x) =x.

Now note that

fⁿ(f(x)) = fⁿ ∘ f (x) = fⁿ⁺¹(x) = f∘ fⁿ(x) = f(fⁿ(x))=f(x)

and therefore f(x) is also a fixed point of fⁿ. By uniqueness we must have f(x)=x. □

The question arises how to show that a given map f is a contraction. In subsets of ℝ^m there is a simple criterion. Recall that an open set U ⊂ ℝ is called convex if for any two points x,y ∈ U the line { t x + (1−t) y ∣ t ∈ [0,1] } is contained in U.

Theorem 14 (Mean Value Inequality) Suppose that U ⊂ ℝ^m is an open set with convex closure U and let f: U → ℝ^m be a C¹-function. Let d f be the total derivative (or Jacobian) understood as a function on U with values in m × m-matrices. Suppose that || df(x) || ≤ M for all x ∈ U. Then f: U → ℝ^m satisfies

|| f(x) −f(y) || ≤ M || x − y ||

for all x,y ∈ U.

Proof. Given x,y ∈ U let γ(t) = t x + (1−t )y. Then d/dt γ(t) = x −y.

f(x) − f(y) =

∫

d t

f(γ(t))   d t =

∫

(d f) ·

d γ

d t

(t)   d t.

Using the triangle inequality (this can be used for Riemann integrals too because these are limits of finite sums), one gets

|| f(x) −f (y) || ≤

∫

|| (d f) ·

d γ

d t

(t) ||   d t ≤ M

∫

|| x − y ||   d t = M || x−y||.

By continuity this inequality extends to U. □

Example 15 Consider the map f: ℝ² ⊃ B₁(0) → B₁(0), (x,y) ↦ (x²/4+y/3+1/3,y²/4−x/2). Then

df =

⎛
⎜
⎜
⎜
⎜
⎜
⎜
⎜
⎝

−

⎞
⎟
⎟
⎟
⎟
⎟
⎟
⎟
⎠

The operator norm || df|| can be estimated by the Hilbert–Schmidt norm. Recall || A ||_HS = (tr(A^* A))^1/2, so we get

|| df|| ≤ || df||_HS = (

(x²+y²) +

)^1/2 <1.

Therefore f is a contraction. We can find the fixed point by starting, for example, with the point (0,0) and iterating. We get iterations:

	(0,0), (0.333333, 0.), (0.361111, −0.166667),
	(0.310378, −0.173611), (0.299547, −0.147654),
	(0.306547, −0.144323), (0.308719, −0.148066),
	(0.307805, −0.148878), (0.307393, −0.148361),
	(0.307502, −0.148194), (0.307575, −0.148261),
	(0.307564, −0.148292), (0.307551, −0.148284),
	(0.307552, −0.148279), (0.307554, −0.148279).

Example 16 Put a map of the country of your current presence on the floor, there’s a point on the map that is touching the actual point it refers to!

16.2.2 Applications of fixed point theory: The Picard-Lindelöf Theorem

Let f: K → ℝ be a function on a compact rectangle of the form K=[T₁,T₂] × [L₁,L₂] in ℝ². Consider the initial value problem (IVP)

= f(t,y), y(t₀) = y₀, (101)

where y: [T₁,T₂] → ℝ, t ↦ y(t) is a function. The function f and the initial value y₀ ∈ [L₁,L₂], and t₀ ∈ [T₁,T₂] are given and we are looking for a function y satisfying the above equations.

Figure 19: Vector fields and their integral curves from Ex. 17–20.

Example 17 Let f(t,x)=x and y₀ =1, t₀=0. Then the initial value problem is

= y, y(0) =1.

We know from other courses that there is a unique solution y(t) = e^t, see Fig. 19 top-left.

Example 18 Let f(t,x)=x² and y₀ =1, t₀=0. Then the initial value problem is

= y², y(0) =1.

We know from other courses that there is a unique solution y(t) = 1/1−t which exists only on the interval (−∞,1), see Fig. 19 top-right.

Example 19 Let f(t,x)=x²−t and y₀ =1, t₀=0. Then the initial value problem is

= y²−t, y(0) =1.

One can show that there exists a solution for small |t|, however this solution cannot be expressed in terms of elementary functions, see Fig. 19 bottom-left.

Example 20 Let f(t,x)=x^2/3 and y₀ =0, t₀=0. Then the initial value problem is

= y

, y(0) =0.

It has at least two solutions, namely y=0 and y=t³/27, see Fig. 19 bottom-right.

Hence, there are two fundamental questions here: existence and uniqueness of solutions. The following theorem is one of the basic results in the theorem of ordinary differential equation and establishes existence and uniqueness under rather general assumptions.

Theorem 21 (Picard–Lindelöf theorem) Suppose that f: [T₁,T₂] × [y₀−C,y₀+C] → ℝ is a continuous function such that for some M>0 we have

|f(t,y₁) − f(t,y₂)| ≤ M | y₁−y₂| (Lipschitz condition)

for all t ∈ [T₁,T₂], y₁,y₂ ∈ [y₀−C,y₀+C]. Then, for any t₀ ∈ [T₁,T₂] the initial value problem

(t) = f(t,y(t)), y(t₀) = y₀,

has a unique solution y in C¹[a,b], where [a,b] is the interval [t₀−R, t₀+R] ∩ [T₁,T₂], where

R= ||f||_∞⁻¹ C.

(The solution exists for all times t such that |t−t₀| ≤ R).

Remark 22 Note, that the Lipschitz condition implies uniform continuity and is significantly stronger requirement.

Proof. Using the Fundamental Theorem of Calculus we can write the IVP as a fixed point equation F(y) = y for a map defined by

F(y) (t) = y₀ +

∫

t₀

f(s,y(s))   d s.

This is a map that will send a continuous function y ∈ C[T₁,T₂] to a continuous function F(y) ∈ C[T₁,T₂]. As a metric space we take

X = C([a,b], [y₀ −C, y₀ +C])

that is, the set of continuous functions on [a,b] taking values in the interval [y₀ −C, y₀ +C]. This is a closed (why?) subset of the Banach space C[a,b] and is therefore a complete metric space.

First, we show that F: X → X, i.e. F maps X to itself. Indeed,

| F(y)(t) − y₀ | =

⎪
⎪
⎪
⎪

∫

t₀

f(s,y(s))   d s

⎪
⎪
⎪
⎪

≤ R || f ||_∞≤ C.

Next, we show that F^N is a contraction for N large enough and thus establish the existence of a unique fixed point. It is the place to use the Lipschitz condition. Observe that for two functions y, y′ ∈ X we have

| F(y)(t) − F(y′)(t) |

⎪
⎪
⎪
⎪

∫

t₀

f(s,y(s)) − f(s,y′(s))   d s

⎪
⎪
⎪
⎪

≤

∫

t₀

| f(s,y(s)) − f(s,y′(s)) |   d s ≤ |t−t₀| M || y − y′ ||_∞.

(102)

We did not assume that (t−t₀) M ≤ RM <1, so F will in general not be a contraction. There are several ways to resolve this situations. For example, we can argue in either of the following two manners:

We use both the result and the method from (102) to compute distances for higher powers of F, starting from the squares:

| F²(y)(t) − F²(y′)(t) |

≤

∫

t₀

| f(s,F(y)(s)) − f(s,F(y′)(s)) |  d s

≤

∫

t₀

|s−t₀| · M · || F(y) − F(y′)||_∞  d s

≤

∫

t₀

|s−t₀| · M² · || y − y′ ||_∞  d s

|t−t₀|²

M² || y − y′ ||_∞,

and iterating this gives for any natural N:

|| F^N(y) − F^N(y′) ||_∞≤

|t−t₀|^N

N !

M^N || y − y′ ||_∞.

Since the factorial will overgrow the respective power, for N large enough, F^N is a contraction and we deduce the existence of a unique solution from Cor. 13. This solution is in C¹ since it can be written as the integral of a continuous function.

The inequality (102) shows existence and uniqueness of a solution only in the space of functions C([t₀−r,t₀+r], [y₀−C,y₀+C]) where r < M⁻¹ and therefore |t−t₀| M< 1 in (102). Now suppose we have two solutions y and y′. They coincide at t₀. Application of (102) to other initial points where the solutions coincide shows that the set E={ x ∈ [a,b] ∣ y(x) = y′(x)} is open. It is also the pre-image of the closed set {0} under the continuous map y−y′. So we have that E is a closed and open subset of [a,b] that is non-empty. It must therefore be [a,b]. Hence, we get y = y′, establishing uniqueness in the whole C[a,b].

□

Note that this not only gives uniqueness and existence, but also gives a constructive method to compute the solution by iterating the map F starting for example with the constant function y(t)=y₀. The iteration

y_n+1(t) = y₀ +

∫

t₀

f(s,y_n(s))   d s

is called Picard iteration. It will converge to the solution uniformly. See Fig. 20 for an illustration of few first iterations for the exponent functions.

Figure 20: Few initial Picard iterations for the differential equation y′=y: constant f₀, linear f₁, quadratic f₂, etc.

Remark 23 The proof also gives a bound on the solution, namely if the assumptions are satisfied one gets | y(t) − y₀ | ≤ C for t ∈ [a,b].

Remark 24 The proof works in the same way if y takes values in ℝ^m and therefore f : ℝ × ℝ^m ⊃ [T₁,T₂] × B_C(0) → ℝ^m. In fact, the target space may even be a Banach space (the derivative for Banach space-valued functions appropriately defined). Higher order differential equations may be written as systems of first order equations and hence the theorem applies to these as well. For example y″(t) + y(t) =0, y(0) = 1, y′(0) = 0 can be written as

⎛
⎜
⎝

⎞
⎟
⎠

⎛
⎜
⎝

−y

⎞
⎟
⎠

⎛
⎜
⎝

⎞
⎟
⎠

(0) =

⎛
⎜
⎝

⎞
⎟
⎠

So here the function f is f(t,(x₁,x₂)) = (x₂, −x₁).

Example 25 Consider the IVP

= y² t +1, y(0) = 1.

Hence, f(t,x) = x²t +1. If we take f to be defined on the square [−T,T] × [1−C,1+C] then we obtain ||f||_∞= (1+C)² T +1 (the value at the top-right corner). In this case the solution will exist up to time

min

⎧
⎪
⎨
⎪
⎩

(1+C)² T +1

⎫
⎪
⎬
⎪
⎭

If we choose, for example C=2 and T=1/2 we get that a unique solution exists up to time | t | ≤ 4/11. This solution will then satisfy | y(t) −1 | ≤ 2 for | t | ≤ 4/11.

In fact one can show that the solution can be expressed in a complicated way in terms of the Airy-Bi-function and it blows up at t=1.

16.2.3 Applications of fixed point theory: Inverse and Implicit Function Theorems

It is an easy exercise in Analysis to show that if a function f ∈ C¹[a,b] has nowhere vanishing derivative, then f is invertible on its image. To be more precise, f⁻¹: Im(f) → [a,b] exists and has derivative (f′(x))⁻¹ at the point y=f(x). In higher dimensions a statement like this can not be correct as the following counterexample shows. Let 0<a<b and define

	f: [a,b] × ℝ → ℝ²,
	(r,θ) ↦ (r cosθ, r sinθ).

This maps has invertible derivative

f′(r,θ) =

⎛
⎜
⎝

cosθ	−r sinθ
sinθ	r cosθ.

⎞
⎟
⎠

, detf′(r,θ) = r >0.

at any point, the map is however not injective, see Fig. 21 for a cartoon illustration of the difference between one- and two-dimensional cases. However, for any point we can restrict domain and co-domain, so that the restriction of the function is invertible. In such a case we say that f is locally invertible. This concept will be explained in more detail below.

Figure 21: Flat and spiral staircases: can we return to the same value going just in one way?

Definition 26 (Local Invertibility) Suppose U₁, U₂ ⊂ ℝ^m are open subsets of ℝ^m. Then a map f: U₁ → U₂ is called locally invertible at x ∈ U₁ if there exists an open neighbourhood U of x such that f |_U : U → f(U) is invertible. The function f is said to be locally invertible it it is locally invertible at x for any x ∈ U₁.

Often, say for differential equations, we need a map which preserves differentiability of functions in both directions.

Definition 27 (Diffeomorphism) Suppose U₁, U₂ ⊂ ℝ^m are open subsets of ℝ^m. Then a map f: U₁ → U₂ is called C^k-diffeomorphism if f ∈ C^k(U₁,U₂) and if there exists a g ∈ C^k(U₂,U₁) such that

f ∘ g = 1_U₂, g ∘ f = 1_U₁,

where 1_U₁ and 1_U₂ are the identity maps on U₁ and U₂ respectively.

There is also a local version of the above definition.

Definition 28 (Local Diffeomorphism) Suppose U₁, U₂ ⊂ ℝ^m are open subsets of ℝ^m. Then a map f: U₁ → U₂ is called a local-C^k- diffeomorphism at x ∈ U₁ if there exists an open neighbourhood U of x such that f |_U: U → f(U) is a C^k-diffeomorphism. It is called a local-C^k- diffeomorphism if it is a local diffeomorphism at any point x ∈ U₁.

Not every invertible C^k-map is a diffeomorphism. An example is the function f(x) = x³ whose inverse g(x) = x^1/3 fails to be differentiable.

Theorem 29 (Inverse Function Theorem) Let U ⊂ ℝ^m be an open subset and suppose that f ∈ C^k(U,ℝ^m) such that f′(x) is invertible at every point x ∈ U. Then f is a local C^k-diffeomorphism.

Before we can prove this theorem we need a Lemma, which basically says that under the assumptions of the inverse function theorem an inverse function must be in C¹. That is, differentiability is the leading particular case [14, § 4.4] for the general case of k-differentiable functions.

Lemma 30 Suppose that f ∈ C¹(U₁,U₂) is bijective with continuous inverse. Assume that the derivative of f is invertible at any point, then f is a C¹-diffeomorphism, and g′(f(x)) = (f′(x))⁻¹.

Proof. Denote the inverse of f by g: U₂ → U₁. The continuity of f and g imply that x_n → x₀ if and only if f(x_n) → f(x₀). We will show that g is differentiable at the point y₀ = f(x₀). If y=f(x) is very close to y₀ (so that the line interval between x and x₀ is contained in U₁) then, by the MVT, cf. Thm. 14, there exists a ξ on this line such that y−y₀ = f(x) − f(x₀) = f′(ξ) · (x−x₀). Therefore, g(y)−g(y₀) = (f′(ξ))⁻¹ · (y−y₀). If y tends to y₀, then ξ will tend to x₀, and therefore, by continuity of f′ the value of (f′(ξ))⁻¹ will tend to (f′(x₀))⁻¹. Thus, the partial derivatives of g exist and are continuous, so g ∈ C¹. Note that we have used here that matrix inversion is continuous. □

Now we can proceed with the general situation.

Proof.[Proof of the Inverse Function Theorem 29] Let x₀ ∈ U and let y₀=f(x₀). We need to show that there exists an open neighborhood U₁ of f(x₀) such that f: f⁻¹(U₁) → U₁ is a C^k-diffeomorphism. As a first step we construct a continuous inverse. Since f′(x₀)=A is an invertible m × m-matrix we can change coordinates x = A⁻¹ y + x₀, so that we can assume without loss of generality that f′(x₀)= 1 and x₀=0. Replacing f by f−y₀ we also assume w.l.o.g. that y₀=0. Since f′(x) is continuous there exists an ε>0 such that || f′(x) − 1 || ≤ 1/2 for all x ∈ B_ε(0). This ε>0 can also be chosen such that B_ε(0) ⊂ U. Thus, || x−f(x) || ≤ 1/2|| x|| for all x ∈ B_ε(0) by MVT and for each y ∈ B_ε/2(0) the map

x ↦ x + y − f(x)

is a contraction on B_ε(0). Indeed, by MVT again:

||x + y − f(x) − (x′ + y − f(x′))||

= ||x − f(x) − (x′ − f(x′))||

= ||(f′(ξ) − 1) (x−x′)||

≤

||x−x′||,

(103)

where ||·|| is the norm of vectors in ℝ^m. Consider the complete metric space X=C(B_ε/2(0),B_ε(0)) and define the map

F: X → X, u ↦ F(u), F(u)(y) = u(y) + y − f(u(y)).

By the above this map is well defined and it also is a contraction

|| F(u)(y) −F(v)(y) ||

= || u(y) − f(u(y)) −

⎛
⎝

v(y) −f(v(y))

⎞
⎠

≤

|| u(y) − v(y) ||

[by (103)]

≤

|| u − v ||_∞.

Hence, there exists a unique fixed point g. This fixed point yields a continuous inverse g of f|_U defined on U =B_ε/2(0) ∩ f⁻¹(B_ε/2(0)). By the previous Lemma this implies that g is differentiable. Now simply note that g′ = (f′)⁻¹ ∘ g. Since matrix inversion is smooth and f′ is in C^k−1 this implies that for m ≤ k−1 we get the conclusion (g ∈ C^m) (g ∈ C^m+1). Hence, g is in C^k. □

The implicit function theorem is actually a rather simple consequence of the inverse function theorem. It gives a nice criterion for local solvability of equations in many variables.

Theorem 31 (Implicit Function Theorem) Let U₁ ⊂ ℝⁿ × ℝ^m and U₂ ⊂ ℝ^m be open subsets and let

F: U₁ → U₂, (x₁, …, x_n, y₁,…,y_m) ↦ F(x₁, …, x_n, y₁,…,y_m)

be a C^k-map. Suppose that F(x₀,y₀)=0 for some point (x₀,y₀) ∈ U₁ and that the m × m-matrix ∂_y F(x₀,y₀) is invertible. Then there exists an neighborhood U of (x₀,y₀) ∈ ℝⁿ × ℝ^m, an open neighborhood V of x₀ in ℝⁿ, and a C^k-function f: V → ℝ^m such that

{ (x,y) ∈ U ∣ F(x,y) =0 } = { (x , f(x)) ∈ U ∣ x ∈ V }.

The function f has derivative

f′(x₀)=−(∂_y F(x₀,y₀))⁻¹ ∂_x F(x₀,y₀)

at x₀.

Proof. This is proved by reducing it to the inverse function theorem. Just design the map

G : U₁ → ℝⁿ × ℝ^m, (x,y) ↦ (x, F(x,y))

and then note that

G′(x₀,y₀) =

⎛
⎜
⎝

1	0
∂_x F(x₀,y₀)	∂_y F(x₀,y₀)

⎞
⎟
⎠

is invertible with inverse

(G′(x₀,y₀))⁻¹ =

⎛
⎜
⎝

1	0
−(∂_y F(x₀,y₀))⁻¹ ∂_x F(x₀,y₀)	(∂_y F(x₀,y₀))⁻¹

⎞
⎟
⎠

By the inverse function theorem there exists a local inverse G⁻¹: U₃ → U₄, where U₃ is an open neighborhood of 0 and U₄ an open neighborhood of (x₀,y₀). Now define f by (x,f(x)) = G⁻¹(x,0). □

Example 32 Consider the system of equations

	x₁² + x₂² + y₁² + y₂² = 2,
	x₁ + x₂³ + y₁ + y₂³ =2.

We would like to know if this system implicitly determines functions y₁(x₁,x₂) and y₂(x₁,x₂) near the point (0,0,1,1), which solves the equation. For this one simply applies the implicit function theorem to

F(x₁,x₂,y₁,y₂) = ( x₁² + x₂² + y₁² + y₂² − 2, x₁ + x₂³ + y₁ + y₂³ − 2).

The derivatives are

∂_xF =

⎛
⎜
⎝

2 x₁	2 x₂
1	3 x₂²

⎞
⎟
⎠

, ∂_yF =

⎛
⎜
⎝

2 y₁	2 y₂
1	3 y₂²

⎞
⎟
⎠

The values of these derivatives at the point (0,0,1,1) are

∂_xF(0,0,1,1) =

⎛
⎜
⎝

0	0
1	0

⎞
⎟
⎠

, ∂_yF(0,0,1,1) =

⎛
⎜
⎝

2	2
1	3

⎞
⎟
⎠

The latter matrix is invertible and one computes

−(∂_y F(x₀,y₀))⁻¹ ∂_x F(x₀,y₀)(0,0,1,1) =

⎛
⎜
⎝

1/2	0
−1/2	0

⎞
⎟
⎠

We conclude that there is an implicitly defined function (y₁,y₂)= f(x₁,x₂) whose derivative at (0,0) is given by

⎛
⎜
⎝

1/2	0
−1/2	0

⎞
⎟
⎠

The geometric meaning is that near the point (0,0,1,1) the system defines a two-dimensional manifold that is locally given by the graph of a function. Its tangent plane is spanned by the vectors (1/2,0,1,0) and (−1/2,0,0,1).

Example 33 Consider the system of equations

	x² + y² + z² = 1,
	x + y z + z³ =1.

This is the intersection of a sphere (drawn in light green on Figure 22) with some cubic surface defined by the second equation (drawn in light blue). The point (0,0,1) solves the equation and is pictured as a little orange dot. By the implicit function theorem the intersection is a smooth curve (drawn in red) near this point which can be parametrised by x coordinate. Indeed, we can express y and z along the curve as functions of x because the resulting matrix

∂_(y,z)F(0,1)=

⎛
⎜
⎝

2y	2z
z	y+3z²

⎞
⎟
⎠

⎪
⎪
⎪

y=0,z=1

⎛
⎜
⎝

0	2
1	3

⎞
⎟
⎠

is invertible.

Figure 22: Example of the implicit theorem: the intersection (red) of the unit sphere (green) and a cubic surface (blue).

Exercise 34 Fig. 22 suggests that the intersection curve can be alternatively parametrised by the coordinates y and cannot by z (why?). Check these claims by verifying conditions of Thm. 31.

16.3 The Baire Category Theorem and Applications

We are going to see another example of an abstract result which has several non-trivial consequences for real analysis.

16.3.1 The Baire’s Categories

Let us first prove the following result and then discuss its meaning and name. We may recognise some techniques similar to proof of Thm. 68.

Theorem 35 (Baire’s category theorem) Let (X,d) be a complete metric space and U_n a sequence of open dense sets. Then the intersection S=∩_n U_n is dense.

Proof. The proof is rather straightforward. We need to show that any ball B_ε(x₀) contains an element of S. Let us therefore fix x₀ and ε>0. Since U₁ is dense the intersection of B_ε(x₀) with U₁ is non-trivial. Thus there exists a point x₁ ∈ B_ε(x₀) ∩ U₁. Now choose ε₁ < ε/2 so that B_ε₁(x₁) ⊂ B_ε(x) ∩ U₁ (note the closure of the ball). Since U₂ is dense, the intersection B_ε₁(x₁) ∩ U₂ ⊂ B_ε(x₀) ∩ U₁ ∩ U₂ is non-empty. Choose a point x₂ and ε₂ < ε₁ /2 such that B_ε₂(x₂) ⊂ B_ε₁(x₁) ∩ U₂ ⊂ B_ε(x₀) ∩ U₁ ∩ U₂. Continue inductively, to obtain a sequence x_n such that

B_{ε_n}(x_n)

⊂ B_{ε_n−1}(x_n−1) ⋂ U_n ⊂ B_ε(x₀) ⋂ U₁ ⋂ U₂ ⋂ … ⋂ U_n,

and ε_n < 2⁻ⁿ ε. In particular, for any n>N we have

x_n ∈ B_2^−Nε(x_N),

which implies that x_n is a Cauchy sequence. Hence x_n has a limit x, by completeness of (X,d). Consequently, x is contained in the closed ball B_{ε_N}(x_N) for any N, and therefore it is contained in B_ε(x₀) ∩ (∩_n U_n), as claimed. □

Completeness is essential here. For example, the conclusion does not hold for the metric space ℚ: take bijection ψ: ℕ → ℚ, and consider the open dense sets

U_n = { ψ(1), ψ(1), …, ψ(n)}^c = {ψ(n+1), ψ(n+2),… }.

The intersection ∩_n U_n is empty.

The following historic terminology, due to Baire, is in use.

Definition 36 (Baire’s categories) A subset Y of a metric space X is called

nowhere dense if the interior of Y is empty;
of first category if there is a sequence (Y_k) of nowhere dense sets with Y = ∪_k Y_k;
of second category if it is not of first category.

Example of nowhere dense sets are ℤ ⊂ ℝ, the circle in ℝ², or the set { 1/n ∣ n ∈ ℕ } ⊂ ℝ. Note that the complement of the closure of a nowhere dense set is a dense open set.

Corollary 37 In a complete metric space the complement of a set of the first category is dense.

Proof. Follows from relations for complements

Y^c = (⋃_k Y_k)^c = ⋂_k Y_k^c ⊃ ⋂_k

Y_k

and the fact that Y_k^c is dense. □

The following corollary is also called Baire’s category theorem in some sources:

Corollary 38 A complete metric space is of second category in itself, or plainly speaking it is never the union of a countable number of nowhere dense sets.

The theorem is often used to show abstract existence results. Here is an example.

Theorem 39 There exists a function f ∈ C[0,1] that is nowhere differentiable.

Proof. For each n ∈ ℕ define

U_n =

⎧
⎪
⎨
⎪
⎩

f ∈ C[0,1] s.t. sup

⎧
⎪
⎨
⎪
⎩

⎪
⎪
⎪
⎪

f(x+h)−f(x)

⎪
⎪
⎪
⎪

over 0 < |h| ≤

⎫
⎪
⎬
⎪
⎭

> n, ∀ x ∈ [0,1]

⎫
⎪
⎬
⎪
⎭

We will show that the U_n are open and dense. By the Category theorem their intersection is also dense.

U_n is open: Let f ∈ U_n. For each x ∈ [0,1] choose δ_x>0 such that

sup

⎧
⎪
⎨
⎪
⎩

⎪
⎪
⎪
⎪

f(x+h)−f(x)

⎪
⎪
⎪
⎪

over 0 < |h| ≤

⎫
⎪
⎬
⎪
⎭

> n + δ_x,

hence there is a h_x < 1/n with

⎪
⎪
⎪
⎪

f(x+h_x)−f(x)

h_x

⎪
⎪
⎪
⎪

> n + δ_x.

By continuity of f there is an open neighborhood I_x of x such that

⎪
⎪
⎪
⎪

f(y+h_x)−f(y)

h_x

⎪
⎪
⎪
⎪

> n + δ_x.

for all y ∈ I_x. These I_x form an open cover. We choose a finite subcover (I_{x_k})_k=1,…,N. Let δ= min{δ_x₁, …, δ_{x_N}} > 0 . Then, for y ∈ I_{x_k}:

⎪
⎪
⎪
⎪

f(y+h_{x_k})−f(y)

h_{x_k}

⎪
⎪
⎪
⎪

> n + δ.

Now let g ∈ B_ε(f), where ε>0 is chosen so that ε < 1/2 δ h_{x_k} for all k. Then by an ε/3-style argument:

⎪
⎪
⎪
⎪

g(y+h_{x_k}) − g(y)

h_{x_k}

⎪
⎪
⎪
⎪

≥

⎪
⎪
⎪
⎪

f(y+h_{x_k}) − f(y)

h_{x_k}

⎪
⎪
⎪
⎪

− 2

|| f−g||_∞

h_{x_k}

> n + δ − 2 ε h_{x_k}⁻¹ >n,

and therefore g∈ U_n. We conclude that U_n is open.

U_n is dense: For each ε>0 and f ∈ C[0,1] choose a polynomial p such that || f − p || < ε/2 and a sequence of continuous function g_m ∈ C[0,1] such that || g ||_∞< ε/2 and such that for all x ∈ [0,1]:

sup

⎧
⎪
⎨
⎪
⎩

g_m(x+h)−g_m(x)

over 0 < |h| ≤

⎫
⎪
⎬
⎪
⎭

> m

by using a “zigzag” function. Then, for large enough m we have p+g_m ∈ U_n.
(Exercise: why had we approximated f by a polynomial p and just did not make the same claim about f+g_m itself?) □

The above proof actually shows much more, namely that the set of nowhere differentiable functions is dense in C[0,1]. It is also useful to compare it with the construction of the continuous nowhere differentiable Weierstrass function and identify some common elements.

16.3.2 Banach–Steinhaus Uniform Boundedness Principle

Another consequence of the Baire Category theorem is the Banach–Steinhaus uniform boundedness principle. Recall that, if X and Y are normed spaces, T: X → Y is called a bounded operator if it is a bounded linear map.

Theorem 40 (Banach–Steinhaus Uniform Boundedness Principle) Let X be a Banach space and Y a normed space, and let (T_α)_{α ∈ I} be a family of bounded operators T_α: X → Y. Suppose that

∀ x ∈ X:

sup

|| T_αx || < ∞.

Then we have sup_α || T_α|| < ∞, i.e. the family T_α is bounded in the set B(X,Y) of bounded operators from X to Y.

Proof. Define X_n = {x ∈ X ∣ sup_α || T_αx || ≤ n }. By assumption X = ∪_n X_n. Note that all the X_n are closed. By the Baire category theorem at least one of these sets must have non-empty interior, since otherwise the Banach space X would be a countable union of nowhere dense sets. Hence, there exists N ∈ ℕ, y ∈ X_N, and ε>0 such that B_ε(y) ∈ X_N. Note that X_N is symmetric under reflections x↦ −x and convex. So we get the same statement for −y. Hence, x ∈ B_ε(0) implies

x =

⎛
⎝

(x + y) + (x−y)

⎞
⎠

∈

⎛
⎝

X_N + X_N

⎞
⎠

⊂ X_N. (104)

This means that || x || ≤ ε implies || T_αx || ≤ N, and therefore || T_α|| ≤ ε⁻¹ N for all α ∈ I. □

Recall that the Fourier series of a C¹-function on a circle (identified with 2 π-periodic functions) converges uniformly to the function. We will now show that a statement like that can not hold for all continuous functions.

Corollary 41 There exist continuous periodic functions whose Fourier series do not converge point-wise.

Proof. We will show that there exists a continuous function whose Fourier series does not converge at x=0. Suppose by contradiction such functions would not exist, so we would have point-wise convergence of the Fourier series

a₀ +

∞

∑

m =1

⎛
⎝

a_m cos(m x) + b_m sin(m x)

⎞
⎠

for every f ∈ C(S¹) = C_per(ℝ). Here we identify continuous functions on the unit circle with continuous 2 π-periodic functions C_per(ℝ). Hence we have a map

T_n : C(S¹) → ℝ: f ↦

a₀ +

∑

m =1

a_m

by mapping the function f to the n-th partial sum of its Fourier series at x=0. This is a family of bounded operators T_n: C(S¹) → ℝ and by assumption we have for every f that

sup

| T_n(f) | < ∞.

By Banach–Steinhaus theorem we have sup_n || T_n || = sup_{n, || f ||_∞=1} | T_n(f) | < ∞. Now one computes the norm of the map

T_n : C(S¹) → ℝ: f ↦

∫

−π

f(x)

⎛
⎜
⎜
⎝

∑

k=1

cos(k x)

⎞
⎟
⎟
⎠

  d x =

2 π

∫

−π

f(x) D_n(x)   d x

where

D_n(x) =

sin

⎛
⎜
⎜
⎝

(n+

) x

⎞
⎟
⎟
⎠

sin

⎛
⎜
⎜
⎝

⎞
⎟
⎟
⎠

is the Dirichlet kernel , cf. Lem. 6. This norm equals 1/2 π ∫_−π^π| D_n(x) |   d x = 1/2 π ∫₀^{2 π} | D_n(x) |   d x (Exercise) which goes to ∞ as n → ∞. Indeed, using sin(x/2) ≤ x/2 and substituting we get

2 π

∫

| D_n(x) |   d x

≥

2π

∫

|sin((n+

) x)|

x/2

  d x

[since sins ≤ s]

(2n+1)π

∫

|sin(t)|

  d t

[change of variables t=(n+

) x]

≥

∑

k=0

(k+1) π

∫

k π

|sint|

  d t

[split integral into intervals]

≥

⎪
⎪
⎪
⎪

∑

k=0

∫

sint

(k+1)

  d t

⎪
⎪
⎪
⎪

[since t ≤ k+1 for t∈ (k,k+1) ]

= 2

∑

k=0

k+1

[evaluating the integral],

which is the harmonic series divergent as n → ∞. This gives a contradiction. □

Another corollary of the Banach–Steinhaus principle is an important continuity statement. Recall that of X and Y are normed spaces them so is the Cartesian product X × Y equipped with the norm || (x,y) || = ( || x ||_X² + || y ||_Y² )^1/2. It is easy to see that a sequence (x_n,y_n) converges to (x,y) in this norm if and only if x_n → x and y_n → y.

Theorem 42 Suppose that X, Y are Banach spaces and suppose that B: X × Y → ℝ is a bilinear form on X × Y that is separately continuous, i.e. B(·, y) is continuous on X for every y ∈ Y and B(x,·) is continuous on Y for every x ∈ X. Then B is continuous.

Proof. Suppose that (x_n,y_n) is a sequence that converges to (x,y). First note that

B(x_n−x,y_n−y)= B(x_n,y_n) − B(x_n,y) − B(x,y_n) + B(x,y),

where B(x_n,y) → B(x,y) as well as B(x,y_n) → B(x,y). So it is sufficient to show that B(x_n−x,y_n−y) → 0 or, equivalently, B(x′_n,y′_n) → 0 for any x′_n→ 0 and y′_n→ 0. Now, the linear mappings T_n(x)= B(x,y′_n): X → ℝ are bounded, by assumption. Since ||y′_n||→ 0 the sequence T_n(x)→ 0 and is bounded for every x∈ X. Then, by the Banach–Steinhaus theorem there exists a constant C such that ||T_n|| ≤ C for all n. That is |T_n(x)| = B(x, y′_n) ≤ C ||x|| for all n and x∈ X. Therefore, |B(x′_n,y′_n)| ≤ C ||x′_n|| → 0. □

Remark 43 Recall that already on ℝ² separate continuity does not imply joint continuity for any function. The standard example from Analysis is the function

f(x,y) =

⎧
⎪
⎪
⎨
⎪
⎪
⎩

x y

x² + y²

if (x,y) ≠ (0,0);

if (x,y) = 0,

which is continuous in x or y separately but is not jointly continuous.

16.3.3 The open mapping theorem

Recall that for a continuous map the pre-image of any open set is open. This does of course not mean that the image of any open set is open (for example, sin: ℝ → ℝ has image [−1,1], which is not open). A map f: X → Y between metric space is called open if the image of every open set is open. If a map is invertible then it is open if and only if its inverse is continuous. We start with a simple observation for linear maps. We will denote open balls in normed spaces X and Y by B_r^X(x) and B_s^Y(y) respectively, or simply B_r^X and B_s^Y if they are centred at the origin.

Lemma 44 Let X and Y be normed spaces. Then a linear map T: X → Y is open if and only if there exists ε>0 such that B_ε^Y(0) ⊂ T (B₁^X(0)), i.e. the image of the unit ball contains a zero’s neighbourhood.

Proof. If the map T is open it clearly has this property. Suppose conversely, that B_ε^Y(0) ⊂ T (B₁^X(0)) for some ε>0. Then, by scaling, B_{ε δ}^Y(0) ⊂ T (B_δ^X(0)) for any δ>0. Suppose that U is open. Suppose that y ∈ T(U), that is there exists x∈ U such that y=T(x). Then there exists δ>0 with x + B_δ^X(0) ⊂ U and therefore

TU ⊃ T B_δ^X(x) = { T x } + T B_δ^X(0) ⊃ { y } + B_δ ε^Y(0) =B_δ ε^Y(y) .

□

Theorem 45 (Open Mapping Theorem) Let T : X → Y be a continuous surjective linear operator between Banach spaces. Then T is open.

Proof. Since T is surjective we have Y = ∪_n T B_n^X. Therefore trivially, Y = ∪_n T B_n^X. By the Baire category theorem one of the T B_n^X must have an interior point. Rescaling implies that T B₁^X has an interior point y₀.

Since T B₁^X is symmetric under reflection y→ −y, the point −y₀ must also be an interior point. Therefore, by convexity of T B₁^X there exists a δ>0 with B_δ^Y ⊂ T B₁^X, cf. (104). By linearity this means B_{δ 2⁻ⁿ}^Y ⊂ T B_2⁻ⁿ^X for any natural n.

We will show that T B₁^X ⊂ T B₂^X, with the implication from above that B_δ^Y⊂ T B₂^X, which will complete the proof by the previous Lemma. So, let y ∈ TB₁^X be arbitrary. Then, there exists x₁ ∈ B₁^X such that y − T x₁ ∈ B_δ/2^Y ⊂ TB_1/2^X. Repeating this, there exists x₂ ∈ B_1/2^X such that y − T x₁ − T x₂ ∈ B_δ/4^Y and ||x₂||≤ 1/2.

Continuing inductively, we obtain a sequence (x_n) with the property that || x_n || < 2⁻ⁿ⁺¹ and

y −

∑

k=1

Tx_n ∈ B_{δ 2⁻ⁿ⁺¹}^Y. (105)

By completeness of X, the absolute convergent series ∑x_n converges to an element x∈ X of norm || x|| < 2. By linearity and continuity of T we get from (105) that y = T x. Thus y∈ TB₂. □

If the map T is also injective (and, therefore, bijective with the inverse T⁻¹) we can quickly conclude continuity of T⁻¹.

Corollary 46 Suppose that T: X → Y is a bijective bounded linear map between Banach spaces. Then T has a bounded inverse T⁻¹.

It is not rare that we may have two different norms ||·|| and ||·||_* on the same Banach space X. We say that ||·|| and ||·||_* are equivalent if there are constants c>0 and C>0 such that:

c ||x|| ≤ ||x||_* ≤ C ||x|| for all x ∈ X. (106)

Exercise 47

Check that (106) defines an equivalence relations on the set of all norms on X.
If a sequence is Cauchy/convergent/bounded in a norm then it is also Cauchy/convergent/bounded in any equivalent norm.

The Cor. 46 implies that if the identity map (X,||·||)→ (X,||·||_*) is bounded then both norms are equivalent.

Corollary 48 Let (X,||·||) be a Banach space and ||·||_* be a norm on X in which X is complete. If ||·|| ≤ C ||·||_* for some C>0 the norms are equivalent.

In particular, any two norm on a finite dimensional vector space are equivalent.

16.3.4 The closed graph theorem

Suppose that X, Y are Banach spaces and suppose that D ⊂ X is a linear subspace (not necessarily closed). Now suppose that T : D → Y is a linear operator. Then the graph gr(T) is defined as the subset {(x,Tx) ∣ x ∈ D} ⊂ X × Y. This is a linear subspace in the Banach space X × Y, which can be equipped with the norm ||(x,y)||² = ||x||_X² + || y||_Y². One often uses the equivalent norm ||(x,y)|| = ||x||_X + || y||_Y but the first choice makes sure that the product X × Y is also a Hilbert space if X and Y are Hilbert spaces. We will refer to T as an operator from X to Y with domain D.

Definition 49 The operator T is called closed if and only if its graph is a closed subset of X × Y.

It is easy to see that T is closed if an only if x_n → x and T x_n → y imply that T x_n → T x. Note the difference with continuity of T!!!

If T is an operator T : D → Y, where D is a subspace of X, then its graph is a subset of X × Y. If we close this subset the resulting set may fail to be the graph of an operator. If the closure is the graph as well, we say that T is closable and its closure is the operator whose graph is obtained by closing the graph of T.

Differential operators are often closed but not bounded. Let L²[a,b] be the Hilbert space of functions such that (C[a,b],|| ·||₂) is its dense subspace, cf. Prop. 60. Then D=C¹[a,b] is a dense subspace in L²[a,b] and the operator d/dx: C¹[a,b] → L²[a,b] is of the above type. This operator is not closed, however it is closable and its closure therefore defines a closed operator with dense domain. We have already seen that this operator is unbounded and therefore it cannot be continuous.

Of course, the map D → (x,Tx) is a bijection from D to gr(T). We can use the norm on gr(T) to define a norm on D, which is then

|| x ||_D =

⎛
⎝

|| x ||_X² + || T x ||_Y²

⎞
⎠

Obviously, T is closed if and only of D with norm ||·||_D is a Banach space. We are now ready to state the closed graph theorem. It is easy to check that T continuously maps (D, || · ||_D) to Y.

Theorem 50 (Closed Graph Theorem) Suppose that X and Y are Banach spaces and suppose that T: X → Y is closed. Then T is bounded.

Proof. Since in this case we have D=X with have two norms ||·||_X and || · ||_D on X that are both complete. Clearly,

||·||_X ≤ ||·||_D,

and by Cor. 48 the norms are therefore equivalent. Hence,

|| T x ||_Y ≤ || x ||_D ≤ C || x ||_X

for some constant C>0. □

16.4 Semi-norms and locally convex topological vector spaces

Definition 51 (Semi-Norm) Let X be a vector space, then a map p: X → ℝ is called semi-norm if

p(x) ≥ 0 for all x ∈ X,
p(λ x) = |λ| p(x), for all λ ∈ ℝ, x ∈ X,
p(x+y) ≤ p(x) + p(y), for all x,y ∈ X.

An example of a semi-norm on C¹[0,1] is p(f):=|| f ′ ||_∞. If (p_α)_α is a family of semi-norms with the property that

( ∀ α ∈ I, p_α(x) =0 ) x=0

then we say X with that family is a locally convex topological vector space. There is a topology (that is, a description of all open sets) on such a vector space, by declaring a subset U ⊂ X to be open if and only if for every point x ∈ U and any index α ∈ I there exists ε>0 such that { y ∣ p_α(y−x) < ε } ⊂ U. The notion of convergence one gets is x_n → x if and only of p_α(x_n −x) → 0 for all α. The topology of point-wise convergence on the space of functions S → ℝ is for example of this type, with the family of semi-norms given by (p_x)_{s x ∈ S}, p_x(f) = | f(x) |.

Another example is the vector space C^∞(ℝ^m) with the topology of uniform convergence of all derivatives on compact sets. Here the family of semi-norms p_α,K is indexed by all multi-indices α ∈ ℕ₀^m and all compact subsets K ⊂ ℝ and is given by

p_α, K(f) =

sup

x ∈ K

| ∂^αf(x) |.

If the family of semi-norms is countable then this topology is actually coming from a metric (so the space is a metric space)

d(x,y) =

∞

∑

k=1

2^k

p_k(x−y)

1+p_k(x−y)

Such a metric space is called Frechet space. Note that C^∞(ℝ^m) is a Frechet space because the family of semi-norms above can be replaced by a countable one by taking a countable exhaustion of ℝ^m by compact subsets.

A Tutorial Problems

These are tutorial problems intended for self-assessment of the course understanding.

A.1 Tutorial problems I

All spaces are complex, unless otherwise specified.

1 Show that ||f||=|f(0)|+sup|f′(t)| defines a norm on C¹[0,1], which is the space of (real) functions on [0,1] with continuous derivative.

2 Show that the formula ⟨ (x_n),(y_n)⟩ =∑_n=1^∞x_ny_n/n² defines an inner product on l_∞, the space of bounded (complex) sequences. What norm does it produce?

3 Use the Cauchy–Schwarz inequality for a suitable inner product to prove that for all f ∈ C[0,1] the inequality

⎪
⎪
⎪
⎪

∫

f(x)x d x

⎪
⎪
⎪
⎪

≤ C

⎛
⎜
⎜
⎝

∫

|f(x)|²  d x

⎞
⎟
⎟
⎠

1/2

holds for some constant C>0 (independent of f) and find the smallest possible C that holds for all functions f (hint: consider the cases of equality).

4 We define the following norm on l_∞, the space of bounded complex sequences:

||(x_n)||_∞=

sup

n ≥ 1

|x_n|.

Show that this norm makes l_∞ into a Banach space (i.e., a complete normed space).

5 Fix a vector (w₁,…,w_n) whose components are strictly positive real numbers, and define an inner product on ℂⁿ by

⟨ x,y ⟩ =

∑

k=1

w_k x_k

_k.

Show that this makes ℂⁿ into a Hilbert space (i.e., a complete inner-product space).

A.2 Tutorial problems II

6 Show that the supremum norm on C[0,1] isn’t given by an inner product, by finding a counterexample to the parallelogram law.

7 In l₂ let e₁=(1,0,0,…), e₂=(0,1,0,0,…), e₃=(0,0,1,0,0,…), and so on. Show that Lin (e₁,e₂,…)=c₀₀, and that CLin (e₁,e₂,…)=l₂. What is CLin (e₂,e₃,…)?

8 Let C[−1,1] have the standard L₂ inner product, defined by

⟨ f, g⟩ =

∫

−1

f(t)

g(t)

 d t.

Show that the functions 1, t and t²−1/3 form an orthogonal (not orthonormal!) basis for the subspace P₂ of polynomials of degree at most 2 and hence calculate the best L₂-approximation of the function t⁴ by polynomials in P₂.

9 Define an inner product on C[0,1] by

⟨ f,g⟩=

∫

√

  f(t)  

g(t)

 d t.

Use the Gram–Schmidt process to find the first 2 terms of an orthonormal sequence formed by orthonormalising the sequence 1, t, t², ….

10 Consider the plane P in ℂ⁴ (usual inner product) spanned by the vectors (1,1,0,0) and (1,0,0,−1). Find orthonormal bases for P and P^⊥, and verify directly that (P^⊥)^⊥=P.

A.3 Tutorial Problems III

11 Let a and b be arbitrary real numbers with a < b. By using the fact that the functions 1/√2πe^inx, n ∈ ℤ, are orthonormal in L₂[0,2π], together with the change of variable x=2π(t−a)/(b−a), find an orthonormal basis in L₂[a,b] of the form e_n(t)=α e^{i n λ t}, n ∈ ℤ, for suitable real constants α and λ.

12 For which real values of α is

∞

∑

n=1

n^αe^int

the Fourier series of a function in L₂[−π,π]?

13 Calculate the Fourier series of f(t)=e^t on [−π,π] and use Parseval’s identity to deduce that

∞

∑

n=−∞

n²+1

tanhπ

14 Using the fact that (e_n) is a complete orthonormal system in L₂[−π,π], where e_n(t)=exp(int)/√2π, show that e₀,s₁,c₁,s₂,c₂,… is a complete orthonormal system, where s_n(t)=sinnt/√π and c_n(t)= cosnt/√π. Show that every L₂[−π,π] function f has a Fourier series

a₀+

∞

∑

n=1

a_ncosnt + b_n sinnt,

converging in the L₂ sense, and give a formula for the coefficients.

15 Let C(T) be the space of continuous (complex) functions on the circle T={ z ∈ ℂ: |z|=1 } with the supremum norm. Show that, for any polynomial f(z) in C(T)

∫

|z|=1

f(z)  d z=0.

Deduce that the function f(z)=z is not the uniform limit of polynomials on the circle (i.e., Weierstrass’s approximation theorem doesn’t hold in this form).

A.4 Tutorial Problems IV

16 Define a linear functional on C[0,1] (continuous functions on [0,1]) by α(f)=f(1/2). Show that α is bounded if we give C[0,1] the supremum norm. Show that α is not bounded if we use the L₂ norm, because we can find a sequence (f_n) of continuous functions on [0,1] such that ||f_n||₂ ≤ 1, but f_n(1/2) → ∞.

17 The Hardy space H₂ is the Hilbert space of all power series f(z)=∑_n=0^∞a_n zⁿ, such that ∑_n=0^∞|a_n|² < ∞, where the inner product is given by

⟨
⟨
⟨
⟨

∞

∑

n=0

a_nzⁿ,

∞

∑

n=0

b_nzⁿ

⟩
⟩
⟩
⟩

∞

∑

n=0

a_n

b_n

Show that the sequence 1, z, z², z³, … is an orthonormal basis for H₂.

Fix w with |w|<1 and define a linear functional on H₂ by α(f)=f(w). Write down a formula for the function g(z) ∈ H₂ such that α(f)=⟨ f, g ⟩. What is ||α||?

18 The Volterra operator V: L₂[0,1] → L₂[0,1] is defined by

(Vf)(x)=

∫

f(t)  d t.

Use the Cauchy–Schwarz inequality to show that |(Vf)(x)| ≤ √x||f||₂ (hint: write (Vf)(x)=⟨ f, J_x⟩ where J_x is a function that you can write down explicitly).

Deduce that ||Vf||₂² ≤ 1/ 2||f||₂², and hence ||V|| ≤ 1/√2.

19 Find the adjoints of the following operators:

A:l₂ → l₂, defined by A(x₁,x₂,…)=(0,x₁ / 1, x₂/ 2, x₃/ 3, …);
and, on a general Hilbert space H:
The rank-one operator R, defined by Rx=⟨ x,y ⟩ z, where y and z are fixed elements of H;
The projection operator P_M, defined by P_M(m+n)=m, where m ∈ M and n ∈ M^⊥, and H=M ⊕ M^⊥ as usual.

20 Let U ∈ B(H) be a unitary operator. Show that (Ue_n) is an orthonormal basis of H whenever (e_n) is.

Let l₂(ℤ) denote the Hilbert space of two-sided sequences (a_n)_n=−∞^∞ with

||(a_n)||²=

∞

∑

n=−∞

|a_n|² < ∞.

Show that the bilateral right shift , V:l₂(ℤ) → l₂(ℤ) defined by V((a_n))=(b_n), where b_n=a_n−1 for all n∈ ℤ, is unitary, whereas the usual right shift S on l₂=l₂(ℕ) is not unitary.

A.5 Tutorial Problems V

21 Let f∈ C[−π,π] and let M_f be the multiplication operator on L₂(−π,π), given by (M_fg)(t)=f(t) g(t), for g ∈ L₂(−π,π). Find a function f′ ∈ C[−π,π] such that M_f^*=M_f′.

Show that M_f is always a normal operator. When is it Hermitian? When is it unitary?

22 Let T be any operator such that Tⁿ=0 for some integer n (such operators are called nilpotent ). Show that I−T is invertible (hint: consider I+T+T²+…+Tⁿ⁻¹). Deduce that I−T/λ is invertible for any λ ≠ 0.

What is σ(T)? What is r(T)?

23 Let (λ_n) be a fixed bounded sequence of complex numbers, and define an operator on l₂ by T((x_n))=((y_n)), where y_n=λ_nx_n for each n. Recall that T is a bounded operator and ||T||=||(λ_n)||_∞. Let Λ={λ₁,λ₂,…}. Prove the following:

Each λ_k is an eigenvalue of T, and hence is in σ(T).
If λ ∉Λ, then the inverse of T−λ I exists (and is bounded).

Deduce that σ(T)=Λ. Note, that then any non-empty compact set could be a spectrum of some bounden operator.

24 Let S be an isomorphism between Hilbert spaces H and K, that is, S: H → K is a linear bijection such that S and S⁻¹ are bounded operators. Suppose that T ∈ B(H). Show that T and STS⁻¹ have the same spectrum and the same eigenvalues (if any).

25 Define an operator U: l₂(ℤ) → L₂(−π,π) by U((a_n))=∑_n=−∞^∞a_n e^int/√2π. Show that U is a bijection and an isometry, i.e., that ||Ux||=||x|| for all x ∈ l₂(ℤ).

Let V be the bilateral right shift on l₂(ℤ), the unitary operator defined on Question 20. Let f ∈ L₂(−π,π). Show that (UVU⁻¹f)(t)=e^itf(t), and hence, using Question 24, show that σ(V)=T, the unit circle, but that V has no eigenvalues.

A.6 Tutorial Problems VI

26 Show that K(X) is a closed linear subspace of B(X), and that AT and TA are compact whenever T ∈ K(X) and A ∈ B(X). (This means that K(X) is a closed ideal of B(X).)

27 Let A be a Hilbert–Schmidt operator, and let (e_n)_{n≥ 1} and (f_m)_{m≥ 1} be orthonormal bases of A. By writing each Ae_n as Ae_n=∑_m=1^∞⟨ Ae_n, f_m ⟩ f_m, show that

∞

∑

n=1

||Ae_n||²=

∞

∑

m=1

||A^*f_m||².

Deduce that the quantity ||A||_HS²=∑_n=1^∞||Ae_n||² is independent of the choice of orthonormal basis, and that ||A||_HS=||A^*||_HS. (||A||_HS is called the Hilbert–Schmidt norm of A.)

Let T∈ K(H) be a compact operator. Using Question 26, show that T^*T and TT^* are compact Hermitian operators.
Let (e_n)_{n≥ 1} and (f_n)_{n ≥ 1} be orthonormal bases of a Hilbert space H, let (α_n)_{n ≥ 1} be any bounded complex sequence, and let T ∈ B(H) be an operator defined by
Tx=
∞

∑

n=1

α_n ⟨ x, e_n ⟩ f_n.

Prove that T is Hilbert–Schmidt precisely when (α_n) ∈ l₂. Show that T is a compact operator if and only if α_n → 0, and in this case write down spectral decompositions for the compact Hermitian operators T^*T and TT^*.

29 Solve the Fredholm integral equation φ−λ Tφ=f, where f(x)=x and

(Tφ)(x)=

∫

xy² φ(y)  d y (φ ∈ L₂(0,1)),

for small values of λ by means of the Neumann series.

For what values of λ does the series converge? Write down a solution which is valid for all λ apart from one exception. What is the exception?

30 Suppose that h is a 2π-periodic L₂(−π,π) function with Fourier series ∑_n=−∞^∞a_n e^int. Show that each of the functions φ_k(y)=e^iky, k ∈ ℤ, is an eigenvector of the integral operator T on L₂(−π,π) defined by

(Tφ)(x)=

∫

−π

h(x−y) φ(y)  d y,

and calculate the corresponding eigenvalues.

Now let h(t)=−log(2(1−cost)). Assuming, without proof, that h(t) has the Fourier series ∑_{n ∈ ℤ, n ≠ 0} e^int/|n|, use the Hilbert–Schmidt method to solve the Fredholm equation φ−λ Tφ=f, where f(t) has Fourier series ∑_n=−∞^∞c_n e^int and 1/λ ∉σ(T).

A.7 Tutorial Problems VII

31 Use the Gram–Schmidt algorithm to find an orthonormal basis for the subspace X of L₂(−1,1) spanned by the functions t, t² and t⁴.

Hence find the best L₂(−1,1) approximation of the constant function f(t)=1 by functions from X.

32 For n=1,2,… let φ_n denote the linear functional on l₂ defined by

φ_n(x)=x₁+x₂+…+x_n,

where x=(x₁,x₂,…) ∈ l₂. Use the Riesz–Fréchet theorem to calculate ||φ_n||.

33 Let T be a bounded linear operator on a Hilbert space, and suppose that T=A+iB, where A and B are self-adjoint operators. Express T^* in terms of A and B, and hence solve for A and B in terms of T and T^*.

Deduce that every operator T can be written T=A+iB, where A and B are self-adjoint, in a unique way.

Show that T is normal if and only if AB=BA.

34 Let P_n be the subspace of L₂(−π,π) consisting of all polynomials of degree at most n, and let T_n be the subspace consisting of all trigonometric polynomials of the form f(t)=∑_k=−nⁿ a_k e^ikt. Calculate the spectrum of the differentiation operator D, defined by (Df)(t)=f′(t), when

D is regarded as an operator on P_n, and
D is regarded as an operator on T_n.

Note that both P_n and T_n are finite-dimensional Hilbert spaces.

Show that T_n has an orthonormal basis of eigenvectors of D, whereas P_n does not.

35 Use the Neumann series to solve the Volterra integral equation φ−λ Tφ=f in L₂[0,1], where λ∈ ℂ, f(t)= 1 for all t, and (Tφ)(x)=∫₀^x t²φ(t)  d t. (You should be able to sum the infinite series.)

B Solutions of Tutorial Problems

0=0Solutions of the tutorial problems will be distributed due in time on the paper.

1<0

B.1 Solution of Tuitorial Problem I

1 Clearly the norm is non-negative. If ||f||=0, then f is constantly zero (since then f(0)=0 and f′(t)=0 everywhere). Also

\|\|λ f \|\|	=	\|λ f(0)\|+ sup\|λ f′(t)\|
	=	\|λ\| \|f(0)\|+\|λ\|sup\| f′(t)\| =\|λ\| \|\|f\|\|,

and

\|\|f+g\|\|	=	\|f(0)+g(0)\| +sup\|f′(t)+g′(t)\|
	≤	\|f(0)\|+\|g(0)\|+sup\|f′(t)\|+sup\|g′(t)\|
	=	\|\|f\|\|+\|\|g\|\|.

2 Clearly the sum is absolutely convergent (since ∑1/n² < ∞), and checking the conditions:

⟨ y,x ⟩ = ⟨ x,y⟩, ⟨λ x, y⟩ = λ ⟨ x,y ⟩, ⟨ x+y, z ⟩=⟨ x,z⟩ + ⟨ y,z⟩, and

⟨ x,x ⟩ > 0 except when x=0, when ⟨ x,x⟩ =0 ,

is fairly straightforward algebra.

Also ||(x_n)|| = ⟨(x_n),(x_n)⟩^1/2 = (∑_n=1^∞|x_n|²/n²)^1/2.

3 With the usual inner product ⟨ f,g⟩=∫f ḡ d x, we have that |⟨ f,g⟩ | ≤ ||f|| ||g||, where g is the function g(x)=x. Now ||g||=1/√3 and this is the smallest possible constant C, since we do have ⟨ g,g⟩ =||g|| ||g||.

4 I’ll omit the proof that this is a norm (but if it gives any trouble, ask me). The proof of completeness is a bit like the l₂ proof, only simpler. Suppose that (x⁽ⁿ⁾) is a Cauchy sequence in l_∞. Then, for each coordinate k, |x_k⁽ⁿ⁾−x_k^(m)| ≤ ||x⁽ⁿ⁾ − x^(m)||, and so (x_k⁽ⁿ⁾) is a Cauchy sequence of complex numbers, converging to x_k, say. Also |x_k⁽ⁿ⁾−x_k^(m)| < є for n and m greater than or equal to N_є, say. Letting m→∞ we get that |x_k⁽ⁿ⁾−x_k| ≤ є for n ≥ N_є, so (x⁽ⁿ⁾−x) ∈ l_∞ and hence x ∈ l_∞. Also ||x⁽ⁿ⁾−x|| ≤ є for n ≥ N_є, and so x⁽ⁿ⁾ → x.

5 Clearly ⟨ y, x⟩=∑_k=1ⁿ w_k y_k x_k = ⟨ x,y⟩, and the properties ⟨ x+y, z⟩=⟨ x,z⟩ + ⟨ y, z⟩ and ⟨ λ x, y⟩ =λ ⟨ x, y⟩ are also straightforward to check. The norm produced is

||x||=⟨ x,x⟩^1/2=

⎛
⎜
⎜
⎝

∑

k=1

w_k |x_k|²

⎞
⎟
⎟
⎠

1/2

which is strictly positive unless each x_k is zero. For the completeness, note that a Cauchy sequence (x^(m)) has the property that, for each є>0 there is a number M_є with ||x^(m)−x^(p)||<є for m, p ≥ M_є. That is,

∑

k=1

w_k |x_k^(m)−x_k^(p)|²<є². (107)

Thus w_k|x_k^(m)−x_k^(p)|² < є², which is enough to show that in the kth coordinate we have a Cauchy sequence of complex numbers.

Define a vector x ∈ ℂⁿ by x_k=lim_{m → ∞} x_k^(m) for each k. Now we have x^(m) → x, since

∑

k=1

w_k |x_k^(m)−x_k|²≤ є²,

for m ≥ M_є, as we see on letting p → ∞ in (107).

2<0

B.2 Solutions of Tutorial Problems II

6 Lin(e₁, e₂, …) = c₀₀ because a sequence is a finite linear combination of the e_i if and only if it has finitely many nonzero terms. Taking the closure we get all of l₂ since anything in l₂ is the limit of a sequence in c₀₀. This is so, because

||(x₁, x₂, …) − (x₁, x₂, …, x_N, 0, 0, …)||² =

∞

∑

n=N+1

|x_n|²,

which tends to zero as N → ∞. Finally CLin(e₂,e₃,…) is the same except that we only get sequences whose first term is zero, i.e. we get {x=(x_n) ∈ l₂: x₁ = 0}.

7 Calculate ⟨ 1,t⟩, ⟨ 1,t²−1/3⟩ and ⟨ t,t²−1/3⟩: they are all zero. Clearly the set is a basis for P₂. Normalise the functions, to get e₁(t)=1/√2, e₂(t)=t√3/2 and e₃(t)=√45/8(t²−1/3), an orthonormal sequence. It now follows that, writing f(t)=t⁴, the best approximation is

g(t)	=⟨	f,e₁⟩ e₁+⟨ f,e₂⟩ e₂+⟨ f,e₃⟩ e₃
	=	(2/5)(1/2)+(0)t+(45/8)(16/105)(t²−1/3) = (−3/35)+(6/7)t².

As a cross-check, note that f−g is indeed orthogonal to g.

8 ⟨ 1,1⟩ =2/3, so e₁(t)=√3/2. Now form f(t)=t−⟨ t,e₁⟩ e₁=t−⟨ t,1⟩ (3/2)=t−3/5. Since ⟨ f,f⟩ =8/175 we take e₂(t)=√175/8(t−3/5).

9 The Gram–Schmidt process gives e₁=(1,1,0,0)/√2, then y₂=(1,0,0,−1)−(1,1,0,0)/2=(1/2,−1/2,0,−1), and then e₂=y₂/||y₂||=(1,−1,0,−2)/√6.

The plane P^⊥ consists of all vectors orthogonal to (1,1,0,0) and (1,0,0,−1), and hence is

{(x,y,z,w)∈ ℂ⁴: x+y=0, x−w=0},

with general solution (a,−a,b,a), and basis (1,−1,0,1) and (0,0,1,0), which are already orthogonal.

We can thus take e₃=(1,−1,0,1)/√3 and e₄=(0,0,1,0) as a basis for P^⊥.

Finally, (P^⊥)^⊥ consists of all vectors orthogonal to (1,−1,0,1) and (0,0,1,0), namely

{(x,y,z,w)∈ ℂ⁴: x−y+w=0, z=0},

to which the general solution is (−a+b,b,0,a), with basis (−1,0,0,1) and (1,1,0,0). We are clearly back at P.

3<0

B.3 Solutions of Tutorial Problems III

10 The given change of variables

2π(t−a)

b−a

or t=a+

(b−a)x

2π

takes t∈ [a,b] to x ∈ [0,2π]. We know that

2π

∫

2π

e^inxe^−imx  d x=

⎧
⎨
⎩

1	if n=m,
0	if n ≠ m,

so we obtain

2π

∫

e^{in λ (t−a)} e^{−im λ (t−a)}

2 π

b−a

 d t=

⎧
⎨
⎩

1	if n=m,
0	if n ≠ m,

where λ = 2π/(b−a). Hence the functions e_n(t)=1 / √b−a e^{inλ t} form an orthonormal set in L₂[a,b]. They are even an orthonormal basis, since the same coordinate change shows that their closed linear span contains all f ∈ C[a,b] such that f(a)=f(b).

11 By the Riesz–Fischer theorem, ∑c_ne_n converges to an L₂ function iff ∑|c_n|² < ∞. Here c_n=n^α√2π so we require ∑n^2α < ∞, i.e. α<−1/2.

12 We calculate

⟨ f,e_n⟩ =

∫

−π

e^t e^−int d t/

√

2π

= (−1)ⁿ

e^π−e^−π

(1−in)

√

2π

Also, by Parseval’s identity

∞

∑

n=−∞

|⟨ f,e_n⟩ |² = ||f||₂² =

∫

−π

e^2t d t = (e^2π − e^−2π)/2.

That is,

∞

∑

n=−∞

(e^π−e^−π)²

(1+n²)2π

(e^2π−e^−2π)

Thus

∞

∑

n=−∞

1+n²

= π

e^2π − e^−2π

(e^π−e^−π)²

which gives the result.

13 To check that the sequence is orthonormal it is probably easiest to write

s_n=

√

(e_n−e_−n)/2i and c_n=

√

(e_n+e_−n)/2,

and use the orthonormality of (e_n) to calculate ⟨ s_n, s_m ⟩, ⟨ s_n, c_m ⟩ and ⟨ c_n, c_m ⟩.

Since exp(± int)=cos(nt)± isin(nt) and also cos(nt)=(exp(int)−exp(−int))/2 and sin(nt)=(exp(int)−exp(−int))/2i, the linear spans of (e_n) and {e₀,s₁,c₁,s₂,c₂,…} are the same. Hence their closed linear spans are the same, so {e₀,s₁,c₁,s₂,c₂,…} also forms an o.n.b.

As we have an orthonormal basis we have an expansion

f=⟨ f,e₀⟩ e₀ +

∞

∑

⟨ f,c_n⟩ c_n +

∞

∑

⟨ f,s_n⟩ s_n

converging in L₂. Hence

a₀=(1/2π)

∫

−π

f(t)  d t, a_n = (1/π)

∫

−π

f(t) cosnt  d t and b_n = (1/π)

∫

−π

f(t) sinnt  d t.

14 Complex analysis method: Clearly there is a polynomial g such that f(z)=g′(z). By the fundamental theorem of the calculus, ∫g′ = 0 because the contour is closed. Or you could use Cauchy’s theorem.

Direct method:

∫

2π

f(e^iθ) ie^iθ d θ =

∑

k=0

a_k

∫

2π

ie^i(k+1)θ d θ=0,

where f(z)=∑_k=0ⁿ a_kz^k.

Now

∫

|z|=1

z  d z =

∫

|z|=1

(1/z)  d z =

∫

2π

e^−iθ ie^iθ d θ= 2π i.

(Again, there are several ways of doing this, as above.) But if f_n → f uniformly then ∫f_n → ∫f, which is impossible if f_n are polynomials and f(z)=z.

4<0

B.4 Solutions of Tutorial Problems IV

15 Since |α(f)|=|f(1/2)|≤ sup_[0,1]|f(x)|=||f||_∞, we have that ||α|| ≤ 1 (actually it equals 1) when we use the supremum norm.

Suppose now we define f_n (starting at n=2, say) to be zero except on

[1/2−1/n, 1/2+1/n],

piecewise linear on [1/2−1/n, 1/2] and [1/2, 1/2+1/n], with f_n(1/2)=A_n, where A_n is a positive number that we’ll choose in a minute. (The graph is going to be a thin steep triangle.) Now ||f_n||₂² ≤ (2/n) × A_n², since f_n is zero except on a set of length 2/n and always at most A_n (we could work it out exactly, but why bother?) So if we choose A_n=√n/2 we get ||f_n||₂ ≤ 1, and α(f_n)=A_n so α(f_n) → ∞, which means that α is unbounded in the L₂ norm.

16 Clearly we get orthonormality—just compute ⟨ z^k, z^l ⟩. The fact that it is an orthonormal basis (i.e., complete) follows since the only function orthogonal to every z^k has all its coefficients zero, so is the 0 function.

Now α(∑a_nzⁿ)=∑a_n wⁿ, and this is ∑a_n b_n only if we take b_n=wⁿ for each n. Hence

g(z)=

∞

∑

n=0

ⁿ zⁿ = 1 /(1−

z).

Finally ||α||=||g||, so compute the H₂ norm of g to get

⎛
⎜
⎜
⎝

∞

∑

n=0

|²ⁿ

⎞
⎟
⎟
⎠

1/2

⎛
⎜
⎜
⎝

1−|w|²

⎞
⎟
⎟
⎠

1/2

17 The function J_x must be χ_[0,x], where

χ_[0,x](t)=

⎧
⎨
⎩

1	on [0,x],
0	elsewhere.

It has L₂ norm equal to the square root of ∫₀^x 1²  d t, i.e., √x. Hence

|(Vf)(x)| ≤ ||χ_[0,x]||₂   ||f||₂ =

√

  ||f||₂.

Now integrate

∫

|(Vf)(x)|²  d x ≤

∫

x ||f||₂²   dx =

||f||₂², as required.

18 The strategy here is to solve the equation ⟨ Ax, y⟩ = ⟨ x, A^* y⟩, etc.

(i) ⟨ Ax, y ⟩=x₁y₂/1+x₂y₃/2+…. This must be ⟨ x, A^* y⟩, and so

A^* y=(y₂/1, y₃/2, y₄/3, …).

(ii) ⟨ Rx, u ⟩ = ⟨ x,y ⟩  ⟨ z, u⟩ = ⟨ x, R^* u ⟩, where R^* u = ⟨ z, u⟩ y = ⟨ u, z ⟩ y.

(iii) Let’s take two vectors m+n and m′+n′, with m, m′∈ M and n, n′∈ M^⊥. Then

⟨ P_M(m+n), m′+n′ ⟩ = ⟨ m, m′+n′⟩= ⟨ m,m′⟩.

This is the inner product ⟨ m+n,m′⟩ so P_M^*(m′+n′)=m′, which means that P_M^*=P_M again.

19 Clearly

⟨ Ue_n, Ue_m⟩=

⎧
⎨
⎩

1	if m=n,
0	if m ≠ n,

since U preserves the inner product (see notes). Also, if ⟨ x,Ue_n ⟩=0 for all n, then ⟨ U^*x, e_n ⟩=0 for all n, so U^*x=0, because (e_n) is an o.n.b., and so x=0. Hence (Ue_n) is an o.n.b.

To show that the bilateral right shift V is unitary, you can check any of the equivalent definitions. It is perhaps easiest just to observe that V is clearly a surjection and that ||Vx||=||x|| for all x. Alternatively, you can check that V^* is the bilateral left shift, i.e., V^*=V⁻¹, by using the identity

⟨
⟨
⟨
⟨

∞

∑

n=−∞

x_ne_n , V^*

∞

∑

m=−∞

y_me_m

⟩
⟩
⟩
⟩

⟨
⟨
⟨
⟨

∞

∑

n=−∞

x_ne_n ,

∞

∑

m=−∞

y_me_m

⟩
⟩
⟩
⟩

⟨
⟨
⟨
⟨

∞

∑

n=−∞

x_ne_n+1 ,

∞

∑

m=−∞

y_me_m

⟩
⟩
⟩
⟩

∞

∑

n=−∞

x_n

y_n+1

which tells you that

V^*

∞

∑

m=−∞

y_me_m =

∞

∑

m=−∞

y_m+1e_m.

We saw in the lectures that S is not unitary, since SS^* ≠ I.

5<0

B.5 Solutions of Tutorial Problems V

20 Use the definition of adjoint:

⟨ M_f g, h ⟩ = ⟨ g, M_f^* h ⟩

for g, h ∈ L₂(−π,π), which means that

⟨ g, M_f^* h ⟩ =

∫

−π

f(t) g(t)

h(t)

 d t.

This is the inner product between g and the function taking values f(t)h(t), so that M_f^* g = M_f′g, where f′(t)=f(t).

Clearly M_f M_f^* g = M_f^* M_f g, and is the function whose value at t is f(t) f(t) g(t).

Now M_f is Hermitian if and only if M_f=M_f^*, or f=f′. So f must be real-valued.

Also M_f is unitary if and only if M_f^*=(M_f)⁻¹, which means that f(t)f(t)=1 for all t, i.e. |f(t)|=1 for all t.

21 Calculate: (I+T+…+Tⁿ⁻¹)(I−T)=(I−T)(I+T+…+Tⁿ⁻¹)=I−Tⁿ=I. So we have an inverse for I−T.

Of course T/λ is also nilpotent, so I−T/λ is invertible, and so (multiplying by λ, which is nonzero), we have λ I−T invertible, and λ ∉σ(T).

The spectrum is nonempty, so can only be {0}; indeed it’s obvious that T is not invertible when Tⁿ=0. Hence r(T)=0 as well.

22 (i) Since Te_k=λ_k e_k, where (e_n) is the usual orthonormal basis of l₂, we see that λ_k is an eigenvalue, with eigenvector e_k. Eigenvalues are always in the spectrum.

(ii) T−λ I takes (x_n) to ((λ_n−λ)x_n), and so its inverse must take (y_n) to (y_n/(λ_n−λ)). This is a bounded operator, since the sequence (λ_n−λ)⁻¹ is bounded when λ ∉Λ.

Now σ(T) is a closed set. It contains Λ, so contains Λ. Indeed σ(T)=Λ, as it contains no points outside Λ, by (ii). Thus any nonempty compact set is the spectrum of some operator!

23 Note that STS⁻¹−λ I = S(T−λ I)S⁻¹, and so STS⁻¹−λ I is invertible if and only if T−λ I is invertible—indeed in that case its inverse would be S(T−λ I)⁻¹S⁻¹. Hence σ(T)=σ(STS⁻¹).

Also, if Tu=λ u, then STS⁻¹ (Su)=STu=λ Su, and Su ≠ 0 if u ≠ 0. Thus any eigenvector u of T corresponds to an eigenvector Su of STS⁻¹, and vice-versa.

24 The fact that U is a bijection and an isometry follows from the fact that the functions e_n(t)=e^int/√2π, n ∈ ℤ form an orthonormal basis of L₂(−π,π) (see notes), so that a function f is in L₂(−π,π) if and only if f(t)=∑_−∞^∞a_n e_n, where (a_n) ∈ l₂, and also ||f||₂=||(a_n)||₂ (Parseval).

Now UVU⁻¹f= UVU⁻¹ ∑_n=−∞^∞a_n e_n = ∑_n=−∞^∞a_n e_n+1, because V is the shift.

But if f(t)=∑_n=−∞^∞a_n e_n(t), then the function ∑_n=−∞^∞a_n e_n+1(t) is just f(t)e^it, since e_n(t)e^it=e_n+1(t) for all t.

Now we work with the operator T=UVU⁻¹ = M_e on L₂(−π,π), where e(t)=e^it. Using Question 21, we see that this is a unitary operator and so σ(T) ⊆ T, but we can argue more directly.

The operator (T−λ I) is multiplication by e^it−λ and its inverse, if it exists, is multiplication by h_λ(t)=1/(e^it−λ). For λ ∉T, h_λ∈ C[−π,π] and so T−λ I has a bounded inverse. However, if λ ∈ T, then multiplication by h_λ does not give a bounded operator (indeed, M_{h_λ} e₀=h_λ/√2π, which is not even in L₂). Hence σ(V)=σ(T)=T.

Also T has no eigenvalues, as, no matter which λ ∈ ℂ we choose, there will be no nonzero function f such that f(t)e^it=λ f(t) for all t. Hence V has no eigenvalues either, by Question 24.

6<0

B.6 Solutions of Tutorial Problems VI

25 If T₁ and T₂ are compact, and (x_n) is bounded, then we can find a subsequence (x_n(k)) of (x_n) such that (T₁x_n(k)) converges, and a further subsequence (x_n(k(l))) such that both (T₁x_n(k(l))) and (T₂x_n(k(l))) converge. Then ((a₁T₁+a₂T₂)x_n(k(l))) converges for any a₁, a₂ ∈ ℂ, so a₁T₁+a₂T₂ is compact. Since the norm limit of compact operators is compact, they form a closed subspace.

Given (x_n) bounded, we can find a subsequence (x_n(k)) such that (Tx_n(k)) converges, and hence so does (ATx_n(k)), since A is continuous; hence AT is compact. Also (Ax_n) is bounded so there is a subsequence of (TAx_n) that converges, and TA is compact.

26 ∑_n=1^∞||Ae_n||² = ∑_n=1^∞∑_m=1^∞|⟨ Ae_n, f_m ⟩|², since (f_m) is an o.n.b. This equals

∞

∑

n=1

∞

∑

m=1

|⟨ e_n, A^*f_m ⟩|²,

or, summing over n first, ∑_m=1^∞||A^*f_m||², since (e_n) is also an o.n.b. Since the right hand side of the displayed formula doesn’t mention (e_n) it clearly makes no difference if we replace (e_n) by a different o.n.b. It is also clear that ||A||_HS=||A^*||_HS as the LHS is just ||A||_HS² and the RHS is ||A^*||_HS².

27 (a) ⟨ T^*Tx,y⟩=⟨ Tx, Ty⟩=⟨ x,T^*T y⟩, so T^*T is Hermitian.

Also ⟨ TT^*x,y⟩=⟨ T^*x,T^*y⟩ = ⟨ x,T^**T^*y⟩=⟨ x,TT^*y⟩, since T=T^**, and hence TT^* is Hermitian. Both are compact, since the product of a compact operator and a bounded operator is always compact (by Question 1).

(b) The point is that Te_n=α_nf_n, and so ∑||Te_n||² = ∑|α_n|²< ∞ if and only if (α_n) ∈ l₂.

If α_n → 0, then T is the limit of finite rank operators T_mx=∑_n=1^m α_n ⟨ x,e_n⟩ f_n (cf. what we proved in the course about diagonal operators), and if α_n ¬→0, then, for some δ>0, ||Te_n(k)||=|α_n(k)| ≥ δ, and (Te_n(k)) has no convergent subsequence—again, see how we did this for diagonal operators.

⟨ Te_n, f_m ⟩ = ⟨ e_n, T^* f_m ⟩=

⎧
⎨
⎩

α_n	if n=m,
0	otherwise.

Hence T^* maps f_m to α_me_m, so T^*x=∑_m=1^∞α_m⟨ x,f_m⟩ e_m.

This gives

T^*Tx=

∞

∑

n=1

α_n ⟨ x,e_n⟩ T^*f_n=

∞

∑

n=1

|α_n|² ⟨ x,e_n⟩ e_n,

and

TT^*x=

∞

∑

m=1

α_m

⟨ x,f_m⟩ Te_m=

∞

∑

m=1

|α_m|²⟨ x,f_m⟩ f_m.

28 The Neumann series is

(I−λ T)⁻¹=1+λ T + λ T² + …,

valid for sufficiently small λ (e.g. |λ|  ||T|| < 1).

Taking f(x)=x, we find that (Tf)(x)=∫₀¹ xy³   dy=x/4, and in general (Tⁿf)(x)=x/4ⁿ.

The solution we obtain is φ=(1−λ T)⁻¹f, which gives

φ(x)=x+λ x/4 + λ² x²/16 + …,

which converges to

φ(x)=x/(1−λ/4)=4x/(4−λ),

at least for |λ| < 4. It is easily seen that this solution is valid for all λ ≠ 4.

29 Calculate

(Tφ_k)(x)=

∫

−π

h(x−y) e^iky  d y.

Make the change of variables t=x−y to get

(Tφ_k)(x)=

∫

x+π

x−π

h(t)e^ik(x−t)  d t = 2π a_k e^ikx,

using orthogonality and periodicity properties, so that φ_k is an eigenvector with eigenvalue 2π a_k.

Now T is a Hilbert–Schmidt operator with an orthonormal basis of eigenvectors, namely (e_k)=(φ_k)/√2π. We can now work with either the (e_k) or the (φ_k). If φ has Fourier series ∑_n=−∞^∞d_n e^int, then φ−λ Tφ has Fourier series

∞

∑

n=−∞

c_ne^int=

∞

∑

n=−∞

d_n(1−λ λ_n) e^int,

where

λ_n=

⎧
⎨
⎩

0	if n=0,
2π /\|n\|	if n ≠ 0,

so the solution is

φ(t)=

∞

∑

n=−∞

c_n

1−λ λ_n

e^int.

B.7 Solutions of Tutorial Problems VII

30 Take e₁=t/||t||, and ||t||²=∫₋₁¹ t²  d t = 2/3, so e₁(t)=√3/2 t.

Let w₂(t)=t²−⟨ t²,e₁⟩ e₁=t², and normalize to get e₂(t)=√5/2 t².

Let w₃(t)=t⁴−⟨ t⁴,e₁⟩ e₁ − ⟨ t⁴,e₂⟩ e₂ = t⁴ − 0 − (2/7)(5/2)t² = t⁴− 5t²/7. Now

||w₃||²=

∫

−1

(t⁸−10t⁶/7+25t⁴/49)  d t= (2/9)−(20/49)+(10/49)= 8/441,

so we take e₃(t)=(21/√8)(t⁴−5t²/7).

The best approximation in X to f is g=∑_k=1³ ⟨ f,e_k⟩ e_k, giving

g(t)=0 + (2/3)(5/2)t² + (−8/105)(441/8)(t⁴−5t²/7),

which reduces to g(t)=14t²/3−21t⁴/5.

As a check, note that f−g is orthogonal to each of the functions t, t² and t⁴.

31 We see that φ_n(x)=⟨ x, u_n⟩, where u_n=(1,1,…,1,0,0,…), with n nonzero terms. Now ||φ_n||=||u_n||=√n.

32 To get the adjoint calculate

⟨ (A+iB)x,y⟩ = ⟨ Ax,y⟩ + i⟨ Bx,y⟩ = ⟨ x,Ay⟩ + i⟨ x,By⟩ = ⟨ x, (A−iB)y ⟩,

so T^*=A−iB.

Now A=(T+T^*)/2 and B=(T−T^*)/(2i) (very like the formulae for real and imaginary parts of a complex number).

Since these formulae do define self-adjoint operators A and B, it is clear that every operator T has a unique decomposition as T=A+iB.

Note

T^*T=(A−iB)(A+iB)=A²−iBA+iAB+B²

and

TT^*=(A+iB)(A−iB)=A²+iBA−iAB+B²,

so that T^*T−TT^*=2i(AB−BA), and T is normal if and only if AB=BA.

33 All we need to do is look for eigenvalues, as the spaces are finite-dimensional.

(i) Df=λ f is impossible unless λ=0, since the degree of Df is lower than the degree of f. So σ(D)={0}. Indeed Dⁿ⁺¹=0, so D is nilpotent, which also implies that its spectrum is {0}, see earlier examples sheets. The only eigenvectors in P_n are constant functions, so we do not get a basis of eigenvectors.

(ii) D(e^ikt)=ike^ikt, so σ(D)={0,± i, ± 2i, …, ± ni}. Now D has (2n+1) distinct eigenvalues, and T_n has an orthonormal basis of eigenvectors, namely (e^ikt/√2π)_k=−nⁿ.

34 We get φ=(I−λ T)⁻¹f=f+λ Tf + λ² T² f + ….

Now Tf(x)=∫₀^x t²  d t=x³/3, (T²f)(x)=∫₀^x (t⁵/3)  d t=x⁶/18, ….

In general (Tⁿ f)(x)=x³ⁿ/(3ⁿ n!). Summing the series we find that

φ(x)=exp(λ x³/3),

and the series converges for all λ ∈ ℂ.

C Course in the Nutshell

C.1 Some useful results and formulae (1)

1 A norm on a vector space, ||x||, satisfies ||x||≥ 0, ||x||=0 if and only if x=0, ||λ x||=|λ|  ||x||, and ||x+y|| ≤ ||x|| + ||y|| (triangle inequality). A norm defines a metric and a complete normed space is called a Banach space.

2 An inner-product space is a vector space (usually complex) with a scalar product on it, ⟨ x,y⟩ ∈ ℂ such that ⟨ x,y⟩=⟨ y,x⟩, ⟨ λ x,y⟩=λ⟨ x,y⟩, ⟨ x+y,z⟩ =⟨ x,z⟩ +⟨ y,z⟩ , ⟨ x,x⟩ ≥ 0 and ⟨ x,x⟩ =0 if and only if x=0. This defines a norm by ||x||²=⟨ x,x⟩ . A complete inner-product space is called a Hilbert space. A Hilbert space is automatically a Banach space.

3 The Cauchy–Schwarz inequality. |⟨ x,y⟩ | ≤ ||x|| ||y|| with equality if and only if x and y are linearly dependent.

4 Some examples of Hilbert spaces. (i) Euclidean ℂⁿ. (ii) l₂, sequences (a_k) with ||(a_k)||₂²=∑|a_k|² < ∞. In both cases ⟨ (a_k),(b_k)⟩=∑a_kb_k. (iii) L₂[a,b], functions on [a,b] with ||f||₂²=∫_a^b |f(t)|²   dt < ∞. Here ⟨ f,g ⟩=∫_a^b f(t) g(t) d t. (iv) Any closed subspace of a Hilbert space.

5 Other examples of Banach spaces. (i) C_b(X), continuous bounded functions on a topological space X. (ii) l_∞(X), all bounded functions on a set X. The supremum norms on C_b(X) and l_∞(X) make them into Banach spaces. (iii) Any closed subspace of a Banach space.

6 On incomplete spaces. The inner-product (L₂) norm on C[0,1] is incomplete. c₀₀ (sequences eventually zero), with the l₂ norm, is another incomplete i.p.s.

7 The parallelogram identity. ||x+y||² + ||x−y||² = 2||x||² + 2||y||² in an inner-product space. Not in general normed spaces.

8 On subspaces. Complete =⇒ closed. The closure of a linear subspace is still a linear subspace. Lin (A) is the smallest subspace containing A and CLin (A) is its closure, the smallest closed subspace containing A.

9 From now on we work in inner-product spaces.

10 The orthogonality. x ⊥ y if ⟨ x,y⟩ =0. An orthogonal sequence has ⟨ e_n,e_m⟩ =0 for n ≠ m. If all the vectors have norm 1 it is an orthonormal sequence (o.n.s.), e.g. e_n=(0,…,0,1,0,0,…) ∈ l₂ and e_n(t)=(1/√2π) e^int in L₂(−π,π).

11 Pythagoras’s theorem: if x⊥ y then ||x+y||²=||x||²+||y||².

12 The best approximation to x by a linear combination ∑_k=1ⁿλ_ke_k is ∑_k=1ⁿ ⟨ x,e_k⟩ e_k if the e_k are orthonormal. Note that ⟨ x,e_k⟩ is the Fourier coefficient of x w.r.t. e_k.

13 Bessel’s inequality. ||x||² ≥ ∑_k=1ⁿ |⟨ x,e_k⟩ |² if e₁,…,e_n is an o.n.s.

14 Riesz–Fischer theorem. For an o.n.s. (e_n) in a Hilbert space, ∑λ_n e_n converges if and only if ∑|λ_n|² < ∞; then ||∑λ_n e_n ||² = ∑|λ_n|².

15 A complete o.n.s. or orthonormal basis (o.n.b.) is an o.n.s. ( e_n) such that if ⟨ y,e_n⟩ =0 for all n then y=0. In that case every vector is of the form ∑λ_n e_n as in the R-F theorem. Equivalently: the closed linear span of the (e_n) is the whole space.

16 Gram–Schmidt orthonormalization process. Start with x₁, x₂, … linearly independent. Construct e₁, e₂, … an o.n.s. by inductively setting y_n+1=x_n+1−∑_k=1ⁿ ⟨ x_n+1,e_k⟩ e_k and then normalizing e_n+1=y_n+1/||y_n+1||.

17 On orthogonal complements. M^⊥ is the set of all vectors orthogonal to everything in M. If M is a closed linear subspace of a Hilbert space H then H=M ⊕ M^⊥. There is also a linear map, P_M the projection from H onto M with kernel M^⊥.

18 Fourier series. Work in L₂(−π,π) with o.n.s. e_n(t)=(1/√2π)e^int. Let CP(−π,π) be the continuous periodic functions, which are dense in L₂. For f ∈ CP(−π,π) write f_m=∑_n=−m^m ⟨ f,e_n⟩ e_n, m ≥ 0. We wish to show that ||f_m−f||₂ → 0, i.e., that (e_n) is an o.n.b.

19 The Fejér kernel. For f∈ CP(−π,π) write F_m=(f₀+…+f_m)/(m+1). Then F_m(x)=(1/2π) ∫_−π^πf(t) K_m(x−t)  d t where K_m(t)=(1/(m+1)) ∑_k=0^m ∑_n=−k^k e^int is the Fejér kernel. Also K_m(t)=(1/(m+1)) [sin² (m+1)t/2] / [sin² t/2].

20 Fejér’s theorem. If f ∈ CP(−π,π) then its Fejér sums tend uniformly to f on [−π,π] and hence in L₂ norm also. Hence CLin ((e_n)) ⊇ CP(−π,π) so must be all of L₂(−π,π). Thus (e_n) is an o.n.b.

21 Corollary. If f ∈ L₂(−π,π) then f(t)=∑c_n e^int with convergence in L₂, where c_n=(1/2π) ∫_−π^πf(t)e^−int d t.

22 Parseval’s formula. If f, g∈ L₂(−π,π) have Fourier series ∑c_n e^int and ∑ d_n e^int then (1/2π)⟨ f,g⟩ = ∑c_n d_n.

23 Weierstrass approximation theorem. The polynomials are dense in C[a,b] for any a<b (in the supremum norm).

C.2 Some useful results and formulae (2)

24 On dual spaces. A linear functional on a vector space X is a linear mapping α:X → ℂ (or to ℝ in the real case), i.e., α(ax+by)=aα(x)+bα(y). When X is a normed space, α is continuous if and only if it is bounded, i.e., sup{|α(x)|: ||x|| ≤ 1} < ∞. Then we define ||α|| to be this sup, and it is a norm on the space X^* of bounded linear functionals, making X^* into a Banach space.

25 Riesz–Fréchet theorem. If α:H → ℂ is a bounded linear functional on a Hilbert space H, then there is a unique y ∈ H such that α(x)=⟨ x,y⟩ for all x ∈ H; also ||α||=||y||.

26 On linear operator. These are linear mappings T: X → Y, between normed spaces. Defining ||T||=sup{||T(x)||: ||x|| ≤ 1}, finite, makes the bounded (i.e., continuous) operators into a normed space, B(X,Y). When Y is complete, so is B(X,Y). We get ||Tx|| ≤ ||T||   ||x||, and, when we can compose operators, ||ST|| ≤ ||S||   ||T||. Write B(X) for B(X,X), and for T ∈ B(X), ||Tⁿ|| ≤ ||T||ⁿ. Inverse S=T⁻¹ when ST=TS=I.

27 On adjoints. T ∈ B(H,K) determines T^* ∈ B(K,H) such that ⟨ Th, k ⟩_K = ⟨ h, T^*k ⟩_H for all h ∈ H, k ∈ K. Also ||T^*||=||T|| and T^**=T.

28 On unitary operator. Those U ∈ B(H) for which UU^*=U^*U=I. Equivalently, U is surjective and an isometry (and hence preserves the inner product).

Hermitian operator or self-adjoint operator. Those T ∈ B(H) such that T=T^*.

On normal operator. Those T ∈ B(H) such that TT^*=T^*T (so including Hermitian and unitary operators).

29 On spectrum. σ(T)={λ ∈ ℂ: (T−λ I) is not invertible in B(X)}. Includes all eigenvalues λ where Tx=λ x for some x ≠ 0, and often other things as well. On spectral radius: r(T)=sup{|λ|: λ∈ σ(T)}. Properties: σ(T) is closed, bounded and nonempty. Proof: based on the fact that (I−A) is invertible for ||A|| < 1. This implies that r(T) ≤ ||T||.

30 The spectral radius formula. r(T)=inf_{n ≥ 1} ||Tⁿ||^1/n = lim_{n → ∞} ||Tⁿ||^1/n.

Note that σ(Tⁿ)={λⁿ: λ ∈ σ(T)} and σ(T^*)={λ: λ ∈ σ(T)}. The spectrum of a unitary operator is contained in {|z|=1}, and the spectrum of a self-adjoint operator is real (proof by Cayley transform: U=(T−iI)(T+iI)⁻¹ is unitary).

31 On finite rank operator. T∈ F(X,Y) if Im  T is finite-dimensional.

On compact operator. T ∈ K(X,Y) if: whenever (x_n) is bounded, then (Tx_n) has a convergent subsequence. Now F(X,Y) ⊆ K(X,Y) since bounded sequences in a finite-dimensional space have convergent subsequences (because when Z is f.d., Z is isomorphic to l₂ⁿ, i.e., ∃ S:l₂ⁿ → Z with S, S⁻¹ bounded). Also limits of compact operators are compact, which shows that a diagonal operator Tx=∑λ_n⟨ x,e_n ⟩ e_n is compact iff λ_n → 0.

32 Hilbert–Schmidt operators. T is H–S when ∑ ||Te_n||² < ∞ for some o.n.b. (e_n). All such operators are compact—write them as a limit of finite rank operators T_k with T_k∑_n=1^∞a_ne_n=∑_n=1^k a_n (Te_n). This class includes integral operators T: L₂(a,b)→ L₂(a,b) of the form

(Tf)(x)=

∫

K(x,y) f(y) d y,

where K is continuous on [a,b] × [a,b].

33 On spectral properties of normal operators. If T is normal, then (i) kerT=kerT^*, so Tx=λ x =⇒ T^*x=λx; (ii) eigenvectors corresponding to distinct eigenvalues are orthogonal; (iii) ||T||=r(T).

If T ∈ B(H) is compact normal, then its set of eigenvalues is either finite or a sequence tending to zero. The eigenspaces are finite-dimensional, except possibly for λ=0. All nonzero points of the spectrum are eigenvalues.

34 On spectral theorem for compact normal operators. There is an orthonormal sequence (e_k) of eigenvectors of T, and eigenvalues (λ_k), such that Tx=∑_k λ_k ⟨ x,e_k ⟩ e_k. If (λ_k) is an infinite sequence, then it tends to 0. All operators of the above form are compact and normal.

Corollary. In the spectral theorem we can have the same formula with an orthonormal basis, adding in vectors from kerT.

35 On general compact operators. We can write Tx=∑µ_k ⟨ x, e_k ⟩ f_k, where (e_k) and (f_k) are orthonormal sequences and (µ_k) is either a finite sequence or an infinite sequence tending to 0. Hence T ∈ B(H) is compact if and only if it is the norm limit of a sequence of finite-rank operators.

36 On integral equations. Fredholm equations on L₂(a,b) are Tφ=f or φ−λ Tφ=f, where (Tφ)(x)=∫_a^b K(x,y)φ(y) d y. Volterra equations similar, except that T is now defined by (Tφ)(x)=∫_a^x K(x,y)φ(y) d y.

37 Neumann series. (I−λ T)⁻¹=1+λ T+λ² T² + …, for ||λ T ||<1.

On separable kernel. K(x,y)=∑_j=1ⁿ g_j(x)h_j(y). The image of T (and hence its eigenvectors for λ≠ 0) lies in the space spanned by g₁,…,g_n.

38 Hilbert–Schmidt theory. Suppose that K ∈ C([a,b]× [a,b]) and K(y,x)=K(x,y). Then (in the Fredholm case) T is a self-adjoint Hilbert–Schmidt operator and eigenvectors corresponding to nonzero eigenvalues are continuous functions. If λ≠ 0 and 1/λ ∉σ(T), the the solution of φ−λ Tφ=f is

φ=

∞

∑

k=1

⟨ f,v_k ⟩

1−λλ_k

v_k.

39 Fredholm alternative. Let T be compact and normal and λ≠ 0. Consider the equations (i) φ−λ Tφ=0 and (ii) φ−λ Tφ=f. Then EITHER (A) The only solution of (i) is φ=0 and (ii) has a unique solution for all f OR (B) (i) has nonzero solutions φ and (ii) can be solved if and only if f is orthogonal to every solution of (i).

D Supplementary Sections

D.1 Reminder from Complex Analysis

The analytic function theory is the most powerful tool in the operator theory. Here we briefly recall few facts of complex analysis used in this course. Use any decent textbook on complex variables for a concise exposition. The only difference with our version that we consider function f(z) of a complex variable z taking value in an arbitrary normed space V over the field ℂ. By the direct inspection we could check that all standard proofs of the listed results work as well in this more general case.

Definition 1 A function f(z) of a complex variable z taking value in a normed vector space V is called differentiable at a point z₀ if the following limit (called derivative of f(z) at z₀) exists:

f′(z₀)=

lim

Δ z→ 0

f(z₀+Δ z)−f(z₀)

Δ z

. (108)

Definition 2 A function f(z) is called holomorphic (or analytic) in an open set Ω⊂ℂ it is differentiable at any point of Ω.

Theorem 3 (Laurent Series) Let a function f(z) be analytical in the annulus r<z<R for some real r<R, then it could be uniquely represented by the Laurent series:

f(z)=

∞

∑

k=−∞

c_k z^k, for some c_k∈ V. (109)

Theorem 4 (Cauchy–Hadamard) The radii r′ and R′, (r′<R′) of convergence of the Laurent series (109) are given by

r′=

liminf

n→ ∞

⎪⎪
⎪⎪

c_n

⎪⎪
⎪⎪

^1/n and

R′

limsup

n→ ∞

⎪⎪
⎪⎪

c_n

⎪⎪
⎪⎪

^1/n. (110)

References

[1]: Nicholas Young. An Introduction to Hilbert Space. Cambridge University Press, Cambridge, 1988. MR # 90e:46001.
[2]: Walter Rudin. Real and Complex Analysis. McGraw-Hill Book Co., New York, third edition, 1987. MR # 88k:00002.
[3]: Béla Bollobás. Linear analysis. An introductory course. Cambridge University Press, Cambridge, second edition, 1999. MR # 2000g:46001.
[4]: Erwin Kreyszig. Introductory functional analysis with applications. John Wiley & Sons Inc., New York, 1989. MR # 90m:46003.
[5]: Alexander A. Kirillov and Alexei D. Gvishiani. Theorems and Problems in Functional Analysis. Problem Books in Mathematics. Springer-Verlag, New York, 1982.
[6]: Michael Reed and Barry Simon. Functional Analysis, volume 1 of Methods of Modern Mathematical Physics. Academic Press, Orlando, second edition, 1980.
[7]: Vladimir V. Kisil. Wavelets in Banach spaces. Acta Appl. Math., 59(1):79–109, 1999. arXiv:math/9807141, On-line.
[8]: Vladimir V. Kisil. Meeting Descartes and Klein somewhere in a noncommutative space. In A. Fokas, J. Halliwell, T. Kibble, and B. Zegarlinski, editors, Highlights of mathematical physics (London, 2000), pages 165–189. Amer. Math. Soc., Providence, RI, 2002. arXiv:math-ph/0112059.
[9]: Vladimir V. Kisil. The real and complex techniques in harmonic analysis from the point of view of covariant transform. Eurasian Math. J., 5:95–121, 2014. arXiv:1209.5072. On-line.
[10]: A. N. Kolmogorov and S. V. Fomīn. Introductory real analysis. Dover Publications Inc., New York, 1975. Translated from the second Russian edition and edited by Richard A. Silverman, Corrected reprinting.
[11]: A. N. Kolmogorov and S. V. Fomin. Measure, Lebesgue integrals, and Hilbert space. Translated by Natascha Artin Brunswick and Alan Jeffrey. Academic Press, New York, 1961.
[12]: Roger Howe. On the role of the Heisenberg group in harmonic analysis. Bull. Amer. Math. Soc. (N.S.), 3(2):821–843, 1980.
[13]: Georg Polya. How To Solve It. Doubleday Anchor Books, New York, 1957. https://archive.org/details/howtosolveitnewa00pl.
[14]: Georg Polya. Mathematical Discovery. John Wiley & Sons, Inc., New York, 1962. https://archive.org/details/GeorgePolyaMathematicalDiscovery.

Index

↑—monotonically converges from below, 14.2
є/3 argument, 8.1
B(X), 6.3
B(X,Y), 6.3
CP[−π,π], 5.1
c₀, 11.1
F(X,Y), 8.1
H₂, A.4
K(X,Y), 8.1
L₁, 13.2
L₂[a,b], 2.3
L_∞, 13.2
L, 12.2
L(X), 6.3
L(X,Y), 6.3
L_p, 14.1
l₂, 2.2, 11.1
l_∞, 11.1
l_p, 11.1
S, Schwartz space, 15.4
S(X), 13.2
I_X, 6.1
kerT, 6.1
1_X (the identity map on X), 16.2.3
supp, 14.4
||·||₁ norm, 1.1.1, 1.1.1, 2.1
||·||₂ norm, 1.1.1, 1.1.1, 2.1
||·||_∞ norm, 1.1.1, 1.1.1, 2.1
⊥, 3.1
σ-additivity, see countable additivity
σ-algebra, 12.1
- Borel, 14.3
σ-finite
- measure, 12.1, 12.2
σ-ring, 12.1
⊔, 12.1
d₁ metric, 1.1.1, 1.1.1, 1.1.1
d₂ metric, 1.1.1, 1.1.1
d_∞ metric, 1.1.1, 1.1.1
S(k), 15.1
Z, 6.1
CLin(A), 2.4
l₁ⁿ, 2.1
l₂ⁿ, 2.1
l_∞ⁿ, 2.1
C_b(X), 2.1
l_∞(X), 2.1
Lin(A), 2.4
“zigzag” function, 16.3.1
a.e., see almost everywhere
absolute continuity, 13.3
absolutely continuous charge, 13.5
abstract completion of metric space, 1.2.1
accumulation point, 1.2.2
additivity, 12.1
- countable, 12.1
adjoint operator, 6.4
adjoints, C.2
algebra
- convolution, 15.1
- of sets, 12.1
- representation, 15.1
almost everywhere, 12.2
- convergence, 13.1
alternative
- Fredholm, 10
analysis, 2.3
- Fourier, 0.3
analytic function, D.1
approximation, 3.2
- by polynomials, 5.4
- identity, of the, 5.2, 15.4
- Weierstrass, of, 5.4
argument
- є/3, 8.1
- diagonal, 8.1
average
- continuous, 14.2
axiom of choice, 11.4
Baire’s categories, 16.3.1
Baire’s category theorem, 16.3.1
Banach Fixed point theorem, 16.2.1
Banach space, 2.1, 11.1, C.1
Banach–Steinhaus Uniform Boundedness theorem, 16.3.2
Bessel’s inequality, 3.2, C.1
Borel σ-algebra, 14.3
Borel set, 14.3
ball
- closed, 1.1.2
- open, 1.1.2
- unit, 2.1
basis, C.2
- orthonormal, 3.3
best approximation, C.1
bilateral right shift, A.4
bounded
- functional, 4.1
- operator, 6.1, 11.2
- set, 1.2.2
bounded linear functional, 4.1
bounded linear operator, 6.1
Cantor
- function, 12.4, see Cantor function
- set, 12.2, 12.4
Cantor function, 2.3
Carathéodory
- measurable set, 12.2
Cauchy integral formula, 5.3
Cauchy sequence, 1.2.1, 2.1
Cauchy–Schwarz inequality, 1.1.1, C.1
Cauchy–Schwarz–Bunyakovskii inequality, 2.2
Cayley transform, 7.3
Cayley transform, C.2
Cesàro sum, 5.1
Chebyshev
- inequality, 13.3
Chebyshev polynomials, 3.4
Closed Graph theorem, 16.3.4
calculus
- functional, 7
category
- first Baire, 16.3.1
- second Baire, 16.3.1
category theory, 2
character, 15.2
charge, 12.3
- absolutely continuous, 13.5
- Hahn decomposition, 12.3
- regular, 14.3
- variation, 12.3, 14.3
charges
- equivalent, 13.5
closable
- operator, 16.3.4
closed
- ball, 1.1.2
- operator, 16.3.4
- set, 1.1.2
closed linear span, 2.4
closure, 1.1.2
- operator, of, 16.3.4
coefficient
- Fourier, 3.3
coefficients
- Fourier, 0.1.2, 15.3
coherent states, 5.3
compact, 1.2.2
- sequentially, 1.2.2
compact operator, 8.1, C.2
- singular value decomposition, 9.2
compact set, 8.1
complement
- orthogonal, 3.5
complete
- measure, 12.2
complete metric space, 2.1
complete o.n.s., C.1
complete orthonormal sequence, 3.3
complete space, 1.2.1
condition
- Lipschitz, 16.2.2
conditions
- integrability, ??
continuity
- absolute, 13.3
- open sets, 1.1.3
- sequential, 1.1.3
continuous
- map, 1.1.3
- uniformly, 1.1.3
continuous on average, 14.2
contraction, 16.2.1
convergence
- almost everywhere, 13.1
- in measure, 13.1
- monotone
  - theorem B. Levi, on, 13.3
- uniform, 13.1
  - on compacts, 15.2
convergent
- sequence, 1.1.3
convex, 3.1
convex set, 2.1, 16.2.1
convolution, 15.1
- algebra, 15.1
- kernel, 15.1
convolution operator, 15.1
coordinates, 2
corollary about orthoprojection, 6.2
cosine
- Fourier coefficients, 0.1.2
countable
- additivity, 12.1
countably
- countable sub-additivity, 12.2
countably additive
- charge, 12.3
cover
- open, 1.2.2
Dirichlet kernel, 16.3.2
decreasing
- rapidly, 15.4
dense
- set, 1.1.2
derivative, D.1
diagonal argument, 8.1
diagonal operator, 6.5
diffeomorphism, 16.2.3
differentiable function, D.1
differential equation
- separation of variables, 0.2.1
discrete
- metric, 1.1.1
disjoint
- pairwise, 12.1
disjunctive measures, 12.3
distance, see metric, 2, 2.1
distance function, 2.1
domain
- fundamental, 0.1.1
- operator, of, 16.3.4
dual group, 15.2
dual space, 4.1
dual spaces, C.2
duality
- Pontryagin’s, 15.2
Egorov’s theorem, 13.1
eigenspace, 9.1
eigenvalue of operator, 7.1
eigenvalues, C.2
eigenvector, 7.1
equation
- Fredholm, 10
  - first kind, 10
  - second kind, 10, 10
- heat, 5.4
- Volterra, 10
equivalent
- norm, 16.3.3
equivalent charges, 13.5
essentially bounded function, 13.2
examples of Banach spaces, C.1
examples of Hilbert spaces, C.1
Fatou’s lemma, 13.3
Fejér
- theorem, 5.2
Fejér kernel, 5.2, C.1
Fejér sum, 5.1
Fejér’s theorem, C.1
Fourier
- coefficients, 15.3
- cosine coefficients, 0.1.2
- integral, 15.3, 15.5
  - inverse, 15.5
- sine coefficients, 0.1.2
- transform, 15.3
  - inverse, 15.5
Fourier transform
- windowed, 5.4
Fourier analysis, 0.3
Fourier coefficient, 3.3
Fourier coefficients, 0.1.2
Fourier series, 0.1.2, C.1
Fourier, Joseph, 0.3
Fredholm equation, 10
- first kind, 10
Fredholm alternative, 10, C.2
Fredholm equation
- second kind, 10
Fredholm equation of the second kind, 10
Fubini theorem, 13.4
finite
- measure, 12.1
finite rank operator, 8.1, C.2
first category, 16.3.1
first resolvent identity, 7.1
fixed point, 16.2.1
formula
- integral
  - Cauchy, 5.3
- Parseval’s, of, 5.3
frame of references, 2
function
- “zigzag” , 16.3.1
- analytic, D.1
- bounded
  - essentially, 13.2
- Cantor, 2.3, 12.4
- differentiable, D.1
- essentially bounded, 13.2
- generating, 15.3
- holomorphic, D.1
- indicator, 13.2
- integrable, 13.2
- seesummable function, 13.2
- measurable, 13.1
- rapidly decreasing, 15.4
- simple, 13.2
  - integral, 13.2
  - summable, 13.2
- square integrable, 2.3
- step, 14.2, 15.5
- summable, 13.2, 13.2
- support, 14.4
functional, see linear functional
- linear, 4.1
  - bounded, 4.1
- positive, 14.4
functional calculus, 7
functions of operators, 7
fundamental domain, 0.1.1
Gaussian, 15.4, 15.5
Gram–Schmidt orthogonalisation, 3.4
Gram–Schmidt orthonormalization process, C.1
general compact operators, C.2
generating function, 15.3
graph
- operator, of, 16.3.4
group
- dual, 15.2
- representation, 15.1
group representations, 5.3
Hölder’s Inequality, 11.1
Haar measure, 15.1
Hahn decomposition of a charge, 12.3
Hahn-Banach theorem, 11.4
Hardy space, A.4
Heine–Borel theorem, 8.1
Heine–Borel theorem, 1.2.2
Hermitian operator, 6.5
Hermitian operator, C.2
Hilbert space, 2.2
Hilbert space, C.1
Hilbert–Schmidt operator, 8.2
Hilbert–Schmidt norm, 8.2, A.6
Hilbert–Schmidt operators, C.2
Hilbert–Schmidt theory, C.2
heat equation, 5.4
holomorphic function, D.1
Inverse, C.2
Inverse Function theorem, 16.2.3
identity
- approximation of the, 5.2, 15.4
- Parseval’s, 5.3, 15.5
- Plancherel, 15.5, 15.5
- parallelogram, of, 2.2
identity operator, 6.1
implicit function theorem, 16.2.3
incomplete spaces, C.1
indicator function, 13.2
inequality
- Bessel’s, 3.2
- Cauchy–Schwarz, 1.1.1
- Cauchy–Schwarz–Bunyakovskii, of, 2.2
- Chebyshev, 13.3
- Hölder’s, 11.1
- Minkowski’s , 11.1
- triangle, of, 2.1, 2.1
inner product, 1.1.1, 2.2
- space, 1.1.1
inner product space, 2.2
- complete, see Hilbert space
inner-product space, C.1
integrability conditions, ??
integrable
- function, 13.2
- seesummable function, 13.2
integral
- Fourier, 15.3, 15.5
- Lebesgue, 2.3, 13.2
- monotonicity, 13.2
- Riemann, 2.3
- simple function, 13.2
integral formula
- Cauchy, 5.3
integral equations, C.2
integral operator, 8.2, 10
- with separable kernel, 10
interior, 1.1.2
invariant measure, 15.1
inventor’s paradox, 16.1
inverse operator, 6.3
inverse Fourier transform, 15.5
inverse image, 1.1.3
invertible operator, 6.3
isometric
- isomorphism, 11.2
isometric metric space, 1.1.1
isometry, 1.1.1, 6.5, 11.2
isomorphic
- isometrically, 11.2
isomorphic spaces, 11.2
isomorphism, 11.2, A.5
- isometric, 11.2
Jacobian, 16.2.1
kernel, 10
- Dirichlet, 16.3.2
- Fejér, 5.2
kernel of convolution, 15.1
kernel of integral operator, 8.2
kernel of linear functional, 4.1
kernel of linear operator, 6.1
Laguerre polynomials, 3.4
Lebesgue
- integral, 13.2
- measure
  - outer, 12.2
- set
  - measurable, 12.2
- theorem, 12.2
- theorem on dominated convergence, 13.3
Lebesgue integration, 2.3
Lebesgue measure, 12.4
Legendre polynomials, 3.4

Levi’s theorem on monotone convergence, 13.3
Lipschitz condition, 16.2.2
ladder
- Cantor, see Cantor function
leading particular case, 16.1, 16.2.3
left inverse, 6.3
left shift operator, 6.3
lemma
- about inner product limit, 2.4
- Fatou’s, 13.3
- Riesz–Fréchet, 4.2
- Urysohn’s , 14.3
- Zorn, 11.4
length of a vector, 2
limit
- two monotonic, 13.1, 13.3, 13.4
  - for sets, 13.1
linear
- operator, 6.1
linear functional
- kernel, 4.1
linear operator
- image, of, 6.1
linear space, 2
linear functional, 4.1, C.2
linear operator, C.2
- norm, of, 6.1
- kernel, of, 6.1
linear span, 2.4
local-C^k- diffeomorphism, 16.2.3
locally compact topology, 15.1
locally convex topological vector space, 16.4
locally invertible function, 16.2.3
Minkowski’s inequality, 11.1
map
- continuous, 1.1.3
- locally invertible, 16.2.3
- open, 16.3.3
mathematical way of thinking, 3
mathematical way of thinking, 2
mean value theorem, 16.2.1
measurable
- function, 13.1
- set
  - Carathéodory, 12.2
  - Lebesgue, 12.2
measure, 12.1
- σ-finite, 12.1, 12.2
- absolutely continuous, 13.5
- complete, 12.2
- disjunctive, 12.3
- finite, 12.1
- Haar, 15.1
- invariant, 15.1
- Lebesgue, 12.4
  - outer, 12.2
- outer, 12.2
  - monotonicity, 12.2, 12.2
- product, 12.4, 13.4
- regular, 14.3
- signed, see charge
metric, 1.1.1, 2.1, 11.1
- d₁, 1.1.1, 1.1.1, 1.1.1
- d₂, 1.1.1, 1.1.1
- d_∞, 1.1.1, 1.1.1
- discrete, 1.1.1
metric space, 2
metric space, 1.1.1
- abstract completion, 1.2.1
- isometric, 1.1.1
monotonicity
- outer measure, 12.2
monotonicity of integral, 13.2
multiplication operator, 6.1
Neumann series, 7.1, 10, C.2
nearest point theorem, 3.1
neighbourhood, 1.1.2
nilpotent, A.5
norm, 1.1.1, 2.1, 11.1, C.1
- seesup-norm, 11.1
- ||·||₁, 1.1.1, 1.1.1, 2.1
- ||·||₂, 1.1.1, 1.1.1, 2.1
- ||·||_∞, 1.1.1, 1.1.1, 2.1
- equivalent, 16.3.3
- Hilbert–Schmidt, 8.2, A.6
- sup, 11.1
norm of linear operator, 6.1
normal operator, 6.5, C.2
normed space, 2.1
normed space, 1.1.1
- complete, see Banach space
nowhere dense set, 16.3.1
open
- ball, 1.1.2
- cover, 1.2.2
- map, 16.3.3
- set, 1.1.2
open mapping theorem, 16.3.3
operator, 11.2
- adjoint, 6.4
- bounded, 11.2
- closable, 16.3.4
- closed, 16.3.4
- closure, 16.3.4
- compact, 8.1
  - singular value decomposition, 9.2
- convolution, 15.1
- diagonal, 6.5
  - unitary, 6.5
- domain, 16.3.4
- eigenvalue of, 7.1
- eigenvector of, 7.1
- finite rank, 8.1
- graph, 16.3.4
- Hermitian, 6.5
- Hilbert–Schmidt, 8.2
- identity, 6.1
- integral, 8.2, 10
  - kernel of, 8.2
  - with separable kernel, 10
- inverse, 6.3
  - left, 6.3
  - right, 6.3
- invertible, 6.3
- isometry, 6.5
- linear, 6.1
  - bounded, 6.1
  - image, of, 6.1
  - kernel, of, 6.1
  - norm, of, 6.1
- nilpotent, A.5
- normal, 6.5
- of multiplication, 6.1
- self-adjoint, see Hermitian operator
- shift
  - left, 6.3
  - right, 6.1
- shift on a group, 15.1
- spectrum of, 7.1
- unitary, 6.5
- Volterra, A.4
- zero, 6.1
operators
- compact
  - normal
    - spectral theorem, 9.2
- normal
  - compact
    - spectral theorem, 9.2
orthogonal
- complement, 3.5
- projection, 6.2
orthogonal complement, 3.5
orthogonal polynomials, 3.4
orthogonal projection, 6.2
orthogonal complements, C.1
orthogonal sequence, 3.1, C.1
orthogonal system, 3.1
orthogonalisation
- Gram–Schmidt, of, 3.4
orthogonality, 2.2, 3, C.1
orthonormal basis, 3.3
orthonormal basis
- theorem, 3.3
orthonormal basis (o.n.b.), C.1
orthonormal sequence, 3.1
- complete , 3.3
orthonormal sequence (o.n.s.), C.1
orthonormal system, 3.1
orthoprojection, 6.2
- corollary, about, 6.2
outer measure, 12.2
- monotonicity, 12.2
Parseval’s
- formula, 5.3
- identity, 5.3, 15.5
Parseval’s formula, C.1
Picard iteration, 16.2.2
Picard–Lindelöf theorem, 16.2.2
Plancherel
- identity, 15.5, 15.5
Pontryagin’s duality, 15.2
Pythagoras’ school, 5.4
Pythagoras’ theorem, 3.1
Pythagoras’s theorem, C.1
pairwise
- disjoint, 12.1
parallelogram identity, 2.2, C.1
partial sum of the Fourier series, 5.1
period, 0.1.1
periodic, 0.1.1
perpendicular
- theorem on, 3.2
point
- accumulation, 1.2.2
- fixed, 16.2.1
polynomial
- trigonometric, 0.1.2
polynomial approximation, 5.4
polynomials
- Chebyshev, 3.4
- Laguerre, 3.4
- Legendre, 3.4
- orthogonal, 3.4
positive
- functional, 14.4
pre-image, 1.1.3
product
- inner, 2.2
- scalar, 2.2
product measure, 12.4, 13.4
projection
- orthogonal, 6.2
quantum mechanics, 2, 2.3
Radon–Nikodym theorem, 13.5
Riesz representation, 14.4
Riesz–Fischer theorem, C.1
Riesz–Fisher theorem, 3.3
Riesz–Fréchet lemma, 4.2
Riesz–Fréchet theorem, C.2
radius
- spectral, 7.2
regular charge, 14.3
regular measure, 14.3
representation
- of group, 5.3
- algebra, of, 15.1
- group, of, 15.1
- Riesz, 14.4
resolvent, 7, 7.1
- identity, first, 7.1
- set, 7.1
resolvent set, 7.1
right shift operator, 6.1
right inverse, 6.3
Schwartz space, 15.4
Segal–Bargmann space, 2.3
Stone’s theorem, 16.1
Stone–Weierstrass theorem, 16.1
- complex version, 16.1
scalar product, 2.2
school
- Pythagoras’, 5.4
second category, 16.3.1
self-adjoint operator, see Hermitian operator, C.2
semi-norm, 16.4
semiring, 12.1
separable Hilbert space, 3.4
separable kernel, 10, C.2
separable metric space, 16.1
separation of variables, 0.2.1
sequence
- Cauchy, 1.2.1, 2.1
- convergent, 1.1.3
- orthogonal, 3.1
- orthonormal, 3.1
  - complete , 3.3
sequential continuity, 1.1.3
sequentially compact, 1.2.2
series
- Fourier, 0.1.2
- Neumann, 7.1, 10
set
- compact, 8.1
- Borel, 14.3
- bounded, 1.2.2
- Cantor, 12.2, 12.4
- closed, 1.1.2
- convex, 2.1, 3.1, 16.2.1
- dense, 1.1.2
- measurable
  - Carathéodory, 12.2
  - Lebesgue, 12.2
- nowhere dense, 16.3.1
- open, 1.1.2
- resolvent, 7.1
- symmetric difference, 12.2
shift
- bilaterial right, A.4
shift operator, 15.1
signed measure, see charge
simple function, 13.2
- integral, 13.2
- summable, 13.2
sine
- Fourier coefficients, 0.1.2
singular value decomposition of compact operator, 9.2
space
- Banach, 2.1, 11.1
- complete, 1.2.1
- dual, 4.1
- Hardy, A.4
- Hilbert, 2.2
  - separable, 3.4
- inner product, 1.1.1, 2.2
  - complete, see Hilbert space
- linear, 2
- locally convex, 16.4
- metric, 1.1.1, 2
  - complete, 2.1
  - isometric, 1.1.1
- normed, 1.1.1, 2.1
  - complete, see Banach space
- of bounded linear operators, 6.3
- Schwartz, 15.4
- Segal–Bargmann, 2.3
- separable metric, 16.1
- vector, see linear space
space of finite sequences, 2.3
span
- linear, 2.4
  - closed, 2.4
spectral radius:, C.2
spectral properties of normal operators, C.2
spectral radius, 7.2
spectral radius formula, C.2
spectral theorem for compact normal operators, 9.2, C.2
spectrum, 7.1, C.2
statement
- Fejér, see theorem
- Gram–Schmidt, see theorem
- Riesz–Fisher, see theorem
- Riesz–Fréchet, see lemma
step
- function, 14.2, 15.5
sub-additive
- countable sub-additivity, 12.2
subcover, 1.2.2
subsequence
- convergent
  - quickly, 13.2, 13.3
- quickly convergent, 13.2, 13.3
subspace, 2.3
subspaces, C.1
sum
- Cesàro, of, 5.1
- Fejér, of, 5.1
summable
- function, 13.2, 13.2
- simple function, 13.2
sup-norm, 11.1
support of function, 14.4
symmetric difference of sets, 12.2
synthesis, 2.3
system
- orthogonal, 3.1
- orthonormal, 3.1
theorem
- Baire’s category, 16.3.1
- Banach fixed point, 16.2.1
- Banach–Steinhaus Uniform Boundedness, 16.3.2
- closed graph, 16.3.4
- Egorov, 13.1
- Fejér, of, 5.2
- Fubini, 13.4
- Gram–Schmidt, of, 3.4
- Hahn-Banach, 11.4
- Heine–Borel, 1.2.2, 8.1
- implicit function, 16.2.3
- inverse function, 16.2.3
- Lebesgue, 12.2
- Lebesgue on dominated convergence, 13.3
- mean value, 16.2.1
- monotone convergence, B. Levi, 13.3
- on nearest point , 3.1
- on orthonormal basis, 3.3
- on perpendicular, 3.2
- open mapping, 16.3.3
- Picard–Lindelöf , 16.2.2
- Pythagoras’, 3.1
- Radon–Nikodym, 13.5
- Riesz–Fisher, of, 3.3
- Stone’s, 16.1
- Stone–Weierstrass, 16.1
  - complex version, 16.1
- spectral for compact normal operators, 9.2
- Weierstrass approximation, 5.4, 16.1
- Zermelo, 11.4
thinking
- mathematical, 2, 3
topology
- locally compact, 15.1
transform
- Cayley, 7.3
- Fourier, 15.3
  - windowed, 5.4
- wavelet, 5.3
triangle inequality, 2.1
triangle inequality, 2.1, C.1
trigonometric
- polynomial, 0.1.2
two monotonic limits, 13.1, 13.3, 13.4
- for sets, 13.1
Urysohn’s lemma, 14.3
uniform convergence, 13.1
- on compacts, 15.2
uniformly
- continuous, 1.1.3
unit ball, 2.1
unitary operator, 6.5
unitary operator, C.2
Volterra equation, 10
Volterra operator, A.4
variation of a charge, 12.3, 14.3
vector
- length of, 2
vector space, 2
vectors
- orthogonal, 3.1
Weierstrass approximation theorem, 5.4, 16.1, C.1
wavelet transform, 5.3
wavelets, 5.3, 5.4
windowed Fourier transform, 5.4
Zermelo’s theorem, 11.4
Zorn’s Lemma, 11.4
zero operator, 6.1

1: Some more “strange” types of orthogonality can be seen in the paper Elliptic, Parabolic and Hyperbolic Analytic Function Theory–1: Geometry of Invariants.
2: Spectrum—ghost, spirit(lat.).


site search by freefind	advanced

Last modified: February 16, 2025.

This document was translated from L^AT_EX by H^EV^EA.

T₁x₁⁽¹⁾	T₁x₂⁽¹⁾	T₁x₃⁽¹⁾	…	T₁x_n⁽¹⁾	…	→	a₁
T₂x₁⁽²⁾	T₂x₂⁽²⁾	T₂x₃⁽²⁾	…	T₂x_n⁽²⁾	…	→	a₂
T₃x₁⁽³⁾	T₃x₂⁽³⁾	T₃x₃⁽³⁾	…	T₃x_n⁽³⁾	…	→	a₃
…	…	…	…	…	…
T_nx₁⁽ⁿ⁾	T_nx₂⁽ⁿ⁾	T_nx₃⁽ⁿ⁾	…	T_nx_n⁽ⁿ⁾	…	→	a_n
…	…	…	…	…	…		↓
						↘
							a

Introduction to Functional Analysis

Vladimir V. Kisil School of Mathematics, University of Leeds, Leeds LS2 9JT, UK email: kisilv@maths.leeds.ac.uk Web: http://v-v-kisil.scienceontheweb.net/

February 16, 2025

Contents

Notations and Assumptions

Integrability conditions

0 Motivating Example: Fourier Series

0.1 Fourier series: basic notions

0.1.1 2π-periodic functions

0.1.2 Integrating the complex exponential function

0.2 The vibrating string

0.2.1 Separation of variables

0.2.2 Principle of Superposition

0.3 Historic: Joseph Fourier

1 Basics of Metric Spaces

1.1 Metric Spaces

1.1.1 Metric spaces: definition and examples

1.1.2 Open and closed sets

1.1.3 Convergence and continuity

1.2 Useful properties of metric spaces

1.2.1 Cauchy sequences and completeness

1.2.2 Compactness

2 Basics of Linear Spaces

2.1 Banach spaces (basic definitions only)

2.2 Hilbert spaces

2.3 Subspaces

2.4 Linear spans

3 Orthogonality

3.1 Orthogonal System in Hilbert Space

3.2 Bessel’s inequality

3.3 The Riesz–Fischer theorem

3.4 Construction of Orthonormal Sequences

3.5 Orthogonal complements

4 Duality of Linear Spaces

4.1 Dual space of a normed space

4.2 Self-duality of Hilbert space

5 Fourier Analysis

5.1 Fourier series

5.2 Fejér’s theorem

5.3 Parseval’s formula

5.4 Some Application of Fourier Series

6 Operators

6.1 Linear operators

6.2 Orthoprojections

6.3 B(H) as a Banach space (and even algebra)

6.4 Adjoints

6.5 Hermitian, unitary and normal operators

7 Spectral Theory

7.1 The spectrum of an operator on a Hilbert space

7.2 The spectral radius formula

7.3 Spectrum of Special Operators

8 Compactness

8.1 Compact operators

8.2 Hilbert–Schmidt operators

9 Compact normal operators

9.1 Spectrum of normal operators

9.2 Compact normal operators

10 Integral equations

11 Banach and Normed Spaces

11.1 Normed spaces

11.2 Bounded linear operators

11.3 Dual Spaces

11.4 Hahn–Banach Theorem

11.5 C(X) Spaces

12 Measure Theory

12.1 Basic Measure Theory

12.2 Extension of Measures

12.3 Complex-Valued Measures and Charges

12.4 Constructing Measures, Products

13 Integration

13.1 Measurable functions

13.2 Lebesgue Integral

13.3 Properties of the Lebesgue Integral

13.4 Integration on Product Measures

13.5 Absolute Continuity of Measures

14 Functional Spaces

14.1 Integrable Functions

14.2 Dense Subspaces in Lp

14.3 Continuous functions

14.4 Riesz Representation Theorem

Vladimir V. Kisil
School of Mathematics, University of Leeds, Leeds LS2 9JT, UK
email: kisilv@maths.leeds.ac.uk
Web: http://v-v-kisil.scienceontheweb.net/

14.2 Dense Subspaces in L_p