3 Orthogonality
Pythagoras is forever!
The catchphrase from a TV commercial of the Hilbert Spaces course
As was mentioned in the introduction, a Hilbert space is an analogue of our 3D Euclidean space, and the theory of Hilbert spaces is similar to plane or space geometry. One of the primary results of Euclidean geometry, which still survives in the high school curriculum despite its continuous nasty de-geometrisation, is Pythagoras’ theorem, based on the notion of orthogonality.
So far we were concerned only with distances between points. Now we would like to study angles between vectors, and notably right angles. Pythagoras’ theorem states that if the angle C in a triangle is right then c2=a2+b2, see Figure 5.
Figure 5: Pythagoras’ theorem: c2=a2+b2
It is a very mathematical way
of thinking to turn this property of
right angles into their definition, which will work even in
infinite dimensional Hilbert spaces.
Look for a triangle, or even for a right triangle
A universal piece of advice for solving problems in elementary geometry
3.1 Orthogonal System in Hilbert Space
In inner product spaces it is even more convenient to define orthogonality not via Pythagoras’ theorem but via an equivalent property of the inner product.
Definition 1
Two vectors x and y in an inner product space are orthogonal if ⟨ x,y ⟩=0, written x ⊥ y.
An orthogonal sequence (or orthogonal system) (en) (finite or infinite) is one in which en ⊥ em whenever n≠ m.
An orthonormal sequence (or orthonormal system) (en) is an orthogonal sequence with ||en||=1 for all n.
Exercise 2
- Show that if x ⊥ x then x=0 and consequently x ⊥ y for any y∈ H.
- Show that if all vectors of an orthogonal system are non-zero then they are linearly independent.
Example 3 These are orthonormal sequences:
- Basis vectors (1,0,0), (0,1,0), (0,0,1) in ℝ3 or ℂ3.
- Vectors en=(0,…,0,1,0,…) (with the only 1 in the nth place) in l2. (Could you see a similarity with the previous example?)
- Functions en(t)=(2π)−1/2 eint, n∈ℤ in C[0,2π]:
⟨ en,em ⟩=(2π)−1 ∫02π eint e−imt dt = 1 if n=m and 0 if n≠ m.     (19)
Exercise 4
Let A be a subset of an inner product space V and x⊥ y for any y∈ A. Prove that x⊥ z for all z∈ CLin(A).
Theorem 5 (Pythagoras’)
If x ⊥ y then ||x+y||2=||x||2+||y||2. Also if e1, …, en is orthonormal then
||∑1n ak ek||2=⟨ ∑1n ak ek, ∑1n ak ek ⟩=∑1n | ak |2.
Proof.
A one-line calculation.
□
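Written out, the one-line calculation is simply the expansion of the inner product, using ⟨ x,y ⟩=⟨ y,x ⟩=0 for orthogonal vectors:

||x+y||2=⟨ x+y,x+y ⟩=||x||2+⟨ x,y ⟩+⟨ y,x ⟩+||y||2=||x||2+||y||2.

The second identity of the theorem follows in the same way: expanding ⟨ ∑1n ak ek, ∑1n ak ek ⟩ by linearity, the cross terms ak ām ⟨ ek,em ⟩ with k≠ m vanish, and the diagonal terms give | ak |2 ||ek||2=| ak |2.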
The following theorem provides an important property of Hilbert spaces which will be used many times. Recall that a subset K of a linear space V is convex if for all x, y∈ K and λ∈ [0,1] the point λ x+(1−λ)y is also in K. In particular, any subspace is convex, and so is any unit ball (see Exercise 1).
Theorem 6 (about the Nearest Point)
Let K be a non-empty convex closed subset of a Hilbert space H. For any point x∈ H there is a unique point y∈ K nearest to x.
Proof.
Let d=infy∈ K d(x,y), where d(x,y) is the distance coming from the norm ||x||=√⟨ x,x ⟩, and let (yn) be a sequence of points in K such that limn→ ∞d(x,yn)=d. Then (yn) is a Cauchy sequence. Indeed, from the parallelogram identity for the parallelogram generated by the vectors x−yn and x−ym we have:
||yn−ym||2=2||x−yn||2+2||x−ym||2−||2x−yn−ym||2.
Note that ||2x−yn−ym||2=4||x−(yn+ym)/2||2≥ 4d2 since (yn+ym)/2∈ K by convexity. For sufficiently large m and n we get ||x−ym||2≤ d2+є and ||x−yn||2≤ d2+є, thus ||yn−ym||2≤ 4(d2+є)−4d2=4є, i.e. (yn) is a Cauchy sequence.
Let y be the limit of (yn), which exists by the completeness of H; then y∈ K since K is closed. Then d(x,y)=limn→ ∞d(x,yn)=d. This shows the existence of the nearest point. Let y′ be another point in K such that d(x,y′)=d, then the parallelogram identity implies:
||y−y′||2=2||x−y||2+2||x−y′||2−||2x−y−y′||2≤ 4d2−4d2=0.
This shows the uniqueness of the nearest point.
□
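The nearest-point map of this theorem is easy to experiment with numerically. Below is a minimal Python/NumPy sketch (the choice of K as the closed Euclidean unit ball and all names are mine, not from the notes): the projection x/||x|| is compared against a large random sample of points of K.

import numpy as np

def project_to_unit_ball(x):
    """Nearest point to x in K = the closed unit ball of the Euclidean norm."""
    nx = np.linalg.norm(x)
    return x if nx <= 1 else x / nx

rng = np.random.default_rng(0)
x = np.array([3.0, 4.0])
y = project_to_unit_ball(x)                     # expected ~ (0.6, 0.8)

# no random point of K comes closer to x than the projection y does
samples = rng.uniform(-1, 1, size=(100000, 2))
samples = samples[np.linalg.norm(samples, axis=1) <= 1]
assert np.linalg.norm(x - y) <= np.linalg.norm(x - samples, axis=1).min() + 1e-12
print(y, np.linalg.norm(x - y))                 # ~ [0.6 0.8] and 4.0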
Exercise* 7 The essential rôle of the parallelogram identity in the above proof indicates that the theorem does not hold in a general Banach space.
- Show that in ℝ2 with either norm ||·||1 or ||·||∞ from Example 9 the nearest point could be non-unique (one possible configuration is sketched below);
- Could you construct an example (in a Banach space) when the nearest point does not exist?
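For the first part, here is one possible configuration (a sketch only, not the unique example): in (ℝ2, ||·||∞) every point of the closed convex segment K={(1,t): −1≤ t≤ 1} lies at sup-norm distance exactly 1 from the origin, so the nearest point is far from unique.

import numpy as np

# In (R^2, ||.||_inf) the segment K = {(1, t) : -1 <= t <= 1} is closed and convex,
# yet every point of K is at sup-distance exactly 1 from the origin.
ts = np.linspace(-1, 1, 9)
K = np.stack([np.ones_like(ts), ts], axis=1)
dists = np.max(np.abs(K - np.array([0.0, 0.0])), axis=1)   # ||.||_inf distances
print(dists)   # all equal to 1.0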
Liberté, Égalité, Fraternité!
A longstanding ideal approximated in real life by something completely different
3.2 Bessel’s inequality
For the case when a convex subset is a subspace we can characterise the nearest point in terms of orthogonality.
Theorem 8 (on Perpendicular)
Let M be a subspace of a Hilbert space H and a point x∈ H be fixed. Then z∈ M is the nearest point to x if and only if x−z is orthogonal to any vector in M.
Figure 6: (i) A smaller distance for a non-perpendicular direction; (ii) best approximation from a subspace
Proof.
Let z be the nearest point to x, which exists by the previous Theorem. We claim that x−z is orthogonal to any vector in M; otherwise there exists y∈ M such that ⟨ x−z,y ⟩≠ 0. Then
||x−z−є y||2=||x−z||2−2є ℜ⟨ x−z,y ⟩+є2 ||y||2 < ||x−z||2
if є is chosen to be small enough and such that є ℜ⟨ x−z,y ⟩ is positive, see Figure 6(i). Therefore we get a contradiction with the statement that z is the closest point to x.
On the other hand, if x−z is orthogonal to all vectors in M then in particular (x−z)⊥ (z−y) for all y∈ M, see Figure 6(ii). Since x−y=(x−z)+(z−y) we get by Pythagoras’ theorem:
||x−y||2=||x−z||2+||z−y||2.
So ||x−y||2≥ ||x−z||2 and they are equal if and only if z=y.
□
Exercise 9
The above proof does not work if ⟨ x−z,y ⟩ is an imaginary number; what should be done in this case?
Consider now a basic case of approximation: let x∈ H be
fixed and e1, …, en be orthonormal and denote
H1=Lin{e1,…,en}. We could try to approximate x by a
vector y=λ1 e1+⋯ +λn en ∈ H1.
Corollary 10
The minimal value of ||x−y|| for y∈ H1 is achieved when y=∑1n⟨ x,ei ⟩ ei.
Proof.
Let z=∑1n⟨ x,ei ⟩ ei, then ⟨ x−z,ei ⟩=⟨ x,ei ⟩−⟨ z,ei ⟩=0. Hence x−z is orthogonal to every vector in H1 (cf. Exercise 4), and by the previous Theorem z is the nearest point to x.
□
Figure 7: Best approximation by three trigonometric polynomials
Example 11
- In ℝ3 find the best approximation to (1,0,0) from the plane V:{x1+x2+x3=0}. We take an orthonormal basis e1=(2−1/2, −2−1/2,0), e2=(6−1/2, 6−1/2, −2· 6−1/2) of V (check this!). Then:
z=⟨ x,e1 ⟩e1+⟨ x,e2 ⟩e2=(1/2, −1/2, 0)+(1/6, 1/6, −1/3)=(2/3, −1/3, −1/3).
- In C[0,2π] what is the best approximation to f(t)=t by functions a+beit+ce−it? Let en(t)=(2π)−1/2 eint, n=0,±1, be the orthonormal functions from Example 3. We find:
⟨ f,e0 ⟩=(2π)−1/2 ∫02π t dt=(2π)−1/2 [t2/2]02π=√2 π3/2;
⟨ f,e1 ⟩=(2π)−1/2 ∫02π t e−it dt=i√(2π);
⟨ f,e−1 ⟩=(2π)−1/2 ∫02π t eit dt=−i√(2π) (Why do we not need to check this one?)
Then the best approximation is (see Figure 7):
f0(t)=⟨ f,e0 ⟩e0+⟨ f,e1 ⟩e1+⟨ f,e−1 ⟩e−1=π+i eit−i e−it=π−2sin t.
(Both parts of this example are verified numerically in the sketch below.)
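Both parts can be checked on a computer. Here is a Python/NumPy sketch (the discretisation of [0,2π] and the helper name coeff are mine, not from the notes):

import numpy as np

# Part 1: best approximation to x = (1,0,0) from the plane x1 + x2 + x3 = 0.
x = np.array([1.0, 0.0, 0.0])
e1 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)
e2 = np.array([1.0, 1.0, -2.0]) / np.sqrt(6)
z = (x @ e1) * e1 + (x @ e2) * e2
print(np.round(z, 4))                      # [ 0.6667 -0.3333 -0.3333], i.e. (2/3, -1/3, -1/3)

# Part 2: Fourier coefficients of f(t) = t against e_n(t) = (2*pi)**(-1/2) * exp(i*n*t).
N = 200000
t = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
dt = 2.0 * np.pi / N

def coeff(n):
    en = np.exp(1j * n * t) / np.sqrt(2.0 * np.pi)
    return np.sum(t * np.conj(en)) * dt    # approximates <f, e_n> = integral of f * conj(e_n)

print(np.round(coeff(0), 3))                            # ~ 7.875 = sqrt(2) * pi**1.5
print(np.round(coeff(1), 3), np.round(coeff(-1), 3))    # ~ i*sqrt(2*pi) and -i*sqrt(2*pi)
# Hence the best approximation is pi + i*e^{it} - i*e^{-it} = pi - 2*sin(t).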
Corollary 12 (Bessel’s inequality)
If (ei) is orthonormal then
||x||2≥ ∑i | ⟨ x,ei ⟩ |2.
Proof.
Let z=∑1n⟨ x,ei ⟩ ei, then x−z⊥ ei for all i, therefore by Exercise 4 x−z⊥ z. Hence:
||x||2=||z||2+||x−z||2
≥ ||z||2=∑1n | ⟨ x,ei ⟩ |2.
□
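Bessel’s inequality is easy to test numerically. A minimal Python/NumPy sketch (the random data and dimensions are my choice):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5)
q, _ = np.linalg.qr(rng.normal(size=(5, 2)))   # two orthonormal vectors e_1, e_2 (columns of q)
bessel_sum = np.sum((q.T @ x) ** 2)            # sum of |<x, e_i>|^2
print(bessel_sum, x @ x, bessel_sum <= x @ x)  # the inequality holds; equality iff x in Lin{e_1, e_2}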
—Did you say “rice and fish for them”?
A student question
3.3 The Riesz–Fischer theorem
When (en) is orthonormal we call ⟨ x,en ⟩ the nth Fourier coefficient of x (with respect to (en), naturally).
Theorem 13 (Riesz–Fischer)
Let (en)1∞ be an orthonormal sequence in a Hilbert space H. Then ∑1∞λn en converges in H if and only if ∑1∞| λn |2 < ∞. In this case ||∑1∞λn en||2=∑1∞| λn |2.
Proof.
Necessity: Let xk=∑1k λn en and x=limk→ ∞ xk. So ⟨ x,en ⟩=limk→ ∞⟨ xk,en ⟩=λn for all n. By Bessel’s inequality, for all k
||x||2≥ ∑1k | ⟨ x,en ⟩ |2=∑1k | λn |2,
hence the partial sums ∑1k | λn |2 are bounded, so ∑1∞| λn |2 converges and is at most ||x||2.
Sufficiency: Consider ||xk−xm||=||∑m+1k λn en||=(∑m+1k | λn |2)1/2 for k>m. Since ∑1∞| λn |2 converges, (xk) is a Cauchy sequence in H and thus has a limit x. By Pythagoras’ theorem ||xk||2=∑1k | λn |2, thus for k→ ∞ we get ||x||2=∑1∞| λn |2 by the Lemma about the inner product limit.
□
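The dichotomy in the theorem is easy to see inside l2 itself, where ||xk−xm||2=∑m+1k | λn |2. A small Python/NumPy sketch (the two sample sequences are mine): λn=1/n is square-summable and the partial sums are Cauchy, while λn=1/√n is not.

import numpy as np

def gap(coeffs, m, k):
    """||x_k - x_m|| in l^2 for partial sums of sum_n coeffs[n-1] * e_n (k > m)."""
    return np.sqrt(np.sum(coeffs[m:k] ** 2))

n = np.arange(1, 10**6 + 1)
for lam in (1.0 / n, 1.0 / np.sqrt(n)):        # sum |lambda_n|^2 finite vs infinite
    print([round(gap(lam, 10**j, 10**(j + 1)), 4) for j in range(2, 6)])
# first line tends to 0 (Cauchy partial sums); second stays near sqrt(ln 10) ~ 1.52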
Observation: the closed linear span
of an orthonormal sequence in any Hilbert space looks like
l2, i.e. l2 is a universal model for a
Hilbert space.
By Bessel’s inequality and the Riesz–Fischer theorem we know that the series ∑1∞⟨ x,ei ⟩ ei converges for any x∈ H. What is its limit?
Let y=x− ∑1∞⟨ x,ei ⟩ ei, then
⟨ y,ek ⟩=⟨ x,ek ⟩− ∑1∞⟨ x,ei ⟩⟨ ei,ek ⟩=⟨ x,ek ⟩−⟨ x,ek ⟩=0 for all k.     (20)
Definition 14
An orthonormal sequence (ei) in a Hilbert space H is complete if the identities ⟨ y,ek ⟩=0 for all k imply y=0. A complete orthonormal sequence is also called an orthonormal basis in H.
Theorem 15 (on Orthonormal Basis)
Let (en) be an orthonormal basis in a Hilbert space H. Then for any x∈ H we have
x=∑n⟨ x,en ⟩en and ||x||2=∑n| ⟨ x,en ⟩ |2.
There are constructive existence theorems in mathematics.
An example of a pure existence statement
3.4 Construction of Orthonormal Sequences
Natural questions are: Do orthonormal sequences always exist?
Could we construct them?
Theorem 16 (Gram–Schmidt)
Let (xi) be a sequence of linearly independent vectors in an inner product space V. Then there exists an orthonormal sequence (ei) such that
Lin{x1,x2,…,xn}=Lin{e1,e2,…,en}, for all n.
Proof.
We give an explicit algorithm working by induction. The base of induction: the first vector is e1=x1/||x1||. The step of induction: suppose e1, e2, …, en are already constructed as required. Let yn+1=xn+1−∑i=1n⟨ xn+1,ei ⟩ei. Then by (20) yn+1 ⊥ ei for i=1,…,n. We may put en+1=yn+1/||yn+1|| because yn+1≠ 0 due to the linear independence of the xk’s. Also
Lin{e1,e2,…,en+1}=Lin{e1,e2,…,yn+1}=Lin{e1,e2,…,xn+1}=Lin{x1,x2,…,xn+1}.
So (ei) is an orthonormal sequence with the required property.
□
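The proof is literally an algorithm. A direct transcription in Python/NumPy for vectors in ℝn or ℂn (a sketch; the tolerance and all names are mine):

import numpy as np

def gram_schmidt(vectors):
    """Orthonormalise a sequence of linearly independent vectors (classical Gram-Schmidt)."""
    basis = []
    for x in vectors:
        # y_{n+1} = x_{n+1} - sum_i <x_{n+1}, e_i> e_i   (inner product linear in the first slot)
        y = x - sum(np.vdot(e, x) * e for e in basis)
        norm = np.linalg.norm(y)
        if norm < 1e-12:
            raise ValueError("vectors are not linearly independent")
        basis.append(y / norm)
    return np.array(basis)

E = gram_schmidt([np.array(v, dtype=float) for v in ([1, 1, 0], [1, 0, 1], [0, 1, 1])])
print(np.round(E @ E.conj().T, 10))   # the identity matrix: the e_i are orthonormal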
Example 17
Consider C[0,1] with the usual inner product (17) and apply orthogonalisation to the sequence 1, x, x2, …. Because ||1||=1 we have e1(x)=1. The continuation could be presented by the table (checked symbolically in the sketch below):
e1(x)=1
y2(x)=x−⟨ x,1 ⟩1=x−1/2,   ||y2||2=∫01 (x−1/2)2 d x=1/12,   e2(x)=√12 (x−1/2)
y3(x)=x2−⟨ x2,1 ⟩1−⟨ x2,x−1/2 ⟩(x−1/2)· 12,   …,   e3(x)=√180 (x2−x+1/6)
… … …
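The computation in the table can be delegated to a computer algebra system. A sketch in Python with SymPy, assuming (17) is the real inner product ⟨ f,g ⟩=∫01 f(x)g(x)d x:

import sympy as sp

x = sp.symbols('x')
inner = lambda f, g: sp.integrate(f * g, (x, 0, 1))    # real case, so no conjugation needed

basis = []
for p in (sp.Integer(1), x, x**2):
    y = p - sum(inner(p, e) * e for e in basis)        # subtract projections <p, e> e
    basis.append(sp.simplify(y / sp.sqrt(inner(y, y))))
print(basis)
# output is equivalent to 1, sqrt(12)*(x - 1/2), sqrt(180)*(x**2 - x + 1/6)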
Figure 8: The first five Legendre Pi and Chebyshev Ti polynomials
Example 18
Many famous sequences of orthogonal polynomials, e.g. Chebyshev, Legendre, Laguerre, Hermite, can be obtained by orthogonalisation of 1, x, x2, … with various inner products.
- Legendre polynomials in C[−1,1] with the inner product
⟨ f,g ⟩=∫−11 f(t)g(t) dt;     (21)
- Chebyshev polynomials in C[−1,1] with the inner product
⟨ f,g ⟩=∫−11 f(t)g(t) (1−t2)−1/2 dt;     (22)
- Laguerre polynomials in the space of polynomials P[0,∞) with the inner product
⟨ f,g ⟩=∫0∞ f(t)g(t) e−t dt.
See Figure 8 for the first five Legendre and Chebyshev polynomials. Observe the difference caused by the different inner products (21) and (22). On the other hand note the similarity in oscillating behaviour with different “frequencies”.
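NumPy ships both families, so the polynomials behind Figure 8 can be printed or plotted directly. A sketch; note that NumPy’s conventional normalisation Pi(1)=Ti(1)=1 differs from the unit-norm polynomials produced by orthogonalisation only by constant factors.

import numpy as np
from numpy.polynomial import Chebyshev, Legendre, Polynomial

for i in range(5):
    P = Legendre.basis(i).convert(kind=Polynomial).coef    # P_i in the ordinary power basis
    T = Chebyshev.basis(i).convert(kind=Polynomial).coef   # T_i in the ordinary power basis
    print(i, np.round(P, 3), np.round(T, 3))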
Another natural question is: When is an orthonormal sequence
complete?
Proposition 19
Let (en) be an orthonormal sequence in a Hilbert space H. The following are equivalent:
1. (en) is an orthonormal basis.
2. CLin((en))=H.
3. ||x||2=∑1∞| ⟨ x,en ⟩ |2 for all x∈ H.
Proof.
Clearly 1 implies 2 because, by Theorem 15, x=∑1∞⟨ x,en ⟩en and hence x∈CLin((en)). The same theorem gives ||x||2=∑1∞| ⟨ x,en ⟩ |2, so 1 implies 3.
If (en) is not complete then there exists x∈ H such that x≠ 0 and ⟨ x,ek ⟩=0 for all k, so 3 fails; consequently 3 implies 1.
Finally, if ⟨ x,ek ⟩=0 for all k (with x≠ 0 as above) then ⟨ x,y ⟩=0 for all y∈Lin((en)) and moreover for all y∈CLin((en)), by the Lemma on continuity of the inner product. But then x∉CLin((en)), since ⟨ x,x ⟩=0 is not possible, and 2 also fails. Thus 2 implies 1.
□
Corollary 20
A separable Hilbert space (i.e. one with a countable dense set) can be identified with either l2n or l2, in other words it has an orthonormal basis (en) (finite or infinite) such that
x=∑n⟨ x,en ⟩en and ||x||2=∑n| ⟨ x,en ⟩ |2.
Proof.
Take a countable dense set (xk), then H=CLin((xk)). Delete every vector which is a linear combination of the preceding ones, apply Gram–Schmidt orthonormalisation to the remaining set and apply the previous proposition.
□
Most pleasant compliments are usually orthogonal to our real qualities.
A piece of advice based on observations
3.5 Orthogonal complements
Orthogonality allows us to split a Hilbert space into subspaces which are “independent of each other” as much as possible.
Definition 21
Let M be a subspace of an inner product space V. The orthogonal complement of M, written M⊥, is
M⊥={x∈ V: ⟨ x,m ⟩=0 ∀ m∈ M}.
Theorem 22
If M is a closed subspace of a Hilbert space H then
M⊥ is a closed subspace too (hence a Hilbert space too).
Proof.
Clearly M⊥ is a subspace of H because x, y∈ M⊥ implies ax+by∈ M⊥:
⟨ ax+by,m ⟩= a⟨ x,m ⟩+ b⟨ y,m ⟩=0.
Also if all xn∈ M⊥ and xn→ x then x∈ M⊥, due to the Lemma on the inner product limit.
□
Theorem 23
Let M be a closed subspace of a Hilbert space H. Then for any x∈ H there exists a unique decomposition x=m+n with m∈ M, n∈ M⊥, and ||x||2=||m||2+||n||2. Thus H=M⊕ M⊥ and (M⊥)⊥=M.
Proof.
For a given x there exists a unique closest point m in M by the Theorem on the nearest point, and by the Theorem on the perpendicular (x−m)⊥ y for all y∈ M.
So x= m+(x−m)= m+n with m∈ M and n∈ M⊥. The identity ||x||2=||m||2+||n||2 is just Pythagoras’ theorem, and the decomposition is unique since M∩ M⊥={0}: the null vector is the only vector orthogonal to itself.
Finally, (M⊥)⊥=M. We have H=M⊕ M⊥=(M⊥)⊥⊕ M⊥; for any x∈(M⊥)⊥ there is a decomposition x=m+n with m∈ M and n∈ M⊥, but then ⟨ n,n ⟩=⟨ x−m,n ⟩=0, since both x and m are orthogonal to n∈ M⊥, so n is orthogonal to itself and therefore is zero, that is x=m∈ M.
□
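In the finite-dimensional case the decomposition is the familiar orthogonal projection onto a column space. A minimal Python/NumPy sketch (the matrix A and all names are mine):

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 2))
Q, _ = np.linalg.qr(A)            # orthonormal basis of M = column space of A
x = rng.normal(size=5)

m = Q @ (Q.T @ x)                 # component of x in M (orthogonal projection)
n = x - m                         # component of x in M-perp
print(np.allclose(Q.T @ n, 0))                      # n is orthogonal to every vector of M
print(np.allclose(x @ x, m @ m + n @ n))            # Pythagoras: ||x||^2 = ||m||^2 + ||n||^2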