Linear algebra — Quick Math Intuitions
Sharing quick intuitions for math ideas

Overdetermined and underdetermined systems of equations put simply
Systems of equations are an evolution of equations, but they are often misunderstood. This article aims at providing real-world examples and intuitions for systems of equations, and in particular for overdetermined and underdetermined systems.

Intuition for systems of equations

Intuitively, we can think of a system of equations as a set of requests. Imagine having a group of people in front of us and having to give a task to each of them. An informal example system could be the following:

  • Anna, solve a system of linear equations;
  • George, go to the beach and have fun;
  • Luke, prevent Anna from ringing social services.

In this form, a solution to the system consists of a list of person-task pairings that satisfies the demands detailed above. In other words, giving a solution to the system amounts to saying what Anna should do, what George should do, and what Luke should do, so that the demands are satisfied. In the example above, Anna should solve a system of equations, George should go to the beach, and Luke should prevent Anna from ringing social services.

It seems pretty obvious, but this intuition will be useful when covering over/underdetermined systems.

Overdetermined systems of equations

Let’s think of having to give orders to a large number of people. It might happen that, by the time we get to the last person, we have forgotten to whom we have already given an order and to whom not, and we end up repeating some orders:

  • Anna, do the laundry;
  • George, go to the beach;
  • Luke, get Anna’s laundry dirty;
  • Sophie, prevent Luke from dirtying the laundry;
  • George, go to the beach.

Here George has received his order twice. In these cases, we say that the system is overdetermined, because it has more orders than people. The example above is innocuous, because George is simply told to do the same thing twice. The simplest mathematical example of such a system is when two equations are proportional to each other:

\begin{cases} x = 1 \\ 2x = 2 \end{cases}

This is an overdetermined system with a solution: x=1. The second equation is just redundant, like a game in which the second rule states to follow the first.

How about the following instead:

  • Anna, do the laundry;
  • George, go to the beach;
  • Luke, get Anna’s laundry dirty;
  • Sophie, prevent Luke from dirtying the laundry;
  • George, bake a cake.

Here George gets two clashing orders, and is rightfully confused: he cannot go to the beach and bake a cake at the same time. He is going to disappoint us no matter what. Indeed, this system is not only overdetermined, because there are more orders than people, but also without solution. In fact, we are unable to come up with a list of person-task pairings as before. If George went to the beach, he would be ignoring the baking order; if he baked a cake, he would be ignoring the beach order. There is no way out: there is no solution! It is a bit like a game where the second rule says not to follow the first: it is impossible to play a game like that!

The simplest mathematical example of such a system is:

\begin{cases} x = 1 \\ x = 2 \end{cases}

which does not have a solution because we ask x to be 1 and 2 at the same time — a bit like asking your neighbor to be male and female at the same time (but not queer).

So once again: when a system of equations has more equations than unknowns, we say it is overdetermined. It means that too many rules are being imposed at once, and some of them may be conflicting. However, it is false to state that an overdetermined system never has a solution: it may or it may not. If the surplus commands are just reformulations of other orders, then there is no problem: the system does have a solution.
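
For the computationally minded, here is a minimal NumPy sketch (just one possible way to check this): it feeds both overdetermined systems above to a least-squares solver and then tests whether the best fit actually satisfies every equation.

```python
import numpy as np

# The consistent overdetermined system:  x = 1  and  2x = 2
A1 = np.array([[1.0], [2.0]]); b1 = np.array([1.0, 2.0])
# The conflicting overdetermined system: x = 1  and  x = 2
A2 = np.array([[1.0], [1.0]]); b2 = np.array([1.0, 2.0])

for A, b in [(A1, b1), (A2, b2)]:
    x, *_ = np.linalg.lstsq(A, b, rcond=None)  # best fit in the least-squares sense
    exact = np.allclose(A @ x, b)              # is the best fit actually a solution?
    print(x, "exact solution" if exact else "no exact solution")
# First system:  x = 1, exact solution (the extra equation was redundant).
# Second system: x = 1.5, no exact solution (the two demands conflict).
```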

Underdetermined systems of equations

If we give fewer orders than there are people, we say the system is underdetermined. When this happens, at least one person has not received any command. This time, the idea is that people who do not receive any command are free to do whatever they want.

For example, let’s imagine again that Anna, George, and Luke are lined up in front of us. If our commands are:

  • Anna, do the laundry;
  • George, go to the beach.

then Luke has not received any order. Maybe he will go to the park, maybe he will prevent Anna from ringing social services… he is free to do whatever he wants: the options are infinite! In these cases, we say that Luke is a free variable. As long as Anna and George stick to what they are told, each of Luke’s options makes for a solution: that is why the system has an infinite number of solutions.

As a mathematical example, think of being asked to find values for x,y,z satisfying the following system:

\begin{cases} x = 1 \\ y = 2 \end{cases}

Great, but what about z? Here z is a free variable and there are infinitely many solutions.
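
A similar NumPy sketch for the underdetermined case: the system only constrains x and y, so any value chosen for the free variable z yields a valid solution.

```python
import numpy as np

# x = 1, y = 2, nothing said about z: three unknowns, two equations.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
b = np.array([1.0, 2.0])

# Every choice of the free variable z gives a valid solution.
for z in [-3.0, 0.0, 7.5]:
    candidate = np.array([1.0, 2.0, z])
    print(z, np.allclose(A @ candidate, b))  # True for each z: infinitely many solutions
```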

However, there can also be underdetermined systems with no solution. That is the case when we give too few orders and some of them conflict with each other. Again with our favorite trio:

  • Anna, do the laundry;
  • George, go to the beach;
  • Anna, go to the park.

Not only are we not saying anything to Luke here, but we are also giving clashing orders to Anna. So even though the absence of commands to Luke would allow infinitely many solutions, the impossibility for Anna to satisfy her constraints means that no solution exists.

Examples and final remarks

All in all, there are no strict rules: simply counting equations and unknowns does not settle anything. What appears to be an overdetermined system could turn out to be an underdetermined one, and an underdetermined system could have no solution.

Finally, notice that in mathematical reality commands usually address more than one person at a time. A system of equations in real life is something like:

\begin{cases} x + y = 1 \\ x - y = 2 \end{cases}

Here the intuition gets trickier, because each command mixes at least two people and is hard to render in natural language. Still, the orders analogy is useful for understanding what underdetermined and overdetermined systems are and why they have infinitely many solutions or none.

Ex. 1 \begin{cases} x - 2y = 1 - z \\ x + z - 1 = 2y \\ x + z = 1 \\ x = 3 - z \end{cases}     in \mathbb{R}^3
An apparently overdetermined system which is actually underdetermined and does not even have a solution.

Ex. 2 \begin{cases} x + z = 1 \\ x = 3 - z \end{cases}     in \mathbb{R}^3
An underdetermined system (two equations, three unknowns) which does not have a solution.
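
Both exercises can be checked numerically by comparing the rank of the coefficient matrix with the rank of the augmented matrix (the standard solvability criterion for linear systems). A minimal NumPy sketch of mine:

```python
import numpy as np

def classify(A, b):
    """A x = b is solvable iff rank(A) == rank([A | b])."""
    rA = np.linalg.matrix_rank(A)
    rAb = np.linalg.matrix_rank(np.column_stack([A, b]))
    return rA, rAb, A.shape[1], rA == rAb  # (independent eqs, augmented rank, unknowns, solvable?)

# Ex. 1, rewritten with all unknowns on the left-hand side:
A1 = np.array([[1, -2, 1],
               [1, -2, 1],
               [1,  0, 1],
               [1,  0, 1]], dtype=float)
b1 = np.array([1, 1, 1, 3], dtype=float)

# Ex. 2:
A2 = np.array([[1, 0, 1],
               [1, 0, 1]], dtype=float)
b2 = np.array([1, 3], dtype=float)

print(classify(A1, b1))  # (2, 3, 3, False): only 2 independent equations, and inconsistent
print(classify(A2, b2))  # (1, 2, 3, False): 1 independent equation for 3 unknowns, inconsistent
```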

 

Projection methods in linear algebra numerics
Linear algebra classes often jump straight to the definition of a projector (as a matrix) when talking about orthogonal projections in linear spaces. As it often happens, though, it is not clear how that definition arises. This is what is covered in this post.

Orthogonal projection: how to build a projector

Case 1 – 2D projection over (1,0)

It is quite straightforward to understand that orthogonal projection over (1,0) can be practically achieved by zeroing out the second component of any 2D vector, at least if the vector is expressed with respect to the canonical basis \{ e_1, e_2 \}. Albeit an obvious statement, it is worth restating: the orthogonal projection of a 2D vector over (1,0) amounts to its first component alone.

How can this be put math-wise? Since we know that the dot product evaluates the similarity between two vectors, we can use that to extract the first component of a vector v. Once we have the magnitude of the first component, we only need to multiply that by e_1 itself, to know how much in the direction of e_1 we need to go. For example, starting from v = (5,6), first we get the first component as v \cdot e_1 = (5,6) \cdot (1,0) = 5; then we multiply this value by e_1 itself: 5e_1 = (5,0). This is in fact the orthogonal projection of the original vector. Writing down the operations we did in sequence, with proper transposing, we get

    \[e_1^T (e_1 v^T) = \begin{bmatrix} 1 \\ 0 \end{bmatrix} ([1, 0] \begin{bmatrix} 5 \\ 6 \end{bmatrix}) .\]
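
In code, the two steps (dot product, then rescaling the direction) look like this; a minimal NumPy sketch:

```python
import numpy as np

v = np.array([5.0, 6.0])
e1 = np.array([1.0, 0.0])

component = np.dot(v, e1)    # how much of v lies along e_1: here 5
projection = component * e1  # stretch e_1 by that amount
print(projection)            # [5. 0.]
```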

One simple and yet useful fact is that when we project a vector, its norm must not increase. This should be intuitive: the projection process either takes information away from a vector (as in the case above), or rephrases what is already there. In any case, it certainly does not add any. We may rephrase our opening fact with the following proposition:

PROP 1: ||v|| \geq ||Projection(v)||.

This can easily be seen through the Pythagorean theorem (and in fact it only holds for orthogonal projections, not oblique ones):

    \[||v||^2 = ||proj_u(v)||^2 + ||v - proj_u(v)||^2 \geq ||proj_u(v)||^2\]

Case 2 – 2D projection over (1,1)

Attempting to apply the same technique to an arbitrary projection target, however, does not seem to work. Suppose we want to project over (1,1). Repeating what we did above for a test vector [3,0], we would get

    \[\begin{bmatrix} 1 \\ 1 \end{bmatrix} ([3, 0] \begin{bmatrix} 1 \\ 1 \end{bmatrix}) =  [3,3].\]

This violates the previously discovered fact that the norm of the projection should be \leq the original norm, so it must be wrong. In fact, visual inspection reveals that the correct orthogonal projection of [3,0] is [\frac{3}{2}, \frac{3}{2}].

The caveat here is that the vector onto which we project must have norm 1. This is vital every time we care about the direction of something but not its magnitude, as in this case. Normalizing [1,1] yields [\frac{1}{\sqrt 2}, \frac{1}{\sqrt 2}]. Projecting [3,0] over [\frac{1}{\sqrt 2}, \frac{1}{\sqrt 2}] is obtained through

    \[\begin{bmatrix} \frac{1}{\sqrt 2} \\ \frac{1}{\sqrt 2} \end{bmatrix} ([3, 0] \begin{bmatrix} \frac{1}{\sqrt 2} \\ \frac{1}{\sqrt 2} \end{bmatrix}) =  [\frac{3}{2}, \frac{3}{2}],\]

which now is indeed correct!

PROP 2: The vector onto which we project must be a unit vector (i.e. a norm 1 vector).
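
A minimal NumPy sketch makes the role of normalization explicit, using PROP 1 as a sanity check:

```python
import numpy as np

v = np.array([3.0, 0.0])
d = np.array([1.0, 1.0])

naive = np.dot(v, d) * d   # forgets to normalize: (3, 3), the norm grows
u = d / np.linalg.norm(d)  # unit vector (1/sqrt(2), 1/sqrt(2))
correct = np.dot(v, u) * u # (1.5, 1.5), the true orthogonal projection

print(naive, np.linalg.norm(naive) <= np.linalg.norm(v))      # [3. 3.] False
print(correct, np.linalg.norm(correct) <= np.linalg.norm(v))  # [1.5 1.5] True
```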

Case 3 – 3D projection onto a plane

A good thing to think about is what happens when we want to project onto more than one vector. For example, what happens if we project a point in 3D space onto a plane? The idea is pretty much the same, and the technicalities amount to stacking in a matrix the vectors that span the plane onto which we project.

Suppose we want to project the vector v = [5,7,9] onto the plane spanned by \{ [1,0,0], [0,1,0] \}. The steps are the same: we still need to know how similar v is to each of the two individual vectors, and then magnify those similarities in the respective directions.

    \[\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 5 \\ 7 \\ 9 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 5 \\ 7 \end{bmatrix} = 5 \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + 7 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 5 \\ 7 \\ 0 \end{bmatrix}\]

The only difference from the previous cases is that the vectors onto which we project are put together in matrix form, arranged so that the operations we end up performing are the same as in the single-vector cases.
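
The same computation as a NumPy sketch, with the spanning vectors stacked as rows as in the formula above:

```python
import numpy as np

v = np.array([5.0, 7.0, 9.0])
Z = np.array([[1.0, 0.0, 0.0],   # the orthonormal vectors spanning the plane,
              [0.0, 1.0, 0.0]])  # stacked as rows

similarities = Z @ v             # how much of v lies along each spanning vector: (5, 7)
projection = Z.T @ similarities  # recombine those amounts back in 3D space
print(projection)                # [5. 7. 0.]
```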

The rise of the projector

As we have seen, the projection of a vector v over a set of orthonormal vectors, stacked as the rows of a matrix Z, is obtained as

    \[Projection_Z(v) = Z^T Z v^T .\]

So far, we have always performed the last product Z v^T first, taking advantage of associativity. It should come as no surprise that we can also do it the other way around: first compute Z^T Z, and only afterwards multiply the result by v^T. This Z^T Z makes up the projection matrix. However, the idea is much more understandable when written in the expanded form, as it shows the process which leads to the projector.

THEOREM 1: The projection of v over an orthonormal basis Z is

    \[Projection_Z(v) = Z^T Z v^T = \underbrace{P}_{Projector} v^T .\]

So here it is: take any basis of whatever linear space, make it orthonormal, stack it in a matrix, multiply it by itself transposed, and you get a matrix whose action is to drop any vector of the higher-dimensional space onto that subspace. Neat.
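
A minimal NumPy sketch of the recipe: stack an orthonormal basis in a matrix, form Z^T Z, and apply the resulting projector.

```python
import numpy as np

# Orthonormal basis of the target subspace, stacked as rows (as in the text).
Z = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

P = Z.T @ Z                   # the projector onto the plane spanned by Z's rows
v = np.array([5.0, 7.0, 9.0])
print(P @ v)                  # [5. 7. 0.], the same result as the expanded computation
```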

Projector matrix properties

  • The norm of the projected vector is less than or equal to the norm of the original vector.
  • A projection matrix is idempotent: once projected, further projections don’t do anything else. This, in fact, is the only requirement that defines a projector, and it is the definition you find in textbooks: P^2 = P. The other fundamental property we required in the previous examples, i.e. that the projection basis is orthonormal, corresponds to the projection being orthogonal; if the projection is orthogonal, as we have assumed up to now, then we must also have P = P^T.
  • The eigenvalues of a projector are only 1 and 0. For an eigenvalue \lambda,

        \[\lambda v = Pv = P^2v = \lambda Pv = \lambda^2 v \Rightarrow \lambda = \lambda^2 \Rightarrow \lambda \in \{0,1\}\]

  • There exists a basis X of \mathbb{R}^N in which P takes the block form \begin{bmatrix} I_k & 0 \\ 0 & 0_{N-k} \end{bmatrix}, with k being the rank of P. If we further decompose X = [X_1, X_2], with X_1 being N \times k and X_2 being N \times (N-k), the existence of this basis shows that P sends \mathbb{R}^N onto Im(X_1) = Im(P), while Ker(P) is spanned by X_2. It also shows that \mathbb{R}^N = Im(P) \oplus Ker(P).
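
These properties are easy to verify numerically; here is a small NumPy check on the same plane as before:

```python
import numpy as np

Z = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
P = Z.T @ Z

print(np.allclose(P @ P, P))            # True: idempotent, P^2 = P
print(np.allclose(P, P.T))              # True: symmetric, as expected for an orthogonal projector
print(np.round(np.linalg.eigvalsh(P)))  # [0. 1. 1.]: eigenvalues are only 0s and 1s
print(np.linalg.matrix_rank(P))         # 2: the dimension of the subspace projected onto
```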

Model Order Reduction

Is there any application of projection matrices to applied math? Indeed.

It is often the case (or, at least, the hope) that the solution to a differential problem lies in a low-dimensional subspace of the full solution space. If some \textbf{w}(t) \in \mathbb{R}^N is the solution to the Ordinary Differential Equation

    \begin{equation*} \frac{d\textbf{w}(t)}{dt} = \textbf{f}(\textbf{w}(t), t) \end{equation*}

then there is hope that there exists some subspace \mathcal{S} \subset \mathbb{R}^N, with dim(\mathcal{S}) < N, in which the solution lives. If that is the case, we may rewrite it as

    \[\textbf{w}(t) = \textbf{V}_\mathcal{S}\textbf{q}(t)\]

for some appropriate coefficients (q_i(t)), which are the components of \textbf{w}(t) over the basis \textbf{V}_\mathcal{S}.
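
As a toy NumPy sketch of this representation (assuming, for simplicity, that the basis \textbf{V}_\mathcal{S} has orthonormal columns, which the discussion above does not strictly require; the subspace and the numbers N, k are made up for illustration): a state living in a k-dimensional subspace of \mathbb{R}^N is fully described by just k coefficients.

```python
import numpy as np

N, k = 6, 2
rng = np.random.default_rng(0)

# A made-up orthonormal basis of a k-dimensional subspace of R^N (illustration only).
V, _ = np.linalg.qr(rng.standard_normal((N, k)))

# A state that happens to live in that subspace...
q_true = np.array([2.0, -1.0])
w = V @ q_true

# ...is recovered exactly from its k reduced coordinates.
q = V.T @ w                   # works because V has orthonormal columns
print(np.allclose(V @ q, w))  # True: w = V q, N numbers compressed into k
```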

Assuming that the basis \textbf{V} itself is time-invariant, and that in general \textbf{Vq}(t) will be a good but not perfect approximation of the real solution, the original differential problem can be rewritten as:

    \begin{equation*} \begin{split} \frac{d}{dt}\textbf{Vq(t)} =  \textbf{f}(Vq(t), t) + \textbf{r}(t) \\ \textbf{V}\frac{d}{dt}\textbf{q(t)} =  \textbf{f}(Vq(t), t) + \textbf{r}(t) \\ \end{split} \end{equation*}

where \textbf{r}(t) is the residual, i.e. the approximation error.

Diagonalizing a matrix NOT having full rank: what does it mean?
This is going to be a quick intuition about what it means to diagonalize a matrix that does not have full rank (i.e. has null determinant).

Every matrix can be seen as a linear map between vector spaces. Stating that a matrix is similar to a diagonal matrix amounts to stating that there exists a basis of the source vector space in which the linear transformation can be seen as a simple stretching, a re-scaling, of the space. In other words, diagonalizing a matrix (at least a symmetric one, as in the example below) is the same as finding an orthogonal grid that is transformed into another orthogonal grid. I recommend this article from AMS for good visual representations of the topic.

[Image taken from AMS – We Recommend a Singular Value Decomposition]

Diagonalization of non-full-rank matrices

That’s all right: when we have a matrix from \mathbb{R}^3 to \mathbb{R}^3, if it can be diagonalized, we can find a basis in which the transformation is a re-scaling of the space, fine.

But what does it mean to diagonalize a matrix that has null determinant? The associated transformation kills at least one dimension: indeed, an n×n matrix of rank k lowers the output dimension by n-k. For example, a 3×3 matrix of rank 2 will have an image of dimension 2 instead of 3. This happens because two basis vectors are sent to the same vector in the output, so one dimension is bound to collapse.

Let’s consider the sample matrix

    \[A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}\]

which does not have full rank, because it has two equal rows. Indeed, one can check that the two basis vectors (1,0,0) and (0,0,1) are sent to the same vector. This means that dim(Im(f_A))=2 instead of 3. In fact, it is common intuition that when the rank is not full, some dimensions are lost in the transformation. Even if it’s a 3×3 matrix, the output only has 2 dimensions. It’s like at the end of Interstellar when the 4D space in which Cooper is floating gets shut.

However, A is also a symmetric matrix, so from the spectral theorem we know that it can be diagonalized. And now to the vital questions: what do we expect? What meaning does it have? Do we expect a basis of three vectors even if the map destroys one dimension?

Pause and ponder.

Diagonalize the matrix A and, indeed, you obtain three eigenvalues:

    \[\det(A - \lambda I) = \det \begin{bmatrix} -\lambda & 1 & 0 \\ 1 & -\lambda & 1 \\ 0 & 1 & -\lambda \end{bmatrix} = -\lambda^3 + 2\lambda = -\lambda(\lambda^2 - 2)\]

The eigenvalues are thus 0, \sqrt{2} and -\sqrt{2}, each giving a different eigenvector. Taken all together, the eigenvectors form an orthogonal basis of \mathbb{R}^3. The fact that 0 is among the eigenvalues is important: it means that all the vectors belonging to the associated eigenspace go to the same value: zero. This is the mathematical representation of the fact that one dimension collapses.
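
A quick numerical confirmation with NumPy: the eigenvalues come out as 0 and \pm\sqrt{2}, the eigenvectors form an orthonormal basis of \mathbb{R}^3, and yet the rank of A is only 2.

```python
import numpy as np

A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

eigenvalues, eigenvectors = np.linalg.eigh(A)  # A is symmetric, so eigh applies
print(eigenvalues)               # approximately [-1.414, 0, 1.414], i.e. -sqrt(2), 0, sqrt(2)
print(np.linalg.matrix_rank(A))  # 2: one dimension collapses
print(np.allclose(eigenvectors.T @ eigenvectors, np.eye(3)))  # True: orthonormal eigenbasis
```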

At first, I naively thought that, since the transformation destroys one dimension, I should expect to find a 2D basis of eigenvectors. But this was because I confused the source of the map with its image! The point is that we can still find a basis of the source space from the perspective of which the transformation is just a re-scaling of the space. However, that alone doesn’t tell us whether the transformation preserves all dimensions: it is possible that two vectors of the basis go to the same vector in the image!

In fact, since the first and third columns of A are equal (and so are the rows, A being symmetric), the basis vectors (1,0,0) and (0,0,1) both go into (0,1,0). A basis of Im(f_A) is simply {(0,1,0), (1,0,1)}, and we should not be surprised that those vectors have three entries: two vectors, even with three coordinates each, only span a 2D space. Indeed, any vector in their span is identified by just two coordinates with respect to that basis, which confirms that dim(Im(f_A))=2.

Quick method to find line of shortest distance for skew lines
In linear algebra it is sometimes necessary to find the equation of the line of shortest distance between two skew lines. What follows is a very quick method for finding that line.

Let’s consider an example. Start with two simple skew lines:

s : \begin{cases} x = 1 + t \\ y = 0 \\ z = -t \end{cases}

r : \begin{cases} x = - k \\ y = k + 2 \\ z = k \end{cases}

(Observation: don’t make the mistake of using the same parameter for both lines. Each line exists on its own, there’s no link between them, so there’s no reason why they should be described by the same parameter. If this doesn’t seem convincing, take two lines you know to be intersecting, use the same parameter for both, and try to find the intersection point.)

The directional vectors are:

V_{s} = (1, 0, -1), V_{r} = (- 1, 1, 1)

So they clearly aren’t parallel. They aren’t incident either: any intersection point would need y = 0, since s has y = 0 everywhere, but r only reaches y = 0 at (2, 0, -2), which doesn’t belong to s. It does indeed make sense to look for the line of shortest distance between the two, confident that we will find a non-zero result.

The idea is to consider the vector connecting the two lines at their generic points and then force it to be perpendicular to both lines.
We will call the line of shortest distance t. In our case, the vector between the generic points (obtained as the difference of the generic points of the two lines, taken in their parametric forms) is:

V_{t} = (1 + t + k, -k - 2, -t - k)

Imposing perpendicularity gives us:

V_{t} \cdot V_{s} = (1 + t + k, -k - 2, -t - k) \cdot (1, 0, -1) = 1 + 2t + 2k = 0

V_{t} \cdot V_{r} = (1 + t + k, -k - 2, -t - k) \cdot (- 1, 1, 1) = - 3 - 2t - 3k = 0

Solving the two simultaneous linear equations we obtain as solution (t, k) = (\frac{3}{2}, -2).

This solution allows us to quickly get three results:

  1. The direction of the line of shortest distance between the two skew lines: just replace t and k in V_{t} with the values found. In our case, V_{t} = (\frac{1}{2}, 0, \frac{1}{2}). Together with either of the points below, this determines the line.
  2. The intersection point between t and s: just replace t = \frac{3}{2} in the parametric equation of s. In our case, we get the point (\frac{5}{2}, 0, -\frac{3}{2}).
  3. The intersection point between t and r: just replace k = -2 in the parametric equation of r. In our case, we get the point (2, 0, -2).
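
The whole computation can be double-checked with a short NumPy sketch that re-derives the numbers above:

```python
import numpy as np

# Generic points on s and r as functions of their own parameters t and k.
def point_s(t): return np.array([1 + t, 0.0, -t])
def point_r(k): return np.array([-k, k + 2.0, k])

Vs = np.array([1.0, 0.0, -1.0])
Vr = np.array([-1.0, 1.0, 1.0])

# Perpendicularity of the connecting vector to both lines, as two linear equations in (t, k).
M = np.array([[2.0, 2.0],     # from V_t . V_s = 1 + 2t + 2k = 0
              [-2.0, -3.0]])  # from V_t . V_r = -3 - 2t - 3k = 0
rhs = np.array([-1.0, 3.0])
t, k = np.linalg.solve(M, rhs)
print(t, k)                   # 1.5 -2.0

foot_s, foot_r = point_s(t), point_r(k)
print(foot_s, foot_r)         # [2.5 0. -1.5] [2. 0. -2.]
print(foot_s - foot_r)        # direction of the common perpendicular: [0.5 0. 0.5]
print(np.dot(foot_s - foot_r, Vs), np.dot(foot_s - foot_r, Vr))  # both 0, as required
```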

Do you have a quicker method? Share it in the comments!
