The post What is the Rossby number? appeared first on Quick Math Intuitions.

Consider two quantities $L$ and $U$, with $L$ being a characteristic length scale of the phenomenon (e.g. distance between two peaks, distance between two isobars, length of the simulation domain) and $U$ the horizontal velocity scale of the motion. The ratio $L/U$ is the time it takes the motion to cover a distance $L$ with velocity $U$. *If this time is bigger than the period of earth’s rotation, then the phenomenon IS affected by the rotation.*

So if $\frac{L}{U} \gtrsim \frac{1}{\Omega}$, with $\Omega$ the earth’s rotation rate, then the phenomenon IS a large-scale one. Thus we can define $Ro = \frac{U}{\Omega L}$ and say that for $Ro \lesssim 1$ a phenomenon is large scale. Phenomena with small Rossby number are dominated by the Coriolis force, while those with large Rossby number are dominated by inertial forces (e.g. a tornado). However, rotational effects depend on latitude (they are strongest near the poles and vanish at the equator), so the Rossby number can be different depending on where on earth we are.

(Notice that the rotation rate in the definition should in theory be $2\Omega \sin\varphi$, with $\Omega$ being the earth’s angular velocity and $\varphi$ the latitude. In the geophysical context, flows are mostly horizontal (also due to density stratification in both atmosphere and ocean), so only the local vertical component of the rotation matters; away from the equator it has the same order of magnitude as $\Omega$. Notation varies, but $2\Omega \sin\varphi$ is commonly written $f$ and called the Coriolis frequency.)
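To get a feeling for the orders of magnitude, here is a minimal Python sketch. It assumes the Rossby number is computed as $U / (\Omega L)$, with $\Omega$ equal to $2\pi$ divided by one sidereal day. The function name and the two sets of scales (a tornado and a mid-latitude weather system) are illustrative guesses, not measured data.

```python
import math

# Earth's rotation rate in rad/s: one full turn per sidereal day (about 86164 s).
EARTH_ROTATION_RATE = 2 * math.pi / 86164.0


def rossby_number(velocity_scale, length_scale, omega=EARTH_ROTATION_RATE):
    """Rossby number U / (Omega * L), with U in m/s and L in m."""
    return velocity_scale / (omega * length_scale)


# Illustrative scales only (assumptions, not observations).
tornado = rossby_number(velocity_scale=50.0, length_scale=100.0)
weather_system = rossby_number(velocity_scale=10.0, length_scale=1.0e6)

print(f"tornado:        Ro = {tornado:.1f}")         # much larger than 1
print(f"weather system: Ro = {weather_system:.3f}")  # smaller than 1
```

With these scales the tornado sits far above 1 (rotation of the earth is irrelevant), while the weather system sits below 1 (rotation matters).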


The post How do Dirichlet and Neumann boundary conditions affect Finite Element Methods variational formulations? appeared first on Quick Math Intuitions.

To solve a second-order differential problem with FEM, we first need to derive its weak formulation. This is achieved by multiplying the equation by a test function $v$ and then integrating by parts to get rid of second order derivatives:

(1)

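A typical FEM problem then reads like: find $u \in V$ such that $a(u, v) = F(v)$ for all $v \in V$, where $a(\cdot, \cdot)$ collects the terms containing both $u$ and $v$, and $F$ the terms containing $v$ only.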

What is the difference between imposing Dirichlet boundary conditions (e.g. $u(0) = u(1) = 0$) and Neumann ones ($u'(0) = u'(1) = 0$) from a math perspective? **Dirichlet conditions go into the definition of the space $V$, while Neumann conditions do not. Neumann conditions only affect the variational problem formulation directly.**

For example, in one dimension, adding the Dirichlet condition $u(0) = u(1) = 0$ results in the function space change $V = H^1 \rightarrow V = H^1_0$. With this condition, the boundary term would also zero out in the variational problem, because the test function $v$ belongs to $H^1_0$ and therefore vanishes on the boundary.

On the other hand, by adding the Neumann condition $u'(0) = u'(1) = 0$, the space $V$ does not change, even though the boundary term vanishes from the variational problem in the same way as for the Dirichlet condition. However, that term goes to zero not because of the test function anymore, but because of the value of the derivative $u'$ on the boundary. If the Neumann condition had specified a non-zero value, such as $u'(1) = g \neq 0$, then the boundary term would not zero out!

In other words, **Dirichlet conditions have the effect of further constraining the solution function space**, while Neumann conditions only affect the equations.
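As a worked sketch of the argument, assume the model problem $-u'' + u = f$ on $(0, 1)$. This particular equation is an assumption made here for illustration; any second-order problem behaves the same way at the boundary. Multiplying by a test function $v$ and integrating by parts gives:

```latex
\begin{aligned}
\int_0^1 \left( -u'' + u \right) v \, dx &= \int_0^1 f v \, dx \\
\int_0^1 u' v' \, dx + \int_0^1 u v \, dx - \big[ u' v \big]_0^1 &= \int_0^1 f v \, dx
\end{aligned}
```

With the Dirichlet condition $u(0) = u(1) = 0$ we take $u, v \in H^1_0(0, 1)$, so $v(0) = v(1) = 0$ removes the bracket. With the Neumann condition $u'(0) = 0$, $u'(1) = g$ we keep $u, v \in H^1(0, 1)$, and the bracket becomes $g \, v(1)$, which moves to the right-hand side as extra data: $\int_0^1 u' v' \, dx + \int_0^1 u v \, dx = \int_0^1 f v \, dx + g \, v(1)$.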


The post A gentle (and short) introduction to Gröbner Bases appeared first on Quick Math Intuitions.

Taken from my report for a Computer Algebra course.

We know there are plenty of methods to solve a system of linear equations (to name a few: Gauss elimination, QR or LU factorization). In fact, it is straightforward to check whether a linear system has any solutions, and if it does, how many of them there are. But what if the system is made of *non-linear* equations? The invention of Groebner bases and the field of computational algebra came up to answer these questions.

In this text we will recap the theory behind single-variable polynomials and extend it to multiple-variable ones, ultimately getting to the definition of Groebner bases.

In some cases, the transition from one to multiple variables is smooth and pretty much an extension of the simple case (for example for the Greatest Common Divisor algorithm). In other cases, however, there are conceptual jumps to be made. To give an example, a non-zero single-variable polynomial always has a finite number of roots, while this does not hold for multivariable polynomials. Intuitively, the reason is that a polynomial in one variable describes a curve in the plane, which can only intersect the x-axis a discrete and finite number of times. On the other hand, a polynomial in two variables describes a surface in space, which will typically intersect the 0-plane in a continuum of points.

All throughout these notes, it will be important to have in mind some basic algebra definitions.

To begin with, we ask what is the most basic (but useful) structure we can put on a set. We ask, for example, given the set of natural numbers, what do we need to do to allow basic manipulation (i.e. summation)? This leads us to the definition of *group*.

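**DEF 1**: A group is made of a set $G$ with one binary operation $+$ such that:

- The operation is closed: $a + b \in G \quad \forall a, b \in G$
- The operation is associative: $(a + b) + c = a + (b + c) \quad \forall a, b, c \in G$
- The operation has an identity element $0$ s.t. $a + 0 = 0 + a = a \quad \forall a \in G$
- Each element has an inverse element: $\forall a \in G \ \exists (-a) \in G$ s.t. $a + (-a) = (-a) + a = 0$

A group is usually denoted with $(G, +)$.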

Notice that we did not ask anything about commutativity!
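For a finite set, the four axioms can be checked by brute force. The following is a small Python sketch; the function name `is_group` and the three test cases are chosen here for illustration.

```python
from itertools import product


def is_group(elements, op):
    """Brute-force check of the four group axioms on a finite set."""
    elements = list(elements)
    # Closure: a op b must stay inside the set.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # Associativity: (a op b) op c == a op (b op c).
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    # Identity: some e with a op e == e op a == a for every a.
    identities = [e for e in elements
                  if all(op(a, e) == a and op(e, a) == a for a in elements)]
    if not identities:
        return False
    e = identities[0]
    # Inverses: every a needs some b with a op b == b op a == e.
    return all(any(op(a, b) == e and op(b, a) == e for b in elements)
               for a in elements)


print(is_group(range(5), lambda a, b: (a + b) % 5))     # True: addition modulo 5
print(is_group(range(5), lambda a, b: a + b))           # False: plain addition leaves {0,...,4}
print(is_group(range(1, 5), lambda a, b: (a * b) % 5))  # True: non-zero residues, product modulo 5
```

The second case fails already at closure, which is the same reason why a finite chunk of the natural numbers with ordinary addition is not a group.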

Then, the notion of group can be made richer and more complex: first into that of *ring*, then into that of *field*.

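**DEF 2**: A ring is a group $(G, +)$ with an extra operation $*$ which satisfies the following properties:

- The operation $+$ is commutative: $a + b = b + a$
- The operation $*$ is closed: $a * b \in G$
- The operation $*$ has an identity element $1$ s.t. $a * 1 = 1 * a = a$
- The operation $*$ is associative: $(a * b) * c = a * (b * c)$
- The operation $*$ is distributive with respect to $+$: $a * (b + c) = a * b + a * c$ and $(b + c) * a = b * a + c * a$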

**DEF 3**: A field is a ring in which $*$ is commutative and all non-zero elements have an inverse with respect to the operation $*$.

All throughout these notes, the symbol $\mathbb{K}$ will denote a field.

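**DEF 4**: A monomial is a product $x_1^{\alpha_1} \cdots x_n^{\alpha_n}$, with $\alpha_i \in \mathbb{N}$. Its degree is the sum of the exponents.

**DEF 5**: A polynomial is a linear combination of monomials, with coefficients in $\mathbb{K}$.

We conclude by noting that the space of polynomials with coefficients taken from a field $\mathbb{K}$ makes a ring, denoted with $\mathbb{K}[x_1, \dots, x_n]$.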

Our first step towards formalizing the theory for non-linear systems is to understand what the space of solutions looks like. Just as linear spaces are the solution spaces of linear systems, there is something analogous for non-linear systems, and that is affine varieties.

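**DEF 6**: Given polynomials $f_1, \dots, f_s$ in $\mathbb{K}[x_1, \dots, x_n]$, the affine variety over them is the set of their common roots:

$$V(f_1, \dots, f_s) = \{ (a_1, \dots, a_n) \in \mathbb{K}^n : f_i(a_1, \dots, a_n) = 0 \ \ \forall i = 1, \dots, s \}$$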

**EX 1**:

When working with rings, as it is our case, the notion of ideal is important. The reason for its importance is that ideals turn out to be kernels of ring homomorphisms — or, in other words, that they are the “good sets” that can be used to take ring quotients.

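**DEF 7**: An ideal is a non-empty subset $I \subseteq \mathbb{K}[x_1, \dots, x_n]$ such that:

- it is closed w.r.t +: $f + g \in I \quad \forall f, g \in I$
- it is closed w.r.t * for elements in the ring: $h * f \in I \quad \forall f \in I, \ \forall h \in \mathbb{K}[x_1, \dots, x_n]$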

Given some elements of a ring, we might wonder how to build the smallest ideal that contains them.

**DEF 8**: Given polynomials $f_1, \dots, f_s$, the ideal generated by them is the set of combinations with coefficients taken from the ring:

$$\langle f_1, \dots, f_s \rangle = \left\{ \sum_{i=1}^{s} h_i f_i : h_i \in \mathbb{K}[x_1, \dots, x_n] \right\}$$

Having introduced ideals, we immediately find a result that is linked to our purpose of non-linear systems inspection: a simple way to check if a system has solutions or not.

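**THEO 1**: If $1 \in \langle f_1, \dots, f_s \rangle$, then $V(f_1, \dots, f_s) = \emptyset$.

**PROOF:** Since $1 \in \langle f_1, \dots, f_s \rangle$, it must be possible to write it as a combination of the form $1 = \sum_{i=1}^{s} h_i f_i$. Now, if we suppose that $V(f_1, \dots, f_s)$ is not empty, then one of its points $a$ is a root of all the $f_i$. This would mean that $1 = \sum_{i=1}^{s} h_i(a) f_i(a) = 0$, which is absurd.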

Groebner bases give a computational method for solving non-linear systems of equations through an apt sequence of intersection of ideals. To state its definition, we first need to know what a *monomial ordering* is. Intuitively, we can think of such an ordering as a way to compare monomials — the technical definition does not add much more concept. Different orderings are possible.

Once we have a way of ordering monomials, it is also possible to define the leading monomial (denoted as $LM(f)$) of a given polynomial $f$. For single-variable polynomials it is pretty straightforward (it is the monomial of highest degree), but for the multivariate case we need to fix an ordering first (some possible options are: lexicographic, graded lexicographic, graded reverse lexicographic).
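To see how the leading monomial depends on the ordering, here is a short Python sketch. Monomials are represented by their exponent vectors, and each ordering is expressed as a sort key. The helper names and the sample polynomial $4xy^2z + 4z^2 - 5x^3 + 7x^2z^2$ are choices made here for illustration, with the variables ordered as $x > y > z$.

```python
def lex_key(exponents):
    """Lexicographic: compare exponents left to right."""
    return tuple(exponents)


def grlex_key(exponents):
    """Graded lexicographic: total degree first, then lexicographic."""
    return (sum(exponents), tuple(exponents))


def grevlex_key(exponents):
    """Graded reverse lexicographic: total degree first, then the monomial
    with the smaller exponent in the last differing variable wins."""
    return (sum(exponents), tuple(-e for e in reversed(exponents)))


def leading_monomial(monomials, key):
    """Largest exponent vector with respect to the given ordering."""
    return max(monomials, key=key)


# Exponent vectors (a, b, c) stand for x^a y^b z^c.
# Monomials of 4xy^2z + 4z^2 - 5x^3 + 7x^2z^2 (coefficients do not matter here).
monomials = [(1, 2, 1), (0, 0, 2), (3, 0, 0), (2, 0, 2)]

print(leading_monomial(monomials, lex_key))      # (3, 0, 0) -> x^3
print(leading_monomial(monomials, grlex_key))    # (2, 0, 2) -> x^2 z^2
print(leading_monomial(monomials, grevlex_key))  # (1, 2, 1) -> x y^2 z
```

The same polynomial has three different leading monomials under the three orderings, which is why any statement about leading monomials only makes sense once the ordering is fixed.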

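**DEF 9**: Given a monomial ordering, a Groebner basis of an ideal $I$ w.r.t the ordering is a finite subset $G = \{g_1, \dots, g_t\} \subseteq I$ s.t. $\langle LM(g_1), \dots, LM(g_t) \rangle = \langle LM(I) \rangle$, where $LM(I)$ is the set of leading monomials of the elements of $I$.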

This basis is a generating set for the ideal, but notice how *it depends on the ordering*! Finally, it is possible to prove that every ideal has a Groebner basis (Hilbert’s basis theorem).

From here on, the rationale is that, given a system of polynomial equations, we can see the polynomials as generators of some ideal. That ideal *will have* a Groebner basis, and there is an algorithm to build one (Buchberger’s algorithm). From there, apt ideal operations allow us to solve the system by eliminating the variables.

We now describe this elimination algorithm with an example:

(1)

Given the ideal

then a Groebner basis with respect to the lex (lexicographic) order is

(2)

which can be used to compute the solutions of the initial system (1).

To do so, first consider the ideal $I \cap \mathbb{K}[z]$, which practically corresponds to all polynomials in $I$ where $x, y$ are not present. In our case, we are left with only one element of the basis, which only involves $z$; its roots are the candidate values for $z$.

The values for $z$ can then be used to find the possible values for $y$ using the basis polynomial that only involves $y, z$. Finally, once the possible values for $y$ and $z$ are known, they can be used to find the corresponding values for $x$ through the remaining basis polynomial.

This example will yield the following solutions:

(3)
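The back-substitution can be sketched in a few lines of Python. The system used here, $x^2 + y^2 + z^2 = 1$, $x^2 + z^2 = y$, $x = z$, is a classic textbook example chosen for illustration and is not necessarily the one in (1). For it, a lex Groebner basis (with $x > y > z$) is $\{x - z,\ y - 2z^2,\ 4z^4 + 2z^2 - 1\}$: substituting $x = z$ into the second equation gives $y = 2z^2$, and substituting both into the first one gives the quartic in $z$ alone.

```python
import cmath


def original_system(x, y, z):
    """Left-hand sides of the assumed system; all three must vanish."""
    return (x**2 + y**2 + z**2 - 1, x**2 + z**2 - y, x - z)


def solve_by_elimination():
    """Solve the system through its lex Groebner basis, last variable first."""
    solutions = []
    # Step 1: 4 z^4 + 2 z^2 - 1 = 0 is a quadratic in w = z^2.
    discriminant = cmath.sqrt(2**2 + 4 * 4 * 1)
    for w in ((-2 + discriminant) / 8, (-2 - discriminant) / 8):
        for z in (cmath.sqrt(w), -cmath.sqrt(w)):
            # Step 2: y - 2 z^2 = 0 gives y once z is known.
            y = 2 * z**2
            # Step 3: x - z = 0 gives x.
            x = z
            solutions.append((x, y, z))
    return solutions


for x, y, z in solve_by_elimination():
    residual = max(abs(r) for r in original_system(x, y, z))
    print(f"x = {x:.4f}, y = {y:.4f}, z = {z:.4f}, residual = {residual:.1e}")
```

Each candidate is plugged back into the original (not the reduced) equations, so a tiny residual confirms that the basis and the system share the same solutions. A computer algebra system can produce the basis itself; SymPy, for example, provides a `groebner` function.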


The post What is the different between Finite Differences and Finite Element Methods? appeared first on Quick Math Intuitions.

With Finite Elements, we approximate the solution as a (finite) sum of functions defined on the discretized space. These functions make up a basis of the space, and the most commonly used are the *hat functions*. We end up with a linear system whose unknowns are the weights associated with each of the basis functions: i.e., how much does each basis function count for our particular solution to our particular problem?

Put bluntly, it is finding the values of the solution function at the grid points (finite differences) vs. finding the weights of the linear combination of hat functions (finite elements).
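The contrast can be made concrete with a small Python sketch. It assumes the model problem $-u'' = 1$ on $(0, 1)$ with $u(0) = u(1) = 0$ (exact solution $u(x) = x(1 - x)/2$) on a uniform grid; the problem, the grid size and all function names are choices made here for illustration.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


n = 9                                    # number of interior grid points
h = 1.0 / (n + 1)                        # grid spacing
nodes = [(i + 1) * h for i in range(n)]

# Finite differences: the unknowns are the values u(x_i);
# (-u_{i-1} + 2 u_i - u_{i+1}) / h^2 = 1 at every interior node.
fd_values = solve_tridiagonal([-1 / h**2] * n, [2 / h**2] * n,
                              [-1 / h**2] * n, [1.0] * n)

# Finite elements: the unknowns are the weights of the hat functions;
# matrix entries are integrals of phi_i' * phi_j', the right-hand side
# holds the integrals of 1 * phi_i (each equal to h).
fem_weights = solve_tridiagonal([-1 / h] * n, [2 / h] * n,
                                [-1 / h] * n, [h] * n)


def hat(j, x):
    """Hat function centred at nodes[j]: 1 there, 0 at the neighbouring nodes."""
    return max(0.0, 1.0 - abs(x - nodes[j]) / h)


def fem_solution(x):
    """The finite element solution is a function, defined for every x."""
    return sum(w * hat(j, x) for j, w in enumerate(fem_weights))


print(fd_values[4], fem_weights[4], fem_solution(0.25))
```

In this one-dimensional, uniform-grid case the two linear systems differ only by a factor $h$, so the numbers coincide. What changes is their meaning: the finite difference output is a list of point values, while the finite element weights define a piecewise linear function that can be evaluated between grid points.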


The post The role of intuitions in mathematics appeared first on Quick Math Intuitions.

**Is intuition needed to really understand a topic?**

I would say yes, since in the end we reason through ideas, of which we have an intuitive representation. Without intuitions, it is difficult to relate topics with each other as we lack hooks, and we often lack a deep understanding as well.

**Do you feel like you have understood something even if you do not have an intuitive representation of it?**

**How does formalism complete intuition?**

It shows whether and how an intuition is right. Sometimes intuition can be deceitful and/or tricky, especially in high dimensions or very abstract topics.

**Can/Should intuitions be taught? Or are they only effective when discovered on one’s own?**

I side more with the latter. This borders on Maths Education, but I deem the process more important than the result: it is the tough digestion of some math material, which ultimately leads to developing an intuition, that really makes the intuition strong in one’s mind. If somebody else (like a teacher) does the work for us, then the result does not really stick, however nice it may be.

**Can we say somebody with only intuitions** (well understood and well reasoned) **is a mathematician?**

I would say yes. I often find the intuitive side more important than the formal one.

**Is it possible to develop intuitions for very abstract topics?** If yes, what *shape* would they have, since there is rarely anything visual we can hook up to?


The post A note on the hopes for Fully Homomorphic Signatures appeared first on Quick Math Intuitions.

This is taken from my Master Thesis on *Homomorphic Signatures over Lattices*.

Imagine that Alice owns a large data set $x$, over which she would like to perform some computation. In a homomorphic signature scheme, Alice signs the data set with her secret key and uploads the signed data to an untrusted server. The server then performs the computation modeled by the function $f$ to obtain the result $y = f(x)$ over the signed data.

Alongside the result $y$, the server also computes a signature $\sigma_{f,y}$ certifying that $y$ is the correct result for $f(x)$. The signature should be short: at any rate, its size must be independent of the size of $x$. Using Alice’s public verification key, anybody can verify the tuple $(f, y, \sigma_{f,y})$ without having to retrieve the whole data set or run the computation again on their own.

The signature is a *homomorphic signature*, where *homomorphic* has the same meaning as the mathematical definition: `*mapping of a mathematical structure into another one in such a way that the result obtained by applying the operations to elements of the first structure is mapped onto the result obtained by applying the corresponding operations to their respective images in the second one*‘. In our case, the *operations* are represented by the function , and the *mapping* is from the matrices to the matrices .

Notice how the very idea of **homomorphic signatures challenges the basic security requirements of traditional digital signatures**. In fact, for a traditional signature scheme we require that it should be computationally infeasible to generate a valid signature for a party without knowing that party’s private key. Here, we *need* to be able to generate a valid signature on *some data* (i.e. results of computation, like $f(x)$) *without* knowing the secret key. What we require, though, is that it must be computationally infeasible to forge a valid signature for a result $y' \neq f(x)$. In other words, the security requirement is that *it must not be possible to cheat on the signature of the result*: if the provided result is validly signed, then it must be the *correct* result.

The next ideas stem from the analysis of the signature scheme devised by Gorbunov, Vaikuntanathan and Wichs. It relies on the *Short Integer Solution* hard problem on lattices. The scheme presents several limitations and possible improvements, but it is also the first homomorphic signature scheme able to evaluate arbitrary arithmetic circuits over signed data.

*Def.* – A signature scheme is said to be **leveled homomorphic** if it can only evaluate circuits of fixed depth $d$ over the signed data, with $d$ being a function of the security parameter. In particular, each signature $\sigma_i$ comes with a noise level $\beta_i$: if, combining the signatures into the result signature $\sigma_{f,y}$, the noise level grows to exceed a given threshold $\beta_{max}$, then the signature is no longer guaranteed to be correct.

*Def.* – A signature scheme is said to be **fully homomorphic** if it supports the evaluation of any arithmetic circuit (albeit possibly being of fixed size, i.e. leveled). In other words, there is no limitation on the “richness” of the function to be evaluated, although there may be on its complexity.

Let us remark that, to date, no (*non-leveled*) fully homomorphic signature scheme has been devised. The state of the art still lies in *leveled* schemes. On the other hand, a great breakthrough was the invention of a fully homomorphic encryption scheme by Craig Gentry.

The main limitation of the current construction (GVW15) is that verifying the correctness of the computation takes Alice roughly as much time as the computation of $f$ itself. However, what she gains is that she does not have to store the data set long term, but can make do with the signatures only.

To us, this limitation makes intuitive sense, and it is worth comparing it with real life. In fact, if one wants to judge the work of someone else, they cannot just look at it without any preparatory work. Instead, they have to have spent (at least) *a comparable amount of time* studying/learning the content to be able to evaluate the work.

For example, a good musician is required to evaluate the performance of Beethoven’s Ninth Symphony by some orchestra. Notice how anybody with some musical knowledge could evaluate whether what is being played *makes sense* (for instance, whether it actually *is* the Ninth Symphony and not something else). On the other hand, evaluating the perfection of performance is something entirely different and requires years of study in the music field and in-depth knowledge of the particular symphony itself.

That is why it looks like hoping to devise a homomorphic scheme in which the verification time is significantly shorter than the computation time would be against what is rightful to hope. It may be easy to judge whether the result makes sense (for example, it is not a letter if we expected an integer), but is **difficult if we want to evaluate perfect correctness**.

However, there is **one more caveat**. If Alice has to verify the result of the same function $f$ over two different data sets, then the verification cost is basically the same as for a single data set (*amortized verification*): the expensive, function-dependent work is done only once. Again, this makes sense: when one is skilled enough to evaluate the performance of the Ninth Symphony by the *Berlin Philharmonic*, they are also skilled enough to evaluate the performance of the same piece by the *Vienna Philharmonic*, without having to undergo any significant further work other than going and *listening to* the performance.

So, although **it does not seem feasible to devise a scheme that guarantees the correctness of the result and in which the verification complexity is significantly less than the computation complexity**, not all hope for improvements is lost. In fact, it may be possible to obtain a scheme in which verification is faster, but the correctness is only probabilistically guaranteed.

Back to our music analogy, we can imagine the evaluator *listening to a handful of minutes* of the Symphony and evaluating the whole performance from the little they have heard. However, the orchestra has no idea at what time the evaluator will show up, and for how long they will listen. Clearly, if the orchestra makes a mistake in those few minutes, the performance is not perfect; on the other hand, if what the evaluator hears is flawless, then there is a good chance that the whole performance is flawless.

Similarly, the scheme may be tweaked to **only partially check the signature result**, thus assigning a *probabilistic measure of correctness*. As a rough example, we may think of not computing the homomorphic transformations over the matrices wholly, but only calculating a few, randomly-placed entries. Then, if those entries are all correct, it is very *unlikely* (and it quickly gets more so as the number of checked entries increases, of course) that the result is wrong. After all, to cheat, the third party would need to guess several numbers in $\mathbb{Z}_q$, each having likelihood $1/q$ of coming up!
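The numbers behind this intuition are easy to sketch in Python. This is only a toy model of spot-checking, not part of the GVW15 scheme: it assumes entries are uniform in $\mathbb{Z}_q$ and that the verifier samples positions uniformly without replacement. Function names and parameter values are chosen here for illustration.

```python
from fractions import Fraction
from math import comb


def lucky_guess_probability(q, k):
    """Chance of guessing k independent, uniform entries of Z_q correctly."""
    return Fraction(1, q) ** k


def undetected_probability(total, wrong, checked):
    """Chance that `checked` positions, sampled without replacement out of
    `total`, all miss the `wrong` incorrect entries."""
    return Fraction(comb(total - wrong, checked), comb(total, checked))


# Guessing three entries modulo a small prime q = 97.
print(lucky_guess_probability(97, 3))  # 1/912673

# A result with 10 wrong entries out of 100, for a growing number of checks.
for checked in (1, 5, 20, 50):
    print(checked, float(undetected_probability(100, 10, checked)))
```

The second loop shows the trade-off discussed above: the more entries are checked, the less likely a wrong result slips through, but the closer the verification cost gets to that of the full computation.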

Another idea would be for the music evaluator to **delegate another person to check for the quality of the performance**, by giving them some precise and detailed features to look for when listening to the performance. In the homomorphic scheme, this may translate into *looking for some specific features in the result*, some characteristics we know *a priori* must be in the result. For example, we may know that the result must be a prime number, or must satisfy some constraint, or a relation with something much easier to check. In other words, we may be able to *reduce the correctness check to a few fundamental traits* that are very easy to check, but also provide some guarantee of correctness. This method seems much harder to model, though.


The post Probability as a measure of ignorance appeared first on Quick Math Intuitions.

**What does a probability measure measure?** Sure, the sets of the $\sigma$-algebra that supports the measure space. But really, what? Thinking about it, it is very difficult to define what probability actually is.

Well, **probability measures our ignorance about something**.

When we make some claim with 90% probability, what we are really saying is that *the knowledge we have* allows us to make a prediction that is that accurate. And the main point here is that **different people may assign different probabilities to the very same claim!** If you have ever seen weather forecasts for the same day disagree, you know what I am talking about. **Different data or different models can generate different knowledge, and thus different probability figures.**

But we do not have to go that far to find reasonable examples. Let’s consider a very simple one. Imagine you found yourself on a train, and in front of you is sitting a girl with clothes branded Patagonia. What would be the odds that the girl has been to Patagonia? Not more than average, you would guess, because Patagonia is just a brand that makes warm clothes, and can be purchased in several stores all around the world, probably even more than in Patagonia itself! So you would probably say that $P(\text{the girl has been to Patagonia})$ is surely no more than 50%.

**But now imagine a kid in the same scenario.** If they see a girl with Patagonia clothes, they would immediately think that she had been to Patagonia (with probability 100% this time), because they are lacking a good amount of important information that you instead hold. And so the figure associated with $P(\text{the girl has been to Patagonia})$ is pretty **different depending on the observer, or rather on the knowledge (or lack of) they possess**. In this sense probability is a measure of our ignorance.


The post But WHY is the Lattices Bounded Distance Decoding Problem difficult? appeared first on Quick Math Intuitions.

This is taken from my Master Thesis on *Homomorphic Signatures over Lattices*.

A lattice is a discrete subgroup $\mathcal{L} \subset \mathbb{R}^n$, where the word discrete means that each $x \in \mathcal{L}$ has a neighborhood in $\mathbb{R}^n$ that, when intersected with $\mathcal{L}$, results in $x$ itself only. One can **think of lattices as being grids**, although the coordinates of the points need not be integer. Indeed, all (full-rank) lattices are isomorphic to $\mathbb{Z}^n$, but it may be a grid of points with non-integer coordinates.

Another very nice way to define a lattice is: given $n$ independent vectors $b_1, \dots, b_n \in \mathbb{R}^n$, the lattice $\mathcal{L}$ generated by that basis is the set of all linear combinations of them **with integer coefficients:**

$$\mathcal{L} = \left\{ \sum_{i=1}^{n} z_i b_i : z_i \in \mathbb{Z} \right\}$$

Then, we can go on to define the **Bounded Distance Decoding problem** (BDD), which is used in **lattice-based cryptography** (more specifically, for example in trapdoor homomorphic encryption) and believed to be hard in general.

Given an arbitrary basis of a lattice $\mathcal{L}$, and a point $x \in \mathbb{R}^n$ *not necessarily belonging* to $\mathcal{L}$, find the point of $\mathcal{L}$ that is closest to $x$. We are also guaranteed that $x$ is *very close* to one of the lattice points. Notice how we are relying on an *arbitrary* basis – if we claim to be able to solve the problem, we should be able to do so with *any* basis.

Now, as the literature goes, this is a problem that is *hard in general, but easy if the basis is nice enough*. So, for example for encryption, the idea is that we can encode our secret message as a lattice point, and then add to it some small noise (i.e. a small element $e \in \mathbb{R}^n$). This basically generates an instance of the BDD problem, and then the decoding can only be done by someone who holds the good basis for the lattice, while those having a bad basis are going to have a hard time decrypting the ciphertext.

However, although of course there is no proof of this (it is a problem *believed* to be hard), I wanted to get at least some clue on **why** it should be easy with a nice basis and hard with a bad one (GGH is an example of a scheme that employs techniques based on this).

So now to our real question. But WHY is the Bounded Distance Decoding problem hard (or easy)?

Let’s first say what a good basis is. **A basis is good if it is made of nearly orthogonal short vectors**. This is a pretty vague definition, so let’s make it a bit more specific (although more restrictive): we want a basis in which each $b_i$ is of the form $(0, \dots, 0, k_i, 0, \dots, 0)$, i.e. $b_i = k_i e_i$, for some $k_i \in \mathbb{R}$. One can imagine $k_i$ being smaller than some arbitrary value, like 10. (This shortness is pretty vague and its role will be clearer later.) In other words, **a nice basis is the canonical one, in which each vector has been re-scaled by an independent real factor.**

To get a flavor of why the Bounded Distance Decoding problem is easy with a nice basis, let’s make an example. Consider , with as basis vectors. Suppose we are given as challenge point. It does not belong to the lattice generated by , but it is only away from the point , which does belong to the lattice.

Now, what does one have to do to solve this problem? Let’s get a graphical feeling for it and formalize it.

We are looking for the lattice point closest to . So, sitting on , we are looking for the linear combination with integer coefficients of the basis vectors that is closest to us. Breaking it component-wise, we are looking for and such that they are solution of:

This may seem a difficult optimization problem, but in truth it is very simple! **The reason is that each of the equations is independent, so we can solve them one by one – the individual minimum problems are easy and can be solved quickly**. (One could also put bounds on the coefficients with respect to the norm of the basis vectors, but it is not vital now.)

So the overall complexity of solving BDD with a good basis is $O(n)$, which is okay.
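With a stretched canonical basis, each of the independent minimum problems is solved by a single rounding. A minimal Python sketch follows; the scale factors and the challenge point are made-up values for illustration, and the function name is hypothetical.

```python
def decode_stretched_canonical(scales, point):
    """Closest lattice point when the basis is b_i = scales[i] * e_i.

    Every coordinate is handled on its own: the best integer coefficient
    is the rounding of point[i] / scales[i]. One pass, so O(n) work.
    """
    coefficients = [round(x / k) for x, k in zip(point, scales)]
    closest = [c * k for c, k in zip(coefficients, scales)]
    return coefficients, closest


# Assumed example: b_1 = (0.5, 0), b_2 = (0, 1.25), challenge point (2.1, 4.9).
coefficients, closest = decode_stretched_canonical([0.5, 1.25], [2.1, 4.9])
print(coefficients)  # [4, 4]
print(closest)       # [2.0, 5.0]
```

The loop never looks at more than one coordinate at a time, which is exactly the independence the text refers to.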

**A bad basis** is any basis that does not satisfy any of the two conditions of a nice basis: it **may be poorly orthogonal, or may be made of long vectors.** We will later try to understand what roles these differences play in solving the problem: for now, let’s just consider an example again.

Another basis for the lattice generated by the nice basis we picked before () is . This is a bad one.

Let’s write down the system of equations coordinate-wise as we did for the nice basis. We are looking for and such that they are solution of:

Now look! This may look similar to before, but **this time it really is a system, the equations are no longer independent:** we have 3 unknowns and 2 equations. The system is under-determined! This already means that, in principle, there are infinitely many solutions. Moreover, we are also trying to find a solution that is constrained to be minimum. Especially with big $n$, solving this optimization problem can definitely be non-trivial!
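A quick experiment shows how much the basis matters. The Python sketch below applies the same naive strategy to two bases of the same lattice: express the challenge point in the basis, round the coefficients, and map back (this is often called Babai's rounding technique). The bases and the challenge point are made-up values; the bad basis is obtained from the good one through an integer change of basis with determinant 1, so both generate the same lattice.

```python
def round_in_basis(b1, b2, point):
    """Write `point` in the basis (b1, b2), round both coefficients to the
    nearest integer and return the corresponding lattice point (2D only)."""
    det = b1[0] * b2[1] - b1[1] * b2[0]
    # Cramer's rule for c1 * b1 + c2 * b2 = point.
    c1 = (point[0] * b2[1] - point[1] * b2[0]) / det
    c2 = (b1[0] * point[1] - b1[1] * point[0]) / det
    z1, z2 = round(c1), round(c2)
    return (z1 * b1[0] + z2 * b2[0], z1 * b1[1] + z2 * b2[1])


good_basis = ((2, 0), (0, 3))    # stretched canonical basis
bad_basis = ((2, 3), (6, 12))    # b1 + b2 and 3*b1 + 4*b2: same lattice
challenge = (4.3, 2.8)           # close to the lattice point (4, 3)

print(round_in_basis(*good_basis, challenge))  # (4, 3)
print(round_in_basis(*bad_basis, challenge))   # (6, 6)
```

With the good basis the rounding lands on the nearby point $(4, 3)$. With the bad basis the small noise is amplified when it is expressed in long, nearly parallel vectors, and the rounding ends up on $(6, 6)$, a lattice point much further away.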

So far so good: we have discovered why the Bounded Distance Decoding problem is easy with a good basis and difficult with a bad one. But still, **what does a good basis have to make it easy? How do its properties relate to ease of solution?**

We enforced two conditions: orthogonality and shortness. Actually, we even required something stronger than orthogonality: that the good basis was basically a stretched version of the canonical one – i.e. each vector had only one non-zero entry.

**Let’s think for a second in terms of the canonical basis** $e_1, \dots, e_n$: each of its vectors acts on one coordinate only. **This is what makes the minimum problems independent** and allows for easy resolution of the BDD problem. However, when dealing with cryptography matters, **we cannot always use the same basis,** we need some randomness. That is why we required a set of independent vectors each having only one non-zero coordinate: it is the main feature that makes the problem easy (at least for the party having the good basis).

We also asked for **shortness. This does not give an immediate advantage to whoever holds the good basis, but makes it harder to solve the problem for those holding the bad one.** The idea is that, given a challenge point $x$, if we have short basis vectors, *we can take small steps* from it and look around us for nearby points. It may take some time to find the best one, but we are still not looking totally astray. Instead, **if we have long vectors, every time we use one we have to make a big leap in one direction**. In other words, *whoever has the good basis knows the step size of the lattice, and thus can take steps of considered size, slowly poking around*; whoever has the bad basis takes huge jumps and may have a hard time pinpointing the right point.

It is true, though, that the features of a good basis usually only include shortness and orthogonality, and not the “rescaling of the canonical basis” we assumed in the first place. So, let’s consider a basis of that kind, like . If we wrote down the minimum problem we would have to solve given a challenge point, it would be pretty similar to the one with the bad basis, with the equations not being independent. Looks like bad luck, uh?

However, not all hope is lost! In fact, **we can look for the rotation matrix that will turn that basis into a stretching of the canonical one,** finding ! Then we can rotate the challenge point as well, and solve the problem with respect to those new basis vectors. Of course that is not going to be the solution to the problem, but we can easily rotate it back to find the real solution!
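In two dimensions the rotation trick can be sketched as follows in Python. The orthogonal basis $b_1 = (3, 4)$, $b_2 = (-4, 3)$ and the challenge point are made-up values for illustration, and the function name is hypothetical.

```python
import math


def decode_orthogonal_2d(b1, b2, point):
    """Closest lattice point for an orthogonal (but rotated) 2D basis.

    The plane is rotated so that b1 lies on the x-axis: there the basis is
    a stretched canonical one and each coordinate can be rounded on its own.
    """
    theta = math.atan2(b1[1], b1[0])
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    def rotate_back(v):
        # Rotation by -theta, which sends b1 onto the positive x-axis.
        return (cos_t * v[0] + sin_t * v[1], -sin_t * v[0] + cos_t * v[1])

    k1 = rotate_back(b1)[0]   # stretch factor along the x-axis
    k2 = rotate_back(b2)[1]   # signed stretch factor along the y-axis
    px, py = rotate_back(point)
    z1, z2 = round(px / k1), round(py / k2)
    # Using the original vectors here is the "rotate back" step.
    return (z1 * b1[0] + z2 * b2[0], z1 * b1[1] + z2 * b2[1])


# Challenge: the lattice point 2*b1 + 1*b2 = (2, 11) plus a small noise.
print(decode_orthogonal_2d((3, 4), (-4, 3), (2.3, 10.8)))  # (2, 11)
```

The extra work compared with the stretched canonical case is just the rotation of the challenge point, which matches the remark that such a basis only adds cost for the honest party.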

However, given that using a basis of this kind does not make the opponent’s job any harder, but only increases the computational cost for the honest party, I do not see why this should ever be used. Instead, I guess the best choices for a good basis are the stretched canonical ones.

(This may be obvious, but having a generic orthogonal basis is not enough for an opponent to break the problem. If it is orthogonal, but its vectors are long, bad luck!)


The post Conditional probability: why is it defined like that? appeared first on Quick Math Intuitions.

The conditional probability of $A$ given $B$ is defined as $P(A|B) = \frac{P(A \cap B)}{P(B)}$. Why?

We need to wonder: **what does the happening of event $B$ tell about the odds of happening of event $A$?** How much *more likely* does $A$ become if $B$ happens? Think in terms of **how $B$ affects $A$**.

**If $A$ and $B$ are independent**, then knowing something about $B$ will not tell us anything at all about $A$, at least not that we did not know already. In this case $P(A \cap B) = P(A) P(B)$ and thus $P(A|B) = P(A)$. This makes sense! In fact, consider this example: how does me buying a copybook affect the likelihood that your grandma is going to buy a frying pan? It does not: the first event has no influence on the second, thus the conditional probability of the second is just the same as its normal probability.

**If $A$ and $B$ are not independent**, several things can happen, and that is where things get interesting. We know that $B$ happened, and we should now **think as if $B$ was our whole universe**. The idea is: we already know what the odds of $A$ are, right? It is just $P(A)$. **But how do they change if we know that we do not really have to consider all possible events, but just a subset of them?** As an example, think of $P(\text{drawing a red ball})$ versus the same probability *knowing that* all balls are red. This makes a huge difference, right? (As an aside, that is what we mean when we say that probability is a measure of our ignorance.)

So anyway, now we ask: what is the probability of $A$? Well, it would just be $P(A)$, but we must account for the fact that we now *live inside* $B$, and everything that is outside it is as if it did not exist. So $P(A)$ actually becomes $P(A \cap B)$: we only care about the part of $A$ that is inside $B$, because that is where we live now.

But there is a caveat. We are *thinking* as if $B$ was the whole universe but, in terms of probabilities, it actually is not, because nobody has informed the probability distribution. In fact, we compute $P(A \cap B)$ precisely because **we still live in the bigger universe** $\Omega$, but we need to account for the fact that $B$ is our real universe now. That is why we need a *re-scaling factor*: something that will scale $P(A \cap B)$ to make it numerically correct, to account for the fact that $B$ is our current universe. This is what the $P(B)$ at the denominator does.
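The two viewpoints (really living inside $B$, versus staying in $\Omega$ and re-scaling by $P(B)$) can be compared by brute-force counting. Below is a small Python sketch with two dice; the events "the sum is 8" and "the first die is even" are chosen here purely for illustration.

```python
from fractions import Fraction
from itertools import product

# The big universe: all 36 equally likely outcomes of two dice.
omega = list(product(range(1, 7), repeat=2))


def in_A(outcome):
    """Event A: the two dice sum to 8."""
    return outcome[0] + outcome[1] == 8


def in_B(outcome):
    """Event B: the first die shows an even number."""
    return outcome[0] % 2 == 0


def prob(event, universe):
    """Fraction of the outcomes in `universe` for which `event` holds."""
    favourable = sum(1 for outcome in universe if event(outcome))
    return Fraction(favourable, len(universe))


p_A = prob(in_A, omega)
p_B = prob(in_B, omega)
p_A_and_B = prob(lambda outcome: in_A(outcome) and in_B(outcome), omega)

# Stay in the big universe and re-scale by P(B) ...
rescaled = p_A_and_B / p_B
# ... or actually move into the universe B and count there.
restricted = prob(in_A, [outcome for outcome in omega if in_B(outcome)])

print(p_A, rescaled, restricted)  # 5/36 1/6 1/6
```

Dividing by $P(B)$ gives exactly the number obtained by counting inside $B$ only, which is the sense in which the denominator makes $B$ the new universe.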

In fact, the factor $P(B)$ accounts for *how relevant the information that $B$ happened is*. If $P(B) = 1$, it means that, for probability purposes, $B$ is the whole universe: the switch of universe was just apparent! A further consequence is that $P(A \cap B) = P(A)$, because $A$ is basically inside $B$ (apart from silly null-measure caveats). In turn, this has the consequence of making $P(A \mid B) = P(A)$. This makes sense: if $B$ is sure to happen, then what does it tell us about the odds of something else? As an example, if we are considering strings of digits, what is the likelihood that a certain string is made of ones or twos, knowing that it is made out of digits ($B$)? It sounds tautological, and it certainly is.

What about a $P(B)$ that is big, yet $P(B) < 1$? This is trickier, as it mostly depends on the interplay between $P(A \cap B)$ and $P(B)$. But you are not on a university level website to read about inverse proportionality, are you?

Another case worth inspection is when $B \subseteq A$. In that case, $A \cap B = B$ and so

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1.$$

Makes sense, right? If $B$ happened, and $B$ is inside $A$, then clearly $A$ must happen as well. If I bought a red umbrella, what are the odds that I bought a generic umbrella as a consequence? Full, yep.

Finally, let’s consider the case in which $P(B)$ is very small. A very small $P(B)$ at the denominator will make the resulting fraction significantly bigger than $P(A \cap B)$ alone.

The idea here is that if $B$ is very narrow, if it talks about **something very unlikely, and it happened, this greatly influences the overall conditional probability**. What are the odds that I get hospitalized in Japan as a 25-year-old man? Very low. What are the odds that today there is an earthquake in Japan? Very low. What are the odds that I get hospitalized in Japan, knowing that today an earthquake happened? Quite high. That’s the idea: the more unlikely $B$ is, the higher $P(A \mid B)$ tends to be. In a sense, the narrower $B$ is, the more information it brings to know that it happened.
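To see the effect of a shrinking denominator, here is a small sketch with purely hypothetical numbers: $P(A \cap B)$ is held fixed at $1/1000$ while $P(B)$ gets smaller and smaller (always staying at least as large as $P(A \cap B)$, as it must).

```python
from fractions import Fraction

def conditional(p_a_and_b, p_b):
    """P(A|B) from P(A and B) and P(B); requires 0 < P(A and B) <= P(B)."""
    if p_b <= 0 or p_a_and_b > p_b:
        raise ValueError("need 0 < P(B) and P(A and B) <= P(B)")
    return p_a_and_b / p_b

# Hypothetical value, chosen only for illustration.
p_a_and_b = Fraction(1, 1000)

# Progressively narrower (less likely) conditioning events B.
narrowing_b = [Fraction(1, 2), Fraction(1, 10), Fraction(1, 100), Fraction(1, 500)]
results = [conditional(p_a_and_b, p_b) for p_b in narrowing_b]

for p_b, p in zip(narrowing_b, results):
    print(f"P(B) = {p_b}  ->  P(A|B) = {p}")
```

With the joint probability held fixed, going from $P(B) = 1/2$ to $P(B) = 1/500$ raises $P(A \mid B)$ from $1/500$ to $1/2$.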

The post Conditional probability: why is it defined like that? appeared first on Quick Math Intuitions.

]]>The post Diagonalizing a matrix NOT having full rank, what does it mean? appeared first on Quick Math Intuitions.

]]>Every matrix can be seen as a linear map between vector spaces. Stating that a matrix is similar to a diagonal matrix is equivalent to stating that there exists a basis of the source vector space in which the linear transformation can be seen as a simple *stretching of the space*, a re-scaling along each basis direction. In other words, diagonalizing a matrix is the same as *finding a grid whose directions are only stretched by the transformation*; for a symmetric matrix, this grid can be chosen to be orthogonal. I recommend this article from AMS for good visual representations of the topic.

That’s all right: when we have a matrix from $\mathbb{R}^3$ to $\mathbb{R}^3$, if it can be diagonalized, we can find a basis in which the transformation is a re-scaling of the space, fine.

But what does it mean to diagonalize a matrix that has null determinant? The associated transformations have the effect of killing at least one dimension: indeed, an $n \times n$ matrix of rank $k$ has the effect of lowering the output dimension by $n - k$. For example, a $3 \times 3$ matrix of rank 2 will have an image of dimension 2, instead of 3. This happens, for instance, when two basis vectors are merged into the same vector in the output, so one dimension is bound to collapse.

Let’s consider a sample $3 \times 3$ matrix $A$ which does not have full rank because its first and third rows are equal. Indeed, one can check that the two vectors $e_1 = (1, 0, 0)$ and $e_3 = (0, 0, 1)$ go to the same vector. This means that $\dim \operatorname{Im}(A) = 2$ instead of 3. In fact, it is common intuition that when the rank is not full, some dimensions are lost in the transformation. Even if it’s a $3 \times 3$ matrix, the output only has 2 dimensions. It’s like at the end of Interstellar when the 4D space in which Cooper is floating gets shut.

However, $A$ is also a symmetric matrix, so from the spectral theorem we know that it can be diagonalized. And now to the vital questions: what do we expect? What meaning does it have? *Do we expect a basis of three vectors even if the map destroys one dimension?*

Pause and ponder.

Diagonalize the matrix and, indeed, you obtain three eigenvalues. The eigenvalues are thus $0$ and two nonzero values, each giving a different eigenvector. Taken all together, the eigenvectors form an orthogonal basis of $\mathbb{R}^3$. The fact that $0$ is among the eigenvalues is important: it means that *all the vectors belonging to the associated eigenspace go to the same value*: zero. This is the mathematical representation of the fact that **one dimension collapses**.
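As a concrete check, here is a sketch using a hypothetical matrix that matches the description (symmetric, first and third rows equal, rank 2). It is an assumption chosen for illustration and not necessarily the matrix used in the post. The code verifies three eigenpairs by direct multiplication and checks that the eigenvectors are mutually orthogonal.

```python
# Hypothetical matrix matching the description: symmetric, rank 2,
# first and third rows equal. Not necessarily the post's own matrix.
A = [[1, 0, 1],
     [0, 1, 0],
     [1, 0, 1]]

def matvec(M, v):
    """Matrix-vector product for lists of numbers."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    """Standard dot product."""
    return sum(a * b for a, b in zip(u, v))

# Candidate (eigenvalue, eigenvector) pairs for this particular matrix.
eigenpairs = [
    (0, [1, 0, -1]),  # collapsed direction: A v = 0
    (1, [0, 1, 0]),
    (2, [1, 0, 1]),
]

# A v must equal lambda * v for each pair.
is_eigenpair = [matvec(A, v) == [lam * x for x in v] for lam, v in eigenpairs]

# Every pair of distinct eigenvectors must have zero dot product.
vectors = [v for _, v in eigenpairs]
is_orthogonal = all(
    dot(vectors[i], vectors[j]) == 0
    for i in range(3) for j in range(i + 1, 3)
)

print(is_eigenpair, is_orthogonal)  # [True, True, True] True
```

For this matrix there are three orthogonal eigenvectors even though the image is only two-dimensional: the eigenvector with eigenvalue $0$ is exactly the direction that gets collapsed.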

At first, I naively thought that, since the transformation destroys one dimension, I should expect to find a 2D basis of eigenvectors. But this was because I confused the source of the map with its image! The point is that we can still find a basis of the source space from the perspective of which the transformation is just a re-scaling of the space. However, that doesn’t tell us whether the transformation will preserve all dimensions: *it is possible that a vector of the basis is re-scaled by zero, just as two vectors of the standard basis can go to the same vector in the image*!

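In fact, since the matrix is symmetric, having the first and third rows equal means that the first and third columns are equal too, so the basis vectors $e_1 = (1, 0, 0)$ and $e_3 = (0, 0, 1)$ both go into the same vector, namely the first column of $A$. A basis of $\operatorname{Im}(A)$ is simply given by the first two columns of $A$, and we should not be surprised by the fact that those vectors have three entries. In fact, *two* vectors (even with *three* coordinates) only allow us to represent a *2D* space. In theory, one could express any vector that is a combination of the basis above as a combination of the usual 2D basis $\{(1, 0), (0, 1)\}$, to confirm that $\dim \operatorname{Im}(A) = 2$.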
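The following sketch illustrates this on a hypothetical matrix with the stated properties (symmetric, first and third rows equal); the specific entries are an assumption made for the example. It checks that $e_1$ and $e_3$ have the same image and that every output can be described with just two coordinates.

```python
# Hypothetical matrix with equal first and third rows (and columns).
A = [[1, 0, 1],
     [0, 1, 0],
     [1, 0, 1]]

def matvec(M, v):
    """Matrix-vector product for lists of numbers."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

e1, e2, e3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]

# Images of the standard basis vectors are the columns of A.
c1, c2, c3 = matvec(A, e1), matvec(A, e2), matvec(A, e3)
same_image = (c1 == c3)  # e1 and e3 are merged by the map

def image_coordinates(x):
    """2D coordinates of A x in the basis (c1, c2) of the image.

    Valid because c3 == c1: A x = (x1 + x3) * c1 + x2 * c2.
    """
    return (x[0] + x[2], x[1])

x = [3, -2, 5]  # arbitrary test vector
a, b = image_coordinates(x)
rebuilt = [a * c1[i] + b * c2[i] for i in range(3)]

print(same_image, matvec(A, x), (a, b))  # True [8, -2, 8] (8, -2)
```

The output vector has three entries, but the pair `(a, b)` is enough to pin it down, which is the sense in which the image is a 2D space.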


]]>