A gentle (and short) introduction to Gröbner Bases

Taken from my report for a Computer Algebra course.


We know there are plenty of methods to solve a system of linear equations (to name a few: Gauss elimination, QR or LU factorization). In fact, it is straightforward to check whether a linear system has any solutions, and if it does, how many of them there are. But what if the system is made of non-linear equations? The invention of Groebner bases and the field of computational algebra came up to answer these questions.

In this text we will recap the theory behind single-variable polynomials and extend it to multiple-variable ones, ultimately getting to the definition of Groebner bases.

In some cases, the transition from one to multiple variables is smooth and pretty much an extension of the simple case (for example for the Greatest Common Divisor algorithm). In other cases, however, there are conceptual jumps to be made. To give an example, single variable polynomials have always a finite number of roots, while this does not hold for multivariable polynomials. Intuitively, the reason is that a polynomial in one variable describes a curve in the plane, which can only intersect the x-axis a discrete and finite number of times. On the other hand, a multivariate polynomial describes a surface in space, which will always intersect the 0-plane in a continuum of points.

Continue reading “A gentle (and short) introduction to Gröbner Bases”

What is the different between Finite Differences and Finite Element Methods?

With Finite Differences, we discretize space (i.e. we put  a grid on it) and we seek the values of the solution function at the mesh points. We still solve a discretized differential problem.

With Finite Elements, we approximate the solution as a (finite) sum of functions defined on the discretized space. These functions make up a basis of the space, and the most commonly used are the hat functions. We end up with a linear system whose unknowns are the weights associated with each of the basis functions: i.e., how much does each basis function count for out particular solution to our particular problem?

The role of intuitions in mathematics

Some thoughts and questions about the role of intuition in mathematics:

  • Is intuition needed to really understand a topic?
    I would say yes, since in the end we reason through ideas, of which we have an intuitive representation. Without intuitions, it is difficult to relate topics with each other as we lack in hooks, and we often lack a deep understanding as well.
  • Do you feel like you have understood something even if you do not have an intuitive representation of it?
  • How does formalism complete intuition?
    It shows whether and how an intuition is right. Sometimes intuition can be deceitful and/or tricky, especially in high dimensions or very abstract topics.
  • Can/Should intuitions be taught? Or are they only effective when discovered on one’s own?
    I side more with the latter. This is bordering with Maths Education, but I deem the process more important than the result – it is the tough digestion of some math material that ultimately leads to developing an intuition what really makes the intuition strong in one’s mind. If somebody else (like a teacher) does the work for us, then the result does not really stick, albeit nice it may be.
  • Can we say somebody with only intuitions (well understood and well reasoned) is a mathematician?
    I would say yes. I often find the intuitive side more important than the formal one.
  • Is it possible to develop intuitions for very abstract topics? If yes, what shape would they have, since there is rarely anything visual we can hook up to?

A note on the hopes for Fully Homomorphic Signatures

This is taken from my Master Thesis on Homomorphic Signatures over Lattices.

What are homomorphic signatures

Imagine that Alice owns a large data set, over which she would like to perform some computation. In a homomorphic signature scheme, Alice signs the data set with her secret key and uploads the signed data to an untrusted server. The server then performs the computation modeled by the function g to obtain the result y = g(x) over the signed data.

Alongside the result y, the server also computes a signature \sigma_{g,y} certifying that y is the correct result for g(x). The signature should be short – at any rate, it must be independent of the size of x. Using Alice’s public verification key, anybody can verify the tuple (g,y,\sigma_{g,y}) without having to retrieve all the data set x nor to run the computation g(x) on their own again.

The signature \sigma_{g,y} is a homomorphic signature, where homomorphic has the same meaning as the mathematical definition: `mapping of a mathematical structure into another one in such a way that the result obtained by applying the operations to elements of the first structure is mapped onto the result obtained by applying the corresponding operations to their respective images in the second one‘. In our case, the operations are represented by the function f, and the mapping is from the matrices U_i \in \mathbb{Z}_q^{n \times n} to the matrices V_i \in \mathbb{Z}_q^{n \times m}.

homomorphic signatures

Notice how the very idea of homomorphic signatures challenges the basic security requirements of traditional digital signatures. In fact, for a traditional signatures scheme we require that it should be computationally infeasible to generate a valid signature for a party without knowing that party’s private key. Here, we need to be able to generate a valid signature on some data (i.e. results of computation, like g(x)) without knowing the secret key. What we require, though, is that it must be computationally infeasible to forge a valid signature \sigma' for a result y' \neq g(x). In other words, the security requirement is that it must not be possible to cheat on the signature of the result: if the provided result is validly signed, then it must be the correct result.

The next ideas stem from the analysis of the signature scheme devised by Gorbunov, Vaikuntanathan and Wichs. It relies on the Short Integer Solution hard problem on lattices. The scheme presents several limitations and possible improvements, but it is also the first homomorphic signature scheme able to evaluate arbitrary arithmetic circuits over signed data.

Continue reading “A note on the hopes for Fully Homomorphic Signatures”

Probability as a measure of ignorance

One of the most beautiful intuitions about probability measures came from Rovelli’s book, that took it in turn from Bruno de Finetti.

What does a probability measure measure? Sure, the open sets of the \sigma-algebra that supports the measure space. But really, what? Thinking about it, it is very difficult to define probability without using the word probable or possible.

Well, probability measures our ignorance about something.

When we make some claim with 90% probability, what we are really saying is that the knowledge we have allows us to make a prediction that is that much accurate. And the main point here is that different people may assign different probabilities to the very same claim! If you have ever seen weather forecasts for the same day disagree, you know what I am talking about. Different data or different models can generate different knowledge, and thus different probability figures.

But we do not have to go that far to find reasonable examples. Let’s consider a very simple one. Imagine you found yourself on a train, and in front of you is sitting a girl with clothes branded Patagonia. What would be the odds that the girl has been to Patagonia? Not more than average, you would guess, because Patagonia is just a brand that makes warm clothes, and can be purchased in several stores all around the world, probably even more than in Patagonia itself! So you would probably say that is surely no more than 50% likely.

But now imagine a kid in the same scenario. If they see a girl with Patagonia clothes, they would immediately think that she had been to Patagonia (with probability 100% this time), because they are lacking a good amount of important information that you instead hold. And so the figure associated with \mathbb{P}(\text{The girl has been to Patagonia} | \text{The girl has a Patagonia jacket}) is pretty different depending on the observer, or rather on the knowledge (or lack of) they possess. In this sense probability is a measure of our ignorance.

But WHY is the Lattices Bounded Distance Decoding Problem difficult?

This is taken from my Master Thesis on Homomorphic Signatures over Lattices.

Introduction to lattices and the Bounded Distance Decoding Problem

A lattice is a discrete subgroup \mathcal{L} \subset \mathbb{R}^n, where the word discrete means that each x \in \mathcal{L} has a neighborhood in \mathbb{R}^n that, when intersected with \mathcal{L} results in x itself only. One can think of lattices as being grids, although the coordinates of the points need not be integer. Indeed, all lattices are isomorphic to \mathbb{Z}^n, but it may be a grid of points with non-integer coordinates.

Another very nice way to define a lattice is: given n independent vectors b_i \in \mathbb{R}^n, the lattice \mathcal{L} generated by that base is the set of all linear combinations of them with integer coefficients:

    \[\mathcal{L} = \{\sum\limits_{i=0}^{n} z_i b_i, \ b_i \in \mathbb{R}^n, z_i \in \mathbb{Z} \}\]

Then, we can go on to define the Bounded Distance Decoding problem (BDD), which is used in lattice-based cryptography (more specifically, for example in trapdoor homomorphic encryption) and believed to be hard in general.

Given an arbitrary basis of a lattice \mathcal{L}, and a point x \in \mathbb{R}^n not necessarily belonging to \mathcal{L}, find the point of \mathcal{L} that is closest to x. We are also guaranteed that x is very close to one of the lattice points. Notice how we are relying on an arbitrary basis – if we claim to be able to solve the problem, we should be able to do so with any basis.

Bounded Distance Problem example

Now, as the literature goes, this is a problem that is hard in general, but easy if the basis is nice enough. So, for example for encryption, the idea is that we can encode our secret message as a lattice point, and then add to it some small noise (i.e. a small element v \in \mathbb{R}^n). This basically generates an instance of the BDD problem, and then the decoding can only be done by someone who holds the good basis for the lattice, while those having a bad basis are going to have a hard time decrypting the ciphertext.

However, albeit of course there is no proof of this (it is a problem believed to be hard), I wanted to get at least some clue on why it should be easy with a nice basis and hard with a bad one (GGH is an example schema that employs techniques based on this).

So now to our real question. But WHY is the Bounded Distance Decoding problem hard (or easy)?

Continue reading “But WHY is the Lattices Bounded Distance Decoding Problem difficult?”

Conditional probability: why is it defined like that?

So, you want to calculate the probability of an event knowing that another has happened. There is a formula for that, it is called conditional probability, but why is it the way it is? Let’s first write down the definition of conditional probability:

    \[\mathbb{P}(A | B) = \dfrac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}\]

We need to wonder: what does the happening of event B tell about the odds of happening of event A? How much more likely A becomes if B happens? Think in terms of how B affects A.

If A and B are independent, then knowing something about B will not tell us anything at all about A, at least not that we did not know already. In this case A \cap B is empty and thus \mathbb{P}(A | B) = \mathbb{P}(A). This makes sense! In fact, consider this example: how does me buying a copybook affects the likelihood that your grandma is going to buy a frying pan? It does not: the first event has no influence on the second, thus the conditional probability is just the same as the normal probability of the first event.

Sets no intersection

If A and B are not independent, several things can happen, and that is where things get interesting. We know that B happened, and we should now think as if B was our whole universe. The idea is: we already know what are the odds of A, right? It is just \mathbb{P}(A). But how do they increase if we know that we do not really have to consider all possible events, but just a subset of them? As an example, think of \mathbb{P}(\text{drawing a red ball}) versus \mathbb{P}(\text{drawing a red ball}) knowing that all balls are red. This makes a huge difference, right? (As an aside, that is what we mean when we say that probability is a measure of our ignorance.)

So anyway, now we ask: what is the probability of A? Well, it would just be \mathbb{P}(A), but we must account for the fact that we now live inside B, and everything that is outside it is as if it did not existed. So \mathbb{P}(A) actually becomes \mathbb{P}(A \cap B): we only care about the part of A that is inside B, because that is where we live now.

But, there is a caveat. Continue reading “Conditional probability: why is it defined like that?”

Diagonalizing a matrix NOT having full rank, what does it mean?

This is going to be a quick intuition about what it means to diagonalize a matrix that does not have full rank (i.e. has null determinant).

Every matrix can be seen as a linear map between vector spaces. Stating that a matrix is similar to a diagonal matrix equals to stating that there exists a basis of the source vector space in which the linear transformation can be seen as a simple stretching of the space, as re-scaling the space. In other words, diagonalizing a matrix is the same as finding an orthogonal grid that is transformed in another orthogonal grid. I recommend this article from AMS for good visual representations of the topic.

Image Taken from AMS - We Recommend a Singular Value Decomposition
Taken from AMS – We Recommend a Singular Value Decomposition

Diagonalization on non full rank matrices

That’s all right – when we have a matrix from \mathbb{R}^3 in \mathbb{R}^3, if it can be diagonalized, we can find a basis in which the transformation is a re-scaling of the space, fine.

But what does it mean to diagonalize a matrix that has null determinant? The associated transformations have the effect of killing at least one dimension: indeed, a nxn matrix of rank k has the effect of lowering the output dimension by n-k. For example, a 3x3 matrix of rank 2 will have an image of size 2, instead of 3. This happens because two basis vectors are merged in the same vector in the output, so one dimension is bound to collapse.

Let’s consider the sample matrix

    \[A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}\]

which has non full rank because has two equal rows. Indeed, one can check that the two vectors (1,0,0); (0,0,1) go in the same basis vector. This means that dim(Im(f_A))=2 instead of 3. In fact, it is common intuition that when the rank is not full, some dimensions are lost in the transformation. Even if it’s a 3x3 matrix, the output only has 2 dimensions. It’s like at the end of Inception when the 4D space in which cooper is floating gets shut.

However, A is also a symmetric matrix, so from the spectral theorem we know that it can be diagonalized. And now to the vital questions: what do we expect? What meaning does it have? Do we expect a basis of three vectors even if the map destroys one dimension?

Pause and ponder.

Continue reading “Diagonalizing a matrix NOT having full rank, what does it mean?”

Finding paths of length n in a graph

Suppose you have a non-directed graph, represented through its adjacency matrix. How would you discover how many paths of length n link any two nodes?

Graph Adjacency Matrix 1For example, in the graph aside there is one path of length 2 that links nodes A and B (A-D-B). How can this be discovered from its adjacency matrix?

It turns out there is a beautiful mathematical way of obtaining this information! Although this is not the way it is used in practice, it is still very nice. In fact, Breadth First Search is used to find paths of any length given a starting node.

PROP. (A^n)_{ij} holds the number of paths of length n from node i to node j.

Let’s see how this proposition works. Consider the adjacency matrix of the graph above:

    \[(A)_{ij} \ \ \begin{matrix} \textbf{-} & \textbf{A} & \textbf{B} & \textbf{C} & \textbf{D} & \textbf{E} \\ \textbf{A} & 0 & 1 & 1 & 1 & 0 \\ \textbf{B} & 1 & 0 & 0 & 1 & 1 \\ \textbf{C} & 1 & 0 & 0 & 1 & 0 \\ \textbf{D} & 1 & 1 & 1 & 0 & 1 \\ \textbf{E} & 0 & 1 & 0 & 1 & 0 \\ \end{matrix}\]

With n = 2 we should find paths of length 2. So we first need to square the adjacency matrix:

Continue reading “Finding paths of length n in a graph”

On the relationship between L^p spaces and C_c functions for p = infinity

Very quick post on the relationship between \mathcal{L}^p, \mathcal{C}_c(X) and \mathcal{L}^\infty. I will assume you already know what I am talking about, I’ll just be sharing some intuition on what those mean, but won’t bother with details. It’s more a reminder for me rather than something that intends to be useful, actually, but there’s almost nothing on the Internet about this!

When we discover that \mathcal{C}_c(X) (continuous functions with compact support) is dense in \mathcal{L}^p, we also discover that it does not hold if p = \infty and \mu(X) = \infty.

What that intuitively means is that if you take away functions in \mathcal{C}_c(X) from \mathcal{L}^p, you take away something fundamental for \mathcal{L}^p: you are somehow taking away a net that keeps the ceiling up.

The fact that it becomes false for limitless spaces (\mu(X) = \infty) and p = \infty means that the functions in \mathcal{L}^\infty do not need functions in \mathcal{C}_c(X) to survive.

This is reasonable: functions in \mathcal{L}^\infty are not required to exist only in a specific (compact) region of space, whereas functions in \mathcal{C}_c(X) do. Functions in\mathcal{L}^\infty are simply bounded – their image keeps below some value, but can go however far they want in x direction. Very roughly speaking, they have a limit on their height, but not on their width.

What we find out, however, is that the following chain of inclusions holds:

\mathcal{C}_c(X) \subset \mathcal{C}_\infty(X) \subset \mathcal{L}^\infty

Continue reading “On the relationship between L^p spaces and C_c functions for p = infinity”