With Finite Differences, we discretize space (i.e. we put a grid on it) and we seek the values of the solution function at the mesh points. We still solve a discretized differential problem.
With Finite Elements, we approximate the solution as a (finite) sum of functions defined on the discretized space. These functions make up a basis of the space, and the most commonly used are the hat functions. We end up with a linear system whose unknowns are the weights associated with each of the basis functions: i.e., how much does each basis function count for out particular solution to our particular problem?
Brutally, it is finding the value of the solution function at grid points (finite differences) vs the weight of the linear combinations of the hat functions (finite elements).