Eigenvectors and covariance

Finding the directions a transformation leaves alone.

Directions that stay put

Take a matrix and apply it to a vector. In general, the output points in a different direction than the input. The vector gets rotated, sheared, sent somewhere new. That is what matrices do.

But try this specific matrix and this specific vector:

\begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix}
(1)

The input is (1, 0). The output is (3, 0). Same direction. The vector was stretched by a factor of 3, but it did not rotate. The matrix, which in general rotates and shears everything, left this particular direction alone.

Is that a coincidence of the input we chose? Let's try another:

\begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 2 \\ -2 \end{bmatrix}
(2)

The input is (1, -1). The output is (2, -2). Again: same direction, just scaled, this time by a factor of 2. A second direction that the matrix does not rotate.

Now try something generic, like (1, 1): A(1, 1) = (4, 2). That is not a multiple of (1, 1). This vector does get rotated. So the property is special to particular directions.
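These three checks take one line each in NumPy. This is just the arithmetic from the text, nothing more:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# The two special directions: scaled, not rotated.
v1 = np.array([1.0, 0.0])
v2 = np.array([1.0, -1.0])
assert np.allclose(A @ v1, 3 * v1)   # stretched by a factor of 3
assert np.allclose(A @ v2, 2 * v2)   # stretched by a factor of 2

# A generic direction gets rotated: the output is not a multiple of the input.
w = np.array([1.0, 1.0])
Aw = A @ w                            # (4, 2)
cross = Aw[0] * w[1] - Aw[1] * w[0]   # zero exactly when Aw is parallel to w
assert cross != 0
```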

These special directions are called eigenvectors, and they turn out to be the key to PCA, to stability analysis in dynamical systems, and to understanding covariance matrices.

Eigenvectors and eigenvalues

A direction that a matrix scales but does not rotate is called an eigenvector. The scale factor is its eigenvalue. In symbols:

Av = \lambda v
(3)

The matrix A acts on v and produces the same vector v, multiplied by the number λ. For the matrix above, we found two eigenvectors: (1, 0) with eigenvalue 3, and (1, -1) with eigenvalue 2.

What does the eigenvalue tell you? If λ > 1, the vector gets stretched. If 0 < λ < 1, it gets compressed. If λ < 0, it gets flipped. If λ = 0, it gets sent to zero: the matrix annihilates that direction entirely. (That is a null-space direction, from the previous post.)

A circle of unit vectors deforms into an ellipse under the transformation. The highlighted directions are eigenvectors: they scale but do not rotate. Click Diagonalize to switch to the eigenvector basis and watch the matrix simplify.

Watch the figure. A circle of unit vectors gets mapped to an ellipse. Most directions change angle. But the eigenvector directions stay on their original line: they land on the ellipse at exactly the point you would reach by walking along the same direction, just farther or closer to the origin. The eigenvalue is the stretch factor.

Some matrices have fewer than n independent eigenvectors (defective matrices). Others have complex eigenvalues, which correspond to rotations rather than pure scalings. We set both aside here. Complex eigenvalues will matter in the dynamics posts, where they produce oscillations.

For neural dynamics, eigenvalues control stability. If your system evolves as x_{t+1} = Ax_t, then along an eigenvector direction the system simply scales: A^k v = λ^k v. When |λ| < 1, the component decays. When |λ| > 1, it explodes. The eigenvalues are the knobs controlling how each independent mode of the dynamics grows or shrinks.
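A quick sketch of this, using a hypothetical transition matrix chosen so that both eigenvalues sit inside the unit interval (the matrix and its values are illustrative, not from the text):

```python
import numpy as np

# Hypothetical 2x2 transition matrix with both |eigenvalues| below 1.
A = np.array([[0.9, 0.5],
              [0.0, 0.4]])
eigvals, eigvecs = np.linalg.eig(A)

# Along an eigenvector, k steps of the dynamics is just lambda^k: A^k v = lambda^k v.
i = np.argmax(np.abs(eigvals))
lam, v = eigvals[i], eigvecs[:, i]
assert np.allclose(np.linalg.matrix_power(A, 10) @ v, lam**10 * v)

# Every mode decays, so any trajectory collapses toward the origin.
x = np.array([1.0, 1.0])
for _ in range(60):
    x = A @ x
assert np.linalg.norm(x) < 1e-2
```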

Diagonalization

In Post 2, we said that the same transformation looks different in different bases. A messy matrix in one basis might become simple in another. Eigenvectors tell you which basis makes it simplest.

Our matrix has eigenvectors v_1 = (1, 0) and v_2 = (1, -1). These are two independent vectors, so they form a basis. What does the matrix look like in this basis?

In the eigenvector basis, the matrix sends v_1 to 3v_1 and v_2 to 2v_2. No mixing between the two directions. The matrix, which in the standard basis had an off-diagonal entry (the 1 in the top-right corner, coupling the two coordinates), becomes diagonal:

A_{\text{eigenbasis}} = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}
(4)

Each axis scales independently. No cross-talk. This is the eigendecomposition. Stack the eigenvectors as columns of a matrix V and the eigenvalues on the diagonal of Λ:

A = V \Lambda V^{-1}
(5)

Read this as a three-step process. Convert from the standard basis to the eigenvector basis (multiply by V^{-1}). Scale each coordinate independently by its eigenvalue (multiply by Λ). Convert back (multiply by V). In the eigenvector basis, the transformation has no mixing between dimensions. All the complexity of the matrix was just a consequence of looking at it in the wrong coordinates.
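The three steps can be checked directly. NumPy's `eig` returns the eigenvectors as the columns of V, which is exactly the stacking described above:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# Columns of V are the eigenvectors; Lam carries the eigenvalues on its diagonal.
eigvals, V = np.linalg.eig(A)
Lam = np.diag(eigvals)

# A = V Lam V^{-1}: convert to the eigenbasis, scale, convert back.
assert np.allclose(V @ Lam @ np.linalg.inv(V), A)

# Seen from the eigenvector basis, the matrix is diagonal: no mixing.
A_eigenbasis = np.linalg.inv(V) @ A @ V
assert np.allclose(A_eigenbasis, Lam)
```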

Same transformation, two bases. Drag the right-panel basis to find the coordinates that make the matrix simplest. The eigenvalues (shown below) never change.

The eigenvalues are invariant. Switch to any basis and they stay the same: they are a property of the transformation, not of the coordinates. The trace (sum of diagonal entries) equals λ_1 + λ_2 + ⋯ + λ_n. The determinant equals λ_1 · λ_2 ⋯ λ_n. Both are invariant too. The entries of the matrix change with the basis. These quantities do not.

Two matrices related by B = P^{-1}AP for some invertible P are called similar. Similar matrices represent the same transformation in different bases. They share eigenvalues, trace, determinant, and rank.

The covariance matrix

Now let's build a specific matrix from neural data and see what its eigenvectors tell us.

You record from two neurons over many time bins. Center the data: subtract each neuron's mean, so the average firing rate is zero. At each time bin t, you have a centered firing-rate vector x_t = (x_{t,1}, x_{t,2}).

Take one time bin and form the outer product: x_t x_tᵀ. This is a 2-by-2 matrix:

x_t\, x_t^\top = \begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix} \begin{bmatrix} x_{t,1} & x_{t,2} \end{bmatrix} = \begin{bmatrix} x_{t,1}^2 & x_{t,1}\, x_{t,2} \\ x_{t,2}\, x_{t,1} & x_{t,2}^2 \end{bmatrix}
(6)

The diagonal entries are the squared firing rates. The off-diagonal entries are the products of the two neurons' rates. If both neurons fire above their mean at the same time, the off-diagonal product is positive. If one is above and the other below, it is negative. This one matrix captures the pattern of co-activation at a single moment.

Drag u and v to see the outer product matrix update. The heatmap shows how each entry is the product of one component from u and one from v.

Notice that each outer product has rank 1: every row is a scaled copy of every other row. One moment of data gives you a rank-1 snapshot of co-fluctuation.
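Both properties of the single-moment snapshot (symmetry and rank 1) are easy to confirm; the firing-rate values below are arbitrary placeholders:

```python
import numpy as np

x = np.array([1.5, 1.0])      # one centered firing-rate vector (hypothetical values)
outer = np.outer(x, x)        # 2x2 co-activation snapshot for this moment

assert outer.shape == (2, 2)
assert np.allclose(outer, outer.T)        # symmetric: x1*x2 == x2*x1
assert np.linalg.matrix_rank(outer) == 1  # every row is a scaled copy of x
```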

Now average over all time bins:

C = \frac{1}{T} \sum_{t=1}^{T} x_t\, x_t^\top = \frac{1}{T}\, X^\top X
(7)

where X is the centered data matrix (rows are time bins, columns are neurons). This is the covariance matrix. Entry C_{jk} tells you how much neurons j and k tend to fluctuate together. Entry C_{jj} is the variance of neuron j.

Equation (7) uses T in the denominator rather than T−1. The sample covariance uses T−1, which corrects for bias. The distinction does not matter for our purposes. What matters: always center first. If you forget, the first principal component may simply point toward the mean instead of capturing variance. Cunningham and Yu [4] and Yu et al. [8] (GPFA) both work with centered covariance structure as the starting point for neural dimensionality reduction.
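Equation (7) is two lines of code. The data here are synthetic (correlated Gaussian noise standing in for firing rates), and the result agrees with NumPy's own estimator once you ask for the T denominator via `bias=True`:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
# Synthetic stand-in for centered firing rates: mix 2 noise sources to correlate them.
X = rng.normal(size=(T, 2)) @ np.array([[2.0, 0.0],
                                        [1.0, 1.0]])

X = X - X.mean(axis=0)   # center first -- essential
C = X.T @ X / T          # equation (7)

# Matches NumPy's estimator (np.cov wants rows = variables, hence X.T;
# bias=True selects the T rather than T-1 denominator).
assert np.allclose(C, np.cov(X.T, bias=True))
assert np.allclose(C, C.T)   # symmetric by construction
```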

Let's make this concrete. Suppose your two neurons have covariance matrix:

C = \begin{bmatrix} 4 & 3 \\ 3 & 9 \end{bmatrix}
(8)

Neuron 1 has variance 4, neuron 2 has variance 9, and their covariance is 3 (positively correlated: when one is above its mean, the other tends to be too). The total variance is 4 + 9 = 13, which is the trace.

Now: what are this matrix's eigenvectors?

The covariance ellipse of a 2D neural population. Drag the direction vector to read off the variance along any direction (zᵀCz). The maximum and minimum coincide with the eigenvectors.

The spectral theorem

The covariance matrix has two properties that make it much easier to work with than a general matrix.

First, it is symmetric: C = Cᵀ. The covariance of neuron 1 with neuron 2 equals the covariance of neuron 2 with neuron 1. This follows from the outer product: x_{t,1} x_{t,2} = x_{t,2} x_{t,1}.

Second, it is positive semidefinite: for any vector zz,

z^\top C\, z \;\geq\; 0
(9)

Why? Expand it: zᵀCz = (1/T) Σ_t (x_tᵀ z)². A sum of squares. Always nonnegative. The quantity zᵀCz has a name: it is the variance of the data projected onto the direction z. So equation (9) says variance is never negative. Good.
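The expansion is worth verifying numerically: the quadratic form and the mean of the squared projections agree exactly, not just approximately. Synthetic data again:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 1.0],
                                          [0.0, 1.0]])
X = X - X.mean(axis=0)            # centered data
C = X.T @ X / len(X)

z = np.array([0.6, 0.8])          # any unit vector (0.36 + 0.64 = 1)
var_quadratic = z @ C @ z         # the quadratic form z^T C z
var_projected = np.mean((X @ z) ** 2)   # variance of the projections onto z

assert np.isclose(var_quadratic, var_projected)   # same number, two readings
assert var_quadratic >= 0                         # PSD: variance is never negative
```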

These two properties together give the spectral theorem. For a symmetric matrix:

The eigenvectors are orthogonal. They point in perpendicular directions. And the eigenvalues are all real numbers (no complex eigenvalues, no oscillations).

For a positive semidefinite matrix, the eigenvalues are additionally nonnegative. The proof is one line: if Cv = λv, then vᵀCv = λ‖v‖², and vᵀCv ≥ 0 since C is PSD, so λ ≥ 0. Zero eigenvalues mean the data do not move in that direction. Positive eigenvalues mean there is real variance there.

So the eigendecomposition of the covariance matrix simplifies to:

C = V \Lambda V^\top
(10)

where V is orthogonal (its columns are perpendicular, each of unit length) and Λ is diagonal with nonnegative entries. Compare this to the general eigendecomposition A = VΛV^{-1} from equation (5). For symmetric matrices, V^{-1} = Vᵀ. The inverse is the transpose. Free. No computation. This is the payoff of orthogonality we saw in the basis post.

The spectral theorem says: every real symmetric matrix can be diagonalized by an orthogonal change of basis, with real eigenvalues. Most of multivariate statistics rests on this single fact.

The spectral theorem in action. Step through Qᵀ (align eigenvectors), Λ (scale by eigenvalues), Q (rotate back). A symmetric matrix is a rotation, a scaling, and the reverse rotation.

Let's find the eigenvectors of our covariance matrix C from equation (8). Solving the characteristic equation det(C − λI) = 0: (4 − λ)(9 − λ) − 9 = 0, which gives λ² − 13λ + 27 = 0. The eigenvalues are λ_1 = (13 + √61)/2 ≈ 10.41 and λ_2 = (13 − √61)/2 ≈ 2.59.

Check: 10.41 + 2.59 = 13, which is the trace (total variance). The eigenvalues split the total variance into two perpendicular components. The first eigenvector direction captures 10.41 out of 13 units of variance. The second captures 2.59. Most of the action is along one direction.
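The hand computation checks out in code. For symmetric matrices the right routine is `eigh`, which guarantees real eigenvalues (returned in ascending order) and orthonormal eigenvectors:

```python
import numpy as np

C = np.array([[4.0, 3.0],
              [3.0, 9.0]])

# eigh: the eigensolver for symmetric matrices.
eigvals, V = np.linalg.eigh(C)

# Roots of lambda^2 - 13*lambda + 27 = 0, ascending.
assert np.allclose(eigvals, [(13 - np.sqrt(61)) / 2, (13 + np.sqrt(61)) / 2])
assert np.isclose(eigvals.sum(), np.trace(C))          # total variance: 13
assert np.isclose(eigvals.prod(), np.linalg.det(C))    # 4*9 - 3*3 = 27
assert np.allclose(V.T @ V, np.eye(2))                 # orthogonal: inverse is transpose
assert np.allclose(V @ np.diag(eigvals) @ V.T, C)      # C = V Lam V^T, equation (10)
```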

That direction is the first principal component.

How did we find those eigenvalues? We solved the characteristic equation det(C − λI) = 0, which gives a polynomial whose roots are the eigenvalues. For a 2-by-2 matrix this is a quadratic; for an n × n matrix it is degree n. Once you have the eigenvalues, the eigenvectors come from finding the null space of C − λI for each λ.

Nobody solves the characteristic equation for large matrices. Iterative methods (QR algorithm, Lanczos) find eigenvalues without forming the polynomial. But the characteristic equation is the right way to understand what eigenvalues are: roots of a polynomial defined by the matrix.

Polynomials can have complex roots. A rotation matrix like \bigl(\begin{smallmatrix} 0 & -1 \\ 1 & 0 \end{smallmatrix}\bigr) has eigenvalues ±i: no real direction is preserved, because a 90-degree rotation moves every direction. Complex eigenvalues come in conjugate pairs a ± bi, where a controls growth/decay and b controls rotation rate. For neural dynamics, complex eigenvalues mean the system oscillates. The rotational dynamics in motor cortex [3] correspond to complex eigenvalues of the estimated transition matrix.
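A small sketch of both claims: the 90-degree rotation has the conjugate pair ±i, and shrinking it slightly (0.95 times the rotation, an illustrative choice) turns every trajectory into an inward spiral:

```python
import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])     # 90-degree rotation
lam = np.linalg.eigvals(R)
assert np.allclose(np.sort_complex(lam), [-1j, 1j])   # conjugate pair, |lambda| = 1

A = 0.95 * R                    # eigenvalues 0.95*(+-i): |lambda| = 0.95, spiral inward
x = np.array([1.0, 0.0])
for _ in range(100):
    x = A @ x
# Each step rotates (norm unchanged) then scales by 0.95.
assert np.isclose(np.linalg.norm(x), 0.95**100)
```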

Matrix powers make this precise. From A = VΛV^{-1}, raising to the k-th power gives A^k = VΛ^k V^{-1}. Each eigenvalue gets raised independently. A real eigenvalue with |λ| < 1 decays exponentially; with |λ| > 1, it explodes. A complex eigenvalue with |λ| < 1 spirals inward; with |λ| = 1, it orbits forever.

One more tool the covariance matrix gives us. Take a unit vector z and compute zᵀCz. Expanding with C = (1/T) XᵀX gives (1/T) Σ_t (x_t · z)²: the variance of the data projected onto direction z. This is a quadratic form. The covariance matrix encodes a variance field: point in any direction and read off the spread. The eigenvectors are where this field is extremal: the largest eigenvalue is the maximum variance in any direction.

What if you want all directions to have the same variance? Define W = Λ^{-1/2} Vᵀ and transform: x̃_t = W x_t. The new covariance is WCWᵀ = Λ^{-1/2} Vᵀ V Λ Vᵀ V Λ^{-1/2} = I. This is whitening: rotate to the eigenvector basis, then scale each direction by one over its standard deviation. The result has unit variance in every direction and zero correlation between directions. CCA needs this preprocessing to work, and we will see why in that post.
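Whitening in code is exactly the formula W = Λ^{-1/2} Vᵀ, applied row-by-row to synthetic correlated data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 2)) @ np.array([[2.0, 0.0],
                                           [1.0, 1.0]])
X = X - X.mean(axis=0)
C = X.T @ X / len(X)

# W = Lam^{-1/2} V^T: rotate into the eigenbasis, divide by each std deviation.
eigvals, V = np.linalg.eigh(C)
W = np.diag(eigvals ** -0.5) @ V.T

X_white = X @ W.T                 # apply x~_t = W x_t to every row at once
C_white = X_white.T @ X_white / len(X_white)
assert np.allclose(C_white, np.eye(2))   # unit variance everywhere, zero correlation
```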

What comes next

We now have all the pieces. A neural population's firing-rate data gives you a covariance matrix. That matrix is symmetric and positive semidefinite, so the spectral theorem applies: it can be diagonalized by an orthogonal matrix of eigenvectors, with nonnegative eigenvalues on the diagonal.

The eigenvectors are perpendicular directions. In the eigenvector basis, the covariance matrix is diagonal: each direction has its own variance, and there is no correlation between directions. The eigenvalues tell you the variance along each eigenvector direction: the largest eigenvalue picks out the direction of maximum spread, the second-largest picks out the direction of maximum spread perpendicular to the first, and so on.

And because the eigenvector basis is orthonormal, extracting coordinates is just dot products (equation (2) from Post 2). Reconstructing from a subset of coordinates is just the truncated sum (equation (4) from that same post). Keep the first k eigenvectors — the ones with the largest eigenvalues — and you capture as much variance as possible in k dimensions.

That is PCA. No new operation. It is the spectral theorem applied to the covariance matrix, combined with the truncated reconstruction formula from the basis post. Every concept in this series played a role: vectors as population states, independence as the number of real dimensions, dot products as coordinates, basis change as the core modeling decision, column space as what a matrix preserves, projection as the best approximation in a subspace, and eigenvectors as the directions that make the covariance matrix diagonal.
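The whole pipeline fits in a dozen lines. This is a minimal sketch on synthetic data built to be nearly two-dimensional (500 time bins, 10 neurons, two latent sources plus a little noise), not any particular lab's method:

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic recording: 500 time bins, 10 neurons, 2-dimensional latent structure.
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(500, 10))
X = X - X.mean(axis=0)                 # center first

C = X.T @ X / len(X)                   # covariance matrix, equation (7)
eigvals, V = np.linalg.eigh(C)         # spectral theorem: orthonormal V
order = np.argsort(eigvals)[::-1]      # largest variance first
eigvals, V = eigvals[order], V[:, order]

k = 2
scores = X @ V[:, :k]                  # coordinates: just dot products
X_hat = scores @ V[:, :k].T            # truncated reconstruction

explained = eigvals[:k].sum() / eigvals.sum()
assert explained > 0.95                # two directions carry nearly all the variance
assert np.linalg.norm(X - X_hat) < 0.25 * np.linalg.norm(X)
```

The two assertions at the end are the point: the top-k eigenvalues measure the variance kept, and the reconstruction error is exactly the variance thrown away.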

One decomposition ties everything together. The singular value decomposition factorizes any matrix — square or rectangular, symmetric or not — into a rotation, a scaling, and another rotation. It turns out to be the computational backbone of PCA, CCA, least squares, and subspace identification. That is the next post.

References

  1. Strang, G. Introduction to Linear Algebra, 6th ed. Wellesley-Cambridge Press, 2023.
  2. 3Blue1Brown. "Essence of Linear Algebra" video series, 2016.
  3. Churchland, M. M., Cunningham, J. P., Kaufman, M. T., et al. "Neural population dynamics during reaching," Nature, vol. 487, pp. 51-56, 2012.
  4. Cunningham, J. P. and Yu, B. M. "Dimensionality reduction for large-scale neural recordings," Nature Neuroscience, vol. 17, pp. 1500-1509, 2014.
  5. Axler, S. Linear Algebra Done Right, 4th ed. Springer, 2024.
  6. Strang, G. "The fundamental theorem of linear algebra," The American Mathematical Monthly, vol. 100, no. 9, pp. 848-855, 1993.
  7. Safaie, M., Chang, J. C., Park, J., et al. "Preserved neural dynamics across animals performing similar behaviour," Nature, vol. 623, pp. 765-771, 2023.
  8. Yu, B. M., Cunningham, J. P., Santhanam, G., et al. "Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity," Journal of Neurophysiology, vol. 102, no. 1, pp. 614-635, 2009.