Eigenvectors and covariance

Finding the directions a transformation leaves alone.

Directions that stay put

Take a matrix and apply it to a vector. In general, the output points in a different direction than the input. The vector gets rotated, sheared, sent somewhere new. That is what matrices do.

But try this specific matrix and this specific vector:

\begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix}
(1)

The input is (1, 0). The output is (3, 0). Same direction. The vector was stretched by a factor of 3, but it did not rotate. The matrix, which in general rotates and shears everything, left this particular direction alone.

Is that a coincidence of the input we chose? Let's try another:

\begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 2 \\ -2 \end{bmatrix}
(2)

The input is (1, -1). The output is (2, -2). Again: same direction, just scaled, this time by a factor of 2. A second direction that the matrix does not rotate.

Now try something generic, like (1, 1): A(1, 1) = (4, 2). That is not a multiple of (1, 1). This vector does get rotated. So the property is special to particular directions.
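These three checks take one line each in NumPy. This is just the arithmetic from the text, nothing more:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# The two special directions: scaled, not rotated.
v1 = np.array([1.0, 0.0])
v2 = np.array([1.0, -1.0])
assert np.allclose(A @ v1, 3 * v1)   # stretched by a factor of 3
assert np.allclose(A @ v2, 2 * v2)   # stretched by a factor of 2

# A generic direction gets rotated: the output is not a multiple of the input.
w = np.array([1.0, 1.0])
Aw = A @ w                            # (4, 2)
cross = Aw[0] * w[1] - Aw[1] * w[0]   # zero exactly when Aw is parallel to w
assert cross != 0
```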

These special directions are called eigenvectors, and they turn out to be the key to PCA, to stability analysis in dynamical systems, and to understanding covariance matrices.

Eigenvectors and eigenvalues

A direction that a matrix scales but does not rotate is called an eigenvector. The scale factor is its eigenvalue. In symbols:

Av = \lambda v
(3)

The matrix A acts on v and produces the same vector v, multiplied by the number λ. For the matrix above, we found two eigenvectors: (1, 0) with eigenvalue 3, and (1, -1) with eigenvalue 2.

What does the eigenvalue tell you? If λ > 1, the vector gets stretched. If 0 < λ < 1, it gets compressed. If λ < 0, it gets flipped. If λ = 0, it gets sent to zero: the matrix annihilates that direction entirely. (That is a null-space direction, from the previous post.)

A circle of unit vectors deforms into an ellipse under the transformation. The highlighted directions are eigenvectors: they scale but do not rotate. Click Diagonalize to switch to the eigenvector basis and watch the matrix simplify.

Watch the figure. A circle of unit vectors gets mapped to an ellipse. Most directions change angle. But the eigenvector directions stay on their original line: they land on the ellipse at exactly the point you would reach by walking along the same direction, just farther or closer to the origin. The eigenvalue is the stretch factor.

Some matrices have fewer than n independent eigenvectors (defective matrices). Others have complex eigenvalues, which correspond to rotations rather than pure scalings. We set both aside here. Complex eigenvalues will matter in the dynamics posts, where they produce oscillations.

For neural dynamics, eigenvalues control stability. If your system evolves as x_{t+1} = Ax_t, then along an eigenvector direction the system simply scales: A^k v = λ^k v. When |λ| < 1, the component decays. When |λ| > 1, it explodes. The eigenvalues are the knobs controlling how each independent mode of the dynamics grows or shrinks.
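A quick sketch of this, using a hypothetical transition matrix chosen so that both eigenvalues sit inside the unit interval (the matrix and its values are illustrative, not from the text):

```python
import numpy as np

# Hypothetical 2x2 transition matrix with both |eigenvalues| below 1.
A = np.array([[0.9, 0.5],
              [0.0, 0.4]])
eigvals, eigvecs = np.linalg.eig(A)

# Along an eigenvector, k steps of the dynamics is just lambda^k: A^k v = lambda^k v.
i = np.argmax(np.abs(eigvals))
lam, v = eigvals[i], eigvecs[:, i]
assert np.allclose(np.linalg.matrix_power(A, 10) @ v, lam**10 * v)

# Every mode decays, so any trajectory collapses toward the origin.
x = np.array([1.0, 1.0])
for _ in range(60):
    x = A @ x
assert np.linalg.norm(x) < 1e-2
```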

Diagonalization

In Post 2, we said that the same transformation looks different in different bases. A messy matrix in one basis might become simple in another. Eigenvectors tell you which basis makes it simplest.

Our matrix has eigenvectors v_1 = (1, 0) and v_2 = (1, -1). These are two independent vectors, so they form a basis. What does the matrix look like in this basis?

In the eigenvector basis, the matrix sends v_1 to 3v_1 and v_2 to 2v_2. No mixing between the two directions. The matrix, which in the standard basis had an off-diagonal entry (the 1 in the top-right corner, coupling the two coordinates), becomes diagonal:

A_{\text{eigenbasis}} = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}
(4)

Each axis scales independently. No cross-talk. This is the eigendecomposition. Stack the eigenvectors as columns of a matrix V and the eigenvalues on the diagonal of Λ:

A = V \Lambda V^{-1}
(5)

Read this as a three-step process. Convert from the standard basis to the eigenvector basis (multiply by V^{-1}). Scale each coordinate independently by its eigenvalue (multiply by Λ). Convert back (multiply by V). In the eigenvector basis, the transformation has no mixing between dimensions. All the complexity of the matrix was just a consequence of looking at it in the wrong coordinates.
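The three steps can be checked directly. NumPy's `eig` returns the eigenvectors as the columns of V, which is exactly the stacking described above:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# Columns of V are the eigenvectors; Lam carries the eigenvalues on its diagonal.
eigvals, V = np.linalg.eig(A)
Lam = np.diag(eigvals)

# A = V Lam V^{-1}: convert to the eigenbasis, scale, convert back.
assert np.allclose(V @ Lam @ np.linalg.inv(V), A)

# Seen from the eigenvector basis, the matrix is diagonal: no mixing.
A_eigenbasis = np.linalg.inv(V) @ A @ V
assert np.allclose(A_eigenbasis, Lam)
```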

Same transformation, two bases. Drag the right-panel basis to find the coordinates that make the matrix simplest. The eigenvalues (shown below) never change.

The eigenvalues are invariant. Switch to any basis and they stay the same: they are a property of the transformation, not of the coordinates. The trace (sum of diagonal entries) equals λ_1 + λ_2 + ⋯ + λ_n. The determinant equals λ_1 · λ_2 ⋯ λ_n. Both are invariant too. The entries of the matrix change with the basis. These quantities do not.

Two matrices related by B = P^{-1}AP for some invertible P are called similar. Similar matrices represent the same transformation in different bases. They share eigenvalues, trace, determinant, and rank.

The covariance matrix

Now let's build a specific matrix from neural data and see what its eigenvectors tell us.

You record from two neurons over many time bins. Center the data: subtract each neuron's mean, so the average firing rate is zero. At each time bin t, you have a centered firing-rate vector x_t = (x_{t,1}, x_{t,2}).

Take one time bin and form the outer product: x_t x_tᵀ. This is a 2-by-2 matrix:

x_t\, x_t^\top = \begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix} \begin{bmatrix} x_{t,1} & x_{t,2} \end{bmatrix} = \begin{bmatrix} x_{t,1}^2 & x_{t,1}\, x_{t,2} \\ x_{t,2}\, x_{t,1} & x_{t,2}^2 \end{bmatrix}
(6)

The diagonal entries are the squared firing rates. The off-diagonal entries are the products of the two neurons' rates. If both neurons fire above their mean at the same time, the off-diagonal product is positive. If one is above and the other below, it is negative. This one matrix captures the pattern of co-activation at a single moment.

Drag u and v to see the outer product matrix update. The heatmap shows how each entry is the product of one component from u and one from v.

Notice that each outer product has rank 1: every row is a scaled copy of every other row. One moment of data gives you a rank-1 snapshot of co-fluctuation.
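Both properties of the single-moment snapshot (symmetry and rank 1) are easy to confirm; the firing-rate values below are arbitrary placeholders:

```python
import numpy as np

x = np.array([1.5, 1.0])      # one centered firing-rate vector (hypothetical values)
outer = np.outer(x, x)        # 2x2 co-activation snapshot for this moment

assert outer.shape == (2, 2)
assert np.allclose(outer, outer.T)        # symmetric: x1*x2 == x2*x1
assert np.linalg.matrix_rank(outer) == 1  # every row is a scaled copy of x
```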

Now average over all time bins:

C = \frac{1}{T} \sum_{t=1}^{T} x_t\, x_t^\top = \frac{1}{T}\, X^\top X
(7)

where X is the centered data matrix (rows are time bins, columns are neurons). This is the covariance matrix. Entry C_{jk} tells you how much neurons j and k tend to fluctuate together. Entry C_{jj} is the variance of neuron j.

Equation (7) uses T in the denominator rather than T−1. The sample covariance uses T−1, which corrects for bias. The distinction does not matter for our purposes. What matters: always center first. If you forget, the first principal component may simply point toward the mean instead of capturing variance. Cunningham and Yu [4] and Yu et al. [8] (GPFA) both work with centered covariance structure as the starting point for neural dimensionality reduction.
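Equation (7) is two lines of code. The data here are synthetic (correlated Gaussian noise standing in for firing rates), and the result agrees with NumPy's own estimator once you ask for the T denominator via `bias=True`:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
# Synthetic stand-in for centered firing rates: mix 2 noise sources to correlate them.
X = rng.normal(size=(T, 2)) @ np.array([[2.0, 0.0],
                                        [1.0, 1.0]])

X = X - X.mean(axis=0)   # center first -- essential
C = X.T @ X / T          # equation (7)

# Matches NumPy's estimator (np.cov wants rows = variables, hence X.T;
# bias=True selects the T rather than T-1 denominator).
assert np.allclose(C, np.cov(X.T, bias=True))
assert np.allclose(C, C.T)   # symmetric by construction
```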

Let's make this concrete. Suppose your two neurons have covariance matrix:

C = \begin{bmatrix} 4 & 3 \\ 3 & 9 \end{bmatrix}
(8)

Neuron 1 has variance 4, neuron 2 has variance 9, and their covariance is 3 (positively correlated: when one is above its mean, the other tends to be too). The total variance is 4 + 9 = 13, which is the trace.

Now: what are this matrix's eigenvectors?

The covariance ellipse of a 2D neural population. Drag the direction vector to read off the variance along any direction (zᵀCz). The maximum and minimum coincide with the eigenvectors.

The spectral theorem

The covariance matrix has two properties that make it much easier to work with than a general matrix.

First, it is symmetric: C = Cᵀ. The covariance of neuron 1 with neuron 2 equals the covariance of neuron 2 with neuron 1. This follows from the outer product: x_{t,1} x_{t,2} = x_{t,2} x_{t,1}.

Second, it is positive semidefinite: for any vector zz,

z^\top C\, z \;\geq\; 0
(9)

Why? Expand it: zᵀCz = (1/T) Σ_t (x_tᵀ z)². A sum of squares. Always nonnegative. The quantity zᵀCz has a name: it is the variance of the data projected onto the direction z. So equation (9) says variance is never negative. Good.
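The expansion is worth verifying numerically: the quadratic form and the mean of the squared projections agree exactly, not just approximately. Synthetic data again:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 1.0],
                                          [0.0, 1.0]])
X = X - X.mean(axis=0)            # centered data
C = X.T @ X / len(X)

z = np.array([0.6, 0.8])          # any unit vector (0.36 + 0.64 = 1)
var_quadratic = z @ C @ z         # the quadratic form z^T C z
var_projected = np.mean((X @ z) ** 2)   # variance of the projections onto z

assert np.isclose(var_quadratic, var_projected)   # same number, two readings
assert var_quadratic >= 0                         # PSD: variance is never negative
```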

These two properties together give the spectral theorem. For a symmetric matrix:

The eigenvectors are orthogonal. They point in perpendicular directions. And the eigenvalues are all real numbers (no complex eigenvalues, no oscillations).

For a positive semidefinite matrix, the eigenvalues are additionally nonnegative. The proof is one line: if Cv = λv, then vᵀCv = λ‖v‖², and vᵀCv ≥ 0 since C is PSD, so λ ≥ 0. Zero eigenvalues mean the data do not move in that direction. Positive eigenvalues mean there is real variance there.

So the eigendecomposition of the covariance matrix simplifies to:

C = V \Lambda V^\top
(10)

where V is orthogonal (its columns are perpendicular, each of unit length) and Λ is diagonal with nonnegative entries. Compare this to the general eigendecomposition A = VΛV^{-1} from equation (5). For symmetric matrices, V^{-1} = Vᵀ. The inverse is the transpose. Free. No computation. This is the payoff of orthogonality we saw in the basis post.

The spectral theorem says: every real symmetric matrix can be diagonalized by an orthogonal change of basis, with real eigenvalues. Most of multivariate statistics rests on this single fact.

The spectral theorem in action. Step through Qᵀ (align eigenvectors), Λ (scale by eigenvalues), Q (rotate back). A symmetric matrix is a rotation, a scaling, and the reverse rotation.

Let's find the eigenvectors of our covariance matrix C from equation (8). Solving the characteristic equation det(C − λI) = 0: (4 − λ)(9 − λ) − 9 = 0, which gives λ² − 13λ + 27 = 0. The eigenvalues are λ_1 = (13 + √61)/2 ≈ 10.41 and λ_2 = (13 − √61)/2 ≈ 2.59.

Check: 10.41 + 2.59 = 13, which is the trace (total variance). The eigenvalues split the total variance into two perpendicular components. The first eigenvector direction captures 10.41 out of 13 units of variance. The second captures 2.59. Most of the action is along one direction.
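The hand computation checks out in code. For symmetric matrices the right routine is `eigh`, which guarantees real eigenvalues (returned in ascending order) and orthonormal eigenvectors:

```python
import numpy as np

C = np.array([[4.0, 3.0],
              [3.0, 9.0]])

# eigh: the eigensolver for symmetric matrices.
eigvals, V = np.linalg.eigh(C)

# Roots of lambda^2 - 13*lambda + 27 = 0, ascending.
assert np.allclose(eigvals, [(13 - np.sqrt(61)) / 2, (13 + np.sqrt(61)) / 2])
assert np.isclose(eigvals.sum(), np.trace(C))          # total variance: 13
assert np.isclose(eigvals.prod(), np.linalg.det(C))    # 4*9 - 3*3 = 27
assert np.allclose(V.T @ V, np.eye(2))                 # orthogonal: inverse is transpose
assert np.allclose(V @ np.diag(eigvals) @ V.T, C)      # C = V Lam V^T, equation (10)
```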

That direction is the first principal component.

How did we find those eigenvalues? We solved the characteristic equation det(C − λI) = 0, which gives a polynomial whose roots are the eigenvalues. For a 2-by-2 matrix this is a quadratic; for an n × n matrix it is degree n. Once you have the eigenvalues, the eigenvectors come from finding the null space of C − λI for each λ.

Nobody solves the characteristic equation for large matrices. Iterative methods (QR algorithm, Lanczos) find eigenvalues without forming the polynomial. But the characteristic equation is the right way to understand what eigenvalues are: roots of a polynomial defined by the matrix.

Polynomials can have complex roots. A rotation matrix like \bigl(\begin{smallmatrix} 0 & -1 \\ 1 & 0 \end{smallmatrix}\bigr) has eigenvalues ±i: no real direction is preserved, because a 90-degree rotation moves every direction. Complex eigenvalues come in conjugate pairs a ± bi, where a controls growth/decay and b controls rotation rate. For neural dynamics, complex eigenvalues mean the system oscillates. The rotational dynamics in motor cortex [3] correspond to complex eigenvalues of the estimated transition matrix.
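A small sketch of both claims: the 90-degree rotation has the conjugate pair ±i, and shrinking it slightly (0.95 times the rotation, an illustrative choice) turns every trajectory into an inward spiral:

```python
import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])     # 90-degree rotation
lam = np.linalg.eigvals(R)
assert np.allclose(np.sort_complex(lam), [-1j, 1j])   # conjugate pair, |lambda| = 1

A = 0.95 * R                    # eigenvalues 0.95*(+-i): |lambda| = 0.95, spiral inward
x = np.array([1.0, 0.0])
for _ in range(100):
    x = A @ x
# Each step rotates (norm unchanged) then scales by 0.95.
assert np.isclose(np.linalg.norm(x), 0.95**100)
```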

Matrix powers make this precise. From A = VΛV^{-1}, raising to the k-th power gives A^k = VΛ^k V^{-1}. Each eigenvalue gets raised independently. A real eigenvalue with |λ| < 1 decays exponentially; with |λ| > 1, it explodes. A complex eigenvalue with |λ| < 1 spirals inward; with |λ| = 1, it orbits forever.

One more tool the covariance matrix gives us. Take a unit vector z and compute zᵀCz. Expanding with C = (1/T) XᵀX gives (1/T) Σ_t (x_t · z)²: the variance of the data projected onto direction z. This is a quadratic form. The covariance matrix encodes a variance field: point in any direction and read off the spread. The eigenvectors are where this field is extremal: the largest eigenvalue is the maximum variance in any direction.

What if you want all directions to have the same variance? Define W = Λ^{-1/2} Vᵀ and transform: x̃_t = W x_t. The new covariance is WCWᵀ = Λ^{-1/2} Vᵀ V Λ Vᵀ V Λ^{-1/2} = I. This is whitening: rotate to the eigenvector basis, then scale each direction by one over its standard deviation. The result has unit variance in every direction and zero correlation between directions. CCA needs this preprocessing to work, and we will see why in that post.
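Whitening in code is exactly the formula W = Λ^{-1/2} Vᵀ, applied row-by-row to synthetic correlated data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 2)) @ np.array([[2.0, 0.0],
                                           [1.0, 1.0]])
X = X - X.mean(axis=0)
C = X.T @ X / len(X)

# W = Lam^{-1/2} V^T: rotate into the eigenbasis, divide by each std deviation.
eigvals, V = np.linalg.eigh(C)
W = np.diag(eigvals ** -0.5) @ V.T

X_white = X @ W.T                 # apply x~_t = W x_t to every row at once
C_white = X_white.T @ X_white / len(X_white)
assert np.allclose(C_white, np.eye(2))   # unit variance everywhere, zero correlation
```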

What comes next

We now have all the pieces. A neural population's firing-rate data gives you a covariance matrix. That matrix is symmetric and positive semidefinite, so the spectral theorem applies: it can be diagonalized by an orthogonal matrix of eigenvectors, with nonnegative eigenvalues on the diagonal.

The eigenvectors are perpendicular directions. In the eigenvector basis, the covariance matrix is diagonal: each direction has its own variance, and there is no correlation between directions. The eigenvalues tell you the variance along each eigenvector direction: the largest eigenvalue picks out the direction of maximum spread, the second-largest picks out the direction of maximum spread perpendicular to the first, and so on.

And because the eigenvector basis is orthonormal, extracting coordinates is just dot products (equation (2) from Post 2). Reconstructing from a subset of coordinates is just the truncated sum (equation (4) from that same post). Keep the first k eigenvectors — the ones with the largest eigenvalues — and you capture as much variance as possible in k dimensions.

That is PCA. No new operation. It is the spectral theorem applied to the covariance matrix, combined with the truncated reconstruction formula from the basis post. Every concept in this series played a role: vectors as population states, independence as the number of real dimensions, dot products as coordinates, basis change as the core modeling decision, column space as what a matrix preserves, projection as the best approximation in a subspace, and eigenvectors as the directions that make the covariance matrix diagonal.
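The whole pipeline fits in a dozen lines. This is a minimal sketch on synthetic data built to be nearly two-dimensional (500 time bins, 10 neurons, two latent sources plus a little noise), not any particular lab's method:

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic recording: 500 time bins, 10 neurons, 2-dimensional latent structure.
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(500, 10))
X = X - X.mean(axis=0)                 # center first

C = X.T @ X / len(X)                   # covariance matrix, equation (7)
eigvals, V = np.linalg.eigh(C)         # spectral theorem: orthonormal V
order = np.argsort(eigvals)[::-1]      # largest variance first
eigvals, V = eigvals[order], V[:, order]

k = 2
scores = X @ V[:, :k]                  # coordinates: just dot products
X_hat = scores @ V[:, :k].T            # truncated reconstruction

explained = eigvals[:k].sum() / eigvals.sum()
assert explained > 0.95                # two directions carry nearly all the variance
assert np.linalg.norm(X - X_hat) < 0.25 * np.linalg.norm(X)
```

The two assertions at the end are the point: the top-k eigenvalues measure the variance kept, and the reconstruction error is exactly the variance thrown away.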

One decomposition ties everything together. The singular value decomposition factorizes any matrix — square or rectangular, symmetric or not — into a rotation, a scaling, and another rotation. It turns out to be the computational backbone of PCA, CCA, least squares, and subspace identification. That is the next post.

References

  1. Strang, G. Introduction to Linear Algebra, 6th ed. Wellesley-Cambridge Press, 2023.
  2. 3Blue1Brown. "Essence of Linear Algebra" video series, 2016.
  3. Churchland, M. M., Cunningham, J. P., Kaufman, M. T., et al. "Neural population dynamics during reaching," Nature, vol. 487, pp. 51-56, 2012.
  4. Cunningham, J. P. and Yu, B. M. "Dimensionality reduction for large-scale neural recordings," Nature Neuroscience, vol. 17, pp. 1500-1509, 2014.
  5. Axler, S. Linear Algebra Done Right, 4th ed. Springer, 2024.
  6. Strang, G. "The fundamental theorem of linear algebra," The American Mathematical Monthly, vol. 100, no. 9, pp. 848-855, 1993.
  7. Safaie, M., Chang, J. C., Park, J., et al. "Preserved neural dynamics across animals performing similar behaviour," Nature, vol. 623, pp. 765-771, 2023.
  8. Yu, B. M., Cunningham, J. P., Santhanam, G., et al. "Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity," Journal of Neurophysiology, vol. 102, no. 1, pp. 614-635, 2009.