Vectors and neural geometry

Representing neural populations as points in a shared space.

Population vectors

Suppose you stick an electrode array into motor cortex and record from a hundred neurons while a monkey reaches to a target. You get a hundred time-varying traces, one per neuron. The natural thing to do is plot them. Pick a neuron, look at its firing rate, note when it peaks. Repeat for the next neuron. Build up a picture of the population one cell at a time.

This works for a while. Many motor cortex neurons have clean, interpretable responses. But you have a hundred of them, each with its own time-varying trace, across dozens of conditions. At some point, a hundred overlaid traces stop being informative and start being a wall of spaghetti.

So let's try a completely different representation. At a single moment in time, each neuron has a firing rate. Neuron 1 fires at 12 spikes/s, neuron 2 at 7, neuron 3 at 31, and so on. Write these hundred numbers as a column:

r = \begin{bmatrix} 12 \\ 7 \\ 31 \\ \vdots \\ r_{100} \end{bmatrix}
(1)

That column is a vector. And here is the key move: we can think of it as a single point in a hundred-dimensional space, where each axis corresponds to one neuron's firing rate. The next time bin gives a different column, a different point. As time passes, the point moves, tracing out a trajectory through this high-dimensional space.
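In code, the whole recording is just a matrix whose columns are these population vectors, one per time bin. A minimal sketch in NumPy, with synthetic gamma-distributed firing rates standing in for real data:

```python
import numpy as np

# Hypothetical recording: 100 neurons, 50 time bins of firing rates (spikes/s).
# Each COLUMN of R is the population state at one moment --
# a single point in 100-dimensional neuron space.
rng = np.random.default_rng(0)
n_neurons, n_bins = 100, 50
R = rng.gamma(shape=2.0, scale=5.0, size=(n_neurons, n_bins))

r_t0 = R[:, 0]          # the population vector at the first time bin
print(r_t0.shape)       # (100,) -- one firing rate per neuron

# The trajectory is just this sequence of points through time.
trajectory = [R[:, t] for t in range(n_bins)]
print(len(trajectory))  # 50 points in 100-dimensional space
```

Plotting any two rows of R against each other is exactly the "pick two neurons as axes" view from the figure below.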

Try this in the figure below. Each axis is a neuron. Click different neurons to choose which ones define the axes you see, and watch the trajectory reshape itself.

[Interactive figure: population trajectory plotted against three selectable neuron axes (x, y, z), with start/end and early/late markers]
Click neurons to choose which ones define the axes. The trajectory changes shape, but the underlying activity is the same. Only the viewpoint has changed.

Notice what happened. The trajectory looks completely different depending on which neurons you pick as axes. But the data did not change. The firing rates are identical. Only the coordinates changed. The trajectory is a geometric object that exists independently of how you describe it. The coordinates are just one particular description. The idea of treating neural population activity as a trajectory through a high-dimensional state space goes back to Churchland et al. [3] and is now a standard framework in computational neuroscience. Cunningham and Yu [4] give a review, and Safaie et al. [7] show that these low-dimensional trajectories are preserved across individuals of the same species.

This distinction between object and description will turn out to be the central idea behind dimensionality reduction. But before we can talk about finding better descriptions, we need to understand what you can do with vectors.

Two operations matter. You can scale a vector (multiply every entry by the same number, stretching or shrinking it). And you can add two vectors (add their entries, which geometrically places them tip to tail). Any collection of objects that supports these two operations, with a short list of rules about how they interact, forms a vector space. Axler [5] has the formal definition. The key implication: anything that can be added and scaled is a vector. Columns of firing rates qualify. But so do entire firing-rate trajectories r(t), matrices, and polynomials. The same theory covers all of them. For the rest of this post, our vectors will be concrete columns of real numbers.

Scaling and adding seem like modest operations. It is surprising how much structure they give you.

Linear combinations and span

Start small. Forget the hundred neurons. Think about two vectors, u and v. Scale each by a number and add:

w = \alpha\, u + \beta\, v
(2)

The result w is called a linear combination. Try it below: adjust the scalars and watch the resultant vector move.

[Interactive figure: vectors u and v with adjustable scalars α and β, and their linear combination w = αu + βv]
Adjust α and β to scale the two vectors. The resultant (teal) is their sum.

Now let α and β range over all real numbers. What is the set of all vectors you can produce this way?

It depends on u and v. If they point in genuinely different directions, you can reach any point on a plane. Every point on that plane corresponds to some choice of α and β. This set of reachable vectors is the span of {u, v}.

But what if v happens to lie along the same line as u? Say, v = 3u. Then αu + βv = (α + 3β)u. No matter what scalars you pick, you can only produce multiples of u. The span collapsed from a plane to a line. Adding v bought you nothing.
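You can detect this collapse numerically: the rank of the matrix whose columns are u and v counts the independent directions in their span. A quick check with NumPy (the specific vectors here are arbitrary illustrations):

```python
import numpy as np

u = np.array([2.0, 1.0])
v_indep = np.array([-1.0, 3.0])   # points in a genuinely different direction
v_dep = 3 * u                     # lies on the same line as u

# Rank counts the independent directions spanned by the columns.
print(np.linalg.matrix_rank(np.column_stack([u, v_indep])))  # 2: span fills the plane
print(np.linalg.matrix_rank(np.column_stack([u, v_dep])))    # 1: span collapsed to a line
```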

Drag the vectors in the figure below and watch this happen. When they point in different directions, the shaded region fills the plane. When one is a scaled copy of the other, the span collapses.

[Interactive figure: two draggable vectors u and v with the angle θ between them; the shaded span fills ℝ² when they are independent and collapses to a line when they are collinear]
Drag the two vectors. When they are independent, their span fills the plane (shaded). When they are collinear, the span collapses to a line.

This collapse is exactly what we want to detect. A set of vectors is linearly independent when none of them can be written as a linear combination of the others. Equivalently: the only way to combine them to get the zero vector is to use all-zero scalars. Each independent vector adds a genuine new direction to the span. Each dependent vector adds nothing.

The same idea extends to any number of vectors. Three independent vectors in three dimensions span all of ℝ³. But if the third is a combination of the first two, you are still stuck in a plane. Four vectors in three dimensions must be dependent: there is no room for a fourth independent direction.

Independence and dimensionality

Here is where this connects back to the hundred-neuron problem.

Imagine that neuron 47 always fires at exactly twice the rate of neuron 12. Always. Across every condition, every time bin. If you already recorded neuron 12, then neuron 47 tells you nothing new. Its firing-rate vector is a scalar multiple of neuron 12's. It is linearly dependent on what you already have.

Now suppose instead that neuron 47 responds during grasping, while neuron 12 responds during reaching. These are genuinely different patterns. Recording both tells you something about the population that recording either one alone could not. Neuron 47 is independent of neuron 12.

Let's push this further. Suppose you go through all hundred neurons and find that, for every neuron beyond the first ten, its firing rate can be predicted as a weighted sum of those ten. Then the activity of the entire population is confined to a ten-dimensional subspace of the hundred-dimensional neuron space. Ten independent patterns explain everything. The other ninety neurons are redundant: they carry no information that the ten do not. In practice, neural firing rates are never exactly linearly dependent. Noise ensures that. But they are often approximately so. Gallego et al. [8] showed that motor cortex populations of hundreds of neurons typically have an effective dimensionality of 10–20, meaning the activity is confined near a low-dimensional subspace. The degree of approximate confinement is precisely what makes PCA, factor analysis, and GPFA useful [4].
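This hypothetical is easy to simulate: build a hundred firing-rate traces as weighted sums of ten underlying patterns, and the rank of the resulting matrix reveals the ten-dimensional structure. A sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_latent, n_bins = 100, 10, 500

# Hypothetical population: every neuron's rate is a fixed weighted sum
# of 10 underlying patterns, so the activity is confined to a
# 10-dimensional subspace of 100-dimensional neuron space.
W = rng.normal(size=(n_neurons, n_latent))    # weights: neurons x patterns
latents = rng.normal(size=(n_latent, n_bins)) # the 10 underlying patterns
R = W @ latents                               # 100 x 500 firing-rate matrix

print(np.linalg.matrix_rank(R))  # 10 -- ten independent patterns explain everything
```

Add noise to R and the rank jumps to 100, but the activity still lies near the ten-dimensional subspace; that "near" is what dimensionality reduction quantifies.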

This is the core observation behind dimensionality reduction: neural populations are redundant. Their activity, despite nominally living in a space with as many dimensions as there are neurons, stays close to a much smaller subspace. Finding that subspace is the goal. And "finding a subspace" turns out to mean "finding a good set of independent directions."

Span and independence are not just abstract concepts. They tell you how many dimensions of structure your population actually uses. But knowing the dimensionality is only half the story. You also want to know whether two activity patterns are similar, whether they are different, and how different. For that, you need a way to measure.

The dot product

You record two vectors of firing rates: one from a leftward reach, one from a rightward reach. Are these population patterns similar or different? And can you quantify the answer?

Here is a natural idea. Go neuron by neuron: multiply the two firing rates, then add up all the products.

u \cdot v = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n
(3)

This is the dot product. Let's think about what it captures. If neuron i fires strongly in both conditions, the product uᵢvᵢ is large and positive, pulling the sum up. If it fires strongly in one condition and barely in the other, the product is small. If it fires in one and is suppressed in the other (positive times negative), the product is negative, pulling the sum down.

The result is a single number that summarizes how much the two patterns overlap across the whole population. Large and positive: the same neurons are active in both conditions. Near zero: they activate different neurons. Negative: they tend to be anti-correlated.

Let's make this concrete. Suppose you have three neurons. During a leftward reach, u = (8, 2, 1). During a rightward reach, v = (1, 3, 9). The dot product is 8(1) + 2(3) + 1(9) = 23. Positive, but not huge. Now compare u with itself: 8(8) + 2(2) + 1(1) = 69. Much larger. And compare u with the pattern w = (−8, −2, −1): you get −69. The dot product tracks what your intuition expects. Representational similarity analysis (RSA) [9] is exactly this idea applied systematically. You compute the dot product (or cosine similarity, or correlation) between every pair of condition-averaged population vectors, building a "representational dissimilarity matrix." The geometry of that matrix tells you how the population organizes its representations. The dot product is the foundation.
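The same arithmetic in NumPy, using the vectors from the example above:

```python
import numpy as np

u = np.array([8, 2, 1])   # leftward reach
v = np.array([1, 3, 9])   # rightward reach
w = -u                    # the anti-correlated pattern (-8, -2, -1)

print(np.dot(u, v))  # 23: some overlap
print(np.dot(u, u))  # 69: a pattern overlaps maximally with itself
print(np.dot(u, w))  # -69: anti-correlated
```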

There is a geometric way to state the same thing:

u \cdot v = \|u\|\,\|v\|\,\cos\theta
(4)

where θ is the angle between the two vectors and ‖u‖ is the length of u. When the vectors point in the same direction, the cosine is 1 and the dot product is as large as the lengths allow. When they are perpendicular, the cosine is zero and the dot product vanishes. Perpendicular vectors have a name: orthogonal.

This geometric picture gives you a way to ask: how much of one pattern lies along a particular direction? Imagine shining a flashlight perpendicular to some direction v and looking at the shadow u casts along it. The signed length of that shadow is the scalar projection:

\text{proj}_v\, u = \frac{u \cdot v}{\|v\|}
(5)

When v has unit length, the denominator is 1. The projection is just u · v. A single dot product. No division, no correction.
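A small numerical check (the vectors here are arbitrary illustrations): projecting onto a non-unit vector requires the division, while projecting onto the normalized version is a bare dot product.

```python
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([2.0, 0.0])          # a non-unit reference direction

# General formula: divide by ||v||.
proj = np.dot(u, v) / np.linalg.norm(v)
print(proj)  # 3.0 -- the component of u along the x-axis

# With a unit-length reference vector, the projection is just a dot product.
v_hat = v / np.linalg.norm(v)
print(np.dot(u, v_hat))  # 3.0 -- same answer, no division needed
```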

When we get to PCA in a later post, the entire computation will reduce to dot products with unit vectors. "How much of this data point lies along this principal component?" is answered by one dot product. That only works because unit-length reference vectors make the projection formula collapse to a single operation. This simplification is so convenient that much of applied linear algebra is devoted to constructing sets of unit-length, mutually perpendicular vectors (orthonormal sets) with specific properties. PCA, the Fourier transform, and wavelet decompositions all amount to choosing an orthonormal set tailored to a particular problem.

[Interactive figure: draggable vectors u and v showing their dot product, the angle θ between them, and the projection of u onto v drawn as a shadow]
Drag the two vectors. The projection of u onto v is drawn as a shadow. When the vectors are perpendicular, the dot product is zero and the shadow vanishes.

One more thing worth noticing. Dotting a vector with itself gives v · v = ‖v‖², the squared length. So the dot product does double duty: it measures similarity between two vectors, and the size of a single vector. This connection is not a coincidence. Both jobs come from the same underlying structure, called an inner product. The inner product on functions, ⟨f, g⟩ = ∫ f(t) g(t) dt, is what makes Fourier analysis work. Every inner product gives you lengths, angles, projections, and orthogonality. The dot product is the version for columns of numbers.

Norms and distance

The dot product gave us a way to measure length: ‖v‖ = √(v · v). This is called the L² norm:

\|v\|_2 = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}
(6)

It is the most common way to measure size. But it is not the only way, and the choice has consequences that are easy to miss.

Consider two firing-rate vectors. In the first, one neuron fires at 100 spikes/s and the other ninety-nine are silent. In the second, all hundred neurons fire at 1 spike/s. Both vectors have the same total spike count: 100. But their L² norms are 100 and √100 = 10. The squaring inside the norm amplifies the single dominant entry. By this measure, the concentrated pattern is ten times "larger" than the distributed one, even though total activity is identical.

That might or might not match what you care about. If you want a measure that treats total activity as size, you want the L¹ norm, which sums absolute values: ‖v‖₁ = |v₁| + ⋯ + |vₙ|. Under this norm, both patterns have size 100. If you care only about the single most active neuron, you want the L^∞ norm: ‖v‖_∞ = maxᵢ |vᵢ|, which gives 100 for the first pattern and 1 for the second.
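The two patterns from the example above make this concrete in code:

```python
import numpy as np

concentrated = np.zeros(100)
concentrated[0] = 100.0        # one neuron at 100 spikes/s, the rest silent
distributed = np.ones(100)     # all hundred neurons at 1 spike/s

for name, x in [("concentrated", concentrated), ("distributed", distributed)]:
    l1 = np.linalg.norm(x, 1)         # total activity
    l2 = np.linalg.norm(x, 2)         # Euclidean length
    linf = np.linalg.norm(x, np.inf)  # most active neuron
    print(name, l1, l2, linf)
# concentrated: L1=100, L2=100, Linf=100
# distributed:  L1=100, L2=10,  Linf=1
```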

These are all special cases of a single family:

\|v\|_p = \left(\sum_{i=1}^{n} |v_i|^p\right)^{1/p}
(7)

Each value of p gives a different geometry. You can see this by looking at the unit ball, the set of all vectors with norm at most 1. Adjust p below and watch the shape change.

[Interactive figure: the unit ball of the p-norm for adjustable p; shown at p = 2, the Euclidean circle]
Adjust p to see how the unit ball changes shape. At p = 2 you get the familiar circle. At p = 1, a diamond. As p → ∞, a square.

Why does this matter? Because when you say two population activity patterns are "close," you are implicitly choosing a norm. The distance between vectors u and v is d(u, v) = ‖u − v‖. Change the norm, and two patterns that seemed close can become far apart, or the reverse. Most dimensionality reduction methods (PCA, factor analysis, GPFA) implicitly use the L² norm, inherited from the dot product. Sparse methods use L¹ penalties precisely because the L¹ ball has corners on the coordinate axes, which encourages solutions where some coordinates are exactly zero.
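A small illustration with made-up vectors: the same pair of patterns can be ranked differently by different norms.

```python
import numpy as np

# Two deviations from a reference pattern: one large change in a single
# neuron versus small changes spread across every neuron.
ref = np.zeros(4)
a = np.array([5.0, 0.0, 0.0, 0.0])
b = np.array([2.0, 2.0, 2.0, 2.0])

# Under the L2 norm, b is closer to ref; under the L1 norm, a is.
print(np.linalg.norm(a - ref, 2), np.linalg.norm(b - ref, 2))  # 5.0 4.0
print(np.linalg.norm(a - ref, 1), np.linalg.norm(b - ref, 1))  # 5.0 8.0
```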

Everything in this series uses the L² norm unless stated otherwise. But the choice exists, and it shapes the answers you get. When a method "finds the nearest point" or "minimizes distance," ask: distance in what sense?

What comes next

These are the building blocks for almost every linear method in computational neuroscience. PCA finds the directions of greatest variance, one dot product at a time. CCA compares projections across two datasets. Linear decoding builds a map from neural space to behavioral space, which is a stack of dot products.

But we left something unfinished. When you clicked different neurons as axes in the first figure, the trajectory changed shape. The data did not change. The description did. The electrode gave you one set of axes. Anatomy chose it, not anything about the structure in the data. There should be a way to choose axes that make the structure visible. And there should be a way to convert between different sets of axes.

That is the problem of choosing a basis and converting between coordinate systems. It is what the next post is about. And it is, in a sense, the whole point: PCA, CCA, and PSID all amount to choosing the right basis for the question you are asking. What changes from method to method is what "right" means.

References

  1. Strang, G. Introduction to Linear Algebra, 6th ed. Wellesley-Cambridge Press, 2023.
  2. 3Blue1Brown. "Essence of Linear Algebra" video series, 2016.
  3. Churchland, M. M., Cunningham, J. P., Kaufman, M. T., et al. "Neural population dynamics during reaching," Nature, vol. 487, pp. 51-56, 2012.
  4. Cunningham, J. P. and Yu, B. M. "Dimensionality reduction for large-scale neural recordings," Nature Neuroscience, vol. 17, pp. 1500-1509, 2014.
  5. Axler, S. Linear Algebra Done Right, 4th ed. Springer, 2024.
  6. Strang, G. "The fundamental theorem of linear algebra," The American Mathematical Monthly, vol. 100, no. 9, pp. 848-855, 1993.
  7. Safaie, M., Chang, J. C., Park, J., et al. "Preserved neural dynamics across animals performing similar behaviour," Nature, vol. 623, pp. 765-771, 2023.
  8. Gallego, J. A., Perich, M. G., Miller, L. E., and Solla, S. A. "Neural manifolds for the control of movement," Neuron, vol. 94, no. 5, pp. 978-984, 2017.
  9. Kriegeskorte, N., Mur, M., and Bandettini, P. A. "Representational similarity analysis — connecting the branches of systems neuroscience," Frontiers in Systems Neuroscience, vol. 2, 4, 2008.