At Cogita, we believe the next generation of AI should be more modular, specialized, efficient, auditable, and easier to debug, not simply larger and more opaque. Reaching that goal requires combining practical deployments with fundamental research into how neural networks represent and use information.

In the article below, our AI Lead, Maciej Satkiewicz, introduces Semantic Pullbacks: a new approach to understanding deep neural networks, developed through his research at 314 Foundation. The work is an early example of the bridge we want to build between foundational research and AI systems that can be inspected, improved, and deployed in practice.

We believe this kind of technical differentiation is particularly important for Europe, which may need to explore new directions rather than compete solely by scaling the approaches developed in the United States and China. The article is technical, but the broader question is simple: can we build AI that is not only powerful, but also more understandable and controllable?

How deep neural networks see the world

Deep neural networks are powerful, but they are still difficult to understand. We can train them, deploy them, fine-tune them, and measure their performance. But when we ask a simple question, namely what exactly influenced the model's decision, the answer is often surprisingly fragile.

In a linear model, the explanation is obvious. A weight vector points in the model’s preferred input direction. If we visualise that vector, we see what the model is looking for. The model computes a dot product between the input and the weight vector, so the weight vector directly tells us which pattern increases the score.

The question is how to generalise this style of explanation to deeper models.

The common approach is to use the gradient, as it coincides with the weight vector for linear models. However, there is a more natural candidate: the pullback.

The problem with gradients

A gradient tells us how the output changes under an infinitesimal change of the input. That is a sensitivity measure, but it is not necessarily the best description of what the neuron expects. Such a description should ideally tell us what input-space pattern the neuron is locally using as its preferred direction, much as the weight vector does for a linear model.

Notice that at a given input, many layers behave like input-conditioned affine operators. ReLU gates switch on and off. Pooling layers select routes. Attention layers choose which tokens interact. Normalization layers change the local geometry of the computation. But for a given input all those switches are fixed.

Therefore, the network can be viewed as an input-dependent linear (or affine, in the presence of biases) computation. The natural explanation of a target neuron is then not the gradient, but the pointwise transposition of this effective operator, i.e. its adjoint action. This is what I call a pullback, a term inspired by differential geometry.

Pullback: the right analogue of the linear weight vector

In a linear model, we have:

score = <weight, input>

The weight vector is the explanation because it represents the model’s preferred input direction.

For a deep network, at a fixed input, we can often write the computation locally as:

output = W(x) x

where W(x) is the network’s effective input-dependent linear operator at input x (with biases, the computation is affine and picks up an additional offset term).

If we choose a target neuron or class direction u, then its score can be represented as a dot product in input space:

score = <pullback, input>

The pullback is obtained by transporting the target direction backward through the transpose of the effective operator:

pullback = W(x)^T u

This is the direct generalisation of the linear model explanation.
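To make the formulas concrete, here is a minimal NumPy sketch for a bias-free two-layer ReLU network. The weights, shapes, and the helper name `effective_operator` are illustrative assumptions, not taken from the paper. The sketch freezes the ReLU gates at a given input, builds W(x) explicitly, and checks that the target score equals the dot product between the pullback and the input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network: 4 inputs -> 6 hidden ReLU units -> 3 outputs, no biases.
W1 = rng.normal(size=(6, 4))
W2 = rng.normal(size=(3, 6))


def forward(x):
    """Plain forward pass: W2 @ relu(W1 @ x)."""
    return W2 @ np.maximum(W1 @ x, 0.0)


def effective_operator(x):
    """W(x): the linear map the network applies once the ReLU gates are frozen at x."""
    mask = (W1 @ x > 0).astype(float)   # which hidden units are switched on at x
    return W2 @ (mask[:, None] * W1)    # W2 @ diag(mask) @ W1


x = rng.normal(size=4)
u = np.array([1.0, 0.0, 0.0])           # target direction: the first output unit

W_x = effective_operator(x)
pullback = W_x.T @ u                    # pullback = W(x)^T u

# The frozen operator reproduces the forward pass: output = W(x) x
assert np.allclose(W_x @ x, forward(x))
# The target score is a dot product in input space: score = <pullback, input>
assert np.isclose(u @ forward(x), pullback @ x)
```

For this piecewise-linear, bias-free toy the pullback coincides with the gradient almost everywhere, because the gates are locally constant. The two separate once the gates themselves vary smoothly with the input.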

The key point is subtle but important: the pullback is not generally the same as the gradient.

The gradient differentiates through how the effective operator changes with the input. It includes additional terms coming from gates, routing decisions, layer statistics, attention maps, and other input-dependent mechanisms.

The pullback does something different. It asks: given the computation the network actually used at this input, what input-space vector represents the action of this target neuron?

That is closer to the original intuition behind visualising a linear filter.
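The difference can be seen on a single unit with a smooth gate. The sketch below assumes a hypothetical model score(x) = sigmoid(a . x) * (w . x) with made-up weights. Freezing the gate gives the pullback, while differentiating through the gate adds an extra term to the gradient.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical one-unit model with a smooth, input-dependent gate:
#   score(x) = sigmoid(a . x) * (w . x)
a = np.array([0.5, -1.0, 2.0])   # gate weights (illustrative values)
w = np.array([1.0, 2.0, -1.0])   # value weights (illustrative values)


def score(x):
    return sigmoid(a @ x) * (w @ x)


def pullback(x):
    """Freeze the gate at x: the effective operator is the row vector sigmoid(a . x) * w."""
    return sigmoid(a @ x) * w


def gradient(x):
    """Full derivative: the pullback plus a term from the gate changing with x."""
    g = sigmoid(a @ x)
    return g * w + (w @ x) * g * (1.0 - g) * a


x = np.array([0.3, -0.2, 0.1])
extra = gradient(x) - pullback(x)   # the gate-sensitivity term, nonzero here
```

Only the pullback reproduces the score as a dot product with the input, since `pullback(x) @ x` equals `score(x)` by construction. The gradient mixes that direction with the gate weights `a`.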

Soft Pullback: neurons represent features locally and partially

A standard pullback is already more aligned with the dynamic affine view of neural computation. But there is another issue.

Neural features are often not fully expressed at a single input point. They may be partially active, suppressed by a hard gate, or distributed across several weakly contributing components. A ReLU unit may be just below the threshold. A pooling layer may route most of the signal through one location while nearby alternatives still contain semantically relevant evidence.

This suggests that the meaningful explanation is not always the raw pointwise pullback, but the locally expected pullback: the pullback we would obtain by looking at a small neighbourhood around the input.

Sampling-based methods such as SmoothGrad already hint at this intuition. They add noise to the input, compute many gradients, and average them. This often produces more perceptually aligned explanations, but it is expensive and heuristic.
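For reference, the sampling procedure can be sketched in a few lines. This is a toy version on a hypothetical scalar-output ReLU network with made-up weights. The noise scale `sigma` and the sample count `n_samples` are illustrative choices, not recommended settings.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(6, 4))   # hypothetical toy weights
w2 = rng.normal(size=6)        # scalar-output readout


def grad_score(x):
    """Gradient of score(x) = w2 . relu(W1 @ x) with respect to x."""
    mask = (W1 @ x > 0).astype(float)
    return W1.T @ (mask * w2)


def smoothgrad(x, sigma=0.2, n_samples=64, seed=0):
    """Average the gradient over Gaussian-perturbed copies of x."""
    noise_rng = np.random.default_rng(seed)
    noisy = x + sigma * noise_rng.normal(size=(n_samples, x.size))
    return np.mean([grad_score(z) for z in noisy], axis=0)


x = rng.normal(size=4)
raw = grad_score(x)      # one backward pass
smooth = smoothgrad(x)   # n_samples backward passes
```

The cost is visible in the last two lines: the averaged explanation needs one backward pass per sample.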

Semantic Pullbacks pursue the same idea more directly.

Instead of sampling many perturbed inputs, we modify the backward computation only. Hard or steep backward gates are softened. For example, a hard ReLU mask can be replaced in the backward pass by a smooth gate. The forward computation stays exactly the same. The model’s prediction does not change. Only the explanation rule changes.

This gives us a Soft Pullback: a tractable approximation of the locally expected pullback.

It recovers weak but consistently contributing components that the standard backward pass may suppress. In practice, this often turns noisy, fragmented explanations into more coherent structures.
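A minimal sketch of the idea, under one loud assumption: the smooth gate used here is a sigmoid with temperature `tau`, chosen purely for illustration. The softening rules actually used in the paper may differ. The toy weights are also made up. They are picked so that the third hidden unit sits just below its ReLU threshold.

```python
import numpy as np

# Hypothetical toy weights: 2 inputs -> 3 hidden ReLU units -> scalar score.
W1 = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]])
w2 = np.array([1.0, 1.0, 2.0])


def forward(x):
    """Unchanged forward pass with a hard ReLU."""
    return w2 @ np.maximum(W1 @ x, 0.0)


def backward(x, gate):
    """Transport the readout w2 back to input space through the given gate."""
    pre = W1 @ x
    return W1.T @ (gate(pre) * w2)


def hard_gate(pre):
    return (pre > 0).astype(float)               # standard ReLU mask


def soft_gate(pre, tau=0.5):
    # ASSUMPTION: sigmoid softening with temperature tau (illustrative choice),
    # written via tanh so that very small tau does not overflow.
    return 0.5 * (1.0 + np.tanh(0.5 * pre / tau))


x = np.array([1.0, 0.9])                         # pre-activations: [1.0, 0.9, -0.1]
standard = backward(x, hard_gate)                # third unit is silenced
soft = backward(x, soft_gate)                    # third unit contributes partially
sharp = backward(x, lambda pre: soft_gate(pre, tau=1e-3))  # approaches the hard mask
```

The forward function is never touched, so the prediction is identical in all three cases. Only the backward rule differs, and as `tau` shrinks the soft result approaches the standard one.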

Pullback Ascent: strengthening the locally preferred direction

Once we have a pullback vector field, we can move the input slightly in the direction preferred locally by a target neuron and recompute the pullback. Repeating this for a few steps gives Pullback Ascent.

This is analogous to gradient ascent, but with a crucial substitution: we ascend along the pullback direction, not the gradient direction.

The difference is visible. Gradient ascent on modern networks often creates noisy, adversarial-looking patterns. Pullback Ascent tends to reveal more coherent, class-conditional structures. It strengthens the locally preferred direction of the target neuron instead of merely amplifying raw sensitivity.

This makes it useful not only for attribution, but also for local counterfactuals. We can ask: what would need to become more visible in this image for the model to move toward a different class? Pullback Ascent provides a structured answer.
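The loop itself is short. The sketch below shows only the mechanics on a hypothetical one-unit gated model with made-up weights. The step size and the number of steps are arbitrary assumptions, and a toy like this cannot show the perceptual qualities described above.

```python
import numpy as np


def sigmoid(z):
    return 0.5 * (1.0 + np.tanh(0.5 * z))


a = np.array([0.5, 1.0, 2.0])    # hypothetical gate weights
w = np.array([1.0, 2.0, -1.0])   # hypothetical value weights


def score(x):
    return sigmoid(a @ x) * (w @ x)


def pullback(x):
    """Pullback of the gated unit: the gate is frozen at the current input."""
    return sigmoid(a @ x) * w


def pullback_ascent(x0, step=0.1, n_steps=5):
    """Repeatedly move along the pullback, recomputing it after every step."""
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        x = x + step * pullback(x)
        path.append(x.copy())
    return np.array(path)


path = pullback_ascent([0.3, -0.2, 0.1])
scores = np.array([score(p) for p in path])   # target score along the path
```

Swapping `pullback` for the full gradient inside the loop would give ordinary gradient ascent, so the two procedures differ only in the direction used at each step.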

Empirical confirmation

In experiments on standard pretrained vision models, including convolutional architectures and transformer-based models, Semantic Pullbacks produced explanations that were more faithful, stable, target-specific, and significantly more perceptually aligned than standard gradient-based baselines.

The strongest result is conceptual: gradients are not the only natural backward signal in deep learning. If we want to understand what a neural network sees, we should not only ask how the output changes when the input changes. We should ask what input-space direction represents the network’s current computation for the target neuron.

That direction seems to be the Semantic Pullback.

A unifying perspective on explainability

One of the most interesting outcomes of this work is that Semantic Pullbacks connect several ideas that previously looked separate, but the connection is more specific than simply saying that they all “improve gradients”.

B-cos-style models are especially close to our perspective. They already use the standard pullback as the explanation: they transport the output direction backward through the effective linear operator of the network. Their additional step is architectural and training-based: they modify the model and add alignment objectives so that the standard pullback becomes better aligned with the input.

Semantic Pullbacks take a different route. We do not change the forward model and we do not fine-tune it. Instead, we ask whether a better explanation can be obtained by computing a locally expected pullback directly on a standard pretrained network.

This also clarifies the relation to gradient smoothing methods such as SmoothGrad. These methods can be interpreted as trying to recover a local expectation by sampling noisy perturbations and averaging the resulting explanations. Semantic Pullbacks pursue a similar goal, but approximate the locally expected pullback through closed-form, layer-wise backward rules rather than stochastic sampling.

Pullback Ascent connects the method to feature accentuation. Standard feature accentuation follows gradients and therefore usually needs strong regularization to avoid producing noisy or adversarial-looking patterns. Replacing the gradient direction with the (soft) pullback direction gives a more coherent local ascent procedure: it strengthens the target neuron’s preferred direction without relying on heavy post-processing.

There is also a connection to robust optimization. Robust models often have more perceptually aligned gradients because their decision functions become more locally stable around the data manifold. From the pullback perspective, this resonates with the idea that models learn input-aligned features not necessarily at a single point, but in local expectation. Semantic Pullbacks expose this structure directly, without requiring adversarial training.

The broader message is that many successful explanation methods can be understood as different attempts to recover a stable, input-space direction associated with a target neuron, class, or feature. Semantic Pullbacks make this object explicit: not as a gradient, but as a locally expected pullback of the network’s effective computation.

Should pullbacks replace gradients?

Today, deep learning libraries treat gradients as a first-class primitive. But if deep networks are dynamic affine systems, then the adjoint transport of the neuron action should be available alongside the derivative. In other words, the pullback should become a first-class citizen of deep learning libraries, next to the gradient.

This would not require redesigning neural networks. In many layers, pullback and gradient already coincide. For linear layers, convolutions, and residual connections, the standard backward pass is enough. The differences arise in a relatively small catalogue of mechanisms: gates, routing operations, normalization layers, and attention.
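A normalization layer gives a compact example of where the two backward signals part ways. The sketch below assumes a simple RMS normalization without learned scale. Treating the statistic as frozen is my reading of the dynamic affine view described earlier, not a rule quoted from the paper.

```python
import numpy as np


def rms_norm(x):
    return x / np.sqrt(np.mean(x ** 2))


def rms_norm_pullback(x, u):
    """Freeze the statistic r = rms(x): the layer acts as (1/r) * I, so u maps to u / r."""
    return u / np.sqrt(np.mean(x ** 2))


def rms_norm_gradient(x, u):
    """Gradient of u . rms_norm(x): also differentiates through r."""
    r = np.sqrt(np.mean(x ** 2))
    return u / r - x * (x @ u) / (x.size * r ** 3)


x = np.array([1.0, -2.0, 2.0])
u = np.array([1.0, 0.0, 0.0])
p = rms_norm_pullback(x, u)
g = rms_norm_gradient(x, u)

# A plain linear layer needs no special rule: with W fixed, the pullback W^T v
# is also the gradient of v . (W @ x).
W = np.array([[1.0, 2.0, 0.0], [0.0, -1.0, 3.0]])   # hypothetical weights
v = np.array([1.0, -1.0])
linear_pullback = W.T @ v
```

Because the normalized output does not change when the input is rescaled, the gradient `g` is orthogonal to `x`, so its dot product with the input is zero. The pullback `p` instead reproduces the score, since `p @ x` equals `u @ rms_norm(x)`.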

That makes the idea practical. Semantic Pullbacks can be implemented as custom backward rules while leaving the forward pass unchanged. Arguably, if pullbacks prove to be better for optimisation (see below), they may even replace gradients altogether, as they have already proven to be better for generating explanations!

What comes next

Semantic Pullbacks suggest a new way to explore and shape representation space. Beyond attribution, Pullback Ascent can be used to probe what structures a model associates with a neuron, class, or internal feature. This could support knowledge discovery in scientific domains, more meaningful counterfactuals and interpolations, and better diagnostics of failure modes.

The same perspective extends naturally to text. Semantic Pullbacks could help extract evidence behind a prediction, identify argumentative structures, and generate counterfactual variants that show what would need to change for a model to support a different claim, label, or answer.

They may also be useful for language and multimodal modelling. For language models, pullback-based attribution could ask which tokens, passages, or internal features most shaped the next-token prediction. In video and other sequential modalities, the same idea could help trace which frames, objects, or temporal cues drive a model’s continuation or decision.

The same perspective may also inform model editing, pruning, and continual learning. If pullbacks reveal which components carry coherent semantic evidence, they can help identify which parts of a model are useful, redundant, unstable, or responsible for a new behaviour.

A further open direction is training itself. Recent work suggests that changing the backward pass can improve learning. Semantic Pullbacks offer a broader interpretation of why: adjoint backward signals may provide a cleaner representation of the direction a neuron is locally using, instead of mixing it with effects from gates, routing, normalization, or attention. This has the potential to improve not only the explanations, but also the generalisation itself!

Let’s talk!

If you are interested in working on Semantic Pullbacks, alternative backward passes, interpretability for language models, or pullback-based training and adaptation, we are happy to talk!

Note: The research described in this article was conducted at 314 Foundation in collaboration with American University and AGH University of Kraków. Cogita is publishing this article as a friendly host and partner in the broader AI community. The paper preprint can be found here: https://arxiv.org/abs/2507.22832 with the interactive demo here: https://huggingface.co/spaces/msat/SemanticPullbacks.
