Every computational system produces waste. Not in the colloquial sense of unused cycles or idle memory, but in a precise thermodynamic sense: information is destroyed, structure is discarded, and entropy increases. This is not a side effect. It is the default behavior of nearly every architecture in use today.
Consider what happens inside a standard neural network. An input tensor passes through a sequence of layers. At each stage, dimensionality is reduced, nonlinearities such as ReLU zero out entire regions of activation space, and intermediate representations are overwritten. The system retains what it deems relevant and discards the rest. But relevance is determined by a loss function defined at the end of the pipeline. The intermediate layers have no way of knowing what they are throwing away.
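A minimal sketch makes the loss concrete. Two distinct inputs become indistinguishable after a ReLU followed by max pooling, two of the most common operations in such a pipeline (the specific vectors here are illustrative, not from any particular network):

```python
import numpy as np

# Two distinct inputs that differ only in their negative entries.
x1 = np.array([3.0, -1.0, 2.0, -4.0])
x2 = np.array([3.0, -2.0, 2.0, -0.5])

relu = lambda v: np.maximum(v, 0.0)

# ReLU zeroes all negative values: the distinctions among them vanish.
print(relu(x1))  # [3. 0. 2. 0.]
print(relu(x2))  # [3. 0. 2. 0.]  -- identical; x1 vs x2 is unrecoverable

# Max pooling over windows of 2 then keeps one value per window,
# discarding the position and value of everything else in the window.
pool = lambda v: v.reshape(-1, 2).max(axis=1)
print(pool(relu(x1)))  # [3. 2.]
```

No later layer can tell which input produced that output; the information is gone before the loss function ever sees it.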
This is not a new observation. Rolf Landauer established in 1961 that erasing a single bit of information dissipates a minimum of kT ln 2 joules of energy, where k is Boltzmann's constant and T is the temperature of the environment. The principle is subtle but absolute: any logically irreversible operation has a thermodynamic cost. Erasure is not free. It never was.
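The floor cost is easy to compute. At room temperature (taking T = 300 K as a round illustrative value), the Landauer bound per erased bit is:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K (exact in the 2019 SI)
T = 300.0           # room temperature in kelvin, an illustrative choice

# Minimum dissipation for erasing one bit: kT ln 2
E_bit = k_B * T * math.log(2)
print(f"{E_bit:.3e} J")  # ~2.871e-21 J per bit
```

The number is tiny per bit, which is precisely why the principle is easy to ignore; it becomes relevant only at scale, or as a hard limit on reversibility.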
The implications extend far beyond physics. Every time a computational system discards information it cannot recover, it pays a cost. Not always in energy, but in capability. The discarded structure might have been useful downstream. The erased gradient might have contained a correction signal. The pooled feature map might have held spatial relationships that mattered three layers later.
Current practice treats this as acceptable. The reasoning is pragmatic: storage is cheap, compute is abundant, and we can always retrain. But this framing confuses affordability with efficiency. The fact that we can absorb waste does not mean the waste is harmless. It accumulates. It compounds. Systems trained with high waste require more data, more parameters, and more energy to reach the same performance as systems that preserve structure.
The most efficient computation is the one that preserves the most structure.
This is not an optimization target. It is a design principle. Efficiency retrofitted onto a wasteful architecture will always be bounded by the architecture's inherent information loss. You cannot recover what has already been erased. You can only prevent the erasure from happening in the first place.
Think of it structurally. A transformation that maps a high-dimensional input to a lower-dimensional output is, by definition, many-to-one. Multiple distinct inputs produce the same output. The inverse does not exist. Once the transformation is applied, the original distinctions are gone. No amount of post-hoc processing can reconstruct them.
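The point can be stated in three lines of linear algebra. Any linear map from a higher-dimensional space to a lower one has a nontrivial null space, so adding a null-space vector to the input changes nothing in the output (the matrix below is an arbitrary example, not from any specific model):

```python
import numpy as np

# A linear map from R^3 to R^2 is necessarily many-to-one.
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

x1 = np.array([1.0, 2.0, 0.0])
x2 = x1 + np.array([1.0, 1.0, -1.0])  # shift by a null-space vector of W

print(W @ x1)  # [1. 2.]
print(W @ x2)  # [1. 2.]  -- distinct inputs, identical output
```

The inverse image of any output is an entire affine subspace; no downstream computation can pick out which point of it was the true input.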
The alternative is to design systems where transformations are invertible by construction. Where the forward pass does not destroy information but rearranges it. Where compression is achieved not by discarding dimensions but by finding more compact representations that remain fully recoverable.
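One established construction of this kind is the additive coupling layer used in normalizing flows (NICE, RealNVP): split the input, transform one half conditioned on the other, and the whole map is exactly invertible no matter how complicated the conditioning function is. A minimal sketch:

```python
import numpy as np

def t(h):
    # Arbitrary conditioning function; it need not be invertible itself.
    return np.tanh(h)

def forward(x):
    a, b = np.split(x, 2)
    # First half passes through untouched; second half is shifted by t(a).
    return np.concatenate([a, b + t(a)])

def inverse(y):
    a, c = np.split(y, 2)
    # Because a is preserved, the shift can be recomputed and subtracted.
    return np.concatenate([a, c - t(a)])

x = np.array([0.5, -1.2, 2.0, 0.3])
y = forward(x)
x_rec = inverse(y)
assert np.allclose(x, x_rec)  # exact reconstruction: nothing was erased
```

The forward pass rearranges information rather than destroying it: the output determines the input uniquely, so every bit of structure in x remains recoverable.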
This is harder. Invertible transformations impose constraints on architecture, on parameterization, on training dynamics. They demand that the designer think carefully about what the system preserves, not just what it produces. But the payoff is substantial. Systems that preserve structure waste less energy, require less data, generalize more robustly, and degrade more gracefully under distribution shift.
Waste in computation is not inevitable. It is a design choice, made implicitly every time we build a system that treats intermediate information as disposable. The question is whether we continue to accept that choice or begin to design around it.
Landauer's principle tells us that erasure has a floor cost. It does not tell us we must erase.