How Flies Move in 3D

A plain‑English walkthrough of motion on SE(3) (rotations + translations) and what the simulator is doing under the hood.



What if virtual creatures could move through space the way physics actually works?

The Problem That Hooked Us

Watch a fly navigate a room and you'll notice something remarkable: it doesn't just move forward, backward, left, and right. It banks, rolls, tumbles, and recovers, all while tracking a scent trail or dodging your hand. Its motion lives in a richer space than simple x-y-z coordinates can capture. Every pose of the fly is a combination of where it is and how it's oriented, and these two things are tangled together in a mathematically beautiful way.

This article is about building a swarm of virtual flies that respect that geometry. Each fly lives on $SE(3)$, the space of all rigid-body poses (rotations combined with translations). They propose small moves, score them against a landscape of attractors and repellents, and learn which kinds of moves work best. The result is structured, lifelike motion that emerges from three simple ingredients: symmetry, noise, and reward.

The Mathematics of Rigid Motion

To describe where a fly is and which way it's facing, we pack both pieces of information into a single matrix called a homogeneous transform. Think of it as a compact ID card: one part says "I'm rotated this way" and the other says "I'm located here."

$$T=\begin{pmatrix}R & t \\ 0 & 1\end{pmatrix}\in SE(3),\quad R\in SO(3),\ t\in\mathbb{R}^3.$$
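As a concrete sketch (NumPy assumed, helper names hypothetical), packing $R$ and $t$ into a single transform makes composing poses a plain matrix multiply:

```python
import numpy as np

def make_pose(R, t):
    """Build the 4x4 homogeneous transform [[R, t], [0, 1]]."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# A fly with identity orientation sitting at (1, 2, 3):
T = make_pose(np.eye(3), np.array([1.0, 2.0, 3.0]))

# Composing two poses is just matrix multiplication:
T2 = make_pose(np.eye(3), np.array([0.0, 0.0, 1.0]))
combined = T2 @ T   # first apply T, then T2
```

The bottom row $(0\ 0\ 0\ 1)$ is what lets one multiplication carry both the rotation and the translation through correctly.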

Now, how does a fly take a step? It doesn't just add a displacement vector the way you might move a chess piece on a flat board. Instead, it works in the tangent space, a local flat patch that approximates the curved surface of $SE(3)$. The fly draws a small "twist" (six numbers encoding a tiny rotation and a tiny push) and then maps it back onto the manifold. This is the key move:

$$T_{t+1} \;=\; \exp(\xi_t)\,T_t.$$

The exponential map is what converts a twist in the flat tangent space into an actual rotation-plus-translation on the curved manifold. For the rotation part, it uses the elegant Rodrigues formula, which essentially says: "given an axis and an angle, here's the rotation matrix."

$$R(\omega)=\exp(\widehat{\omega})=I+\frac{\sin\theta}{\theta}\widehat{\omega}+\frac{1-\cos\theta}{\theta^2}\widehat{\omega}^2,\quad \theta=\|\omega\|.$$

The translation part is subtler. When something is simultaneously rotating and translating, the translation gets "bent" by the rotation. The $V(\omega)$ matrix below captures exactly this coupling, ensuring the fly traces a smooth screw-like path rather than an awkward rotate-then-jump.

$$\exp\!\begin{pmatrix}\widehat{\omega}&\rho\\0&0\end{pmatrix} = \begin{pmatrix} R(\omega) & V(\omega)\rho\\ 0 & 1 \end{pmatrix},\quad V(\omega)=I+\frac{1-\cos\theta}{\theta^2}\widehat{\omega}+\frac{\theta-\sin\theta}{\theta^3}\widehat{\omega}^2.$$
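Putting the Rodrigues formula and $V(\omega)$ together, a minimal NumPy sketch of the $SE(3)$ exponential might look like this (function names are illustrative, not the simulator's actual API):

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix so that hat(w) @ v equals cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(omega, rho, eps=1e-8):
    """Map a twist (omega, rho) in se(3) to a 4x4 pose in SE(3)."""
    theta = np.linalg.norm(omega)
    W = hat(omega)
    if theta < eps:
        # Small-angle limit of both series expansions.
        R = np.eye(3) + W
        V = np.eye(3) + 0.5 * W
    else:
        A = np.sin(theta) / theta
        B = (1.0 - np.cos(theta)) / theta**2
        C = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + A * W + B * (W @ W)   # Rodrigues formula
        V = np.eye(3) + B * W + C * (W @ W)   # rotation-translation coupling
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T
```

A step is then `T_next = se3_exp(omega, rho) @ T`: no matter how the twist is drawn, the result is a valid rigid-body pose.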

The beauty of this approach is that orientation and position evolve together, coherently, and the fly's state never "falls off" the manifold. There are no gimbal lock surprises, no angle-wrapping bugs, just clean geometry.

So where do the proposed moves come from? Each fly draws from a mixture of four strategies, and each one has a clear physical interpretation. The Haar-uniform component is pure wandering: it samples orientations that are truly random on the sphere of rotations, so the fly has no hidden bias toward any direction. The goal-seeking component points proposals toward the nearest attractor you've placed in the scene. The plume-following component sniffs out a diffusing chemical signal and steers uphill along its gradient. And a small exploration component adds random body-frame jitter, like a fly buzzing unpredictably. In the simulation, you can actually see these proposals as faint "ghost" lines extending from each fly, making the decision process visible.
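One standard way to realize the Haar-uniform component (a sketch, not necessarily how the simulator implements it) is to normalize a 4D Gaussian into a unit quaternion and convert it to a rotation matrix; this samples $SO(3)$ uniformly, with no polar clustering:

```python
import numpy as np

def haar_rotation(rng):
    """Haar-uniform rotation on SO(3): a normalized 4D Gaussian is a
    uniformly distributed unit quaternion; convert it to a matrix."""
    q = rng.normal(size=4)
    q /= np.linalg.norm(q)
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
```

Naively drawing three Euler angles instead would bias samples toward the poles, exactly the hidden directional preference the Haar component is meant to avoid.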

Once the fly has a handful of proposed moves, it needs to decide which one is best. A lightweight critic scores each proposal by combining several signals into a single number:

$$R(T') \;=\; w_g\,\phi_{\mathrm{goal}}(T') + w_p\,\phi_{\mathrm{plume}}(T') - w_r\,\phi_{\mathrm{repel}}(T') - w_w\,\phi_{\mathrm{wall}}(T') - w_c\,\phi_{\mathrm{collision}}(T').$$

In plain English: the fly wants to be near attractors and strong plume signals, and it wants to stay away from walls, repellents, and other flies. It picks the best-scoring proposal and takes that step. But it also does something cleverer: it remembers which strategy produced the winning move and nudges its internal preferences accordingly, using a simple policy-gradient update:

$$\Delta \alpha_c \;\propto\; \eta\,\big(\mathbf{1}\{c=c^\star\}-\pi_\alpha(c)\big)\,R(T^\star),$$

Here $\alpha$ are the component logits (raw preference scores) and $\pi_\alpha$ their softmax probabilities. Over time, strategies that consistently produce good moves gain influence, while those that don't quietly fade. It's a lightweight learning loop: no deep networks, no replay buffers, just a gentle tug on the mixture weights after every step.
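The update above fits in a few lines. This is a sketch under the stated equation, with hypothetical function names (`softmax`, `update_logits` are not the simulator's own API):

```python
import numpy as np

def softmax(alpha):
    """Turn raw logits into mixture probabilities."""
    e = np.exp(alpha - alpha.max())   # subtract max for numerical stability
    return e / e.sum()

def update_logits(alpha, winner, reward, eta=0.1):
    """REINFORCE-style nudge: the winning component's logit rises in
    proportion to the reward; the others fall by their probability mass."""
    pi = softmax(alpha)
    grad = -pi
    grad[winner] += 1.0   # indicator {c = c*} minus pi(c)
    return alpha + eta * reward * grad
```

Because the gradient terms sum to zero, the update only redistributes preference among the four strategies; a consistently rewarded component steadily gains probability mass.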

Step back and look at what emerges. The flies are performing something close to Langevin dynamics on a symmetry manifold: random noise (uniform on the rotation group) plus a drift term shaped by the reward landscape you create. The bounding box acts like walls in a room, with elastic reflections. The plume field gives you an intuitive way to broadcast information: click somewhere and a chemical signal blooms outward, diffusing and decaying like smoke in still air. Because every update happens in the Lie algebra $\mathfrak{se}(3)$ and gets mapped back through the exponential, the simulation never cheats on the underlying physics. The same math powers robotic arm planners, protein docking simulations, and camera pose estimators.

What We Learned

Geometry isn't optional. Working directly on $SE(3)$ sidesteps a whole class of bugs and biases. Euler angles have gimbal lock. Quaternions need normalization hacks. The Lie group approach just works, and the Haar-uniform component ensures no direction gets secretly favored.

You don't need much structure to get rich behavior. Four simple proposal strategies, blended with manifold-respecting noise, produce surprisingly lifelike motion. There's no neural network here, no massive training run. The complexity lives in the geometry, not the model.

Making the prior visible changes everything. Those ghost lines showing proposed moves aren't just pretty: they turn debugging into observation. You can see when the flies are confused, when they're hugging a wall, when one strategy is dominating. Transparent AI starts with transparent proposals.

Fields are a natural language for shaping behavior. Rather than hand-coding policies ("if near wall, turn left"), you paint the world with attractors, repellents, and diffusing plumes. The flies figure out the rest. It's a surprisingly expressive way to program collective motion without writing explicit rules.

While this simulation is built for exploration and learning, the same ingredients show up in real systems: manifold-correct motion models in visual tracking, proposal distributions in 6-DoF pose estimation, and energy-shaping controllers for robot navigation. The deeper lesson is that symmetry, noise, and reward compose cleanly when you treat the state space as a group, not just a bag of numbers.

Under the Hood

Several pieces of mathematical machinery work together to make this happen. Every motion update flows through the SE(3) exponential map, so the group structure is never violated: no matter how wild the proposed twist, the result is always a valid rigid-body pose. Rotational exploration uses the Haar measure on $SO(3)$, which is the mathematically correct way to sample "uniformly at random" on the space of orientations (naively picking three Euler angles would cluster samples near the poles). When a fly is drawn toward an attractor, the directional pull comes from a von Mises-Fisher distribution, the spherical analogue of a Gaussian: peaked in the goal direction, with tunable concentration. The plume field evolves according to a diffusion-reaction equation discretized on a 3D grid, giving the environment a physical memory of past events. And the learning signal that reshapes each fly's strategy mix is a lightweight REINFORCE update, the simplest member of the policy gradient family.
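The plume field's diffusion-decay update can be sketched as one explicit Euler step on a 3D grid. This toy version uses periodic boundaries via `np.roll` for brevity, which need not match the simulator's boundary handling, and the stability condition $6D \le 1$ constrains the diffusion coefficient:

```python
import numpy as np

def plume_step(field, D=0.15, decay=0.01):
    """One explicit Euler step of diffusion + exponential decay on a 3D
    grid with periodic boundaries (stable for 6 * D <= 1)."""
    lap = -6.0 * field
    for axis in range(3):
        # Sum of the six face-adjacent neighbors, minus 6x the center.
        lap += np.roll(field, 1, axis) + np.roll(field, -1, axis)
    return (1.0 - decay) * (field + D * lap)
```

Clicking to add a plume amounts to depositing a spike of concentration into `field`; repeated calls to `plume_step` then spread and fade it like smoke in still air.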

Where This Goes Next

The ideas here reach well beyond virtual flies. In robotics, every six-degree-of-freedom motion planner faces the same geometry: a gripper's pose is an element of $SE(3)$, and planning a reach-and-grasp trajectory means finding a smooth path on that manifold. In computer vision, estimating where a camera is and which way it's pointing is literally a pose estimation problem on $SE(3)$, and proposal-based methods like particle filters use the same sample-and-score loop. Molecular dynamics treats rigid protein domains as elements of $SE(3)$ when docking them against each other. The growing field of generative models on Lie groups (diffusion models for 3D molecules, for example) relies on the same exponential-map machinery to ensure samples stay on the manifold. And multi-agent coordination, from drone swarms to warehouse robots, benefits whenever the agents respect geometric constraints rather than fighting them.

🎮 Interactive Exploration

Experience the concepts firsthand in our interactive simulation. Click to add attractors (gold), repellents (red), and plumes (blue). Watch how the ghost segments reveal the generative prior's proposals, and observe how the flies adapt their behavior through online learning.