Graph Theory: A Practical Guide

From your brain's wiring to Google's search algorithm, graphs are everywhere. This guide builds intuition from the ground up: what graphs are, why they matter, and how a handful of powerful ideas (paths, cuts, trees, eigenvalues) unlock problems across computer science, physics, and machine learning.

Why graphs?

Your brain has roughly 86 billion neurons, each connected to thousands of others. The internet links billions of devices. A social network maps who knows whom across the planet. What do these systems have in common? They're all graphs: collections of things (nodes) and the connections between them (edges). Graphs are arguably the most universal data structure in mathematics, and learning to think in graphs gives you a surprisingly powerful lens on the world.

Formally, a graph $G=(V,E)$ has vertices $V$ and edges $E\subseteq V\times V$ (undirected if $(u,v)\in E \Rightarrow (v,u)\in E$). We can represent it as an adjacency matrix $A\in\{0,1\}^{|V|\times|V|}$, where $A_{uv}=1$ if there's an edge from $u$ to $v$. This matrix encoding unlocks a beautiful trick: raising $A$ to the $k$-th power gives you $(A^k)_{uv}$, the number of distinct length-$k$ walks from $u$ to $v$. Add weights to the edges and you can model costs, capacities, or affinities.
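The walk-counting trick is easy to verify by hand. Here is a minimal sketch in plain Python on a small path graph (the example graph is illustrative, not from the text):

```python
# Count length-k walks by raising the adjacency matrix to the k-th power.
# Example: the path graph 0-1-2-3 (an illustrative choice).

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def walk_counts(A, k):
    """Return A^k; entry [u][v] counts length-k walks from u to v."""
    result = A
    for _ in range(k - 1):
        result = matmul(result, A)
    return result

# Adjacency matrix of the path 0-1-2-3
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

A2 = walk_counts(A, 2)
print(A2[0][2])  # 1: the single length-2 walk 0-1-2
print(A2[1][1])  # 2: length-2 walks 1-0-1 and 1-2-1
```

Note that walks may revisit nodes, which is why $(A^2)_{11}=2$: the walk can step out and back along either neighbor.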

Local structure

The simplest question you can ask about a node is: how many connections does it have? That number is its degree, $\deg(u)=\sum_v A_{uv}$. In a social network, degree is your friend count. In a power grid, it's how many transmission lines meet at a substation. Collecting all degrees into a diagonal matrix $D=\operatorname{diag}(\deg(u))$ will become surprisingly useful later when we get to spectral methods.

Zoom out a bit and you start seeing paths (sequences of edges connecting two nodes) and cycles (paths that loop back to the start). A connected graph with no cycles at all is called a tree, and trees turn out to be the simplest connected structures: they use exactly $|V|-1$ edges to hold everything together, with no redundancy.

Some graph families are worth knowing by name. Directed acyclic graphs (DAGs) model dependencies and allow topological sorting, which is why your build system and spreadsheet both use them. Bipartite graphs split nodes into two groups where edges only cross between groups (think students and courses, or users and products). Planar graphs can be drawn flat without any edge crossings, a constraint that matters for circuit layout and map coloring.

Traversal and shortest paths

Every time you ask Google Maps for directions, a shortest-path algorithm runs on a graph where intersections are nodes and roads are weighted edges. The fundamental technique is breadth-first search (BFS): start at a source node and explore outward in concentric layers, like ripples on a pond. In an unweighted graph, BFS gives you the shortest path for free, because the first time you reach a node is guaranteed to be along the fewest edges.
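The layer-by-layer exploration can be written in a few lines. A minimal sketch, using a small illustrative adjacency-list graph:

```python
from collections import deque

def bfs_distances(adj, source):
    """Return hop distances from source; unreachable nodes are absent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit = fewest edges from source
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# An illustrative undirected graph as an adjacency list
adj = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
print(bfs_distances(adj, "a"))  # {'a': 0, 'b': 1, 'c': 1, 'd': 2, 'e': 3}
```

The queue is what enforces the "concentric ripples": all distance-$k$ nodes are dequeued before any distance-$(k{+}1)$ node is reached.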

When edges have different costs (some roads are longer than others), you need Dijkstra’s algorithm, which always expands the cheapest known frontier node next. It’s the workhorse behind most real-world routing. And if some edges have negative weights (think of them as rewards rather than costs), Bellman-Ford handles that case, and even detects negative cycles: loops you could traverse forever to accumulate infinite reward, which signals that "shortest path" is undefined.
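Dijkstra's "cheapest frontier first" rule maps directly onto a priority queue. A minimal sketch with `heapq`, on an illustrative weighted graph:

```python
import heapq

def dijkstra(adj, source):
    """Shortest distances from source; requires non-negative weights.
    adj maps node -> list of (neighbor, weight)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                   # stale heap entry, already improved
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Illustrative graph: the direct s->a edge loses to the detour s->b->a
adj = {
    "s": [("a", 4), ("b", 1)],
    "a": [("t", 1)],
    "b": [("a", 2), ("t", 6)],
    "t": [],
}
print(dijkstra(adj, "s")["t"])  # 4, via s -> b -> a -> t
```

With negative weights this greedy rule breaks (a node can be finalized too early), which is exactly the case Bellman-Ford exists to handle.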

Spanning trees and cuts

Imagine you're a telecom company that needs to connect ten cities with fiber optic cable. You could connect every pair, but that's expensive. The cheapest network that still keeps everyone connected is a minimum spanning tree (MST): a tree that touches all $|V|$ vertices using exactly $|V|-1$ edges, chosen to minimize total weight.

Two classic algorithms find MSTs. Kruskal's approach is beautifully simple: sort all edges by weight, then greedily add the cheapest one that doesn't create a cycle. Prim's algorithm grows a single connected frontier outward, always reaching for the cheapest edge that connects a new node. Both work because of a deep property called the cut property: for any partition of the vertices into two groups, the lightest edge crossing between them must belong to some MST.
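Kruskal's "sort, then greedily accept" loop needs only a union-find structure to detect cycles. A minimal sketch on an illustrative four-node graph:

```python
def kruskal(n, edges):
    """edges: list of (weight, u, v) over nodes 0..n-1.
    Returns (total_weight, list_of_tree_edges)."""
    parent = list(range(n))

    def find(x):                       # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, tree = 0, []
    for w, u, v in sorted(edges):      # cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                   # accept only if it joins two components
            parent[ru] = rv
            total += w
            tree.append((u, v))
    return total, tree

edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
total, tree = kruskal(4, edges)
print(total)  # 7: edge (0,2) is rejected because it would close a cycle
```

The rejected edge (0,2) is precisely the demo's "gray" case: both endpoints already sit in the same component.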

Cuts themselves are fundamental. A cut partitions the vertices into two sets $S$ and $\bar{S}$, and its capacity is the total weight of edges crossing between them. The celebrated max-flow min-cut theorem says that the maximum amount of "stuff" you can push from a source $s$ to a sink $t$ through a network equals the capacity of the smallest bottleneck (minimum cut) separating them. It's one of the most satisfying dualities in all of combinatorics.
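The theorem is constructive: Edmonds-Karp repeatedly finds an augmenting path by BFS until none exists, at which point the flow value equals a minimum cut. A compact sketch on an illustrative tiny network:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp: BFS augmenting paths on residual capacities.
    cap: dict-of-dicts, cap[u][v] = capacity (mutated into the residual graph)."""
    for u in list(cap):                # ensure reverse residual entries exist
        for v in list(cap[u]):
            cap.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                # no augmenting path left: flow = min cut
        bottleneck, v = float("inf"), t
        while parent[v] is not None:   # find the tightest edge on the path
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v])
            v = u
        v = t
        while parent[v] is not None:   # push flow, open reverse residual edges
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

cap = {"s": {"a": 3, "b": 2}, "a": {"t": 2}, "b": {"t": 3}}
flow_value = max_flow(cap, "s", "t")
print(flow_value)  # 4: the min cut is {a->t (2), s->b (2)}
```

Here the bottleneck is not any single edge but the cut $\{a\to t,\ s\to b\}$ of capacity $2+2=4$, matching the flow exactly as the theorem promises.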

Spectra: smoothness and clusters

Here's where graph theory gets unexpectedly deep. Think of a guitar string: its vibration modes are sine waves of increasing frequency, and each mode has an associated frequency (eigenvalue). Graphs have an analogous notion. The graph Laplacian $L=D-A$ plays the role of the “vibration operator,” and its eigenvalues and eigenvectors reveal the graph's fundamental modes of variation.

To see why, imagine assigning a number $x_u$ to each node (a temperature, a label, a signal). The Dirichlet energy

$$\mathcal{E}(x)=x^\top L x = \frac{1}{2}\sum_{(u,v)\in E} w_{uv}\,(x_u-x_v)^2$$

measures how much the signal “jumps” across edges. When connected nodes have similar values, the energy is low: the signal is smooth over the graph. When neighbors disagree sharply, the energy is high.
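The energy is just a sum over edges, so it is two lines of code. A sketch on an illustrative unweighted path graph (summing each undirected edge once, which matches the $\tfrac{1}{2}$ in the formula above):

```python
# Dirichlet energy on an unweighted graph: sum over edges of (x_u - x_v)^2.

def dirichlet_energy(edges, x):
    return sum((x[u] - x[v]) ** 2 for u, v in edges)

# Path graph 0-1-2-3
edges = [(0, 1), (1, 2), (2, 3)]

smooth = [0.0, 1.0, 2.0, 3.0]   # neighbors differ gently
spiky = [0.0, 3.0, 0.0, 3.0]    # neighbors disagree sharply

print(dirichlet_energy(edges, smooth))  # 3.0
print(dirichlet_energy(edges, spiky))   # 27.0
```

Both signals use the same set of values at the extremes, yet the spiky arrangement costs nine times more energy: smoothness is a property of how values sit on the graph, not of the values themselves.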

The smallest eigenvalue of $L$ is always $0$, with eigenvector $\mathbf{1}$ (the constant signal, which is perfectly smooth). The eigenvector of the second-smallest eigenvalue, called the Fiedler vector, is the smoothest non-trivial signal on the graph, and it turns out to approximately split the graph into two communities with a small cut between them. This is the foundation of spectral clustering: let the graph's own geometry tell you where the natural divisions are.
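You can watch the Fiedler vector find a community split on a toy "barbell": two triangles joined by a single bridge edge. A sketch using numpy (the graph and code are illustrative, not part of the article):

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
L = np.diag(A.sum(axis=1)) - A         # Laplacian L = D - A

eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
fiedler = eigvecs[:, 1]                # eigenvector of the 2nd-smallest eigenvalue

print(round(abs(float(eigvals[0])), 6))           # 0.0: the constant mode
print(sorted(np.argsort(fiedler)[:3].tolist()))   # the three nodes of one triangle
```

Sorting nodes by their Fiedler-vector value groups each triangle together, and the sign change happens exactly across the bridge: the smallest cut in the graph.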

Random walks and diffusion

In the late 1990s, two Stanford grad students had a radical idea: instead of judging a webpage by its content, judge it by the structure of links pointing to it. Imagine a "random surfer" who starts on a random page and keeps clicking random links. The pages where this surfer spends the most time are, intuitively, the most important. That's PageRank, and it's a random walk on a graph.

A random walk on a graph moves from node $u$ to a neighbor $v$ with probability proportional to the edge weight $w_{uv}$. The dynamics are governed by normalized operators like $L_{\text{rw}}=I-D^{-1}A$, which describe how a signal (or a probability distribution, or heat) diffuses across the network over time. Many of the most useful centrality measures, the ones that identify "important" nodes, are walk-based: degree centrality (who has the most connections), betweenness centrality (who sits on the most shortest paths), closeness centrality (who can reach everyone fastest), and of course PageRank itself.
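PageRank itself is a short power iteration. A minimal sketch on an illustrative link graph with no dangling pages (real implementations also handle pages with no outlinks):

```python
def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank on a directed adjacency list."""
    nodes = list(adj)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # teleport term: the surfer jumps to a random page with prob 1-damping
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            share = damping * rank[u] / len(adj[u])
            for v in adj[u]:           # u passes its rank along its outlinks
                new[v] += share
        rank = new
    return rank

# Illustrative link graph: three pages link to "hub", which links back to "a".
adj = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
rank = pagerank(adj)
best = max(rank, key=rank.get)
print(best)  # "hub": it collects rank from all three pages
```

The ranks always sum to 1, since they form a probability distribution over where the random surfer is.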

Graphs in algorithms and learning

Graphs are everywhere in computer science, often hiding in plain sight. Every program has a control-flow graph where nodes are basic blocks and edges are branches. Finite automata (DFAs and NFAs) are directed graphs where reachability answers the question "does this machine accept this string?" Compilers, model checkers, and regex engines all run graph algorithms under the hood.

In optimization, graphs are the natural language for flows (routing goods through a supply chain), matchings (pairing job applicants to positions), and network design (building infrastructure at minimum cost).

Most recently, graph neural networks (GNNs) have brought graph thinking into deep learning. The core idea is message passing: each node updates its representation by aggregating information from its neighbors, which is really just a learned, weighted version of multiplying by $A$ or $D^{-1}A$. This powers drug discovery (molecules are graphs), social network analysis, and recommendation systems.
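Stripped of the learned weights, one round of message passing is just neighborhood averaging, i.e. multiplying by $D^{-1}A$. A minimal sketch on an illustrative three-node graph:

```python
# One round of (unlearned) mean aggregation: x <- D^{-1} A x.
# Real GNNs wrap this in learned linear maps and nonlinearities.

def mean_aggregate(adj, x):
    return {u: sum(x[v] for v in adj[u]) / len(adj[u]) for u in adj}

# Star graph: "a" connected to "b" and "c"
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
x = {"a": 0.0, "b": 1.0, "c": 3.0}
print(mean_aggregate(adj, x))  # {'a': 2.0, 'b': 0.0, 'c': 0.0}
```

Stacking $k$ such rounds lets information travel $k$ hops, which is why GNN depth controls the receptive field of each node.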

Planarity and Euler’s formula

Can you draw a graph on a flat sheet of paper without any edges crossing? If so, it’s planar, and (when connected) it obeys one of mathematics’ most elegant identities: $V - E + F = 2$, where $F$ counts the faces (regions) of the drawing, including the outer infinite face. This is Euler’s formula, and it places hard limits on how many edges a planar graph can have.

Not every graph is planar, and Kuratowski’s theorem tells you exactly when: a graph is non-planar if and only if it contains a subdivision of $K_5$ (five nodes, all pairs connected) or $K_{3,3}$ (two groups of three, fully cross-connected). These constraints matter for circuit board layout (VLSI design), where crossing wires means adding expensive extra layers, and for graph visualization, where a planar layout is the cleanest way to display structure.

A compact toolbox

If you take away one thing from this article, let it be this mental toolkit. Two matrices capture most of what you need: the adjacency matrix $A$ for counting paths and combinatorial structure, and the Laplacian $L$ for smoothness, diffusion, clustering, and cuts. Two traversal strategies, BFS and DFS, handle reachability, connected components, and topological ordering. For optimization, you have Dijkstra and Bellman-Ford for shortest paths, Kruskal and Prim for minimum spanning trees, and Edmonds-Karp or Dinic for max-flow. And when you need to find hidden communities or propagate labels across a network, spectral methods turn graph structure into geometry you can cluster.

Try It: Watch BFS Explore a Graph

Click any node to make it the source, then hit Step or Run to watch BFS ripple outward. Each color ring is one "hop" further from the source.
Notice how BFS discovers nodes in perfect layers, like concentric circles. Each layer contains all nodes at exactly the same distance from the source. The thin arrows trace the shortest-path tree: if you follow the arrows backward from any node, you get the shortest route back to the source. Mathematically, the nodes at distance $k$ correspond to the support of $A^k \mathbf{e}_s$ not seen at earlier distances.

Try It: Build a Minimum Spanning Tree

Edge weights are proportional to their screen length. Watch Kruskal’s algorithm consider edges from cheapest to most expensive: green means "accepted into the tree," gray means "rejected (would create a cycle)," and amber highlights the edge currently under consideration.
The algorithm is greedy yet optimal. It always picks the cheapest available edge that doesn’t form a cycle, and this works because of the cut property: for any way you split the nodes into two groups, the cheapest crossing edge is guaranteed to belong to some MST. The total weight shown is the minimum possible cost to keep all nodes connected.

Try It: Diffuse Heat Across a Graph

Click nodes to plant heat sources (red) or cold sinks (blue). The remaining nodes will gradually equilibrate, smoothing their values along edges. Hit Run to watch the temperature field evolve in real time.
This is the Laplacian in action. Each step performs $x \leftarrow x - \alpha L x$, which is gradient descent on the Dirichlet energy $\mathcal{E}(x)=x^\top L x$. In physical terms: each node averages itself toward its neighbors. Fixed seeds act as boundary conditions (like holding one end of a metal bar in a flame). The colors (blue through white to red) show the evolving temperature at each node.
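The same update rule fits in a few lines of plain Python. A sketch of the demo's dynamics on an illustrative five-node path with the two ends held fixed (the node values and step size are assumptions, not the demo's actual parameters):

```python
# Explicit diffusion x <- x - alpha * L x on an unweighted graph,
# with seed nodes held fixed as boundary conditions.

def diffuse(adj, x, fixed, alpha=0.1, steps=200):
    x = dict(x)
    for _ in range(steps):
        # (L x)_u = deg(u) * x_u - sum of neighbor values
        Lx = {u: len(adj[u]) * x[u] - sum(x[v] for v in adj[u]) for u in adj}
        for u in adj:
            if u not in fixed:         # seeds stay pinned, like the demo's sources
                x[u] -= alpha * Lx[u]
    return x

# Path 0-1-2-3-4 with a hot end (1.0) and a cold end (0.0) held fixed.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
x = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}
out = diffuse(adj, x, fixed={0, 4})
print(round(out[2], 2))  # 0.5: the midpoint equilibrates halfway between the seeds
```

At equilibrium the interior values interpolate linearly between the boundary seeds, exactly like the temperature profile of a metal bar heated at one end.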
Connections

How this article plugs into other pieces on the site: