The Minimization Visualizer

Theory Reference

Deterministic Finite Automata

The formal model underlying all of this.

A DFA is a 5-tuple (Q, Σ, δ, q₀, F) where every component has a precise role:

Example: DFA accepting strings ending in 'b'

Key rule: δ must be total — every state must have exactly one outgoing transition per symbol. No missing arrows, no choices.

Historical Context & Limitations

The concept of the finite automaton was first introduced by neurophysiologists Warren McCulloch and Walter Pitts in 1943. They developed a mathematical model of neural networks that was later refined by computer scientists like Michael Rabin and Dana Scott natively into what we now recognize as finite state machines. In 1959, Rabin and Scott introduced the concept of Non-deterministic Finite Automata (NFAs) alongside DFAs, and proved the astonishing theorem that they are equivalent in computing power: anything an NFA can compute, a DFA can compute too (via the powerset construction algorithm).

Despite their elegance, DFAs are profoundly restricted computing models. A DFA possesses exactly zero memory outside of its current state. Unlike an unbounded Turing Machine, or even a Pushdown Automaton which operates with a stack, a DFA cannot maintain a count of indeterminate values. This makes them incapable of recognizing non-regular languages such as aⁿbⁿ (the language of matching parentheses) because they cannot remember how many 'a's they've seen dynamically.

Closure Properties

DFAs are exceptionally mathematically well-behaved. The class of regular languages (languages recognized by DFAs) is structurally closed under several fundamental operations:

Union: If L1 and L2 are regular, L1 ∪ L2 is regular. We can construct a combined machine using the cross-product construction.
Intersection: If L1 and L2 are regular, L1 ∩ L2 is regular. The same cross-product construct handles this natively.
Complement: If L is regular, so is its complement. To achieve this, simply take the DFA that recognizes L and invert all accept and non-accept states!
Concatenation & Kleene Star: NFAs make it trivial to prove that these operations produce regular languages, and consequently, DFAs accept them as well.

Understanding the DFA bounds the notion of "regular expression computing" used in modern software engineering (although modern PCRE tools add back-references which push them slightly out of the realm of pure theoretical DFAs).

Why Minimize?

Multiple DFAs, one language — but only one is the smallest.

Many different DFAs can recognize the same language. Some have redundant states doing identical work. The minimal DFA is the unique most efficient one.

Same language, fewer states:

Formal definition. States p and q are distinguishable if ∃ string w such that exactly one of δ*(p,w), δ*(q,w) ∈ F. Otherwise they are equivalent and can be merged.

The minimal DFA is unique up to state renaming — guaranteed by the Myhill–Nerode theorem. This means that if you construct the smallest possible Machine A for a language, and your colleague constructs the smallest possible Machine B for the exact same language, Machine A and Machine B are isomorphic. You can map every state in A to a state in B symmetrically.

Real-World System Optimization

Why do we care about merging redundant states? Finite automata are rarely just textbook concepts; they are used explicitly inside microprocessors as hardware controllers (sequencers) and inside compilers as lexical analyzers (lexers). A redundant state in software means a larger transition table in memory. A redundant state in hardware logic means wasted logic gates, higher heat production, and increased silicone manufacturing costs.

When you supply a Regular Expression (RegEx) to an engine like grep or awk, they parse your string into an NFA, determinize it into a DFA using powerset construction (which frequently blows up the state count massively), and then they optionally minimize it to ensure memory limits aren't exceeded and traversal speed remains mathematically optimal.

A Note on NFA Minimization

If a DFA can be minimized, what about an NFA? Is there a unique, smallest NFA for a given language? The fascinating contrast here is a resounding no.

Unlike DFAs, the minimal NFA is not unique. You might find two entirely distinct NFAs that have the same minimal number of states for the same language, but they cannot be mapped to each other straightforwardly.
While DFA minimization is efficiently polynomial—most commonly bounded at O(n log n) scaling—NFA minimization is an incredibly stubborn PSPACE-complete problem. In practical terms, this means optimizing an NFA state-for-state approaches incomputable resource costs for large systems.

Therefore, the robust strategy enforced universally is to sacrifice intermediate time: convert the NFA to a DFA first, taking the temporary spatial hit, and then execute the wildly efficient polynomial DFA minimizer algorithm presented here.

Myhill–Nerode Theorem

The theorem that makes minimization provably correct.

Myhill–Nerode relation ~_L:

x ~_L y ⟺ ∀z ∈ Σ* : (xz ∈ L ↔ yz ∈ L)

Two strings x and y are equivalent if every possible continuation z leads to the same accept/reject outcome for both.

Each class of strings becomes one state:

The theorem: L is regular ↔ ~_L has finitely many classes. The number of classes = number of states in the minimal DFA.

This proves uniqueness: any two minimal DFAs for L must have the same number of states and be isomorphic.

The Three Equivalences of Myhill-Nerode

The Myhill-Nerode theorem (1958) establishes that the following three propositions are strictly logically equivalent for any language L:

L is regular (i.e., recognized by some DFA).
L is the union of a set of equivalence classes of some right-invariant equivalence relation of finite index.
The canonical indistinguishability relation ~_L has a finite index.

Right-Invariance & The Coarsest Relation

An equivalence relation R on strings is right-invariant if x R y implies xa R ya for all characters a in the alphabet. If you think of an equivalence class as a "state memory", right-invariance guarantees that adding a character transitions both strings entirely deterministically into the same new "state memory".

While many such relations can exist for L, ~_L is the coarsest right-invariant equivalence relation that refines L. This means any other valid relation R simply subdivides the equivalence classes of ~_L (yielding a machine with more states). Consequently, the classes of ~_L perfectly define the absolute minimum number of states required to build the DFA.

Constructing the Minimal DFA from ~_L

The beauty of the Myhill-Nerode theorem is that it doesn't just promise a minimal DFA—it gives you the exact blueprint to build it algebraically, without needing an initial NFA or DFA:

States (Q): The set of equivalence classes of ~_L, denoted as [x].
Start State (q₀): The class containing the empty string, [ε].
Accept States (F): The classes [x] where x ∈ L.
Transition Function (δ): Defined intrinsically by right-invariance as δ([x], a) = [xa].

Proving Non-Regularity

The Myhill-Nerode theorem provides an exactly necessary and sufficient condition for regular languages, making it a stronger mathematical weapon than the Pumping Lemma. To prove a language is non-regular, you simply identify an infinite sequence of prefixes where no two prefixes can belong to the same equivalence class.

For instance, in the language L = { aⁿbⁿ | n ≥ 0 }, consider the prefixes aⁱ and a^j where i ≠ j. If you append the suffix z = bⁱ, you get aⁱbⁱ (which is inside L) and a^jbⁱ (which is outside L). Because they differ on suffix z, every count of a's forms its own distinct equivalence class. Ergo, ~_L has an infinite index, which proves mathematically that no finite state automation can ever compute L.

Bridging to the Table-Filling Algorithm

The Myhill-Nerode theorem states that states are equivalent if and only if they behave identically on all future inputs. The Table-Filling Method is the algorithmic realization of this theorem. It starts by assuming all states are equivalent, and systematically "marks" pairs that violate the Myhill-Nerode indistinguishability condition. When it terminates, the unmarked pairs represent the precise equivalence classes dictated by the theorem.

Table-Filling Algorithm

Practical O(n²|Σ|) algorithm to find all distinguishable pairs.

Minimization of DFA - Table Filling Method

(Myhill-Nerode Theorem)

Steps:

Draw a table for all pairs of states (P,Q)
Mark all pairs where P ∈ F and Q ∉ F
If there are any Unmarked pairs (P,Q) such that [δ(P,x), δ(Q,x)] is marked, then mark [P,Q] where 'x' is an input symbol
REPEAT THIS UNTIL NO MORE MARKINGS CAN BE MADE
Combine all the Unmarked Pairs and make them a single state in the minimized DFA

Visualizing the mechanics step-by-step:

Base Case

Mark every pair (p,q) where exactly one is an accept state. They are trivially distinguishable by ε.

Propagation — repeat until stable

For unmarked (p,q) and each a ∈ Σ:
if (δ(p,a), δ(q,a)) is marked → mark (p,q)

Merge

Every unmarked pair → those states are equivalent. Group into equivalence classes using Union-Find. Each class becomes one state in the minimal DFA.

State	on a	on b	Type
→ q0	q1	q2	non-accept
* q1	q0	q3	accept
q2	q3	q0	non-accept
* q3	q2	q1	accept

The Minimization
Visualizer

Define the Automaton

🔢 RegEx Mode

📋 Table Mode

🎨 Draw Mode

Regular Expression

Live DFA Structure

The Original Topology

Marking Base Pairs

Iterative Propagation

Equivalence Classes

Equivalence Classes Found

The Minimized Automaton

Original States

Minimized States

Test Equivalence

String Traversal:

Deterministic Finite Automata

Historical Context & Limitations

Closure Properties

Why Minimize?

Real-World System Optimization

A Note on NFA Minimization

Myhill–Nerode Theorem

The Three Equivalences of Myhill-Nerode

Right-Invariance & The Coarsest Relation

Constructing the Minimal DFA from ~_L

Proving Non-Regularity

Bridging to the Table-Filling Algorithm

Table-Filling Algorithm

Minimization of DFA - Table Filling Method

Worked Example

The Minimization Visualizer

Define the Automaton

🔢 RegEx Mode

📋 Table Mode

🎨 Draw Mode

Regular Expression

Live DFA Structure

The Original Topology

Marking Base Pairs

Iterative Propagation

Equivalence Classes

Equivalence Classes Found

The Minimized Automaton

Original States

Minimized States

Test Equivalence

String Traversal:

Deterministic Finite Automata

Historical Context & Limitations

Closure Properties

Why Minimize?

Real-World System Optimization

A Note on NFA Minimization

Myhill–Nerode Theorem

The Three Equivalences of Myhill-Nerode

Right-Invariance & The Coarsest Relation

Constructing the Minimal DFA from ~L

Proving Non-Regularity

Bridging to the Table-Filling Algorithm

Table-Filling Algorithm

Minimization of DFA - Table Filling Method

Worked Example

The Minimization
Visualizer

Constructing the Minimal DFA from ~_L