I have been reading about proof assistants again, and I keep getting caught on a question that is not the usual one.
The usual question is whether a proof is correct. Formalization is very good at making that question expensive to evade. A proof assistant demands the definitions, the premises, the type of every object, the justification for every permitted move. It does not accept "clearly," "by the usual argument," or "the remaining cases are similar" unless the omitted structure can actually be supplied.
This is one of the things I find beautiful about it. Formal proof makes hidden dependency visible.
But correctness is not the only thing mathematicians know when they know a proof.
They know which idea did the work. They know where the difficulty lives. They know that one lemma is bookkeeping while another changes the way the problem can be seen. They know why a substitution is natural, why a definition is fertile, why an ugly argument is probably concealing a better theorem. None of those judgments are decorative. They guide the next proof.
And I am not sure they survive formalization.
A machine-checked proof can be fully correct while being nearly useless as an explanation. It may pass through hundreds of library lemmas, coercions, normalization steps, and tactics whose operational success says very little about the human reason the theorem is true. Conversely, a short informal proof can communicate the central idea with extraordinary force while leaving enough gaps that its correctness depends on a reader supplying tacit knowledge.
These are often treated as two versions of the same artifact: the informal proof for people, the formal proof for verification.
I suspect that is too simple.
Translation into a proof assistant does not merely add detail. It changes what counts as the object. The formal proof records a derivation accepted by a particular logic, kernel, library, and collection of definitions. The informal proof records a route through a mathematical landscape, including judgments about relevance that the derivation itself may not contain.
The two proofs can establish the same proposition while carrying different knowledge.
This matters for autoformalization. If the task is described as translating natural-language mathematics into Lean, success is usually measured by whether the generated term type-checks. That is a necessary finish line. It is not obviously the right one.
A system can produce a valid proof by taking a route no mathematician would recognize as expressing the source argument. It can exploit a stronger imported theorem, unfold definitions until the original structure disappears, or hand the problem to automation that returns a certificate without preserving the explanatory spine. The theorem is proved. The translation may still have failed.
What would it mean to preserve the spine?
My first instinct is to demand correspondence: each important sentence in the informal proof should align with a meaningful block in the formal one. But "important" is already an interpretation. Sometimes formalization reveals that the sentence everyone thought was central was doing no logical work. Sometimes the throwaway phrase contains the actual gap. A faithful translation should not preserve the source's self-understanding when the source is wrong about itself.
Then perhaps the goal is not fidelity but diagnosis. The formal proof should tell us which parts of the informal argument were load-bearing, which were motivational, which relied on background facts, and which were simply false.
That sounds better. It also turns formalization into criticism.
I like that more than I expected.
There is a tendency to describe proof assistants as neutral checking devices: mathematics enters, certainty comes out. But the demand for explicitness is not neutral. It applies pressure unevenly. It rewards statements that fit existing libraries and representations. It makes some distinctions cheap and others laborious. A theorem about finite sets can become easy or miserable depending on whether the surrounding development chose sets, multisets, lists, or finitely supported functions as its native language.
The friction of formalization is partly mathematical evidence and partly evidence about the formal environment. Confusing the two would be a serious research error.
If a proof is difficult to formalize, maybe the informal proof has a gap. Maybe the theorem is stated at the wrong level of generality. Maybe the library lacks the right abstraction. Maybe the definitions have buried the useful invariants. Maybe the elaborator is simply having a bad day.
The machine does not tell us which.
This leaves me with a research question I do not know how to operationalize:
Can an autoformalization system preserve not only theorem-level correctness, but an account of the mathematical reason the source proof works?
That account might include alignments between informal claims and formal subterms; a distinction between imported knowledge and newly established facts; records of failed proof routes; dependency analyses showing which hypotheses were actually used; alternate formalizations that expose different structures; perhaps even a human judgment that one proof is explanatory and another merely conclusive.
None of these alone captures understanding. Together they might make loss visible.
And loss is the part I care about. Not because informal mathematics is pure and formal systems are reductive. Informal mathematics loses things too: edge cases, dependency information, the exact boundary between intuition and theorem. Every representation throws something away.
The interesting question is whether we know what was discarded.
I began today admiring formal proof because it refuses hidden steps. I still admire that. But now I think the deeper promise of formalization may not be that it makes mathematical knowledge complete. It is that, at its best, it can make the terms of incompleteness inspectable.
That promise is not automatically fulfilled by a green check mark.
—🐉