The Missing Arrow: Distilling Agentic Hill-Climbing into Deterministic Code (Software 3.0 → 1.0)

A semi-formal essay, with SVG compression as a running example.

Code: https://github.com/maziars/svgym · Demo: https://maziars.github.io/svgym/ · Optimize your own SVG: https://maziars.github.io/svgym/app/

Abstract

A large, under-served class of software problems shares one shape: take an artifact, improve it on a measurable axis, and don't violate a measurable correctness or fidelity constraint. These problems are hard not because any single transformation is hard, but because the decision space (which transformation to apply, in what order, to this input) is a thicket of conditionals no one has had the incentive to map by hand. Large language models change the economics of this class, though not in the usual way. The claim is sharper than "AI writes the code for you": AI is most valuable as a discovery-and-distillation process whose expensive, stochastic output can be compiled down into cheap, deterministic, verifiable code. Both the model and the resulting pipeline do the same thing, hill-climbing toward a better artifact one verified step at a time. The model runs that climb agentically (stochastic, expensive); distillation turns it into a fixed, deterministic climb against the same verifier. In Karpathy's terms (1.0 code, 2.0 weights, 3.0 prompts), this is the missing return arrow, 3.0 → 1.0: program a model in natural language to emit the deterministic code that then makes the model unnecessary. The verifier is what makes it safe, because every step is gated by a measurable check, so the pipeline can guarantee output quality even though the process that produced it was unreliable. I develop this through SVGym, an SVG optimizer built this way, then generalize: the problem class it belongs to, the prior work it draws on, and the other domains where the recipe should apply.

A note on what this is, and isn't. This essay generalizes from one clean instance into a hypothesis, not a validated framework. It argues that a pattern exists and is worth a name, not that it has been proven across the domains below. It is also, fittingly, largely researched and drafted by an AI; take that as a demonstration or as a reason to check its homework. The references are there so you can.

1. SVGym: the case study

Why SVG. The problem has three properties that make it an ideal testbed. The input space is large and diverse (icons, flags, emoji, logos, charts, illustrations, animated and interactive graphics), so no single fixed recipe is best for all of it. There is already an excellent, ubiquitous baseline, SVGO, downloaded on the order of 30 million times a week and built into webpack, PostCSS and most front-end toolchains, which makes "beyond SVGO" a hard, meaningful bar rather than a straw man. And there is real appetite for pushing further by hand: SVGOMG, Jake Archibald's "SVGO's Missing GUI," exists so people can hand-tune SVGO file by file. A popular tool whose only purpose is manual per-file tuning is the tell: the per-instance gains are real and wanted, but capturing them has stayed manual. Because every property here is measurable, the same machinery can target render time or anything else, not just bytes.

SVGym began as bespoke work. For each SVG, an LLM read the markup, reasoned about waste, and proposed edits (collapse redundant path commands, drop invisible precision, merge paths, normalize transforms). That process was already safe: the model had SSIM and byte-size tools and verified its own edits before accepting them. What made it absurd as a product was the rest, namely a model call per file, non-determinism, latency, and cost that scaled with usage. The work that followed kept the safety and removed the expense, in a specific order:

1. Bespoke rollouts. The LLM solved files from scratch, each a one-off.
2. Generalization into functions. Across many rollouts, recurring moves appeared, and so did the analysis the model used to decide when a move applied. We had Claude lift both into deterministic functions: transforms like quantize_coordinates, merge_paths, and strip_metadata, paired with analysis functions like profile_path_commands, measure_real_precision, detect_mergeable_paths, and estimate_savings_per_technique. The pipeline ended up composing 42 such functions. One expensive, exploratory model became a library of free, deterministic operations.
3. A routing workflow. We then had it write the control logic: profile an input and decide which transforms to apply and in what order. This is the if/else thicket everyone avoids, tedious for a human and unreliable for an AI to run live; what makes it tractable is that every branch is cheap to check after the fact.
4. Quality gates. Every step is wrapped in measurement: render before and after, compute the Structural Similarity Index (SSIM; Wang et al., 2004), a perceptual metric where 1.0 means the two renders are identical, check the byte delta, and accept the change only if SSIM stays at or above 0.99. Otherwise revert.
5. AI as fallback, not engine. The deterministic workflow does the overwhelming majority of the work. A model call happens only when the automated path can't find a large enough lift, i.e. a novel case the library doesn't yet cover.
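The quality gate in the list above is small enough to sketch. This is a minimal illustration, not SVGym's actual code: the name `gated_apply` and its signature are hypothetical, and `similarity` stands in for "render both versions and compute SSIM", which needs a rasterizer and is left to the caller. Only the 0.99 floor and the keep-if-smaller rule come from the text.

```python
from typing import Callable, Tuple

SSIM_FLOOR = 0.99  # acceptance threshold stated in the essay


def gated_apply(
    svg: str,
    transform: Callable[[str], str],
    similarity: Callable[[str, str], float],
    floor: float = SSIM_FLOOR,
) -> Tuple[str, bool]:
    """Apply one transform; keep it only if it is smaller and still faithful.

    `similarity(before, after)` is a hypothetical callable that renders both
    documents and returns a score in [0, 1] (SSIM in the essay's setup).
    Returns the document to carry forward and whether the edit was accepted.
    """
    candidate = transform(svg)
    smaller = len(candidate.encode("utf-8")) < len(svg.encode("utf-8"))
    faithful = similarity(svg, candidate) >= floor
    if smaller and faithful:
        return candidate, True   # accept the change
    return svg, False            # revert to the last good version
```

Because the gate only needs two callables, the same wrapper works for any transform in the library, and swapping the byte-size test for a render-time measurement would change one line.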

Concretely, the distilled pipeline looks like this. Nine detected features fan out into roughly 17 branch points across 8 ordered stages, and each of the ~30 tool applications passes the same verify-or-revert gate (render, score on SSIM and peak signal-to-noise ratio (PSNR), keep only if smaller and within quality). Two stages are small search loops in their own right (curve threshold 0.05→0.1; coordinate precision 2→1→0, keep the most aggressive that passes). It is exactly the kind of branchy, tedious control flow no one wants to hand-write, and the per-step gate is what makes it safe to generate.
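The "keep the most aggressive that passes" loop can be sketched as follows. Everything here is illustrative: `most_aggressive_passing`, `round_numbers`, and `within_tolerance` are hypothetical names, the rounding helper is a toy regex stand-in for a real coordinate quantizer, and the tolerance check stands in for the render-and-score gate. The sketch also assumes each level is tried against the original input rather than cumulatively, which the text does not specify.

```python
import re
from typing import Callable, Optional, Sequence, Tuple

_DECIMAL = re.compile(r"-?\d+\.\d+")
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def round_numbers(text: str, digits: int) -> str:
    """Toy quantizer: reformat every decimal literal to `digits` places."""
    return _DECIMAL.sub(lambda m: f"{float(m.group()):.{digits}f}", text)


def within_tolerance(before: str, after: str, tol: float = 0.05) -> bool:
    """Toy gate: output is shorter and no number moved by more than `tol`."""
    a = [float(x) for x in _NUMBER.findall(before)]
    b = [float(x) for x in _NUMBER.findall(after)]
    moved = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    return len(a) == len(b) and len(after) < len(before) and moved <= tol


def most_aggressive_passing(
    original: str,
    levels: Sequence[int],
    apply_level: Callable[[str, int], str],
    passes: Callable[[str, str], bool],
) -> Tuple[str, Optional[int]]:
    """Try levels ordered mild to aggressive; keep the last one that passes."""
    best, chosen = original, None
    for level in levels:  # e.g. precision 2, then 1, then 0
        candidate = apply_level(original, level)
        if passes(original, candidate):
            best, chosen = candidate, level
    return best, chosen


path = "M10.124 20.456 L30.789 40.012"
print(most_aggressive_passing(path, (2, 1, 0), round_numbers, within_tolerance))
# -> ('M10.1 20.5 L30.8 40.0', 1)
```

In this toy run, precision 0 moves a coordinate by more than the tolerance, so the loop settles on precision 1. If nothing passes, the original is returned unchanged with `None` as the chosen level.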

Part 1: setup (run SVGO, then a lossless prepass).

flowchart LR
  IN["SVG"] --> F["detect<br/>9 features"]
  F --> C1{"CSS?"}
  C1 -- "no" --> SP["shapes<br/>from paths"]
  C1 -- "yes" --> I1{"animated /<br/>hover / script?"}
  SP --> I1
  I1 -- "yes" --> SK["skip SVGO<br/>and merges"]
  I1 -- "no" --> SV["SVGO<br/>verify or revert"]
  SK --> PP["lossless prepass<br/>15+ tools"]
  SV --> PP
  PP --> L{"lossless<br/>level?"}
  L -- "yes" --> RET["done"]
  L -- "no" --> NEXT["continue to<br/>lossy optimization"]
  classDef proc fill:#d2e3fc,stroke:#1a73e8,color:#202124;
  classDef dec fill:#fef7e0,stroke:#f9ab00,color:#202124;
  classDef good fill:#ceead6,stroke:#34a853,color:#202124;
  classDef neutral fill:#e8eaed,stroke:#9aa0a6,color:#202124;
  class IN neutral;
  class F,SP,SV,PP proc;
  class C1,I1,L dec;
  class SK neutral;
  class RET,NEXT good;

Part 2: lossy optimization, where every tool passes the verify-or-revert gate.

flowchart LR
  ST["structural<br/>feature-gated"] --> SI["simplify paths<br/>loop 0.05 to 0.1"]
  SI --> M{"interactive?"}
  M -- "no" --> MG["merge paths"]
  M -- "yes" --> RD["round coords<br/>loop 2 to 1 to 0"]
  MG --> RD
  RD --> S2["second pass"]
  S2 --> FG{"final<br/>quality ok?"}
  FG -- "no" --> FB["last good"]
  FG -- "yes" --> OUT["Optimized SVG"]
  FB --> OUT
  ST -. "every tool" .-> T
  subgraph G["per-tool gate, repeats about 30 times"]
    T["apply tool"] --> RG["render<br/>before/after"]
    RG --> CK{"quality ok?"}
    CK -- "keep" --> K["accept"]
    CK -- "revert" --> V["discard"]
  end
  classDef proc fill:#d2e3fc,stroke:#1a73e8,color:#202124;
  classDef dec fill:#fef7e0,stroke:#f9ab00,color:#202124;
  classDef good fill:#ceead6,stroke:#34a853,color:#202124;
  classDef neutral fill:#e8eaed,stroke:#9aa0a6,color:#202124;
  classDef bad fill:#fad2cf,stroke:#ea4335,color:#202124;
  class ST,SI,MG,RD,S2,T,RG proc;
  class M,FG,CK dec;
  class FB neutral;
  class OUT,K good;
  class V bad;
  style G fill:#f8f9fa,stroke:#dadce0,color:#202124;

Figure: SVGym's deterministic pipeline. Each stage applies several tools; every tool passes the verify-or-revert gate shown at the bottom (~30 times per run). The diamonds are feature-driven branches that decide which tools are even safe to attempt.
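The feature-driven branches in the figure amount to a small planning function. The sketch below is a simplified reading of the two flowcharts, not the real router: it models only two of the nine features, detects them with naive substring checks that are purely an assumption, and uses stage names taken from the diagram labels rather than from SVGym's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Features:
    has_css: bool
    interactive: bool  # animated, hover, or script


def detect_features(svg: str) -> Features:
    """Naive, illustrative detection; a real profiler would parse the XML."""
    lowered = svg.lower()
    has_css = "<style" in lowered
    interactive = any(tok in lowered for tok in ("<animate", "<script", ":hover"))
    return Features(has_css=has_css, interactive=interactive)


def plan_stages(features: Features, lossless_only: bool = False) -> List[str]:
    """Return the ordered stages the flowcharts would route this input through."""
    plan: List[str] = []
    if not features.has_css:
        plan.append("shapes_from_paths")   # only attempted when no CSS is present
    if not features.interactive:
        plan.append("svgo")                # skipped for animated/hover/script files
    plan.append("lossless_prepass")
    if lossless_only:
        return plan                        # the "lossless level" exit
    plan += ["structural", "simplify_paths"]
    if not features.interactive:
        plan.append("merge_paths")         # merges are unsafe for interactive files
    plan += ["round_coords", "second_pass"]
    return plan
```

Each name in the returned plan would then run behind the per-tool verify-or-revert gate, so a wrong routing decision costs time rather than correctness.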

2. What it achieves, and why