FontsPack: 4.2MB → 749KB

The impact

Packed 29 fonts totaling 4.2MB into a single 749KB asset - an 82% reduction.
Note: Long read, heavy on jargon - glossary at the bottom. You'll want a tea ☕.

Where this came from

I have an Android app called TexWalls - text wallpapers, ~520K downloads. Users pick a display font and render it as a wallpaper, so the font picker is the whole point.

Shipping 25+ fonts in the APK isn't the best approach. So I built FontsPack: a Gradle plugin that packs them at build time, and an Android runtime that decodes them on first launch. Numbers here come from a sample app running the full flow.

The corpus: 29 fonts across sans-serif, serif, display/script, and Devanagari. The variety turned out to matter a lot.

You might ask: why not just use Play Asset Delivery?

It's Google Play only - no path for non-GMS devices, AOSP forks, or anything off the Play ecosystem. It also leans on an install-time download that's flaky on spotty networks. In-APK decode is faster than a network round-trip when the bytes can already live in the app.

Metric	Value
Raw corpus	~4.2MB
Packed	~749KB
Reduction	82.1%
Cold decode	~115ms sample median (parallel pool decode)
Fonts	29

The rest of this post is about the compression. The runtime API and how the numbers were measured are covered near the end.

The progression

I'll be honest, this didn't all land in one shot. It moved through four phases over several months, on and off between this project and others:

Phase 1: Architecture decision (Format A vs B)
Phase 2: Building the optimizers, one at a time
Phase 3: Tuning LZMA2 and squeezing the manifest
Phase 4: Pushing for sub-700KB and hitting the wall

Phases 1-3 happened progressively. Phase 4 is where most of the interesting stories live.

Phase 1 - Picking the architecture

The first decision was how to lay out the pack.

Format A: each font becomes its own WOFF2 blob with Brotli, concatenated into the pack. Simple. Modular. But each font compresses in isolation, so Brotli can't find patterns across fonts.

Format B: split every glyph in every font into 9 typed streams - contour counts, point counts, flags, coordinate triplets, and so on. Then pool each stream across all 29 fonts before compression. The compressor now sees long runs of similar data and finds cross-font matches automatically.

Format B won, by ~7%. Cross-font dedup is a thing no per-font compressor can match - which is the whole point.
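To make the Format A vs Format B difference concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the synthetic "fonts", the two stream names, and raw LZMA2 at preset 6 standing in for both per-font WOFF2/Brotli and the real pack compressor. It only shows the mechanism: pooling typed streams lets one compressor reuse patterns that per-font compression has to pay for again in every file. The toy exaggerates the gap well beyond the ~7% measured on real fonts.

```python
import lzma
import random

# Raw LZMA2 (no container) so per-blob header overhead doesn't skew the comparison.
FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]
STREAMS = ("flags", "coords")  # hypothetical stream names; the real pack has 9


def packed_size(data: bytes) -> int:
    return len(lzma.compress(data, format=lzma.FORMAT_RAW, filters=FILTERS))


def make_fonts(n_fonts: int = 8, seed: int = 1) -> list:
    """Synthetic fonts: each stream is a random sequence of motifs shared across fonts."""
    rng = random.Random(seed)
    motifs = {
        s: [bytes(rng.getrandbits(8) for _ in range(48)) for _ in range(16)]
        for s in STREAMS
    }
    return [
        {s: b"".join(rng.choice(motifs[s]) for _ in range(40)) for s in STREAMS}
        for _ in range(n_fonts)
    ]


fonts = make_fonts()

# Format A: every font compressed on its own, sizes summed.
format_a = sum(packed_size(b"".join(f[s] for s in STREAMS)) for f in fonts)

# Format B: pool each stream type across all fonts, then compress each pool.
format_b = sum(packed_size(b"".join(f[s] for f in fonts)) for s in STREAMS)

print(f"Format A (per font): {format_a} bytes")
print(f"Format B (pooled):   {format_b} bytes")
```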

While I was at it, I swapped Brotli for LZMA2 and got another ~2-3% on the pooled blob. Range coding with richer context modelling beats Brotli's LZ77+Huffman on this kind of data. LZMA2 is slower at decode, but font unpack is a one-time first-launch cost - totally fine to spend a few hundred ms there once.

Phase 2 - Table optimizers

A font isn't just glyphs. Every TTF carries a name table, cmap, hhea, head, post, optionally GPOS/GSUB/GDEF layout tables, hmtx metrics, kern legacy kerning, fpgm/prep/cvt hinting bytecode, and more.

Most of it can be losslessly transformed into something the compressor likes better. So I wrote optimizers, one at a time:

Optimizer	What it does
Kern strip	Drop `kern` when `GPOS` is present - `kern` is redundant then
Aggressive strip	Remove `hdmx`, `VDMX`, `LTSH`, `FFTM`, `DSIG`, `morx`, `STAT`, `feat`, `gasp`
Dehinting (opt-in)	Zero out `fpgm`, `prep`, `cvt`, per-glyph instructions
Implied points drop	Drop on-curve points the renderer can reconstruct from neighbours
Name table trim	Keep only the small set of name IDs needed by this Android pipeline
Post table 2 → 3	Convert `post` to the minimal "no glyph names" format
Cmap cleanup	Drop Macintosh records, dedupe identical encodings
GPOS PairPos rewrite	Convert sparse class-matrix PairPos to denser per-pair format
GPOS value tighten	Tighten `ValueFormat` bitmasks where fields are always zero
Hmtx LSB drop	Remove redundant LSB array where `LSB == glyf.xMin`

Each one is a small targeted pass - small enough to land in a single PR with before/after numbers in CI.
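As an example of what one of these passes looks like, here is a minimal Python sketch of the implied-points idea. It relies on the TrueType rule that a renderer inserts an on-curve point at the midpoint of two consecutive off-curve points, so an explicit on-curve point sitting exactly there carries no information. The `Point` type and function name are hypothetical, and this is not the plugin's actual code.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    x: int
    y: int
    on_curve: bool


def drop_implied_points(contour: List[Point]) -> List[Point]:
    """Remove on-curve points that sit exactly midway between two off-curve neighbours."""
    n = len(contour)
    if n < 3:
        return list(contour)
    kept = []
    for i, p in enumerate(contour):
        prev, nxt = contour[i - 1], contour[(i + 1) % n]  # contours are closed loops
        implied = (
            p.on_curve
            and not prev.on_curve
            and not nxt.on_curve
            and 2 * p.x == prev.x + nxt.x   # exact integer midpoint, so lossless
            and 2 * p.y == prev.y + nxt.y
        )
        if not implied:
            kept.append(p)
    return kept


contour = [
    Point(0, 10, False),
    Point(5, 10, True),    # midpoint of its two off-curve neighbours
    Point(10, 10, False),
    Point(10, 0, True),
    Point(0, 0, True),
]
print(drop_implied_points(contour))
```

The exact-midpoint check is what keeps the pass lossless: a point that is merely close to the midpoint has to stay.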

Phase 3 - Tuning the compressor itself

The default LZMA2 settings aren't optimal for this kind of data. The encoder takes three context parameters - lc (literal context bits), lp (literal position bits), and pb (position bits) - that control how much context it uses to predict each byte.

A custom sweep on the actual joined pool blob landed at lc=2, lp=0, pb=1, niceLen=273. Better than the defaults, not by a lot.
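A sweep like that can be sketched with Python's standard `lzma` module, which exposes the same LZMA2 knobs. This is not the tooling used for the numbers above: the input is a synthetic stand-in for the pool blob, and the preset and 2MB dictionary size are assumptions. The only hard constraints are the format's own: `lc + lp <= 4` and `pb` between 0 and 4.

```python
import itertools
import lzma
import random
import struct


def lzma2_size(data: bytes, lc: int, lp: int, pb: int) -> int:
    filters = [{
        "id": lzma.FILTER_LZMA2,
        "preset": 6,              # source of defaults for everything not set below
        "dict_size": 1 << 21,     # 2MB window (assumed)
        "lc": lc, "lp": lp, "pb": pb,
        "nice_len": 273,
    }]
    return len(lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters))


# Synthetic stand-in for the real pool: small big-endian int16 deltas.
rng = random.Random(7)
sample = b"".join(
    struct.pack(">h", rng.choice((-2, -1, 0, 0, 1, 2, 40))) for _ in range(8000)
)

results = {
    (lc, lp, pb): lzma2_size(sample, lc, lp, pb)
    for lc, lp in itertools.product(range(5), repeat=2)
    if lc + lp <= 4               # LZMA2 rejects combinations above this
    for pb in range(5)
}

best = min(results, key=results.get)
print("best (lc, lp, pb):", best, "->", results[best], "bytes")
print("xz default (3, 0, 2):", results[(3, 0, 2)], "bytes")
```

The winning triple depends entirely on the data, so the sweep has to run on the real joined blob, not on a sample like this one.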

Two more wins in this phase:

Compressed manifest emission: delta-encode the per-font index, XZ-compress it, keep whichever is smaller.
Sparse instrLen encoding: per-glyph instruction-length values are mostly zeros after dehinting. A sparse format reused the existing decoder.
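The "keep whichever is smaller" step for the manifest can be sketched like this. The one-byte format flag, the `uint32` little-endian layout, and the function names are all assumptions made for the example, not the real pack format.

```python
import lzma
import struct
from typing import List

RAW, XZ = 0, 1  # hypothetical one-byte format flag


def encode_manifest(offsets: List[int]) -> bytes:
    """Delta-encode ascending offsets, then keep the smaller of raw vs XZ."""
    deltas = [b - a for a, b in zip([0] + offsets, offsets)]
    raw = struct.pack(f"<{len(deltas)}I", *deltas)
    packed = lzma.compress(raw, preset=9)
    if len(packed) < len(raw):
        return bytes([XZ]) + packed
    return bytes([RAW]) + raw


def decode_manifest(blob: bytes) -> List[int]:
    body = lzma.decompress(blob[1:]) if blob[0] == XZ else blob[1:]
    offsets, total = [], 0
    for (delta,) in struct.iter_unpack("<I", body):
        total += delta          # undo the delta encoding
        offsets.append(total)
    return offsets


tiny = [0, 1200, 5300]                      # XZ container overhead loses here
regular = [i * 1000 for i in range(500)]    # constant deltas compress very well

for index in (tiny, regular):
    blob = encode_manifest(index)
    print(len(index), "entries ->", len(blob), "bytes, flag", blob[0])
```

For a handful of entries the XZ container overhead outweighs any saving, which is why the fallback to the raw form matters.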

End of Phase 3: ~749KB. I left it there. A month later I came back, and that's where things got more interesting, at least in terms of learnings.

Phase 4 - Trying to break below 700KB

The question: can we keep going? I am always greedy for sizes and performance - you might've noticed from the Docker post.

I ran the heaviest production-tier lossless compressor I could find - zpaq, with its m=56 maximum context-mixing setting - against the joined raw pool blob. That's the ~1.8MB of bytes that come out of all the table optimizers and the WOFF2 stream split, just before LZMA2 takes over. Same input LZMA2 sees, so the comparison is apples-to-apples.

```shell
$ zpaq a ~/fontspack/entropy/joined.zpaq joined_raw.bin -m56
```

Encode took ~4 seconds. LZMA2 takes around 200ms on the same input.

The number it produced: ~720KB. If the heaviest compressor I can find only gets this far below LZMA2, nothing I'd reasonably ship on Android is likely to beat LZMA2 by much.

So LZMA2's output on the pool blob was sitting 3.5% above the practical entropy floor - about 26KB of headroom on the blob itself. The shipping pack adds a small amount of manifest overhead on top, putting the final asset around ~29KB above the floor. Either way: tiny.

That framing changed everything.

Things I tried that didn't work

A tour of the experiments. Each was a "what if?" backed by code, measurement, and an answer I didn't want.

1. Predictive glyph encoding

TrueType stores point coordinates as deltas from the previous point. The idea: store the change in delta instead - the acceleration of the pen as it draws. For straight line segments, this would produce long runs of zeros. Compressible.

I built it. Compiled it through the pipeline.

The pack got bigger by 12KB. 🤡

The reason was subtle. The pipeline pools fonts together into one LZMA2 stream. LZMA2's strength is finding patterns across the whole window. When I made the encoding adaptive - apply it only to fonts where it helped per-font - the pool now contained two different encoding styles mixed. LZMA2 stopped finding the cross-font byte patterns it relied on.

The per-font win was real. The pool-level loss was bigger. In a multi-input compression scheme, uniformity matters - smart per-item optimizations can hurt the aggregate.
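For reference, the transform itself is tiny. Below is a Python sketch of first-order vs second-order deltas on one coordinate axis, using a made-up run of evenly spaced points. It only illustrates the encoding idea and says nothing about the pool-level effect described above.

```python
from itertools import accumulate
from typing import List


def deltas(values: List[int]) -> List[int]:
    """First element relative to 0, then differences between neighbours."""
    return [b - a for a, b in zip([0] + values, values)]


def undeltas(ds: List[int]) -> List[int]:
    return list(accumulate(ds))


xs = [100, 110, 120, 130, 140, 150]   # points evenly spaced along a straight run
first = deltas(xs)                    # TrueType-style: delta from the previous point
second = deltas(first)                # "acceleration": delta of the deltas

print(first)    # [100, 10, 10, 10, 10, 10]
print(second)   # [100, -90, 0, 0, 0, 0]
```

Decoding is two running sums, `undeltas(undeltas(second))`, so the transform is lossless.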

2. Cross-font glyph dedup

Surely .notdef, space, common digits appear in multiple fonts byte-for-byte?

I hashed every glyph across the corpus: 15,792 glyphs, 188 duplicate groups, 22KB raw. LZMA2's 2MB sliding window already finds most of these, so the net potential saving was about 2KB compressed.

Even the space glyph varies in declared bounding boxes and table-internal offsets across fonts. Of course, it does. 🙃
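The hashing pass is simple enough to sketch. This Python version assumes the glyph records have already been extracted into a dict keyed by a hypothetical `(font, glyph_id)` pair. It reports byte-identical groups and the raw bytes they duplicate, which is an upper bound on the saving before the compressor gets its turn.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

GlyphKey = Tuple[str, int]  # (font name, glyph ID) - hypothetical key shape


def duplicate_groups(glyphs: Dict[GlyphKey, bytes]) -> Tuple[List[List[GlyphKey]], int]:
    """Group glyph records by content hash; return groups of 2+ and their redundant bytes."""
    by_hash = defaultdict(list)
    for key, data in glyphs.items():
        by_hash[hashlib.sha256(data).digest()].append(key)
    groups = [keys for keys in by_hash.values() if len(keys) > 1]
    # Every copy after the first in a group is redundant.
    redundant = sum(len(glyphs[keys[0]]) * (len(keys) - 1) for keys in groups)
    return groups, redundant


toy = {
    ("fontA", 1): b"\x01\x02\x03",
    ("fontB", 7): b"\x01\x02\x03",   # byte-identical to fontA's glyph 1
    ("fontC", 2): b"\x09",
}
groups, redundant = duplicate_groups(toy)
print(groups, redundant)
```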

3. GPOS / GSUB sub-table dedup

OK, fine. Surely GPOS kerning tables share structure across fonts that all kern Latin letters?

I walked every Coverage table, every ClassDef, every Lookup sub-table in every font.

Almost nothing. Across all 29 fonts there was not a single byte-identical match among Coverage tables or ClassDefs, and Lookups matched in only one place.

GSUB Lookups were the only exception - 45 byte-identical groups totaling ~2KB raw, and LZMA2's sliding window was already finding most of that, netting maybe ~200 bytes. Doesn't move the needle.

Coverage tables reference glyph IDs by number. Every font assigns its own glyph IDs. Two fonts can implement the exact same semantics - "kern AV tighter, kern To tighter" - and produce completely different bytes because their glyph IDs differ.

4. Glyph-ID canonicalization

This was the idea I had most hope for.

What if I renumbered every font's glyph IDs to a canonical Unicode-derived space? Now any two fonts that implement the same OpenType feature on the same characters produce identical bytes. LZMA2 dedupes them across the pack.

I wrote the analysis tool to find out how much it'd save.

Zero bytes.

Even after maximum semantic abstraction - comparing tables by which Unicode codepoints they cover rather than by glyph IDs - no two fonts in the corpus had identical sub-tables. The fonts are too genuinely different from each other.

That measurement was the most damning thing I found. The corpus has no shared structure to exploit, at any abstraction level.
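The comparison behind that measurement can be sketched as follows: translate each Coverage list from glyph IDs to the Unicode codepoints that map to them, then compare across fonts. The two fonts below are invented so that a match exists and the mechanism is visible. On the real corpus the same comparison found none. The function name and the dict-based `cmap` shape are assumptions.

```python
from typing import Dict, FrozenSet, List


def canonical_coverage(coverage: List[int], cmap: Dict[int, int]) -> FrozenSet[int]:
    """Map a Coverage list of glyph IDs to the codepoints that reach those glyphs.

    cmap is codepoint -> glyph ID. Glyphs with no codepoint (ligatures, alternates)
    are dropped here, which a real tool would have to handle separately.
    """
    gid_to_cp = {gid: cp for cp, gid in cmap.items()}
    return frozenset(gid_to_cp[g] for g in coverage if g in gid_to_cp)


# Two made-up fonts that both cover "A" and "V" but number their glyphs differently.
font_a = {"cmap": {0x41: 36, 0x56: 57, 0x54: 55}, "coverage": [36, 57]}
font_b = {"cmap": {0x41: 4, 0x56: 25, 0x54: 23}, "coverage": [4, 25]}

print(font_a["coverage"] == font_b["coverage"])   # False: raw glyph IDs differ
print(
    canonical_coverage(font_a["coverage"], font_a["cmap"])
    == canonical_coverage(font_b["coverage"], font_b["cmap"])
)                                                  # True: same codepoints covered
```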

5. Trained compression dictionaries

Standard technique. Train a dictionary of common patterns, ship it to encoder and decoder.

I trained a 128KB zstd dictionary on 14 of the 29 fonts, then measured how well it compressed the disjoint 15 it had never seen.

Saving on the disjoint test set: ~600 bytes. Effectively zero.

The dictionary captured patterns specific to its training fonts and didn't generalize. With 29 unrelated fonts, that's the expected outcome.
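The train/test split is the important part of that experiment, and it can be mimicked with the standard library. This sketch swaps in zlib's preset-dictionary support for zstd's trained dictionaries, and uses synthetic data, so it is a stand-in for the method and not a reproduction of the measurement. It shows the two regimes: data that shares patterns with the dictionary saves a lot, unrelated data saves essentially nothing.

```python
import random
import zlib
from typing import Optional


def deflate_size(data: bytes, zdict: Optional[bytes] = None) -> int:
    comp = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return len(comp.compress(data) + comp.flush())


rng = random.Random(3)
motifs = [bytes(rng.getrandbits(8) for _ in range(32)) for _ in range(64)]


def font_like(n_motifs: int) -> bytes:
    """Synthetic blob built from the shared motif set."""
    return b"".join(rng.choice(motifs) for _ in range(n_motifs))


dictionary = font_like(200)[-32768:]   # deflate can only look back 32KB
related = font_like(50)                # shares motifs with the "training" data
unrelated = bytes(rng.getrandbits(8) for _ in range(1600))  # shares nothing


def saving(data: bytes) -> int:
    return deflate_size(data) - deflate_size(data, dictionary)


print("saving on related data:  ", saving(related), "bytes")
print("saving on unrelated data:", saving(unrelated), "bytes")
```

A held-out set that behaves like `unrelated` is the signal that the dictionary memorized its training fonts instead of learning anything general.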

6. OpenZL

Meta open-sourced OpenZL, a "format-aware compression framework". You describe your data's structure, an offline trainer finds optimal transforms, the encoder embeds a decode recipe in each frame. Headline: 2.06x ratio vs zstd's 1.31x on tabular data.

I spent a weekend on this. Built OpenZL from source, wrote SDDL (schema description language) for the pools, ran the trainer.

Approach	Size	Delta
Untrained, joined blob	~837KB	+90KB vs LZMA2 on joined blob (~747KB)
Trained, `serial` profile	~840KB	+94KB vs same LZMA2 baseline
SDDL `Int16BE[]`, bbox pool	~13KB	+2.3KB vs xz on bbox pool
FunctionGraph + DispatchSerial, triplets	~824KB	+233KB vs xz on flags+glyphs (~591KB)

OpenZL's back-end is in the same family as zstd - LZ77 with an ANS-style entropy stage. On these specific font binary streams that family runs 2-5% behind LZMA2's range-coded context modeling. The structural transforms in front of the back-end do narrow that gap (untrained OpenZL is +90KB worse, the SDDL Int16BE[] on bbox closes it to just +2.3KB on that one pool) - they just don't close it.

OpenZL is genuinely impressive engineering. It's just optimized for data shapes I don't have - numeric tables, ML tensors, columnar warehouse data. Font binary streams aren't its killer use case.

Second opinion

Halfway through this I ran the OpenZL questions past Codex. It traced the SDDL compiler source and confirmed the variable-width-record limitation I hit was an architectural decision in the semantic analyzer, not a doc oversight. Same conclusion on the back-end gap. Independent confirmation that I wasn't missing something obvious.

What actually worked

Across the entire Phase 4 investigation, three things produced measurable changes, and only the first two are compression wins:

Compressed manifest emission: ~1.6KB
Sparse instrLen modelling: 36 bytes
Other pool dedup cleanup: not really a compression win, just less allocation churn at build time

Combined offline measurement: ~749.5KB → ~747.9KB. About 1.6KB.

None of those landed in the shipping pipeline though. They were too small to justify the format-flag churn and the extra risk surface against ~1.6KB of payload. Shipping pack stays at ~749KB.

That's a full investigation cycle. Roughly 0.2% achievable for a substantial amount of measurement work. lmao.

Runtime API and benchmarks

The runtime ships with an API for the case this kind of app actually has: opening a screen with a specific font in view means that font must be decoded immediately, while the rest can come in the background:

```kotlin
// decode priority fonts first, return as soon as each is ready
FontsPack.unpackOnly(context, "lobster", "raleway", "pacifico")

// decode whatever unpackOnly skipped - fire and forget
FontsPack.unpackPending(context)

// suspend until a font is ready, then return its Typeface
val typeface = FontsPack.get("lobster")
```

get(name) is per-font deferred. If unpackOnly() is mid-flight, and you ask for a font in its list, you get it the moment its slice is decoded - you don't wait on the rest.
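The "per-font deferred" behaviour can be modelled in a few lines. The sketch below is Python, not the Kotlin runtime, and it is not how FontsPack is implemented. It only illustrates the idea of one future per font, so a caller waits on the font it asked for and nothing else. The class, the method names, and the on-demand scheduling inside `get` are assumptions.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class FontStore:
    """Toy model: one Future per font, so callers wait only on the font they need.

    Not safe for concurrent callers of unpack_only; a real runtime would guard the map.
    """

    def __init__(self, decoders: Dict[str, Callable[[], str]]):
        self._decoders = decoders                  # name -> zero-arg decode function
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._futures: Dict[str, Future] = {}

    def unpack_only(self, *names: str) -> None:
        for name in names:
            if name not in self._futures:          # never decode a font twice
                self._futures[name] = self._pool.submit(self._decoders[name])

    def unpack_pending(self) -> None:
        self.unpack_only(*self._decoders)          # everything not yet scheduled

    def get(self, name: str) -> str:
        self.unpack_only(name)                     # assumption: schedule on demand
        return self._futures[name].result()        # blocks on this font only


decoders = {n: (lambda n=n: f"typeface:{n}") for n in ("lobster", "raleway", "pacifico")}
store = FontStore(decoders)
store.unpack_only("lobster")
print(store.get("lobster"))     # ready as soon as its own decode finishes
store.unpack_pending()
print(store.get("pacifico"))
```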

Two benchmark suites live in the sample app and every timing in this post comes from one of them:

Suite	What it measures
Local (JVM)	10 iterations (3 warmup). Full unpack vs reconstruct-only. p50/p95/avg.
Emulator/device	6 iterations after wiping `filesDir/fonts`. Per-pool + write p50/p95.

The ~115ms cold-decode median is from the emulator run.
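For readers who want to reproduce this style of measurement, here is a minimal Python sketch of a warmup-then-measure loop reporting p50/p95/avg. It is not the sample app's harness. The nearest-rank percentile definition and the choice to run warmups in addition to the measured iterations are assumptions.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def bench(fn: Callable[[], object], iterations: int = 10, warmup: int = 3) -> Dict[str, float]:
    for _ in range(warmup):
        fn()                                   # warmup runs execute but are not recorded
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)   # milliseconds
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "avg": statistics.fmean(samples),
    }


print(bench(lambda: sum(range(100_000))))
```

With only 6 to 10 samples, p95 is effectively the slowest run, so the median is the more trustworthy number.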

On using AI agents

A few experiments here were tested with help from Claude and Codex - the entropy floor run and the OpenZL detour being the main ones.

Takeaways

Measure your entropy floor first. Run the heaviest available compressor against your data. If it produces N bytes, no portable compressor will produce significantly fewer. Knowing this up front saves weeks of chasing wins that don't exist.
Mixing encoding styles in a shared pool defeats cross-input matching. Predictive glyph encoding taught me this the hard way. Smart per-item optimizations can hurt aggregate compression. Uniformity matters.
The data is the variable, not the algorithm. Glyph-ID canonicalization yielding zero cross-font duplicates told me the limit isn't in the pipeline - it's in the corpus. A different set of fonts (a Variable Font family, related typefaces) would have a much higher ceiling.
Bigger frameworks aren't automatically better. OpenZL is impressive engineering. It just doesn't help my data shape. "We use this framework that beats everyone" is a sentence to verify, not believe.
Stopping is engineering too. Document the wall, ship what you have, and put the energy into other things that matter to users.

What's still on the table

For honesty, three things I measured but haven't completely ruled out:

A small custom arithmetic coder for the flags pool only. The full zpaq-vs-LZMA2 gap on that pool is ~24KB (~180KB → ~156KB). A minimal Kotlin context-mixing coder probably wouldn't capture all of it - call it ~10-15KB if it works well. Untested.
Cross-font glyph dedup with affine equivalence. TrueType uses int16 coordinates, so only integer-preserving transforms qualify as lossless: translation, reflection, 90° rotation, integer-ratio scale. Arbitrary rotation forces rounding and loses information. Likely few matches on this corpus, never measured.
A tiny Markov-style predictor for the glyphs pool. Train a small model (Markov order 3-4, or a sub-100KB LSTM if I'm feeling adventurous) on the WOFF2 triplet stream and use it as the probability source for a custom arithmetic coder. Speculative, but the entropy gap zpaq finds suggests there's something to model. Untested, no guarantees.
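The affine-equivalence idea is the easiest of the three to prototype. A minimal Python sketch, under assumptions: it covers translation plus the 8 rotations/reflections by multiples of 90°, skips integer scaling, ignores on-curve flags, and requires the point order to match between glyphs. Two outlines are treated as equivalent when their canonical forms are equal.

```python
from typing import List, Tuple

Point = Tuple[int, int]

# The 8 integer-preserving rotations/reflections (symmetries of a square).
TRANSFORMS = [
    lambda x, y: (x, y),   lambda x, y: (-y, x),
    lambda x, y: (-x, -y), lambda x, y: (y, -x),
    lambda x, y: (-x, y),  lambda x, y: (x, -y),
    lambda x, y: (y, x),   lambda x, y: (-y, -x),
]


def canonical_form(points: List[Point]) -> Tuple[Point, ...]:
    """Smallest origin-aligned image of a non-empty outline under the 8 transforms."""
    candidates = []
    for t in TRANSFORMS:
        moved = [t(x, y) for x, y in points]
        min_x = min(x for x, _ in moved)
        min_y = min(y for _, y in moved)
        # Shift so the bounding box starts at (0, 0): removes translation.
        candidates.append(tuple((x - min_x, y - min_y) for x, y in moved))
    return min(candidates)


a = [(0, 0), (4, 0), (4, 1), (0, 3)]
b = [(10, 20), (10, 24), (9, 24), (7, 20)]   # a rotated by 90 degrees, shifted by (10, 20)
c = [(0, 0), (4, 0), (4, 2), (0, 3)]         # a genuinely different outline

print(canonical_form(a) == canonical_form(b))   # True
print(canonical_form(a) == canonical_form(c))   # False
```

Storing a matched glyph would then cost a reference plus a transform index and an offset, which only pays off if matches are common. That is the part that was never measured.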

None of these are guaranteed. None are quick weekend projects. And none of them change the wall much, even in the best case - the floor measurement says the remaining budget is 26KB.

The shipping pack

~749KB. 17.8% of raw. 29 fonts, fully lossless, decoded in parallel in ~115ms on an emulator (~110ms median across ten back-to-back resets after the first cold-process start). The runtime API serves a Typeface the moment a specific font's slice is ready, even while the rest of the corpus is still decompressing.

For reference, the practical entropy floor for the joined pool blob (measured with zpaq at its heaviest setting) sits at ~720KB - around 17.2% of raw. So the shipping pack is parked ~4% above the floor, and that's where the corpus says it should be.

As far as my measurements show, that's very close to the smallest the number can be on this corpus under the constraints I picked - lossless, all 29 fonts, fast and robust decode.

The lesson: sometimes the best engineering choice is to measure the wall, agree it's the wall, and stop.

Glossary

A jargon dump for the curious. A lot of typography and compression terms got dropped without much context above, so here's the short version.


Font internals

Term	Meaning
TTF / OTF	The two main font file formats. Same container structure, different outline formats and historical lineage.
WOFF2	A compressed font format. Smaller than TTF, same rendering result.
Glyph	A single shape in a font - a letter, digit, or symbol.
Glyph ID	Each glyph gets a number inside the font. Different fonts assign different numbers to the same letter.
Hinting	Tiny instructions that snap glyphs cleanly to pixels at small sizes. Dehinting removes them.

OpenType tables (the chunks inside a TTF)

Table	Meaning
`glyf`	The actual glyph outlines - the shapes themselves.
`loca`	Where each glyph lives inside `glyf`. Basically an index.
`cmap`	Maps characters (like `A`) to their glyph IDs in the font.
`head`	General font info - version, sizing units, overall bounding box.
`name`	Human-readable strings (family name, style, copyright).
`post`	Legacy PostScript info. Format 3.0 drops glyph names to save space.
`hmtx`	How wide each glyph is and how much blank space sits to its left.
`kern`	Old way to tighten or loosen space between letter pairs. Replaced by `GPOS` long ago.
`fpgm` / `prep` / `cvt`	TrueType's hinting program code and its setup tables.
`GPOS` / `GSUB` / `GDEF`	Modern layout tables - positioning (kerning), glyph swaps (ligatures), categories.
LSB	Left side bearing. The horizontal distance from a glyph's origin to the left edge of its bounding box (its leftmost ink).
Bbox	Bounding box. The rectangle that fully contains a glyph's outline.

OpenType layout internals

Term	Meaning
Coverage table	The list of which glyphs a positioning or substitution rule applies to.
ClassDef	Groups glyphs into buckets so a rule can target a whole category at once.
PairPos	Pair-positioning lookup. "For this pair, shift the second glyph by X units."
ValueFormat	A compact flag set saying which positioning fields are present - X offset, Y offset, etc.

Compression

Term	Meaning
Brotli	Google's general-purpose compressor. Modern, balanced speed and ratio.
LZMA2	The algorithm behind XZ. Slower than Brotli, usually compresses better.
zstd	Facebook's compressor. Very fast, modern, used pretty much everywhere.
LZ77	The classic trick: spot repeated chunks and replace them with pointers back to earlier copies.
Huffman / range / arithmetic coding	Ways to pack symbols into the fewest bits. Huffman is simpler; the other two get closer to the math-perfect limit.
Context modelling	Predicting the next byte from the bytes seen just before. Better predictions = fewer bits.
Entropy	The information-theoretic minimum bits needed to represent some data. You can't compress below it, no matter how clever the coder.
Entropy floor	The smallest size a very strong compressor reaches on a given input - a practical proxy for the remaining headroom.

OpenZL

Term	Meaning
OpenZL	Meta's format-aware compression framework. You describe your data, it picks transforms.
SDDL	Schema Description Language. How you tell OpenZL what your data looks like.
FunctionGraph / DispatchSerial	OpenZL's way of routing parts of the input through different transforms.