What TOON is and where it fits
A compact, LLM-friendly alternative for tabular data; useful and efficient, yet still not the answer for every data structure.
Introduction
When you send structured data into a large language model, every token you include costs time and money. TOON (Token-Oriented Object Notation) is a compact, human-readable way to express the JSON data model that’s built specifically for LLM inputs.
TOON uses indentation like YAML plus a CSV-style header/rows for uniform arrays so you don’t repeat field names across many objects.
The intent is simple: same data, fewer tokens, clearer structure for models.
What TOON looks like
Instead of repeating keys for every object, TOON declares array length and field names once, then lists rows. That makes uniform arrays much denser than standard JSON while remaining lossless. Tooling and SDKs already exist for several languages, so it’s ready to plug into pipelines.
Where TOON does best
Uniform tabular arrays: Large lists of same-shaped objects (employees, orders, logs) are TOON’s sweet spot. Declaring
{fields}once and streaming rows saves many tokens compared to repeating keys in JSON. Several independent write-ups and the format’s own benchmarks report large token savings in these cases.
LLM-heavy pipelines: When you’re sending big contexts to LLMs (retrieval questions, structured extraction), the explicit length markers (
[N]) and field headers reduce ambiguity and help models validate and extract reliably. That often raises retrieval accuracy while cutting prompt size.Streaming & memory-constrained flows: TOON’s line-by-line, table-like encoding fits streaming encoders/decoders well, which is useful for very large datasets where you want to avoid holding full JSON in memory.
Where TOON is the wrong tool
Deeply nested or irregular data: If your JSON has lots of nested objects, optional fields, or widely varying item shapes, TOON’s tabular optimizations add little or even make the document longer and harder to manage. Compact JSON or YAML often wins here.
Pure tabular pipelines already on CSV: For flat tables with no hierarchy, plain CSV remains slightly smaller and is universally supported; TOON intentionally trades a small overhead for structural safety.
Latency-sensitive systems where parser speed matters: Token count isn’t the only metric. Some runtimes parse compact JSON faster than a conversion + parse step, so measure latency end-to-end before committing.
TOON is helpful, not magical
Treat TOON as a pragmatic tool: it reduces token bills and often improves LLM retrieval on uniform datasets, but it’s not a universal replacement for JSON/YAML/CSV.
If your data is mostly tabular and you care about prompt size or extraction reliability, TOON is worth testing. If your data is nested, irregular, or your toolchain already optimizes for JSON, stick with what’s simplest.
Run a quick A/B: convert a real dataset, compare token counts, parsing time, and model accuracy, that’s the only honest way to know.


