Design your ontology with restraint

Every field you add to an ontology costs tokens on every extraction. Start with the smallest schema that solves the problem and grow from there.


It is tempting to design an ontology by listing every field that could be useful one day. The result is a schema with forty fields, three nested arrays, and a per-document extraction cost that is four to five times what you actually need.

What costs and what does not
Field count and array depth drive cost. Marking a field required or optional makes no difference to extraction cost: the LLM still has to consider every field on every document. Type validation is free.
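
As a rough illustration, the two cost drivers named above can be measured before a schema ever reaches an extraction job. The sketch below assumes a hypothetical in-memory representation (a dict of field names, where a nested dict is an object and a one-element list is an array); it is not bizSupply's ontology format, and `schema_stats` is an illustrative helper, not a product API.

```python
from typing import Any


def schema_stats(schema: dict[str, Any]) -> tuple[int, int]:
    """Return (leaf_field_count, max_array_depth) for a nested schema.

    Assumed, hypothetical representation:
      - a str value is a scalar field (for example "string" or "date")
      - a dict value is a nested object
      - a one-element list is an array whose items look like that element
    """

    def walk(node: Any, depth: int) -> tuple[int, int]:
        if isinstance(node, dict):
            count, deepest = 0, depth
            for child in node.values():
                child_count, child_depth = walk(child, depth)
                count += child_count
                deepest = max(deepest, child_depth)
            return count, deepest
        if isinstance(node, list):
            # Entering an array adds one level of array nesting.
            return walk(node[0], depth + 1)
        return 1, depth  # scalar leaf

    return walk(schema, 0)


lean = {
    "vendor": "string",
    "contract_value": "number",
    "start_date": "date",
    "end_date": "date",
    "renewal_terms": "string",
}

sprawling = {
    **lean,
    "signatures": [{"name": "string", "title": "string"}],
    "line_items": [{"sku": "string", "taxes": [{"rate": "number"}]}],
}

print(schema_stats(lean))       # (5, 0)
print(schema_stats(sprawling))  # (9, 2)
```

A check like this could run in review whenever the ontology changes, so growth in field count or array depth is noticed before it reaches the bill.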

The minimal ontology principle

Start with the fields that answer your actual question. If the question is "which contracts renew in the next 90 days", the ontology needs vendor, contract value, start date, end date, and renewal terms. That is five fields, not forty. Optional metadata like jurisdiction, governing_law, and signatures[].title can be added later, on a separate pipeline if needed.
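
The five-field ontology for the renewal question can be sketched as a plain Python dataclass. This is only an illustration of how small the schema is: the class name, the field types, and the `renews_within` helper are assumptions, and the helper treats `end_date` as the renewal date, which may not match how your contracts define renewal.

```python
from dataclasses import dataclass, fields
from datetime import date, timedelta


@dataclass
class ContractRenewal:
    """Hypothetical minimal ontology: only what the renewal report needs."""

    vendor: str
    contract_value: float
    start_date: date
    end_date: date
    renewal_terms: str


def renews_within(contract: ContractRenewal, today: date, days: int = 90) -> bool:
    """True if the contract ends between today and today + days (inclusive).

    Assumption: the end date is the renewal date.
    """
    return today <= contract.end_date <= today + timedelta(days=days)


contract = ContractRenewal(
    vendor="Acme",  # illustrative data, not from a real document
    contract_value=12000.0,
    start_date=date(2024, 1, 1),
    end_date=date(2024, 12, 31),
    renewal_terms="auto-renews annually",
)

print(len(fields(ContractRenewal)))                 # 5
print(renews_within(contract, date(2024, 11, 1)))   # True
print(renews_within(contract, date(2024, 6, 1)))    # False
```

Every field in the class is used either by the 90-day filter or by the report row it feeds; anything that is not can wait for a later pipeline.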

Do
  • Write down the report or alert your team needs first; only extract fields that feed it.
  • Use a short, flat ontology for high-volume sources; reserve nested arrays for documents you process rarely.
  • Ship a 5-field ontology, prove it works, then add fields one at a time and watch credits-per-document change.
Don't
  • Add fields "in case we need them later"; you can re-extract later if you really do.
  • Nest line items, signatories, and addenda as arrays unless the downstream process actually consumes them.
  • Mark every field required to "force quality"; required fields cost the same and produce more retries.

Cost is observable

bizSupply reports credits-per-document on every job. If the number jumps right after an ontology change and nothing else changed, the ontology is the cause, not the source and not the model. Treat it as a budget line item and optimise it like one.
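
One way to treat the number as a budget line is to compare it before and after each ontology change. The sketch below assumes you have copied the credits-per-document figures from job reports into two lists by hand; the function names, the sample values, and the 10% threshold are all illustrative, and no bizSupply API is used or implied.

```python
from statistics import mean


def credit_change(before: list[float], after: list[float]) -> float:
    """Relative change in mean credits-per-document (0.5 means +50%)."""
    baseline = mean(before)
    return (mean(after) - baseline) / baseline


def exceeds_budget(
    before: list[float], after: list[float], max_increase: float = 0.10
) -> bool:
    """True if the mean cost grew by more than max_increase (assumed 10%)."""
    return credit_change(before, after) > max_increase


# Illustrative figures only, not real measurements.
five_fields = [2.0, 2.0, 2.0]
with_line_items = [3.0, 3.0, 3.0]

print(credit_change(five_fields, with_line_items))   # 0.5
print(exceeds_budget(five_fields, with_line_items))  # True
```

Running the comparison after every single-field addition keeps the cause of any jump unambiguous, because only one thing changed between the two sets of jobs.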
