Key Concepts

This page defines the core objects and abstractions you will work with in bizSupply — Documents, Plugins, Ontologies, Pipelines, Jobs, and Credentials — plus the organization, billing, and AI-assisted authoring concepts that sit around them. Understanding these is essential before creating plugins, pipelines, or ontologies.

Last updated: 2026-07-21

Document

A Document is the fundamental unit of data in bizSupply. It represents a file that has been ingested into the platform, along with all metadata and extracted data associated with it.

Property	Type	Description
`id`	string	Unique identifier (ULID) assigned at ingestion, e.g. `01HQZX3K4M2N5P7Q8R9S0T1U2V`.
`filename`	string	Original filename of the ingested document.
`mime_type`	string	MIME type of the document (e.g., `application/pdf`).
`size`	integer	File size in bytes.
`status`	string	Current lifecycle state: `pending`, `classified`, `extracted`, `completed`, or `failed`.
`labels`	array | null	Classification tags assigned by a Classification plugin (e.g., `["invoice", "utility"]`).
`data`	object | null	Extracted structured fields populated by an Extraction plugin.
`metadata`	object	Additional metadata: source, ingestion timestamp, sender, tags, custom attributes.
`tenant_id`	string	The tenant this document belongs to.
`created_at`	datetime	Timestamp when the document was ingested.
`updated_at`	datetime	Timestamp of the last modification.

Document Lifecycle

pending — Document has been ingested and stored. No processing has occurred.
classified — A Classification plugin has assigned one or more labels.
extracted — An Extraction plugin has populated the data fields.
completed — All pipeline stages have finished successfully.
failed — An error occurred during processing. Error details are stored in the document metadata.
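The lifecycle above can be read as a small state machine. The following is a minimal sketch, assuming that a document only moves forward through the stages and that any non-final stage can fall into `failed`; the `ALLOWED_TRANSITIONS` table and the `can_transition` helper are illustrative names, not part of the platform API.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    """The five lifecycle states listed in the Document table."""

    PENDING = "pending"
    CLASSIFIED = "classified"
    EXTRACTED = "extracted"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumption: forward-only transitions, with failure possible at any
# processing stage. The platform's actual rules may differ.
ALLOWED_TRANSITIONS = {
    DocumentStatus.PENDING: {DocumentStatus.CLASSIFIED, DocumentStatus.FAILED},
    DocumentStatus.CLASSIFIED: {DocumentStatus.EXTRACTED, DocumentStatus.FAILED},
    DocumentStatus.EXTRACTED: {DocumentStatus.COMPLETED, DocumentStatus.FAILED},
    DocumentStatus.COMPLETED: set(),
    DocumentStatus.FAILED: set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if moving from `current` to `target` is allowed here."""
    return DocumentStatus(target) in ALLOWED_TRANSITIONS[DocumentStatus(current)]
```

A client could use a check like this to reject impossible status values before displaying or storing them.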

Plugin

A Plugin is Python code that implements a specific stage of document processing. Every plugin extends one of four base classes and implements the method or methods that class requires (one for Source, Classification, and Extraction; three for Benchmark). Plugin methods are async, so you must await any platform service call.

Plugin Type	Base Class	Required Method	Return Type
Source	`SourcePlugin`	`fetch()`	`AsyncIterator[DocumentInput]`
Classification	`ClassificationPlugin`	`classify()`	`str | None`
Extraction	`ExtractionPlugin`	`extract()`	`ExtractionResult`
Benchmark	`BaseBenchmark`	`score()`, `compute()`, `compare()`	`float | None`, `float`, `bool`

Plugin code and metadata are defined as class attributes and built using the bizsupply-sdk package (pip install bizsupply-sdk). You register a plugin by uploading its Python file — the platform auto-extracts the type and configurable parameters from the code. Every submitted plugin then enters a human-review gate before it can run in a pipeline (see Plugin Fabric and Purgatory Review below).

ℹ️Note

Plugins execute in an isolated environment with controlled access to platform services such as await self.prompt_llm(...) and await self.get_prompt(...). See the Plugin Interface for the full service API.
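To make the async contract concrete, here is a rough sketch of what a Classification plugin might look like. It does not import the real SDK: `ClassificationPlugin` below is a local stand-in, the `prompt_llm` stub only imitates a service call, and the `classify(text)` signature is an assumption. Consult the Plugin Interface for the actual base class and method signatures.

```python
import asyncio
from typing import Optional


class ClassificationPlugin:
    """Local stand-in for the SDK base class (hypothetical shape)."""

    async def prompt_llm(self, prompt: str) -> str:
        # Stub: on the platform this would be a real LLM service call.
        return "invoice" if "invoice" in prompt.lower() else ""


class InvoiceClassifier(ClassificationPlugin):
    """Example plugin: returns a label, or None when nothing matches."""

    async def classify(self, text: str) -> Optional[str]:
        # Platform service calls are coroutines, so they must be awaited.
        answer = await self.prompt_llm(f"Label this document: {text}")
        return answer or None


# Outside the platform, asyncio.run drives the coroutine for a quick check.
label = asyncio.run(InvoiceClassifier().classify("Invoice #42 from ACME"))
```

The return type mirrors the `str | None` listed for `classify()` in the table above.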

Ontology

An Ontology defines what to classify and what structured data to extract from documents. It acts as a schema — telling Classification plugins which labels apply and Extraction plugins exactly which fields to pull out.

Ontology Structure

An ontology has two parts:

Taxonomy — a hierarchical tree of labels. Each node has a label, a set of fields, and optional children for more specific sub-types.
Fields — the named data fields to extract for a label. Each field declares a dtype, a description, and whether it is required.

Example Ontology (YAML)

yaml

name: "Invoice Ontology"
description: "Schema for invoice processing"
taxonomy:
  label: "invoice"
  fields:
    - name: "invoice_total"
      dtype: "number"
      required: true
    - name: "invoice_date"
      dtype: "date"
      required: true
    - name: "vendor_name"
      dtype: "string"
      required: true
  children:
    - label: "utility_invoice"
      fields:
        - name: "utility_type"
          dtype: "string"
          required: true

Usage:

Classification plugins use the taxonomy to apply the appropriate labels.
Extraction plugins use the fields to know what data to extract.
Multiple ontologies can be combined in a single pipeline (see Ontology Merging).
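As an illustration of how a consumer might read this structure, the sketch below walks the taxonomy tree and indexes the field names declared on each label. The ontology is written as a plain Python dict mirroring the YAML example; `fields_by_label` is a hypothetical helper, and the sketch makes no claim about whether child labels inherit their parent's fields on the platform.

```python
from typing import Dict, List

# The example ontology from above, expressed as a Python dict.
ontology = {
    "name": "Invoice Ontology",
    "taxonomy": {
        "label": "invoice",
        "fields": [
            {"name": "invoice_total", "dtype": "number", "required": True},
            {"name": "invoice_date", "dtype": "date", "required": True},
            {"name": "vendor_name", "dtype": "string", "required": True},
        ],
        "children": [
            {
                "label": "utility_invoice",
                "fields": [
                    {"name": "utility_type", "dtype": "string", "required": True}
                ],
            }
        ],
    },
}


def fields_by_label(node: dict) -> Dict[str, List[str]]:
    """Map each label in the tree to the field names declared on that node."""
    result = {node["label"]: [field["name"] for field in node.get("fields", [])]}
    for child in node.get("children", []):
        result.update(fields_by_label(child))
    return result


index = fields_by_label(ontology["taxonomy"])
```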

Pipeline

A Pipeline defines a complete document processing workflow. It specifies which plugins to run, which ontologies to extract against, and (optionally) which sources to process.

Component	Required	Description
`plugin_ids`	Yes	Ordered list of plugins to execute (Source → Classification → Extraction → Benchmark).
`ontology_catalogs_ids`	Yes	Ontologies to use for classification and extraction.
`source_ids`	No	Specific connected sources to process. If omitted, the pipeline runs against all eligible sources.

Example Pipeline

json

{
  "name": "Gmail Invoice Processing",
  "plugin_ids": [
    "01HQZX3K4M2N5P7Q8R9S0T1U2V",
    "01HQZX7A9B2C4D6E8F0G1H3J5K",
    "01HQZX9M1N3P5Q7R9S2T4U6V8W"
  ],
  "ontology_catalogs_ids": [
    "01HQZXB2C4D6E8F0G1H3J5K7M9"
  ],
  "source_ids": [
    "01HQZXD4E6F8G0H2J4K6M8N0P2"
  ]
}
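Before submitting a pipeline definition, it can help to check it against the Required column in the table above. This is a minimal client-side sketch; `validate_pipeline` is a hypothetical helper, and the server may apply additional rules not shown here.

```python
from typing import List

REQUIRED_KEYS = ("plugin_ids", "ontology_catalogs_ids")
OPTIONAL_KEYS = ("source_ids",)


def validate_pipeline(payload: dict) -> List[str]:
    """Return a list of problems; an empty list means the shape looks valid."""
    problems = []
    for key in REQUIRED_KEYS:
        value = payload.get(key)
        if not isinstance(value, list) or not value:
            problems.append(f"{key} must be a non-empty list")
    for key in REQUIRED_KEYS + OPTIONAL_KEYS:
        value = payload.get(key)
        if isinstance(value, list):
            for item in value:
                if not isinstance(item, str):
                    problems.append(f"{key} contains a non-string id: {item!r}")
    return problems


pipeline = {
    "name": "Gmail Invoice Processing",
    "plugin_ids": ["01HQZX3K4M2N5P7Q8R9S0T1U2V"],
    "ontology_catalogs_ids": ["01HQZXB2C4D6E8F0G1H3J5K7M9"],
}
problems = validate_pipeline(pipeline)
```

Omitting `source_ids` is accepted here because the table marks it as optional.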

Job

A Job represents a single execution of a pipeline. When you run a pipeline, bizSupply creates a Job that tracks the progress, status, and results of the execution.

State	Description
`pending`	Job created, waiting to be picked up for processing.
`running`	Actively processing documents.
`completed`	All documents processed successfully.
`failed`	The job encountered a fatal error and stopped.
`credit_exhausted`	Halted mid-job because the tenant's credit balance dropped below zero by the configured overshoot margin. Retryable once credits are topped up.

Each job has a unique ID and records documents-processed count, the plugin currently executing, start/end timestamps, and any errors. You can query job status via the API at any time, or receive live updates over Server-Sent Events (SSE).
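If you poll rather than subscribe to SSE, the loop only needs to know which states end a run. The sketch below assumes the caller supplies a `get_state` function that returns the job's current state string; the API endpoint behind it is not specified on this page, so it is left abstract. `wait_for_job` and `STOP_STATES` are illustrative names.

```python
import time
from typing import Callable

# States after which the job is no longer progressing. `credit_exhausted`
# is retryable, but the current run has still stopped.
STOP_STATES = {"completed", "failed", "credit_exhausted"}


def wait_for_job(
    get_state: Callable[[], str],
    interval: float = 2.0,
    max_polls: int = 100,
) -> str:
    """Poll until the job stops, then return its final state."""
    for _ in range(max_polls):
        state = get_state()
        if state in STOP_STATES:
            return state
        time.sleep(interval)
    raise TimeoutError("job did not reach a stop state in time")
```

A caller that receives `credit_exhausted` can top up credits and retry instead of treating the run as a hard failure.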

Credential

A Credential stores the authentication details a Source plugin needs to connect to an external system. Credentials are encrypted at rest and only decrypted when a plugin requires them during execution.

Supported Source Types

Source Type	Description
`gmail`	Gmail inbox — fetches emails and attachments via the Gmail API.
`microsoft_365`	Microsoft 365 email (Outlook) — fetches emails and attachments via the Microsoft Graph API.
`google_drive`	Google Drive files with incremental sync via the Drive Changes API.
`imap`	Any IMAP-compatible email server.
`custom_api`	Custom API with user-defined credential fields.

Credential Formats by Type

OAuth2 (Gmail, Microsoft 365, Google Drive):

json

{
  "client_id": "your-client-id",
  "client_secret": "your-client-secret",
  "refresh_token": "your-refresh-token"
}

IMAP (email servers):

json

{
  "host": "imap.gmail.com",
  "port": 993,
  "username": "user@example.com",
  "password": "your-app-password",
  "use_ssl": true
}

API Key (custom APIs):

json

{
  "api_key": "your-api-key",
  "api_url": "https://api.example.com"
}

⚠️Warning

Never include credential secrets in logs or error messages. The platform automatically redacts sensitive fields in API responses — only the credential name, type, and creation date are returned.
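The same allow-list idea is useful in your own logging code. The following sketch keeps only the three attributes the warning says are returned and drops everything else; the key names `name`, `type`, and `created_at` are assumptions about how those attributes are spelled, and `redact_credential` is a hypothetical helper.

```python
# Assumed key names for the credential name, type, and creation date.
SAFE_FIELDS = ("name", "type", "created_at")


def redact_credential(record: dict) -> dict:
    """Keep only non-secret attributes; unknown keys are dropped by default."""
    return {key: record[key] for key in SAFE_FIELDS if key in record}


stored = {
    "name": "finance-inbox",
    "type": "imap",
    "created_at": "2026-07-01T09:00:00Z",
    "username": "user@example.com",
    "password": "your-app-password",
}
safe_to_log = redact_credential(stored)
```

An allow-list is preferable to a deny-list here, since a newly added secret field stays hidden without any code change.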

How Concepts Relate

The relationships between bizSupply concepts form a clear hierarchy:

An organization (tenant) owns all other resources — Documents, Plugins, Ontologies, Pipelines, Credentials, and Jobs.
A Pipeline references an ordered list of plugins, one or more ontologies, and zero or more sources.
An Ontology links labels (taxonomy) to the fields extracted for them. When a Classification plugin assigns a label, the matching ontology determines what fields to extract.
A Job is created each time a pipeline executes and tracks per-document progress and results.
A Document flows through the pipeline stages: it starts as pending, becomes classified, then extracted, and finally completed.
Credentials connect Source plugins to external systems and are consumed at execution time.

Data Isolation

bizSupply enforces strict data isolation at the tenant level. Every API request carries a tenant context (via the JWT or API key), and all database queries are automatically scoped to that tenant. There is no mechanism to query across tenants — even platform administrators access tenant data only through tenant-scoped endpoints.

Within a tenant, resources share the same namespace. Plugin names, ontology labels, and pipeline names must be unique within a tenant but can overlap across tenants, so organizations can use the same naming conventions without conflicts.

Organizations & Multi-Tenancy

bizSupply uses a multi-organization membership model for data isolation and collaboration.

Organization Types

Type	Description
Personal	Auto-created on signup. Your private workspace.
Team	Multi-user organizations for collaboration. Users are invited via email.

Membership Roles

Each user has a role per organization they belong to:

Role	Capabilities
Owner	Full control, transfer ownership, delete the organization.
Admin	Manage members, invitations, and organization settings.
Member	Access features and create resources.
Viewer	Read-only access.

Organization Switching

You can belong to multiple organizations and switch between them:

Your active organization determines which data you see and create.
Switching organizations issues a new authentication token scoped to the target organization.
All API operations are automatically scoped to your active organization.

Subscription Plans & Billing

Each organization is on one of four plan tiers — FREE, STARTER, PROFESSIONAL, or ENTERPRISE. The plan controls per-tenant limits (e.g. maximum source connections, maximum users) and the cadence of billing and credit refresh.

Billing Anchor

Each organization has a billing anchor date — the day-of-cycle the tenant joined the plan. Stripe customers' anchors mirror Stripe's current period start; Marketplace customers' anchors record the entitlement event time.

Credit Refresh

Credits refresh on each tenant's personal anniversary at the cadence declared on the plan:

Tier	Cadence	Behavior
FREE / STARTER / PROFESSIONAL	`MONTHLY`	New credits land on the day-of-month matching your anchor.
ENTERPRISE	`ON_RENEWAL`	Refresh only on upstream renewal events (no fixed cadence).

Refresh is additive (carryover) — credits left at the end of a period are not zeroed out. A tenant ending the month with 2,000 credits remaining starts the next month with 2,000 plus the tier's credits.
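The carryover arithmetic and the monthly anchor can be sketched as follows. The tier allowance of 5,000 credits is a made-up number for illustration, and clamping the anchor to the last day of shorter months (for example an anchor on the 31st landing on 28 February) is an assumption, since this page does not say how such months are handled.

```python
import calendar
from datetime import date


def refreshed_balance(remaining: int, tier_credits: int) -> int:
    """Additive refresh: leftover credits carry over into the new period."""
    return remaining + tier_credits


def next_monthly_refresh(anchor_day: int, today: date) -> date:
    """Next date after `today` whose day-of-month matches the anchor."""

    def clamp(year: int, month: int) -> date:
        # Assumption: short months fall back to their last day.
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, min(anchor_day, last_day))

    candidate = clamp(today.year, today.month)
    if candidate > today:
        return candidate
    if today.month == 12:
        return clamp(today.year + 1, 1)
    return clamp(today.year, today.month + 1)


# 2,000 credits left over plus a hypothetical 5,000-credit allowance.
new_balance = refreshed_balance(2000, 5000)
```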

Viewing Your Cycle

GET /api/v1/billing/cycle returns the active organization's billing anchor, current period start/end, and next billing date. ENTERPRISE customers see the anchor only — period boundaries are negotiated, not computed.

Credit Overshoot Margin

The overshoot margin is a per-tenant safety buffer that controls how far below zero a tenant's credit balance can go during an active job before processing stops.

Margin	Behaviour
`0` (default)	The job stops as soon as the balance goes negative.
`100`	The job continues until the balance reaches −100 credits, then stops.

When a job is halted by credit exhaustion, its status is set to credit_exhausted. It can be retried once the balance is topped up. New jobs are blocked while the balance is ≤ 0 regardless of the margin. The margin is configured per tenant by platform administrators.
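One way to read the two table rows together is the pair of checks below. This is a sketch of the rule as described, not the platform's implementation: the boundary handling (halt once the balance is negative and has reached minus the margin) is an interpretation chosen to match both rows, and the function names are hypothetical.

```python
def should_halt(balance: int, margin: int = 0) -> bool:
    """True when a running job should stop for lack of credits.

    margin=0   -> halt as soon as the balance goes negative.
    margin=100 -> halt once the balance reaches -100.
    """
    return balance < 0 and balance <= -margin


def can_start_new_job(balance: int) -> bool:
    """New jobs are blocked while the balance is <= 0, whatever the margin."""
    return balance > 0
```

Note that the margin only affects jobs already in flight; it never allows a new job to start on an empty balance.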

Plugin Fabric

Plugin Fabric is the AI-assisted plugin authoring system. Instead of writing a .py file by hand and uploading it, you describe what you want in a multi-turn chat with the LLM and the system produces validated plugin code as the conversation progresses.

Step	Description
Start	Pick a plugin type (`source`, `classification`, or `extraction`) and send a first message describing the goal. The system creates a conversation and runs the first LLM turn immediately.
Iterate	Each follow-up message ("add a check for X", "use this label set instead") drives one LLM turn. The assistant returns a full revised version of the code; the previous version is replaced. Up to 30 turns per conversation.
Validate	Every turn's generated code is validated server-side. If validation fails, the system silently retries once with the validator's error appended — you only see the result, and the retry's tokens are not billed.
Submit	When satisfied, submit. The conversation transitions to a terminal `submitted` state and the code enters the standard plugin review queue.
Review	A platform administrator reviews the submission exactly as if you'd uploaded the file by hand — same review queue, same approve/reject controls, same audit trail. There is no fast-path bypass.

Same security model as manual upload. Every plugin — regardless of authoring path — passes through human review before it can run. Plugin Fabric changes the authoring surface, not the trust boundary.

Cost tracking. LLM tokens consumed per turn are billed against your tenant's credit balance through the same metering as Ontology Fabric and the in-product assistant.
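The validate-then-retry-once behaviour of the Validate step can be sketched as below. Both callables are stand-ins: `generate` represents one LLM call and `validate` represents the server-side validator, which is approximated here by a Python syntax check. The real validator, its error format, and the retry prompt wording are not described on this page.

```python
import ast
from typing import Callable, Optional, Tuple


def syntax_error(code: str) -> Optional[str]:
    """Stand-in validator: return an error message, or None if code parses."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return str(exc)
    return None


def run_turn(
    generate: Callable[[str], str],
    validate: Callable[[str], Optional[str]],
    message: str,
) -> Tuple[str, Optional[str]]:
    """Generate code for one turn, retrying once if validation fails."""
    code = generate(message)
    error = validate(code)
    if error is not None:
        # Single retry with the validator's error appended to the request.
        code = generate(f"{message}\n\nValidator error: {error}")
        error = validate(code)
    return code, error
```

The caller sees only the final `(code, error)` pair, which matches the description that the retry is invisible to the user.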

See Create a Plugin for the full authoring journey.

Ontology Fabric

Ontology Fabric is the LLM-powered ontology generation system. Instead of manually defining taxonomy trees and extraction fields, you describe what you need in natural language and the system generates a complete ontology.

Step	Description
Request	Describe your ontology needs in plain language (e.g., "energy contracts in Portugal with pricing and renewal terms").
Generation	The LLM generates a full ontology manifest (taxonomy + fields) using the configured generation prompt.
Scoring	The generated ontology is automatically scored for quality (0–10) using structural validation, overlap detection, and LLM assessment.
Review	High-confidence ontologies (score ≥ 7.0) are auto-approved. Lower scores go to human review by platform administrators.
Merging	Approved ontologies with the same root label are merged at runtime — base fields apply transversally across all matching ontologies.

Purgatory Review

Purgatory is a quality-gate pattern for LLM-generated content. Entities pass through scoring before entering the system.

State	Description
`PENDING_SCORING`	Entity submitted, awaiting quality scoring.
`APPROVED`	Score ≥ 7.0 (auto-approved) or manually approved by an administrator.
`HUMAN_REVIEW`	Score < 7.0, awaiting an administrator decision.
`REJECTED`	Rejected with a reason.

The system is entity-agnostic — currently used for ontology generation, but extensible to benchmarks, plugins, or any LLM-generated content without schema changes.
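The scoring gate reduces to a single threshold comparison. A minimal sketch, using the state names and the 7.0 threshold from the table above; `state_after_scoring` is a hypothetical helper, and manual approval or rejection by an administrator is outside its scope.

```python
from typing import Optional

AUTO_APPROVE_THRESHOLD = 7.0


def state_after_scoring(score: Optional[float]) -> str:
    """Map a quality score (0 to 10) to the resulting purgatory state."""
    if score is None:
        # Not scored yet.
        return "PENDING_SCORING"
    if score >= AUTO_APPROVE_THRESHOLD:
        return "APPROVED"
    return "HUMAN_REVIEW"
```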

Ontology Merging

When multiple ontologies share the same root label (e.g., two ontologies both rooted at "contract"), they are merged at runtime. Base-level fields from each ontology are applied transversally, so extraction plugins see a unified field set without manual deduplication.

This enables modular ontology design: create focused ontologies for specific domains (e.g., "energy contracts", "service contracts") and the system combines them automatically when processing documents that match the shared root label.
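A rough sketch of the merge: group ontologies by root label and take the union of their base-level fields. The field names in the example are invented, and keeping the first definition when two ontologies declare the same field name is an assumption; this page does not say how the platform resolves such conflicts.

```python
from collections import defaultdict
from typing import Dict, List


def merge_by_root(ontologies: List[dict]) -> Dict[str, List[dict]]:
    """Union the root-level fields of ontologies that share a root label."""
    merged: Dict[str, Dict[str, dict]] = defaultdict(dict)
    for ontology in ontologies:
        root = ontology["taxonomy"]
        bucket = merged[root["label"]]
        for field in root.get("fields", []):
            # Assumption: the first definition of a field name wins.
            bucket.setdefault(field["name"], field)
    return {label: list(fields.values()) for label, fields in merged.items()}


energy = {"taxonomy": {"label": "contract", "fields": [
    {"name": "party_name", "dtype": "string"},
    {"name": "price_per_kwh", "dtype": "number"},
]}}
service = {"taxonomy": {"label": "contract", "fields": [
    {"name": "party_name", "dtype": "string"},
    {"name": "sla_hours", "dtype": "number"},
]}}

unified = merge_by_root([energy, service])
```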

Model-Aware Prompts

Different LLMs perform best with different prompt styles. Model-aware prompts let you register multiple variants of the same prompt, each optimized for a specific model, and the platform resolves the right variant at runtime.

Concept	Description
`purpose`	Groups prompt variants (e.g., all "classification" prompts share this purpose).
`model_affinity`	Specifies which model tier a prompt is optimized for (`gemini_3`, `gemini_2`, `openai`, `claude`).
Resolution	At runtime, the system automatically selects the best prompt variant for the active model.

Example:

Pipeline config: classification prompt = "prompt-v13" (optimized for Gemini 3)

Active model: gemini-2.5-flash
  → System detects tier "gemini_2"
  → Finds "prompt-v8" with the same purpose + model_affinity="gemini_2"
  → Plugin receives "prompt-v8" transparently

Active model: gemini-3-flash-preview
  → System detects tier "gemini_3"
  → "prompt-v13" already matches → no swap needed

Key properties:

Transparent to plugins — they see the resolved prompt, no code changes needed.
Only global-scope prompts are used for model-specific overrides.
Resolutions are cached at the service level for performance.
If no model-specific variant exists, the base prompt is used as-is.
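The resolution walk-through above can be sketched in a few lines. The registry contents come from the example; the prefix table in `detect_tier` is an assumption about how model names map to tiers, and both function names are hypothetical. Caching and prompt scope are left out.

```python
from typing import Optional

# Prompt variants from the example, keyed by purpose and model affinity.
REGISTRY = [
    {"id": "prompt-v13", "purpose": "classification", "model_affinity": "gemini_3"},
    {"id": "prompt-v8", "purpose": "classification", "model_affinity": "gemini_2"},
]

# Assumed mapping from model-name prefix to tier.
TIER_PREFIXES = {
    "gemini-3": "gemini_3",
    "gemini-2": "gemini_2",
    "gpt": "openai",
    "claude": "claude",
}


def detect_tier(model: str) -> Optional[str]:
    """Return the tier for a model name, or None if it is unrecognized."""
    for prefix, tier in TIER_PREFIXES.items():
        if model.startswith(prefix):
            return tier
    return None


def resolve_prompt(configured_id: str, model: str) -> str:
    """Swap in the variant matching the active model's tier, if one exists."""
    base = next(prompt for prompt in REGISTRY if prompt["id"] == configured_id)
    tier = detect_tier(model)
    if tier is None or base["model_affinity"] == tier:
        return configured_id
    for prompt in REGISTRY:
        if prompt["purpose"] == base["purpose"] and prompt["model_affinity"] == tier:
            return prompt["id"]
    # No model-specific variant: fall back to the configured prompt as-is.
    return configured_id
```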

Next Steps

Install the SDK → pip install bizsupply-sdk
Build a plugin → Create a Plugin
Define extraction schemas → Create an Ontology
Process documents → Process Documents
