Key Concepts

Learn the core objects and abstractions in bizSupply — Documents, Plugins, Ontologies, Pipelines, Jobs, and Credentials.

Last updated: 2026-04-01

This page defines the core objects and abstractions you will work with in bizSupply. Understanding these concepts is essential before creating plugins, pipelines, or ontologies.


Document

A Document is the fundamental unit of data in bizSupply. It represents a file that has been ingested into the platform, along with all metadata and extracted data associated with it.

| Property | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier assigned at ingestion (e.g., doc_a1b2c3d4). |
| filename | string | Original filename of the uploaded document. |
| mime_type | string | MIME type of the document (e.g., application/pdf). |
| size | integer | File size in bytes. |
| status | string | Current lifecycle state: pending, classified, extracted, completed, or failed. |
| document_type | string or null | Classification result: the document type assigned by a Classification plugin. |
| fields | object or null | Extracted data: key-value pairs populated by an Extraction plugin. |
| metadata | object | Additional metadata: source, ingestion timestamp, tags, custom attributes. |
| tenant_id | string | The tenant this document belongs to. |
| created_at | datetime | Timestamp when the document was ingested. |
| updated_at | datetime | Timestamp of the last modification. |

Document Lifecycle

  1. pending — Document has been ingested and stored. No processing has occurred.
  2. classified — A Classification plugin has assigned a document type.
  3. extracted — An Extraction plugin has populated the fields object.
  4. completed — All pipeline stages have finished successfully.
  5. failed — An error occurred during processing. The error details are stored in the document metadata.
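
The lifecycle above can be read as a small state machine. The sketch below is illustrative only: the state names come from the list above, but the transition map (each stage may advance to the next one or fail, and completed and failed are final) is an assumption, and `DocumentStatus` and `can_transition` are hypothetical names rather than part of the platform.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    """Lifecycle states listed above (hypothetical helper, not a platform class)."""
    PENDING = "pending"
    CLASSIFIED = "classified"
    EXTRACTED = "extracted"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumption: a document moves forward one stage at a time, or fails.
ALLOWED_TRANSITIONS = {
    DocumentStatus.PENDING: {DocumentStatus.CLASSIFIED, DocumentStatus.FAILED},
    DocumentStatus.CLASSIFIED: {DocumentStatus.EXTRACTED, DocumentStatus.FAILED},
    DocumentStatus.EXTRACTED: {DocumentStatus.COMPLETED, DocumentStatus.FAILED},
    DocumentStatus.COMPLETED: set(),
    DocumentStatus.FAILED: set(),
}


def can_transition(current: DocumentStatus, target: DocumentStatus) -> bool:
    """Return True if moving from `current` to `target` is allowed by the map."""
    return target in ALLOWED_TRANSITIONS[current]
```

For example, `can_transition(DocumentStatus.PENDING, DocumentStatus.CLASSIFIED)` is true under this map, while skipping straight from pending to completed is not.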

Plugin

A Plugin is a Python class that implements a specific stage of document processing. Every plugin extends one of four base classes and implements a single required method.

| Plugin Type | Base Class | Required Method | Return Type |
| --- | --- | --- | --- |
| Source | SourcePlugin | fetch_documents() | list[Document] |
| Classification | ClassificationPlugin | classify(document) | str (document type) |
| Extraction | ExtractionPlugin | extract(document, fields) | dict[str, Any] |
| Aggregation | AggregationPlugin | aggregate(documents) | dict[str, Any] |

Plugins are registered with the platform via the API or CLI, specifying the plugin type, name, version, and the Python module path. Once registered, a plugin can be referenced in any pipeline.

ℹ️ Note

Plugins execute in an isolated environment with access to platform services like prompt_llm() and format_fields_for_prompt(). See the Plugin Interface Specification for the full service API.
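
To make the plugin shape concrete, here is a minimal sketch of a Classification plugin. It assumes only what the table above states: the plugin extends `ClassificationPlugin` and implements `classify(document)`, returning the document type as a string. The `Document` and `ClassificationPlugin` classes below are local stand-ins so the example runs on its own, and `FilenameClassifier`, its keyword list, and the "unknown" fallback are hypothetical. A real plugin would import the base class from the platform and would likely use services such as prompt_llm(), whose signature is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    """Stand-in for the platform Document; only a few properties are modeled."""
    id: str
    filename: str
    mime_type: str
    document_type: Optional[str] = None


class ClassificationPlugin:
    """Stand-in for the platform base class so this sketch is self-contained."""

    def classify(self, document: Document) -> str:
        raise NotImplementedError


class FilenameClassifier(ClassificationPlugin):
    """Hypothetical plugin that guesses the document type from the filename."""

    # (keyword in filename, document type to return), checked in order.
    KEYWORDS = (("invoice", "invoice"), ("purchase", "purchase_order"))

    def classify(self, document: Document) -> str:
        name = document.filename.lower()
        for keyword, document_type in self.KEYWORDS:
            if keyword in name:
                return document_type
        return "unknown"  # assumed fallback; the platform's convention may differ
```

The same pattern applies to the other three plugin types: subclass the matching base class and implement its single required method.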


Ontology

An Ontology defines the structured data you want to extract from a specific document type. It acts as a schema — telling Extraction plugins exactly which fields to look for, what types they are, and how to validate them.

Ontology Structure

An ontology consists of three parts:

  • Taxonomy — the document type this ontology applies to (e.g., "invoice", "purchase_order").
  • Fields — a list of named fields, each with a type, description, and validation rules.
  • Validation rules — constraints like required, min/max length, regex patterns, and allowed values.

Example Ontology (YAML)

ontology.yaml
taxonomy: invoice
fields:
  - name: vendor_name
    type: string
    description: "The name of the vendor or supplier."
    required: true
  - name: invoice_number
    type: string
    description: "The unique invoice identifier."
    required: true
  - name: invoice_date
    type: date
    description: "The date the invoice was issued."
    required: true
  - name: due_date
    type: date
    description: "The payment due date."
    required: false
  - name: total_amount
    type: number
    description: "The total amount due, including taxes."
    required: true
  - name: currency
    type: string
    description: "ISO 4217 currency code (e.g., USD, EUR)."
    required: true
    allowed_values: [USD, EUR, GBP, CHF, JPY]
  - name: line_items
    type: array
    description: "Individual line items on the invoice."
    required: false
    items:
      - name: description
        type: string
      - name: quantity
        type: number
      - name: unit_price
        type: number
      - name: amount
        type: number

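As an illustration of how an ontology can drive validation, the sketch below checks extracted fields against the `required` and `allowed_values` rules from the example. It is a simplified stand-in, not the platform's validator: the ontology is written as a trimmed Python dict rather than loaded from YAML, `validate_fields` is a hypothetical helper, and type, length, and regex checks are left out.

```python
from typing import Any

# Trimmed copy of the example invoice ontology, expressed as a Python dict.
ONTOLOGY = {
    "taxonomy": "invoice",
    "fields": [
        {"name": "vendor_name", "type": "string", "required": True},
        {"name": "total_amount", "type": "number", "required": True},
        {"name": "currency", "type": "string", "required": True,
         "allowed_values": ["USD", "EUR", "GBP", "CHF", "JPY"]},
        {"name": "due_date", "type": "date", "required": False},
    ],
}


def validate_fields(ontology: dict[str, Any], extracted: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means every checked rule passed."""
    problems = []
    for field in ontology["fields"]:
        name = field["name"]
        value = extracted.get(name)
        if value is None:
            # Missing values only matter for required fields.
            if field.get("required", False):
                problems.append(f"missing required field: {name}")
            continue
        allowed = field.get("allowed_values")
        if allowed is not None and value not in allowed:
            problems.append(f"{name}: {value!r} is not an allowed value")
    return problems
```

Calling `validate_fields(ONTOLOGY, {"vendor_name": "Acme", "currency": "AUD"})` reports the missing `total_amount` and the disallowed currency, while the optional `due_date` is not flagged.
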
Pipeline

A Pipeline defines a complete document processing workflow — from ingestion to extraction. It specifies which plugins to run, in what order, and with what configuration.

| Component | Required | Description |
| --- | --- | --- |
| name | Yes | Human-readable pipeline name. |
| source_plugin | Yes | The Source plugin that fetches documents. |
| classification_plugin | Yes | The Classification plugin that assigns document types. |
| extraction_plugin | Yes | The Extraction plugin that extracts structured data. |
| aggregation_plugin | No | Optional Aggregation plugin for post-processing. |
| ontology_id | Yes | The ontology to use for extraction. |
| credentials | Depends | Credential references needed by the Source plugin. |
| schedule | No | Optional cron expression for automated execution. |

Example Pipeline

pipeline.json
{
  "name": "Invoice Processing Pipeline",
  "source_plugin": "imap-source",
  "classification_plugin": "document-classifier",
  "extraction_plugin": "invoice-extractor",
  "aggregation_plugin": "spend-aggregator",
  "ontology_id": "ont_invoice_v2",
  "credentials": ["cred_imap_accounts_payable"],
  "schedule": "0 */6 * * *"
}
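
Before submitting a pipeline definition, it can help to check it locally against the component table. The following sketch is one way to do that. The required and optional keys are taken from the table above; `check_pipeline` is a hypothetical helper, and the five-field cron check is an assumption based on the example schedule, not a documented platform rule.

```python
import json

REQUIRED_KEYS = (
    "name", "source_plugin", "classification_plugin",
    "extraction_plugin", "ontology_id",
)
OPTIONAL_KEYS = ("aggregation_plugin", "credentials", "schedule")

EXAMPLE = """
{
  "name": "Invoice Processing Pipeline",
  "source_plugin": "imap-source",
  "classification_plugin": "document-classifier",
  "extraction_plugin": "invoice-extractor",
  "ontology_id": "ont_invoice_v2",
  "credentials": ["cred_imap_accounts_payable"],
  "schedule": "0 */6 * * *"
}
"""


def check_pipeline(definition: dict) -> list[str]:
    """Return a list of problems found in a pipeline definition."""
    problems = [
        f"missing required key: {key}"
        for key in REQUIRED_KEYS
        if not definition.get(key)
    ]
    known = set(REQUIRED_KEYS) | set(OPTIONAL_KEYS)
    problems += [f"unknown key: {key}" for key in sorted(definition) if key not in known]
    schedule = definition.get("schedule")
    # Assumption: schedules use the common five-field cron format.
    if schedule is not None and len(schedule.split()) != 5:
        problems.append("schedule should be a five-field cron expression")
    return problems


if __name__ == "__main__":
    print(check_pipeline(json.loads(EXAMPLE)))
```

The `credentials` key is treated as optional here because the table marks it as "Depends": whether it is needed is determined by the Source plugin.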

Job

A Job represents a single execution of a pipeline. When you run a pipeline, bizSupply creates a Job that tracks the progress, status, and results of the entire execution.

| State | Description |
| --- | --- |
| queued | Job has been created and is waiting to be picked up by a worker. |
| running | Job is actively processing documents. |
| completed | All documents have been processed successfully. |
| failed | The job encountered a fatal error and stopped. |
| partial | Some documents succeeded, but one or more failed. Successful results are preserved. |
| cancelled | The job was cancelled by the user before completion. |

Each job has a unique ID and maintains a log of every step executed, including per-document status, timing information, and any errors encountered. You can query job status via the API at any time.
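
A common client-side pattern is to poll a job until it reaches a state it cannot leave. The sketch below assumes that completed, failed, partial, and cancelled are terminal, which follows from the state descriptions above but is not stated explicitly. The job status endpoint is not described on this page, so the status lookup is passed in as a callable; `wait_for_job` and `fetch_state` are hypothetical names.

```python
import time
from typing import Callable

# Assumption: these four states are final; queued and running are not.
TERMINAL_STATES = {"completed", "failed", "partial", "cancelled"}


def wait_for_job(
    fetch_state: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> str:
    """Poll `fetch_state` until the job reaches a terminal state, then return it."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {state!r} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)
```

In practice `fetch_state` would wrap whatever API call returns the job's current state. A caller should treat `partial` differently from `failed`, since successful results are preserved in the partial case.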


Credential

A Credential stores the authentication details that Source plugins need to connect to external systems. Credentials are encrypted at rest and only decrypted when a plugin requires them during execution.

bizSupply supports three credential types:

OAuth2

json
{
  "type": "oauth2",
  "name": "Google Workspace",
  "client_id": "your-client-id",
  "client_secret": "your-client-secret",
  "refresh_token": "your-refresh-token",
  "token_url": "https://oauth2.googleapis.com/token",
  "scopes": ["https://www.googleapis.com/auth/gmail.readonly"]
}

IMAP

json
{
  "type": "imap",
  "name": "Accounts Payable Inbox",
  "host": "imap.company.com",
  "port": 993,
  "username": "ap@company.com",
  "password": "your-password",
  "use_ssl": true,
  "folder": "INBOX"
}

API Key

json
{
  "type": "api_key",
  "name": "SharePoint Connector",
  "api_key": "your-api-key",
  "base_url": "https://company.sharepoint.com/sites/documents",
  "headers": {
    "X-Custom-Header": "value"
  }
}

⚠️ Warning

Never include credential secrets in logs or error messages. The platform automatically redacts sensitive fields in API responses — only the credential name, type, and creation date are returned.
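
Plugin code that logs its own diagnostics needs to apply the same care. The sketch below masks the secret-bearing keys that appear in the three credential examples above before a credential is logged. It is a hypothetical helper, not the platform's redaction logic, and the key list is an assumption drawn from those examples. Custom `headers` may also carry secrets and are not handled here.

```python
import copy
from typing import Any

# Assumption: these are the secret-bearing keys, taken from the examples above.
SENSITIVE_KEYS = {"client_secret", "refresh_token", "password", "api_key"}


def redact(credential: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the credential with top-level secret values masked."""
    safe = copy.deepcopy(credential)  # never mutate the caller's credential
    for key in safe:
        if key in SENSITIVE_KEYS:
            safe[key] = "***"
    return safe
```

Logging `redact(credential)` instead of the raw dictionary keeps the name, type, and connection details readable while hiding the secret values.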


How Concepts Relate

The relationships between bizSupply concepts form a clear hierarchy:

  • A Tenant owns all other resources — Documents, Plugins, Ontologies, Pipelines, Credentials, and Jobs.
  • A Pipeline references one Source Plugin, one Classification Plugin, one Extraction Plugin, optionally one Aggregation Plugin, one Ontology, and zero or more Credentials.
  • An Ontology is linked to a document type (taxonomy). When a Classification plugin assigns a type, the matching Ontology determines what fields to extract.
  • A Job is created each time a Pipeline executes. The Job tracks per-document progress and stores results.
  • A Document flows through the pipeline stages: it starts as pending, becomes classified, then extracted, and finally completed.
  • Credentials are referenced by Pipelines and consumed by Source Plugins at execution time.

Data Isolation

bizSupply enforces strict data isolation at the tenant level. Every API request includes a tenant context (via the API key or OAuth token), and all database queries are automatically scoped to that tenant. There is no mechanism to query across tenants — even platform administrators access tenant data only through tenant-scoped endpoints.

Within a tenant, all resources share the same namespace. Plugin names, ontology taxonomies, and pipeline names must be unique within a tenant but can overlap across tenants. This design ensures that organizations can use the same naming conventions without conflicts.
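
The naming rule can be pictured as a lookup keyed by tenant. The sketch below is a toy in-memory model, not the platform's storage layer. `TenantScopedRegistry` is a hypothetical class, and it assumes uniqueness is enforced per resource kind (plugin, ontology, pipeline), which the text above does not spell out.

```python
class TenantScopedRegistry:
    """Toy registry in which every lookup key includes the tenant ID."""

    def __init__(self) -> None:
        # Key: (tenant_id, kind, name). Assumption: names are unique per kind.
        self._items: dict[tuple[str, str, str], dict] = {}

    def register(self, tenant_id: str, kind: str, name: str, payload: dict) -> None:
        key = (tenant_id, kind, name)
        if key in self._items:
            raise ValueError(f"{kind} {name!r} already exists in tenant {tenant_id}")
        self._items[key] = payload

    def list_names(self, tenant_id: str, kind: str) -> list[str]:
        """Return names visible to one tenant; other tenants' items never match."""
        return sorted(
            name
            for (tenant, item_kind, name) in self._items
            if tenant == tenant_id and item_kind == kind
        )
```

Two tenants can each register a pipeline named "Invoice Processing Pipeline" without conflict, while a second registration of that name within the same tenant raises an error.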