Key Concepts
This page defines the core objects and abstractions you will work with in bizSupply: Documents, Plugins, Ontologies, Pipelines, Jobs, and Credentials. Understand these concepts before you create plugins, pipelines, or ontologies.
Document
A Document is the fundamental unit of data in bizSupply. It represents a file that has been ingested into the platform, along with all metadata and extracted data associated with it.
| Property | Type | Description |
|---|---|---|
| id | string | Unique identifier assigned at ingestion (e.g., doc_a1b2c3d4). |
| filename | string | Original filename of the uploaded document. |
| mime_type | string | MIME type of the document (e.g., application/pdf). |
| size | integer | File size in bytes. |
| status | string | Current lifecycle state: pending, classified, extracted, completed, or failed. |
| document_type | string or null | Classification result: the document type assigned by a Classification plugin. |
| fields | object or null | Extracted data: key-value pairs populated by an Extraction plugin. |
| metadata | object | Additional metadata: source, ingestion timestamp, tags, custom attributes. |
| tenant_id | string | The tenant this document belongs to. |
| created_at | datetime | Timestamp when the document was ingested. |
| updated_at | datetime | Timestamp of the last modification. |
Document Lifecycle
- pending — Document has been ingested and stored. No processing has occurred.
- classified — A Classification plugin has assigned a document type.
- extracted — An Extraction plugin has populated the fields object.
- completed — All pipeline stages have finished successfully.
- failed — An error occurred during processing. The error details are stored in the document metadata.
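The lifecycle above can be read as a small state machine. The sketch below is illustrative only: `DocumentStatus` and `can_transition` are hypothetical names rather than part of the bizSupply SDK, and the transition table (each stage leads to the next, and any in-progress state may end in `failed`) is an assumption inferred from the state descriptions.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    """Lifecycle states as listed in the Document table."""
    PENDING = "pending"
    CLASSIFIED = "classified"
    EXTRACTED = "extracted"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transitions: forward one stage at a time, or to FAILED.
# COMPLETED and FAILED are treated as terminal.
_ALLOWED = {
    DocumentStatus.PENDING: {DocumentStatus.CLASSIFIED, DocumentStatus.FAILED},
    DocumentStatus.CLASSIFIED: {DocumentStatus.EXTRACTED, DocumentStatus.FAILED},
    DocumentStatus.EXTRACTED: {DocumentStatus.COMPLETED, DocumentStatus.FAILED},
    DocumentStatus.COMPLETED: set(),
    DocumentStatus.FAILED: set(),
}


def can_transition(current: DocumentStatus, target: DocumentStatus) -> bool:
    """Return True if moving from `current` to `target` is allowed in this sketch."""
    return target in _ALLOWED[current]
```

Because the enum subclasses `str`, a status string from an API response can be converted directly, for example `DocumentStatus("classified")`.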
Plugin
A Plugin is a Python class that implements a specific stage of document processing. Every plugin extends one of four base classes and implements a single required method.
| Plugin Type | Base Class | Required Method | Return Type |
|---|---|---|---|
| Source | SourcePlugin | fetch_documents() | list[Document] |
| Classification | ClassificationPlugin | classify(document) | str (document type) |
| Extraction | ExtractionPlugin | extract(document, fields) | dict[str, Any] |
| Aggregation | AggregationPlugin | aggregate(documents) | dict[str, Any] |
Plugins are registered with the platform via the API or CLI, specifying the plugin type, name, version, and the Python module path. Once registered, a plugin can be referenced in any pipeline.
Plugins execute in an isolated environment with access to platform services like prompt_llm() and format_fields_for_prompt(). See the Plugin Interface Specification for the full service API.
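As a rough illustration of the plugin contract, the sketch below implements a Classification plugin. To keep it runnable on its own, `Document` and `ClassificationPlugin` are local stand-ins for the platform classes; the real base class, its import path, and its constructor are not shown on this page. The `FilenameClassifier` class, its keyword table, and the `"unknown"` fallback are all hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Document:
    """Stand-in carrying two properties from the Document table."""
    id: str
    filename: str


class ClassificationPlugin(ABC):
    """Stand-in for the platform base class (assumed shape)."""

    @abstractmethod
    def classify(self, document: Document) -> str:
        """Return the document type for `document`."""


class FilenameClassifier(ClassificationPlugin):
    """Hypothetical plugin that guesses the type from the filename."""

    KEYWORDS = {
        "invoice": "invoice",
        "purchase_order": "purchase_order",
        "purchase-order": "purchase_order",
    }

    def classify(self, document: Document) -> str:
        name = document.filename.lower()
        for keyword, doc_type in self.KEYWORDS.items():
            if keyword in name:
                return doc_type
        return "unknown"  # assumed fallback; the platform may expect something else
```

A production classifier would more likely inspect the document content, for instance through the `prompt_llm()` service mentioned above, whose signature is defined in the Plugin Interface Specification.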
Ontology
An Ontology defines the structured data you want to extract from a specific document type. It acts as a schema — telling Extraction plugins exactly which fields to look for, what types they are, and how to validate them.
Ontology Structure
An ontology consists of three parts:
- Taxonomy — the document type this ontology applies to (e.g., "invoice", "purchase_order").
- Fields — a list of named fields, each with a type, description, and validation rules.
- Validation rules — constraints like required, min/max length, regex patterns, and allowed values.
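To make the role of validation rules concrete, here is a minimal sketch of checking extracted fields against an ontology held as a Python dict. The `validate_fields` function is hypothetical and is not the platform's validator. The `required` and `allowed_values` keys follow the invoice example on this page, while `pattern` is an assumed key name for regex rules.

```python
import re
from typing import Any


def validate_fields(ontology: dict[str, Any], extracted: dict[str, Any]) -> list[str]:
    """Return a list of validation errors; an empty list means the data is valid."""
    errors = []
    for field in ontology["fields"]:
        name = field["name"]
        value = extracted.get(name)
        if value is None:
            if field.get("required", False):
                errors.append(f"{name}: required field is missing")
            continue
        allowed = field.get("allowed_values")
        if allowed is not None and value not in allowed:
            errors.append(f"{name}: {value!r} is not an allowed value")
        pattern = field.get("pattern")  # assumed key name, not confirmed by the docs
        if pattern and isinstance(value, str) and not re.fullmatch(pattern, value):
            errors.append(f"{name}: does not match pattern {pattern}")
    return errors


# A trimmed-down invoice ontology for demonstration.
ontology = {
    "taxonomy": "invoice",
    "fields": [
        {"name": "vendor_name", "type": "string", "required": True},
        {"name": "currency", "type": "string", "required": True,
         "allowed_values": ["USD", "EUR", "GBP", "CHF", "JPY"]},
        {"name": "due_date", "type": "date", "required": False},
    ],
}
```

Calling `validate_fields(ontology, {"vendor_name": "Acme", "currency": "AUD"})` reports one error, because `AUD` is not in the allowed list. The optional `due_date` may be absent without raising an error.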
Example Ontology (YAML)
taxonomy: invoice
fields:
  - name: vendor_name
    type: string
    description: "The name of the vendor or supplier."
    required: true
  - name: invoice_number
    type: string
    description: "The unique invoice identifier."
    required: true
  - name: invoice_date
    type: date
    description: "The date the invoice was issued."
    required: true
  - name: due_date
    type: date
    description: "The payment due date."
    required: false
  - name: total_amount
    type: number
    description: "The total amount due, including taxes."
    required: true
  - name: currency
    type: string
    description: "ISO 4217 currency code (e.g., USD, EUR)."
    required: true
    allowed_values: [USD, EUR, GBP, CHF, JPY]
  - name: line_items
    type: array
    description: "Individual line items on the invoice."
    required: false
    items:
      - name: description
        type: string
      - name: quantity
        type: number
      - name: unit_price
        type: number
      - name: amount
        type: number
Pipeline
A Pipeline defines a complete document processing workflow — from ingestion to extraction. It specifies which plugins to run, in what order, and with what configuration.
| Component | Required | Description |
|---|---|---|
| name | Yes | Human-readable pipeline name. |
| source_plugin | Yes | The Source plugin that fetches documents. |
| classification_plugin | Yes | The Classification plugin that assigns document types. |
| extraction_plugin | Yes | The Extraction plugin that extracts structured data. |
| aggregation_plugin | No | Optional Aggregation plugin for post-processing. |
| ontology_id | Yes | The ontology to use for extraction. |
| credentials | Conditional | Credential references; required when the Source plugin must authenticate with an external system. |
| schedule | No | Optional cron expression for automated execution. |
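The Required column of the table can be turned into a simple pre-flight check before a pipeline definition is submitted. This is a sketch under assumptions: `missing_components` is a hypothetical client-side helper, and the platform presumably performs its own validation when a pipeline is created.

```python
# Components marked "Yes" in the Required column of the table.
REQUIRED_COMPONENTS = (
    "name",
    "source_plugin",
    "classification_plugin",
    "extraction_plugin",
    "ontology_id",
)


def missing_components(pipeline: dict) -> list[str]:
    """Return the required components that are absent or empty."""
    return [key for key in REQUIRED_COMPONENTS if not pipeline.get(key)]
```

An empty result means every required component is present. Optional components such as `aggregation_plugin` and `schedule` are deliberately not checked.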
Example Pipeline
{
  "name": "Invoice Processing Pipeline",
  "source_plugin": "imap-source",
  "classification_plugin": "document-classifier",
  "extraction_plugin": "invoice-extractor",
  "aggregation_plugin": "spend-aggregator",
  "ontology_id": "ont_invoice_v2",
  "credentials": ["cred_imap_accounts_payable"],
  "schedule": "0 */6 * * *"
}
Job
A Job represents a single execution of a pipeline. When you run a pipeline, bizSupply creates a Job that tracks the progress, status, and results of the entire execution.
| State | Description |
|---|---|
| queued | Job has been created and is waiting to be picked up by a worker. |
| running | Job is actively processing documents. |
| completed | All documents have been processed successfully. |
| failed | The job encountered a fatal error and stopped. |
| partial | Some documents succeeded, but one or more failed. Successful results are preserved. |
| cancelled | The job was cancelled by the user before completion. |
Each job has a unique ID and maintains a log of every step executed, including per-document status, timing information, and any errors encountered. You can query job status via the API at any time.
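Since job status can be queried at any time, a client will typically poll until the job stops changing. The sketch below assumes that `completed`, `failed`, `partial`, and `cancelled` are terminal states, which is inferred from the table rather than stated outright. The status endpoint is not named on this page, so `wait_for_job` takes a caller-supplied `get_state` function; both names are hypothetical.

```python
import time
from typing import Callable

# Assumed terminal states, inferred from the state descriptions.
TERMINAL_STATES = {"completed", "failed", "partial", "cancelled"}


def wait_for_job(get_state: Callable[[], str], interval: float = 2.0,
                 max_polls: int = 100) -> str:
    """Poll `get_state` until it returns a terminal state, then return that state."""
    for _ in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)
    raise TimeoutError("job did not reach a terminal state")
```

In real use, `get_state` would wrap whatever API call returns the job's current state. Callers should treat `partial` separately from `failed`, because successful results are preserved in that case.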
Credential
A Credential stores the authentication details that Source plugins need to connect to external systems. Credentials are encrypted at rest and only decrypted when a plugin requires them during execution.
bizSupply supports three credential types:
OAuth2
{
  "type": "oauth2",
  "name": "Google Workspace",
  "client_id": "your-client-id",
  "client_secret": "your-client-secret",
  "refresh_token": "your-refresh-token",
  "token_url": "https://oauth2.googleapis.com/token",
  "scopes": ["https://www.googleapis.com/auth/gmail.readonly"]
}
IMAP
{
  "type": "imap",
  "name": "Accounts Payable Inbox",
  "host": "imap.company.com",
  "port": 993,
  "username": "ap@company.com",
  "password": "your-password",
  "use_ssl": true,
  "folder": "INBOX"
}
API Key
{
  "type": "api_key",
  "name": "SharePoint Connector",
  "api_key": "your-api-key",
  "base_url": "https://company.sharepoint.com/sites/documents",
  "headers": {
    "X-Custom-Header": "value"
  }
}
Never include credential secrets in logs or error messages. The platform automatically redacts sensitive fields in API responses: only the credential name, type, and creation date are returned.
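Plugin authors can apply the same idea to their own log output. The following is a minimal sketch of allow-list redaction, not the platform's implementation: `redact_credential` is a hypothetical helper, and `created_at` is an assumed key name for the creation date.

```python
# Only these keys survive redaction; "created_at" is an assumed key name.
VISIBLE_FIELDS = ("name", "type", "created_at")


def redact_credential(credential: dict) -> dict:
    """Return a copy of `credential` containing only non-sensitive fields."""
    return {key: credential[key] for key in VISIBLE_FIELDS if key in credential}
```

An allow-list is used instead of a list of secret keys so that a newly introduced secret field is hidden by default rather than leaked until someone remembers to block it.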
How Concepts Relate
The relationships between bizSupply concepts form a clear hierarchy:
- A Tenant owns all other resources — Documents, Plugins, Ontologies, Pipelines, Credentials, and Jobs.
- A Pipeline references one Source Plugin, one Classification Plugin, one Extraction Plugin, optionally one Aggregation Plugin, one Ontology, and zero or more Credentials.
- An Ontology is linked to a document type (taxonomy). When a Classification plugin assigns a type, the matching Ontology determines what fields to extract.
- A Job is created each time a Pipeline executes. The Job tracks per-document progress and stores results.
- A Document flows through the pipeline stages: it starts as pending, becomes classified, then extracted, and finally completed.
- Credentials are referenced by Pipelines and consumed by Source Plugins at execution time.
Data Isolation
bizSupply enforces strict data isolation at the tenant level. Every API request includes a tenant context (via the API key or OAuth token), and all database queries are automatically scoped to that tenant. There is no mechanism to query across tenants — even platform administrators access tenant data only through tenant-scoped endpoints.
Within a tenant, all resources share the same namespace. Plugin names, ontology taxonomies, and pipeline names must be unique within a tenant but can overlap across tenants. This design ensures that organizations can use the same naming conventions without conflicts.
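The naming rule can be pictured as a lookup keyed by tenant and name. The sketch below is an in-memory illustration with hypothetical names (`TenantRegistry`, `register`, `list_names`); it models a single resource kind and says nothing about how bizSupply actually stores or scopes data.

```python
class TenantRegistry:
    """Hypothetical in-memory registry keyed by (tenant_id, name)."""

    def __init__(self) -> None:
        self._items: dict[tuple[str, str], dict] = {}

    def register(self, tenant_id: str, name: str, resource: dict) -> None:
        """Store a resource, rejecting duplicate names within the same tenant."""
        key = (tenant_id, name)
        if key in self._items:
            raise ValueError(f"{name!r} already exists in tenant {tenant_id!r}")
        self._items[key] = resource

    def list_names(self, tenant_id: str) -> list[str]:
        """List names for one tenant; no method here reads across tenants."""
        return sorted(name for (tenant, name) in self._items if tenant == tenant_id)
```

Registering `invoice-extractor` under two different tenants succeeds, while registering it twice under the same tenant raises `ValueError`.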