Plugin Interface Specification
Complete reference for plugin base classes, method signatures, service methods, and return type contracts.
This document specifies the complete interface contract for bizSupply plugins. Every plugin must conform to these interfaces to be registered and executed by the platform.
Plugin Contract
All plugins share a common contract: they extend a base class, implement a single required method, and return a specific type. The platform calls your method during pipeline execution, passing the relevant data for that stage.
Plugins have access to platform services through inherited methods on the base class. These services provide LLM access, logging, configuration, and credential retrieval.
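The shape of this contract can be pictured with a toy stand-in. The sketch below does not use the real SDK: `ToyDocument`, `ToyClassificationPlugin`, `KeywordClassifier`, and `run_stage` are hypothetical names invented for illustration, and the real base classes may enforce the contract differently.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ToyDocument:
    """Hypothetical stand-in for the platform's Document object."""
    content: str
    filename: str = "example.txt"


class ToyClassificationPlugin(ABC):
    """Hypothetical base class: shared services plus one required method."""

    name = ""
    version = ""

    def log(self, level: str, message: str) -> None:
        # Stand-in for an inherited platform service.
        print(f"[{level}] {self.name}: {message}")

    @abstractmethod
    def classify(self, document: ToyDocument) -> str:
        """The single method a subclass must implement."""


class KeywordClassifier(ToyClassificationPlugin):
    name = "keyword-classifier"
    version = "0.0.1"

    def classify(self, document: ToyDocument) -> str:
        self.log("info", f"Processing document: {document.filename}")
        return "invoice" if "invoice" in document.content.lower() else "contract"


def run_stage(plugin: ToyClassificationPlugin, document: ToyDocument) -> str:
    """Hypothetical platform-side call: invoke the method, then check the type."""
    result = plugin.classify(document)
    if not isinstance(result, str) or not result:
        raise TypeError("classify() must return a non-empty str")
    return result
```

Calling `run_stage(KeywordClassifier(), ToyDocument("INVOICE #7"))` exercises all three parts of the contract: the subclass, the required method, and the return type check.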
ClassificationPlugin
Analyzes a document and assigns a document type string.
from bizsupply_sdk import ClassificationPlugin

class MyClassifier(ClassificationPlugin):
    name = "my-classifier"
    version = "1.0.0"

    def classify(self, document) -> str:
        """
        Classify a document and return its type.

        Args:
            document: Document object
                - document.content (str): extracted text (up to 100KB)
                - document.filename (str): original filename
                - document.mime_type (str): MIME type
                - document.metadata (dict): ingestion metadata
        Returns:
            str: document type (e.g., "invoice", "contract").
                Must match a taxonomy in a registered ontology.
        Raises:
            PluginError: for expected, handleable failures.
        """
        ...

ExtractionPlugin
Extracts structured fields from a document based on an ontology definition.
from bizsupply_sdk import ExtractionPlugin

class MyExtractor(ExtractionPlugin):
    name = "my-extractor"
    version = "1.0.0"

    def extract(self, document, fields: list[dict]) -> dict:
        """
        Extract structured data from a document.

        Args:
            document: Document object (same as ClassificationPlugin)
            fields (list[dict]): field definitions from the ontology.
                Each field dict contains:
                - name (str): field name (e.g., "vendor_name")
                - type (str): field type (string, number, date, boolean, array)
                - description (str): human-readable description
                - required (bool): whether the field is mandatory
        Returns:
            dict: mapping of field names to extracted values.
                Keys must match field names from the ontology.
                Example: {"vendor_name": "Acme Corp", "total_amount": 1500.00}
        Raises:
            PluginError: for expected, handleable failures.
        """
        ...

SourcePlugin
Fetches documents from an external system and returns them for ingestion.
from bizsupply_sdk import SourcePlugin, RawDocument

class MySource(SourcePlugin):
    name = "my-source"
    version = "1.0.0"

    def fetch_documents(self) -> list[RawDocument]:
        """
        Fetch documents from an external source.
        Access credentials via self.get_credential(credential_name).

        Returns:
            list[RawDocument]: list of documents to ingest.
                Each RawDocument contains:
                - content (bytes): raw file content
                - filename (str): suggested filename
                - mime_type (str): MIME type
                - metadata (dict): optional key-value metadata
        Raises:
            PluginError: for connection failures, auth errors, etc.
        """
        ...

AggregationPlugin
Processes a batch of extracted documents and returns aggregated results.
from bizsupply_sdk import AggregationPlugin

class MyAggregator(AggregationPlugin):
    name = "my-aggregator"
    version = "1.0.0"

    def aggregate(self, documents: list) -> dict:
        """
        Aggregate data across multiple processed documents.

        Args:
            documents (list): documents with extracted fields.
                Each document has:
                - document.id (str)
                - document.document_type (str)
                - document.fields (dict): extracted key-value data
                - document.metadata (dict)
        Returns:
            dict: aggregated results. Structure is plugin-defined.
                Example: {"total_spend": 45000, "vendor_count": 12}
        Raises:
            PluginError: for processing failures.
        """
        ...

Available Service Methods
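All plugin base classes provide access to platform services through inherited methods. These methods are available on self inside your plugin, except where a method is noted as limited to one plugin type (get_credential() is only available in SourcePlugin).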
prompt_llm()
Sends a prompt to the platform's LLM service and returns the text response. This is the primary way plugins interact with large language models.
# Basic usage
result = self.prompt_llm("Classify this document: ...")

# With options
result = self.prompt_llm(
    prompt="Extract the vendor name from: ...",
    model="gemini-2.0-flash",  # Default: platform-configured model
    temperature=0.1,           # Default: 0.2
    max_tokens=500,            # Default: 1024
)

| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | (required) | The prompt text to send to the LLM. |
| model | str | Platform default | LLM model identifier. Depends on platform configuration. |
| temperature | float | 0.2 | Sampling temperature. Lower values produce more deterministic output. |
| max_tokens | int | 1024 | Maximum tokens in the LLM response. |
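
As a rough illustration of how prompt_llm() might back a classify() method, the sketch below builds a prompt, calls the service, and normalizes the reply to one of a fixed set of labels. It runs without the SDK: `FakeServices`, `ALLOWED_TYPES`, `normalize_label`, and the "unknown" fallback are hypothetical, and the canned reply stands in for a real model response.

```python
from types import SimpleNamespace

ALLOWED_TYPES = ("invoice", "contract", "receipt")  # hypothetical taxonomy


def normalize_label(raw: str, allowed=ALLOWED_TYPES, fallback: str = "unknown") -> str:
    """Map free-form LLM text onto one allowed label, or the fallback."""
    cleaned = raw.strip().strip("\"'.").lower()
    return cleaned if cleaned in allowed else fallback


class FakeServices:
    """Stand-in for the inherited service methods; not the real base class."""

    def prompt_llm(self, prompt, model=None, temperature=0.2, max_tokens=1024):
        return ' "Invoice".\n'  # canned reply with typical stray punctuation


class LLMClassifier(FakeServices):
    name = "llm-classifier"
    version = "0.1.0"

    def classify(self, document) -> str:
        prompt = (
            "Classify this document as one of: "
            + ", ".join(ALLOWED_TYPES)
            + ". Reply with the label only.\n\n"
            + document.content[:4000]
        )
        reply = self.prompt_llm(prompt, temperature=0.1, max_tokens=10)
        return normalize_label(reply)


doc = SimpleNamespace(content="INVOICE No. 1042, total due 1500.00")
print(LLMClassifier().classify(doc))
```

Normalizing the reply before returning it keeps stray quotes, casing, or trailing punctuation from producing a type string that matches no ontology.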
format_fields_for_prompt()
Converts an ontology field list into a formatted string suitable for inclusion in an LLM prompt. This ensures consistent prompt formatting across extraction plugins.
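To make the formatting concrete, here is a rough stand-in for what such a helper could do. It is not the SDK implementation: `format_fields` is a hypothetical function, the line format mirrors the sample output in this section, and the word "optional" for non-required fields is an assumption, since the source only shows required fields.

```python
def format_fields(fields: list[dict]) -> str:
    """Render field definitions as one bullet line per field."""
    lines = []
    for field in fields:
        # Assumption: non-required fields are labeled "optional".
        requirement = "required" if field.get("required") else "optional"
        lines.append(
            f"- {field['name']} ({field['type']}, {requirement}): {field['description']}"
        )
    return "\n".join(lines)


fields = [
    {
        "name": "vendor_name",
        "type": "string",
        "description": "The name of the vendor or supplier.",
        "required": True,
    },
    {
        "name": "due_date",
        "type": "date",
        "description": "The payment due date.",
        "required": False,
    },
]
print(format_fields(fields))
```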
fields_text = self.format_fields_for_prompt(fields)
# Output:
# - vendor_name (string, required): The name of the vendor or supplier.
# - invoice_number (string, required): The unique invoice identifier.
# - total_amount (number, required): The total amount due, including taxes.
prompt = f"""Extract the following fields from this document:
{fields_text}
Document content:
{document.content[:4000]}
Return a JSON object with the field values."""
result = self.prompt_llm(prompt)

get_credential()
Retrieves a stored credential by name. Only available in SourcePlugin. The credential is decrypted at retrieval time.
cred = self.get_credential("accounts-payable-imap")
# cred.host -> "imap.company.com"
# cred.port -> 993
# cred.username -> "ap@company.com"
# cred.password -> "decrypted-password"

log()
Writes a log entry that is captured in the job execution log. Available levels: debug, info, warning, error.
self.log("info", f"Processing document: {document.filename}")
self.log("warning", "Document content is shorter than expected.")
self.log("error", f"LLM returned invalid JSON: {result}")

get_config()
Retrieves a configuration value set for this plugin instance in the pipeline. This is how pipeline-level parameters are passed to plugins.
threshold = self.get_config("confidence_threshold", 0.8)
max_pages = self.get_config("max_pages", 50)

Configurable Parameters Pattern
Plugins can declare configurable parameters as class attributes with default values. These parameters can be overridden per-pipeline through the pipeline configuration.
class InvoiceExtractor(ExtractionPlugin):
    name = "invoice-extractor"
    version = "2.1.0"

    # Configurable parameters with defaults
    confidence_threshold: float = 0.8
    max_content_length: int = 8000
    include_line_items: bool = True
    supported_currencies: list[str] = ["USD", "EUR", "GBP"]

    def extract(self, document, fields) -> dict:
        # Access parameters via self or get_config
        threshold = self.get_config("confidence_threshold", self.confidence_threshold)
        max_len = self.get_config("max_content_length", self.max_content_length)
        ...

When registering the plugin, declare the config schema so the platform can validate pipeline configurations:
{
  "config_schema": {
    "confidence_threshold": {
      "type": "number",
      "default": 0.8,
      "min": 0.0,
      "max": 1.0,
      "description": "Minimum confidence score to accept an extraction."
    },
    "max_content_length": {
      "type": "integer",
      "default": 8000,
      "description": "Maximum characters of document content to process."
    },
    "include_line_items": {
      "type": "boolean",
      "default": true,
      "description": "Whether to extract individual line items."
    }
  }
}

Return Types
Each plugin type has a strict return type contract:
| Plugin Type | Return Type | Validation |
|---|---|---|
| ClassificationPlugin | str | Must be a non-empty string. Should match a registered ontology taxonomy for extraction to proceed. |
| ExtractionPlugin | dict[str, Any] | Keys must correspond to field names defined in the ontology. Values are validated against field types (string, number, date, boolean, array). |
| SourcePlugin | list[RawDocument] | Each RawDocument must have non-empty content (bytes), a filename (str), and a mime_type (str). |
| AggregationPlugin | dict[str, Any] | Free-form dictionary. Structure is plugin-defined. Stored in the job results. |
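
The checks in this table can be approximated locally before a plugin is deployed. The sketch below is one possible reading of the rules, not the platform's validator: `RawDocument` here is a hypothetical dataclass mirroring the documented attributes, the `check_*` helpers are invented names, and treating a missing required field as an error is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RawDocument:
    """Hypothetical stand-in mirroring the documented RawDocument attributes."""
    content: bytes
    filename: str
    mime_type: str
    metadata: dict = field(default_factory=dict)


def check_classification(result: Any) -> list[str]:
    if not isinstance(result, str) or not result:
        return ["classification result must be a non-empty str"]
    return []


def check_extraction(result: Any, fields: list[dict]) -> list[str]:
    if not isinstance(result, dict):
        return ["extraction result must be a dict"]
    known = {f["name"] for f in fields}
    errors = [f"unknown field: {key}" for key in result if key not in known]
    # Assumption: a required field that is absent counts as a violation.
    errors += [
        f"missing required field: {f['name']}"
        for f in fields
        if f.get("required") and f["name"] not in result
    ]
    return errors


def check_source(result: Any) -> list[str]:
    if not isinstance(result, list):
        return ["source result must be a list"]
    errors = []
    for i, doc in enumerate(result):
        if not isinstance(doc.content, bytes) or not doc.content:
            errors.append(f"document {i}: content must be non-empty bytes")
        if not isinstance(doc.filename, str):
            errors.append(f"document {i}: filename must be a str")
        if not isinstance(doc.mime_type, str):
            errors.append(f"document {i}: mime_type must be a str")
    return errors
```

Each helper returns a list of problems rather than raising, so a test suite can report every violation in one pass.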
Error Handling
Use PluginError for expected failures. Set retryable=True if the error is transient (e.g., network timeout) and the platform should retry the document.
from bizsupply_sdk import PluginError

# Non-retryable error: document is fundamentally unprocessable
raise PluginError(
    "Document has no extractable text content.",
    retryable=False,
)

# Retryable error: transient failure, try again
raise PluginError(
    "LLM service timed out.",
    retryable=True,
)

Unhandled exceptions (anything other than PluginError) cause the entire job to fail immediately. Always catch unexpected errors and wrap them in PluginError with appropriate retryable flags.
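
One way to apply this advice is a small wrapper around each risky step. The sketch below is illustrative only: the `PluginError` class is a local stand-in so the example runs without the SDK (its `retryable=False` default is an assumption), `run_guarded` is a hypothetical helper, and the choice of which exception types count as transient is also an assumption.

```python
import json


class PluginError(Exception):
    """Local stand-in with the documented `retryable` flag; the real class is in bizsupply_sdk."""

    def __init__(self, message: str, retryable: bool = False):
        super().__init__(message)
        self.retryable = retryable


# Assumption: timeouts and connection problems are treated as transient.
TRANSIENT = (TimeoutError, ConnectionError)


def run_guarded(step, *args):
    """Call `step`, letting PluginError through and wrapping everything else."""
    try:
        return step(*args)
    except PluginError:
        raise  # already in the expected form
    except TRANSIENT as exc:
        raise PluginError(f"Transient failure: {exc}", retryable=True) from exc
    except Exception as exc:
        raise PluginError(f"Unexpected error: {exc}", retryable=False) from exc


# Example: parsing an LLM reply that is not valid JSON.
try:
    run_guarded(json.loads, "not json")
except PluginError as err:
    print(f"retryable={err.retryable}: {err}")
```

Inside an extract() method, the same helper could wrap the LLM call and the JSON parsing, so that no bare exception escapes the plugin.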