Create a Classification Plugin
Build a classification plugin that categorizes documents with labels using LLM-powered analysis and hierarchical taxonomies.
Classification plugins analyze document content and assign a category label (one per taxonomy level when the taxonomy is hierarchical). They are the first processing stage in most pipelines: downstream extraction and aggregation steps depend on accurate classification to select the correct ontology and field definitions.
What Classification Plugins Do
A classification plugin receives a Document object and returns a string label that categorizes the document. This label determines which extraction ontology the platform applies next. For example, a document classified as "invoice" routes to the invoice ontology, while a "contract" routes to the contract ontology.
- Categorize documents into predefined types (invoice, receipt, contract, purchase order, etc.)
- Support hierarchical classification with multi-level taxonomies
- Use LLM analysis, rule-based logic, or a combination of both
- Return a single string label that maps to a registered ontology taxonomy
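The "combination of both" approach in the list above can be sketched as a rule pass that settles unambiguous documents, with an LLM fallback for everything else. This is a minimal, standalone sketch under stated assumptions: `KEYWORD_RULES`, `classify_hybrid`, and the `llm_fallback` callable are hypothetical names, not part of the bizSupply SDK. Inside a real plugin, the fallback would presumably wrap a call to self.prompt_llm().

```python
from typing import Callable

# Hypothetical keyword rules: label -> phrases that strongly suggest it.
KEYWORD_RULES: dict[str, tuple[str, ...]] = {
    "invoice": ("invoice number", "amount due"),
    "purchase_order": ("purchase order", "po number"),
    "receipt": ("payment received", "thank you for your purchase"),
}


def classify_hybrid(content: str, llm_fallback: Callable[[str], str]) -> str:
    """Return a label from keyword rules, or defer to the LLM fallback."""
    text = content.lower()
    matches = [
        label
        for label, phrases in KEYWORD_RULES.items()
        if any(phrase in text for phrase in phrases)
    ]
    # Trust the rules only when exactly one label matches; anything
    # ambiguous (zero or several matches) goes to the LLM.
    if len(matches) == 1:
        return matches[0]
    return llm_fallback(content)


# Stand-in for an LLM call so the sketch runs offline.
def fake_llm(content: str) -> str:
    return "contract"


print(classify_hybrid("Invoice number: 1042. Amount due: $300", fake_llm))  # invoice
print(classify_hybrid("This agreement is made between two parties", fake_llm))  # contract
```

Sending ambiguous matches to the LLM rather than picking the first rule keeps the rule pass conservative, at the cost of more LLM calls.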
Prerequisites
- Python 3.10+ installed
- bizSupply SDK installed: pip install bizsupply-sdk
- A valid API key with plugin registration permissions
- A registered ontology with taxonomies matching your classification labels (optional but recommended)
Step 1 — Write the Plugin Code
Create a new Python file and implement your classification logic by extending ClassificationPlugin. The classify() method is the only required method — it receives a Document and must return a string.
from bizsupply_sdk import ClassificationPlugin, PluginError


class InvoiceClassifierPlugin(ClassificationPlugin):
    """Classifies financial documents into invoice, purchase_order,
    receipt, contract, credit_note, or unknown."""

    name = "invoice-classifier"
    version = "1.0.0"
    description = "Classifies financial documents using LLM analysis."

    # Configurable parameters (confidence_threshold is not used by classify() below)
    confidence_threshold: float = 0.8
    max_content_length: int = 6000

    VALID_TYPES = {
        "invoice", "purchase_order", "receipt",
        "contract", "credit_note", "unknown",
    }

    def classify(self, document) -> str:
        """
        Analyze the document and return a classification label.

        Args:
            document: Document object with .content, .filename,
                .mime_type, and .metadata attributes.

        Returns:
            A string label (e.g., "invoice").
        """
        if not document.content or not document.content.strip():
            raise PluginError(
                "Document has no extractable text content.",
                retryable=False,
            )

        # Retrieve the classification prompt from the platform
        prompt_template = self.get_prompt("financial-doc-classifier")

        # Build the final prompt with document context
        prompt = prompt_template.replace(
            "{{DOCUMENT_CONTENT}}",
            document.content[:self.max_content_length],
        ).replace(
            "{{FILENAME}}",
            document.filename,
        )

        # Call the LLM
        result = self.prompt_llm(prompt, temperature=0.1)
        doc_type = result.strip().lower().replace(" ", "_")

        # Validate the result against known types
        if doc_type not in self.VALID_TYPES:
            self.log("warning", f"LLM returned unknown type '{doc_type}', defaulting to 'unknown'.")
            return "unknown"

        self.log("info", f"Classified '{document.filename}' as '{doc_type}'.")
        return doc_type

Use self.get_prompt() to load reusable prompt templates instead of hardcoding prompts in your plugin. This lets you update prompts without redeploying code. See the Create a Prompt guide for details.
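LLM replies do not always match the requested format exactly. A reply such as "Purchase Order." would fail the VALID_TYPES check after the simple strip().lower().replace() chain in classify() above, because the trailing period survives. A more forgiving normalization step might look like the following sketch. `normalize_label` is a hypothetical helper, not an SDK function, and the label set simply mirrors VALID_TYPES.

```python
import re

VALID_TYPES = frozenset({
    "invoice", "purchase_order", "receipt",
    "contract", "credit_note", "unknown",
})


def normalize_label(raw: str, valid: frozenset[str] = VALID_TYPES) -> str:
    """Map a raw LLM reply to a known label, or 'unknown'."""
    # Lowercase, then collapse every run of non-alphanumerics into "_".
    label = re.sub(r"[^a-z0-9]+", "_", raw.lower()).strip("_")
    return label if label in valid else "unknown"


print(normalize_label("  Purchase Order.\n"))     # purchase_order
print(normalize_label("Credit-Note"))             # credit_note
print(normalize_label("I think it is a memo"))    # unknown
```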
Step 2 — Create a Classification Prompt
Register a reusable prompt template that your plugin will load at runtime. This keeps prompt engineering separate from plugin code.
curl -X POST https://api.bizsupply.com/v1/prompts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "financial-doc-classifier",
    "scope": "tenant",
    "template": "You are a document classification expert.\n\nAnalyze the following document and classify it as exactly one of:\n- invoice\n- purchase_order\n- receipt\n- contract\n- credit_note\n- unknown\n\nDocument filename: {{FILENAME}}\n\nDocument content:\n{{DOCUMENT_CONTENT}}\n\nRespond with ONLY the document type label. No explanation, no punctuation."
  }'

Step 3 — Validate and Register
Validate your plugin locally, then register it with the platform.

# Validate the plugin structure
bizsupply validate ./plugin.py
# ✓ Plugin class found: InvoiceClassifierPlugin
# ✓ Base class: ClassificationPlugin
# ✓ Required method implemented: classify
# ✓ Return type annotation: str
# All checks passed.

# Test with a sample document
bizsupply test ./plugin.py --document sample-invoice.pdf
# ✓ classify() returned: "invoice"
# ✓ Return type: str (valid)
# ✓ Execution time: 1.2s

# Register with the platform
curl -X POST https://api.bizsupply.com/v1/plugins \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "invoice-classifier",
    "type": "classification",
    "version": "1.0.0",
    "description": "Classifies financial documents using LLM analysis.",
    "module_path": "invoice_classifier.plugin.InvoiceClassifierPlugin",
    "config_schema": {
      "confidence_threshold": {
        "type": "number",
        "default": 0.8,
        "description": "Minimum confidence score."
      },
      "max_content_length": {
        "type": "integer",
        "default": 6000,
        "description": "Max characters of content to send to the LLM."
      }
    }
  }'

Key Methods
Classification plugins implement classify() and can call the following platform service methods:
| Method | Signature | Description |
|---|---|---|
| classify | classify(self, document) -> str | Required. Analyzes the document and returns a classification label string. |
| prompt_llm | self.prompt_llm(prompt, model?, temperature?, max_tokens?) | Sends a prompt to the LLM and returns the text response. |
| get_prompt | self.get_prompt(name) -> str | Loads a registered prompt template by name. |
| format_fields_for_prompt | self.format_fields_for_prompt(fields) -> str | Formats ontology fields into a prompt-friendly string. |
| log | self.log(level, message) | Writes to the job execution log (debug, info, warning, error). |
| get_config | self.get_config(key, default?) | Retrieves a pipeline-level configuration value. |
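To exercise classify() logic without a platform connection, one option is a small test double that mimics the service methods in the table. Everything below is an assumption for illustration: `FakeClassificationPlugin` and `TinyClassifier` are hypothetical stand-ins, not SDK classes, and their behavior (canned LLM replies, in-memory prompts) only approximates the signatures listed above.

```python
from types import SimpleNamespace


class FakeClassificationPlugin:
    """Hypothetical stand-in for the SDK base class, for offline tests."""

    def __init__(self, prompts=None, llm_reply="unknown", config=None):
        self._prompts = prompts or {}
        self._llm_reply = llm_reply
        self._config = config or {}
        self.logs = []          # (level, message) tuples
        self.sent_prompts = []  # prompts passed to prompt_llm

    def get_prompt(self, name):
        return self._prompts[name]

    def prompt_llm(self, prompt, model=None, temperature=None, max_tokens=None):
        self.sent_prompts.append(prompt)
        return self._llm_reply  # canned reply, no network call

    def log(self, level, message):
        self.logs.append((level, message))

    def get_config(self, key, default=None):
        return self._config.get(key, default)


class TinyClassifier(FakeClassificationPlugin):
    """Deliberately small classifier used only to demonstrate the double."""

    def classify(self, document) -> str:
        template = self.get_prompt("financial-doc-classifier")
        prompt = template.replace("{{DOCUMENT_CONTENT}}", document.content)
        label = self.prompt_llm(prompt, temperature=0.1).strip().lower()
        self.log("info", f"Classified as '{label}'.")
        return label


plugin = TinyClassifier(
    prompts={"financial-doc-classifier": "Classify: {{DOCUMENT_CONTENT}}"},
    llm_reply=" Invoice ",
)
doc = SimpleNamespace(content="Amount due: $300", filename="a.pdf")
print(plugin.classify(doc))    # invoice
print(plugin.sent_prompts[0])  # Classify: Amount due: $300
```

Recording the prompts and log entries on the double makes it possible to assert on what the plugin sent, not just on what it returned.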
Hierarchical Classification
For complex taxonomies, the platform supports hierarchical classification with up to three levels. The Engine traverses the hierarchy top-down, calling your classifier at each level with a narrowed scope.
Consider this three-level taxonomy:
Level 1: Financial
  Level 2: Accounts Payable
    Level 3: Invoice
    Level 3: Credit Note
    Level 3: Debit Note
  Level 2: Accounts Receivable
    Level 3: Sales Invoice
    Level 3: Receipt
Level 1: Legal
  Level 2: Contracts
    Level 3: Service Agreement
    Level 3: NDA
    Level 3: Employment Contract

The Engine processes this as follows:
- Level 1 — The Engine calls classify() with the full document. Your plugin returns "financial" or "legal".
- Level 2 — The Engine narrows the taxonomy to the Level 1 result (e.g., Financial) and calls classify() again with the sub-categories. Your plugin returns "accounts_payable" or "accounts_receivable".
- Level 3 — The Engine narrows again to Level 2 and calls classify() a final time. Your plugin returns the specific document type (e.g., "invoice").
At each level the Engine passes the available categories as document.metadata["available_categories"]. Your prompt should reference this list rather than hardcoding categories, so the same plugin works at every level.
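A level-agnostic classifier along these lines could rebuild its category list from document.metadata["available_categories"] on every call. The sketch below is standalone and makes several assumptions: that the list holds label strings such as "financial", and that a reply outside the list should fall back to "unknown". `build_prompt`, `pick_category`, and the simulated loop are hypothetical and only imitate the Engine's top-down traversal.

```python
from types import SimpleNamespace


def build_prompt(document) -> str:
    """Build a prompt from whatever categories the current level offers."""
    categories = document.metadata["available_categories"]
    options = "\n".join(f"- {c}" for c in categories)
    return (
        "Classify the document as exactly one of:\n"
        f"{options}\n\n"
        f"Document content:\n{document.content}"
    )


def pick_category(document, llm) -> str:
    """Ask the LLM, then accept only a label offered at this level."""
    reply = llm(build_prompt(document)).strip().lower().replace(" ", "_")
    allowed = document.metadata["available_categories"]
    return reply if reply in allowed else "unknown"


# Simulated traversal: the categories narrow at each level.
levels = [
    ["financial", "legal"],
    ["accounts_payable", "accounts_receivable"],
    ["invoice", "credit_note", "debit_note"],
]
canned = iter(["Financial", "accounts payable", "invoice"])  # fake LLM replies
path = []
for categories in levels:
    doc = SimpleNamespace(
        content="Invoice #1042, amount due $300",
        metadata={"available_categories": categories},
    )
    path.append(pick_category(doc, lambda prompt: next(canned)))

print(path)  # ['financial', 'accounts_payable', 'invoice']
```

Validating against the per-level list, rather than a class-level constant, is what lets the same code run unchanged at Level 1, 2, and 3.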
Common Mistakes
1. Using the old execute() method
ClassificationPlugin requires classify(), not execute(). The generic execute() method was removed in SDK 1.0.
# WRONG: execute() is not recognized
class MyClassifier(ClassificationPlugin):
    def execute(self, document):
        return "invoice"

# CORRECT: use classify()
class MyClassifier(ClassificationPlugin):
    def classify(self, document) -> str:
        return "invoice"

2. Returning a list instead of a string
# WRONG: classify() must return a single string
def classify(self, document) -> str:
    return ["invoice", "financial"]

# CORRECT: return one label
def classify(self, document) -> str:
    return "invoice"

3. Missing bizsupply_sdk import
# WRONG: missing import causes registration failure
class MyClassifier(ClassificationPlugin):  # NameError
    ...

# CORRECT: import the base class
from bizsupply_sdk import ClassificationPlugin

class MyClassifier(ClassificationPlugin):
    ...

4. Adding await in async context
If your plugin runs in an async pipeline context, ensure you use the synchronous service methods provided by the SDK. The SDK handles async/sync bridging internally — do not wrap calls in await.
# WRONG: prompt_llm is synchronous from the plugin's perspective
result = await self.prompt_llm(prompt)

# CORRECT: call it directly
result = self.prompt_llm(prompt)

Next Steps
- Create an Extraction Plugin to pull structured data from classified documents.
- Create a Prompt to manage your classification prompt templates separately.
- Create a Pipeline to wire your classifier into a full document processing flow.
- Create a Benchmark to measure and compare classification accuracy.