Create a Classification Plugin

Build a classification plugin that categorizes documents with labels using LLM-powered analysis and hierarchical taxonomies.

Last updated: 2026-04-06

Classification plugins analyze document content and assign a category label. They are the first processing stage in most pipelines: downstream extraction and aggregation steps depend on accurate classification to select the correct ontology and field definitions.


What Classification Plugins Do

A classification plugin receives a Document object and returns a string label that categorizes the document. This label determines which extraction ontology the platform applies next. For example, a document classified as "invoice" routes to the invoice ontology, while a "contract" routes to the contract ontology.

  • Categorize documents into predefined types (invoice, receipt, contract, purchase order, etc.)
  • Support hierarchical classification with multi-level taxonomies
  • Use LLM analysis, rule-based logic, or a combination of both
  • Return a single string label that maps to a registered ontology taxonomy
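
As a rough illustration of combining rule-based logic with an LLM, the sketch below tries a few keyword rules first and defers to a caller-supplied fallback when the rules are silent or ambiguous. The names KEYWORD_RULES, rule_based_label, and classify_hybrid are hypothetical and not part of the bizSupply SDK; inside a real plugin the fallback would presumably wrap a call to self.prompt_llm().

```python
import re
from typing import Callable, Optional

# Hypothetical keyword rules; tune these for your own documents.
KEYWORD_RULES = {
    "invoice": re.compile(r"\binvoice\s+(no|number|#)", re.IGNORECASE),
    "purchase_order": re.compile(r"\bpurchase\s+order\b", re.IGNORECASE),
    "receipt": re.compile(r"\breceipt\b", re.IGNORECASE),
}


def rule_based_label(text: str) -> Optional[str]:
    """Return a label only when exactly one rule matches, else None."""
    hits = [label for label, pattern in KEYWORD_RULES.items() if pattern.search(text)]
    return hits[0] if len(hits) == 1 else None


def classify_hybrid(text: str, llm_fallback: Callable[[str], str]) -> str:
    """Try cheap rules first; defer silent or ambiguous cases to the fallback."""
    label = rule_based_label(text)
    return label if label is not None else llm_fallback(text)
```

Requiring exactly one matching rule is a deliberately conservative choice: a document that mentions both a purchase order and an invoice number goes to the fallback rather than being decided by rule order.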

Prerequisites

  1. Python 3.10+ installed
  2. bizSupply SDK installed: pip install bizsupply-sdk
  3. A valid API key with plugin registration permissions
  4. A registered ontology with taxonomies matching your classification labels (optional but recommended)

Step 1 — Write the Plugin Code

Create a new Python file and implement your classification logic by extending ClassificationPlugin. The classify() method is the only required method — it receives a Document and must return a string.

invoice_classifier/plugin.py
python
from bizsupply_sdk import ClassificationPlugin, PluginError


class InvoiceClassifierPlugin(ClassificationPlugin):
    """Classifies financial documents into invoice, purchase_order,
    receipt, contract, credit_note, or unknown."""

    name = "invoice-classifier"
    version = "1.0.0"
    description = "Classifies financial documents using LLM analysis."

    # Configurable parameters
    confidence_threshold: float = 0.8
    max_content_length: int = 6000

    VALID_TYPES = {
        "invoice", "purchase_order", "receipt",
        "contract", "credit_note", "unknown",
    }

    def classify(self, document) -> str:
        """
        Analyze the document and return a classification label.

        Args:
            document: Document object with .content, .filename,
                      .mime_type, and .metadata attributes.

        Returns:
            A string label (e.g., "invoice").
        """
        if not document.content or not document.content.strip():
            raise PluginError(
                "Document has no extractable text content.",
                retryable=False,
            )

        # Retrieve the classification prompt from the platform
        prompt_template = self.get_prompt("financial-doc-classifier")

        # Build the final prompt with document context
        prompt = prompt_template.replace(
            "{{DOCUMENT_CONTENT}}",
            document.content[:self.max_content_length],
        ).replace(
            "{{FILENAME}}",
            document.filename,
        )

        # Call the LLM
        result = self.prompt_llm(prompt, temperature=0.1)
        doc_type = result.strip().lower().replace(" ", "_")

        # Validate the result against known types
        if doc_type not in self.VALID_TYPES:
            self.log("warning", f"LLM returned unknown type '{doc_type}', defaulting to 'unknown'.")
            return "unknown"

        self.log("info", f"Classified '{document.filename}' as '{doc_type}'.")
        return doc_type
💡Tip

Use self.get_prompt() to load reusable prompt templates instead of hardcoding prompts in your plugin. This lets you update prompts without redeploying code. See the Create a Prompt guide for details.
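
LLM responses sometimes carry stray quotes, trailing punctuation, or hyphens even when the prompt asks for a bare label. The sketch below shows a slightly more defensive normalization than strip().lower().replace(" ", "_"). The helper normalize_label is hypothetical and not part of the SDK; the label set is copied from the plugin above.

```python
import re

VALID_TYPES = {
    "invoice", "purchase_order", "receipt",
    "contract", "credit_note", "unknown",
}


def normalize_label(raw: str, valid_types: set[str] = VALID_TYPES) -> str:
    """Map a raw LLM response onto a known label, or 'unknown'."""
    # Lowercase, then collapse every run of non-alphanumerics into "_".
    cleaned = re.sub(r"[^a-z0-9]+", "_", raw.strip().lower()).strip("_")
    return cleaned if cleaned in valid_types else "unknown"
```

A response such as "Credit-Note" or a quoted "Purchase Order." would then resolve to a valid label, while a full sentence still falls through to "unknown".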


Step 2 — Create a Classification Prompt

Register a reusable prompt template that your plugin will load at runtime. This keeps prompt engineering separate from plugin code.

bash
curl -X POST https://api.bizsupply.com/v1/prompts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "financial-doc-classifier",
    "scope": "tenant",
    "template": "You are a document classification expert.\n\nAnalyze the following document and classify it as exactly one of:\n- invoice\n- purchase_order\n- receipt\n- contract\n- credit_note\n- unknown\n\nDocument filename: {{FILENAME}}\n\nDocument content:\n{{DOCUMENT_CONTENT}}\n\nRespond with ONLY the document type label. No explanation, no punctuation."
  }'
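
Hand-escaping newlines inside a JSON string is easy to get wrong. One alternative is to keep the template as an ordinary multi-line string and let json.dumps do the escaping. The sketch below builds the same request body as the curl command above; build_prompt_payload is a hypothetical helper, and the script only prints the payload without contacting the API.

```python
import json

TEMPLATE = """You are a document classification expert.

Analyze the following document and classify it as exactly one of:
- invoice
- purchase_order
- receipt
- contract
- credit_note
- unknown

Document filename: {{FILENAME}}

Document content:
{{DOCUMENT_CONTENT}}

Respond with ONLY the document type label. No explanation, no punctuation."""


def build_prompt_payload(name: str, template: str, scope: str = "tenant") -> str:
    """Serialize the prompt registration body; json.dumps handles escaping."""
    # Fail early if a placeholder the plugin relies on is missing.
    for placeholder in ("{{FILENAME}}", "{{DOCUMENT_CONTENT}}"):
        if placeholder not in template:
            raise ValueError(f"Template is missing {placeholder}")
    return json.dumps({"name": name, "scope": scope, "template": template})


if __name__ == "__main__":
    print(build_prompt_payload("financial-doc-classifier", TEMPLATE))
```

The printed JSON can be saved to a file or piped to curl with -d @- in place of the inline body.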

Step 3 — Validate and Register

Validate your plugin locally, then register it with the platform.

bash
# Validate the plugin structure
bizsupply validate ./plugin.py
# ✓ Plugin class found: InvoiceClassifierPlugin
# ✓ Base class: ClassificationPlugin
# ✓ Required method implemented: classify
# ✓ Return type annotation: str
# All checks passed.

# Test with a sample document
bizsupply test ./plugin.py --document sample-invoice.pdf
# ✓ classify() returned: "invoice"
# ✓ Return type: str (valid)
# ✓ Execution time: 1.2s

# Register with the platform
curl -X POST https://api.bizsupply.com/v1/plugins \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "invoice-classifier",
    "type": "classification",
    "version": "1.0.0",
    "description": "Classifies financial documents using LLM analysis.",
    "module_path": "invoice_classifier.plugin.InvoiceClassifierPlugin",
    "config_schema": {
      "confidence_threshold": {
        "type": "number",
        "default": 0.8,
        "description": "Minimum confidence score."
      },
      "max_content_length": {
        "type": "integer",
        "default": 6000,
        "description": "Max characters of content to send to the LLM."
      }
    }
  }'
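
The config_schema declares confidence_threshold, but the Step 1 plugin returns whatever label the LLM produces and never sees a score. One way the threshold might be applied is sketched below, under the assumption that the prompt is changed to request a JSON object such as {"label": "invoice", "confidence": 0.93}. Both that response format and the apply_threshold helper are illustrative assumptions, not SDK behavior.

```python
import json


def apply_threshold(raw_response: str, threshold: float = 0.8) -> str:
    """Return the label only if the reported confidence meets the threshold."""
    try:
        parsed = json.loads(raw_response)
        label = str(parsed["label"]).strip().lower()
        confidence = float(parsed["confidence"])
    except (ValueError, KeyError, TypeError):
        # Malformed or incomplete response: treat as unclassifiable.
        return "unknown"
    return label if confidence >= threshold else "unknown"
```

A self-reported LLM confidence is only a rough signal, so a threshold like this is probably best tuned against labeled samples rather than trusted at face value.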

Key Methods

Classification plugins have access to these platform service methods:

| Method | Signature | Description |
|---|---|---|
| classify | classify(self, document) -> str | Required. Analyzes the document and returns a classification label string. |
| prompt_llm | self.prompt_llm(prompt, model?, temperature?, max_tokens?) | Sends a prompt to the LLM and returns the text response. |
| get_prompt | self.get_prompt(name) -> str | Loads a registered prompt template by name. |
| format_fields_for_prompt | self.format_fields_for_prompt(fields) -> str | Formats ontology fields into a prompt-friendly string. |
| log | self.log(level, message) | Writes to the job execution log (debug, info, warning, error). |
| get_config | self.get_config(key, default?) | Retrieves a pipeline-level configuration value. |

Hierarchical Classification

For complex taxonomies, the platform supports hierarchical classification with up to three levels. The Engine traverses the hierarchy top-down, calling your classifier at each level with a narrowed scope.

Consider this three-level taxonomy:

text
Level 1: Financial
  Level 2: Accounts Payable
    Level 3: Invoice
    Level 3: Credit Note
    Level 3: Debit Note
  Level 2: Accounts Receivable
    Level 3: Sales Invoice
    Level 3: Receipt
Level 1: Legal
  Level 2: Contracts
    Level 3: Service Agreement
    Level 3: NDA
    Level 3: Employment Contract

The Engine processes this as follows:

  1. Level 1 — The Engine calls classify() with the full document. Your plugin returns "financial" or "legal".
  2. Level 2 — The Engine narrows the taxonomy to the Level 1 result (e.g., Financial) and calls classify() again with the sub-categories. Your plugin returns "accounts_payable" or "accounts_receivable".
  3. Level 3 — The Engine narrows again to Level 2 and calls classify() a final time. Your plugin returns the specific document type (e.g., "invoice").
ℹ️Note

At each level the Engine passes the available categories as document.metadata["available_categories"]. Your prompt should reference this list rather than hardcoding categories, so the same plugin works at every level.
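
To make the traversal concrete, the sketch below simulates the three levels locally. It is not the Engine's implementation: TAXONOMY, build_level_prompt, traverse, and stub_classify are hypothetical names, and the only detail taken from the platform is that the categories arrive under metadata["available_categories"].

```python
from typing import Callable

# Hypothetical nested taxonomy mirroring the example above.
TAXONOMY = {
    "financial": {
        "accounts_payable": {"invoice": {}, "credit_note": {}, "debit_note": {}},
        "accounts_receivable": {"sales_invoice": {}, "receipt": {}},
    },
    "legal": {
        "contracts": {"service_agreement": {}, "nda": {}, "employment_contract": {}},
    },
}


def build_level_prompt(content: str, metadata: dict) -> str:
    """Build a prompt from the categories available at the current level."""
    options = "\n".join(f"- {c}" for c in metadata["available_categories"])
    return (
        f"Classify the document as exactly one of:\n{options}\n\n"
        f"Document content:\n{content}"
    )


def traverse(taxonomy: dict, content: str,
             classify: Callable[[str, dict], str]) -> list[str]:
    """Walk the taxonomy top-down, calling classify once per level."""
    path, node = [], taxonomy
    while node:
        metadata = {"available_categories": sorted(node)}
        label = classify(content, metadata)
        if label not in node:
            break  # Unrecognized label: stop descending.
        path.append(label)
        node = node[label]
    return path


def stub_classify(content: str, metadata: dict) -> str:
    """Stand-in for an LLM call: pick the first category named in the text."""
    text = content.lower()
    for category in metadata["available_categories"]:
        if category.replace("_", " ") in text:
            return category
    return "unknown"
```

In a real plugin, classify() would presumably pass the output of something like build_level_prompt to self.prompt_llm() instead of using the keyword stub, so the same code serves every level.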


Common Mistakes

1. Using the old execute() method

ClassificationPlugin requires classify(), not execute(). The generic execute() method was removed in SDK 1.0.

python
# WRONG — execute() is not recognized
class MyClassifier(ClassificationPlugin):
    def execute(self, document):
        return "invoice"

# CORRECT — use classify()
class MyClassifier(ClassificationPlugin):
    def classify(self, document) -> str:
        return "invoice"

2. Returning a list instead of a string

python
# WRONG — classify() must return a single string
def classify(self, document) -> str:
    return ["invoice", "financial"]

# CORRECT — return one label
def classify(self, document) -> str:
    return "invoice"

3. Missing bizsupply_sdk import

python
# WRONG — missing import causes registration failure
class MyClassifier(ClassificationPlugin):  # NameError
    ...

# CORRECT — import the base class
from bizsupply_sdk import ClassificationPlugin

class MyClassifier(ClassificationPlugin):
    ...

4. Using await on SDK service methods

Even if your plugin runs in an async pipeline context, call the SDK service methods synchronously. The SDK handles async/sync bridging internally, so do not wrap calls in await.

python
# WRONG — prompt_llm is synchronous from the plugin's perspective
result = await self.prompt_llm(prompt)

# CORRECT — call it directly
result = self.prompt_llm(prompt)

Next Steps

  • Create an Extraction Plugin to pull structured data from classified documents.
  • Create a Prompt to manage your classification prompt templates separately.
  • Create a Pipeline to wire your classifier into a full document processing flow.
  • Create a Benchmark to measure and compare classification accuracy.