Skip duplicates, or pay for the same document four times a day

A scheduled pipeline without state tracking re-processes the same documents on every run. State tracking is the difference between paying once and paying every six hours.

A common pattern: a pipeline runs every six hours, points at the same source, and processes whatever it finds. Without state tracking, "whatever it finds" includes the same hundred documents it found six hours ago, and six hours before that. Four runs a day, the same hundred documents, five credits per extraction — that is two thousand credits a day spent confirming nothing changed.
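
The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name and parameters are made up for illustration, and the figures are the ones from the example, not measured platform pricing.

```python
def wasted_credits_per_day(runs_per_day: int, docs_per_run: int, credits_per_doc: int) -> int:
    """Credits spent per day re-extracting documents that have not changed."""
    return runs_per_day * docs_per_run * credits_per_doc


# Figures from the example: a run every six hours, 100 documents, 5 credits each.
daily = wasted_credits_per_day(runs_per_day=4, docs_per_run=100, credits_per_doc=5)
print(daily)  # 2000
```

Plugging in your own schedule and document counts gives a quick estimate of what a missing state check costs.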

Two layers of protection

bizSupply gives you two independent ways to avoid this, and you should use both:

  • Source-level state tracking. A well-written source plugin records its position (last UID, last run timestamp, last cursor) and only fetches documents added since then. Most built-in connectors do this; custom plugins should follow the has_new_data() + state-update pattern from the docs.
  • Pipeline-level skip_duplicates: true. A safety net that prevents reprocessing documents the platform has already seen, even if the source plugin returns them again.
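
To make the two layers concrete, here is a minimal sketch in Python of how they might fit together. It is an illustration under assumptions, not the bizSupply plugin interface: only the name has_new_data() comes from the docs, while InMemorySource, SourceState, fetch_new(), and run_pipeline() are hypothetical, and the seen_uids set merely stands in for what skip_duplicates: true does on the platform side.

```python
from dataclasses import dataclass


@dataclass
class Document:
    uid: int
    body: str


@dataclass
class SourceState:
    last_uid: int = 0  # position that would be persisted between runs


class InMemorySource:
    """Hypothetical source plugin; NOT the real bizSupply plugin interface."""

    def __init__(self, documents: list[Document]):
        self.documents = documents
        self.state = SourceState()

    def has_new_data(self) -> bool:
        # Layer 1: cheap check against the recorded position.
        return any(d.uid > self.state.last_uid for d in self.documents)

    def fetch_new(self) -> list[Document]:
        new = sorted(
            (d for d in self.documents if d.uid > self.state.last_uid),
            key=lambda d: d.uid,
        )
        if new:
            self.state.last_uid = new[-1].uid  # state update
        return new


def run_pipeline(source: InMemorySource, seen_uids: set[int]) -> int:
    """Run once and return how many documents were actually processed."""
    if not source.has_new_data():
        return 0
    processed = 0
    for doc in source.fetch_new():
        if doc.uid in seen_uids:  # Layer 2: stand-in for skip_duplicates
            continue
        seen_uids.add(doc.uid)
        processed += 1  # the paid extraction would happen here
    return processed


source = InMemorySource([Document(1, "invoice"), Document(2, "receipt")])
seen: set[int] = set()

first = run_pipeline(source, seen)   # both documents are new
second = run_pipeline(source, seen)  # nothing changed, nothing processed
source.documents.append(Document(3, "contract"))
third = run_pipeline(source, seen)   # only the new document is processed
print(first, second, third)  # 2 0 1
```

In this sketch the state is advanced as soon as documents are fetched; a real plugin would more likely persist the new position only after processing succeeds, so that a failed run is retried rather than skipped.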

Schedules amplify mistakes

A pipeline run on demand and a pipeline run on a cron schedule behave very differently. A bug that costs you 500 credits when triggered manually costs you 60,000 credits a month when it runs four times a day (500 credits × 4 runs × 30 days). Set up state tracking before you set up the schedule.