cannectors

State management

Durable cursor storage, restart safety, backfills.

State persistence keeps a small per-pipeline JSON file on disk. The input reads it at startup and writes it after every successful batch — see State persistence for the YAML config. This page is about operating it.

Storage location

| Environment | Recommended storage path |
| --- | --- |
| Local dev | ./.cannectors-state (add to .gitignore) |
| Single host, systemd | /var/lib/cannectors/state (owned by the cannectors user) |
| Kubernetes | A PersistentVolumeClaim mounted at e.g. /state |
| Fly.io / Railway | An attached volume mounted at /state |
| Container with ephemeral disk | Don't. State must survive container replacement. |

If the storage disappears between runs, the pipeline restarts from scratch — possibly re-processing already-shipped records.

What the file looks like

One file per pipeline, named after pipeline.name:

/var/lib/cannectors/state/sync-orders.json
{
  "id": 4821,
  "timestamp": "2026-04-21T12:34:56Z"
}

Only the cursor types your statePersistence config enables show up. Don't hand-edit while the pipeline is running.

Backups

The state file is small (< 1 KB). Snapshot it with whatever you use for the rest of the volume:

# Linux, daily. Create the dated directory first; cp will not create it.
dest=/backup/cannectors-state/$(date +%F)
sudo mkdir -p "$dest"
sudo cp /var/lib/cannectors/state/*.json "$dest"/

For Kubernetes, a VolumeSnapshot on the PVC works the same way.

Restoring a state file is just dropping it back in place — the next run reads it.
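As a rough sketch of a restore, the helper below picks the newest backup of one pipeline's state file. It assumes the dated layout produced by the daily copy above (/backup/cannectors-state/YYYY-MM-DD/<pipeline>.json); the function name latest_backup is hypothetical and not part of cannectors.

```shell
# Print the path of the newest backup for a pipeline, or fail if none exists.
# Assumes backups live in date-named directories (YYYY-MM-DD), which sort
# chronologically when the shell expands the glob in alphabetical order.
latest_backup() {
  local backup_root="$1" pipeline="$2" candidate latest=""
  for candidate in "$backup_root"/*/"$pipeline".json; do
    # The last existing match is the most recent date.
    [ -f "$candidate" ] && latest="$candidate"
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Usage (stop the pipeline first, as with any manual change to the file):
#   sudo systemctl stop cannectors-orders
#   sudo cp "$(latest_backup /backup/cannectors-state sync-orders)" \
#     /var/lib/cannectors/state/sync-orders.json
#   sudo chown cannectors:cannectors /var/lib/cannectors/state/sync-orders.json
#   sudo systemctl start cannectors-orders
```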

Backfills

To re-process everything from scratch, delete the state file:

sudo systemctl stop cannectors-orders
sudo rm /var/lib/cannectors/state/sync-orders.json
sudo systemctl start cannectors-orders

To re-process from a specific point, seed the state file:

sudo systemctl stop cannectors-orders
sudo tee /var/lib/cannectors/state/sync-orders.json <<'EOF'
{ "timestamp": "2026-01-01T00:00:00Z" }
EOF
sudo chown cannectors:cannectors /var/lib/cannectors/state/sync-orders.json
sudo systemctl start cannectors-orders
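When the starting point is relative ("the last 30 days") rather than a fixed date, the seed can be generated instead of typed. The sketch below assumes GNU date and that the timestamp cursor uses the UTC format shown in the example file; seed_state_json is a hypothetical helper name.

```shell
# Print a seed state document whose timestamp cursor is the given point in time.
# Accepts anything GNU `date -d` understands, e.g. "30 days ago" or "2026-01-01".
seed_state_json() {
  local when="$1" ts
  # -u renders (and interprets) the time in UTC, matching the trailing "Z".
  ts=$(date -u -d "$when" +%Y-%m-%dT%H:%M:%SZ) || return 1
  printf '{ "timestamp": "%s" }\n' "$ts"
}

# Usage, with the pipeline stopped:
#   seed_state_json "30 days ago" | sudo tee /var/lib/cannectors/state/sync-orders.json
```

An unparseable date makes the function fail without printing anything, but tee still truncates the target file, so check the output before piping it into place.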

Monitoring

Two things are worth alerting on:

  • State file age: if the file's mtime hasn't changed in N CRON ticks, the pipeline isn't making progress. stat -c %Y state-file.json prints the mtime in seconds since the Unix epoch (GNU stat).
  • Cursor progress: if the cursor value in the file isn't advancing, the source might be returning empty pages even though new data exists. Diffing two snapshots of the file taken a few runs apart shows whether it moved.
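Both checks can be sketched as small shell functions to call from whatever runs your alerts. This is a minimal illustration assuming GNU stat; the function names and the one-hour threshold in the usage lines are made up for the example.

```shell
# Print the age of a state file in whole seconds.
state_age_seconds() {
  local file="$1" now mtime
  now=$(date +%s)
  mtime=$(stat -c %Y "$file") || return 1
  echo $(( now - mtime ))
}

# Exit 0 if the file is older than max_age seconds (stale), 1 if it is fresh,
# 2 if the file cannot be read (which is itself worth alerting on).
state_is_stale() {
  local file="$1" max_age="$2" age
  age=$(state_age_seconds "$file") || return 2
  [ "$age" -gt "$max_age" ]
}

# Exit 0 if the current state file differs byte-for-byte from an earlier
# snapshot. A difference means the cursor moved; identical files mean it did not.
state_changed() {
  local snapshot="$1" current="$2"
  if cmp -s "$snapshot" "$current"; then
    return 1
  fi
  return 0
}

# Usage:
#   state_is_stale /var/lib/cannectors/state/sync-orders.json 3600 && echo "ALERT: no progress"
#   state_changed /backup/cannectors-state/2026-04-20/sync-orders.json \
#     /var/lib/cannectors/state/sync-orders.json || echo "ALERT: cursor not advancing"
```

Pick the staleness threshold as a multiple of the pipeline's schedule interval, so that a single slow run does not page anyone.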

Idempotence at the destination

State persistence is best-effort restart-safety, not exactly-once delivery. After a crash mid-batch, the runtime re-fetches and re-sends some records that the destination already saw. Design your destination to handle that — INSERT … ON CONFLICT DO UPDATE, idempotency keys, or a dedup layer.

See also