State persistence

When a scheduled pipeline restarts (because of a deploy, an OOM kill, or a restart by your process supervisor), you usually don't want to refetch everything from scratch. State persistence keeps a small cursor file on disk that the input reads at boot and writes after every successful batch.

The three persistable cursors

Type	What's tracked	Typical query parameter
`id`	The largest record ID seen so far	`after_id`, `since_id`
`timestamp`	The largest record timestamp seen so far	`updated_after`, `since`
`offset`	Pagination offset	`offset`

Most inputs support more than one cursor type, and you can enable several at once. Enable the ones that match the source API's pagination semantics.

httpPolling example

input:
  type: httpPolling
  schedule: "*/10 * * * *"
  endpoint: https://source.example.com/api/events
  dataField: events
  statePersistence:
    timestamp:
      enabled: true
      field: event.timestamp      # which record field is the timestamp
      queryParam: updated_after   # how the source API expects it
    id:
      enabled: true
      field: event.id             # which record field is the ID
      queryParam: after_id
    storagePath: ./.cannectors-state

Field	Meaning
`enabled`	Master switch for this cursor type.
`field`	Dot-notated path to the field on the record.
`queryParam`	The query parameter name the source API expects.
`storagePath`	Directory where state files live. Shared across cursor types.

After each successful batch, Cannectors writes the max-seen values into `./.cannectors-state/<pipeline-name>.json`. On the next run, it reads that file and appends the relevant query parameters to the request.
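The read-then-write cycle described above can be sketched in Python. This is an illustration only, not Cannectors' implementation: the names `CURSORS`, `get_path`, `load_state`, `build_params`, and `advance_state` are hypothetical, and the `id` key in the state file is an assumption that mirrors the cursor type name.

```python
import json
from pathlib import Path

# Hypothetical mapping from cursor type to (record field, query parameter),
# mirroring the YAML example above.
CURSORS = {
    "timestamp": ("event.timestamp", "updated_after"),
    "id": ("event.id", "after_id"),
}


def get_path(record, dotted):
    """Resolve a dot-notated path such as 'event.id' on a nested dict."""
    value = record
    for key in dotted.split("."):
        value = value[key]
    return value


def load_state(path):
    """Return the saved cursors, or an empty dict when no state file exists."""
    path = Path(path)
    return json.loads(path.read_text()) if path.exists() else {}


def build_params(state):
    """Translate saved cursors into query parameters for the next request."""
    return {
        query_param: state[cursor]
        for cursor, (_field, query_param) in CURSORS.items()
        if cursor in state
    }


def advance_state(state, records):
    """Fold the max-seen value of each cursor field into a copy of the state."""
    new_state = dict(state)
    for cursor, (field, _query_param) in CURSORS.items():
        seen = [get_path(record, field) for record in records]
        if cursor in new_state:
            seen.append(new_state[cursor])
        if seen:
            # Assumes timestamps share one ISO-8601 UTC format, so string
            # comparison orders them chronologically.
            new_state[cursor] = max(seen)
    return new_state


batch = [
    {"event": {"id": 41, "timestamp": "2026-01-01T00:00:00Z"}},
    {"event": {"id": 42, "timestamp": "2026-01-01T00:05:00Z"}},
]
state = advance_state({}, batch)
print(build_params(state))
# {'updated_after': '2026-01-01T00:05:00Z', 'after_id': 42}
```

An empty state yields no query parameters, which corresponds to the first run fetching without a cursor.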

database example

For database inputs, the cursor is a SQL column instead of a query parameter:

input:
  type: database
  connectionStringRef: ${SOURCE_DATABASE_URL}
  query: |
    SELECT id, updated_at, payload
    FROM events
    WHERE updated_at > $1
    ORDER BY updated_at ASC
    LIMIT 500
  parameters:
    - state.lastRunTimestamp
  statePersistence:
    timestamp:
      enabled: true
      field: updated_at
    storagePath: ./.cannectors-state

The `$1` placeholder is bound to the `state.lastRunTimestamp` parameter expression at runtime. On the first run, when no state exists yet, it evaluates to the epoch (`1970-01-01T00:00:00Z`), so the `WHERE` clause matches every row and the query returns the oldest rows first, up to the `LIMIT`.
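To make the first-run behaviour concrete, here is a small Python sketch that uses the standard-library `sqlite3` module as a stand-in database. It rests on assumptions: SQLite uses `?` where the example above uses `$1`, and `next_batch` and `EPOCH` are hypothetical names, not part of Cannectors.

```python
import sqlite3

EPOCH = "1970-01-01T00:00:00Z"  # value assumed when no state exists yet

QUERY = """
    SELECT id, updated_at, payload
    FROM events
    WHERE updated_at > ?
    ORDER BY updated_at ASC
    LIMIT ?
"""


def next_batch(conn, state, limit=500):
    """Fetch one batch after the saved cursor and return (rows, new_state)."""
    cursor_value = state.get("timestamp", EPOCH)
    rows = conn.execute(QUERY, (cursor_value, limit)).fetchall()
    if rows:
        # Rows are ordered ascending, so the last one carries the max timestamp.
        state = {**state, "timestamp": rows[-1][1]}
    return rows, state


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, updated_at TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(i, f"2026-01-01T00:00:0{i}Z", f"row-{i}") for i in range(1, 6)],
)

first, state = next_batch(conn, {}, limit=3)      # no state: starts from the epoch
second, state = next_batch(conn, state, limit=3)  # resumes after the saved cursor
third, state = next_batch(conn, state, limit=3)   # nothing new: cursor unchanged

print([row[0] for row in first], [row[0] for row in second], third)
# [1, 2, 3] [4, 5] []
```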

Where to put `storagePath`

Environment	Recommended path
Local dev	`./.cannectors-state` (`.gitignore` it)
Single host (systemd)	`/var/lib/cannectors/state`
Kubernetes	A `PersistentVolumeClaim` mounted at e.g. `/state`
Container with ephemeral disk	Mount an attached block volume; never trust the container filesystem

If the storage disappears between runs, the pipeline restarts from scratch — possibly re-processing already-shipped records.

Before v22.6, `storagePath` was read only by the input. The executor saved to a different default path, so cursors never made the round trip cleanly. From v22.6 onwards, the executor shares the input's storage. If you see "state file written but never read" behaviour on an older version, upgrade.

Delivery guarantee and crashes

Cannectors delivers at least once. A record can reach its destination more than once; it should not silently fail to arrive. Plan for that on the receiving side — an idempotent upsert keyed on a business id is the usual answer, not a plain insert.
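As an illustration of an idempotent upsert on the receiving side, the sketch below uses Python's standard-library `sqlite3` module. The `orders` table, its columns, and the `deliver` function are hypothetical; the `ON CONFLICT ... DO UPDATE` clause requires SQLite 3.24 or newer, and other databases spell the same idea differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT, updated_at TEXT)"
)

# Insert a new row, or overwrite the existing row with the same business id.
UPSERT = """
    INSERT INTO orders (order_id, status, updated_at)
    VALUES (:order_id, :status, :updated_at)
    ON CONFLICT(order_id) DO UPDATE SET
        status = excluded.status,
        updated_at = excluded.updated_at
"""


def deliver(batch):
    """Write a batch so that replaying it leaves the table unchanged."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, batch)


batch = [
    {"order_id": "A-1", "status": "paid", "updated_at": "2026-01-01T00:00:00Z"},
    {"order_id": "A-2", "status": "new", "updated_at": "2026-01-01T00:01:00Z"},
]
deliver(batch)
deliver(batch)  # simulated redelivery of the same batch

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(row_count)  # 2
```

With a plain `INSERT`, the second `deliver` call would either fail on the primary key or, without a key, duplicate both rows.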

Two things cause a repeat:

A retry replays the whole batch. When a batch fails and is retried, every record in it is sent again, including those the destination had already accepted before the failure.
A crash between delivery and the state write. The state is saved after the output succeeds, so a process killed in that window restarts from the previous cursor and re-sends the last batch.
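The second case can be simulated in a few lines of Python. This is a toy model under stated assumptions, not Cannectors code: `run_once` and `Crash` are hypothetical, the source is a list, and the cursor is a simple `offset`.

```python
class Crash(Exception):
    """Stands in for the process being killed."""


def run_once(source, state, sink, crash_before_save=False, batch_size=2):
    """One scheduled run: read after the cursor, deliver, then save the cursor."""
    start = state.get("offset", 0)
    batch = source[start:start + batch_size]
    sink.extend(batch)                     # 1. the output succeeds
    if crash_before_save:
        raise Crash("killed before the state write")
    state["offset"] = start + len(batch)   # 2. the state is saved afterwards


source = ["r1", "r2", "r3", "r4"]
state, sink = {}, []

run_once(source, state, sink)              # delivers r1, r2 and saves offset 2
try:
    run_once(source, state, sink, crash_before_save=True)  # delivers r3, r4, then dies
except Crash:
    pass
run_once(source, state, sink)              # restarts from offset 2

print(sink)   # ['r1', 'r2', 'r3', 'r4', 'r3', 'r4']
print(state)  # {'offset': 4}
```

Saving the cursor before delivering would remove the duplicates but could lose records instead, which is why the order shown here gives at-least-once delivery.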

The state file itself is written atomically — to a temporary file, then renamed — so a crash leaves either the previous state or the new one, never a half-written file. That holds under SIGKILL, which the lab exercises at several points of a run.
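The write-then-rename pattern looks roughly like this in Python. It is a generic sketch of the technique, not the code Cannectors runs; `write_state_atomically` and the file names are hypothetical.

```python
import json
import os
import tempfile


def write_state_atomically(path, state):
    """Write state to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as tmp_dir:
    state_path = os.path.join(tmp_dir, "my-pipeline.json")
    write_state_atomically(state_path, {"timestamp": "2026-01-01T00:00:00Z"})
    write_state_atomically(state_path, {"timestamp": "2026-01-02T00:00:00Z"})
    with open(state_path) as f:
        saved = json.load(f)
    leftovers = [name for name in os.listdir(tmp_dir) if name.endswith(".tmp")]

print(saved)      # {'timestamp': '2026-01-02T00:00:00Z'}
print(leftovers)  # []
```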

There is no `fsync` before the rename, so a power loss (as opposed to a process crash) can still leave the file empty or holding the previous version. Cannectors handles an unreadable state file by logging a warning and continuing without it, which means re-reading the source from the beginning. The run still reports success, so monitoring only the pipeline status will not surface it; watch for `failed to load state` in the logs.
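Because the run still reports success, an external check on the state file can complement log monitoring. The helper below is a hypothetical sketch, not a Cannectors feature; it assumes the state file is a JSON object, as in the seeding example under Backfills.

```python
import json
import os
import tempfile


def check_state_file(path):
    """Classify a state file as 'missing', 'unreadable', or 'ok'."""
    if not os.path.exists(path):
        return "missing"      # expected on a first run or after a deliberate reset
    try:
        with open(path) as f:
            state = json.load(f)
    except (OSError, ValueError):  # json.JSONDecodeError is a ValueError
        return "unreadable"   # empty or corrupt: the next run would start over
    return "ok" if isinstance(state, dict) else "unreadable"


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my-pipeline.json")
    results = [check_state_file(path)]          # no file yet

    open(path, "w").close()                     # empty file, as after a power loss
    results.append(check_state_file(path))

    with open(path, "w") as f:
        json.dump({"timestamp": "2026-01-01T00:00:00Z"}, f)
    results.append(check_state_file(path))

print(results)  # ['missing', 'unreadable', 'ok']
```

Running a check like this before each scheduled run, and alerting on `unreadable`, would catch the case that the pipeline status alone hides.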

Backfills

To force a fresh full sync, delete the relevant state file:

rm ./.cannectors-state/<pipeline-name>.json

The next run will treat itself as a fresh start. There is no "backfill from date X" knob in the YAML — instead, write to the state file manually if you really need to seed it:

echo '{"timestamp": "2026-01-01T00:00:00Z"}' \
  > ./.cannectors-state/<pipeline-name>.json

Cross-references

Scheduling

httpPolling input

database input
