

Buffer is a library with two main components: multiple producers and a single consumer. Producers and the consumer communicate exclusively through a queue manifest in object storage; there is no direct network path between them and no stateful broker to operate.

Producers

Producers accept arbitrary byte entries from callers, buffer them in memory, and periodically flush them as binary data batches to object storage. Each producer exposes a minimal API:
pub async fn produce(
    &self,
    entries: Vec<Bytes>,
    metadata: Bytes,
) -> Result<WriteHandle>
Flushes are triggered either by elapsed time (default: 100 ms) or by batch size (default: 64 MiB). Data batches are binary and optionally compressed. Each batch is named with a ULID, which encodes a millisecond-precision timestamp.

Once a batch is flushed, producers append its location and metadata to the queue. The queue is a single binary manifest in object storage that tracks the locations and metadata of data batches in append order. Producers interact with it through a queue producer, which uses compare-and-swap (CAS) writes to ensure that concurrent producers do not overwrite each other's entries: an append succeeds only if the manifest has not been modified since it was read. On conflict, the queue producer re-reads the manifest and retries. The queue producer also assigns consecutive sequence numbers to queue entries.

The manifest format is designed for append efficiency: existing entries are never deserialized during an append, so appending a new entry is O(1) in the length of the queue.

The WriteHandle returned by produce() contains a DurabilityWatcher. Callers can await_durable() to block until the entries have been persisted to object storage and enqueued in the manifest. This avoids data loss during failover, but it does not provide read-your-own-writes consistency: durability refers to the data batch, not to the database write.
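To make the CAS append loop concrete, here is a minimal, hypothetical sketch in safe Rust. It simulates object-store conditional writes with an in-memory manifest guarded by a version check (object stores expose this as an ETag or generation number). All names (QueueEntry, ObjectStore, append_batch) are illustrative, not Buffer's API, and unlike the real manifest format this sketch rewrites the whole entry list on every append rather than appending in O(1):

```rust
use std::sync::Mutex;

/// One queue entry: a data batch's location plus its assigned sequence number.
#[derive(Clone)]
struct QueueEntry {
    batch_location: String,
    sequence: u64,
}

/// Stand-in for the manifest object: entries plus the version that a CAS
/// write must match.
#[derive(Default)]
struct Manifest {
    version: u64,
    entries: Vec<QueueEntry>,
}

/// Stand-in for object storage holding the single queue manifest.
struct ObjectStore {
    manifest: Mutex<Manifest>,
}

impl ObjectStore {
    /// Read the manifest, returning its contents and the version read.
    fn read(&self) -> (u64, Vec<QueueEntry>) {
        let m = self.manifest.lock().unwrap();
        (m.version, m.entries.clone())
    }

    /// CAS write: succeeds only if the manifest is still at `expected_version`.
    fn cas_write(&self, expected_version: u64, entries: Vec<QueueEntry>) -> bool {
        let mut m = self.manifest.lock().unwrap();
        if m.version != expected_version {
            return false; // another producer appended first
        }
        m.version += 1;
        m.entries = entries;
        true
    }
}

/// Append one batch location, retrying on CAS conflict, and return the
/// consecutive sequence number assigned to the new entry.
fn append_batch(store: &ObjectStore, batch_location: &str) -> u64 {
    loop {
        let (version, mut entries) = store.read();
        // Sequence numbers are consecutive; this sketch starts them at 0.
        let sequence = entries.last().map(|e| e.sequence + 1).unwrap_or(0);
        entries.push(QueueEntry {
            batch_location: batch_location.to_string(),
            sequence,
        });
        if store.cas_write(version, entries) {
            return sequence; // append landed without overwriting anyone
        }
        // Conflict: re-read the manifest and retry.
    }
}
```

Because an append only succeeds against an unmodified manifest, two concurrent producers can never both win the same version: one lands, the other re-reads and appends after it, which is what keeps sequence numbers consecutive without coordination.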

Consumer

The consumer is the read-side counterpart to the producers. It iterates over data batches in object storage and delivers them to the database in the order recorded in the queue:
pub async fn next_batch(&mut self) -> Result<Option<ConsumedBatch>>
pub async fn ack(&mut self, sequence: u64) -> Result<()>
A call to next_batch() reads the next location in the queue manifest, fetches the data batch from object storage, and returns the data, metadata, and a unique sequence number to the caller. Sequence numbers are assigned by the queue producer at ingestion time in increasing order without gaps, and they are used both for acknowledgement and for tracking progress. Acknowledgements must be in order: acknowledging a sequence number that is not immediately after the last one is rejected, ensuring that no batch is silently skipped.

While multiple producers can write to a queue manifest, only one consumer can read from it. This matches OpenData's single-writer paradigm and prevents zombie consumers (consumers that survive an infrastructure failure) from acknowledging batches and making them invisible to the active consumer. Single-consumer exclusivity is enforced by an epoch stored in the queue manifest: when a consumer starts, it reads the epoch, increments it, and writes it back. Reads with a stale epoch are rejected, so any zombie consumer is fenced as soon as a new one takes over.

The consumer is also responsible for cleaning up processed data batches. At startup, and after every 100 acknowledged batches, all entries for acknowledged batches are dequeued from the manifest. A dedicated garbage-collector task then deletes the corresponding data batches from object storage.

The consumer API supports exactly-once delivery. To achieve it, the writer atomically persists the batch data and the sequence number from next_batch() to the database. If the atomic write succeeds but a failure occurs before ack() returns, a new writer can construct a new consumer with that last persisted sequence number so it resumes right after it. The new consumer also bumps the epoch, fencing the old one and preventing any duplicate writes.
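The two invariants above, consecutive acknowledgements and epoch fencing, can be sketched in a few lines. This is a hypothetical, in-memory illustration: SharedEpoch stands in for the epoch field of the queue manifest, and the names and error types are invented for the example:

```rust
/// Stand-in for the epoch stored in the queue manifest.
struct SharedEpoch {
    epoch: u64,
}

struct Consumer {
    epoch: u64,    // epoch this consumer claimed at startup
    next_ack: u64, // the only sequence number ack() will accept next
}

impl Consumer {
    /// Starting a consumer bumps the shared epoch, fencing any older one.
    /// `resume_after` is the last sequence number durably persisted by a
    /// previous writer, if any.
    fn start(shared: &mut SharedEpoch, resume_after: Option<u64>) -> Consumer {
        shared.epoch += 1;
        Consumer {
            epoch: shared.epoch,
            next_ack: resume_after.map(|s| s + 1).unwrap_or(0),
        }
    }

    /// A read with a stale epoch is rejected: a newer consumer took over.
    fn check_fenced(&self, shared: &SharedEpoch) -> Result<(), String> {
        if self.epoch < shared.epoch {
            return Err("fenced: a newer consumer holds the queue".to_string());
        }
        Ok(())
    }

    /// Acks must be consecutive; anything else would silently skip a batch.
    fn ack(&mut self, sequence: u64) -> Result<(), String> {
        if sequence != self.next_ack {
            return Err(format!(
                "out-of-order ack: expected {}, got {}",
                self.next_ack, sequence
            ));
        }
        self.next_ack += 1;
        Ok(())
    }
}
```

A failover then looks like: the new writer starts a consumer with the last sequence number it finds persisted in the database, which both positions the acknowledgement cursor right after it and bumps the epoch, so any request from the old consumer is rejected before it can acknowledge a batch the new one still needs.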

Usage example

For an example of how Buffer can be integrated into an ingestion pipeline, see ingest into Timeseries.