
Asynchronous Datastores #137

Closed
@aschmahmann

Proposal

  1. Add a Sync(prefix Key) function to the Datastore interface.
    • This function will be a no-op when the datastore is in synchronous mode (the default).
    • Otherwise, Sync(prefix) guarantees that any Put(prefix + ..., value) calls that returned before Sync(prefix) was called will be observed after Sync(prefix) returns, even if the program crashes.
  2. Insert calls to Sync where appropriate (in go-ipfs and go-libp2p).
  3. Once ready, turn off synchronous writes in go-ipfs's datastore by default. (We'll go through an experimental transition period with heavy testing first.)
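To make the proposed Sync(prefix) semantics concrete, here is a minimal in-memory sketch. Everything in it is hypothetical: `Key`, the trimmed `Datastore` interface, `memStore` and its `crash` helper are illustrations, not the go-datastore API. Writes in asynchronous mode sit in a volatile `pending` map until `Sync` moves the ones under the given prefix into a `durable` map. In synchronous mode every Put is durable immediately, so `Sync` has nothing to do.

```go
package main

import (
	"strings"
	"sync"
)

// Key is a hypothetical stand-in for a datastore key.
type Key string

// Datastore is a hypothetical, trimmed-down interface: Put/Get plus the proposed Sync.
type Datastore interface {
	Put(k Key, value []byte) error
	Get(k Key) ([]byte, bool)
	Sync(prefix Key) error
}

// memStore simulates a datastore whose writes may sit in a volatile buffer
// (lost on crash) until they are synced into the "durable" map.
type memStore struct {
	mu       sync.Mutex
	syncMode bool           // true: every Put is durable immediately (the default)
	pending  map[Key][]byte // written, but would be lost on a crash
	durable  map[Key][]byte // survives a simulated crash
}

func newMemStore(syncMode bool) *memStore {
	return &memStore{syncMode: syncMode, pending: map[Key][]byte{}, durable: map[Key][]byte{}}
}

func (s *memStore) Put(k Key, v []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.syncMode {
		s.durable[k] = v
		return nil
	}
	s.pending[k] = v
	return nil
}

func (s *memStore) Get(k Key) ([]byte, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if v, ok := s.pending[k]; ok {
		return v, true
	}
	v, ok := s.durable[k]
	return v, ok
}

// Sync makes every pending write whose key starts with prefix durable.
// In synchronous mode nothing is ever pending, so this is effectively a no-op.
// Plain string-prefix matching is used for simplicity; a real implementation
// would more likely match whole key path segments.
func (s *memStore) Sync(prefix Key) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	for k, v := range s.pending {
		if strings.HasPrefix(string(k), string(prefix)) {
			s.durable[k] = v
			delete(s.pending, k) // deleting during range is safe in Go
		}
	}
	return nil
}

// crash drops every write that was never synced, simulating a process crash.
func (s *memStore) crash() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pending = map[Key][]byte{}
}
```

After `Put("/blocks/a")`, `Put("/peers/b")`, `Sync("/blocks")` and a simulated crash, the block survives and the unsynced peer record does not.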

Notes:

  1. We're not changing the default behavior. Datastores will still write synchronously unless configured not to do so.
  2. Put is atomic: it either writes the complete value or writes nothing. Even when synchronous writes are turned off, the datastore will never be left in a corrupt state.

Motivation

Writing to disk synchronously has poor performance and is rarely necessary.

Poor performance: ipfs add is twice as fast (on linux/ext4) when badger is used with synchronous writes turned off.

Rarely necessary:

  • The DHT expects some number of nodes to be faulty so losing a few records is usually fine.
  • IPFS only guarantees that blocks are persisted when pinned. There's no reason to sync after every write.
    • Note: For now, we'll likely want to explicitly sync after a full ipfs add, as most users have GC turned off and expect the data to be persisted anyway. However, syncing once is cheaper than syncing after every write.
  • The peerstore definitely doesn't need synchronous writes.
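The "sync once after a full add" idea can be illustrated by counting flushes. In this sketch, `countingStore` and `addFile` are hypothetical and perform no real I/O. As proposed, Sync is a no-op in synchronous mode, so the import code can call it unconditionally without knowing which mode the store is in.

```go
package main

import "fmt"

// countingStore is a hypothetical store that only counts how often it would
// have to flush to disk; it performs no real I/O.
type countingStore struct {
	syncPerPut bool // true: synchronous mode, every Put is flushed
	flushes    int
	data       map[string][]byte
}

func (s *countingStore) Put(k string, v []byte) {
	s.data[k] = v
	if s.syncPerPut {
		s.flushes++ // synchronous mode: every Put pays for a flush
	}
}

// Sync stands in for the proposed Sync(prefix). It is a no-op in synchronous
// mode, and a single flush otherwise (the prefix is ignored in this sketch).
func (s *countingStore) Sync(prefix string) {
	if !s.syncPerPut {
		s.flushes++
	}
}

// addFile mimics an `ipfs add`-style import: write every block, then sync
// once at the end, regardless of the store's mode.
func addFile(s *countingStore, blocks [][]byte) {
	for i, b := range blocks {
		s.Put(fmt.Sprintf("/blocks/%d", i), b)
	}
	s.Sync("/blocks")
}
```

Importing 1000 blocks costs 1000 flushes in synchronous mode and a single flush in asynchronous mode. The count only shows how often a flush happens, not how expensive each one is.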

Alternatives

  • Create a buffered/batching/async wrapper. This is what go-ipfs currently does but we could do better.
  • Use the "autobatching" datastore.

However:

  1. Buffering/caching isn't easy.
  2. Unlike buffering inside the OS, they can't (easily) respond to memory pressure.
  3. Conversely, they force one to eagerly sync/flush periodically instead of as needed, whereas the OS knows when there is enough memory to keep buffering writes.
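For comparison, here is a hypothetical buffered wrapper of the kind listed under Alternatives. `batchingWrapper` and `maxPending` are illustrative names, and a plain map stands in for the underlying datastore. It shows the drawbacks listed above in miniature. Reads must consult the buffer, and deletes and queries would need the same care, which this sketch omits. The buffer size is a fixed threshold that ignores memory pressure. Flushes happen every `maxPending` writes whether or not they are needed.

```go
package main

// batchingWrapper holds up to maxPending writes in memory and flushes them to
// the backing store when the buffer fills, regardless of available memory.
type batchingWrapper struct {
	maxPending int
	pending    map[string][]byte
	backing    map[string][]byte // stands in for the underlying datastore
	flushes    int
}

func newBatchingWrapper(maxPending int) *batchingWrapper {
	return &batchingWrapper{
		maxPending: maxPending,
		pending:    map[string][]byte{},
		backing:    map[string][]byte{},
	}
}

func (b *batchingWrapper) Put(k string, v []byte) {
	b.pending[k] = v
	if len(b.pending) >= b.maxPending {
		b.Flush() // eager flush on a fixed threshold, not on actual need
	}
}

// Get must check the buffer first, or it would return stale data.
func (b *batchingWrapper) Get(k string) ([]byte, bool) {
	if v, ok := b.pending[k]; ok {
		return v, true
	}
	v, ok := b.backing[k]
	return v, ok
}

// Flush writes all buffered entries to the backing store.
func (b *batchingWrapper) Flush() {
	if len(b.pending) == 0 {
		return
	}
	for k, v := range b.pending {
		b.backing[k] = v
	}
	b.pending = map[string][]byte{}
	b.flushes++
}
```

With `maxPending` set to 4, ten Puts trigger two flushes and leave two entries buffered, even if the machine has plenty of free memory.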

@Stebalien @whyrusleeping @raulk Seem like a reasonable plan?
