Skip to content

Conversation

ashwinb
Copy link
Contributor

@ashwinb ashwinb commented Oct 6, 2025

Summary

As of today, if you look at a run.yaml file for the Stack, it is littered with several references to sqlite db files. If you had a postgres SQL store, you will actually see something worse: duplicate postgres credentials everywhere. A result of this is also that we keep many connection objects alive to the database. This obviously indicates we haven't quite factored out our persistence layers correctly. This backwards incompat window for 0.3.0 offers us a chance to quickly clean this up.

This PR replaces duplicated store configurations by introducing a notion of a Persistence Backend. Then, each provider or a core stack component can just reference this persistence backend with perhaps a small annotation to indicate a table / sub-namespace to use.

Changes

Migration

Before

metadata_store:
  type: sqlite
  db_path: .../registry.db
inference_store:
  type: sqlite
  db_path: .../inference.db

After

persistence:
  backends:
    kvstore:
      type: sqlite
      db_path: .../kvstore.db
    sqlstore:
      type: sqlite  
      db_path: .../sqlstore.db
  stores:
    metadata:
      backend: kvstore
    inference:
      backend: sqlstore

Testing

  • Added unit tests for config validation and backend resolution
  • Integration tests verify real distribution configs load correctly
  • Server builds and starts successfully with new config structure

ashwinb added 16 commits October 5, 2025 13:08
- Add StoresConfig to group all store references under persistence.stores
- Use single 'default' backend instead of separate metadata_backend/inference_backend
- Update resolver to access persistence.stores.{metadata,inference,conversations}
- All SQLite distributions now use single store.db file with shared backend
Tests cover critical failure modes:
- Backend reference validation catches config errors
- Namespace overlay logic for KVStore isolation
- Type safety: inference requires SqlStore
- Queue parameter preservation
- Integration with real distribution configs
- Both SqliteKVStoreConfig and SqliteSqlStoreConfig use type='sqlite'
- Pydantic cannot distinguish them in a union
- Solution: Custom validator parses backends based on which stores reference them
- Metadata store requires KVStore, inference/conversations require SqlStore
- Separate kvstore/sqlstore backends in configs for clarity
- Ensures context-aware parsing works correctly
- Metadata uses kvstore backend, inference uses sqlstore backend
- Both can share same postgres/sqlite config but parsed as correct type
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 6, 2025
- Add provider_config_resolver to resolve backend references in provider configs
- Providers can now reference persistence.backends instead of duplicating kvstore configs
- Supports kvstore, metadata_store, persistence_store, responses_store references
- Updated ci-tests to use backend references with namespaces
- Eliminates massive config duplication across providers
- Replace kvstore/sqlstore backend names with unified 'default'
- Context-aware parsing handles KVStore vs SqlStore based on usage
- Provider configs use 'backend: default' everywhere
- Much cleaner: kvstore { backend: default } instead of { backend: kvstore }
BREAKING CHANGE: Provider config field names changed for semantic clarity

- Rename kvstore → persistence for all providers
- Simple providers: flat persistence with backend reference
- Complex providers (agents): nested persistence.agent_state + persistence.responses
- Files provider: metadata_store → persistence
- Provider configs now clearly express 'how do I persist?' not 'what type of store?'

Example:
  # Before
  config:
    kvstore:
      backend: kvstore
      namespace: faiss

  # After
  config:
    persistence:
      backend: kvstore
      namespace: faiss

  # Agents (nested)
  config:
    persistence:
      agent_state:
        backend: kvstore
        namespace: agents
      responses:
        backend: sqlstore
        namespace: responses
elif type_val == "mongodb":
return MongoDBKVStoreConfig(**config)
else:
raise ValueError(f"Unknown KVStore type: {type_val}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print available backends too?

elif type_val == "postgres":
return PostgresSqlStoreConfig(**config)
else:
raise ValueError(f"Unknown SqlStore type: {type_val}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print available backends too?

default=None,
description="Inference store configuration (uses SqlStore backend)",
)
conversations: StoreReference | None = Field(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have a ConversationStoreReference too, even if empty for now?


def resolve_inference_store_config(
persistence: PersistenceConfig | None,
) -> tuple[SqliteSqlStoreConfig | PostgresSqlStoreConfig, int, int]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid returning int, int and return an inference_ref instead? So that we don't change the signature every single time a new field is added?

Copy link
Contributor

@cdoern cdoern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the direction this is going in, but have a general comment about some class structure here now that it seems these StoreConfig classes are combining functionality more than before.

If this is out of scope of this PR, let's open up a follow up issue to track some of this possible refactoring. Thanks!

return self


def _parse_kvstore_config(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am seeing a lot of these _parse_... Methods . Can we instead have all of these ...StoreConfig classes inherit from a base class and just do config_type = BaseStoreConfig(config.get("type")) or something?

Or in the base class we could have a

@classmethod
    def from_dict

I just feel like there's probably a more pythonic way to do this.

PostgresKVStoreConfig,
MongoDBKVStoreConfig,
)
SqlStoreConfigTypes = (SqliteSqlStoreConfig, PostgresSqlStoreConfig)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar comment as above, this seems like an indicator we need to combine some of these classes.

@ashwinb
Copy link
Contributor Author

ashwinb commented Oct 6, 2025

Thanks for the reviews @leseb, @cdoern... will act on them, and taking PR to draft stage while I clean it up.

@ashwinb ashwinb marked this pull request as draft October 6, 2025 14:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants