-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feat(stores)!: implement backend registry pattern for persistence configuration #3697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add StoresConfig to group all store references under persistence.stores - Use single 'default' backend instead of separate metadata_backend/inference_backend - Update resolver to access persistence.stores.{metadata,inference,conversations} - All SQLite distributions now use single store.db file with shared backend
Tests cover critical failure modes: - Backend reference validation catches config errors - Namespace overlay logic for KVStore isolation - Type safety: inference requires SqlStore - Queue parameter preservation - Integration with real distribution configs
- Both SqliteKVStoreConfig and SqliteSqlStoreConfig use type='sqlite' - Pydantic cannot distinguish them in a union - Solution: Custom validator parses backends based on which stores reference them - Metadata store requires KVStore, inference/conversations require SqlStore - Separate kvstore/sqlstore backends in configs for clarity
- Ensures context-aware parsing works correctly - Metadata uses kvstore backend, inference uses sqlstore backend - Both can share same postgres/sqlite config but parsed as correct type
- Add provider_config_resolver to resolve backend references in provider configs - Providers can now reference persistence.backends instead of duplicating kvstore configs - Supports kvstore, metadata_store, persistence_store, responses_store references - Updated ci-tests to use backend references with namespaces - Eliminates massive config duplication across providers
- Replace kvstore/sqlstore backend names with unified 'default' - Context-aware parsing handles KVStore vs SqlStore based on usage - Provider configs use 'backend: default' everywhere - Much cleaner: kvstore { backend: default } instead of { backend: kvstore }
BREAKING CHANGE: Provider config field names changed for semantic clarity - Rename kvstore → persistence for all providers - Simple providers: flat persistence with backend reference - Complex providers (agents): nested persistence.agent_state + persistence.responses - Files provider: metadata_store → persistence - Provider configs now clearly express 'how do I persist?' not 'what type of store?' Example: # Before config: kvstore: backend: kvstore namespace: faiss # After config: persistence: backend: kvstore namespace: faiss # Agents (nested) config: persistence: agent_state: backend: kvstore namespace: agents responses: backend: sqlstore namespace: responses
elif type_val == "mongodb": | ||
return MongoDBKVStoreConfig(**config) | ||
else: | ||
raise ValueError(f"Unknown KVStore type: {type_val}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print available backends too?
elif type_val == "postgres": | ||
return PostgresSqlStoreConfig(**config) | ||
else: | ||
raise ValueError(f"Unknown SqlStore type: {type_val}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print available backends too?
default=None, | ||
description="Inference store configuration (uses SqlStore backend)", | ||
) | ||
conversations: StoreReference | None = Field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to have a ConversationStoreReference
too, even if empty for now?
|
||
def resolve_inference_store_config( | ||
persistence: PersistenceConfig | None, | ||
) -> tuple[SqliteSqlStoreConfig | PostgresSqlStoreConfig, int, int]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid returning int, int
and return an inference_ref
instead? So that we don't change the signature every single time a new field is added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the direction this is going in, but have a general comment about some class structure here now that it seems these StoreConfig
classes are combining functionality more than before.
If this is out of scope of this PR, let's open up a follow up issue to track some of this possible refactoring. Thanks!
return self | ||
|
||
|
||
def _parse_kvstore_config( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am seeing a lot of these _parse_...
Methods . Can we instead have all of these ...StoreConfig
classes inherit from a base class and just do config_type = BaseStoreConfig(config.get("type"))
or something?
Or in the base class we could have a
@classmethod
def from_dict
I just feel like there's probably a more pythonic way to do this.
PostgresKVStoreConfig, | ||
MongoDBKVStoreConfig, | ||
) | ||
SqlStoreConfigTypes = (SqliteSqlStoreConfig, PostgresSqlStoreConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar comment as above, this seems like an indicator we need to combine some of these classes.
Summary
As of today, if you look at a run.yaml file for the Stack, it is littered with several references to sqlite db files. If you had a postgres SQL store, you will actually see something worse: duplicate postgres credentials everywhere. A result of this is also that we keep many connection objects alive to the database. This obviously indicates we haven't quite factored out our persistence layers correctly. This backwards incompat window for 0.3.0 offers us a chance to quickly clean this up.
This PR replaces duplicated store configurations by introducing a notion of a Persistence Backend. Then, each provider or a core stack component can just reference this persistence backend with perhaps a small annotation to indicate a table / sub-namespace to use.
Changes
Migration
Before
After
Testing