-
Notifications
You must be signed in to change notification settings - Fork 30
fix: update jsonschema constraint to allow 4.20.0+ for fastmcp compatibility #737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ibility - Changed jsonschema constraint from ~4.17.3 to >=4.17.3,<5.0 - Maintains broad compatibility while unblocking fastmcp requirements - Regenerated poetry.lock with new constraint Co-Authored-By: AJ Steers <[email protected]>
Original prompt from AJ Steers
|
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1756425696-jsonschema-version-pin#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1756425696-jsonschema-version-pin Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the jsonschema dependency constraint to allow compatibility with fastmcp while maintaining backward compatibility. The change relaxes the version constraint from ~4.17.3
to >=4.17.3,<5.0
, enabling jsonschema versions 4.20.0+ that fastmcp requires.
Key changes:
- Updated jsonschema version constraint to allow broader version range
- Removed restrictive comment about 4.18 breaking changes
- Maintained upper bound to avoid potential 5.x breaking changes
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughDependency constraints updated to jsonschema >=4.17.3,<5.0 and added referencing >=0.36.2. Replaced RefResolver-based $ref handling with referencing Registry/Resolver across schema helpers, transform normalization, and spec schema transformations. Updated manifest_server Dockerfile to Poetry 2.0.1. Minor unit test adjustments and added a my_type definition in a test schema. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller as Normalization/Spec Transform
participant Helpers as SchemaHelpers
participant Ref as referencing.Registry/Resolver
participant Schema as JSON Schema
Caller->>Helpers: expand_refs(schema)
Helpers->>Ref: get_ref_resolver_registry(schema).resolver()
loop For each $ref in schema
Helpers->>Ref: lookup(ref_url)
Ref-->>Helpers: resolved.contents
Helpers->>Helpers: recursively expand nested $refs
Helpers->>Schema: merge resolved keys into parent
end
Helpers->>Schema: remove root "definitions"
Helpers-->>Caller: expanded schema
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (1)
pyproject.toml
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-intercom
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: SDM Docker Image Build
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
pyproject.toml (1)
46-46
: Direction LGTM—range unlocks fastmcp while staying <5.0.Allowing 4.x broadly makes sense given the fastmcp requirement, and keeping the <5 guard is prudent. Nice. wdyt?
…hema 4.18+ compatibility - Replace RefResolver.from_schema() with Registry().with_resource() pattern - Update resolver.resolve() calls to use resolver.lookup().contents - Addresses jsonschema 4.18+ deprecation warnings - Maintains backward compatibility with existing functionality Co-Authored-By: AJ Steers <[email protected]>
- Auto-format YAML and Markdown files per project standards - No functional changes, only formatting consistency Co-Authored-By: AJ Steers <[email protected]>
- Added referencing>=0.30.0 dependency to pyproject.toml - Required for jsonschema 4.18+ RefResolver migration - Regenerated poetry.lock with new dependency Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (4)
airbyte_cdk/manifest_migrations/README.md (1)
24-24
: Prefer backticks for literal asterisk.To avoid escaping artifacts and improve readability, would you render the wildcard as
*
instead of "*", wdyt?airbyte_cdk/sources/utils/schema_helpers.py (2)
67-77
: Minor: rename param for clarity and guard resolver creation.Since this now takes a Registry, would renaming
ref_resolver
toregistry
reduce confusion? Also, ifschema
is not a dict,Resource.from_contents(schema)
could be surprising. Do you want to assertisinstance(schema, MutableMapping)
before building the resource to fail fast with a clearer message, wdyt?
80-86
: Resolution error handling.If a
$ref
cannot be found,resolver.lookup(ref_url)
will raise; previously we surfacedRefResolutionError
. Do you want to catchException
here and raise anAirbyteTracedException
with context (ref string, path) to avoid leaking schema internals to logs, wdyt?airbyte_cdk/utils/spec_schema_transformations.py (1)
17-25
: OK on switching to referencing; consider a defensive fallback.The new resolver path looks good. For resilience, do you want to add a narrow try/except around
resolver.lookup(ref)
to emit a clearer error (include the missing ref) or optionally fall back to returning the original$ref
block when resolution fails, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (8)
.github/actions/check-docker-tag/action.yml
(1 hunks).github/workflows/pytest_matrix.yml
(1 hunks).github/workflows/test-command.yml
(1 hunks)airbyte_cdk/manifest_migrations/README.md
(1 hunks)airbyte_cdk/manifest_server/openapi.yaml
(14 hunks)airbyte_cdk/sources/utils/schema_helpers.py
(2 hunks)airbyte_cdk/sources/utils/transform.py
(2 hunks)airbyte_cdk/utils/spec_schema_transformations.py
(1 hunks)
✅ Files skipped from review due to trivial changes (3)
- .github/workflows/test-command.yml
- .github/actions/check-docker-tag/action.yml
- .github/workflows/pytest_matrix.yml
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte_cdk/utils/spec_schema_transformations.py (2)
airbyte_cdk/sources/file_based/config/abstract_file_based_spec.py (1)
schema
(88-98)airbyte_cdk/sources/config.py (1)
schema
(21-27)
airbyte_cdk/sources/utils/schema_helpers.py (4)
airbyte_cdk/sources/file_based/config/abstract_file_based_spec.py (1)
schema
(88-98)airbyte_cdk/sources/config.py (1)
schema
(21-27)airbyte_cdk/destinations/vector_db_based/config.py (1)
schema
(293-298)unit_tests/destinations/vector_db_based/config_test.py (1)
schema
(64-69)
🪛 GitHub Actions: Dependency Analysis
airbyte_cdk/sources/utils/transform.py
[error] 16-16: DEP003: 'referencing' imported but it is a transitive dependency. Command: poetry run deptry .
airbyte_cdk/utils/spec_schema_transformations.py
[error] 9-9: DEP003: 'referencing' imported but it is a transitive dependency. Command: poetry run deptry .
airbyte_cdk/sources/utils/schema_helpers.py
[error] 16-16: DEP003: 'referencing' imported but it is a transitive dependency. Command: poetry run deptry .
🪛 GitHub Actions: Test Connectors
airbyte_cdk/sources/utils/schema_helpers.py
[error] 16-16: ModuleNotFoundError: No module named 'referencing' (required by airbyte_cdk.sources.utils.schema_helpers).
🪛 Checkov (3.2.334)
airbyte_cdk/manifest_server/openapi.yaml
[MEDIUM] 197-202: Ensure that arrays have a maximum number of items
(CKV_OPENAPI_21)
🪛 LanguageTool
airbyte_cdk/manifest_migrations/README.md
[grammar] ~24-24: There might be a mistake here.=1.2", ">=1.0.0,<2.0.0", "6.48.3" - Each migration entry should include: ...
Context: ...48.3", "
(QB_NEW_EN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
🔇 Additional comments (1)
airbyte_cdk/manifest_server/openapi.yaml (1)
352-362
: I’ve requested a check of the YAML snippet to confirm whether the integer fields still use float-valued constraints. Once I have the output, I’ll update the comment accordingly.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
- Update constraint to >=4.18.0,<5.0 as suggested by @aaronsteers - Remove all backward compatibility code for 4.17.x - Use referencing library exclusively with DRAFT7 default specification - Fixes CannotDetermineSpecification error for schemas without property Co-Authored-By: AJ Steers <[email protected]>
You're absolutely right about the cleaner approach! I've updated the implementation: Technical constraint discovered: The referencing library itself requires jsonschema 4.18+ (it's not available in 4.17.x), so backward compatibility with 4.17.x is technically impossible anyway. Changes made:
The implementation is now much cleaner and the constraint accurately reflects the technical reality that the referencing library requires 4.18+. |
Co-Authored-By: AJ Steers <[email protected]>
…n transform.py - Update Poetry version from 1.8.3 to 2.0.1 in manifest server Dockerfile for consistency with local development - Fix transform.py resolve function to properly handle both new referencing library and legacy RefResolver - Addresses CI failures in 'Manifest Server Docker Image Build' and pytest transform tests Co-Authored-By: AJ Steers <[email protected]>
…schema-version-pin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
airbyte_cdk/sources/utils/transform.py (1)
170-177
: Fix item type handling for arrays (string -> set of chars bug).
set(subschema.get("items", {}).get("type", set()))
turns"string"
into{'s','t',...}
and can misclassify arrays. Can we normalize the "type" field first, wdyt?- elif target_type == "array": - item_types = set(subschema.get("items", {}).get("type", set())) + elif target_type == "array": + raw_item_type = subschema.get("items", {}).get("type") + if isinstance(raw_item_type, str): + item_types = {raw_item_type} + elif isinstance(raw_item_type, list): + item_types = set(raw_item_type) + else: + item_types = set() if ( - item_types.issubset(json_to_python_simple) + item_types.issubset(set(json_to_python_simple.keys())) and type(original_item) in json_to_python_simple.values() ): return [original_item]
♻️ Duplicate comments (1)
airbyte_cdk/sources/utils/schema_helpers.py (1)
70-83
: LGTM: central registry factory simplifies resolver usage. Also, double-check deps are declared.The
Registry
builder looks good and keeps call sites clean. Can we verifyreferencing
is a direct dependency and that thejsonschema
floor aligns with this usage, wdyt?#!/bin/bash # Verify pyproject declares direct deps and compatible ranges rg -n '^jsonschema\s*=' pyproject.toml rg -n '^referencing\s*=' pyproject.toml # Sanity: ensure lock resolves to <5.0 rg -n 'jsonschema' poetry.lock -n -C1 | head -n 20
🧹 Nitpick comments (7)
unit_tests/sources/utils/test_transform.py (1)
23-26
: Cover the new $ref path with an explicit test case?You added
#/definitions/my_type
and wired it intodef.properties.dd
, but none of the parametrized cases exercisedef.dd
. Adding a minimal case will prove the expand_refs path works end-to-end, wdyt?Example addition to the parametrization (outside this hunk):
@@ [ (SIMPLE_SCHEMA, {"value": 12}, {"value": "12"}, None), + # $ref to definitions/my_type + ( + COMPLEX_SCHEMA, + {"def": {"dd": "abc"}}, + {"def": {"dd": "abc"}}, + None, + ),airbyte_cdk/sources/utils/transform.py (3)
20-23
: Narrow the import exception and avoid runtime dependency for typing-only imports?Catching all exceptions is too broad, and
Validator
is only used for typing here. Shall we gate it under TYPE_CHECKING and catch ImportError otherwise, wdyt?-try: - from jsonschema.validators import Validator -except: - from jsonschema import Validator +if TYPE_CHECKING: + try: + from jsonschema.validators import Validator # type: ignore + except ImportError: + from jsonschema import Validator # type: ignore
209-215
: Avoid re-expanding refs on every validator callback; expand once up-front.Calling
expand_refs(schema)
(and again onproperty_value
) inside each callback is redundant and mutates schemas repeatedly. Expanding once intransform()
on a deepcopy improves perf and avoids side effects, wdyt?Apply these changes:
@@ def normalizator(...): - # Very first step is to expand $refs in the schema itself: - expand_refs(schema) - - # Now we can expand $refs in the property value: - if isinstance(property_value, dict): - expand_refs(property_value) + # Refs are expanded once in transform(); nothing to do here. # Now we can validate and normalize the values:And in
transform()
:@@ def transform(self, record, schema): - normalizer = self._normalizer(schema) + # Work on an expanded copy to avoid mutating caller-provided schemas. + expanded_schema = deepcopy(cast(Dict[str, Any], schema)) + expand_refs(expanded_schema) + normalizer = self._normalizer(expanded_schema)
96-105
: Drop "$ref" from custom validator map to shave overhead?Since refs are expanded prior to validation, the
$ref
keyword won’t appear during traversal. Removing it fromall_validators
avoids needless callback wiring, wdyt?- if key in ["type", "array", "$ref", "properties", "items"] + if key in ["type", "array", "properties", "items"]airbyte_cdk/sources/utils/schema_helpers.py (3)
18-19
: Avoid importing private symbols at runtime.
from referencing._core import Resolver
reaches into a private module. Since it’s only for typing, shall we import it under TYPE_CHECKING and annotate as a string to avoid runtime coupling, wdyt?-from referencing._core import Resolver # used for type hints +if TYPE_CHECKING: # type-only; avoid runtime import of private module + from referencing._core import Resolver # pragma: no cover
85-104
: Preserve sibling keyword precedence when expanding $ref.Merging referenced keys over existing ones (
schema.update(definition)
) can override siblings. JSON Schema semantics are effectively “$ref with siblings” ≈ allOf([ref, siblings]); in a merge model, siblings should take precedence. Shall we flip the merge to prefer siblings, wdyt?- if "$ref" in schema: - ref_url = schema.pop("$ref") - definition = ref_resolver.lookup(ref_url).contents - _expand_refs( - definition, ref_resolver=ref_resolver - ) # expand refs in definitions as well - schema.update(definition) + if "$ref" in schema: + ref_url = schema.pop("$ref") + definition = ref_resolver.lookup(ref_url).contents + _expand_refs(definition, ref_resolver=ref_resolver) + # Merge so that sibling keywords in 'schema' override referenced keys. + siblings = dict(schema) # after popping $ref + schema.clear() + schema.update({**definition, **siblings})Would you also like a focused test ensuring that a sibling like
{"$ref": "#/definitions/Foo", "minLength": 1}
preservesminLength
after expansion, wdyt?
106-117
: Optional: enrich error reporting for unresolved refs.Catching
Unresolvable
and surfacing the failing$ref
with context would speed up debugging. Interested in a thin wrapper that adds the ref URL and a short hint, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (7)
airbyte_cdk/manifest_server/Dockerfile
(2 hunks)airbyte_cdk/sources/utils/schema_helpers.py
(3 hunks)airbyte_cdk/sources/utils/transform.py
(2 hunks)airbyte_cdk/utils/spec_schema_transformations.py
(1 hunks)pyproject.toml
(1 hunks)unit_tests/connector_builder/test_connector_builder_handler.py
(1 hunks)unit_tests/sources/utils/test_transform.py
(2 hunks)
✅ Files skipped from review due to trivial changes (2)
- unit_tests/connector_builder/test_connector_builder_handler.py
- airbyte_cdk/manifest_server/Dockerfile
🚧 Files skipped from review as they are similar to previous changes (2)
- pyproject.toml
- airbyte_cdk/utils/spec_schema_transformations.py
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte_cdk/sources/utils/transform.py (1)
airbyte_cdk/sources/utils/schema_helpers.py (2)
expand_refs
(106-116)get_ref_resolver_registry
(70-82)
airbyte_cdk/sources/utils/schema_helpers.py (4)
airbyte_cdk/sources/file_based/config/abstract_file_based_spec.py (1)
schema
(88-98)airbyte_cdk/sources/config.py (1)
schema
(21-27)airbyte_cdk/destinations/vector_db_based/config.py (1)
schema
(293-298)unit_tests/destinations/vector_db_based/config_test.py (1)
schema
(64-69)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-shopify
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
🔇 Additional comments (1)
unit_tests/sources/utils/test_transform.py (1)
36-36
: LGTM: local definition completes the new reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline, LGTM!
fix: update jsonschema constraint to allow 4.20.0+ for fastmcp compatibility
Summary
Updated the jsonschema version constraint from
~4.17.3
to>=4.17.3,<5.0
to unblock fastmcp requirements while maintaining broad compatibility. This change allows jsonschema versions 4.20.0+ that fastmcp requires while avoiding potential breaking changes in 5.x.The poetry lock file was regenerated and now resolves to jsonschema 4.25.1, which represents a significant version jump from the previous 4.17.3. The lock file also shows that the jsonschema package dependencies have changed significantly -
pyrsistent
was removed and new packagesjsonschema-specifications
,referencing
, andrpds-py
were added.Notes
Link to Devin session: https://app.devin.ai/sessions/5a4dd7c708c64eb2bbc2d42fb688c9a7
Requested by: @aaronsteers
Summary by CodeRabbit