@aaronsteers aaronsteers commented Aug 29, 2025

fix: update jsonschema constraint to allow 4.20.0+ for fastmcp compatibility

Summary

Updated the jsonschema version constraint from ~4.17.3 to >=4.17.3,<5.0 to unblock fastmcp requirements while maintaining broad compatibility. This change allows jsonschema versions 4.20.0+ that fastmcp requires while avoiding potential breaking changes in 5.x.

The Poetry lock file was regenerated and now resolves to jsonschema 4.25.1, a significant jump from the previous 4.17.3. The dependency set of jsonschema changed as well: pyrsistent was removed, and jsonschema-specifications, referencing, and rpds-py were added.
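
To make the widened range concrete, here is a minimal stdlib-only sketch of the two constraints. It assumes Poetry's documented tilde semantics, where ~4.17.3 is read as >=4.17.3,<4.18.0. The helper names are hypothetical, and the sketch ignores pre-release and other non-numeric version tags.

```python
def parse(version: str) -> tuple[int, ...]:
    """Parse a plain dotted version such as "4.25.1" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def allowed_by_old_pin(version: str) -> bool:
    """Old constraint ~4.17.3, read here as >=4.17.3,<4.18.0 (assumption)."""
    return parse("4.17.3") <= parse(version) < parse("4.18.0")


def allowed_by_new_range(version: str) -> bool:
    """New constraint >=4.17.3,<5.0."""
    return parse("4.17.3") <= parse(version) < parse("5.0")


# Compare how each constraint treats a few candidate versions.
for candidate in ("4.17.3", "4.20.0", "4.25.1", "5.0.0"):
    print(candidate, allowed_by_old_pin(candidate), allowed_by_new_range(candidate))
```

Under these assumptions the old pin rejects 4.20.0 and 4.25.1, while the new range accepts both and still rejects 5.0.0.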

Notes


Link to Devin session: https://app.devin.ai/sessions/5a4dd7c708c64eb2bbc2d42fb688c9a7
Requested by: @aaronsteers

Summary by CodeRabbit

  • New Features
    • More robust JSON Schema $ref handling by expanding references upfront for normalization and spec transformations.
  • Bug Fixes
    • Improved reliability for nested and definition-based $ref resolution in complex schemas.
  • Refactor
    • Switched internal $ref resolution to a new library with cleaner expansion semantics; no public API changes.
  • Chores
    • Broadened jsonschema compatibility to <5.0 and added a new dependency.
    • Updated manifest server to Poetry 2.0.1.
  • Tests
    • Expanded test coverage for $ref expansion and complex schema scenarios.

…ibility

- Changed jsonschema constraint from ~4.17.3 to >=4.17.3,<5.0
- Maintains broad compatibility while unblocking fastmcp requirements
- Regenerated poetry.lock with new constraint

Co-Authored-By: AJ Steers <[email protected]>
@Copilot Copilot AI review requested due to automatic review settings August 29, 2025 00:10

Original prompt from AJ Steers
@Devin - Can you take care of this version pin issue for the python CDK for jsonschema library:

jsonschema = "~4.17.3"  # 4.18 has some significant breaking changes: <https://github.com/python-jsonschema/jsonschema/releases/tag/v4.18.0>


devin-ai-integration bot commented Aug 29, 2025

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions github-actions bot added bug Something isn't working security labels Aug 29, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1756425696-jsonschema-version-pin#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1756425696-jsonschema-version-pin

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment



@Copilot Copilot AI left a comment


Pull Request Overview

This PR updates the jsonschema dependency constraint to allow compatibility with fastmcp while maintaining backward compatibility. The change relaxes the version constraint from ~4.17.3 to >=4.17.3,<5.0, enabling jsonschema versions 4.20.0+ that fastmcp requires.

Key changes:

  • Updated jsonschema version constraint to allow broader version range
  • Removed restrictive comment about 4.18 breaking changes
  • Maintained upper bound to avoid potential 5.x breaking changes



coderabbitai bot commented Aug 29, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Dependency constraints updated to jsonschema >=4.17.3,<5.0 and added referencing >=0.36.2. Replaced RefResolver-based $ref handling with referencing Registry/Resolver across schema helpers, transform normalization, and spec schema transformations. Updated manifest_server Dockerfile to Poetry 2.0.1. Minor unit test adjustments and added a my_type definition in a test schema.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Dependencies: pyproject.toml | Broadened jsonschema to >=4.17.3,<5.0; added referencing >=0.36.2. |
| $ref resolution refactor: airbyte_cdk/sources/utils/schema_helpers.py, airbyte_cdk/sources/utils/transform.py, airbyte_cdk/utils/spec_schema_transformations.py | Switched from jsonschema.RefResolver to referencing Registry/Resolver; introduced get_ref_resolver_registry; expanded refs by merging resolved contents; removed root definitions post-resolution; updated normalization to use pre-expanded schemas. |
| Docker/Build: airbyte_cdk/manifest_server/Dockerfile | Updated Poetry to 2.0.1; trailing newline/format tweak. |
| Tests: unit_tests/connector_builder/test_connector_builder_handler.py, unit_tests/sources/utils/test_transform.py | Reformatted a test call; added my_type definition and ref in complex schema; no behavioral changes to tests. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller as Normalization/Spec Transform
  participant Helpers as SchemaHelpers
  participant Ref as referencing.Registry/Resolver
  participant Schema as JSON Schema

  Caller->>Helpers: expand_refs(schema)
  Helpers->>Ref: get_ref_resolver_registry(schema).resolver()
  loop For each $ref in schema
    Helpers->>Ref: lookup(ref_url)
    Ref-->>Helpers: resolved.contents
    Helpers->>Helpers: recursively expand nested $refs
    Helpers->>Schema: merge resolved keys into parent
  end
  Helpers->>Schema: remove root "definitions"
  Helpers-->>Caller: expanded schema
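
The flow in the diagram above can be sketched with the standard library alone. This is a simplified stand-in, not the CDK's implementation and not the referencing library: it handles only local "#/..." references, does not guard against recursive schemas, and every function name here is an assumption made for illustration.

```python
from typing import Any


def lookup(root: dict[str, Any], ref: str) -> Any:
    """Resolve a local reference such as "#/definitions/my_type" against root."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local '#/...' refs are handled in this sketch: {ref}")
    node: Any = root
    for token in ref[2:].split("/"):
        # Undo JSON Pointer escaping (~1 -> "/", ~0 -> "~").
        node = node[token.replace("~1", "/").replace("~0", "~")]
    return node


def _expand(node: Any, root: dict[str, Any]) -> None:
    """Replace every $ref under node with the keys of its target, in place."""
    if isinstance(node, dict):
        if "$ref" in node:
            target = lookup(root, node.pop("$ref"))
            _expand(target, root)  # expand nested refs inside the definition
            node.update(target)    # merge resolved keys into the parent
        for value in node.values():
            _expand(value, root)
    elif isinstance(node, list):
        for item in node:
            _expand(item, root)


def expand_refs(schema: dict[str, Any]) -> None:
    """Expand all local refs, then drop the root "definitions" block."""
    _expand(schema, schema)
    schema.pop("definitions", None)


schema = {
    "type": "object",
    "definitions": {"my_type": {"type": "string"}},
    "properties": {"dd": {"$ref": "#/definitions/my_type"}},
}
expand_refs(schema)
print(schema)
```

After the call, the dd property carries the inlined {"type": "string"} and the root definitions key is gone.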

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

dependencies

Suggested reviewers

  • brianjlai

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e4b34b6 and 987f454.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • pyproject.toml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-shopify
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Analyze (python)
🔇 Additional comments (1)
pyproject.toml (1)

46-46: Direction LGTM—range unlocks fastmcp while staying <5.0.

Allowing 4.x broadly makes sense given the fastmcp requirement, and keeping the <5 guard is prudent. Nice. wdyt?

devin-ai-integration bot and others added 2 commits August 29, 2025 00:17
…hema 4.18+ compatibility

- Replace RefResolver.from_schema() with Registry().with_resource() pattern
- Update resolver.resolve() calls to use resolver.lookup().contents
- Addresses jsonschema 4.18+ deprecation warnings
- Maintains backward compatibility with existing functionality

Co-Authored-By: AJ Steers <[email protected]>
- Auto-format YAML and Markdown files per project standards
- No functional changes, only formatting consistency

Co-Authored-By: AJ Steers <[email protected]>

github-actions bot commented Aug 29, 2025

PyTest Results (Fast)

3 764 tests  ±0   3 752 ✅ ±0   6m 34s ⏱️ -9s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 09916a9. ± Comparison against base commit 9ef1a3d.

♻️ This comment has been updated with latest results.

@aaronsteers aaronsteers marked this pull request as draft August 29, 2025 00:26
- Added referencing>=0.30.0 dependency to pyproject.toml
- Required for jsonschema 4.18+ RefResolver migration
- Regenerated poetry.lock with new dependency

Co-Authored-By: AJ Steers <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (4)
airbyte_cdk/manifest_migrations/README.md (1)

24-24: Prefer backticks for literal asterisk.

To avoid escaping artifacts and improve readability, would you render the wildcard in backticks as `*` instead of "*", wdyt?

airbyte_cdk/sources/utils/schema_helpers.py (2)

67-77: Minor: rename param for clarity and guard resolver creation.

Since this now takes a Registry, would renaming ref_resolver to registry reduce confusion? Also, if schema is not a dict, Resource.from_contents(schema) could be surprising. Do you want to assert isinstance(schema, MutableMapping) before building the resource to fail fast with a clearer message, wdyt?


80-86: Resolution error handling.

If a $ref cannot be found, resolver.lookup(ref_url) will raise; previously we surfaced RefResolutionError. Do you want to catch Exception here and raise an AirbyteTracedException with context (ref string, path) to avoid leaking schema internals to logs, wdyt?

airbyte_cdk/utils/spec_schema_transformations.py (1)

17-25: OK on switching to referencing; consider a defensive fallback.

The new resolver path looks good. For resilience, do you want to add a narrow try/except around resolver.lookup(ref) to emit a clearer error (include the missing ref) or optionally fall back to returning the original $ref block when resolution fails, wdyt?

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 987f454 and 42e2d7f.

📒 Files selected for processing (8)
  • .github/actions/check-docker-tag/action.yml (1 hunks)
  • .github/workflows/pytest_matrix.yml (1 hunks)
  • .github/workflows/test-command.yml (1 hunks)
  • airbyte_cdk/manifest_migrations/README.md (1 hunks)
  • airbyte_cdk/manifest_server/openapi.yaml (14 hunks)
  • airbyte_cdk/sources/utils/schema_helpers.py (2 hunks)
  • airbyte_cdk/sources/utils/transform.py (2 hunks)
  • airbyte_cdk/utils/spec_schema_transformations.py (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • .github/workflows/test-command.yml
  • .github/actions/check-docker-tag/action.yml
  • .github/workflows/pytest_matrix.yml
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte_cdk/utils/spec_schema_transformations.py (2)
airbyte_cdk/sources/file_based/config/abstract_file_based_spec.py (1)
  • schema (88-98)
airbyte_cdk/sources/config.py (1)
  • schema (21-27)
airbyte_cdk/sources/utils/schema_helpers.py (4)
airbyte_cdk/sources/file_based/config/abstract_file_based_spec.py (1)
  • schema (88-98)
airbyte_cdk/sources/config.py (1)
  • schema (21-27)
airbyte_cdk/destinations/vector_db_based/config.py (1)
  • schema (293-298)
unit_tests/destinations/vector_db_based/config_test.py (1)
  • schema (64-69)
🪛 GitHub Actions: Dependency Analysis
airbyte_cdk/sources/utils/transform.py

[error] 16-16: DEP003: 'referencing' imported but it is a transitive dependency. Command: poetry run deptry .

airbyte_cdk/utils/spec_schema_transformations.py

[error] 9-9: DEP003: 'referencing' imported but it is a transitive dependency. Command: poetry run deptry .

airbyte_cdk/sources/utils/schema_helpers.py

[error] 16-16: DEP003: 'referencing' imported but it is a transitive dependency. Command: poetry run deptry .

🪛 GitHub Actions: Test Connectors
airbyte_cdk/sources/utils/schema_helpers.py

[error] 16-16: ModuleNotFoundError: No module named 'referencing' (required by airbyte_cdk.sources.utils.schema_helpers).

🪛 Checkov (3.2.334)
airbyte_cdk/manifest_server/openapi.yaml

[MEDIUM] 197-202: Ensure that arrays have a maximum number of items

(CKV_OPENAPI_21)

🪛 LanguageTool
airbyte_cdk/manifest_migrations/README.md

[grammar] ~24-24: There might be a mistake here.
Context: ...48.3", "
=1.2", ">=1.0.0,<2.0.0", "6.48.3" - Each migration entry should include: ...

(QB_NEW_EN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
🔇 Additional comments (1)
airbyte_cdk/manifest_server/openapi.yaml (1)

352-362: I’ve requested a check of the YAML snippet to confirm whether the integer fields still use float-valued constraints. Once I have the output, I’ll update the comment accordingly.

This comment was marked as outdated.


- Update constraint to >=4.18.0,<5.0 as suggested by @aaronsteers
- Remove all backward compatibility code for 4.17.x
- Use referencing library exclusively with DRAFT7 default specification
- Fixes CannotDetermineSpecification error for schemas without $schema property

Co-Authored-By: AJ Steers <[email protected]>

You're absolutely right about the cleaner approach! I've updated the implementation:

Technical constraint discovered: The referencing library itself requires jsonschema 4.18+ (it's not available in 4.17.x), so backward compatibility with 4.17.x is technically impossible anyway.

Changes made:

  1. ✅ Updated constraint to >=4.18.0,<5.0 as suggested
  2. ✅ Removed all backward compatibility code
  3. ✅ Simplified to use only the referencing library with DRAFT7 default specification
  4. ✅ Fixed CannotDetermineSpecification error for schemas without $schema property
  5. ✅ Local tests now pass

The implementation is now much cleaner and the constraint accurately reflects the technical reality that the referencing library requires 4.18+.


github-actions bot commented Aug 29, 2025

PyTest Results (Full)

3 767 tests  ±0   3 755 ✅ ±0   11m 17s ⏱️ +3s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 09916a9. ± Comparison against base commit 9ef1a3d.

♻️ This comment has been updated with latest results.

aaronsteers and others added 11 commits August 29, 2025 11:41
…n transform.py

- Update Poetry version from 1.8.3 to 2.0.1 in manifest server Dockerfile for consistency with local development
- Fix transform.py resolve function to properly handle both new referencing library and legacy RefResolver
- Addresses CI failures in 'Manifest Server Docker Image Build' and pytest transform tests

Co-Authored-By: AJ Steers <[email protected]>
@aaronsteers aaronsteers marked this pull request as ready for review August 29, 2025 21:46

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
airbyte_cdk/sources/utils/transform.py (1)

170-177: Fix item type handling for arrays (string -> set of chars bug).

set(subschema.get("items", {}).get("type", set())) turns "string" into {'s','t',...} and can misclassify arrays. Can we normalize the "type" field first, wdyt?

-            elif target_type == "array":
-                item_types = set(subschema.get("items", {}).get("type", set()))
+            elif target_type == "array":
+                raw_item_type = subschema.get("items", {}).get("type")
+                if isinstance(raw_item_type, str):
+                    item_types = {raw_item_type}
+                elif isinstance(raw_item_type, list):
+                    item_types = set(raw_item_type)
+                else:
+                    item_types = set()
                 if (
-                    item_types.issubset(json_to_python_simple)
+                    item_types.issubset(set(json_to_python_simple.keys()))
                     and type(original_item) in json_to_python_simple.values()
                 ):
                     return [original_item]
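
The string-to-characters pitfall described above is easy to reproduce in isolation. The following is a minimal sketch; the helper name item_types_of is hypothetical and only mirrors the normalization proposed in the diff.

```python
from typing import Any


def item_types_of(subschema: dict[str, Any]) -> set[str]:
    """Return the declared item types of an array subschema as a set of names."""
    raw = subschema.get("items", {}).get("type")
    if isinstance(raw, str):
        return {raw}      # a single type name, kept whole
    if isinstance(raw, list):
        return set(raw)   # a list of type names
    return set()          # no type declared


subschema = {"type": "array", "items": {"type": "string"}}

# The original expression iterates over the string, yielding its characters.
print(set(subschema.get("items", {}).get("type", set())))
# The normalized helper keeps the type name intact.
print(item_types_of(subschema))
```

The first print shows a set of single letters, which can never equal or contain the name "string"; the second shows a one-element set holding the type name.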
♻️ Duplicate comments (1)
airbyte_cdk/sources/utils/schema_helpers.py (1)

70-83: LGTM: central registry factory simplifies resolver usage. Also, double-check deps are declared.

The Registry builder looks good and keeps call sites clean. Can we verify referencing is a direct dependency and that the jsonschema floor aligns with this usage, wdyt?

#!/bin/bash
# Verify pyproject declares direct deps and compatible ranges
rg -n '^jsonschema\s*=' pyproject.toml
rg -n '^referencing\s*=' pyproject.toml
# Sanity: ensure lock resolves to <5.0
rg -n -C1 'jsonschema' poetry.lock | head -n 20
🧹 Nitpick comments (7)
unit_tests/sources/utils/test_transform.py (1)

23-26: Cover the new $ref path with an explicit test case?

You added #/definitions/my_type and wired it into def.properties.dd, but none of the parametrized cases exercise def.dd. Adding a minimal case will prove the expand_refs path works end-to-end, wdyt?

Example addition to the parametrization (outside this hunk):

@@
     [
         (SIMPLE_SCHEMA, {"value": 12}, {"value": "12"}, None),
+        # $ref to definitions/my_type
+        (
+            COMPLEX_SCHEMA,
+            {"def": {"dd": "abc"}},
+            {"def": {"dd": "abc"}},
+            None,
+        ),
airbyte_cdk/sources/utils/transform.py (3)

20-23: Narrow the import exception and avoid runtime dependency for typing-only imports?

Catching all exceptions is too broad, and Validator is only used for typing here. Shall we gate it under TYPE_CHECKING and catch ImportError otherwise, wdyt?

-try:
-    from jsonschema.validators import Validator
-except:
-    from jsonschema import Validator
+if TYPE_CHECKING:
+    try:
+        from jsonschema.validators import Validator  # type: ignore
+    except ImportError:
+        from jsonschema import Validator  # type: ignore
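
For readers unfamiliar with the pattern, here is a small stdlib-only sketch of a typing-only import. It uses decimal.Decimal as a stand-in for Validator, which is an assumption made so the example runs without jsonschema installed. Note that the suggestion as quoted would also need TYPE_CHECKING imported from typing, and any annotation that uses the guarded name has to be a string (or otherwise deferred), because the name does not exist at runtime.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; this branch never runs at runtime.
    from decimal import Decimal  # stand-in for a typing-only import


def describe(value: "Decimal") -> str:
    """The quoted annotation is never evaluated, so Decimal need not be imported."""
    return f"value={value}"


# TYPE_CHECKING is False at runtime, so the guarded import was skipped.
print(TYPE_CHECKING)
print(describe.__annotations__["value"])
```

The function remains callable with any value at runtime, while a type checker still sees the real Decimal type.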

209-215: Avoid re-expanding refs on every validator callback; expand once up-front.

Calling expand_refs(schema) (and again on property_value) inside each callback is redundant and mutates schemas repeatedly. Expanding once in transform() on a deepcopy improves perf and avoids side effects, wdyt?

Apply these changes:

@@ def normalizator(...):
-            # Very first step is to expand $refs in the schema itself:
-            expand_refs(schema)
-
-            # Now we can expand $refs in the property value:
-            if isinstance(property_value, dict):
-                expand_refs(property_value)
+            # Refs are expanded once in transform(); nothing to do here.
             # Now we can validate and normalize the values:

And in transform():

@@ def transform(self, record, schema):
-        normalizer = self._normalizer(schema)
+        # Work on an expanded copy to avoid mutating caller-provided schemas.
+        expanded_schema = deepcopy(cast(Dict[str, Any], schema))
+        expand_refs(expanded_schema)
+        normalizer = self._normalizer(expanded_schema)
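
The copy-then-expand-once idea can be illustrated without the CDK. In this sketch, expand_refs_in_place is a hypothetical, deliberately tiny stand-in that only inlines "#/definitions/<name>" references found directly under properties; the point is the deepcopy in prepare_schema, not the expansion logic.

```python
from copy import deepcopy
from typing import Any


def expand_refs_in_place(schema: dict[str, Any]) -> None:
    """Hypothetical stand-in: inline '#/definitions/<name>' refs under properties."""
    definitions = schema.pop("definitions", {})
    for prop in schema.get("properties", {}).values():
        if "$ref" in prop:
            name = prop.pop("$ref").rsplit("/", 1)[-1]
            prop.update(deepcopy(definitions[name]))


def prepare_schema(schema: dict[str, Any]) -> dict[str, Any]:
    """Expand refs once on a private copy; the caller's schema is untouched."""
    expanded = deepcopy(schema)
    expand_refs_in_place(expanded)
    return expanded


original = {
    "definitions": {"my_type": {"type": "string"}},
    "properties": {"dd": {"$ref": "#/definitions/my_type"}},
}
expanded = prepare_schema(original)
print(expanded["properties"]["dd"])
print(original["properties"]["dd"])
```

Because the mutation happens on the copy, repeated calls with the same caller-provided schema start from the same input each time.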

96-105: Drop "$ref" from custom validator map to shave overhead?

Since refs are expanded prior to validation, the $ref keyword won’t appear during traversal. Removing it from all_validators avoids needless callback wiring, wdyt?

-            if key in ["type", "array", "$ref", "properties", "items"]
+            if key in ["type", "array", "properties", "items"]
airbyte_cdk/sources/utils/schema_helpers.py (3)

18-19: Avoid importing private symbols at runtime.

from referencing._core import Resolver reaches into a private module. Since it’s only for typing, shall we import it under TYPE_CHECKING and annotate as a string to avoid runtime coupling, wdyt?

-from referencing._core import Resolver  # used for type hints
+if TYPE_CHECKING:  # type-only; avoid runtime import of private module
+    from referencing._core import Resolver  # pragma: no cover

85-104: Preserve sibling keyword precedence when expanding $ref.

Merging referenced keys over existing ones (schema.update(definition)) can override siblings. JSON Schema semantics are effectively “$ref with siblings” ≈ allOf([ref, siblings]); in a merge model, siblings should take precedence. Shall we flip the merge to prefer siblings, wdyt?

-        if "$ref" in schema:
-            ref_url = schema.pop("$ref")
-            definition = ref_resolver.lookup(ref_url).contents
-            _expand_refs(
-                definition, ref_resolver=ref_resolver
-            )  # expand refs in definitions as well
-            schema.update(definition)
+        if "$ref" in schema:
+            ref_url = schema.pop("$ref")
+            definition = ref_resolver.lookup(ref_url).contents
+            _expand_refs(definition, ref_resolver=ref_resolver)
+            # Merge so that sibling keywords in 'schema' override referenced keys.
+            siblings = dict(schema)  # after popping $ref
+            schema.clear()
+            schema.update({**definition, **siblings})

Would you also like a focused test ensuring that a sibling like {"$ref": "#/definitions/Foo", "minLength": 1} preserves minLength after expansion, wdyt?
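
The difference between the two merge orders comes down to plain dict semantics, as this minimal sketch shows. The definition contents and the minLength values are made up for illustration.

```python
definition = {"type": "string", "minLength": 0}
node = {"$ref": "#/definitions/Foo", "minLength": 1}

# Referenced keys win: this mirrors schema.update(definition) after popping $ref.
ref_wins = {k: v for k, v in node.items() if k != "$ref"}
ref_wins.update(definition)

# Siblings win: later entries in a dict display override earlier ones.
siblings = {k: v for k, v in node.items() if k != "$ref"}
siblings_win = {**definition, **siblings}

print(ref_wins["minLength"], siblings_win["minLength"])  # prints: 0 1
```

Which order is appropriate depends on the draft in use: under Draft 7 the specification says keywords next to $ref are ignored, while drafts from 2019-09 onward evaluate them alongside the reference.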


106-117: Optional: enrich error reporting for unresolved refs.

Catching Unresolvable and surfacing the failing $ref with context would speed up debugging. Interested in a thin wrapper that adds the ref URL and a short hint, wdyt?
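
One possible shape for such a wrapper, sketched with the standard library only. UnresolvedRefError and lookup_local_ref are hypothetical names; the error class stands in for whatever exception the real resolver raises, and only local "#/..." references are handled.

```python
from typing import Any


class UnresolvedRefError(ValueError):
    """Hypothetical error type that carries the failing $ref."""

    def __init__(self, ref: str, missing: str) -> None:
        super().__init__(
            f"Could not resolve $ref {ref!r}: no key {missing!r}. "
            "Check that the target exists under the schema root."
        )
        self.ref = ref


def lookup_local_ref(root: dict[str, Any], ref: str) -> Any:
    """Follow a local '#/a/b' reference, raising UnresolvedRefError on a miss."""
    node: Any = root
    for token in ref.removeprefix("#/").split("/"):
        try:
            node = node[token]
        except (KeyError, TypeError) as exc:
            raise UnresolvedRefError(ref, token) from exc
    return node


schema = {"definitions": {"my_type": {"type": "string"}}}
print(lookup_local_ref(schema, "#/definitions/my_type"))
try:
    lookup_local_ref(schema, "#/definitions/missing")
except UnresolvedRefError as err:
    print(err)
```

Chaining with "from exc" keeps the original traceback available while the message names the reference that failed.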

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 42e2d7f and 09916a9.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • airbyte_cdk/manifest_server/Dockerfile (2 hunks)
  • airbyte_cdk/sources/utils/schema_helpers.py (3 hunks)
  • airbyte_cdk/sources/utils/transform.py (2 hunks)
  • airbyte_cdk/utils/spec_schema_transformations.py (1 hunks)
  • pyproject.toml (1 hunks)
  • unit_tests/connector_builder/test_connector_builder_handler.py (1 hunks)
  • unit_tests/sources/utils/test_transform.py (2 hunks)
✅ Files skipped from review due to trivial changes (2)
  • unit_tests/connector_builder/test_connector_builder_handler.py
  • airbyte_cdk/manifest_server/Dockerfile
🚧 Files skipped from review as they are similar to previous changes (2)
  • pyproject.toml
  • airbyte_cdk/utils/spec_schema_transformations.py
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte_cdk/sources/utils/transform.py (1)
airbyte_cdk/sources/utils/schema_helpers.py (2)
  • expand_refs (106-116)
  • get_ref_resolver_registry (70-82)
airbyte_cdk/sources/utils/schema_helpers.py (4)
airbyte_cdk/sources/file_based/config/abstract_file_based_spec.py (1)
  • schema (88-98)
airbyte_cdk/sources/config.py (1)
  • schema (21-27)
airbyte_cdk/destinations/vector_db_based/config.py (1)
  • schema (293-298)
unit_tests/destinations/vector_db_based/config_test.py (1)
  • schema (64-69)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
🔇 Additional comments (1)
unit_tests/sources/utils/test_transform.py (1)

36-36: LGTM: local definition completes the new reference.

Collaborator

@ChristoGrab ChristoGrab left a comment


Discussed offline, LGTM!

@aaronsteers aaronsteers merged commit c4ff128 into main Aug 29, 2025
29 checks passed
@aaronsteers aaronsteers deleted the devin/1756425696-jsonschema-version-pin branch August 29, 2025 22:53