-
Notifications
You must be signed in to change notification settings - Fork 30
chore: minor improvements to working on manifest only connectors locally #756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… format python files after generating pydnatic models, upgrade dagger version, install dagger normally as a dev poetry dependency, allow specifying a custom manifest file path in IDE debug configuration.
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@dbgold17/cdk-local-dev-improvements#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch dbgold17/cdk-local-dev-improvements Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughRemoved an inline pip install from the Dagger manifest generator script, declared Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant U as User/CLI
participant DM as debug_manifest.py
participant AE as AirbyteEntrypoint
participant CMP as components loader
participant YS as YamlDeclarativeSource
U->>DM: Run debug_manifest [--manifest-path <path>] [--components-path <file>]
DM->>AE: AirbyteEntrypoint.parse_args(args)
AE-->>DM: parsed args (manifest_path, components_path, ...)
DM->>DM: manifest_path = parsed_args.manifest_path or "resources/manifest.yaml"
alt components_path provided
DM->>CMP: _register_components_from_file(components_path)
CMP-->>DM: module registered under "components" and "source_declarative_manifest.components"
end
DM->>YS: init(path_to_yaml=manifest_path, ...)
YS-->>DM: Source initialized
DM-->>U: debug_manifest runs using the provided/default manifest
Note right of YS: manifest_path drives YAML loading
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Would you like me to also check CI images and common local dev setups for dagger-io 0.18.6 availability and suggest updating CI images if missing, wdyt? Pre-merge checks (2 passed, 1 inconclusive)❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
pyproject.toml (2)
151-151
: Run ruff-fix before packaging so the wheel includes formatted/generated codeRight now
build
formats afterbuild-package
, so the artifacts may not reflect the fixes. Shall we move ruff-fix ahead of packaging, wdyt?-build = {sequence = ["assemble", "openapi-generate", "build-package", "ruff-fix"], help = "Run all tasks to build the package."} +build = {sequence = ["assemble", "openapi-generate", "ruff-fix", "build-package"], help = "Run all tasks to build the package."}
123-123
: Loosen dagger-io pin to a patch-safe range and ensure dev extras installation?• Change in pyproject.toml from
dagger-io = "0.18.6"
to
dagger-io = ">=0.18.6,<0.19.0"
so future 0.18.x patches (latest upstream is 0.18.17) land without manual bumps.
• Add or document apoetry install --with dev
step (e.g. in README and/or a newpoe install
task) sobin/generate-component-manifest-dagger.sh
reliably finds the dev-group dependency—wdyt?debug_manifest/README.md (2)
25-25
: Minor: mark the snippet as jsonc to match inline commentsSince the snippet uses
//
comments, would you switch the code fence tojsonc
for accuracy and better highlighting in some renderers, wdyt?
30-33
: Great addition; consider documenting --components-path and clarifying env installNice call adding
--manifest-path
. Would you also mention the optional--components-path
flag (supported by the entrypoint) in this snippet and below, so users discover it easily, wdyt? Additionally, to ensure local tasks that rely on dev deps (like Dagger) work out of the box, shall we tweak the install step to suggestpoetry install --with dev --all-extras
, wdyt?debug_manifest/debug_manifest.py (1)
33-39
: Pass through debug and config_path for better traceabilitySince
YamlDeclarativeSource
supportsdebug
andconfig_path
, would you wire them through for improved DX, wdyt?debug_manifest( YamlDeclarativeSource( - path_to_yaml=manifest_path, + path_to_yaml=manifest_path, + debug=getattr(parsed_args, "debug", False), catalog=YamlDeclarativeSource.read_catalog(catalog_path) if catalog_path else None, config=YamlDeclarativeSource.read_config(config_path) if config_path else None, - state=YamlDeclarativeSource.read_state(state_path) if state_path else None, + state=YamlDeclarativeSource.read_state(state_path) if state_path else None, + config_path=config_path, ), args, )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (4)
bin/generate-component-manifest-dagger.sh
(0 hunks)debug_manifest/README.md
(1 hunks)debug_manifest/debug_manifest.py
(1 hunks)pyproject.toml
(2 hunks)
💤 Files with no reviewable changes (1)
- bin/generate-component-manifest-dagger.sh
🧰 Additional context used
🧬 Code graph analysis (1)
debug_manifest/debug_manifest.py (2)
airbyte_cdk/entrypoint.py (4)
AirbyteEntrypoint
(53-368)extract_catalog
(352-356)extract_config
(359-363)extract_state
(345-349)airbyte_cdk/sources/declarative/yaml_declarative_source.py (1)
YamlDeclarativeSource
(17-69)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-shopify
- GitHub Check: Check: destination-motherduck
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: SDM Docker Image Build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR provides minor improvements to the development experience when working on manifest-only connectors locally. The changes include upgrading dependencies, improving the development workflow, and adding flexibility to debugging tools.
Key changes:
- Updated dagger dependency version and moved it to pyproject.toml as a dev dependency
- Enhanced the debug manifest script to support custom manifest file paths with a sensible default
- Added automatic code formatting to the build process
Reviewed Changes
Copilot reviewed 3 out of 5 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
pyproject.toml | Added dagger-io dependency and automatic ruff formatting to build task |
debug_manifest/debug_manifest.py | Added support for custom manifest paths via command line arguments |
debug_manifest/README.md | Updated documentation to include manifest-path option and clarified placeholders |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
debug_manifest/debug_manifest.py (3)
19-43
: Harden dynamic component import to avoid partial module pollutionIf exec_module fails, the pre-inserted entries in sys.modules remain and can cause confusing follow-on errors. Could we add existence checks and clean up sys.modules on failure, wdyt?
@@ -def _register_components_from_file(filepath: str) -> None: +def _register_components_from_file(filepath: str) -> None: @@ - components_path = Path(filepath) + components_path = Path(filepath).resolve() + if not components_path.exists(): + raise FileNotFoundError(f"Components file not found: {components_path}") @@ - sys.modules[module_name] = module - sys.modules[sdm_module_name] = module - - spec.loader.exec_module(module) + sys.modules[module_name] = module + sys.modules[sdm_module_name] = module + try: + spec.loader.exec_module(module) + except Exception as e: + # cleanup partially initialized modules + sys.modules.pop(module_name, None) + sys.modules.pop(sdm_module_name, None) + raise ImportError(f"Failed to import components from {components_path}") from e
52-54
: Avoid re-parsing args via extract_ helpers*The extract_* helpers call parse_args again. Since we already have parsed_args, shall we reuse it to avoid triple parsing, wdyt?
- catalog_path = AirbyteEntrypoint.extract_catalog(args) - config_path = AirbyteEntrypoint.extract_config(args) - state_path = AirbyteEntrypoint.extract_state(args) + catalog_path = getattr(parsed_args, "catalog", None) + config_path = getattr(parsed_args, "config", None) + state_path = getattr(parsed_args, "state", None)
58-62
: Pass config_path through for better debug contextYamlDeclarativeSource accepts config_path; passing it can improve emitted debug info. Shall we include it, wdyt?
YamlDeclarativeSource( path_to_yaml=manifest_path, + config_path=config_path, catalog=YamlDeclarativeSource.read_catalog(catalog_path) if catalog_path else None, config=YamlDeclarativeSource.read_config(config_path) if config_path else None, state=YamlDeclarativeSource.read_state(state_path) if state_path else None, ),
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
debug_manifest/README.md
(1 hunks)debug_manifest/debug_manifest.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- debug_manifest/README.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Applied to files:
debug_manifest/debug_manifest.py
🧬 Code graph analysis (1)
debug_manifest/debug_manifest.py (2)
airbyte_cdk/entrypoint.py (4)
AirbyteEntrypoint
(53-368)extract_catalog
(352-356)extract_config
(359-363)extract_state
(345-349)airbyte_cdk/sources/declarative/yaml_declarative_source.py (1)
YamlDeclarativeSource
(17-69)
🪛 GitHub Actions: Linters
debug_manifest/debug_manifest.py
[error] 16-22: Ruff formatting would reformat 1 file. Command 'poetry run ruff format --diff .' exited with code 1. Run 'poetry run ruff format' to apply code style fixes.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Pytest (Fast)
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
🔇 Additional comments (2)
debug_manifest/debug_manifest.py (2)
48-51
: Nice consolidation of arg parsing and safer defaultsUsing a single parse and getattr with a default avoids AttributeError on commands like spec and removes the earlier double-parse on manifest_path. LGTM, thanks for addressing the earlier feedback, wdyt?
1-65
: Fix Ruff formatting failureCI is red on Ruff formatting. Could you run poetry run ruff format (or poe ruff-fix) and commit the result, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
debug_manifest/debug_manifest.py (3)
29-36
: Small nits: prefer is_file() and avoid re-importing sys.
- Using Path.is_file() guards against directories.
- sys is already imported at module scope; no need to re-import inside the function. Wdyt?
Apply:
- import sys from pathlib import Path @@ - if not components_path.exists(): + if not components_path.is_file(): raise FileNotFoundError(f"Components file not found: {components_path}")
47-53
: Guard against clobbering existing modules and ensure rollback on import failure.If another module named "components" is already present, this unconditionally overwrites it and could leave sys.modules in a bad state on exceptions. Shall we wrap registration in a try/except and restore prior modules on failure, wdyt?
- # Register then execute the module - # we dual-register the module to mirror what is done elsewhere in the CDK - sys.modules[module_name] = module - sys.modules[sdm_module_name] = module - - spec.loader.exec_module(module) + # Register then execute the module (with rollback if execution fails) + prev_components = sys.modules.get(module_name) + prev_sdm_components = sys.modules.get(sdm_module_name) + try: + sys.modules[module_name] = module + sys.modules[sdm_module_name] = module + spec.loader.exec_module(module) + except Exception: + if prev_components is None: + sys.modules.pop(module_name, None) + else: + sys.modules[module_name] = prev_components + if prev_sdm_components is None: + sys.modules.pop(sdm_module_name, None) + else: + sys.modules[sdm_module_name] = prev_sdm_components + raise
59-66
: Avoid re-parsing args for catalog/config/state.extract_* re-parses the CLI; since you already have parsed_args, can we read these directly to save work and keep things consistent, wdyt?
- catalog_path = AirbyteEntrypoint.extract_catalog(args) - config_path = AirbyteEntrypoint.extract_config(args) - state_path = AirbyteEntrypoint.extract_state(args) + catalog_path = getattr(parsed_args, "catalog", None) + config_path = getattr(parsed_args, "config", None) + state_path = getattr(parsed_args, "state", None)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
debug_manifest/debug_manifest.py
(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Applied to files:
debug_manifest/debug_manifest.py
🧬 Code graph analysis (1)
debug_manifest/debug_manifest.py (2)
airbyte_cdk/entrypoint.py (4)
AirbyteEntrypoint
(53-368)extract_catalog
(352-356)extract_config
(359-363)extract_state
(345-349)airbyte_cdk/sources/declarative/yaml_declarative_source.py (1)
YamlDeclarativeSource
(17-69)
🪛 GitHub Actions: Linters
debug_manifest/debug_manifest.py
[error] 19-29: Ruff format would reformat 1 file: debug_manifest/debug_manifest.py. Run 'poetry run ruff format' to apply formatting.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: SDM Docker Image Build
🔇 Additional comments (4)
debug_manifest/debug_manifest.py (4)
20-54
: Solid dynamic component registration and clear error handling — LGTM.The helper cleanly loads and dual-registers the module and surfaces useful errors. Nice touch for local dev.
57-62
: Nice: single parse_args() and getattr() defaulting.This addresses the earlier double-parse/AttributeError concern. LGTM.
67-73
: Passing manifest_path through to YamlDeclarativeSource looks correct.Keeps default behavior while enabling overrides. LGTM.
20-29
: Verify Ruff formatting locally
Could you runpoetry run ruff format debug_manifest/debug_manifest.py
(orpoe ruff-fix
) in your local environment and commit any formatting changes so that this file matches the CI’s expectations? wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
debug_manifest/debug_manifest.py (1)
57-65
: Parse args once; reuseparsed_args
for catalog/config/stateWe still re-parse via
AirbyteEntrypoint.extract_*
, which internally callsparse_args
again. Can we read these directly fromparsed_args
to avoid redundant parsing, for consistency and tiny perf/clarity win, wdyt?- catalog_path = AirbyteEntrypoint.extract_catalog(args) - config_path = AirbyteEntrypoint.extract_config(args) - state_path = AirbyteEntrypoint.extract_state(args) + catalog_path = getattr(parsed_args, "catalog", None) + config_path = getattr(parsed_args, "config", None) + state_path = getattr(parsed_args, "state", None)
🧹 Nitpick comments (2)
debug_manifest/debug_manifest.py (2)
67-75
: Propagatedebug
andconfig_path
intoYamlDeclarativeSource
Since
YamlDeclarativeSource.__init__
acceptsdebug
andconfig_path
, would passing them improve diagnostics and parity with runtime behavior, wdyt?debug_manifest( YamlDeclarativeSource( - path_to_yaml=manifest_path, + path_to_yaml=manifest_path, + debug=getattr(parsed_args, "debug", False), catalog=YamlDeclarativeSource.read_catalog(catalog_path) if catalog_path else None, config=YamlDeclarativeSource.read_config(config_path) if config_path else None, state=YamlDeclarativeSource.read_state(state_path) if state_path else None, + config_path=config_path, ), args, )
20-53
: Harden dynamic component loader (file checks; drop redundant import; minor robustness)A couple of small guardrails could make this safer to use locally without surprises: ensure the path is a file with a
.py
suffix, normalize it, and avoid the redundant innerimport sys
. Also, passing aPath
directly tospec_from_file_location
is fine in newer Python, but converting tostr
is the most compatible. Wdyt?def _register_components_from_file(filepath: str) -> None: @@ - import importlib.util - import sys - from pathlib import Path + import importlib.util + from pathlib import Path @@ - components_path = Path(filepath) - if not components_path.exists(): + components_path = Path(filepath).expanduser().resolve() + if not components_path.exists(): raise FileNotFoundError(f"Components file not found: {components_path}") + if not components_path.is_file(): + raise FileNotFoundError(f"Components path is not a file: {components_path}") + if components_path.suffix != ".py": + raise ValueError(f"Components file must be a .py file: {components_path}") @@ - spec = importlib.util.spec_from_file_location(module_name, components_path) + spec = importlib.util.spec_from_file_location(module_name, str(components_path)) @@ - sys.modules[module_name] = module - sys.modules[sdm_module_name] = module + sys.modules[module_name] = module + sys.modules[sdm_module_name] = moduleOptional: if you expect repeated invocations with different files in the same process, do you want to warn when overwriting existing entries in
sys.modules
(e.g., viaif module_name in sys.modules: ... log/warn
), wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
debug_manifest/debug_manifest.py
(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Applied to files:
debug_manifest/debug_manifest.py
🧬 Code graph analysis (1)
debug_manifest/debug_manifest.py (2)
airbyte_cdk/entrypoint.py (4)
AirbyteEntrypoint
(53-368)extract_catalog
(352-356)extract_config
(359-363)extract_state
(345-349)airbyte_cdk/sources/declarative/yaml_declarative_source.py (1)
YamlDeclarativeSource
(17-69)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Check: source-intercom
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Manifest Server Docker Image Build
🔇 Additional comments (1)
debug_manifest/debug_manifest.py (1)
59-62
: Nice: CLI flexibility + safe defaults look goodUsing
getattr(..., None) or "resources/manifest.yaml"
for bothmanifest_path
andcomponents_path
gracefully supports commands likespec
that don’t define these args. 👍 Wdyt about keeping this pattern consistent elsewhere too?
I found these changes to be helpful when working on updating the cdk to enable new functionality in a manifest-only connector so I figured I'd contribute them back.
bin/generate_component_manifest_files.py
script for generating pydantic models from the declarative_component_schema.yaml file. This fixes an issue I ran into locally when using0.13.3
where the script would error our withdeclare the dagger dependency as a normal dev dependency in the pyproject.toml file instead of hardcoded in the script.
run
poe ruff-fix
automatically after regenerating pydantic models to avoid seeing unrelated spacing / indentation diffsmodify the
debug_manifest.py
script to allow the caller to specify a path to a manifest yaml file and custom components python file, matching what exists for configs and catalogs. I found it useful when I had multiple slightly varying copies of a manifest file locally. If not specified, the script will default to looking for the file in the previously hardcoded location.Summary by CodeRabbit
New Features
Documentation
Chores