feat: Add JSON schema generation to build process for schemastore.org registration #700
Conversation
feat: Add JSON schema generation to build process for schemastore.org registration

- Modified `bin/generate_component_manifest_files.py` to generate JSON schema alongside pydantic models
- Added `generate_json_schema()` function with custom `DateTimeEncoder` to handle datetime objects
- Generated `airbyte_cdk/sources/declarative/declarative_component_schema.json` from the YAML source
- The JSON file can now be registered with schemastore.org for schema validation and auto-completion

Co-Authored-By: AJ Steers <[email protected]>
Original prompt from AJ Steers
🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know: ✅ I will automatically:

Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
Copilot wasn't able to review any files in this pull request.
👋 Greetings, Airbyte Team Member! Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

```shell
# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1754519222-add-json-schema-generation#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1754519222-add-json-schema-generation
```

Helpful Resources / PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:
📝 Walkthrough

Rewrote `bin/generate_component_manifest_files.py` from a container-based async pipeline to a local, synchronous pipeline: added `generate_json_schema()`, switched to subprocess `datamodel-codegen` calls, added `post_process_codegen()`, removed dagger/anyio and related container scripts, and updated the pyproject assemble task. Minor README and header edits elsewhere.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Dev as Developer / CLI
    participant Script as generate_component_manifest_files.py
    participant FS as Local FS
    participant DMG as datamodel-codegen (subprocess)
    Dev->>Script: run script
    Note right of Script #DFF2E1: 1. generate_json_schema()\n(read YAML -> write JSON)
    Script->>FS: read declarative_component_schema.yaml
    FS-->>Script: YAML
    Script->>FS: write declarative_component_schema.json
    Note right of Script #FFF3D1: 2. per-YAML codegen\n(temp generated/)
    Script->>FS: create TemporaryDirectory / generated/
    Script->>FS: iterate YAML files
    loop per YAML
        Script->>DMG: run datamodel-codegen --input <yaml> --output generated/<name>.py ...
        DMG-->>Script: generated/<name>.py
    end
    Note right of Script #F2E7FF: 3. post-processing
    Script->>FS: post_process_codegen(generated/, generated_post_processed/)
    FS-->>Script: final post-processed files
    Script->>FS: write final output (LOCAL_OUTPUT_DIR_PATH)
    Script-->>Dev: print status / exit
```
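The per-YAML codegen step in the diagram can be sketched as a small command builder. This is a hypothetical, stdlib-only sketch: only a subset of the real flags is shown, and nothing is executed here.

```python
from pathlib import Path

def build_codegen_cmd(input_yaml: str, output_py: str) -> list:
    """Build a datamodel-codegen argument list for one YAML schema file (sketch)."""
    return [
        "datamodel-codegen",
        "--input", input_yaml,
        "--output", output_py,
        "--disable-timestamp",
        "--use-double-quotes",
    ]

# One command per YAML file; file names here are placeholders.
yaml_files = ["declarative_component_schema.yaml"]
cmds = [
    build_codegen_cmd(name, f"generated/{Path(name).stem}.py")
    for name in yaml_files
]
print(cmds[0][0])  # datamodel-codegen
```

In the real script each command would then be passed to `subprocess.run(...)`, as shown in the diffs later in this review.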
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
Would you like a unit test added for

✨ Finishing Touches
Actionable comments posted: 0
🧹 Nitpick comments (2)
bin/generate_component_manifest_files.py (2)
Lines 34-52: The JSON schema generation looks solid, but what about error handling?

The function correctly reads the YAML file, handles datetime serialization with a custom encoder, and writes the JSON output. However, I'm wondering if we should add some error handling for common failure scenarios? wdyt?
For instance:
- What if the YAML file doesn't exist or is malformed?
- What if the output directory isn't writable?
- Should we validate that the generated JSON is actually valid JSON schema format?
Here's a more robust version with error handling:
```diff
 def generate_json_schema():
     """Generate JSON schema from the YAML file for schemastore.org registration."""
     yaml_file_path = f"{LOCAL_YAML_DIR_PATH}/declarative_component_schema.yaml"
     json_file_path = f"{LOCAL_YAML_DIR_PATH}/declarative_component_schema.json"
-    with open(yaml_file_path, "r") as yaml_file:
-        schema_data = yaml.safe_load(yaml_file)
+    try:
+        with open(yaml_file_path, "r") as yaml_file:
+            schema_data = yaml.safe_load(yaml_file)
+    except FileNotFoundError:
+        print(f"Error: YAML file not found at {yaml_file_path}")
+        raise
+    except yaml.YAMLError as e:
+        print(f"Error: Invalid YAML in {yaml_file_path}: {e}")
+        raise

     class DateTimeEncoder(json.JSONEncoder):
         def default(self, obj):
             if hasattr(obj, "isoformat"):
                 return obj.isoformat()
             return super().default(obj)

-    with open(json_file_path, "w") as json_file:
-        json.dump(schema_data, json_file, indent=2, cls=DateTimeEncoder)
+    try:
+        with open(json_file_path, "w") as json_file:
+            json.dump(schema_data, json_file, indent=2, cls=DateTimeEncoder)
+    except IOError as e:
+        print(f"Error: Could not write JSON file to {json_file_path}: {e}")
+        raise

     print(f"Generated JSON schema: {json_file_path}")
```
Lines 42-46: Smart datetime handling! Could we make the encoder name more specific?

The `DateTimeEncoder` correctly handles objects with an `isoformat` method, which covers datetime objects. However, the class name might be a bit generic since it handles any object with `isoformat`, not just datetime objects. Maybe something like `IsoFormatEncoder` would be more precise? wdyt?

```diff
-    class DateTimeEncoder(json.JSONEncoder):
+    class IsoFormatEncoder(json.JSONEncoder):
         def default(self, obj):
             if hasattr(obj, "isoformat"):
                 return obj.isoformat()
             return super().default(obj)

     with open(json_file_path, "w") as json_file:
-        json.dump(schema_data, json_file, indent=2, cls=DateTimeEncoder)
+        json.dump(schema_data, json_file, indent=2, cls=IsoFormatEncoder)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
bin/generate_component_manifest_files.py (3 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
📚 Learning: in the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-g...
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Applied to files:
bin/generate_component_manifest_files.py
📚 Learning: the files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from ...
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Applied to files:
bin/generate_component_manifest_files.py
📚 Learning: in the airbytehq/airbyte-python-cdk repository, ignore all `__init__.py` files when providing a reco...
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, ignore all `__init__.py` files when providing a recommended reviewing order.
Applied to files:
bin/generate_component_manifest_files.py
📚 Learning: in the project, the `run` function is defined in `airbyte_cdk/cli/source_declarative_manifest/_run.p...
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: pyproject.toml:108-111
Timestamp: 2024-11-15T00:58:15.446Z
Learning: In the project, the `run` function is defined in `airbyte_cdk/cli/source_declarative_manifest/_run.py` and is imported into the module's `__init__.py`.
Applied to files:
bin/generate_component_manifest_files.py
📚 Learning: when building the docker image for the `source-declarative-manifest`, copying files from `site-packa...
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#90
File: Dockerfile:16-21
Timestamp: 2024-12-02T18:36:04.346Z
Learning: When building the Docker image for the `source-declarative-manifest`, copying files from `site-packages` ensures that the SDM code runs from an installed package context, maintaining import paths and module consistency.
Applied to files:
bin/generate_component_manifest_files.py
📚 Learning: when modifying the `yamldeclarativesource` class in `airbyte_cdk/sources/declarative/yaml_declarativ...
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Applied to files:
bin/generate_component_manifest_files.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (3)
bin/generate_component_manifest_files.py (3)
Line 3: LGTM on the new imports!

The `json` and `yaml` imports are appropriately placed and necessary for the new JSON schema generation functionality. Also applies to line 11.
Line 136: Perfect placement of the JSON schema generation!

Calling `generate_json_schema()` at the beginning of `main()` ensures the JSON schema is generated before the rest of the build process. This maintains the build flow as intended in the PR objectives.
Lines 36-37: Required schemastore.org fields are already present

The declarative schema at `airbyte_cdk/sources/declarative/declarative_component_schema.yaml` includes both `$schema` and `$id`, so the generated JSON will satisfy schemastore.org requirements without extra validation. Good to go! wdyt?
PyTest Results (Full): 3 765 tests, 3 754 ✅, 11m 16s ⏱️

Results for commit 49c5a48. ♻️ This comment has been updated with latest results.
- Updated `generate_json_schema()` to create the JSON file in the `generated/` subfolder
- Added directory creation logic to ensure the `generated/` folder exists
- Removed the old JSON file from the root `declarative/` directory
- Improves organization by clearly marking auto-generated files

Co-Authored-By: AJ Steers <[email protected]>

- Fixed whitespace formatting to comply with the Ruff format check
- Added proper blank line after import statement
- Removed trailing whitespace

Co-Authored-By: AJ Steers <[email protected]>
/poe build
❌ Poe command
/autofix
Co-Authored-By: AJ Steers <[email protected]>
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
bin/generate_component_manifest_files.py (2)
Lines 103-120: Fix typo and make import injection robust/idempotent.
- Typo: “warinings” → “warnings”.
- Current insertion relies on “\n\n” and may skip import, breaking replaced bases.
Apply:
```diff
-    # update the imports to include the new base model with deprecation warinings
+    # update the imports to include the new base model with deprecation warnings
@@
-    if len(classes_with_deprecated_fields) > 0:
-        # Find where to insert the base model - after imports but before class definitions
-        imports_end = post_processed_content.find(
-            "\n\n",
-            post_processed_content.find("from pydantic.v1 import"),
-        )
-        if imports_end > 0:
-            post_processed_content = (
-                post_processed_content[:imports_end]
-                + "\n\n"
-                + "from airbyte_cdk.sources.declarative.models.base_model_with_deprecations import (\n"
-                + "    BaseModelWithDeprecations,\n"
-                + ")"
-                + post_processed_content[imports_end:]
-            )
+    if classes_with_deprecated_fields:
+        # Insert once, right after the first pydantic import (fallback to top).
+        if "BaseModelWithDeprecations" not in post_processed_content:
+            m = re.search(r"^from pydantic\.v1 import[^\n]*\n", post_processed_content, flags=re.M)
+            insert_at = m.end() if m else 0
+            injection = (
+                "from airbyte_cdk.sources.declarative.models.base_model_with_deprecations import (\n"
+                "    BaseModelWithDeprecations,\n"
+                ")\n"
+            )
+            post_processed_content = (
+                post_processed_content[:insert_at] + injection + post_processed_content[insert_at:]
+            )
```

wdyt?
Lines 142-146: Make the "_parameters" rename indentation-agnostic.

Exact string match is brittle; use regex so any indent works.
Apply:
```diff
-    post_processed_content = original_content.replace(
-        "    _parameters:", "    parameters:"
-    ).replace("from pydantic", "from pydantic.v1")
+    post_processed_content = re.sub(r"(^\s+)_parameters:", r"\1parameters:", original_content, flags=re.M)
+    post_processed_content = post_processed_content.replace("from pydantic", "from pydantic.v1")
```

wdyt?
♻️ Duplicate comments (1)
bin/generate_component_manifest_files.py (1)
Lines 42-50: Docstring update looks good.

The schemastore phrasing from the prior suggestion has been applied as discussed.
🧹 Nitpick comments (2)
bin/generate_component_manifest_files.py (2)
Lines 35-39: Deterministic module export order.

Small: ensure stable star-import order.
Apply:
```diff
-    for module_name in get_all_yaml_files_without_ext():
+    for module_name in sorted(get_all_yaml_files_without_ext()):
         header += f"from .{module_name} import *\n"
```

wdyt?
Lines 193-201: Improve error messaging when the codegen tool is absent.

Catching only CalledProcessError misses the common "module not found" case. Add a tailored hint.
Apply:
```diff
     try:
         result = subprocess.run(cmd, check=True, capture_output=True, text=True)
         print(f"Generated {output_py}")
     except subprocess.CalledProcessError as e:
         print(f"Error generating {output_py}: {e}")
         print(f"stdout: {e.stdout}")
         print(f"stderr: {e.stderr}")
+        if "No module named 'datamodel_code_generator'" in (e.stderr or ""):
+            print("Hint: install `datamodel-code-generator` (e.g., via `uv run` script deps or as a dev dependency) or adjust the assemble task.", file=sys.stderr)
         sys.exit(1)
```

wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py (0 hunks)
- bin/generate-component-manifest-dagger.sh (0 hunks)
- bin/generate_component_manifest_files.py (4 hunks)
- pyproject.toml (1 hunks)
💤 Files with no reviewable changes (2)
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py
- bin/generate-component-manifest-dagger.sh
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Applied to files:
bin/generate_component_manifest_files.py
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Applied to files:
bin/generate_component_manifest_files.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-intercom
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-pokeapi
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
```python
yaml_file_path = f"{LOCAL_YAML_DIR_PATH}/declarative_component_schema.yaml"
json_file_path = f"{LOCAL_YAML_DIR_PATH}/generated/declarative_component_schema.json"

with open(yaml_file_path, "r") as yaml_file:
    schema_data = yaml.safe_load(yaml_file)

class DateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if hasattr(obj, "isoformat"):
            return obj.isoformat()
        return super().default(obj)

import os

os.makedirs(os.path.dirname(json_file_path), exist_ok=True)

with open(json_file_path, "w") as json_file:
    json.dump(schema_data, json_file, indent=2, cls=DateTimeEncoder)

print(f"Generated JSON schema: {json_file_path}")
```
🛠️ Refactor suggestion
Harden JSON Schema generation: ensure $schema/$id, validate, and drop redundant import.
- Add defaults for $schema and $id if missing.
- Validate early with jsonschema to fail fast.
- Remove the inner `import os` (already imported).
Apply:
```diff
 yaml_file_path = f"{LOCAL_YAML_DIR_PATH}/declarative_component_schema.yaml"
 json_file_path = f"{LOCAL_YAML_DIR_PATH}/generated/declarative_component_schema.json"

-with open(yaml_file_path, "r") as yaml_file:
+if not os.path.exists(yaml_file_path):
+    raise FileNotFoundError(f"Missing schema source: {yaml_file_path}")
+with open(yaml_file_path, "r") as yaml_file:
     schema_data = yaml.safe_load(yaml_file)

 class DateTimeEncoder(json.JSONEncoder):
     def default(self, obj):
         if hasattr(obj, "isoformat"):
             return obj.isoformat()
         return super().default(obj)

-import os
-
-os.makedirs(os.path.dirname(json_file_path), exist_ok=True)
+# Ensure required fields for schemastore consumers
+schema_data.setdefault("$schema", "https://json-schema.org/draft/2020-12/schema")
+schema_data.setdefault("$id", "https://json.schemastore.org/airbyte-declarative-component.json")
+os.makedirs(os.path.dirname(json_file_path), exist_ok=True)
+# Validate and fail fast
+try:
+    from jsonschema import Draft202012Validator
+    Draft202012Validator.check_schema(schema_data)
+except Exception as e:
+    print(f"JSON Schema validation failed: {e}", file=sys.stderr)
+    sys.exit(1)

 with open(json_file_path, "w") as json_file:
-    json.dump(schema_data, json_file, indent=2, cls=DateTimeEncoder)
+    json.dump(schema_data, json_file, indent=2, ensure_ascii=False, cls=DateTimeEncoder)
```
wdyt?
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Suggested code:

```python
yaml_file_path = f"{LOCAL_YAML_DIR_PATH}/declarative_component_schema.yaml"
json_file_path = f"{LOCAL_YAML_DIR_PATH}/generated/declarative_component_schema.json"

if not os.path.exists(yaml_file_path):
    raise FileNotFoundError(f"Missing schema source: {yaml_file_path}")
with open(yaml_file_path, "r") as yaml_file:
    schema_data = yaml.safe_load(yaml_file)

class DateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if hasattr(obj, "isoformat"):
            return obj.isoformat()
        return super().default(obj)

# Ensure required fields for schemastore consumers
schema_data.setdefault(
    "$schema", "https://json-schema.org/draft/2020-12/schema"
)
schema_data.setdefault(
    "$id", "https://json.schemastore.org/airbyte-declarative-component.json"
)
os.makedirs(os.path.dirname(json_file_path), exist_ok=True)
# Validate and fail fast
try:
    from jsonschema import Draft202012Validator
    Draft202012Validator.check_schema(schema_data)
except Exception as e:
    print(f"JSON Schema validation failed: {e}", file=sys.stderr)
    sys.exit(1)

with open(json_file_path, "w") as json_file:
    json.dump(
        schema_data,
        json_file,
        indent=2,
        ensure_ascii=False,
        cls=DateTimeEncoder,
    )

print(f"Generated JSON schema: {json_file_path}")
```
🤖 Prompt for AI Agents
In bin/generate_component_manifest_files.py around lines 51 to 71, the JSON
schema generation should be hardened: remove the redundant inner "import os" and
instead rely on the module-level import, ensure the loaded YAML has defaults for
"$schema" and "$id" (set sensible defaults if missing), validate the resulting
schema object with jsonschema.validate (import jsonschema at module top and
raise/exit on validation errors to fail fast), then proceed to create the output
directory and dump using the DateTimeEncoder; make sure any mutation of
schema_data to add defaults happens before validation and before json.dump.
```python
cmd = [
    "datamodel-codegen",
    "--input",
    input_yaml,
    "--output",
    output_py,
    "--disable-timestamp",
    "--enum-field-as-literal",
    "one",
    "--set-default-enum-member",
    "--use-double-quotes",
    "--remove-special-field-name-prefix",
    # allow usage of extra keys such as `deprecated`, etc.
    "--field-extra-keys",
    # account for the `deprecated` flag provided for the field.
    "deprecated",
    # account for the `deprecation_message` provided for the field.
    "deprecation_message",
]
```
🛠️ Refactor suggestion
Prefer module invocation to avoid console-script PATH issues.
Calling via `sys.executable -m datamodel_code_generator` is more portable and fixes failures when the CLI shim isn't present.
Apply:
```diff
-    cmd = [
-        "datamodel-codegen",
+    cmd = [
+        sys.executable,
+        "-m",
+        "datamodel_code_generator",
         "--input",
         input_yaml,
         "--output",
         output_py,
```
wdyt?
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In bin/generate_component_manifest_files.py around lines 173 to 191, the
subprocess invocation uses the console script "datamodel-codegen" which can fail
if the CLI shim is not on PATH; change the command to invoke the package module
via the current Python interpreter by replacing the first entry with
sys.executable and adding "-m" and "datamodel_code_generator" (i.e.
[sys.executable, "-m", "datamodel_code_generator", ...]); also add an import sys
at the top of the file if not present so the script uses the running interpreter
to run the tool.
```diff
 # Build tasks
-assemble = {cmd = "bin/generate-component-manifest-dagger.sh", help = "Generate component manifest files."}
+assemble = {cmd = "uv run bin/generate_component_manifest_files.py", help = "Generate component manifest files."}
```
💡 Verification agent
🧩 Analysis chain
Build failure risk: `uv` may be unavailable in CI.

The job logs show build/autofix failing; likely `uv` isn't on PATH. Two options:

- Run `uv` within Poetry's venv and declare it as a dev dep.
- Or drop `uv` and rely on dev deps exclusively.
Apply option 1 minimal changes:
```diff
-assemble = {cmd = "uv run bin/generate_component_manifest_files.py", help = "Generate component manifest files."}
+assemble = {cmd = "poetry run uv run bin/generate_component_manifest_files.py", help = "Generate component manifest files."}
```
And add dev deps (near the existing `[tool.poetry.group.dev.dependencies]`):

```diff
+[tool.poetry.group.dev.dependencies]
+uv = "^0.5"
+datamodel-code-generator = "0.26.3"
```
Alternatively, option 2:

```diff
-assemble = {cmd = "uv run bin/generate_component_manifest_files.py", help = "Generate component manifest files."}
+assemble = {cmd = "poetry run python bin/generate_component_manifest_files.py", help = "Generate component manifest files."}
```

(then keep `datamodel-code-generator` in dev deps). wdyt?
Run to confirm the current CI failure mode:
🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
echo "uv on PATH?"; command -v uv || echo "missing"
echo "datamodel-codegen on PATH?"; command -v datamodel-codegen || echo "missing"
rg -n '^assemble\s*=.*generate_component_manifest_files\.py' pyproject.toml
```
Length of output: 407
Install `uv` & `datamodel-code-generator` for the assemble script

CI logs confirm both `uv` and `datamodel-codegen` are missing, causing the task to fail. Two options:

- Add them as dev dependencies and prefix with `poetry run`:

```diff
-assemble = {cmd = "uv run bin/generate_component_manifest_files.py", help = "Generate component manifest files."}
+assemble = {cmd = "poetry run uv run bin/generate_component_manifest_files.py", help = "Generate component manifest files."}
```

Under `[tool.poetry.group.dev.dependencies]`:

```toml
uv = "^0.5"
datamodel-code-generator = "0.26.3"
```

- Or invoke the script directly via Python:

```diff
-assemble = {cmd = "uv run bin/generate_component_manifest_files.py", help = "Generate component manifest files."}
+assemble = {cmd = "poetry run python bin/generate_component_manifest_files.py", help = "Generate component manifest files."}
```

Which approach do you prefer? wdyt?
🤖 Prompt for AI Agents
In pyproject.toml around line 146, the assemble task uses the external tool "uv"
which is not installed in CI; update the assemble command to run the script with
Python under Poetry (e.g. change the cmd to invoke the Python script via "poetry
run python ...") so it no longer depends on the "uv" CLI, or alternatively add
"uv" and "datamodel-code-generator" to [tool.poetry.group.dev.dependencies] with
appropriate versions and keep the current command—pick one approach and apply it
consistently.
Add JSON schema generation to build process for schemastore.org registration
Pairs with this PR, to be re-opened after ours merges: (`manifest.yaml`) SchemaStore/schemastore#4524

Summary
This PR adds JSON schema generation to the Python CDK build process alongside the existing pydantic model generation. The build process now creates `declarative_component_schema.json` from the existing `declarative_component_schema.yaml` file, enabling registration with schemastore.org for enhanced schema validation and auto-completion support.

Key Changes:
- Modified `bin/generate_component_manifest_files.py` to include JSON schema generation
- Added a `generate_json_schema()` function with a custom `DateTimeEncoder` to handle datetime objects
- The JSON schema is now generated as part of `poetry run poe build`
- The YAML source's `$schema` and `$id` fields carry over to the generated JSON

Maintenance:
This fits neatly into the already-existing `build` process, which generates the Pydantic models from the existing YAML file; we now generate two files instead of one.

This can be run with `poe build` locally or with the slash command `/poe build` in PRs. (The `/poe build` command is also documented in the welcome message, so it is easy to remember.)

Notes
- The YAML source already includes the `$schema` and `$id` fields, making conversion straightforward
- The custom `DateTimeEncoder` was necessary due to datetime objects in the YAML that aren't JSON serializable
Requested by: @aaronsteers (AJ Steers - [email protected])
Summary by CodeRabbit
New Features
Chores
Documentation