@lmossman lmossman commented Aug 4, 2025

What

A user reported a bug in the current logic for injecting a cursor page token into the request path: https://airbyte1416.zendesk.com/agent/tickets/13510

Basically, if the "next page token" contains a full URL whose domain (or even just its port) differs from the url field of the stream's HttpRequester, we currently concatenate the two naively, which usually results in a failed request.

Here is an example from the zendesk ticket above:

  • Currently, their API Endpoint URL for that stream is https://globalus251.dayforcehcm.com/api/parachute/v1/Reports/job_posting
  • But the Cursor Pagination returns a "next page" URL that looks like this: https://globalus251.dayforcehcm.com:443/api/parachute/v1/Reports/job_posting?25c92c37-2ba7-468c-89f9-834f6c050ddc=2024-01-01T00%3a00%3a00.000000&cursor=UhsMQmz0rZtvj7oCJ1e7DLe2O7v5nb5Gco1YD9NCTtU%253D
  • Notice that the next page URL has :443 after the .com, but the API Endpoint URL doesn't
  • This results in the second page request being sent to https://globalus251.dayforcehcm.com/api/parachute/v1/Reports/job_postinghttps://globalus251.dayforcehcm.com:443/api/parachute/v1/Reports/job_posting?25c92c37-2ba7-468c-89f9-834f6c050ddc=2024-01-01T00%3A00%3A00.000000&cursor=UdaBrnaiXuKw%252BFyD6FzERZba7d%252FPA0HuIoON3QbfZwI%253D&pageSize=5 (which is just the concatenation of the API Endpoint URL and the next page token)
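The mismatch can be checked with Python's standard urllib.parse. This is only an illustrative sketch, not code from the CDK: the query string is shortened to a placeholder (cursor=example), and the variable names are made up for the example. It shows that the two URLs share a hostname and differ only in the explicit port, so they do not match as strings even though 443 is the default HTTPS port.

```python
from urllib.parse import urlparse

# API Endpoint URL from the ticket.
endpoint = urlparse(
    "https://globalus251.dayforcehcm.com/api/parachute/v1/Reports/job_posting"
)
# Next page URL from the ticket, with the query string replaced by a placeholder.
next_page = urlparse(
    "https://globalus251.dayforcehcm.com:443/api/parachute/v1/Reports/job_posting"
    "?cursor=example"
)

# The hostname ignores the port; netloc keeps it.
same_host = endpoint.hostname == next_page.hostname
same_netloc = endpoint.netloc == next_page.netloc

print(same_host, same_netloc, endpoint.port, next_page.port)
# prints: True False None 443
```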

How

To fix this, I simply modify the logic to call _join_url() on the url and path instead of naively concatenating them together.

This fixes the issue, because _join_url() will prefer the path if it contains its own full http scheme and domain, which is what we want in this case.

This also has a side-benefit of correctly handling the case where the url does not have a trailing / and the path does not have a leading / - the old implementation would not insert a / between these, whereas the new implementation does.
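The difference between the two behaviours can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual _join_url() implementation: naive_concat and join_url_sketch are hypothetical names, and the real method may handle more cases than the two described above.

```python
from urllib.parse import urlparse


def naive_concat(url: str, path: str) -> str:
    """Old behaviour as described above: plain string concatenation."""
    return url + path


def join_url_sketch(url: str, path: str) -> str:
    """Hypothetical stand-in for _join_url(); not the CDK code."""
    if not path:
        return url
    parsed = urlparse(path)
    # If the path carries its own scheme and host, prefer it over the base URL.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return path
    # Otherwise make sure exactly one "/" separates the two parts.
    return url.rstrip("/") + "/" + path.lstrip("/")


base = "https://globalus251.dayforcehcm.com/api/parachute/v1/Reports/job_posting"
# Placeholder query string; the real token is much longer.
token = base.replace(".com/", ".com:443/") + "?cursor=example"

broken = naive_concat(base, token)    # two URLs glued together
fixed = join_url_sketch(base, token)  # the token's full URL is used as-is
```

With these definitions, join_url_sketch("https://example.com/api", "items") yields "https://example.com/api/items", whereas plain concatenation would produce "https://example.com/apiitems".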

Testing

I have added unit tests to validate this fix, and you can reproduce the situation with this manifest: https://gist.github.com/lmossman/404b656c3e5726ddebc026eae118b7f8

Summary by CodeRabbit

  • Bug Fixes

    • Improved URL joining logic to ensure consistent and reliable construction of request URLs in all scenarios.
  • Tests

    • Added new tests to verify correct URL formation when using different combinations of base URLs and paths.

@github-actions github-actions bot added the bug (Something isn't working) and security labels Aug 4, 2025

github-actions bot commented Aug 4, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@lmossman/fix-url-path-joining#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch lmossman/fix-url-path-joining

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe <command> - Runs any poe command in the CDK environment

@lmossman lmossman marked this pull request as ready for review August 4, 2025 17:44
@lmossman lmossman requested review from Copilot and ChristoGrab August 4, 2025 17:44

@Copilot Copilot AI left a comment

Pull Request Overview

This PR fixes a bug in URL construction where full URLs from cursor pagination tokens were incorrectly concatenated with base URLs, causing request failures. The fix ensures proper URL joining by using the existing _join_url() method instead of naive string concatenation.

  • Replaces string concatenation with _join_url() method call in the _get_url() method
  • Adds comprehensive unit tests to validate the URL joining behavior with various scenarios

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
airbyte_cdk/sources/declarative/requesters/http_requester.py | Updates URL construction logic to use _join_url() method for proper URL handling
unit_tests/sources/declarative/requesters/test_http_requester.py | Adds test cases to validate URL joining behavior with different path scenarios

github-actions bot commented Aug 4, 2025

PyTest Results (Fast)

3 699 tests +4 | 3 688 ✅ +4 | 11 💤 ±0 | 0 ❌ ±0 | 6m 37s ⏱️ +12s
1 suites ±0 | 1 files ±0

Results for commit 5d2d4f2. ± Comparison against base commit 6c0d36d.

coderabbitai bot commented Aug 4, 2025

📝 Walkthrough

Walkthrough

The _get_url method in HttpRequester was updated to consistently use the _join_url method for combining a base URL and a path, regardless of whether the base URL comes from url_base or url. A new parameterized test was added to verify this behavior when using url instead of url_base.

Changes

Cohort / File(s) | Change Summary
HttpRequester URL Construction Logic (airbyte_cdk/sources/declarative/requesters/http_requester.py) | Modified _get_url to always use _join_url for joining url and path when url_base is not present.
Unit Tests for URL Joining (unit_tests/sources/declarative/requesters/test_http_requester.py) | Added a parameterized test to verify correct URL joining when initialized with url instead of url_base.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Case
    participant HttpRequester as HttpRequester
    participant HttpClient as HTTP Client

    Test->>HttpRequester: Initialize with url and path
    Test->>HttpRequester: Make request
    HttpRequester->>HttpRequester: _get_url (uses _join_url)
    HttpRequester->>HttpClient: Send request with joined URL
    HttpClient-->>Test: Return prepared request (assert URL)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~7 minutes

Possibly related PRs

Suggested labels

airbyte-python-cdk, airbyte-python-cdk/low-code/http-requester

Suggested reviewers

  • chandlerprall
  • artem1205

Would you like to consider looping in anyone else for this review, or does this cover the main stakeholders? Wdyt?

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6c0d36d and 5d2d4f2.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/declarative/requesters/http_requester.py (1 hunks)
  • unit_tests/sources/declarative/requesters/test_http_requester.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Check: source-shopify
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/requesters/http_requester.py (1)

171-177: LGTM! This change properly addresses the URL concatenation issue described in the PR.

The modification to consistently use _join_url() for both url_base + path and url + path scenarios is a solid improvement. This ensures that when pagination tokens contain full URLs, they're handled correctly rather than being naively concatenated. The existing _join_url implementation already handles the edge cases well, including full URLs in the path parameter.

unit_tests/sources/declarative/requesters/test_http_requester.py (1)

864-906: Excellent test coverage for the URL joining fix! 🎯

This parameterized test does a great job covering the key scenarios that the PR aims to fix, especially the cases where the path contains a full URL (test cases 3 & 4). The test structure mirrors the existing test_join_url nicely and provides comprehensive coverage for the new code path when using url instead of url_base.

The scenarios you've chosen directly address the pagination URL concatenation issue mentioned in the PR objectives, wdyt?

github-actions bot commented Aug 4, 2025

PyTest Results (Full)

3 702 tests +4 | 3 691 ✅ +4 | 11 💤 ±0 | 0 ❌ ±0 | 11m 40s ⏱️ ±0s
1 suites ±0 | 1 files ±0

Results for commit 5d2d4f2. ± Comparison against base commit 6c0d36d.

@ChristoGrab ChristoGrab left a comment

LGTM!

@lmossman lmossman merged commit 950acc6 into main Aug 4, 2025
29 of 30 checks passed
@lmossman lmossman deleted the lmossman/fix-url-path-joining branch August 4, 2025 20:05
lmossman added a commit that referenced this pull request Aug 5, 2025
lmossman added a commit that referenced this pull request Aug 5, 2025