-
Notifications
You must be signed in to change notification settings - Fork 29
fix: properly join url and path when calculating full url #692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@lmossman/fix-url-path-joining#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch lmossman/fix-url-path-joining Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a bug in URL construction where full URLs from cursor pagination tokens were incorrectly concatenated with base URLs, causing request failures. The fix ensures proper URL joining by using the existing _join_url()
method instead of naive string concatenation.
- Replaces string concatenation with
_join_url()
method call in the_get_url()
method - Adds comprehensive unit tests to validate the URL joining behavior with various scenarios
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
airbyte_cdk/sources/declarative/requesters/http_requester.py | Updates URL construction logic to use _join_url() method for proper URL handling |
unit_tests/sources/declarative/requesters/test_http_requester.py | Adds test cases to validate URL joining behavior with different path scenarios |
📝 WalkthroughWalkthroughThe Changes
Sequence Diagram(s)sequenceDiagram
participant Test as Test Case
participant HttpRequester as HttpRequester
participant HttpClient as HTTP Client
Test->>HttpRequester: Initialize with url and path
Test->>HttpRequester: Make request
HttpRequester->>HttpRequester: _get_url (uses _join_url)
HttpRequester->>HttpClient: Send request with joined URL
HttpClient-->>Test: Return prepared request (assert URL)
Estimated code review effort🎯 2 (Simple) | ⏱️ ~7 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Would you like to consider looping in anyone else for this review, or does this cover the main stakeholders? Wdyt? Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
🧰 Additional context used🧠 Learnings (1)📓 Common learnings
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
🔇 Additional comments (2)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…)" This reverts commit 950acc6.
…)" This reverts commit 950acc6.
What
A user raised a bug with the current logic for handling
Request Path
cursor page token injection: https://airbyte1416.zendesk.com/agent/tickets/13510Basically, if the "next page token" contains a full URL which contains a different domain than the
url
field of the stream's HttpRequester, we currently just naively concatenate the two together, which usually results in a failure.Here is an example from the zendesk ticket above:
https://globalus251.dayforcehcm.com/api/parachute/v1/Reports/job_posting
https://globalus251.dayforcehcm.com:443/api/parachute/v1/Reports/job_posting?25c92c37-2ba7-468c-89f9-834f6c050ddc=2024-01-01T00%3a00%3a00.000000&cursor=UhsMQmz0rZtvj7oCJ1e7DLe2O7v5nb5Gco1YD9NCTtU%253D
:443
after the.com
, but the API Endpoint URL doesn'thttps://globalus251.dayforcehcm.com/api/parachute/v1/Reports/job_postinghttps://globalus251.dayforcehcm.com:443/api/parachute/v1/Reports/job_posting?25c92c37-2ba7-468c-89f9-834f6c050ddc=2024-01-01T00%3A00%3A00.000000&cursor=UdaBrnaiXuKw%252BFyD6FzERZba7d%252FPA0HuIoON3QbfZwI%253D&pageSize=5
(which is just the concatenation of the API Endpoint URL and the next page token)How
To fix this, I simply modify the logic to call
_join_url()
on theurl
andpath
instead of naively concatenating them together.This fixes the issue, because
_join_url()
will prefer thepath
if it contains its own full http scheme and domain, which is what we want in this case.This also has a side-benefit of correctly handling the case where the
url
does not have a trailing/
and thepath
does not have a leading/
- the old implementation would not insert a/
between these, whereas the new implementation does.Testing
I have added unit tests to validate this fix, and you can reproduce the situation with this manifest: https://gist.github.com/lmossman/404b656c3e5726ddebc026eae118b7f8
Summary by CodeRabbit
Bug Fixes
Tests