Skip to content

Conversation

maxi297
Copy link
Contributor

@maxi297 maxi297 commented Aug 26, 2025

What

https://airbytehq-team.slack.com/archives/C0980AJD0F7/p1756166020078259?thread_ts=1756138243.067239&cid=C0980AJD0F7

https://github.com/airbytehq/airbyte-python-cdk/actions/runs/17223546541/job/48863700904?pr=719#step:8:33688

How

There is no version of requests_cache that got release with this fix so we monkey_patch stuff until there is a new release

I've added a test so that when the version gets updated, we can re-assess what is the next step

Summary by CodeRabbit

  • Bug Fixes

    • Resolved intermittent cache read errors that could cause HTTP-based syncs to fail.
    • Improved reliability when accessing the local cache, reducing unexpected exceptions during runs.
  • Chores

    • Applied a temporary workaround to ensure stable cache behavior without requiring user configuration changes.
  • Tests

    • Added a safeguard test to detect incompatible cache library versions early.

@github-actions github-actions bot added bug Something isn't working security labels Aug 26, 2025
Copy link

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/fix_sqlite3_interface_error#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/fix_sqlite3_interface_error

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

📝 Edit this welcome message.

Copy link
Contributor

coderabbitai bot commented Aug 26, 2025

📝 Walkthrough

Walkthrough

Adds a temporary monkey-patch monkey_patched_get_item(self, key) in airbyte_cdk/sources/streams/http/http_client.py that replaces requests_cache.SQLiteDict.__getitem__ by querying the DB via self.connection(), raising KeyError if missing and returning self.deserialize(...); also adds a unit test asserting requests_cache.__version__ == "1.2.1".

Changes

Cohort / File(s) Summary
HTTP cache monkey patch
airbyte_cdk/sources/streams/http/http_client.py
Added monkey_patched_get_item(self, key) which gets a DB connection via self.connection(), executes SELECT value FROM {table_name} WHERE key='{key}', fetches the row, closes the cursor, raises KeyError if not found, and returns self.deserialize(key, row[0]). Assigns requests_cache.SQLiteDict.__getitem__ = monkey_patched_get_item. Docstring notes this is a temporary workaround and references an external fix. (No new top-level sqlite3 import added in the patch.)
Version guard test
unit_tests/test_requests_cache_monkeypatch_version.py
New test asserting requests_cache.__version__ == "1.2.1" to surface future version bumps that may invalidate the monkey-patch.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant Cache as requests_cache.SQLiteDict
  participant Patch as monkey_patched_get_item
  participant DB as sqlite (via self.connection())

  Caller->>Cache: __getitem__(key)
  activate Cache
  Cache->>Patch: invoke(self, key)
  activate Patch
  Patch->>DB: open connection (self.connection())
  Patch->>DB: SELECT value FROM {table_name} WHERE key='{key}'
  DB-->>Patch: row or none
  alt key found
    Patch->>Patch: deserialize(key, row[0])
    Patch-->>Cache: return value
  else key missing
    Patch-->>Cache: raise KeyError
  end
  deactivate Patch
  Cache-->>Caller: value or KeyError
  deactivate Cache
  note right of Patch: Temporary monkey-patch overrides SQLiteDict.__getitem__
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Would you like me to suggest alternative approaches (e.g., conditional patching based on detected requests_cache version, using parameterized SQL to avoid interpolation, or adding an integration test against an actual cache) — wdyt?

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 07f1ce3 and e358cce.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/streams/http/http_client.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte_cdk/sources/streams/http/http_client.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Check: source-shopify
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch maxi297/fix_sqlite3_interface_error

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

github-actions bot commented Aug 26, 2025

PyTest Results (Fast)

3 693 tests  +1   3 682 ✅ +1   6m 36s ⏱️ -1s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit e358cce. ± Comparison against base commit f525803.

♻️ This comment has been updated with latest results.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
airbyte_cdk/sources/streams/http/http_client.py (3)

60-66: Clarify intent and add type hints on the monkey patch for maintainability

Could we add minimal type hints and an explicit note that this patch can be removed once requests_cache releases the upstream fix? It helps future readers. Wdyt?

You could adjust the signature/docstring like:

-def monkey_patched_get_item(self, key):
-    """
+def monkey_patched_get_item(self, key: Any) -> Any:
+    """
     con.execute can lead to `sqlite3.InterfaceError: bad parameter or other API misuse`. There was a fix implemented
@@
-    but there is still no official releases of requests_cache that this is part of. Hence, we will monkeypatch it for now.
+    but there is still no official release of requests_cache that includes it. Hence, we monkey-patch for now.
+    TODO: Remove when upstream fix is available in a released version.
     """

78-78: Make the monkey-patch idempotent and optionally gate it

Assigning unconditionally can lead to surprising reassignments if this module is reloaded in-process (e.g., in tests) or if other code tries to patch as well. Could we guard it with a sentinel flag and optionally let users disable it via an env var for debugging? Wdyt?

Apply this diff:

-import requests_cache
+import requests_cache
@@
-requests_cache.SQLiteDict.__getitem__ = monkey_patched_get_item
+if os.getenv("AIRBYTE_DISABLE_SQLITEDICT_MONKEYPATCH") not in {"1", "true", "True"}:
+    if not getattr(requests_cache.SQLiteDict, "_airbyte_sqlitedict_patched", False):
+        requests_cache.SQLiteDict.__getitem__ = monkey_patched_get_item
+        # Idempotence marker to avoid double-patching
+        requests_cache.SQLiteDict._airbyte_sqlitedict_patched = True

66-75: Cursor cleanup robustness

Minor: wrapping fetch/close in a try/finally avoids leaking the cursor if fetchone() raises for any reason. The diff in my earlier comment includes that; can we keep it? Wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f525803 and 7ec2767.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/streams/http/http_client.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: MyPy Check
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)

Copy link

github-actions bot commented Aug 26, 2025

PyTest Results (Full)

3 696 tests  +1   3 685 ✅ +1   9m 38s ⏱️ -3s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit e358cce. ± Comparison against base commit f525803.

♻️ This comment has been updated with latest results.

coderabbitai[bot]
coderabbitai bot previously requested changes Aug 26, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
airbyte_cdk/sources/streams/http/http_client.py (1)

66-74: Stop interpolating the key into SQL; use a parameterized query with a safe fallback (security + correctness)

Directly embedding key into the SQL string will break for keys containing quotes and exposes an injection vector. We can try a parameterized query first and only fall back to a safely-escaped literal if we hit the Py3.12 concurrent sqlite3.InterfaceError you noted. Also add LIMIT 1, ensure the cursor is closed via finally, and keep behavior identical otherwise. Wdyt?

-        # Using placeholders here with python 3.12+ and concurrency results in the error:
-        # sqlite3.InterfaceError: bad parameter or other API misuse
-        cur = con.execute(f"SELECT value FROM {self.table_name} WHERE key='{key}'")
-        row = cur.fetchone()
-        cur.close()
-        if not row:
-            raise KeyError(key)
-
-        return self.deserialize(key, row[0])
+        # Try a parameterized query for safety; fall back if we hit the Py3.12+ InterfaceError under concurrency.
+        try:
+            cur = con.execute(
+                f"SELECT value FROM {self.table_name} WHERE key = ? LIMIT 1",
+                (key,),
+            )
+        except sqlite3.InterfaceError:
+            # Fallback: use a safely-escaped literal for keys; parameter binding can fail under concurrency on 3.12+
+            safe_key = _escape_sqlite_literal(key)
+            cur = con.execute(
+                f"SELECT value FROM {self.table_name} WHERE key = '{safe_key}' LIMIT 1"
+            )
+        try:
+            row = cur.fetchone()
+        finally:
+            cur.close()
+        if not row:
+            raise KeyError(key)
+        return self.deserialize(key, row[0])

Additions outside this hunk (can live near the patch):

# Needed for the fallback path
import sqlite3  # at top of file if not already present

from typing import Any  # already imported; shown here for clarity

def _escape_sqlite_literal(value: Any) -> str:
    if isinstance(value, bytes):
        try:
            value = value.decode("utf-8", "surrogatepass")
        except Exception:
            value = value.decode(errors="ignore")
    # Double single quotes per SQLite escaping rules
    return str(value).replace("'", "''")

One more sanity check: self.table_name is interpolated as an identifier. It’s typically controlled by the backend and not end-user input, but if there’s any chance it’s user-influenced in your code-paths, would you like me to add a whitelist/validator (e.g., allow only [A-Za-z0-9_]+) for extra safety, wdyt?

🧹 Nitpick comments (3)
unit_tests/test_requests_cache_monkeypatch_version.py (2)

6-10: Tighten docstring phrasing to clarify the intent

Minor wording tweaks to improve clarity and grammar, wdyt?

-    We need to be alerted once the requests_cache version is updated. The reason is that we monkey patch one of the
-    method in order to fix a bug until a new version is released.
+    We need to be alerted whenever the requests_cache version changes. We monkey-patch one of its methods
+    to work around a bug until a fixed release is available.
 
-    For more information about the reasons of this test, see monkey_patched_get_item in http_client.py
+    For more context, see monkey_patched_get_item in http_client.py.

11-11: Make the failure actionable (keep exact pin, add a message)

Since this test is intentionally brittle to alert on any version bump, could we keep the exact match but add an explicit failure message with a quick pointer to the patch location, wdyt?

-    assert requests_cache.__version__ == "1.2.1"
+    expected = "1.2.1"
+    assert requests_cache.__version__ == expected, (
+        f"requests_cache version changed to {requests_cache.__version__}. "
+        "Re-evaluate whether the monkey patch in "
+        "airbyte_cdk/sources/streams/http/http_client.py is still needed and update this test."
+    )

Optionally, do you want an additional sanity test that the patch is actually applied (identity check on getitem) once http_client is imported? I can add that in a follow-up, wdyt?

airbyte_cdk/sources/streams/http/http_client.py (1)

61-64: Tighten wording in the docstring

Nit: small grammar tweaks; also “official releases” -> “official release” reads better. Wdyt?

-    con.execute can lead to `sqlite3.InterfaceError: bad parameter or other API misuse`. There was a fix implemented
-    [here](https://github.com/requests-cache/requests-cache/commit/5ca6b9cdcb2797dd2fed485872110ccd72aee55d#diff-f43db4a5edf931647c32dec28ea7557aae4cae8444af4b26c8ecbe88d8c925aaL330-R332)
-    but there is still no official releases of requests_cache that this is part of. Hence, we will monkeypatch it for now.
+    con.execute can raise `sqlite3.InterfaceError: bad parameter or other API misuse`. A fix landed
+    [here](https://github.com/requests-cache/requests-cache/commit/5ca6b9cdcb2797dd2fed485872110ccd72aee55d#diff-f43db4a5edf931647c32dec28ea7557aae4cae8444af4b26c8ecbe88d8c925aaL330-R332),
+    but there is still no official release of requests_cache that includes it. Hence, we will monkey-patch temporarily.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7ec2767 and 07f1ce3.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/streams/http/http_client.py (1 hunks)
  • unit_tests/test_requests_cache_monkeypatch_version.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
airbyte_cdk/sources/streams/http/http_client.py

[error] 59-59: Function is missing a type annotation. [no-untyped-def] (Command: poetry run mypy --config-file mypy.ini airbyte_cdk)


[error] 77-77: Cannot assign to a method. [method-assign] (Command: poetry run mypy --config-file mypy.ini airbyte_cdk)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-intercom
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Build and Inspect Python Package
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: SDM Docker Image Build
🔇 Additional comments (1)
airbyte_cdk/sources/streams/http/http_client.py (1)

77-77: Gate monkey patch by requests-cache version and suppress mypy’s warning

We’ve confirmed that the fix for the sqlite3.InterfaceError in SQLiteDict.__getitem__ lands in the upcoming 1.3.0 release of requests-cache (see HISTORY.md under 1.3.0 “Fix sqlite3.InterfaceError…”). (requests-cache.readthedocs.io)

Could we update the patch as follows?

• Import and define the fixed version at the top of the module
• Only apply the monkey-patch when the installed version is less than 1.3.0
• Silence mypy’s method-assign error on purpose

-from packaging.version import Version
+from packaging.version import Version

+# First release including the sqlite3.InterfaceError fix in SQLiteDict.__getitem__
+FIXED_VER = "1.3.0"

 if Version(requests_cache.__version__) < Version(FIXED_VER):
-    requests_cache.SQLiteDict.__getitem__ = monkey_patched_get_item
+    requests_cache.SQLiteDict.__getitem__ = monkey_patched_get_item  # type: ignore[method-assign]

Alternatively, to reduce global side-effects, would you prefer subclassing our own SQLiteDict (or SQLiteCache) in HttpClient and overriding __getitem__ there instead of a module-level monkey-patch? Wdyt?

Copy link
Contributor

@brianjlai brianjlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pain levain

@maxi297 maxi297 dismissed coderabbitai[bot]’s stale review August 26, 2025 16:48

coderabbitai is fighting for a copy/paste from a different repo. We won't fix this

@maxi297 maxi297 merged commit 08ac482 into main Aug 26, 2025
28 checks passed
@maxi297 maxi297 deleted the maxi297/fix_sqlite3_interface_error branch August 26, 2025 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working security
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants