-
Notifications
You must be signed in to change notification settings - Fork 30
fix: monkey patch SQLiteDict in order to fix sqlite3.InterfaceError #725
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/fix_sqlite3_interface_error#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/fix_sqlite3_interface_error Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
📝 WalkthroughWalkthroughAdds a temporary monkey-patch Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Caller
participant Cache as requests_cache.SQLiteDict
participant Patch as monkey_patched_get_item
participant DB as sqlite (via self.connection())
Caller->>Cache: __getitem__(key)
activate Cache
Cache->>Patch: invoke(self, key)
activate Patch
Patch->>DB: open connection (self.connection())
Patch->>DB: SELECT value FROM {table_name} WHERE key='{key}'
DB-->>Patch: row or none
alt key found
Patch->>Patch: deserialize(key, row[0])
Patch-->>Cache: return value
else key missing
Patch-->>Cache: raise KeyError
end
deactivate Patch
Cache-->>Caller: value or KeyError
deactivate Cache
note right of Patch: Temporary monkey-patch overrides SQLiteDict.__getitem__
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Would you like me to suggest alternative approaches (e.g., conditional patching based on detected requests_cache version, using parameterized SQL to avoid interpolation, or adding an integration test against an actual cache) — wdyt? Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
airbyte_cdk/sources/streams/http/http_client.py (3)
60-66
: Clarify intent and add type hints on the monkey patch for maintainabilityCould we add minimal type hints and an explicit note that this patch can be removed once requests_cache releases the upstream fix? It helps future readers. Wdyt?
You could adjust the signature/docstring like:
-def monkey_patched_get_item(self, key): - """ +def monkey_patched_get_item(self, key: Any) -> Any: + """ con.execute can lead to `sqlite3.InterfaceError: bad parameter or other API misuse`. There was a fix implemented @@ - but there is still no official releases of requests_cache that this is part of. Hence, we will monkeypatch it for now. + but there is still no official release of requests_cache that includes it. Hence, we monkey-patch for now. + TODO: Remove when upstream fix is available in a released version. """
78-78
: Make the monkey-patch idempotent and optionally gate itAssigning unconditionally can lead to surprising reassignments if this module is reloaded in-process (e.g., in tests) or if other code tries to patch as well. Could we guard it with a sentinel flag and optionally let users disable it via an env var for debugging? Wdyt?
Apply this diff:
-import requests_cache +import requests_cache @@ -requests_cache.SQLiteDict.__getitem__ = monkey_patched_get_item +if os.getenv("AIRBYTE_DISABLE_SQLITEDICT_MONKEYPATCH") not in {"1", "true", "True"}: + if not getattr(requests_cache.SQLiteDict, "_airbyte_sqlitedict_patched", False): + requests_cache.SQLiteDict.__getitem__ = monkey_patched_get_item + # Idempotence marker to avoid double-patching + requests_cache.SQLiteDict._airbyte_sqlitedict_patched = True
66-75
: Cursor cleanup robustnessMinor: wrapping fetch/close in a try/finally avoids leaking the cursor if fetchone() raises for any reason. The diff in my earlier comment includes that; can we keep it? Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
airbyte_cdk/sources/streams/http/http_client.py
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-pokeapi
- GitHub Check: MyPy Check
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
airbyte_cdk/sources/streams/http/http_client.py (1)
66-74
: Stop interpolating the key into SQL; use a parameterized query with a safe fallback (security + correctness)Directly embedding
key
into the SQL string will break for keys containing quotes and exposes an injection vector. We can try a parameterized query first and only fall back to a safely-escaped literal if we hit the Py3.12 concurrentsqlite3.InterfaceError
you noted. Also addLIMIT 1
, ensure the cursor is closed viafinally
, and keep behavior identical otherwise. Wdyt?- # Using placeholders here with python 3.12+ and concurrency results in the error: - # sqlite3.InterfaceError: bad parameter or other API misuse - cur = con.execute(f"SELECT value FROM {self.table_name} WHERE key='{key}'") - row = cur.fetchone() - cur.close() - if not row: - raise KeyError(key) - - return self.deserialize(key, row[0]) + # Try a parameterized query for safety; fall back if we hit the Py3.12+ InterfaceError under concurrency. + try: + cur = con.execute( + f"SELECT value FROM {self.table_name} WHERE key = ? LIMIT 1", + (key,), + ) + except sqlite3.InterfaceError: + # Fallback: use a safely-escaped literal for keys; parameter binding can fail under concurrency on 3.12+ + safe_key = _escape_sqlite_literal(key) + cur = con.execute( + f"SELECT value FROM {self.table_name} WHERE key = '{safe_key}' LIMIT 1" + ) + try: + row = cur.fetchone() + finally: + cur.close() + if not row: + raise KeyError(key) + return self.deserialize(key, row[0])Additions outside this hunk (can live near the patch):
# Needed for the fallback path import sqlite3 # at top of file if not already present from typing import Any # already imported; shown here for clarity def _escape_sqlite_literal(value: Any) -> str: if isinstance(value, bytes): try: value = value.decode("utf-8", "surrogatepass") except Exception: value = value.decode(errors="ignore") # Double single quotes per SQLite escaping rules return str(value).replace("'", "''")One more sanity check:
self.table_name
is interpolated as an identifier. It’s typically controlled by the backend and not end-user input, but if there’s any chance it’s user-influenced in your code-paths, would you like me to add a whitelist/validator (e.g., allow only[A-Za-z0-9_]+
) for extra safety, wdyt?
🧹 Nitpick comments (3)
unit_tests/test_requests_cache_monkeypatch_version.py (2)
6-10
: Tighten docstring phrasing to clarify the intentMinor wording tweaks to improve clarity and grammar, wdyt?
- We need to be alerted once the requests_cache version is updated. The reason is that we monkey patch one of the - method in order to fix a bug until a new version is released. + We need to be alerted whenever the requests_cache version changes. We monkey-patch one of its methods + to work around a bug until a fixed release is available. - For more information about the reasons of this test, see monkey_patched_get_item in http_client.py + For more context, see monkey_patched_get_item in http_client.py.
11-11
: Make the failure actionable (keep exact pin, add a message)Since this test is intentionally brittle to alert on any version bump, could we keep the exact match but add an explicit failure message with a quick pointer to the patch location, wdyt?
- assert requests_cache.__version__ == "1.2.1" + expected = "1.2.1" + assert requests_cache.__version__ == expected, ( + f"requests_cache version changed to {requests_cache.__version__}. " + "Re-evaluate whether the monkey patch in " + "airbyte_cdk/sources/streams/http/http_client.py is still needed and update this test." + )Optionally, do you want an additional sanity test that the patch is actually applied (identity check on getitem) once http_client is imported? I can add that in a follow-up, wdyt?
airbyte_cdk/sources/streams/http/http_client.py (1)
61-64
: Tighten wording in the docstringNit: small grammar tweaks; also “official releases” -> “official release” reads better. Wdyt?
- con.execute can lead to `sqlite3.InterfaceError: bad parameter or other API misuse`. There was a fix implemented - [here](https://github.com/requests-cache/requests-cache/commit/5ca6b9cdcb2797dd2fed485872110ccd72aee55d#diff-f43db4a5edf931647c32dec28ea7557aae4cae8444af4b26c8ecbe88d8c925aaL330-R332) - but there is still no official releases of requests_cache that this is part of. Hence, we will monkeypatch it for now. + con.execute can raise `sqlite3.InterfaceError: bad parameter or other API misuse`. A fix landed + [here](https://github.com/requests-cache/requests-cache/commit/5ca6b9cdcb2797dd2fed485872110ccd72aee55d#diff-f43db4a5edf931647c32dec28ea7557aae4cae8444af4b26c8ecbe88d8c925aaL330-R332), + but there is still no official release of requests_cache that includes it. Hence, we will monkey-patch temporarily.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
airbyte_cdk/sources/streams/http/http_client.py
(1 hunks)unit_tests/test_requests_cache_monkeypatch_version.py
(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
airbyte_cdk/sources/streams/http/http_client.py
[error] 59-59: Function is missing a type annotation. [no-untyped-def] (Command: poetry run mypy --config-file mypy.ini airbyte_cdk)
[error] 77-77: Cannot assign to a method. [method-assign] (Command: poetry run mypy --config-file mypy.ini airbyte_cdk)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-intercom
- GitHub Check: Pytest (Fast)
- GitHub Check: Build and Inspect Python Package
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: SDM Docker Image Build
🔇 Additional comments (1)
airbyte_cdk/sources/streams/http/http_client.py (1)
77-77
: Gate monkey patch by requests-cache version and suppress mypy’s warningWe’ve confirmed that the fix for the
sqlite3.InterfaceError
inSQLiteDict.__getitem__
lands in the upcoming 1.3.0 release of requests-cache (see HISTORY.md under 1.3.0 “Fix sqlite3.InterfaceError…”). (requests-cache.readthedocs.io)Could we update the patch as follows?
• Import and define the fixed version at the top of the module
• Only apply the monkey-patch when the installed version is less than 1.3.0
• Silence mypy’smethod-assign
error on purpose-from packaging.version import Version +from packaging.version import Version +# First release including the sqlite3.InterfaceError fix in SQLiteDict.__getitem__ +FIXED_VER = "1.3.0" if Version(requests_cache.__version__) < Version(FIXED_VER): - requests_cache.SQLiteDict.__getitem__ = monkey_patched_get_item + requests_cache.SQLiteDict.__getitem__ = monkey_patched_get_item # type: ignore[method-assign]Alternatively, to reduce global side-effects, would you prefer subclassing our own
SQLiteDict
(orSQLiteCache
) inHttpClient
and overriding__getitem__
there instead of a module-level monkey-patch? Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pain levain
coderabbitai is fighting for a copy/paste from a different repo. We won't fix this
What
https://airbytehq-team.slack.com/archives/C0980AJD0F7/p1756166020078259?thread_ts=1756138243.067239&cid=C0980AJD0F7
https://github.com/airbytehq/airbyte-python-cdk/actions/runs/17223546541/job/48863700904?pr=719#step:8:33688
How
There is no version of requests_cache that got release with this fix so we monkey_patch stuff until there is a new release
I've added a test so that when the version gets updated, we can re-assess what is the next step
Summary by CodeRabbit
Bug Fixes
Chores
Tests