-
Notifications
You must be signed in to change notification settings - Fork 18
FEAT: Adding setinputsizes #192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: jahnvi/connection_timeout
Are you sure you want to change the base?
Conversation
@@ -463,6 +464,71 @@ def _check_closed(self): | |||
if self.closed: | |||
raise Exception("Operation cannot be performed: the cursor is closed.") | |||
|
|||
def setinputsizes(self, sizes): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please consider adding type annotations to new methods such as setinputsizes, _reset_inputsizes, and _get_c_type_for_sql_type. Type annotations will improve code clarity, enable better static analysis, and make the codebase more maintainable as it grows.
) | ||
|
||
# Check if we have explicit type information from setinputsizes | ||
if hasattr(self, '_inputsizes') and self._inputsizes and i < len(self._inputsizes): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are places where you check if the object (self
) has an attribute called _inputsizes
using hasattr(self, '_inputsizes')
.
The reviewer noticed that the _inputsizes
attribute is always created (initialized) when the object is constructed (in the class’s __init__
method).
If an attribute is always present (because it’s defined in the constructor), you don’t need to check if it exists every time you use it.
(It will always exist, unless something very unusual happens in your code.)
These hasattr
checks are, therefore, unnecessary ("redundant").
Removing them will make your code cleaner, easier to read, and easier to maintain.
sql_type, c_type, column_size, decimal_digits = self._map_sql_type( | ||
parameter, parameters_list, i | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please double-check that all parameterized queries remain fully protected against SQL injection—even when input sizes or types are set by users via setinputsizes
. It's important to ensure that user-supplied values for input sizes/types cannot be used to inject malicious SQL or bypass query parameterization. If possible, add validation or sanitization where needed, and consider adding a test case for this scenario.
|
||
# Set input sizes for parameters | ||
cursor.setinputsizes([ | ||
(ConstantsDDBC.SQL_WVARCHAR.value, 100, 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets refine usage of constants a bit more, we should probably export them to them module level
example usage from pyodbc
crsr.setinputsizes([(pyodbc.SQL_WVARCHAR, 50, 0), (pyodbc.SQL_DECIMAL, 18, 4)])
we can probably go for something like mssql_python.SQL_WVARCHAR
can be a separate task since the usage is end user facing
except Exception as e: | ||
log('warning', f"Failed to set query timeout: {e}") | ||
|
||
param_info = ddbc_bindings.ParamInfo | ||
param_count = len(seq_of_parameters[0]) | ||
parameters_type = [] | ||
|
||
# Make a copy of the parameters for potential transformation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider raising a warning or error if the number of input sizes set via setinputsizes
does not match the number of parameters provided to executemany
. This will help catch user mistakes early and prevent subtle bugs due to mismatched parameter and input size definitions.
except Exception as e: | ||
log('warning', f"Failed to set query timeout: {e}") | ||
|
||
param_info = ddbc_bindings.ParamInfo | ||
param_count = len(seq_of_parameters[0]) | ||
parameters_type = [] | ||
|
||
# Make a copy of the parameters for potential transformation | ||
processed_parameters = [list(params) for params in seq_of_parameters] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is using [list(params) for params in seq_of_parameters] to create a new list of lists from seq_of_parameters (which is probably a list or sequence of parameters for batch inserts).
When you do this for a very large number of rows (for example, thousands or millions), it creates a copy of every row in memory. This can use a lot of memory and might slow things down or even cause crashes if there isn’t enough memory.
If possible, don’t create a big copy of all the data at once.
Instead, you could use a generator expression (which makes one item at a time, only when needed) or change the items in place (if it’s safe to do so).
Current Implementation:
processed_parameters = [list(params) for params in seq_of_parameters]
This creates a new list in memory that contains a copy of every params as a list.
If seq_of_parameters has 1,000,000 items, Python immediately builds a list with 1,000,000 copies in memory.
This can use a lot of memory at once.
Generator Expression:
processed_parameters = (list(params) for params in seq_of_parameters)
This creates a generator—not a list. It doesn’t copy anything right away.
Each list(params) is created only when you need it (for example, when you loop over new_seq).
Much less memory is used because only one item is in memory at a time.
List comprehension is eager: makes everything up front, uses more memory.
Generator expression is lazy: makes each result only when needed, uses less memory.
@@ -1556,6 +1556,189 @@ def test_decimal_separator_calculations(cursor, db_connection): | |||
cursor.execute("DROP TABLE IF EXISTS #pytest_decimal_calc_test") | |||
db_connection.commit() | |||
|
|||
def test_cursor_setinputsizes_basic(db_connection): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Add tests for cases where the number of input sizes does not match the number of parameters.
- Add tests with
None
/NULL values to verify robust handling. - Add tests for all supported SQL types, including edge types (DATE, TIME, BINARY).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments...
Work Item / Issue Reference
Summary
This pull request adds support for the
setinputsizes
method to themssql_python
DB-API cursor, allowing users to explicitly specify SQL parameter types and sizes for queries. This enhancement improves parameter binding control, especially for batch operations and cases where automatic type inference may be insufficient. The changes also ensure that input size specifications are reset after each execution, and comprehensive tests are included to verify the new functionality and its integration with bothexecute
andexecutemany
.New feature: Explicit parameter typing with
setinputsizes
setinputsizes
method to thecursor
class, enabling users to declare SQL types, sizes, and decimal digits for query parameters. This method stores the input sizes and provides detailed documentation and usage examples. (mssql_python/cursor.py
)mssql_python/cursor.py
)Robustness and reset behavior
execute
orexecutemany
, preventing unintended reuse across statements. (mssql_python/cursor.py
)Testing and validation
setinputsizes
. These tests verify correct parameter binding, data insertion, and reset semantics. (tests/test_004_cursor.py
)Internal improvements
mssql_python/cursor.py
)These changes provide more reliable and predictable parameter binding for users, especially in complex or high-performance scenarios.