
@jahnvi480 (Contributor) commented Aug 21, 2025

Work Item / Issue Reference

AB#32890


Summary

This pull request adds support for the setinputsizes method to the mssql_python DB-API cursor, allowing users to explicitly specify SQL parameter types and sizes for queries. This enhancement improves parameter binding control, especially for batch operations and cases where automatic type inference may be insufficient. The changes also ensure that input size specifications are reset after each execution, and comprehensive tests are included to verify the new functionality and its integration with both execute and executemany.

New feature: Explicit parameter typing with setinputsizes

  • Added a setinputsizes method to the cursor class, enabling users to declare SQL types, sizes, and decimal digits for query parameters. The method stores the input sizes and includes detailed documentation with usage examples. (mssql_python/cursor.py)
  • Implemented logic in parameter binding to use explicitly set input sizes when available, falling back to automatic type inference otherwise. This applies to both single and batch executions. (mssql_python/cursor.py)
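The per-parameter choice between an explicit specification and automatic inference can be sketched as follows. This is a minimal, hypothetical illustration, not the code in mssql_python/cursor.py: the names infer_binding and resolve_binding are invented, the inference rules are deliberately simplified, and the numeric type codes are the standard ODBC values.

```python
from typing import Any, Optional, Sequence, Tuple

# Standard ODBC SQL type codes.
SQL_WVARCHAR = -9
SQL_INTEGER = 4
SQL_DOUBLE = 8

Binding = Tuple[int, int, int]  # (sql_type, column_size, decimal_digits)


def infer_binding(value: Any) -> Binding:
    """Hypothetical, simplified inference used when no explicit size is set."""
    # Note: bool is a subclass of int; a real implementation would map it to SQL_BIT.
    if isinstance(value, str):
        return (SQL_WVARCHAR, max(len(value), 1), 0)
    if isinstance(value, int):
        return (SQL_INTEGER, 10, 0)
    if isinstance(value, float):
        return (SQL_DOUBLE, 15, 0)
    raise TypeError(f"no inference rule for {type(value).__name__} in this sketch")


def resolve_binding(
    index: int, value: Any, inputsizes: Optional[Sequence[Optional[Binding]]]
) -> Binding:
    # Use the explicit (sql_type, size, digits) entry when one exists for this
    # position; otherwise fall back to inferring it from the Python value.
    if inputsizes and index < len(inputsizes) and inputsizes[index] is not None:
        return inputsizes[index]
    return infer_binding(value)


params = ["hello", 3.5]
explicit = [(SQL_WVARCHAR, 100, 0)]  # only the first parameter is declared
print([resolve_binding(i, p, explicit) for i, p in enumerate(params)])
# → [(-9, 100, 0), (8, 15, 0)]
```

The same resolution applies per row in a batch; only positions without an explicit entry pay the cost of inference.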

Robustness and reset behavior

  • Ensured that input size specifications are automatically reset after each call to execute or executemany, preventing unintended reuse across statements. (mssql_python/cursor.py)
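The reset-after-execute lifecycle can be modeled with a small stand-in class. SketchCursor and its bound_with attribute are hypothetical and model only the input-size state; resetting inside a finally block (so sizes are cleared even when execution raises) is an assumed design choice, since the summary only states that sizes are reset after each call.

```python
from typing import Any, List, Optional, Sequence, Tuple

Binding = Tuple[int, int, int]  # (sql_type, column_size, decimal_digits)


class SketchCursor:
    """Hypothetical stand-in that models only the input-size lifecycle."""

    def __init__(self) -> None:
        self._inputsizes: Optional[List[Binding]] = None
        self.bound_with: List[Optional[List[Binding]]] = []  # sizes each call saw

    def setinputsizes(self, sizes: Sequence[Binding]) -> None:
        self._inputsizes = list(sizes)

    def _reset_inputsizes(self) -> None:
        self._inputsizes = None

    def execute(self, sql: str, *params: Any) -> None:
        try:
            # A real cursor would bind params and call the driver here.
            self.bound_with.append(self._inputsizes)
        finally:
            # Reset even if execution raised, so sizes never leak into the next statement.
            self._reset_inputsizes()


cur = SketchCursor()
cur.setinputsizes([(-9, 100, 0)])
cur.execute("INSERT INTO t VALUES (?)", "a")
cur.execute("INSERT INTO t VALUES (?)", "b")
print(cur.bound_with)  # → [[(-9, 100, 0)], None]
```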

Testing and validation

  • Added comprehensive tests to cover basic usage, batch inserts with floats, reset behavior, and explicit override of type inference using setinputsizes. These tests verify correct parameter binding, data insertion, and reset semantics. (tests/test_004_cursor.py)

Internal improvements

  • Introduced helper methods for mapping SQL types to C types and for resetting input sizes, improving code clarity and maintainability. (mssql_python/cursor.py)
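A SQL-to-C type helper is typically a lookup table. The sketch below is hypothetical (get_c_type_for_sql_type only mirrors the idea of the PR's _get_c_type_for_sql_type, not its code); the numeric values are the standard ODBC constants from sql.h/sqlext.h. Mapping DECIMAL/NUMERIC to SQL_C_NUMERIC and falling back to SQL_C_DEFAULT for unknown types are design choices; raising an error for unsupported types is an equally valid option.

```python
# Standard ODBC SQL type codes (sql.h / sqlext.h).
SQL_CHAR, SQL_VARCHAR = 1, 12
SQL_WCHAR, SQL_WVARCHAR = -8, -9
SQL_INTEGER, SQL_BIGINT = 4, -5
SQL_FLOAT, SQL_DOUBLE = 6, 8
SQL_DECIMAL, SQL_NUMERIC = 3, 2
SQL_TYPE_DATE, SQL_TYPE_TIME, SQL_TYPE_TIMESTAMP = 91, 92, 93
SQL_BINARY, SQL_VARBINARY = -2, -3

# Standard ODBC C type codes.
SQL_C_CHAR, SQL_C_WCHAR = 1, -8
SQL_C_LONG, SQL_C_SBIGINT = 4, -25  # SQL_C_SBIGINT = SQL_BIGINT + SQL_SIGNED_OFFSET (-20)
SQL_C_DOUBLE, SQL_C_NUMERIC = 8, 2
SQL_C_TYPE_DATE, SQL_C_TYPE_TIME, SQL_C_TYPE_TIMESTAMP = 91, 92, 93
SQL_C_BINARY = -2
SQL_C_DEFAULT = 99

_SQL_TO_C = {
    SQL_CHAR: SQL_C_CHAR,
    SQL_VARCHAR: SQL_C_CHAR,
    SQL_WCHAR: SQL_C_WCHAR,
    SQL_WVARCHAR: SQL_C_WCHAR,
    SQL_INTEGER: SQL_C_LONG,
    SQL_BIGINT: SQL_C_SBIGINT,
    SQL_FLOAT: SQL_C_DOUBLE,
    SQL_DOUBLE: SQL_C_DOUBLE,
    SQL_DECIMAL: SQL_C_NUMERIC,
    SQL_NUMERIC: SQL_C_NUMERIC,
    SQL_TYPE_DATE: SQL_C_TYPE_DATE,
    SQL_TYPE_TIME: SQL_C_TYPE_TIME,
    SQL_TYPE_TIMESTAMP: SQL_C_TYPE_TIMESTAMP,
    SQL_BINARY: SQL_C_BINARY,
    SQL_VARBINARY: SQL_C_BINARY,
}


def get_c_type_for_sql_type(sql_type: int) -> int:
    """Return the C buffer type used to bind a parameter of the given SQL type."""
    return _SQL_TO_C.get(sql_type, SQL_C_DEFAULT)
```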

These changes provide more reliable and predictable parameter binding for users, especially in complex or high-performance scenarios.

github-actions bot added the pr-size: medium (Moderate update size) label Aug 21, 2025
@@ -463,6 +464,71 @@ def _check_closed(self):
        if self.closed:
            raise Exception("Operation cannot be performed: the cursor is closed.")

    def setinputsizes(self, sizes):
Contributor

Please consider adding type annotations to new methods such as setinputsizes, _reset_inputsizes, and _get_c_type_for_sql_type. Type annotations improve code clarity, enable better static analysis, and make the codebase easier to maintain as it grows.
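Annotated signatures for the three methods might look like the sketch below. The InputSize alias is an assumption: it takes entries to be (sql_type, column_size, decimal_digits) tuples, as in the test example, or None for "infer this parameter"; adjust it if setinputsizes accepts other forms.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed entry shape: (sql_type, column_size, decimal_digits), or None to
# let the cursor infer that parameter's type.
InputSize = Optional[Tuple[int, int, int]]


class Cursor:
    """Signature-only sketch; bodies are minimal placeholders."""

    def __init__(self) -> None:
        self._inputsizes: Optional[List[InputSize]] = None

    def setinputsizes(self, sizes: Sequence[InputSize]) -> None:
        self._inputsizes = list(sizes)

    def _reset_inputsizes(self) -> None:
        self._inputsizes = None

    def _get_c_type_for_sql_type(self, sql_type: int) -> int:
        # Placeholder: the actual mapping lives in mssql_python/cursor.py.
        raise NotImplementedError
```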

)

# Check if we have explicit type information from setinputsizes
if hasattr(self, '_inputsizes') and self._inputsizes and i < len(self._inputsizes):
Contributor

There are places where you check whether self has an _inputsizes attribute using hasattr(self, '_inputsizes').
However, _inputsizes is always initialized in the class's __init__ method, so the attribute will always exist (unless something very unusual happens in your code).
These hasattr checks are therefore redundant.
Removing them will make the code cleaner, easier to read, and easier to maintain.
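With _inputsizes initialized in __init__, the guard reduces to a truthiness and bounds check. The helper name _explicit_binding below is hypothetical; it only illustrates the simplified condition.

```python
from typing import List, Optional, Tuple

Binding = Tuple[int, int, int]  # (sql_type, column_size, decimal_digits)


class Cursor:
    def __init__(self) -> None:
        # Always initialized here, so later code never needs hasattr().
        self._inputsizes: Optional[List[Binding]] = None

    def _explicit_binding(self, i: int) -> Optional[Binding]:
        # Before: if hasattr(self, '_inputsizes') and self._inputsizes and i < len(self._inputsizes):
        if self._inputsizes and i < len(self._inputsizes):
            return self._inputsizes[i]
        return None
```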

sql_type, c_type, column_size, decimal_digits = self._map_sql_type(
parameter, parameters_list, i
)

Contributor

Please double-check that all parameterized queries remain fully protected against SQL injection, even when users set input sizes or types via setinputsizes. User-supplied input sizes and types must not be usable to inject malicious SQL or bypass query parameterization. If possible, add validation where needed, and consider adding a test case for this scenario.
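As long as input sizes are used only as binding metadata and never formatted into the SQL text, they cannot change the statement itself; validating them up front still rejects malformed input early. The sketch below is hypothetical: validate_inputsizes and the ALLOWED_SQL_TYPES allow-list are invented for illustration, using standard ODBC type codes.

```python
from typing import List, Optional, Sequence, Tuple

Binding = Tuple[int, int, int]  # (sql_type, column_size, decimal_digits)

# Hypothetical allow-list of standard ODBC SQL type codes the cursor supports.
ALLOWED_SQL_TYPES = frozenset({1, 12, -8, -9, 4, -5, 6, 8, 3, 2, 91, 92, 93, -2, -3})


def validate_inputsizes(sizes: Sequence[Optional[Binding]]) -> List[Optional[Binding]]:
    """Reject anything that is not a well-formed tuple of plain integers."""
    validated: List[Optional[Binding]] = []
    for pos, entry in enumerate(sizes):
        if entry is None:  # None means "infer this parameter"
            validated.append(None)
            continue
        if not (isinstance(entry, tuple) and len(entry) == 3):
            raise ValueError(f"input size #{pos} must be a (sql_type, size, digits) tuple")
        if not all(isinstance(v, int) and not isinstance(v, bool) for v in entry):
            raise TypeError(f"input size #{pos} must contain only integers")
        sql_type, size, digits = entry
        if sql_type not in ALLOWED_SQL_TYPES:
            raise ValueError(f"input size #{pos}: unsupported SQL type {sql_type}")
        if size < 0 or digits < 0:
            raise ValueError(f"input size #{pos}: size and digits must be non-negative")
        validated.append((sql_type, size, digits))
    return validated
```

A test for the injection scenario can then assert that a string such as "-9; DROP TABLE t" in the type slot raises TypeError instead of reaching the driver.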


# Set input sizes for parameters
cursor.setinputsizes([
(ConstantsDDBC.SQL_WVARCHAR.value, 100, 0),
Collaborator

Let's refine the usage of constants a bit more; we should probably export them at the module level.
Example usage from pyodbc:

crsr.setinputsizes([(pyodbc.SQL_WVARCHAR, 50, 0), (pyodbc.SQL_DECIMAL, 18, 4)])

We could go for something like mssql_python.SQL_WVARCHAR.
This can be a separate task, since the usage is end-user facing.
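One way to expose pyodbc-style constants is to re-export enum members as plain module-level integers. The ConstantsDDBC class below is a hypothetical stand-in for the package's internal enum (only a few members, with standard ODBC values); in the package, these assignments would sit in mssql_python/__init__.py.

```python
from enum import Enum


class ConstantsDDBC(Enum):
    """Hypothetical stand-in for the package's internal constants enum."""

    SQL_CHAR = 1
    SQL_VARCHAR = 12
    SQL_WVARCHAR = -9
    SQL_DECIMAL = 3
    SQL_INTEGER = 4


# Re-export each member as a plain int so users can write
# mssql_python.SQL_WVARCHAR instead of ConstantsDDBC.SQL_WVARCHAR.value.
SQL_CHAR = ConstantsDDBC.SQL_CHAR.value
SQL_VARCHAR = ConstantsDDBC.SQL_VARCHAR.value
SQL_WVARCHAR = ConstantsDDBC.SQL_WVARCHAR.value
SQL_DECIMAL = ConstantsDDBC.SQL_DECIMAL.value
SQL_INTEGER = ConstantsDDBC.SQL_INTEGER.value
```

Usage would then mirror pyodbc: cursor.setinputsizes([(mssql_python.SQL_WVARCHAR, 50, 0), (mssql_python.SQL_DECIMAL, 18, 4)]).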

except Exception as e:
    log('warning', f"Failed to set query timeout: {e}")

param_info = ddbc_bindings.ParamInfo
param_count = len(seq_of_parameters[0])
parameters_type = []

# Make a copy of the parameters for potential transformation
Contributor

Consider raising a warning or error if the number of input sizes set via setinputsizes does not match the number of parameters provided to executemany. This will help catch user mistakes early and prevent subtle bugs caused by mismatched parameter and input size definitions.
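A count check could run right after param_count = len(seq_of_parameters[0]). The helper check_inputsizes_count below is hypothetical; ValueError stands in for whatever DB-API exception the driver prefers, and whether to warn or raise by default is a design choice.

```python
import warnings
from typing import Optional, Sequence


def check_inputsizes_count(
    inputsizes: Optional[Sequence[object]], param_count: int, strict: bool = False
) -> None:
    """Hypothetical helper: flag a mismatch between declared sizes and parameters."""
    if not inputsizes or len(inputsizes) == param_count:
        return  # nothing declared, or counts agree
    msg = (
        f"setinputsizes declared {len(inputsizes)} entries but the statement "
        f"has {param_count} parameters"
    )
    if strict:
        raise ValueError(msg)
    warnings.warn(msg, stacklevel=2)
```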

except Exception as e:
    log('warning', f"Failed to set query timeout: {e}")

param_info = ddbc_bindings.ParamInfo
param_count = len(seq_of_parameters[0])
parameters_type = []

# Make a copy of the parameters for potential transformation
processed_parameters = [list(params) for params in seq_of_parameters]
Contributor

The code is using [list(params) for params in seq_of_parameters] to create a new list of lists from seq_of_parameters (which is probably a list or sequence of parameters for batch inserts).

When you do this for a very large number of rows (for example, thousands or millions), it creates a copy of every row in memory. This can use a lot of memory and might slow things down or even cause crashes if there isn’t enough memory.

If possible, don’t create a big copy of all the data at once.
Instead, you could use a generator expression (which makes one item at a time, only when needed) or change the items in place (if it’s safe to do so).

Current Implementation:

processed_parameters = [list(params) for params in seq_of_parameters]

This creates a new list in memory that contains a list copy of every params row.
If seq_of_parameters has 1,000,000 items, Python immediately builds a list with 1,000,000 copies in memory.
This can use a lot of memory at once.

Generator Expression:

processed_parameters = (list(params) for params in seq_of_parameters)

This creates a generator, not a list. It doesn’t copy anything right away.
Each list(params) is created only when you need it (for example, when you loop over processed_parameters).
Much less memory is used because only one converted row is held at a time.

List comprehension is eager: makes everything up front, uses more memory.
Generator expression is lazy: makes each result only when needed, uses less memory.
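The difference can be checked directly. The rows list below is a stand-in for seq_of_parameters. Note the trade-off: a generator has no len(), cannot be indexed, and can be consumed only once, so if the batch path needs any of those on processed_parameters, it would have to be restructured or keep the list.

```python
import sys

rows = [(i, float(i)) for i in range(100_000)]  # stand-in for seq_of_parameters

eager = [list(params) for params in rows]  # builds 100,000 new lists now
lazy = (list(params) for params in rows)   # builds nothing yet

# getsizeof counts only the outer container: the list holds one pointer per
# row (the row lists themselves come on top), while the generator stays small.
print(sys.getsizeof(eager) > sys.getsizeof(lazy))  # True

# Caveat: a generator has no len() and can be consumed only once.
print(sum(1 for _ in lazy))  # 100000
print(sum(1 for _ in lazy))  # 0 (already exhausted)
try:
    len(lazy)
except TypeError as exc:
    print("TypeError:", exc)
```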

@@ -1556,6 +1556,189 @@ def test_decimal_separator_calculations(cursor, db_connection):
cursor.execute("DROP TABLE IF EXISTS #pytest_decimal_calc_test")
db_connection.commit()

def test_cursor_setinputsizes_basic(db_connection):
Contributor

  • Add tests for cases where the number of input sizes does not match the number of parameters.
  • Add tests with None/NULL values to verify robust handling.
  • Add tests for all supported SQL types, including edge types (DATE, TIME, BINARY).
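The NULL and edge-type cases lend themselves to a table-driven test. This sketch is framework-free so it runs as-is, and it exercises a hypothetical pure helper resolve_binding rather than a live database; in tests/test_004_cursor.py the same table could drive pytest.mark.parametrize over real INSERT/SELECT round trips. Type codes are standard ODBC values, and NULL_FALLBACK is a placeholder choice.

```python
import datetime
from typing import Any, Optional, Tuple

Binding = Tuple[int, int, int]  # (sql_type, column_size, decimal_digits)

# Standard ODBC SQL type codes.
SQL_WVARCHAR, SQL_TYPE_DATE, SQL_TYPE_TIME, SQL_VARBINARY = -9, 91, 92, -3

# Placeholder choice for an untyped NULL; a real driver may do something else.
NULL_FALLBACK: Binding = (SQL_WVARCHAR, 1, 0)


def resolve_binding(value: Any, explicit: Optional[Binding]) -> Binding:
    """Hypothetical helper: an explicit entry always wins over inference."""
    if explicit is not None:
        return explicit
    if value is None:
        return NULL_FALLBACK
    if isinstance(value, str):
        return (SQL_WVARCHAR, max(len(value), 1), 0)
    raise TypeError(f"no inference rule for {type(value).__name__} in this sketch")


# Edge cases suggested in review: NULL, DATE, TIME and BINARY values.
CASES = [
    (None, (SQL_TYPE_DATE, 10, 0)),
    (datetime.date(2025, 8, 21), (SQL_TYPE_DATE, 10, 0)),
    (datetime.time(13, 45, 0), (SQL_TYPE_TIME, 8, 0)),
    (b"\x00\x01\xff", (SQL_VARBINARY, 3, 0)),
]


def test_explicit_sizes_are_honored() -> None:
    for value, explicit in CASES:
        assert resolve_binding(value, explicit) == explicit


def test_null_without_explicit_size_uses_fallback() -> None:
    assert resolve_binding(None, None) == NULL_FALLBACK


if __name__ == "__main__":
    test_explicit_sizes_are_honored()
    test_null_without_explicit_size_uses_fallback()
    print("ok")
```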

@sumitmsft (Contributor) left a comment

Left a few comments...
