⚡️ Speed up function outputs_to_objects by 88% #440
Closed
📄 88% (0.88x) speedup for outputs_to_objects in unstructured_inference/models/tables.py
⏱️ Runtime: 19.7 milliseconds → 10.5 milliseconds (best of 31 runs)
📝 Explanation and details
The optimized code achieves an 88% speedup (19.7 ms down to 10.5 ms) through four optimizations:
1. Eliminated redundant list conversions and element-wise operations
Original: list(m.indices.detach().cpu().numpy())[0] creates an intermediate list.
Optimized: m.indices.detach().cpu().numpy()[0] indexes the NumPy array directly.
Original: [elem.tolist() for elem in rescale_bboxes(...)] calls .tolist() on each bbox individually.
Optimized: a single .tolist() call after all tensor operations: rescaled.tolist()
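The two conversion patterns in item 1 can be compared side by side. The following is a minimal sketch, not the code from tables.py: the tensors `indices` and `boxes` are hypothetical stand-ins for the model output and the rescaled boxes, and the variable names are chosen only for illustration.

```python
import torch

# Hypothetical stand-in for the max-class indices, shape (1, N).
indices = torch.tensor([[2, 0, 1]])
# Hypothetical stand-in for rescaled bounding boxes, shape (N, 4).
boxes = torch.tensor(
    [
        [0.0, 1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0, 7.0],
        [8.0, 9.0, 10.0, 11.0],
    ]
)

# Original pattern: wrap the NumPy array in a list, then take its first row.
labels_old = list(indices.detach().cpu().numpy())[0]
# Optimized pattern: index the NumPy array directly, with no intermediate list.
labels_new = indices.detach().cpu().numpy()[0]

# Original pattern: one .tolist() call per box (iterating a 2-D tensor yields rows).
bboxes_old = [elem.tolist() for elem in boxes]
# Optimized pattern: a single .tolist() call on the whole tensor.
bboxes_new = boxes.tolist()
```

Both variants produce the same values; the optimized forms simply skip the per-element Python work.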
2. Vectorized padding adjustment
Original: [float(elem) - shift_size for elem in bbox] subtracts the padding in a Python loop.
Optimized: rescaled = rescaled - pad subtracts it on the tensor before conversion to a list.
3. Reduced function call overhead
Original: objects.append() performs an attribute lookup on each iteration.
Optimized: append = objects.append caches the method reference, eliminating repeated lookups.
4. GPU tensor optimization
Added the device=out_bbox.device parameter to torch.tensor() creation to avoid potential device transfer overhead.
Test case performance patterns:
The optimization is particularly effective for table detection models that typically process many bounding boxes simultaneously.
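The padding, method-caching, and device changes described in items 2 to 4 can be sketched together. This is an illustration under stated assumptions rather than the actual implementation: the helper names `scale_boxes`, `unshift_loop`, `unshift_vectorized`, and `collect` are hypothetical, and the real function's signature and filtering logic are not reproduced here.

```python
import torch


def scale_boxes(out_bbox: torch.Tensor, size: tuple) -> torch.Tensor:
    """Hypothetical scaling helper: build the scale tensor on out_bbox's device."""
    img_w, img_h = size
    scale = torch.tensor(
        [img_w, img_h, img_w, img_h],
        dtype=torch.float32,
        device=out_bbox.device,  # avoids a possible cross-device transfer
    )
    return out_bbox * scale


def unshift_loop(rescaled: torch.Tensor, pad: float) -> list:
    """Original pattern: convert to lists first, then subtract in a Python loop."""
    bboxes = [elem.tolist() for elem in rescaled]
    return [[float(v) - pad for v in bbox] for bbox in bboxes]


def unshift_vectorized(rescaled: torch.Tensor, pad: float) -> list:
    """Optimized pattern: subtract on the tensor, then convert once."""
    return (rescaled - pad).tolist()


def collect(labels, scores, bboxes) -> list:
    """Build result dicts using a cached bound method for append."""
    objects = []
    append = objects.append  # look up the method once, not per iteration
    for label, score, bbox in zip(labels, scores, bboxes):
        append({"label": label, "score": float(score), "bbox": bbox})
    return objects


# Usage with values that are exactly representable in float32.
norm = torch.tensor([[0.25, 0.5, 0.75, 1.0]])
rescaled = scale_boxes(norm, (200, 100))
objs = collect(["table"], [0.9], unshift_vectorized(rescaled, 10.0))
```

One caveat worth keeping in mind: the loop subtracts in Python's double precision while the vectorized version subtracts in float32, so for arbitrary inputs the two results may differ in the last few digits even though they agree for the values used here.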
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
models/test_tables.py::test_padded_results_has_right_dimensions
🌀 Generated Regression Tests and Runtime
⏪ Replay Tests and Runtime
test_pytest_test_unstructured_inference__replay_test_0.py::test_unstructured_inference_models_tables_outputs_to_objects
To edit these changes, run git checkout codeflash/optimize-outputs_to_objects-metbo2xp and push.