vlad-perevezentsev
Collaborator

This PR extends dpnp.linalg.lu_factor() (#2557) to support batched input arrays.

In addition, this PR includes:

  1. An updated implementation of getrf_batch to support non-square matrices.
  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
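
For readers unfamiliar with the operation, here is a rough sketch of what a batched LU factorization of non-square matrices computes. It is written in plain NumPy rather than dpnp and is not the implementation in this PR. The helper names `lu_factor_2d` and `lu_factor_batch` are hypothetical, and the output layout (LU factors stacked as `(..., m, n)`, zero-based pivots stacked as `(..., min(m, n))`) is an assumption modeled on SciPy-style `lu_factor`.

```python
import numpy as np


def lu_factor_2d(a):
    """getrf-style LU with partial pivoting for a single (m, n) matrix."""
    lu = np.array(a, dtype=np.float64, order="F")  # work on a copy
    m, n = lu.shape
    k = min(m, n)
    piv = np.empty(k, dtype=np.int64)
    for j in range(k):
        # Choose the largest entry in column j at or below the diagonal.
        p = j + int(np.argmax(np.abs(lu[j:, j])))
        piv[j] = p
        if p != j:
            lu[[j, p], :] = lu[[p, j], :]
        if lu[j, j] != 0.0:
            # Store the multipliers (L) below the diagonal, then update
            # the trailing submatrix.
            lu[j + 1:, j] /= lu[j, j]
            lu[j + 1:, j + 1:] -= np.outer(lu[j + 1:, j], lu[j, j + 1:])
    return lu, piv


def lu_factor_batch(a):
    """Apply lu_factor_2d to every matrix in a stack of shape (..., m, n)."""
    a = np.asarray(a)
    batch_shape = a.shape[:-2]
    m, n = a.shape[-2:]
    k = min(m, n)
    a3 = a.reshape(-1, m, n)
    lu = np.empty((a3.shape[0], m, n), dtype=np.float64)
    piv = np.empty((a3.shape[0], k), dtype=np.int64)
    for i in range(a3.shape[0]):
        lu[i], piv[i] = lu_factor_2d(a3[i])
    return lu.reshape(batch_shape + (m, n)), piv.reshape(batch_shape + (k,))
```

Each pivot array has length min(m, n), which matches the `(k,)` shape allocated for `ipiv_vecs[i]` in the diff under review.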

Contributor

View rendered docs @ https://intelpython.github.io/dpnp/pull/2565/index.html

Contributor

github-actions bot commented Aug 21, 2025

Array API standard conformance tests for dpnp=0.19.0dev3=py313h509198e_35 ran successfully.
Passed: 1227
Failed: 0
Skipped: 9

Comment on lines +512 to +529
# Copy each 2D slice to a new array because getrf will destroy
# the input matrix
a_vecs[i] = dpnp.empty_like(a[i], order="F", dtype=res_type)
ht_ev, copy_ev = ti._copy_usm_ndarray_into_usm_ndarray(
    src=a_usm_arr[i],
    dst=a_vecs[i].get_array(),
    sycl_queue=a_sycl_queue,
    depends=dep_evs,
)
_manager.add_event_pair(ht_ev, copy_ev)

ipiv_vecs[i] = dpnp.empty(
    (k,),
    dtype=dpnp.int64,
    order="C",
    usm_type=a_usm_type,
    sycl_queue=a_sycl_queue,
)
Collaborator

Is there any reason not to perform a single copy and a single allocation, then just take views along the batch axis?

That is, copy all of src into one new array, allocate all of the ipiv_vecs arrays in one allocation, then split both along the batch axis in the loop.
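
A minimal sketch of this suggestion, using NumPy as a stand-in for dpnp (an assumption, so the USM copy helper and the event handling from the diff are left out). The function name `prepare_batch_single_alloc` is hypothetical. The one non-obvious detail is the layout: allocating the buffer as `(batch, n, m)` in C order and transposing the last two axes makes every slice `a_copy[i]` an F-contiguous `(m, n)` view, which is the layout the per-slice `empty_like(..., order="F")` call was producing.

```python
import numpy as np


def prepare_batch_single_alloc(a, res_type=np.float64):
    """Copy a (batch, m, n) stack once and hand out per-matrix views."""
    batch_size, m, n = a.shape
    k = min(m, n)

    # One allocation for all matrices. The transposed view has shape
    # (batch, m, n), and each 2D slice of it is F-contiguous.
    a_copy = np.empty((batch_size, n, m), dtype=res_type).transpose(0, 2, 1)
    a_copy[...] = a  # one copy instead of one per slice

    # One allocation for all pivot vectors.
    ipiv = np.empty((batch_size, k), dtype=np.int64)

    # Views along the batch axis: no further copies or allocations.
    a_vecs = [a_copy[i] for i in range(batch_size)]
    ipiv_vecs = [ipiv[i] for i in range(batch_size)]
    return a_copy, ipiv, a_vecs, ipiv_vecs
```

Because `a_vecs[i]` and `ipiv_vecs[i]` are views, whatever a per-matrix getrf call writes into them lands directly in `a_copy` and `ipiv`, so no gather step would be needed afterwards. Whether the dpnp backend accepts such views unchanged is not confirmed here.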

dep_evs = _manager.submitted_events

# Process each batch
for i in range(batch_size):
Collaborator

It may not make sense in this PR, but it might be worth moving this entire loop into C++: a custom pybind11 function that takes lists of arrays and loops over them, calling _getrf on each.

@abagusetty

Just wanted to tag this: it could fix #2498.
