Rewrite pairwise to remove concatenation from blockwise #447
Thanks for the update @aktech! I scanned this PR and it looks good. I left some small comments below, but nothing critical. As discussed in the chat, it would be nice to see a performance report for the correlation metric comparing the blocks and blockwise implementations.
@ravwojdyla I think I have addressed all the comments. I have added a quick performance test for … After analysing the dask performance reports, I found that the compute time was the same and the issue was due to serialisation/deserialisation, as I had included the …
pre-commit seems to fail due to unrelated changes.
This PR has conflicts; @aktech, please rebase and push an updated version 🙏
The test failures are unrelated, and the comments have been addressed. @ravwojdyla
Codecov Report
@@            Coverage Diff            @@
##            master      #447   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           34        34
  Lines         2361      2384   +23
=========================================
+ Hits          2361      2384   +23
Good work @aktech! (Please remove the notebooks before merging.)
@ravwojdyla Thanks for guiding me throughout; I have removed the notebooks.
This PR rewrites the pairwise distance computation to remove the concatenation step from blockwise. The implementation is inspired by dask's matmul PR.
Fixes #375
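To illustrate why the concatenation can be dropped, here is a minimal NumPy sketch (not the code in this PR, and dask is left out entirely). It assumes samples are rows and the variants axis is split into column chunks. Instead of joining all chunks before computing distances (roughly what `concatenate=True` in `dask.array.blockwise` would do), each chunk yields partial sums (a "map" step), the partials are added together (a "reduce" step), and only the combined sums are turned into distances. The helpers `map_block`, `reduce_partials` and `finalize` are hypothetical names. The sketch assumes no missing values and does not chunk the rows, and it defines correlation distance as 1 - r following SciPy's convention, which may differ from sgkit's definition.

```python
import numpy as np


def map_block(block: np.ndarray) -> dict:
    """Partial sums for one chunk of columns (hypothetical helper).

    `block` has shape (n_samples, chunk_width). Every returned value is
    additive across column chunks, so no chunk ever needs the others.
    """
    return {
        "n": block.shape[1],             # number of columns in this chunk
        "x": block.sum(axis=1),          # per-row sum
        "xx": (block ** 2).sum(axis=1),  # per-row sum of squares
        "xy": block @ block.T,           # pairwise sum of products
    }


def reduce_partials(partials: list) -> dict:
    """Combine per-chunk partial sums by adding them element-wise."""
    return {key: sum(p[key] for p in partials) for key in partials[0]}


def finalize(s: dict) -> tuple:
    """Turn the combined sums into Euclidean and correlation distances."""
    n, x, xx, xy = s["n"], s["x"], s["xx"], s["xy"]
    # ||a - b||^2 = sum(a^2) + sum(b^2) - 2 * sum(a * b); clip rounding noise
    sq = np.maximum(xx[:, None] + xx[None, :] - 2 * xy, 0.0)
    np.fill_diagonal(sq, 0.0)  # a row's distance to itself is exactly zero
    euclidean = np.sqrt(sq)
    # Pearson r from raw sums (assumes no missing values, non-constant rows)
    cov = n * xy - np.outer(x, x)
    var = n * xx - x ** 2
    r = cov / np.sqrt(np.outer(var, var))
    return euclidean, 1.0 - r  # correlation distance as 1 - r


rng = np.random.default_rng(0)
data = rng.random((5, 100))                # 5 samples x 100 variants
chunks = np.array_split(data, 4, axis=1)   # 4 column chunks, never concatenated
euclidean, corr_dist = finalize(reduce_partials([map_block(c) for c in chunks]))
```

The design choice is that every intermediate is a plain sum, so the reduce step is just addition and can run as a tree reduction over however many chunks exist, instead of materialising a whole row of chunks in memory at once.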
I ran this on MalariaGEN data using 50 workers with 16 GB each; it took nearly the same time as the blocks implementation. The time comparisons can be seen in the notebook here
The dask reports can be seen here:
Blocks implementation:
Blockwise implementation:
Important note: These times are only rough estimates for comparison, not for quoting. When I ran the blocks implementation on the same Coiled cloud configuration a couple of months ago, it took ~10 min. Now it takes longer, since workers are dying due to:
OSError: Timed out during handshake while connecting to tls:...
I am not sure why that is happening, since no code change has been made in the blocks implementation. I am currently in contact with the Coiled team to find the root cause. cc @ravwojdyla
Update
A quick performance test for the correlation metric on only 10% of the MalariaGEN data:
TODO: