Arm backend: Use dbg_fail when node visitors raise exceptions #9391
Conversation
Adds a try-except around the node_visitor call to be able to call dbg_fail() when an error/exception is raised. Signed-off-by: Oscar Andersson <[email protected]> Change-Id: I3b633e1ff255fa5b3a5257016acfe2e9dc03b033
Signed-off-by: Oscar Andersson <[email protected]> Change-Id: Id7f44c539682a55e06564c7b3294988c122c00b3
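For context, a minimal sketch of the mechanism this PR describes, assuming a simplified version of the backend's node-processing loop; the names `process_node`, `define_node`, `node_visitors`, and the argument lists are illustrative assumptions, not the exact ExecuTorch signatures:

```python
# Minimal sketch (not the actual ExecuTorch source): wrap the node-visitor
# call in try/except so dbg_fail() is invoked with the failing node and the
# partially built TOSA graph before the original exception is re-raised.

def dbg_fail(node, tosa_graph, artifact_path):
    # Stand-in for the Arm backend's debug helper: in the real backend it
    # dumps the failing node and the current TOSA graph for inspection.
    print(f"Failed to process node {node}; dumping graph state to {artifact_path}")


def process_node(node, tosa_graph, node_visitors, artifact_path="tosa_dump"):
    visitor = node_visitors[node.target.__name__]
    try:
        # Let the visitor serialize this node into the TOSA graph.
        visitor.define_node(node, tosa_graph)
    except Exception:
        # Emit debug output first, then re-raise so compilation still fails loudly.
        dbg_fail(node, tosa_graph, artifact_path)
        raise
```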
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9391
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Cancelled Job — as of commit ac1f223 with merge base 97bca05, the following job was cancelled. Please retry.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi, could you help us point out what error you got from this, to help us know how to avoid it better in the future? Or, even better, could the internal tests be ported to also run in GitHub, so we can get proper testing while merging?
@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@zingo @oscarandersson8218 can you guys take a look at the failing job first: https://github.com/pytorch/executorch/actions/runs/13940909648/job/39017417934?pr=9391
It seems to just have been killed in the middle of building. As for https://github.com/pytorch/executorch/actions/runs/13940909648/job/39017419390?pr=9391
I retriggered a run of the arm tests
Talking with @huydhn, he suspects we are running out of RAM on this 16G (c5.2xlarge) runner. Do you know, @zingo, off the top of your head, how much RAM we use during the container run? Exit code 137 is 128 + 9 (SIGKILL), so it could be OOM. We can try a larger runner and see if this goes away. Let's see if CI passes on this - #9409
Interesting, maybe that is what's happening. But it's also interesting that when we only test building it should be about the same. Maybe we are the only ones using --parallel, making it run more stuff in parallel, and stuff has grown over time 🤔 Also, I think the cmake version got bumped recently; maybe the --parallel flag's behavior changed a bit?
@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Hmm, this is interesting. We could try to root-cause this and revert back to 16G if we can. At least it is green for now.
Yes, maybe, that would be interesting.
@digantdesai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Sorry, still on me; I will take a look today or tomorrow. Retrying after rebase to see if it works :p
Unrelated CI failures. @digantdesai can you have a look at this again? :)
Internal CI looks good. Stamping.
@digantdesai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Adds a try-except around the node_visitor call to be able to call dbg_fail() when an error/exception is raised. Signed-off-by: Oscar Andersson <[email protected]> Co-authored-by: Digant Desai <[email protected]>
Adds a try-except around the node_visitor call to be able to call dbg_fail() when an error/exception is raised.
cc @digantdesai @freddan80 @per @zingo