
ONNX Transform Crashing or Freezing #1228

Closed
vaeksare opened this issue Oct 11, 2018 · 4 comments
Labels: bug (Something isn't working)

Comments

@vaeksare (Member)

The ONNX Transform occasionally crashes or freezes when running certain ONNX models (so far observed with the Split operator).

  • The error is nondeterministic, and its behavior varies from machine to machine.
  • On some machines the ML.NET process freezes; on others it crashes silently without producing any output.
  • The larger the input, the more likely the issue is to occur: it almost never happens with fewer than 100 inputs, sometimes happens with 100 to 300 inputs, and almost always happens with more than 300 inputs.
  • The error never occurs when ML.NET is built in Debug mode; it only occurs in Release builds.
  • The issue is not tied to any specific input. Smaller inputs never produce an error, while larger ones almost always do (in Release mode).
@vaeksare added the "bug" (Something isn't working) label Oct 11, 2018
@Zruty0 added the "need info" (This issue needs more info before triage) label Oct 15, 2018
@Zruty0 (Contributor) commented Oct 15, 2018

Could you give an update, please? Is this being actively worked on?

@vaeksare (Member, Author)

@Zruty0, Jignesh is currently investigating this issue.

@Zruty0 removed the "need info" (This issue needs more info before triage) label Oct 15, 2018
@shauheen added this to the 1018 milestone Oct 18, 2018
@jignparm (Contributor)

To close this out: the crash was occurring in the Sonoma Tensor.CopyTo(List) function. Replacing it with a call to CopyTo(float[]) resolved the issue. The backend Sonoma code for these two functions is quite different, especially in how it manages memory. An array destination must have its memory pre-allocated, whereas a List destination is appended to one element at a time. In addition, CopyTo(float[]) explicitly pins the destination memory with a "fixed" statement before copying the contents, whereas CopyTo(List) does not use "fixed" and therefore does not explicitly pin the destination during the append operations.

We'll still need to fix CopyTo(List) in Sonoma, but for now, switching to the more stable (and more efficient) CopyTo(float[]) has resolved the issue in several offline trial runs.

@shauheen (Contributor)

This issue is not fixed by #1310, only avoided.

@shauheen removed this from the 1018 milestone Oct 22, 2018
@shauheen added this to the 1118 milestone Nov 26, 2018
@ghost locked this as resolved and limited conversation to collaborators Mar 28, 2022