TypeError: A float32 tensor's data must be type of function Float32Array() when running under jest #57

Closed
ekolve opened this issue Mar 28, 2023 · 7 comments
Labels
bug Something isn't working

Comments

ekolve commented Mar 28, 2023

Given the following script:

const { pipeline, env } = require("@xenova/transformers");
env.onnx.wasm.numThreads = 1;


(async() => {
  let embedder = await pipeline('embeddings', 'sentence-transformers/all-MiniLM-L6-v2');
  let sentences = [
      'The quick brown fox jumps over the lazy dog.'
  ];
  let output = (await embedder(sentences)).tolist();
  console.log(output);
})();

This executes without error when run from the shell:

$ node test.js

But if the same script is executed using jest, I receive the following error:

$ npx jest test.js
TypeError: A float32 tensor's data must be type of function Float32Array() { [native code] }
    at new h (node_modules/onnxruntime-common/dist/webpack:/onnxruntime-common/lib/tensor-impl.ts:111:17)
    at m.run (node_modules/onnxruntime-common/dist/webpack:/onnxruntime-common/lib/inference-session-impl.ts:112:28)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at sessionRun (node_modules/@xenova/transformers/src/models.js:52:18)
    at Function._call (node_modules/@xenova/transformers/src/models.js:365:16)
    at Function._call (node_modules/@xenova/transformers/src/pipelines.js:69:23)
    at Function._call (node_modules/@xenova/transformers/src/pipelines.js:351:33)
    at test.js:10:17

This appears to be related to the choice of ONNX runtime. If this line: https://github.com/xenova/transformers.js/blob/main/src/backends/onnx.js#L10 is changed to import onnxruntime-web instead of onnxruntime-node, executing under jest succeeds, so there appears to be some issue with jest + onnxruntime-node. In terms of resolution, one option would be to detect within backends/onnx.js whether execution is happening under jest, which can be done by checking process.env.JEST_WORKER_ID (this is populated when running under jest). In terms of root cause, I'm not sure where the actual bug is (jest or onnxruntime-node), but it would make the most sense for it to be resolved there if it's possible to determine which package is responsible.
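The proposed detection could be sketched as follows. This is an illustrative selector, not the library's actual backends/onnx.js code; the package names are real, but the `selectOnnxBackend` helper is a hypothetical name.

```javascript
// Hypothetical sketch of the workaround proposed above: choose the
// ONNX backend based on whether we are running under jest, which
// sets JEST_WORKER_ID in process.env for every test worker.
function selectOnnxBackend(env) {
  // Prefer onnxruntime-web under jest to sidestep the cross-realm
  // Float32Array check in onnxruntime-node.
  return env.JEST_WORKER_ID !== undefined
    ? 'onnxruntime-web'
    : 'onnxruntime-node';
}

console.log(selectOnnxBackend(process.env));
```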

Steps to reproduce:

  • create new directory
  • copy test script to file named test.js
  • npm install @xenova/transformers jest
  • node test.js
  • npx jest test.js
ekolve added the bug label Mar 28, 2023
xenova commented Mar 28, 2023

Well, that is quite strange indeed. Does jest require a certain layout/structure to be followed?

Edit: After looking at their getting started docs, it does seem like tests should follow a certain format. That said, the error message you get seems very unrelated.

Could you try using the same structure they do and see if it works then?

ekolve commented Mar 29, 2023

Yeah, I get the same error after following the setup described in the getting started docs. I put the transformers code into the sum() function defined in the getting started tutorial and received the same error.

This is where the error comes from: https://github.com/microsoft/onnxruntime-openenclave/blob/bdc51bfe62822b2acd190c3bc98bfc92babf24e5/nodejs/lib/tensor-impl.ts#L100

constructor in this case is Float32Array (https://github.com/microsoft/onnxruntime-openenclave/blob/bdc51bfe62822b2acd190c3bc98bfc92babf24e5/nodejs/lib/tensor-impl.ts#L13)

and arg1 is the Float32Array of embeddings, but for some reason the instanceof check fails even though the two values appear identical.

xenova commented Mar 29, 2023

Is there a reason you are referencing the openenclave repo (which seems to be a separate public repo)? Isn't the correct source located here?
https://github.com/microsoft/onnxruntime/blob/a6279d4cfb51be98a5cc25dc642013e358fcd01f/js/common/lib/tensor-impl.ts#L110-L124

I'll try debugging on my side too 👍

ekolve commented Mar 29, 2023

Linking to the openenclave repo was a mistake: I searched across GitHub for `org:microsoft "tensor's data must be type of"` and that was the only reference.

ekolve commented Mar 29, 2023

I figured out the source of this:

https://backend.cafe/should-you-use-jest-as-a-testing-library

jest creates a separate context to run in so it can provide features for testing (mocking, fake timers, etc.).

By following the fix in the article:

npx jest --testEnvironment jest-environment-node-single-context test.js

the error above goes away.
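The same fix can be made permanent in a jest config file instead of passing the CLI flag each time (assuming jest-environment-node-single-context is installed as a dev dependency):

```javascript
// jest.config.js - equivalent to passing --testEnvironment on the CLI.
// Requires: npm install --save-dev jest-environment-node-single-context
module.exports = {
  testEnvironment: 'jest-environment-node-single-context',
};
```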

ekolve closed this as completed Mar 29, 2023
xenova commented Mar 29, 2023

Well that is very interesting (and strange 👀)... Thanks for getting back to us!

On another note, it would probably be quite good to use a solid testing framework (like Jest) for this library. So, it's good to get this problem sorted out haha!

Durisvk pushed a commit to Durisvk/langchainjs that referenced this issue Sep 3, 2023
…shared in jest context isolation

The way chromadb imports the @xenova/transformers package in
chromadb/src/embeddings/TransformersEmbeddingFunction.ts:33 results in random segmentation fault errors that terminate the tests prematurely.
This fix contains code that bypasses the chromadb package and uses the @xenova/transformers package directly.

Due to how jest isolates the context of each running test (huggingface/transformers.js#57, https://github.com/kayahr/jest-environment-node-single-context,
jestjs/jest#2549), it is impossible for the onnxruntime-node package to validate that the array passed as an input is actually an `instanceof Float32Array`.
The `instanceof` check returns false because the globals differ between contexts. This commit shares the Float32Array global between each context.
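As an aside, a realm-safe way to perform this kind of check is to compare the tag reported by Object.prototype.toString instead of using instanceof. The helper below is an illustration of that technique, not the check onnxruntime-node actually performs.

```javascript
// Realm-safe Float32Array check: Object.prototype.toString reports the
// same '[object Float32Array]' tag regardless of which context (realm)
// created the value, so it works where `instanceof` does not.
function isFloat32Array(value) {
  return Object.prototype.toString.call(value) === '[object Float32Array]';
}

console.log(isFloat32Array(new Float32Array(2))); // true
console.log(isFloat32Array([0.5]));               // false: plain array
```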
jacoblee93 added a commit to langchain-ai/langchainjs that referenced this issue Sep 5, 2023
…shared in jest context isolation (#2487)

* Local embeddings function with Transformer.js API

* added packages

* Adds integration test, update naming and deps

* Formatting

* Fix random test fails due to segfault by chromadb & Float32Array not shared in jest context isolation

* Update deps

* Docs change

* Add entrypoint

* Fix concurrency

* Docs update

* Fix docs typo

---------

Co-authored-by: l4b4r4b4b4 <[email protected]>
Co-authored-by: Lucas Hänke de Cansino <[email protected]>
Co-authored-by: jacoblee93 <[email protected]>
Co-authored-by: Juraj Carnogursky <[email protected]>
sneko commented Mar 26, 2024

Thanks for pointing me to the solution! In my case, to avoid changing the environment for my other tests, I put it at the top of the specific test file affected by the issue:

/**
 * @jest-environment jest-environment-node-single-context
 */

describe(...);
