Fix random test fails due to segfault by chromadb & Float32Array not shared in jest context isolation #2487
…shared in jest context isolation

The way chromadb imports the @xenova/transformers package in chromadb/src/embeddings/TransformersEmbeddingFunction.ts:33 causes random segmentation faults that terminate the tests prematurely. This fix bypasses the chromadb package and uses the @xenova/transformers package directly.

Because of how Jest isolates the context of each running test (huggingface/transformers.js#57, https://github.com/kayahr/jest-environment-node-single-context, jestjs/jest#2549), the onnxruntime-node package cannot verify that an array passed to it as input is actually an `instanceof Float32Array`. The `instanceof` check evaluates to false because each context has its own set of globals. This commit shares the Float32Array global across all contexts.
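The cross-context failure described above can be reproduced with Node's built-in `vm` module, which Jest uses to give each test file its own context. The following is a minimal sketch, not code from this PR: it evaluates `new Float32Array(...)` once in a plain new context and once in a context that has the host's `Float32Array` exposed as a global. The variable names are illustrative, and the PR's exact mechanism for sharing the global is not shown here.

```typescript
import * as vm from "node:vm";

// Evaluate code in a fresh V8 context (a separate realm), similar to how
// Jest gives each test file its own vm context. Without a shared global,
// the script uses that context's own built-in Float32Array.
const isolatedContext = vm.createContext({});
const isolated = vm.runInContext("new Float32Array([1, 2, 3])", isolatedContext);

// Expose the host's Float32Array as a global of the new context, so the
// script constructs its array with the host realm's constructor.
const sharedContext = vm.createContext({ Float32Array });
const shared = vm.runInContext("new Float32Array([1, 2, 3])", sharedContext);

// instanceof compares against the host realm's Float32Array.prototype.
console.log(isolated instanceof Float32Array); // false
console.log(shared instanceof Float32Array); // true

// A realm-independent tag check recognizes both arrays.
console.log(Object.prototype.toString.call(isolated)); // [object Float32Array]
console.log(Object.prototype.toString.call(shared)); // [object Float32Array]
```

Sharing the global makes an `instanceof` check pass without touching the library that performs it; a tag-based check like the last two lines would also tolerate cross-realm arrays, but adopting it would mean changing the validation inside onnxruntime-node itself.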
Makes sense, thank you for looking into this!
import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";

const model = new HuggingFaceTransformersEmbeddings({
  modelName: "Xenova/all-MiniLM-L6-v2", // In Node.js defaults to process.env.OPENAI_API_KEY
I wouldn't say that this defaults to OPENAI_API_KEY, but I guess this was generated by some automation tool.
Oh, whoops, thanks! That was from the original PR.
const documentRes = await model.embedDocuments(["Hello world", "Bye bye"]);
console.log({ documentRes });
};
const model = new GooglePaLMEmbeddings({
I'm starting to dislike this commit because it's messing with the other embeddings 😬
To clarify: with the new top-level async/await ability this is all okay, but if someone is using import { run } ... from this module, it would fail for them.
Fair, but these are only displayed in the docs and aren't really meant to be imported. We're moving towards using top-level async/await for all examples, and I just took the opportunity to change it when I saw it.
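To illustrate the trade-off discussed above, here is a minimal sketch (not taken from the docs) that puts both example styles in one ES module for comparison; a real docs example would use only one. `StubEmbeddings` is a hypothetical stand-in for an embeddings class such as `HuggingFaceTransformersEmbeddings` and returns dummy vectors, so the sketch runs without downloading a model. Top-level `await` assumes the file runs as an ES module (for example, a `.mts` file or a package with `"type": "module"`).

```typescript
// Hypothetical stand-in for a real embeddings class; each "vector" is just
// the text length padded with zeros.
class StubEmbeddings {
  async embedDocuments(texts: string[]): Promise<number[][]> {
    return texts.map((text) => [text.length, 0, 0]);
  }
}

// Old style: the example is wrapped in an exported `run` function, so other
// code could do `import { run } from "./example"` and call it on demand.
export const run = async (): Promise<number[][]> => {
  const model = new StubEmbeddings();
  const documentRes = await model.embedDocuments(["Hello world", "Bye bye"]);
  console.log({ documentRes });
  return documentRes;
};

// New style: top-level await. This code executes as soon as the module is
// loaded; a docs example written only this way exports no `run`, so an
// existing `import { run } ...` of that file would break.
const model = new StubEmbeddings();
const documentRes = await model.embedDocuments(["Hello world", "Bye bye"]);
console.log({ documentRes });
```

The top-level style reads more like a script, which suits examples that are only displayed and type-checked; the exported `run` style only matters when some other module imports the example.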
They also enforce that the examples in the docs at least pass the TS compiler, linter, and formatter.
You're the best, @jacoblee93, for finalizing this! Hail to the open-source community!!
Thank you again!