Skip to content

Fix random test fails due to segfault by chromadb & Float32Array not shared in jest context isolation #2487

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Sep 5, 2023

Conversation

Durisvk
Copy link
Contributor

@Durisvk Durisvk commented Sep 3, 2023

The way chromadb imports @xenova/transformers package in file chromadb/src/embeddings/TransformersEmbeddingFunction.ts:33 makes it result in random segment fault errors terminating the tests prematurely. This fix contains a code that bypasses chromadb package and directly uses the @xenova/transformers package

Due to how jest isolates the context of each running test (huggingface/transformers.js#57, https://github.com/kayahr/jest-environment-node-single-context, jestjs/jest#2549) - it makes it impossible for onnxruntime-node package to validate the array passed as an input to it is actually an instanceof Float32Array type. The instanceof results in false because the globals are different between context. This commit shares the Float32Array global between each context.

l4b4r4b4b4 and others added 7 commits August 13, 2023 23:09
…shared in jest context isolation

The way chromadb imports @xenova/transformers package in file
chromadb/src/embeddings/TransformersEmbeddingFunction.ts:33 makes it result in random segment fault errors terminating the tests prematurely.
This fix contains a code that bypasses chromadb package and directly uses the @xenova/transformers package

Due to how jest isolates the context of each running test (huggingface/transformers.js#57, https://github.com/kayahr/jest-environment-node-single-context,
jestjs/jest#2549) - it makes it impossible for onnxruntime-node package to validate the array passed as an input to it is actually an `instanceof Float32Array`
type. The `instanceof` results in false because the globals are different between context. This commit shares the Float32Array global between each context.
@vercel
Copy link

vercel bot commented Sep 3, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
langchainjs-docs ✅ Ready (Inspect) Visit Preview Sep 5, 2023 11:17pm

@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Sep 3, 2023
@jacoblee93
Copy link
Collaborator

Makes sense, thank you for looking into this!

@jacoblee93 jacoblee93 self-assigned this Sep 5, 2023
@jacoblee93 jacoblee93 added question Further information is requested lgtm PRs that are ready to be merged as-is labels Sep 5, 2023
@jacoblee93 jacoblee93 changed the base branch from transformer-embeddings to main September 5, 2023 21:29
@jacoblee93 jacoblee93 removed the question Further information is requested label Sep 5, 2023
import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";

const model = new HuggingFaceTransformersEmbeddings({
modelName: "Xenova/all-MiniLM-L6-v2", // In Node.js defaults to process.env.OPENAI_API_KEY
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't say that this defaults to OPENAI_API_KEY but I guess this is generated by some automation tool

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh whoops thanks - was from the original PR

const documentRes = await model.embedDocuments(["Hello world", "Bye bye"]);
console.log({ documentRes });
};
const model = new GooglePaLMEmbeddings({
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to dislike this commit cause it's messing up with other embeddings 😬

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to give more clarifications - with new top-level async-await ability this is all okay, but if someone is using import { run } ... from this module then it would fail for them

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, but these are just displayed in the docs though and aren't really meant to be imported. We're moving towards using top-level async/await for all examples and just thought I'd take the opportunity to change it as I saw it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They also enforce that examples in the docs at least pass the TS compiler/linter/formatter

@Durisvk
Copy link
Contributor Author

Durisvk commented Sep 5, 2023

You're the best @jacoblee93 for finalizing this! Hail to the opensource community!!

@jacoblee93
Copy link
Collaborator

Thank you again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature lgtm PRs that are ready to be merged as-is
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants