Skip to content

Move dlopen file operations into native code. NFC #19310

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 11, 2023
Merged

Conversation

sbc100
Copy link
Collaborator

@sbc100 sbc100 commented May 9, 2023

Fixes: #19245

@sbc100 sbc100 requested a review from dschuff May 9, 2023 02:00
@sbc100 sbc100 force-pushed the proxy_dlopen_file_ops branch from 3b98113 to f2b1a00 Compare May 9, 2023 02:02
@sbc100 sbc100 requested a review from kripken May 9, 2023 16:34
@sbc100 sbc100 force-pushed the proxy_dlopen_file_ops branch 2 times, most recently from ed6bbc5 to 547a7d3 Compare May 9, 2023 18:38
sbc100 added a commit that referenced this pull request May 9, 2023
sbc100 added a commit that referenced this pull request May 9, 2023
sbc100 added a commit that referenced this pull request May 9, 2023
sbc100 added a commit that referenced this pull request May 9, 2023
sbc100 added a commit that referenced this pull request May 10, 2023
@sbc100 sbc100 force-pushed the proxy_dlopen_file_ops branch 2 times, most recently from 175e077 to 281ae61 Compare May 10, 2023 21:28
@sbc100 sbc100 changed the title Proxy dlopen file operations back to the main thread. NFC Move dlopen file operations into native code. NFC May 10, 2023
@sbc100 sbc100 force-pushed the proxy_dlopen_file_ops branch 5 times, most recently from 4f8a552 to 1b3ba8e Compare May 10, 2023 23:59
@sbc100 sbc100 enabled auto-merge (squash) May 11, 2023 15:47
@sbc100 sbc100 requested a review from kripken May 11, 2023 15:48
@sbc100 sbc100 force-pushed the proxy_dlopen_file_ops branch from 1b3ba8e to ab0703f Compare May 11, 2023 15:48
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! lgtm % comments

off_t size = lseek(fd, 0, SEEK_END);
if (size != (off_t)-1) {
lseek(fd, 0, SEEK_SET);
p->file_data = malloc(size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check for a failing malloc? (I don't think we can depend on p->file_data being null leading to an error in read)

This allows the file data to be read by just a single thread (the
calling thread) and shared with all the others via shared memory.

Fixes: #19245
@sbc100 sbc100 force-pushed the proxy_dlopen_file_ops branch from ab0703f to b64a46e Compare May 11, 2023 16:05
@sbc100 sbc100 merged commit a36c3f6 into main May 11, 2023
@sbc100 sbc100 deleted the proxy_dlopen_file_ops branch May 11, 2023 17:20
Comment on lines -963 to -964
if (flags.fs && flags.fs.findObject(libName)) {
var libData = flags.fs.readFile(libName, {encoding: 'binary'});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a regression for us because we've been using the fs argument to loadDynamicLibrary. Do you have a recommended alternative @sbc100?

cc @ryanking13

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you call _dlopen(...) perhaps instead?

loadDynamicLibrary and the other symbols in library_dylink.js are not really part of the public API .. or at least I would prefer them not to be considered public.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear. What was the intended purpose of this hook if it's not supposed to be used downstream?

There's a few issues. One is asynchronous compilation: dlopen is synchronous I think, so it has to instantiate the module synchronously, which fails on the main thread for large modules.

A second problem is resolving dynamic library dependencies. If we dlopen a library a.so that depends on b.so then the lookup logic by default it does something we don't want.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear. What was the intended purpose of this hook if it's not supposed to be used downstream?

There's a few issues. One is asynchronous compilation: dlopen is synchronous I think, so it has to instantiate the module synchronously, which fails on the main thread for large modules.

I think we can find a way to work around that. There are 3 different versions of dlopen in emscripten and 2 of them are async (emscripten_dlopen_promise and emscripten_dlopen) . They are mostly designed to be called from native code though. There is you loading code happening? Is it originating in native code?

A second problem is resolving dynamic library dependencies. If we dlopen a library a.so that depends on b.so then the lookup logic by default it does something we don't want.

Can you say little more about this. The logic for loading the dependencies of dynamic libraries is all embedded in loadWebAssemblyModule on the JS side so I would expect them to be the same.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear. What was the intended purpose of this hook if it's not supposed to be used downstream?

No, sadly we don't have much clarity about what is supposed to be internal vs public JS APIs. I wish we did. In this case we used that hook internally, but after this change it was no longer needed. I'd rather not add it back because the new method of loading the data from the filesystem once on the main thread and sharing it via shared memory has a lot of advantages (the alternative is that each thread would need to do FS operations of their own, which themselves get proxied back to main thread, which adds a up to load wasted work).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue here is that we need to adjust how dependencies are looked up. If we call FS.createPreloadedFile directly, it will do its normal dependency resolution process without any way for us to control it. So we either need a hook or we need a way to work out the dependencies ahead of time and resolve them manually. As far as I can tell getDylinkMetadata is the only way for us to work out dependencies ahead of time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say "adjust how dependencies are looked up" do you mean controlling where .so files are found (like LD_LIBRARY_PATH)? Or something else?

Is that because you have DSO with the same name in different locations?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean controlling where .so files are found (like LD_LIBRARY_PATH)? Or something else?

This may be all we need, but I'm not 100% sure. Presumably there is some correct standard way to do this we could be using?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ping. Let me make it a little clearer

When you say "adjust how dependencies are looked up" do you mean controlling where .so files are found (like LD_LIBRARY_PATH)? Or something else?

Yes, "adjust how dependencies are looked up" is why Pyodide uses fs.findObject and fs.readfile now.

If a.so depends on b.so, we need a way to locate b.so somewhere in the (virtual) file system. So we want a global paths (LD_LIBRARY_PATH) that dependencies can be searched from.

Additionally, It would be great to have a thing like RPATH (pyodide/pyodide#3854). To make Python packages portable, we are trying to bundle shared libraries into Python wheel, so we want a way to search those "local" paths, which is availble in Linux with RPATH.

Is that because you have DSO with the same name in different locations?

It's similar, but a little different.

The reason we modify LDSO.loadedLibsByName is to handle cases when we pass an absolute path and just a lib name into loadDynamicLibrary: when two libraries are the same.

For example, if we have a situation where we have previously called loadDynamicLibrary (/usr/lib/b.so) to load b.so, and we subsequently load a.so that depends on b.so, we want to know that the b.so that a.so depends on is /usr/lib/b.so, which has already been loaded, so that we don't get a duplicate load.

For now, a.so will call loadDynamicLibrary(b.so) in the process of loading the dependent library, so we need to tell it that this is the same file as /usr/lib/b.so.

Copy link
Contributor

@ryanking13 ryanking13 Jun 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell getDylinkMetadata is the only way for us to work out dependencies ahead of time.

To be more specific, the reason why we are using getDylinkMetadata now in Pyodide is to handle the bug described in #18264.

If a library is loaded locally with RTLD_LOCAL option,
the dependent libraries will also be loaded locally. On Linux, dependent libraries can resolve symbols even if they are loaded locally, but I think there is currently a bug in Emscripten that prevents this.

As a workaround, Pyodide is changing the option to force libraries that are dependencies of other libraries to be loaded globally, using the metadata value returned by getDylinkMetadata.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Why doesn't dlopen() see preloaded files when called from a worker?
4 participants