[PluggableDevice] PluggableDevice mechanism implementation #45784
Conversation
Resolved review threads (outdated):
- tensorflow/c/experimental/stream_executor/stream_executor_internal.h
- tensorflow/core/common_runtime/pluggable_device/pluggable_device.h
- tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc (3 threads)
```cpp
CHECK(process_state_);
const string& allocator_type = options.allocator_type();
se::Platform* platform = PluggableDeviceMachineManager(platform_name_);
mutex_lock lock(mu_);
```
Please correct me if I missed it: it seems like we are using the default BFC memory allocator for device allocations, and not going through the SE interface discussed in the StreamExecutor RFC:
```c
void (*create_allocator)(const SP_Platform* platform,
                         SE_CreateAllocatorParams* params, TF_Status* status);
void (*destroy_allocator)(const SP_Platform* platform,
                          SP_Allocator* allocator,
                          SP_AllocatorFns* allocator_fns);
```
Also, I think custom allocator support might need to be added. I can follow up with another PR to add support for that. Please let me know if I missed it elsewhere.
Yes, you are right: custom allocator support is needed in the StreamExecutor C API. We would appreciate it if you could contribute the custom allocator feature for PluggableDevice.
I think @annarev has already started an internal PR for this, but was deciding between:
- Having a separate registration for the custom allocator, e.g.,
  ```c
  void TF_InitAllocatorPlugin(TF_AllocatorRegistrationParams* params, TF_Status* status);
  ```
- Adding them as part of the StreamExecutor C API surface, e.g.,
  ```c
  // For BFC.
  void (*create_allocator)(const SP_Platform* platform, const SP_Device* device,
                           SE_CreateAllocatorParams* params, TF_Status* status);
  void (*destroy_allocator)(const SP_Platform* platform,
                            const SP_Device* device, SP_Allocator* allocator);
  void (*create_allocator_fns)(const SP_Platform* platform,
                               SP_AllocatorFns* allocator_fns,
                               TF_Status* status);
  void (*destroy_allocator_fns)(const SP_Platform* platform,
                                SP_AllocatorFns* allocator_fns);

  // For custom allocator.
  void (*create_custom_allocator)(const SP_Platform* platform,
                                  const SP_Device* device,
                                  SE_CreateCustomAllocatorParams* params,
                                  TF_Status* status);
  void (*destroy_custom_allocator)(const SP_Platform* platform,
                                   const SP_Device* device,
                                   SP_CustomAllocator* allocator);
  void (*create_custom_allocator_fns)(const SP_Platform* platform,
                                      SP_CustomAllocatorFns* allocator_fns,
                                      TF_Status* status);
  void (*destroy_custom_allocator_fns)(const SP_Platform* platform,
                                       SP_CustomAllocatorFns* allocator_fns);
  ```
  (Only the top 4 or the bottom 4 can be set.)

It has been put on hold for a while. I think @annarev plans to revisit this in the next few weeks. (Please correct me if I'm wrong.)
Let us know if you have any preference/suggestion. :)
Linking this here for wider coverage:
Anna has a PR for custom allocator up for comments: #47598
I'm so sorry for the delay! Submitting some comments first.
@penpornk It seems one check is still failing (MacOS CPU python3). The errors look strange (one is in direct_session, another in grappler), and I'm not sure whether they are related to this PR since I have no macOS machine to reproduce on. Can you help check this as well? I will also look into the possible cause. Thanks very much!
@jzhoulon This check failed 4 hours ago too (same errors) when it passed elsewhere, so I reran it. It's still failing, so it's likely related to this PR. I'll try to find a way to reproduce this from my side as well.
@jzhoulon and @penpornk, I have pulled in the latest patches and will try to reproduce on my side. I will update if I find something locally.
@kulinseth Thanks for the help! I have some findings, though I still haven't figured out why only macOS has this issue. Details:
debug log:
fix patch:
(edited by penpornk@ to fix a markdown formatting error that made the patch hard to read)
@jzhoulon Thank you for the quick findings! I think the duplicate symbol might be because both the PluggableDevice runtime and the GPU runtime are linked statically. Could you please help test linking the PluggableDevice runtime dynamically? Our tpu_runtime is also linked dynamically. @kulinseth Thank you for your help as well! I've pulled the PR in to test internally and am fixing other failures (which obscure this failure on the macOS CI). I'll get to this once I'm done with the other failures.
Thanks @jzhoulon for the findings. I was also curious why this was a Mac-only failure. I looked at the tests and they have the "no_gpu" tag (although Ubuntu CPU should have caught it too). It's also possible that the Xcode toolchain's compiler flags behave differently with duplicate symbols.
@jzhoulon I had a question regarding the PluggableDevice implementation. Did you test it with the host_memory_allocate/deallocate APIs being exercised? I am not seeing them getting used locally, and was curious whether we need to set some special flag.
…ream_executor_test to a common file, so test_pluggable_device can use it too. This change is a preparation for PR #45784. In the PR, the test `LibraryPluggableDeviceLoadFunctions` in //tensorflow/c:c_api_experimental_test eventually verifies that all structs in `test_pluggable_device.so` are populated properly. PR #45784: #45784 PiperOrigin-RevId: 363065561 Change-Id: Ic3e019ea21d998922ff8fd84ebd6faabdaccfd5d
@kulinseth Thanks for the review. host_memory_allocate is used in:
Thanks @jzhoulon. I do see the DeviceHostAllocator getting called (using the BFCAllocator) and registered, and we have the host_memory_allocate and deallocate functions registered, but they are not getting invoked (the rest of the alloc functions work fine, like device memory allocate()/deallocate()). Are we missing anything? I am curious whether you saw any issues using the host-memory allocation features in your backend implementation.
@kulinseth I just confirmed that host_memory_allocate can be invoked (I added a printf in host_memory_allocate and it printed). Can you try:
… from stream_executor_test to a common file, so test_pluggable_device can use it too. This change is a preparation for PR #45784. In the PR, the test `LibraryPluggableDeviceLoadFunctions` in //tensorflow/c:c_api_experimental_test eventually verifies that all structs in `test_pluggable_device.so` are populated properly. PR #45784: #45784 PiperOrigin-RevId: 363205807 Change-Id: I903882bf82391ebc3493ecfa3c111dfc0916b9cb
…plicate symbols in PR #45784 (PluggableDevice implementation). PiperOrigin-RevId: 363299146 Change-Id: I8f7a9dbbf491020eb0cec54616c6f7c181887924
… symbols in PR #45784 (PluggableDevice implementation). PiperOrigin-RevId: 363302623 Change-Id: Ibb31404a32151989e3f5f3bec2663bf275c0398f
…ream_executor_test to a common file, so test_pluggable_device can use it too. This change is a preparation for PR #45784. In the PR, the test `LibraryPluggableDeviceLoadFunctions` in //tensorflow/c:c_api_experimental_test eventually verifies that all structs in `test_pluggable_device.so` are populated properly. PR #45784: #45784 PiperOrigin-RevId: 363315826 Change-Id: I36391180bbc0aca6d41575d5b6f0299a5e99af76
…ementation PiperOrigin-RevId: 363579316 Change-Id: I2ac70795e5ab14fce9c788ba00dd1923ff72a26d
This PR was merged in 3a3878f. I'm closing the PR now. Thank you very much everyone for the hard work!
Thanks so much for checking. |
```
"//tensorflow/core/platform:stream_executor",
"//tensorflow/stream_executor:event",
"//tensorflow/stream_executor:kernel",
] + if_static([
```
@jzhoulon and @penpornk
This macOS test-failure fix in the pluggable implementation seems to have caused a regression in pluggable device registration on Mac platforms. When _pywrap_tensorflow.so loads the plugin, it goes through device initialization fine, then hands over to the libtensorflow_framework dylib to query the plugin handle by platform name in MultiPlatformManager. During this step it fails with "Platform not found". It seems like a linker issue where the plugin registry is not getting shared across the .so and the .dylib.
Reverting this locally works around it. Will create a PR with a proper fix.
Add the PluggableDevice mechanism implementation, including plugin initialization and PluggableDevice creation and compute, and also add a new flag (is_pluggable_device) in DeviceFactory to do some low-level device specialization.