
[Kernel C API] Implementation of variable ops RFC. #49717


Merged (11 commits) on Jun 23, 2021

Conversation

kulinseth (Contributor) opened this pull request on May 26, 2021

@google-ml-butler google-ml-butler bot added the size:L CL Change Size: Large label May 26, 2021
@google-cla google-cla bot added the cla: yes label May 26, 2021
@@ -551,3 +554,257 @@ TF_Tensor* TF_AllocateTemp(TF_OpKernelContext* context, TF_DataType dtype,
}
return tf_tensor;
}

tensorflow::Status EnsureSparseVariableAccess(TF_OpKernelContext* ctx,
kulinseth (PR author) commented:

I am moving these to the c_api_experimental file (I ran into some build issues). I wanted to get the PR started to gather feedback.

@penpornk penpornk requested a review from saxenasaurabh May 26, 2021 00:31
@gbaned gbaned self-assigned this May 26, 2021
var->tensor()->shape(), &tmp, attr));
tensorflow::Status s;
TF_Tensor *tf_tmp = TF_TensorFromTensor(tmp, &s);
TF_Tensor *tf_tensor = TF_TensorFromTensor(*var->tensor(), &s);
A contributor commented:

I think TF_DeleteTensor() is needed at the end of the function; TF_TensorFromTensor allocates a new TF_Tensor and will cause a memory leak if TF_DeleteTensor() is not invoked.

// Non-static for testing.
TF_Tensor* TF_TensorFromTensor(const tensorflow::Tensor& src, Status* status) {
  *status = tensorflow::Status::OK();
  if (!src.IsInitialized()) {
    *status = FailedPrecondition(
        "attempt to use a tensor with an uninitialized value");
    return nullptr;
  }
  if (src.NumElements() == 0) {
    return EmptyTensor(static_cast<TF_DataType>(src.dtype()), src.shape());
  }
  if (src.dtype() == tensorflow::DT_RESOURCE) {
    if (src.shape().dims() != 0) {
      *status = InvalidArgument(
          "Unexpected non-scalar DT_RESOURCE tensor seen (shape: ",
          src.shape().DebugString(),
          "). Please file a bug at "
          "https://github.com/tensorflow/tensorflow/issues/new, "
          "ideally with a "
          "short code snippet that reproduces this error.");
      return nullptr;
    }
    const string str =
        src.scalar<tensorflow::ResourceHandle>()().SerializeAsString();
    TF_Tensor* t = TF_AllocateTensor(TF_RESOURCE, {}, 0, str.size());
    std::memcpy(TF_TensorData(t), str.c_str(), str.size());
    return t;
  }

  Tensor tensor;
  if (!tensor.CopyFrom(src, src.shape())) {
    return nullptr;
  }
  return new TF_Tensor{new tensorflow::TensorInterface(std::move(tensor))};
}

kulinseth (PR author) replied:

Done.
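The reviewer's point follows from the last line of the quoted implementation: TF_TensorFromTensor heap-allocates a new TF_Tensor, so every successful call must eventually be paired with TF_DeleteTensor. A minimal sketch of that pairing, using placeholder stand-ins rather than the real TF C API (the fake functions below are illustrative, not TensorFlow symbols):

```cpp
#include <cassert>

// Placeholder stand-in for the opaque handle from tensorflow/c/c_api.h.
struct TF_Tensor { int payload; };

static int live_tensors = 0;  // tracks outstanding allocations for the sketch

// Like TF_TensorFromTensor: heap-allocates, so the caller owns the result.
TF_Tensor* FakeTensorFromTensor(int payload) {
  ++live_tensors;
  return new TF_Tensor{payload};
}

// Like TF_DeleteTensor: releases a handle created above.
void FakeDeleteTensor(TF_Tensor* t) {
  if (t != nullptr) {
    --live_tensors;
    delete t;
  }
}

// The fix the reviewer asks for: delete both temporaries before returning.
void HelperWithoutLeak() {
  TF_Tensor* tf_tmp = FakeTensorFromTensor(1);
  TF_Tensor* tf_tensor = FakeTensorFromTensor(2);
  // ... use the tensors (e.g. hand them to a copy function) ...
  FakeDeleteTensor(tf_tmp);
  FakeDeleteTensor(tf_tensor);
}
```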

context->allocate_temp(tensor->dtype(), tensor->shape(), &tmp, attr));
tensorflow::Status s;
TF_Tensor *tf_tmp = TF_TensorFromTensor(tmp, &s);
TF_Tensor *tf_tensor = TF_TensorFromTensor(*tensor, &s);

A reviewer commented:

Do we need to check the two statuses?

kulinseth (PR author) replied:

Maybe we can return the status to the caller of PrepareUpdateVariable().
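One way to act on both comments is to check each conversion status and propagate the first failure to the caller. A sketch of that shape, with a minimal stand-in Status type (the real code would use tensorflow::Status, and the bool parameters stand in for whether each TF_TensorFromTensor call succeeded):

```cpp
#include <cassert>
#include <string>

// Minimal stand-in for tensorflow::Status, for illustration only.
struct Status {
  bool ok_;
  std::string msg_;
  bool ok() const { return ok_; }
};

// Hypothetical shape of PrepareUpdateVariable once it surfaces both
// conversion statuses instead of discarding them.
Status PrepareUpdateVariableSketch(bool tmp_ok, bool var_ok) {
  if (!tmp_ok) {   // after TF_TensorFromTensor(tmp, &s)
    return Status{false, "conversion of tmp failed"};
  }
  if (!var_ok) {   // after TF_TensorFromTensor(*tensor, &s)
    return Status{false, "conversion of variable tensor failed"};
  }
  return Status{true, ""};  // OK: caller can proceed
}
```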

@kulinseth kulinseth force-pushed the variable_ops_impl branch from fc79d57 to cc6a3f1 Compare June 3, 2021 07:45
// &total_size)).
TF_CAPI_EXPORT extern void TF_OpKernelConstruction_GetAttrTensorShape(
TF_OpKernelConstruction* ctx,
const char* attr_name, int64_t* values,
A member commented:

nit: Rename values to dims and max_vals to num_dims to be consistent with TF_GraphGetTensorShape.

kulinseth (PR author) replied:

Done.

@@ -551,3 +571,301 @@ TF_Tensor* TF_AllocateTemp(TF_OpKernelContext* context, TF_DataType dtype,
}
return tf_tensor;
}

tensorflow::Status EnsureSparseVariableAccess(TF_OpKernelContext* ctx,
saxenasaurabh (Member) commented:

Is it possible to share code with the existing impl for EnsureSparseVariableAccess in training_op_helpers.h? Same comment for other helpers below.

kulinseth (PR author) replied:

Thanks @saxenasaurabh for the review. I have used the existing helper functions where possible, e.g. LookupResource. With EnsureSparseVariableAccess there is a Device dependency, which we are passing in as copy functors. I agree there is duplicated code that could be avoided. One possibility is to refactor the core helper functions to remove this dependency so we can adopt them here. Maybe we can do that as a follow-up cleanup, since it would require invasive changes in the core that could break things. What do you think?

saxenasaurabh (Member) replied:

Sounds good. Please try to clean this up as a follow-up. That would give test coverage for free as well.

kulinseth (PR author) replied:

Makes sense, will do.

tf_tmp = TF_TensorFromTensor(tmp, &s);
tf_tensor = TF_TensorFromTensor(*var->tensor(), &s);
TF_Tensor *tf_tmp = TF_TensorFromTensor(tmp, &s);
TF_Tensor *tf_tensor = TF_TensorFromTensor(*var->tensor(), &s);
copyFunc(ctx, tf_tensor, tf_tmp);
kulinseth (PR author) commented on Jun 4, 2021:

We are expecting the plugin to call TF_DeleteTensor in copyFunc. This is more in line with the rest of TF, where we release the tensors in Compute.
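The ownership convention stated above can be sketched as follows: the plugin-supplied copyFunc receives the two TF_Tensor handles, performs the device copy, and then deletes both, much like kernels release tensors at the end of Compute. Stand-in types replace the real C API, and PluginCopyFunc is an illustrative name:

```cpp
#include <cassert>

struct TF_OpKernelContext;           // opaque, as in the real C API
struct TF_Tensor { int data; };      // placeholder stand-in

static int deleted_tensors = 0;

// Like TF_DeleteTensor, instrumented for the sketch.
void FakeDeleteTensor(TF_Tensor* t) {
  ++deleted_tensors;
  delete t;
}

using CopyFunc = void (*)(TF_OpKernelContext*, TF_Tensor* src,
                          TF_Tensor* dst);

// A plugin-side copyFunc: copy, then release both handles it was given.
// (In real code the copy lands in the variable's underlying buffer, so
// deleting the wrapper handles afterwards is safe.)
void PluginCopyFunc(TF_OpKernelContext* /*ctx*/, TF_Tensor* src,
                    TF_Tensor* dst) {
  dst->data = src->data;  // device-specific copy would happen here
  FakeDeleteTensor(src);
  FakeDeleteTensor(dst);
}
```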

TF_OpKernelContext* ctx,
int input,
bool lock_held,
bool isVariantType,
saxenasaurabh (Member) commented:

Do we need to expose isVariantType in the API, or can it be inferred from the tensor? I believe it is equivalent to TF_TensorType(tensor) == TF_VARIANT.

kulinseth (PR author) replied:

@saxenasaurabh thanks! (Sorry for the delay.) You are right, we can probably skip isVariantType in the API. However, our current metal-plugin release already uses this API. If it's not too much of a concern, can we keep it this way?
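For reference, the inference the reviewer suggests can be expressed directly. A sketch with stand-ins for the C API enum and accessor (the enum values mirror the real TF_DataType, where TF_VARIANT is 21; IsVariantTensor is an illustrative name):

```cpp
#include <cassert>

// Stand-ins for the TF C API (tensorflow/c/tf_datatype.h).
enum TF_DataType { TF_FLOAT = 1, TF_VARIANT = 21 };
struct TF_Tensor { TF_DataType dtype; };
TF_DataType TF_TensorType(const TF_Tensor* t) { return t->dtype; }

// The suggested inference: no separate isVariantType parameter needed.
bool IsVariantTensor(const TF_Tensor* t) {
  return TF_TensorType(t) == TF_VARIANT;
}
```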

@@ -113,6 +114,19 @@ struct TF_OperationDescription {
std::set<tensorflow::string> colocation_constraints;
};

struct TF_VariableInputLockHolder {
A member commented:

Could this be in c_api_experimental as well?

@penpornk penpornk added the ready to pull PR ready for merge process label Jun 23, 2021
Labels: cla: yes, size:L (CL Change Size: Large)
6 participants