Model Execution API #87
Good catch! One possible solution is to allow user code to query the output shape, as Android NNAPI does (cc @miaowang14).
I don't quite get this issue. Could you please elaborate on the case where the API compiles the full graph but executes only a sub-graph? |
But the output shape may not always be statically known. It would be good if the API allows for the possibility of the output buffer being allocated during execution when its size is known. Can we treat the call to "setOutput" as optional (not required)? |
+1. The API may also need to allow specifying the input shape, which is not supported currently.
Then how do we get the execution results? Could you please elaborate on the idea? |
The execution API essentially needs to allow a way to specify the inputs, optionally specify output buffers, and get back the outputs. I think various possibilities exist for this. In the long run, it may actually be useful to support multiple signatures for the different ways of doing this (mostly as a syntactic convenience). One option would be to have a return result (for example, a dictionary mapping output-name to its value). |
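For illustration, a minimal sketch of that dictionary-return option; the method name and output names are assumptions, not the current spec:

// Hypothetical: startCompute() resolves to a map from output name to value.
const results = await execution.startCompute();
// 'results' would be e.g. {output: Float32Array, ...}; no setOutput() call is
// needed because the output buffers are allocated by the implementation.
console.log(results['output']);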
+1. I'm also looking at a few related
|
+1. According to the current Execution API, the missing functionalities are specifying the input shape and getting the output shape and data. So semantically, we may extend the Execution interface as follows:

interface Execution {
- void setInput(DOMString name, ArrayBufferView data);
+ void setInput(DOMString name, ArrayBufferView data, optional sequence<long> shape);
void setOutput(DOMString name, ArrayBufferView data);
Promise<void> startCompute();
+ sequence<long> getOutputShape(DOMString name);
+ Promise<void> readOutput(DOMString name, ArrayBufferView buffer);
};

With that, the sample code for a dynamic input/output shape would be:

// Create an Execution object for the compiled model.
const execution = compilation.createExecution();
const inputShape = [2, 2, 2, 2];
// Setup the input buffers with value 1.
const inputBuffer1 = new Float32Array(sizeOfShape(inputShape)).fill(1);
const inputBuffer2 = new Float32Array(sizeOfShape(inputShape)).fill(1);
// Specify the input data and its shape.
execution.setInput('input1', inputBuffer1, inputShape);
execution.setInput('input2', inputBuffer2, inputShape);
// Start the asynchronous computation.
await execution.startCompute();
// Get the output shape and allocate the buffer.
const outputShape = execution.getOutputShape('output');
const outputBuffer = new Float32Array(sizeOfShape(outputShape));
// Read the output values into the allocated buffer.
await execution.readOutput('output', outputBuffer);
// The computed result is now in outputBuffer.
console.log(outputBuffer);
This could be a syntactic improvement. One difference is that the above proposal allows getting the value of a single output instead of returning them all. |
@wchao1115 , I agree there is room to improve |
@huningxin I understand that. Concretely, a few things stand out for me:
A leaner design in my mind would be one where an operand exposes a method for the app to optionally download its content to an ArrayBuffer, only if they want to. This call would fail if it's made before the graph is executed, since the output tensor won't be known then. With that API addition, it turns out that |
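A sketch of that leaner design, assuming hypothetical readInto/getDimensions methods on the output operand rather than a separate Tensor type:

// Hypothetical: download the output's content only when the app wants it.
await execution.startCompute();
// Would reject if called before the graph has executed, since the output
// tensor isn't known until then.
const outputBuffer = new Float32Array(sizeOfShape(await output.getDimensions()));
await output.readInto(outputBuffer);
console.log(outputBuffer);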
In #17 it was "RESOLVED: The specification will reference a subset of the ONNX operations". The ONNX operations include type constraints i.e. https://github.com/onnx/onnx/blob/master/docs/Operators.md#MatMulInteger . Can we just follow them? |
That's a good point. Besides passing the GPU-backed output tensor to other APIs, I think the Execution API should be able to accept the output tensor as input. With that, it would enable pipelining multiple graph executions without moving data across devices.
My understanding is that the input and output tensors of the Execution API should be device dependent, say GPU resources. So I am thinking about adding a device tensor concept, for example:

// Create an Execution object for the compiled model.
const execution = compilation.createExecution();
const inputShape = [2, 2, 2, 2];
// Setup the input buffers with value 1.
const inputBuffer1 = new Float32Array(sizeOfShape(inputShape)).fill(1);
const inputBuffer2 = new Float32Array(sizeOfShape(inputShape)).fill(1);
// Create the input and output tensors for compilation device.
const inputTensor1 = compilation.device.tensor(float32TensorType, inputBuffer1);
const inputTensor2 = compilation.device.tensor(float32TensorType, inputBuffer2);
const outputTensor = compilation.device.tensor();
// Set input and output tensors to an execution.
execution.setInput('input1', inputTensor1);
execution.setInput('input2', inputTensor2);
execution.setOutput('output', outputTensor);
// Start the computation; this is no longer async, as reading the tensor back would be async.
execution.startCompute();
// Query the output shape, await for shape inference done.
const outputShape = await outputTensor.getDimensions();
const outputBuffer = new Float32Array(sizeOfShape(outputShape));
// Read the data back, await for inference done.
await outputTensor.readInto(outputBuffer);
console.log(outputBuffer);

The |
Hi @wchao1115 : one clarification about the attributes and operands: I think that the primary distinction between attributes and operands is that attributes represent values that are known statically, at the time the graph/model is built; on the other hand, operands represent values that typically will not be known statically (though, in some special cases they might be). So, one key question we should address is how we want to handle scalar values that are known only at runtime (and not statically). Many frameworks do use tensors of rank 0 to represent such scalars. We could do the same. In this case, restricting operand types to be tensor-float32 and not float32 would work. So, this is doable, though it may incur a small performance penalty. But since these scalars are not going to be the performance bottleneck, it should be okay. |
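To make the rank-0 idea concrete, a sketch using the builder-style calls from this thread; the empty-dimensions descriptor for a scalar is an assumption:

// Hypothetical: a runtime scalar modeled as a rank-0 'tensor-float32'
// operand instead of a static 'float32' attribute.
const x = nn.input('x', {type: 'tensor-float32', dimensions: [2, 2]});
const alpha = nn.input('alpha', {type: 'tensor-float32', dimensions: []});
const y = nn.mul(x, alpha);
// At compute time the scalar is supplied like any other input, e.g.
// {'alpha': {buffer: new Float32Array([0.2])}}.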
@gramalingam That's a good point. We haven't fully defined the notion of tensor scalar. From the graph execution standpoint, tensors are device-specific resources, either as inputs or outputs of the process. |
@huningxin I think this is too complicated since the developers now need to deal with 3 different notions -- operand, tensor, and buffer. Besides, the tensors are now created by compilation.device.
Even though tensors are device-dependent, it doesn't mean that we have to surface them as a type to the users. I think the current notion of operand is sufficient.
Same for the output operand: as long as there is a way to retrieve the computed content in it and put it in a CPU buffer or in a future resource type, there is no need to expose the notion of the device-specific resource type to the users. Also, if you look at how |
Users may compile the same model for different devices, for example:

const model = await nn.createModel([{name: 'output', operand: output}]);
// e.g. target iGPU
const compilation1 = await model.createCompilation({powerPreference: 'low-power'});
// e.g. target dGPU
const compilation2 = await model.createCompilation({powerPreference: 'high-performance'});
// Create Execution objects for the compiled models.
const execution1 = await compilation1.createExecution();
const execution2 = await compilation2.createExecution();
execution1.startCompute();
execution2.startCompute();
// Which device to read back, iGPU or dGPU?
await output.readInto(buffer); |
Hi @huningxin : to read the output value into a buffer, we would need to invoke a "getValue(operand)" method on either execution or on a result returned by execution.startCompute(), rather than on an operand. @wchao1115 : I assume that the existing buffer abstractions suffice to represent memory on different devices? Assuming that, I think the primary distinction between a "Tensor" and a "buffer" would be that a "Tensor" attaches a "shape" to a buffer. I think that is a useful addition ... in most frameworks there is such a distinction. I assume the existing Buffer abstraction does not have a shape associated with it? |
This might not allow passing the GPU-backed output to another API or a following execution without moving data between GPU and CPU.
This would allow the above use case. However, as execution is for one-shot inference, this might not allow reusing the device memory for multiple executions/inferences.
I agree we should not read the value from an operand.
If you mean
Neither ArrayBufferView nor GPUBuffer has a shape. |
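Put differently, a "tensor" in this discussion is just a buffer with dimensions attached; a plain-object illustration, not a proposed interface:

// ArrayBufferView and GPUBuffer carry bytes but no shape, so the shape
// must travel alongside the buffer.
const tensor = {
  dimensions: [2, 2, 2, 2],
  buffer: new Float32Array(2 * 2 * 2 * 2),
};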
Can this work?

// model produces 2 outputs
const model = await nn.createModel([{name: 'output1', operand: output1}, {name: 'output2', operand: output2}]);
const compilation = await model.compile();
const buffers = await compilation.compute(['output1']);
// 'buffers' is sequence<ArrayBufferView>
// 'buffers[0]' holds a filled CPU buffer of just the content of output1 |
The previous discussion is at #39 (comment). Probably we need to revisit that.
+1
Two opens:
|
I prefer a single "compilation.compute(...)" method (as suggested by Chai). Re. Ningxin's point 1: yes, we need to supply the input values as parameters to the "compute" method. Re. Ningxin's point 2: I think if we want to support this use case, we have two options. I prefer option (b). So, basically, generalize Chai's suggestion to:
where outputs is a sequence of output_type, and output_type is a union that includes tensors in GPU as well as CPU memory, etc. I think it is useful to have a type representing a tensor, regardless of where it is located. |
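The generalized signature itself did not survive the thread formatting; presumably it is along these lines (an assumption based on the surrounding text, with hypothetical tensor names):

// Hypothetical: compute() takes named inputs and a sequence of outputs,
// where each entry may be a CPU-backed or GPU-backed tensor.
await compilation.compute(
  {input1: cpuTensor1, input2: gpuTensor2},  // inputs (CPU or GPU backed)
  [outputTensor]                             // sequence<output_type>
);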
I omitted the
Not sure why. I don't think we need to pass the input operands to
I looked at both canvas-2d and canvas-WebGL to understand how to move the output tensor data to either endpoint, and it looks like both paths do start with ArrayBufferView, which unfortunately would require a copy. But in most known scenarios the bottlenecks are on how to get the data in to the model. So requiring copy-out may not be too bad. Future resource sharing with WebGPU may be a possibility, but that would probably need to happen at the context level to ensure cross-adapter resource sharing. I think assuming selective CPU downloads on outputs is probably a safe bet. |
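For reference, the canvas-2d copy-out path mentioned above looks roughly like this, assuming the application has already converted the model output to 8-bit RGBA bytes (rgbaBytes, width, and height are hypothetical):

// Copy a CPU-side output into a 2D canvas via ImageData (one copy).
const ctx = document.querySelector('canvas').getContext('2d');
ctx.putImageData(new ImageData(rgbaBytes, width, height), 0, 0);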
We know what the inputs (that is, the input variables) of the graph are (when we build the graph), but not the values of these inputs (which are determined only when we call "compute" and will vary from one call of "compute" to another). |
@gramalingam the input values are known, right? That's why it's the input. (Copied from the spec sample.)

// Setup the input buffers with value 1.
const inputBuffer1 = new Float32Array(TENSOR_SIZE).fill(1);
const inputBuffer2 = new Float32Array(TENSOR_SIZE).fill(1);
// Associate the input buffers to model’s inputs.
execution.setInput('input1', inputBuffer1);
execution.setInput('input2', inputBuffer2); |
Yes, we can specify the input values using |
That would be:

interface Compilation {
void setInput(DOMString name, ArrayBufferView data);
Promise<sequence<ArrayBufferView>> compute(sequence<DOMString>);
}; |
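A usage sketch of that shape, with the input/output names assumed:

// Hypothetical usage of the Compilation.setInput()/compute() shape above.
compilation.setInput('input1', inputBuffer1);
compilation.setInput('input2', inputBuffer2);
// Request only the outputs we care about; each returned entry is an
// ArrayBufferView allocated and filled by the implementation.
const buffers = await compilation.compute(['output']);
console.log(buffers[0]);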
+1.
+1. This would make the
+1. This would enable passing the GPU resource along to other API even following
+1. I propose to add a Tensor concept. There are some usages with code sketches below (updated):
// Create input tensors with fully specified shapes.
const tensor1 = compilation.tensor(inputShape1);
const tensor2 = compilation.tensor(inputShape2);
// Create the output tensor without specifying its shape.
const tensor3 = compilation.tensor();

// Usage 1: upload inputs, compute, then download the output.
const inputBuffer1 = new Float32Array(sizeOfShape(inputShape1)).fill(value);
tensor1.writeFrom(inputBuffer1);
// Similar code for tensor2.
compilation.compute({'input1': tensor1, 'input2': tensor2}, {'output': tensor3});
const outputBuffer = new Float32Array(sizeOfShape(outputShape));
await tensor3.readInto(outputBuffer);

// Usage 2: dynamic output shape. Might wait for shape inference to be done.
const outputShape = await tensor3.getDimensions();
const outputBuffer = new Float32Array(sizeOfShape(outputShape));

// Usage 3: the second inference reuses the same tensors.
tensor1.writeFrom(inputBuffer1);
tensor2.writeFrom(inputBuffer2);
compilation.compute({'input1': tensor1, 'input2': tensor2}, {'output': tensor3});
await tensor3.readInto(outputBuffer);

// Usage 4: chain two compilations without downloading the intermediate result.
tensor1.writeFrom(inputBuffer1);
tensor2.writeFrom(inputBuffer2);
compilation.compute({'input1': tensor1, 'input2': tensor2}, {'output': tensor3});
compilation2.compute({'input': tensor3}, {'output': tensor4});
// Only download the final result.
await tensor4.readInto(outputBuffer); |
@huningxin It seems a key difference in this discussion so far is whether we should define a notion of Tensor in the API as a dedicated resource that can be implemented and passed in and out of the API, potentially enabling resource sharing with other APIs. This idea is conceptually sound, with slightly greater complexity, but my concern is on the implementation side, especially for the GPU implementation of the API. The first problem is that you actually don't want to map a Tensor to a GPU resource like a UAV buffer one to one. In practice it's far more common to pool and reuse a single UAV buffer for multiple sets of tensor data with proper alignments. Frequent creation of GPU resources is prohibitively expensive. Secondly, activities such as upload/download of tensor data from/to CPU buffers are commonly pipelined in the same command list along with the shader dispatches that carry out the computation of graph operations, so that the entire command list can be pipelined and executed in one go. It would be highly inefficient to break up the command list just to upload the tensor data, for instance. And lastly, there are some types of tensor data, e.g. constant weight data, that sometimes require special treatment depending on the underlying hardware; a process that involves an additional copy pass before issuing shader dispatches. It's hard to properly locate the timing of that copy pass when the tensor data is associated with Tensor API objects that can operate independently from the rest of the graph execution process. Based on this set of difficulties, I'm still advocating against defining a Tensor type at the API level and instead continuing with the current approach of explicit upload/download methods off the |
@wchao1115 , thanks for sharing your thoughts on the GPU implementation; that is extremely helpful.
+1. I think we might not need to maintain a one-to-one mapping between a Tensor and a GPU resource. A tensor could just "record" the logical resource usage, such as being the input or output of a compute. The real resource allocation is up to the implementation.
That's a good point. To test my understanding, I would slightly change the sketch code of the inference loop as:

tensor1.writeFrom(inputBuffer1);
tensor2.writeFrom(inputBuffer2);
compilation.compute({'input1': tensor1, 'input2': tensor2}, {'output': tensor3});
await tensor3.readInto(outputBuffer);

When running the above code, a GPU implementation might be able to delay the execution until the last step (readInto).
The constant data would not be represented by a Tensor. We use nn.constant for that. Does this work? |
In practice the backing implementation of the API Tensor type would most likely be just a sort of handle with a refcounted pointer to the context, which complicates the implementation since there will need to be additional bookkeeping going on internally. It'll also tax resource management and garbage collection with small handles moving around. The API object gives a false impression that it can do something useful.
That is very unintuitive and misleading to developers. It is very natural to expect that operations like shader dispatches happen at compute. |
Understood. I'm not suggesting that we change the notion of |
This "pre-process weight data" is called packing #86 and also happens with CPU drivers. To generalize slightly, the tensor need not be a constant but simply reused enough to justify packing it. For example, a transformer encoder's outputs will be consulted repeatedly by a transformer decoder. You appear to be assuming that graph compilation is the right place to pack parameter matrices. Is your graph compilation general enough to support graph structures that differ for every input like syntactic parsing and the existence of the dynet toolkit? https://github.com/clab/dynet/ If not, then then graph compilation is not the place to stuff packing. |
Semantics-wise, yes. There are some syntax tweaks. I propose to use record<K, V>, which allows the object-literal syntax for mapping a name to an input/output:

dictionary Input {
required ArrayBufferView buffer;
sequence<long> dimensions; // optional
};
dictionary Output {
ArrayBufferView buffer; // optional
sequence<long> dimensions; // optional
};
typedef record<DOMString, Input> Inputs;
typedef record<DOMString, Output> Outputs;
interface Compilation {
Promise<Outputs> compute(Inputs inputs, optional Outputs outputs);
};

With that, the user could pass inputs like the following code:

let results = await compilation.compute({'a': {buffer: bufferA, dimensions: shapeA}});

and can access an output by named property:

console.log(results.c.dimensions);
console.log(results.c.buffer);

WDYT? |
@huningxin @wchao1115 Are we combining the compilation and execution into a single method compilation.compute? |
@huningxin @wchao1115 Slowly reading through the thread, I think the proposal solves my concerns with the original API. One extra question is about the concept of compilation for the end user: it looks like compilation becomes implicit, since it is combined with execution. It is not clear to me why we need to introduce this concept. Especially for end users who do not care about controlling the compilation, it would be much clearer to have an execute method directly on the Model interface:

model.execute(Inputs inputs, optional Outputs outputs); |
No. They are separated. The compilation would be done by:

const a = nn.input('a', descriptorA);
const b = nn.constant(descriptorB, bufferB);
const c = nn.matmul(a, b);
const compilation = await nn.compile({'c': c}, {powerPreference: 'low-power'});

The execution is done by:

let results = await compilation.compute({'a': {buffer: bufferA, dimensions: shapeA}});
console.log(results.c.dimensions);
console.log(results.c.buffer);

It would not require users to know the output shape prior to calling the compute method; according to compute's signature, the outputs argument is optional.

For a static input shape, for example:

const a = nn.input({type: 'tensor-float32', dimensions: [2, 2]});

the users can just provide the buffer when calling the compute method, such as:

let results = await compilation.compute({'a': {buffer: bufferA}});

For a dynamic input shape, for example:

const a = nn.input({type: 'tensor-float32', dimensions: [-1, 2]});

the users need to provide both buffer and shape, such as:

let results = await compilation.compute({'a': {buffer: bufferA, dimensions: [2, 2]}}); |
@huningxin Actually this has gone back a little bit; take the following graph as an example:

const a = nn.input('a', descriptorA);
const b = nn.constant(descriptorB, bufferB);
const c = nn.matmul(a, b);
const d = nn.matmul(a, c);
const compilation = await nn.compile({'d': d}, {powerPreference: 'low-power'});

The compile is done for the full graph. Now I want to execute the subgraph which has c as the output; can the previous compilation be used for this new execution:

let results = await compilation.compute({'a': {buffer: bufferA, dimensions: shapeA}}, {'c': {...}});
console.log(results.c.dimensions);
console.log(results.c.buffer);

Or does the user need to create a new compilation?

const compilation2 = await nn.compile({'c': c}, {powerPreference: 'low-power'}); |
Yes. The users could create a new compilation for that subgraph. I am thinking about an alternative way that allows specifying multiple outputs when compiling, such as:

const compilation = await nn.compile({'c': c, 'd': d}, {powerPreference: 'low-power'});

The users may then compute either c or d:

let results = await compilation.compute({'a': {buffer: bufferA, dimensions: shapeA}}, {'c': {}});
console.log(results.c.dimensions);
console.log(results.c.buffer);
results = await compilation.compute({'a': {buffer: bufferA, dimensions: shapeA}}, {'d': {}});
console.log(results.d.dimensions);
console.log(results.d.buffer);

Would this work? |
@huningxin This would work, but from a usability point of view, why do users need to care about the compilation step?

model.execute(Inputs inputs, optional Outputs outputs, {powerPreference: 'low-power'}); |
I think the major reason is that compilation takes time and the users may want to control when that happens. @wchao1115 shared more insights at #87 (comment). One example: with webnn-polyfill, the compilation time and first inference time of the LeNet example are:

compilation elapsed time: 582.70 ms
execution elapsed time: 27.80 ms

As we discussed, since there is no dedicated compilation step in tf.js, the webnn-polyfill emulates the compilation by letting tf.js infer once. The first inference compiles the WebGL kernels, so it takes much longer than the following inferences. I guess compilation in a native API would be similar. |
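For context, numbers like those can be collected with something like the following sketch; the compile/compute calls follow the API shape discussed in this thread, and the operand and buffer names are assumptions:

// Rough timing sketch using performance.now().
let start = performance.now();
const compilation = await nn.compile({'output': output});
console.log(`compilation elapsed time: ${(performance.now() - start).toFixed(2)} ms`);

start = performance.now();
const results = await compilation.compute({'input': {buffer: inputBuffer}});
console.log(`execution elapsed time: ${(performance.now() - start).toFixed(2)} ms`);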
It comes down to the choice between ease-of-use vs. control. A separate compile step allows more control for the users of the API to decide when would be an ideal time to spend on model compilation. For the scenarios described earlier in the thread, it is important to keep it. @huningxin I'm not familiar with |
@huningxin @wchao1115 Thanks for the explanation, it makes sense to have a dedicated compile step.

class Model {
compile() {
}
execute() {
}
}
// they can do the compile explicitly
model.compile({'c': c}, {powerPreference: 'low-power'});
let results = await model.execute({'a': {buffer: bufferA, dimensions: shapeA}}, {'c': {}});
console.log(results.c.dimensions);
console.log(results.c.buffer);
// they can also call the execute, which will trigger compilation as needed.
results = await model.execute({'a': {buffer: bufferA, dimensions: shapeA}}, {'d': {}});
console.log(results.d.dimensions);
console.log(results.d.buffer);

The idea is that we do not need to sacrifice ease-of-use for control. Thoughts? |
The link to record definition is https://heycam.github.io/webidl/#idl-record. The |
I have a few opens regarding your proposal:
With today's spec, the users create a model by:

const model = await nn.createModel([{name: 'c', operand: c}]);

Then the users can call model.compile(). Actually, I propose to fold the model creation and compile into one:

-const model = await nn.createModel([{name: 'c', operand: c}]);
-const compilation = await model.compile({powerPreference: 'low-power'});
+const compilation = await nn.compile({'c': c}, {powerPreference: 'low-power'});

Another open: in your sketch, can the same model be compiled multiple times with different options?

model.compile({'c': c}, {powerPreference: 'low-power'});
model.compile({'c': c}, {powerPreference: 'high-performance'}); // is this allowed?

The folded API could be:
partial interface NeuralNetworkContext {
Promise<ExecutableModel> compile(NamedOperands outputs, optional CompilationOptions options = {});
};
interface ExecutableModel {
Promise<Outputs> execute(Inputs inputs, optional Outputs outputs);
};

With that, there would be only one change compared to your sample code:

- model.compile({'c': c}, {powerPreference: 'low-power'});
+ const model = await nn.compile({'c': c}, {powerPreference: 'low-power'});
let results = await model.execute({'a': {buffer: bufferA, dimensions: shapeA}}, {'c': {}});
console.log(results.c.dimensions);
console.log(results.c.buffer);

WDYT? |
@huningxin first let me answer your questions:
My initial reaction to the execution API is that the model topology is created using operands, but there is no clear ownership of those operands. The topology seems to live in a global context, and I may create multiple topologies and even share nodes across them. It is also possible to change the topology, and I am not sure how the changes would affect the compilation. Model can act as a context within which the topology is created, and any compilations should also be associated with that context. A topology change should invalidate the existing compilations. That is also the reason I see the benefit of a separate compilation step, but I think many users might not need to be concerned with that. |
According to the current API, the operands are just used to describe the graph topology for model creation (by nn.createModel).
Given that, the topology change would not affect the existing compilations. For example:

const a = nn.input('a', descA);
const b = nn.constant(descB, bufferB);
let c = nn.add(a, b);
const compilation1 = await nn.compile({'c': c}); // compile "c = a + b"
c = nn.mul(a, b); // This would not affect compilation1.
const compilation2 = await nn.compile({'c': c}); // compile "c = a * b"
let results = await compilation1.compute({'a': {buffer: bufferA}});
console.log(results.c.buffer); // results.c.buffer = bufferA + bufferB
results = await compilation2.compute({'a': {buffer: bufferA}});
console.log(results.c.buffer); // results.c.buffer = bufferA * bufferB

Does it make sense? |
I think the point @pyu10055 is making is that the transient state of the topology being constructed seems to be living inside the global nn context. For example:

const t1 = nn.createTopology();
const a = t1.input('a', descA);
const b = t1.constant('b', descB, bufferB);
const c = t1.add(a, b);
const t2 = nn.createTopology();
const x = t2.input('x', descX);
const y = t2.constant('y', descY, bufferY);
const z1 = t2.mul(x, b); // z1 = x * b
const m1 = await t2.compile({'z1': z1}); // not allowed! cross topology operand detected.
const z2 = t2.mul(x, y); // z2 = x * y
const m2 = await t2.compile({'z2': z2}); // ok!

By introducing the notion of topology in the API, it acts as an agent to the context that owns the transient state of the topology being constructed. This means |
Yes, to the last point @wchao1115 makes. (I had earlier assumed we would create different ...) Another advantage of having this model/topology/context be an object is that, in principle, it may be possible to have both a graph-builder implementation of the interface as well as an "eager" implementation of the interface that evaluates the ops as they are constructed (in the longer run). This is doable, at least in the absence of control-flow ops. But maybe this is a digression/distraction. |
I think the topology idea has merit and should help with the transfer-learning scenario where a topology may be altered after it is created but before compile. I see that eager could be implemented as you describe -- as another implementation of the |
The topology is represented by the wired operands, e.g.
Although the developers may reuse the same operands to describe different topologies, these operands are not referenced or shared by the compiled models. For example:

const a = nn.input('a', descA);
const b = nn.constant(descB, bufferB);
const c = nn.add(a, b);
const compilation1 = await nn.compile({'c': c}); // compile "c = a + b"
const d = nn.mul(a, b);
const compilation2 = await nn.compile({'d': d}); // compile "d = a * b"
// a, b, c and d can be garbage-collected.

Supporting eager execution may be possible by allowing the developer to get an "eager" context and read the operand buffer within that context. This is based on the idea of @wchao1115. For example:

const nn = navigator.ml.getNeuralNetworkContext('eager');
const a = nn.constant(descA, bufferA);
const b = nn.constant(descB, bufferB);
const c = nn.add(a, b);
await c.buffer(); |
[Note: For the sake of this discussion, I purposefully borrow the word topology to try to differentiate it from the notion of model that is an immutable output of the existing createModel method.]
As for the topology, its main advantage is to convey to the developers and implementers of the API that if they have states that are specific to a specific graph-building session, they could store them here and leave all the global states that affect all graph-building sessions in the context itself. I believe the addition of this abstraction could make the API more clear and more intuitive to all parties. I'm a bit concerned when we have to make statements like
as it seems to impose non-obvious implementation choices and constraints, which may not hold up in certain situations. The issue of how to support eager has been brought up before, and though it's relevant to the general design of the API, I'd rather have it tracked in a separate issue for the purpose of keeping this thread focused on the specific issue originally raised. In my view I think we generally agree that eager can be thought of as an implementation variant of the same context interface, and that at the moment the introduction of the notion of topology doesn't seem to prevent supporting eager execution in the future. |
@wchao1115 , thanks for the detailed explanation.
I agree it makes sense to allow multiple graph building sessions. Now we only have a global one.
I agree this should be implementation details.
+1 As @pyu10055 mentioned,
Probably we could repurpose the Model interface for that:

interface NeuralNetworkContext {
Model createModel();
};
interface Model {
Operand input(DOMString name, OperandDescriptor desc);
Operand constant(OperandDescriptor desc, ArrayBufferView value);
Operand add(Operand a, Operand b);
Operand mul(Operand a, Operand b);
// and other operations
Promise<Compilation> compile(NamedOperands outputs, optional CompilationOptions options = {});
}; |
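A usage sketch of that shape; the operand descriptors, buffers, and option names are assumptions:

// Hypothetical usage: the Model object is the graph-building session.
const model = nn.createModel();
const a = model.input('a', descA);
const b = model.constant(descB, bufferB);
const c = model.add(a, b);
const compilation = await model.compile({'c': c}, {powerPreference: 'low-power'});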
Thanks @huningxin. My only concern with the model naming is that it implies immutability to most people. What we're looking for here is a name for a mutable graph-building state. I'm not fixed on the word topology, but I think at least that wording does imply an arrangement or build-up of a graph, as in topology: the way in which constituent parts are interrelated or arranged. Suggestions are welcome. |
@wchao1115 @huningxin I agree that a topology shall usually be immutable after creation, but the concept of a model is usually used for constructing a topology. If you look at any of the model-building APIs (Keras, for example), the topology is part of the model object, whether it is SequentialModel or FunctionalModel. |
@pyu10055 I'm not sure I understand your comment. Are you making a case for calling it a model? |
@wchao1115 @pyu10055 we may consider aligning with the model-loader API, where a model should be immutable. @jbingham, please correct me if I am wrong. So probably we could change the current NeuralNetworkContext to ModelBuilder, which would be the counterpart of ModelLoader. Developers would create a builder (through navigator.ml.createModelBuilder) for a specific graph-building session. And we may still keep the Model interface as the product of the builder and loader.

The code sketch would be:

// building a model
const builder = navigator.ml.createModelBuilder();
const a = builder.input('a', descA);
const b = builder.constant(descB, bufferB);
const c = builder.matmul(a, b);
const m1 = builder.createModel({'c': c});
// loading a model
const loader = navigator.ml.createModelLoader();
const m2 = await loader.load(modelUrl); |
Makes sense!
|
@huningxin : the ModelBuilder idea makes sense. As for loading a model, why can't it be just:
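The one-liner did not survive the formatting; presumably something like the following (an assumption, not the model-loader spec):

// Hypothetical direct form, skipping the createModelLoader() indirection.
const m2 = await navigator.ml.loadModel(modelUrl);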
If the indirection serves a purpose, that's fine with me. |
This makes sense IMHO, perhaps even with an optional arg for options. |
Per PR #94 |
There are a couple of questions about the existing execution API:
The current model execution API requires users to provide output buffers before execution. This is not very convenient, since it is an extra step for the user, and the user might not know the shape of the output beforehand. Also, for many models the output shape depends on the input shape, so it is an extra burden for users to find that out.
The current execution is built on the compilation of the full graph. While the execution API does not prevent users from executing a sub-graph of the model, it is not clear why the pre-compilation is needed, and whether it should be internal to the execution so that it can take care of sub-graph execution.