Instead of a separate Keras layer, should we just have a single Keras-like framework API? #109

Closed
deansher opened this issue Sep 9, 2020 · 17 comments

Comments

@deansher
Contributor

deansher commented Sep 9, 2020

This issue continues a discussion that started in #91. Here's a brief excerpt of key points made so far.

@karllessard wrote:

The current Python API is made up of two layers because it is historically the product of a merge between two different projects: the original TF API and the Keras project. I personally think it brings more confusion to the users than benefits and we don't need to follow this schema if we think we can do better in Java since we start from scratch.

I'm slowly leaning now to the idea of having a single API that supports both "beginner" and "advanced" modes, whether we call it Keras or not.

@JimClarke5 wrote:

IMHO, the beauty of Keras is in the simple, straight forward, Model and Layers. Most of the Layers have defaults for constructs like Metrics, Optimizers, Activations, etc. Also, they allow simple strings in their parameters that instruct the underlying layers to construct elements, like new Dense(24, "relu"), so the way these elements are constructed can be hidden from a Keras user.

@Craigacp wrote:

My preference is to have both a low and high level framework, which is how TF python currently is. You don't need to use Keras if you don't want to, but many people do.

One reason to advocate for both frameworks is that it might actually take less development effort. Building out Keras to have full coverage requires a lot of consistent effort, but supporting ops that are added to TF's C API in a lower level API is essentially free for us.

The high level framework is for people who use Keras in TF Python, and want an API that guides them better. I think that we should have stronger typing information than exists in Python, as it's what would be expected from idiomatic Java and it helps IDEs & discoverability.
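To make the trade-off between Keras-style strings and stronger typing concrete, here is a minimal sketch. It assumes a hypothetical `Dense` layer and `Activation` enum, and neither is the TF Java API. The typed enum gives compile-time checking and IDE completion. A `String` overload keeps the `new Dense(24, "relu")` spelling by resolving the name to the same enum.

```java
import java.util.Locale;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: Activation and Dense are illustrative names, not the TF Java API.
final class ActivationSketch {

    /** Typed set of activations: discoverable in an IDE and checked at compile time. */
    enum Activation implements DoubleUnaryOperator {
        LINEAR {
            @Override public double applyAsDouble(double x) { return x; }
        },
        RELU {
            @Override public double applyAsDouble(double x) { return Math.max(0.0, x); }
        },
        SIGMOID {
            @Override public double applyAsDouble(double x) { return 1.0 / (1.0 + Math.exp(-x)); }
        }
    }

    /** Keras-style convenience: resolves a name such as "relu" to the typed constant. */
    static Activation resolve(String name) {
        // Enum.valueOf throws IllegalArgumentException for unknown names, so typos fail fast.
        return Activation.valueOf(name.toUpperCase(Locale.ROOT));
    }

    /** Hypothetical dense layer that only records its unit count and activation. */
    static final class Dense {
        final int units;
        final Activation activation;

        Dense(int units, Activation activation) {
            this.units = units;
            this.activation = activation;
        }

        // String overload mirrors the Keras spelling new Dense(24, "relu").
        Dense(int units, String activation) {
            this(units, resolve(activation));
        }
    }
}
```

With this design, a misspelled name such as `"rleu"` fails immediately with an `IllegalArgumentException` when the layer is constructed. Callers who prefer full typing can pass `Activation.RELU` directly.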

@deansher
Contributor Author

deansher commented Sep 9, 2020

Here's an argument for just a single framework layer.

Providing two API layers would add code, tests, and other complexity to this repo. We'd have two ongoing API design efforts.

If a Keras layer completely covered lower layers, then we'd have the added cost and complexity of percolating every new bit of lower-level functionality up through the Keras layer. We probably still wouldn't achieve totally airtight coverage, so we'd have our very own leaky abstraction from the outset. If a Keras layer didn't attempt to completely cover lower layers, then we would be asking every user to understand our layering idea from the outset.

Perhaps a better starting point would be to attempt a single framework API that addresses the full spectrum of user needs? We would feel the tension between beginner/implicit/best-practice-defaults and expert/explicit, but that's a normal tension in API design and is addressed (to varying degrees of success) by normal API design techniques.

If we find ourselves forced into some sort of layering, we will understand the motivations much more concretely at that point, so we can make better decisions.

@karllessard
Collaborator

I share the same concerns as @deansher about the complexity of maintaining two distinct APIs. Another thing worth mentioning is that every time someone tries to add new logic to the Keras layer, chances are that we will ask them to move that logic to tensorflow-framework first, expose it with an API very similar to what Keras does, and then wrap it with the original Keras interface in tensorflow-keras (exactly what I did with @JimClarke5 in his optimizer PR). I think we all agree that we want to avoid duplicating logic, and only what is added to the framework can be shared across libraries.
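A minimal sketch of the layering described here follows. It assumes hypothetical `GradientDescent` (framework) and `SGD` (Keras facade) classes that operate on plain `double[]` arrays instead of TF tensors. The update rule lives only in the framework class. The facade contributes Keras naming and the Python Keras default SGD learning rate of 0.01, purely by delegation.

```java
// Hypothetical sketch: class names are illustrative, not the actual TF Java API.
final class LayeringSketch {

    /** Framework layer: the only place the update logic lives. */
    static final class GradientDescent {
        private final double learningRate;

        GradientDescent(double learningRate) {
            if (learningRate <= 0) throw new IllegalArgumentException("learningRate must be > 0");
            this.learningRate = learningRate;
        }

        /** Applies w <- w - learningRate * g to each weight in place. */
        void apply(double[] weights, double[] gradients) {
            if (weights.length != gradients.length) {
                throw new IllegalArgumentException("weights and gradients differ in length");
            }
            for (int i = 0; i < weights.length; i++) {
                weights[i] -= learningRate * gradients[i];
            }
        }
    }

    /** Keras-flavored facade: Keras name and default, no duplicated math. */
    static final class SGD {
        static final double DEFAULT_LEARNING_RATE = 0.01; // Python Keras default for SGD
        private final GradientDescent delegate;

        SGD() { this(DEFAULT_LEARNING_RATE); }

        SGD(double learningRate) { this.delegate = new GradientDescent(learningRate); }

        void apply(double[] weights, double[] gradients) { delegate.apply(weights, gradients); }
    }
}
```

A fix to the update rule in `GradientDescent` automatically reaches `SGD` users, which is the "no duplicated logic" property described above.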

If we opt for the single API, though, we probably don't want to call it "Keras", or we will feel forced to mirror the Python Keras library as closely as possible, even where it is not as flexible as we want it to be. On the other hand, just taking what we like from the Keras API and adding it to our framework somewhat dilutes the visibility that comes with having a pure-Keras implementation in Java, which could attract more ML developers to the JVM. Still, that is probably also what would allow us to build the best Java API for TensorFlow users.

@karllessard
Collaborator

Before we can merge new PRs from @JimClarke5 , I think we really need to reach an agreement on this point.

In addition to my previous comment, I think we can address the question from another angle by asking ourselves who we are building the Keras API for Java for:
a) For Python users already familiar with it?
b) For any users, presuming that if Keras was that successful on Python, it should then be the right API for Java?

If the answer is a), then we probably want to stay very close to what the Python Keras API offers, and the facade pattern as proposed initially, sitting on top of the framework, is probably the right choice. If the answer is b), then I feel that we are freer to move away from the original API, bringing over only the important pieces and enhancing them with what we think is missing for a more complete solution that can satisfy both beginners and more advanced users. In this case, having a single framework should be enough.

@JimClarke5
Contributor

The current PRs, "Optimizer Learning Rate Change" and "Initialization" are focused only on framework, and comprise elements that can be independent of any Keras implementation and can be used on their own. I would think, concepts like initialization, loss(cost) functions, activation functions, regularization etc. would transcend all ML implementations. To that end, I support adding these stand-alone elements to framework, independently of the decision on Keras. We may revisit some of the method signatures with a view to broader use, but I think the basic functionality will still be needed in many higher level implementations.

@Craigacp
Collaborator

I think there is a third group, which is Java developers that have to port into Java from Python whatever their data science team came up with. There they might appreciate something similar to the Python API as it would be easier to see how to transform the Python source into Java source. In my experience this process of porting something from Python into Java for deployment tends to be pretty common, though that might be because I mainly talk to people who work at massive companies which can afford to have this disconnect between data science and deployment.

I think that we're still quite far away from having any kind of higher level API, and much of the work towards it is building out lower level blocks, so we should proceed on a little ways before trying to make a final decision. Much of the current code is getting out of the C API and wrapping it so that the constructs are usable in Java, as many of the C API pieces seem to be incomplete. We'd need to build these components anyway, and so surfacing a public API on top of them at some level is still further down the line.

@deansher
Contributor Author

Great point from @Craigacp : "Java developers that have to port into Java from Python whatever their data science team came up with. There they might appreciate something similar to the Python API as it would be easier to see how to transform the Python source into Java source." Another example along these lines that will be very common is developers that are studying an existing model in a paper or in open source and reimplementing it in Java.

While also agreeing with @Craigacp that "we should proceed on a little ways before trying to make a final decision", perhaps it would be worth documenting provisional goals in our README? Here's a shot at abstracting the above discussion into provisional goals for our framework API:

  • If either you know how to implement a model in the Python Keras API, or you are reimplementing an existing Python Keras model in Java, you should be able to cleanly and naturally follow the same high-level structure in the framework API.

  • Also, given some familiarity with patterns followed throughout the framework API, you should be able to easily translate every detail of a Python Keras implementation into the framework API.

  • However, the framework API is not intended to literally mimic the Python Keras API. Rather, it should expose the same capabilities in an API that feels natural and idiomatic to a Java programmer who does not know Keras. If we ever find ourselves unable to reconcile this goal with easy translation from Python Keras, we may split out a Keras layer.

  • Also, the framework API should support fine control over all aspects of modeling, training, and inference. Unlike with Python Keras, we want this to feel like staying in the same API rather than diving into a separate layer. But here again, if we are ever unable to reconcile this goal with easy translation from Python Keras, we may split the framework API into two layers.

Thoughts?

@Craigacp
Collaborator

I guess the fundamental difference between Keras and not Keras is the model.compile and model.fit functions. These restrict what can be done with a Keras model in fairly fundamental ways (e.g. they make it hard to do multi-task learning across multiple datasets and losses), but they make it substantially easier to use by having a model object and sensible entry points that show users how to build supervised learning models. If we made the Keras Java implementation more idiomatically Java, then the Model object would own the layers and they wouldn't be mutable outside it, which is in conflict with the Python Keras as it doesn't care (which makes it less safe).
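One way to express "the Model object would own the layers" in idiomatic Java is a defensive, unmodifiable copy. The following minimal sketch assumes hypothetical `Model` and `Layer` types. Callers can read the layer list but cannot add or remove layers behind the model's back. This only freezes the list. Making each layer's own state immutable would take more work.

```java
import java.util.List;

// Hypothetical sketch of a Model that owns its layers; names are illustrative, not TF Java API.
final class OwnershipSketch {

    /** Minimal stand-in for a layer; only its name matters here. */
    static final class Layer {
        final String name;

        Layer(String name) { this.name = name; }
    }

    static final class Model {
        private final List<Layer> layers;

        Model(List<Layer> layers) {
            // Defensive, unmodifiable copy: later changes to the caller's list do not leak in.
            this.layers = List.copyOf(layers);
        }

        /** Read-only view; add/remove attempts throw UnsupportedOperationException. */
        List<Layer> layers() { return layers; }
    }
}
```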

Stepping outside of supervised learning (e.g. to RL, or to multi-task supervised learning) means you have to leave behind bits of the Keras interface (e.g. the Keras RL examples here - https://keras.io/examples/rl/deep_q_network_breakout/) and don't really use compile or fit. My main concern is that we don't force people into something as restrictive as Keras without the appropriate escape hatches, and I think that having those hatches essentially dictates that we have two high-ish-level interfaces: one Keras and one non-Keras. But the non-Keras interface is pretty much what we have in frameworks at the moment, which is just a prettied-up version of the C API with the missing bits patched over. Plus we'd need to get the gradient tape, but that's a discussion for another time.

@karllessard
Collaborator

I guess the fundamental difference between Keras and not Keras is the model.compile and model.fit functions.

train_on_batch is the Keras endpoint giving more flexibility to the developers in their training loop, I don't know if that can also apply to the specific use cases you had in mind @Craigacp ?

If we made the Keras Java implementation more idiomatically Java, then the Model object would own the layers and they wouldn't be mutable outside it, which is in conflict with the Python Keras as it doesn't care (which makes it less safe).

Maybe this can be handled by renaming Model to ModelTemplate, which is then concretized as a Model on model.compile, following pretty much the basic builder pattern in Java.
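Here is a minimal sketch of the `ModelTemplate` idea. It assumes hypothetical `ModelTemplate` and `Model` classes and uses strings as stand-ins for layers, optimizers, and losses. The template is a mutable builder. `compile` snapshots it into an immutable `Model`, so later changes to the template cannot affect an already-compiled model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the ModelTemplate/Model split; names are illustrative only.
final class TemplateSketch {

    /** Mutable builder: layers can be added freely until compile() is called. */
    static final class ModelTemplate {
        private final List<String> layers = new ArrayList<>();

        ModelTemplate add(String layer) {
            layers.add(layer);
            return this; // fluent, builder-style chaining
        }

        /** Freezes the current layers into an immutable Model bound to an optimizer and loss. */
        Model compile(String optimizer, String loss) {
            if (layers.isEmpty()) throw new IllegalStateException("no layers added");
            return new Model(List.copyOf(layers), optimizer, loss);
        }
    }

    /** Immutable result of ModelTemplate.compile(). */
    static final class Model {
        final List<String> layers;
        final String optimizer;
        final String loss;

        private Model(List<String> layers, String optimizer, String loss) {
            this.layers = layers;
            this.optimizer = optimizer;
            this.loss = loss;
        }
    }
}
```

Because `Model`'s constructor is private, the only way to obtain one in this sketch is through `compile`, which mirrors the Keras `model.compile` entry point.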

@deansher, I agree with your list of goals. It seems that the general consensus is that we should first build up a complete framework that is both user-friendly and flexible enough to support more complex or advanced tasks, and then reevaluate the need for a second API that mirrors Python Keras as closely as possible.

@Craigacp
Collaborator

I guess the fundamental difference between Keras and not Keras is the model.compile and model.fit functions.

train_on_batch is the Keras endpoint giving more flexibility to the developers in their training loop, I don't know if that can also apply to the specific use cases you had in mind @Craigacp ?

train_on_batch works fine if the loss function is the same for each batch, but that's not true for some of the NLP use cases I'm working on (e.g. we train a model on a masked language model loss for some datasets, and similarity losses for others). Though I guess there could be multiple models which share layers, but I don't know how that would work if we did impose ownership of layers.
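The multi-task case can be sketched as a hand-written loop in which every dataset carries its own loss while all tasks update the same shared parameter. This is a toy that assumes a single scalar weight `w`, a linear prediction `w * x`, and two hypothetical losses (squared and absolute error), all in plain Java arithmetic. It only illustrates why one fixed compiled loss per `train_on_batch` call does not directly cover this pattern.

```java
import java.util.List;

// Hypothetical sketch: a custom loop where each dataset has its own loss but all
// tasks update one shared weight. Plain Java arithmetic, not TF Java API.
final class MultiTaskSketch {

    /** Gradient of a per-example loss with respect to the shared weight w. */
    interface LossGradient {
        double gradient(double w, double x, double y);
    }

    // d/dw (w*x - y)^2 = 2 * x * (w*x - y)
    static final LossGradient SQUARED = (w, x, y) -> 2.0 * x * (w * x - y);
    // A subgradient of |w*x - y| with respect to w
    static final LossGradient ABSOLUTE = (w, x, y) -> x * Math.signum(w * x - y);

    /** One dataset paired with the loss used to train on it. */
    static final class Task {
        final double[] xs;
        final double[] ys;
        final LossGradient loss;

        Task(double[] xs, double[] ys, LossGradient loss) {
            if (xs.length != ys.length || xs.length == 0) throw new IllegalArgumentException("bad data");
            this.xs = xs;
            this.ys = ys;
            this.loss = loss;
        }
    }

    /** Each epoch visits every task once, stepping w with that task's own averaged gradient. */
    static double train(List<Task> tasks, double w, double learningRate, int epochs) {
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (Task task : tasks) {
                double grad = 0.0;
                for (int i = 0; i < task.xs.length; i++) {
                    grad += task.loss.gradient(w, task.xs[i], task.ys[i]);
                }
                w -= learningRate * grad / task.xs.length;
            }
        }
        return w;
    }
}
```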

@zaleslaw
Contributor

Adding my 5 cents here: I suggest keeping the low-level API as graph + optimizers + loading/saving variables, separate from the Keras package. Maybe initializers, losses, and metrics should be added too. But activations, the training cycle, and layers could be developed in the Keras package. Of course, we should not have two implementations of, for example, the HeNormal initializer; only one.
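As an illustration of a single shared initializer, here is a minimal sketch of He-normal initialization. It assumes a hypothetical `HeNormalSketch` helper that fills a plain `double[][]` rather than a TF tensor. It samples from a normal distribution with mean 0 and standard deviation sqrt(2 / fanIn). Python Keras's `HeNormal` samples from a truncated normal instead, so this is a simplification.

```java
import java.util.Random;

// Hypothetical sketch of one shared He-normal initializer; not the TF Java API.
// Uses an untruncated normal, unlike Python Keras's HeNormal (truncated normal).
final class HeNormalSketch {

    private HeNormalSketch() {}

    /** Standard deviation sqrt(2 / fanIn) used by He-normal initialization. */
    static double stddev(int fanIn) {
        if (fanIn <= 0) throw new IllegalArgumentException("fanIn must be positive");
        return Math.sqrt(2.0 / fanIn);
    }

    /** Fills a fanIn x fanOut weight matrix with samples from N(0, 2 / fanIn). */
    static double[][] initialize(int fanIn, int fanOut, long seed) {
        Random random = new Random(seed); // seeded so results are reproducible
        double sd = stddev(fanIn);
        double[][] weights = new double[fanIn][fanOut];
        for (int i = 0; i < fanIn; i++) {
            for (int j = 0; j < fanOut; j++) {
                weights[i][j] = sd * random.nextGaussian();
            }
        }
        return weights;
    }
}
```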

@KartikChugh

Agree with losses and metrics; but initializers should be with activations, no?

@deansher
Contributor Author

deansher commented Oct 5, 2020

One interesting question: If we discover that something is implemented wrongly or unfortunately in Python, how will we decide whether to fix it in Java or be carefully bug-for-bug compatible? I'm thinking about the goal I proposed above, based on our discussion, "if either you know how to implement a model in the Python Keras API, or you are reimplementing an existing Python Keras model in Java, you should be able to cleanly and naturally follow the same high-level structure in the framework API."

This is probably an easy decision if it's just plain broken in Python. But perhaps a very difficult decision if it falls into a gray area, where the Python implementation seems plausible but quite unfortunate.

@SidneyLann

I have 20 years of Java experience and will never use Python. I just want a Java DL framework with the strongest capabilities that doesn't require referring to Python. In China, Java is the number one development language, and many developers will not use Python because most business systems are developed in Java.

@JimClarke5
Contributor

If we find errors or better ways to implement algorithms in Java, I am all for changing TF Java. We have just found a Keras limitation on 1D Softmax inputs, and we removed that restriction in the Java implementation. We are also leveraging Java's strong typing, which leads to, IMO, cleaner implementations.
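For reference, softmax is well defined on a single 1-D vector. The following minimal sketch assumes a plain `double[]` input and a hypothetical `SoftmaxSketch` class, and it is not the TF Java implementation. It uses the usual max-subtraction trick for numerical stability.

```java
// Hypothetical sketch: numerically stable softmax over a 1-D vector; not TF Java code.
final class SoftmaxSketch {

    private SoftmaxSketch() {}

    static double[] softmax(double[] logits) {
        if (logits.length == 0) throw new IllegalArgumentException("logits must be non-empty");
        // Subtracting the max leaves the result unchanged but prevents exp() overflow.
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logits) max = Math.max(max, v);
        double[] out = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            out[i] = Math.exp(logits[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }
}
```

For example, `softmax(new double[] {1000.0, 1000.0})` yields `{0.5, 0.5}` even though a naive `Math.exp(1000.0)` would overflow to infinity.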

@deansher
Contributor Author

deansher commented Oct 6, 2020

Although we don't have complete consensus, perhaps we have sufficient consensus to

  • drop the empty keras package for now,
  • change our README accordingly,
  • document provisional goals for our framework API in our README,
  • and revisit this issue when someone eventually raises a new issue that proposes a specific, immediate split into a Keras layer?

@karllessard
Collaborator

Just to add that I’m fine with this last proposal from @deansher

deansher added a commit to deansher/java that referenced this issue Oct 14, 2020
karllessard pushed a commit that referenced this issue Oct 16, 2020
* Remove keras package. In issue #109, we decided to work toward a single framework API for now.

* Created tensorflow-framework/README.md and linked it from the main readme.
@deansher
Contributor Author

We have agreed on a path forward and documented it in READMEs.


7 participants