tensorflow · alextp · Mar 16, 2020 · Aug 17, 2018 · Aug 17, 2018 · Aug 22, 2018
diff --git a/governance/api-reviews.md b/governance/api-reviews.md
@@ -0,0 +1,202 @@
+#  tensorflow/api-owners review practices
+
+## Overview
+
+This is an attempt to gather commonly discussed topics when doing API
+reviews. It’ll hopefully be a useful resource to both API owners and people
+proposing API changes.  [TF API Owners](https://github.com/orgs/tensorflow/teams/api-owners) 
+meet twice weekly to discuss changes. We try to get to PRs on the next meeting, 
+but we don’t always make it all the way through. If your change is particularly 
+urgent, please ping the PR to notify us of any urgency.
+
+## Process
+
+We only look at changes which have already been approved by other reviewers. If
+there are major outstanding comments, we will wait with API review until those
+are resolved. If there are questions for API owners, explicitly raise this in
+the comments to get an answer.
+
+
+## High level points
+
+### Backward and forward compatibility
+We avoid backwards-incompatible API changes. We also avoid
+backwards-incompatible behavior changes, such as restricting the set of valid
+inputs to a function or extending the set of valid outputs of a function. Adding
+support for previously not supported behavior is okay, as are changes to
+explicitly experimental APIs (within reason). When needing to provide a new or
+different behavior, we strongly prefer a new version of the API over breaking
+backwards compatibility. Note that we are free to deprecate APIs; we just cannot
+break code which relies on their documented behavior. We need to worry about
+backward compatibility both of our python APIs and of the serialized GraphDefs,
+and in general breaking serialized GraphDefs is worse than breaking the python
+APIs.
+
+Forward compatibility is more subtle: we should avoid changing the graph
+produced by currently correct python code without a three weeks notice. This
+comes up most frequently when adding new ops, but also applies to non-obvious
+things such as the graph emitted by gradients or pfor.
+
+Including the name “experimental” in an API endpoint allows you to break
+backwards compatibility in the future, as noted above. However, we still prefer
+to mark the experimental API as deprecated for one release before removing it in
+the subsequent release. Please do not use the experimental namespace as an
+escape hatch for bad code or functionality you
+don’t-really-want-but-need-for-now; experimental APIs should be APIs that we
+intend to make standard in the near future, but have some lingering questions to
+resolve first.
+
+
+### Docstrings 
+
+TF APIs should have comprehensive documentation in the form of docstrings. If at
+all possible these docstrings should have runnable examples, and these examples
+should form a doctest so they stay correct. The examples should demonstrate an
+end-to-end user workflow, such that it’s clear how to generate the necessary
+inputs for the API and what to do with the outputs. The docstring should be
+understandable by someone who is not familiar with TF. See the [guide to writing
+TF docstrings](https://www.tensorflow.org/community/contribute/docs_ref) for
+more information.
+
+Our documentation generator for classes only sees methods, so prefer defining
+members as properties instead of assigning them in `__init__`.
+
+Docstrings should only refer to other public TF API symbols (i.e. do not refer
+to other symbols defined in the same file as a function which is just now being
+made public) and should refer to public API symbols by their full exported name.
+
+### Common names
+
+Prefer keepdims over keep_dims. Prefer axis over dim. Data types are called
+dtype. name is a common last argument of ops but backward compatibility mandates
+that new arguments are added after the last existing argument, even if that
+results in name not being the last argument.
+
+We generally prefer spelling things out over using abbreviations except when
+abbreviations are more standard than spelling things out (i.e. don’t spell out
+linalg or svd). When in doubt we ask domain experts or use web search to see
+what spelling is most common.
+
+If possible we prefer to name things in a similar way to numpy (e.g., we would
+not pick einsum as a name, but numpy and others before it have, and that
+precedent is very strong).
+
+We prefer experimental namespaces (i.e. tf.data.experimental.foobar) over
+experimental-prefixed names (i.e. tf.data.experimental_foobar) except when
+adding an experimental class method, or an experimental argument. Experimental
+endpoints should be deprecated in a minor release before they can be removed in
+the next. We would like new experimental symbols to be things which will
+eventually end up in core TF as opposed to things we expect will be phased out
+with no clear replacement. The best expectation to have for an experimental
+endpoint is that the “experimental” will simply be removed. If you don’t believe
+that’ll work, it should probably not be added in its current form.  
+
+### Style
+
+Generally, follow Google style.
+
+Avoid redundancy. Do not write arguments of the form `function(...,
+enable_feature=False, feature_config=None)` if you can also write `function(...,
+feature_config=None)`, where implicitly, `enable_feature = feature_config is not
+None`.
+
+Try to embed well with the ambient language. Think about how your API interacts
+with language idioms (e.g., in Python: can it be hashed, i.e., used as a dict
+key? Is it iterable? Is it a mapping? Can it be equality compared?
+Ordered?). Think about how your API interacts with other pieces of the Python
+ecosystem as well— is there an analogue in Numpy or PyTorch that we should
+consider aligning with?
+
+Use language-native constructs wherever you can. In Python, a tuple should be a
+tuple. The bar for custom configuration objects is relatively high, a dict or
+namedtuple goes a long way.
+
+In particular, do not expose protobufs directly as part of an API. You can use
+protobufs for serialization or to encode network traffic. Protobufs should
+always be an implementation detail, and never visible on the API surface. Use
+language native constructs (dicts or classes for Python, structs for C/C++) if
+you need configuration objects.
+
+Avoid global (or any non-local) state as much as possible (this includes Python
+'with' scopes). If you need global context, consider whether it can be
+thread-local. The TF API is supposed to be thread-safe. Avoid stateful operation
+(mutability) if you can. Both features make it hard to reason about code, and
+make composability harder to achieve.
+
+We prefer strings ("auto", "never", etc) over enums (tf.namespace.AUTO,
+etc). Strings are easier to type, and forces us to document all possible values
+and their semantics in the docstrings of all places which accept the string, as
+opposed to only in the enum definition, which is a little friendlier.
+
+### Orthogonality and integration with the existing APIs 
+
+Is the new API implementable in terms of existing APIs? If so, we might want to
+consider pointing users to using the existing APIs. Does the new API add enough
+value over a combination of existing APIs? Does the API solve only a specific
+problem (that’s usually a sign combinations of existing APIs would be
+preferred)?
+
+If not, are existing APIs implementable in terms of the new API? If this is
+simple, we might want to steer users towards the new and away from the old API
+(possibly, old APIs should be deprecated along with introducing the new API).
+
+If neither is the case, it might be possible that there is a more general API
+which makes both the existing API and the new API easy to express. We try to
+keep global consistency of our API in mind when reviewing new changes.
+
+How will this API work together with others? Does it do something slightly
+differently than others? Does it expect inputs which match what other parts of
+TensorFlow produce? Does its output match what other parts of TensorFlow can
+consume?
+
+Does it do things the same way other similar pieces in TensorFlow do it? E.g.,
+if a common pattern to achieve a behavior is an extra argument, don't use a
+function decorator to achieve the same in a different area of the API.
+
+Two wrongs don’t make a right. That is, if a bad API already exists in TF, that
+does not give license to new APIs to be bad in the same way. Improvement must be
+balanced with consistency, however, and sometimes it’s okay to carry small
+imperfections into new APIs for the sake of consistency with old APIs.
+
+### Does it belong in TF at all?
+
+As TF evolves there’s a tendency to put everything inside of it, with costs
+compounding over the long term. If there is a reasonable home for a new API
+outside core TF (say in addons, io, TFP, or other projects entirely) that can be
+strongly preferrable. If new code can be released as independent libraries, it
+should be. This is especially true for APIs that are actively evolving; core TF
+imposes many restrictions, so it’s far better to trial new APIs outside of the
+core library.
+
+## Adding new ops 
+
+Adding new ops to TF should be done with care. We generally prefer not adding
+new ops if possible, but performance, hardware compatibility, and other concerns
+often do require new ops.
+
+When adding new ops, look for:
+
+ - closure under automatic differentiation (i.e. we avoid ops which are
+   differentiable but not twice-differentiable, or which are technically
+   differentiable but not marked as such)
+ - performant kernels (it’s better not to have an op than to have an op with a
+   suboptimal kernel; we need to make sure kernel experts have reviewed the
+   code)
+ - broadcasting (all numerical ops should broadcast using numpy rules)
+ - does support for this op have to be added to pfor/vectorized_map?
+ - dtype support (in general all numerical ops should support the common
+   integer, floating point, and complex dtypes, if they all make sense; we need
+   to watch out for int32 on GPUs though)
+ - device support (cuda kernels should be implemented if possible, and similarly
+   a tf/xla bridge entry should be added if it makes sense)
+ - attributes versus inputs (anything which can be an input to an operation
+   should be an input, and attributes should only be used to parametrize the
+   operation in ways that affect the output dtypes or sometimes shapes)
+ - state management (is the op stateful? Can it instead be made stateless and
+   rely on optimizations like memory reuse for performance? Can it be made to
+   keep its state using one of the existing mechanisms like variables? If not,
+   its state should be encapsulated using resource handles if at all possible)
+ - we generally don’t like ops which are supported only on a single device (be
+   it CPU, GPU, XLA, TPU, etc) and prefer to have at least a plan for writing
+   device-agnostic code
+ - should the python layer for this operation support raggedtensor/sparsetensor?