Skip to content

Commit c1d4c1b

Browse files
author
ematejska
authored
Merge pull request #182 from annarev/pip_structure_rfc
RFC: Improved pip package structure
2 parents 123ea17 + 6c8eb85 commit c1d4c1b

File tree

5 files changed

+204
-0
lines changed

5 files changed

+204
-0
lines changed

rfcs/20191127-pip-structure.md

+204
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,204 @@
1+
# Improved pip package structure
2+
3+
| Status | (Proposed / Accepted / Implemented / Obsolete) |
4+
:-------------- |:---------------------------------------------------- |
5+
| **RFC #** | [182](https://github.com/tensorflow/community/pull/182)|
6+
| **Author(s)** | Anna Revinskaya ([email protected]) |
7+
| **Sponsor** | Alex Passos ([email protected]) |
8+
| **Updated** | 2020-02-04 |
9+
10+
## Objective
11+
12+
We propose to simplify TensorFlow pip package structure to enable IDE features such as autocomplete, jump-to-definition and quick-documentation.
13+
14+
## Motivation
15+
16+
### Current package structure
17+
TensorFlow package structure has grown quite complex over time as we started to support multiple versions (1.x and 2.x) and import external sub-packages (such as tensorflow\_estimator and tensorboard). This complexity is expected to grow if we split out more components into separate pip packages.
18+
19+
Sources of complexity:
20+
21+
* Versioning: tensorflow\_core API lives under *_api/v1* or *_api/v2* directory depending on the version.
22+
* Virtual pip package: Installing TensorFlow actually installs 2 directories: *tensorflow/* and *tensorflow\_core/* under *site-packages/*. TensorFlow code lives under *tensorflow\_core/*. TensorFlow uses lazy loading to import everything from *tensorflow\_core/* to *tensorflow/*. Two-directory structure helps work-around circular imports caused by tensorflow\_estimator.
23+
24+
Outline of the current structure:
25+
```
26+
tensorflow
27+
__init__.py (contains "from tensorflow_core import *")
28+
29+
tensorflow_core
30+
python/...
31+
lite/...
32+
_api/v2
33+
__init__.py
34+
audio/__init__.py
35+
autograph/__init__.py
36+
...
37+
```
38+
39+
### Rationale behind current package structure
40+
#### Multiple version support
41+
To prepare for TensorFlow 2.0 launch, we added a way to build two versions: 1.x and 2.x. Each version has its own respective genrule that outputs file for 1.x or 2.x since API modules are different (for e.g. *tensorflow/manip/\_\_init\_\_.py* only exists in 1.x and not 2.x API). Now, bazel does not allow two genrules to output files to the same directory. Therefore, we have *_api/v1/* and *_api/v2/* subdirectories.
42+
43+
Note that we could still place the API directly under *tensorflow/* in the pip package since a pip package contains a single version of TensorFlow. This option became out of reach when *tensorflow/contrib/lite/* was migrated to *tensorflow/lite/*. Now *tensorflow/lite/* API directory would conflict with *tensorflow/lite/* source directory if the API was under *tensorflow/* instead of *_api/vN/*.
44+
45+
#### Circular dependencies
46+
Estimator depends on TensorFlow. At the same time, TensorFlow includes estimator as a part of its API. This creates a cycle.
47+
48+
![alt_text](https://github.com/annarev/community/blob/pip_structure_rfc/rfcs/20191127-pip-structure/circular_dependency.png "Circular dependency
49+
between TensorFlow and Estimator.")
50+
51+
#### Metapackage vs base package plans
52+
[Modular TensorFlow
53+
RFC](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md) proposes to keep two pip packages:
54+
tensorflow-base would only contain core TensorFlow (for e.g. no estimator).
55+
TensorFlow Metapackage would be a thin package defining composition of TensorFlow which includes base, estimator, keras and tensorboard.
56+
Note that this 2-package approach is not implemented yet. However, its proposal demonstrates how keeping a virtual pip package could be beneficial in the future.
57+
58+
![alt_text](https://github.com/annarev/community/blob/pip_structure_rfc/rfcs/20191127-pip-structure/modular_structure.png "Proposed modular TensorFlow structure.")
59+
60+
Current structure looks more like this (except *tensorflow/* and *tensorflow\_core/* are directories as opposed to separate pip packages) and meant to be the first step towards structure above:
61+
62+
![alt_text](https://github.com/annarev/community/blob/pip_structure_rfc/rfcs/20191127-pip-structure/current_structure.png "Current TensorFlow structure.")
63+
64+
### Current state of IDE code features
65+
66+
#### PyCharm 2019.1.1
67+
68+
* Autocomplete:
69+
* Works in most cases after switching to use relative imports.
70+
* Doesn’t work for tf.compat.v1.keras and tf.compat.v2.keras.
71+
* Doesn’t work for keras if importing it using from import (i.e. `from tensorflow import keras`).
72+
* Jump-to-definition doesn’t work.
73+
* Quick documentation doesn’t work.
74+
75+
#### PyCharms with 2019.3 EAP build 193.3793.14
76+
Latest version of PyCharms added [custom handling for tensorflow](https://github.com/JetBrains/intellij-community/blob/0a08f8212351ee84d602cdc5547f038ce0df79fd/python/src/com/jetbrains/tensorFlow/PyTensorFlow.kt)
77+
* Autocomplete works in most cases.
78+
* Doesn’t work for keras if importing it using from import (i.e. `from tensorflow import keras`).
79+
* Jump-to-definition works.
80+
* Quick documentation works.
81+
82+
#### VS Code 1.40 (October 2019 release)
83+
* Autocomplete:
84+
* Works in most cases.
85+
* Doesn’t work for `tf.estimator` or `tf.keras`.
86+
* Doesn’t work for `tf.compat.v1.keras` and `tf.compat.v2.keras`.
87+
* Doesn’t work for keras if importing it using from import (i.e. `from tensorflow import keras`).
88+
* Jump-to-definition doesn’t work.
89+
* Quick documentation doesn’t work.
90+
91+
92+
## User Benefit
93+
94+
TensorFlow package structure creates difficulties for those who use IDEs.
95+
Autocomplete, quick documentation and jump-to-definition features often rely on
96+
module structure matching directory structure. For example, TensorFlow code uses
97+
`from tensorflow.foo` imports but lives under tensorflow\_core package. Simplifying
98+
package structure would improve productivity for TensorFlow users.
99+
100+
## Design Proposal
101+
102+
The best way I can think of to fix the autocomplete issues is to make our package structure as clean as possible. In this case, autocomplete will work out of the box.
103+
104+
### Short term: Remove virtual pip package
105+
106+
Primary purpose of keeping the virtual pip package is to workaround circular
107+
estimator imports. Alternatively, we can resolve this issue by lazy loading
108+
estimator.
109+
110+
Estimator import in root *\_\_init\_\_.py* file:
111+
```python
112+
from tensorflow.python.util.lazy_loader import LazyLoader as _LazyLoader
113+
estimator = _LazyLoader(
114+
"estimator", globals(),
115+
"tensorflow_estimator.python.estimator.api._v2.estimator")
116+
setattr(_current_module, "estimator", estimator)
117+
```
118+
119+
Lazy loading by itself would mean that we no longer have autocomplete for estimator. As a workaround, we can import estimator without lazy loading if `typing.TYPE_CHECKING` is `True`.
120+
121+
After building a pip package with this change all of the following work in PyCharms (both released and EAP) and VS Code:
122+
123+
* jump-to-definition
124+
* quick documentation
125+
* autocomplete for `compat.v1.keras`, `compat.v2.keras`
126+
* autocomplete for keras when using from tensorflow import keras
127+
* ...basically any import I tested works with autocompletion
128+
129+
To support the TensorFlow Metapackage plans we could add a new pip package that specifies dependencies on tensorflow, tensorflow\_estimator, tensorboard, etc.. Its sole purpose would be to get all dependencies installed.
130+
131+
![alt_text](https://github.com/annarev/community/blob/pip_structure_rfc/rfcs/20191127-pip-structure/new_modular_structure.png "New proposed modular TensorFlow structure.")
132+
133+
### Long term (optional): Import from external package directly
134+
Short term would fix IDE issues, but the package structure is still not as clean as it could be. We resolve cycles with lazy loading but it would be even better not to have this circular structure at all.
135+
136+
Therefore, I propose that we don’t import external packages into TensorFlow 3.0. Users who want to use estimator, tensorboard summaries or keras could import them separately:
137+
138+
Current code that looks like:
139+
```python
140+
import tensorflow as tf
141+
142+
tf.estimator
143+
tf.keras
144+
tf.summary
145+
```
146+
147+
Would be changed to:
148+
```python
149+
import tensorflow as tf
150+
import tensorflow_estimator as estimator
151+
import keras
152+
from tenosorboard import summaries
153+
```
154+
155+
Rationale for this change:
156+
157+
* One way dependencies (estimator depends on tensorflow and not vise-versa).
158+
* Minimal overhead for users. Adding an extra import is easy.
159+
160+
Note that this change cannot be done in TensorFlow 2.x due to API guarantees. Also, accessing these packages from `tf.` would match familiar workflows. Therefore, we can keep `tf.estimator`, `tf.keras` (once it is moved out of TensorFlow), `tf.summary` available as an alternative to importing pip package directly. This would require some work to make sure these packages contain the right API (for e.g. tensorflow\_estimator.estimator currently always contains V1 API).
161+
162+
163+
### Alternatives Considered
164+
Alternatively, we could solve IDE autocomplete issues by changing all imports in
165+
TensorFlow to import from `tensorflow_core` instead of `tensorflow`.
166+
167+
#### Advantages:
168+
169+
* Keep supporting external libraries included as a sub-namespace, for e.g.
170+
`tf.estimator`.
171+
172+
#### Disadvantages:
173+
174+
* This is a more invasive change since it requires updating every Python file in TensorFlow.
175+
It would also mean that external packages such as `tensorflow_estimator` need to
176+
use imports of the form `from tensorflow_core` instead of `from tensorflow`.
177+
178+
The main proposal in this document seems simpler to me (it removes complexity
179+
instead of adding it) and therefore preferred.
180+
181+
### Performance Implications
182+
I am not expecting major performance changes since this is just a package
183+
structure proposal.
184+
185+
### Dependencies
186+
This proposal does not add new dependencies. The rest of the proposal largely
187+
describes how we plan to handle dependencies.
188+
189+
### Engineering Impact
190+
We don't expect changes to binary size / startup time / build time / test time.
191+
192+
### Platforms and Environments
193+
This should work on all platforms and we will test it to make sure.
194+
195+
### Best Practices, Tutorials and Examples
196+
There are no user-visible changes other than fixes to enable IDE features.
197+
198+
### Compatibility
199+
Short term proposal does not have any compatibility concerns. Long term,
200+
however, proposes to remove `tf.estimator`, etc.. which is not a backwards
201+
compatible change. We can only make this change at the next major release.
202+
203+
### User Impact
204+
There are no user-visible changes other than fixes to enable IDE features.
Loading
Loading
Loading
Loading

0 commit comments

Comments
 (0)