uploader: request `ServerInfo` from frontend #2879

wchargin · 2019-11-02T00:42:11Z

Summary:
This commit integrates the new ServerInfo RPC with the uploader. It’s
not currently enabled by default: the current behavior is the same as
the existing behavior, except that experiment URLs now properly have a
trailing slash. We’ll soon remove the hard-coded API backend endpoint
behavior to enable this by default.

Test Plan:
Running a test frontend and a test backend, we observe the following
behavior with different arguments:

`--origin`	`--api_endpoint`	URL origin	Backend
empty	empty	prod	prod
empty	prod	prod	prod
empty	test	prod	test
test	empty	test	test
test	test	test	test
test	prod	test	prod

Here, “test” in the --origin column is like http://localhost:8080,
and “test” in the --api_endpoint column is like localhost:10000.
Note that the no-argument case is equivalent to the explicitly-empty
argument case because both arguments have empty default values.

Explicitly specifying --origin https://tensorboard.dev, with any value
of --api_endpoint, fails with “Corrupt response from backend” because
server-side support has not yet been rolled out. This is expected.

Specifying --origin http://localhost:0 or any other unreachable host
fails with ECONNREFUSED and a nice message.

My test frontend is configured to reject clients below version 2.0.0 and
warn on clients below version 2.0.1. Changing the local version.py to
2.0.0a0 or 2.0.1a0 exercises these cases.

Finally, double-checked that building the Pip package, installing it,
and running tensorboard dev list properly uses the production backend
and prints URLs that resolve to the production frontend.

wchargin-branch: uploader-serverinfo-request

Summary: This commit adds an RPC definition by which the uploader can connect to the frontend web server at the start of an upload session. This resolves a number of outstanding issues: - The frontend can tell the uploader which backend server to connect to, rather than requiring a hard-coded endpoint in the uploader. - The frontend can tell the uploader how to generate experiment URLs, rather than requiring the backend server to provide this information (which it can’t, really, in general). - The frontend can check whether the uploader client is recent enough and instruct the end user to update if it’s not. - The frontend can warn the user about transient issues in case the service is down, degraded, under maintenance, etc. An endpoint `https://tensorboard.dev/api/uploader` on the server will provide this information. Test Plan: Unit tests suffice. wchargin-branch: uploader-serverinfo-protos

Summary: This commit integrates the new `ServerInfo` RPC with the uploader. It’s not currently enabled by default: the current behavior is the same as the existing behavior, except that experiment URLs now properly have a trailing slash. To enable the behavior, change the `--api_endpoint` flag to an empty string. We’ll soon make this the default. Test Plan: Write a simple WSGI server with a placeholder response: ```python from wsgiref import simple_server import werkzeug data = ( b"\x0a\x02" # compatibility b"\x08\x01" # 1.verdict b"\x12\x19" # api_server b"\x0a\x17api.tensorboard.dev:443" # 1.endpoint b"\x1a\x36" # url_format b"\x0a\x2bhttps://www.example.com/experiment/{{EID}}/" # 1.template b"\x12\x07{{EID}}" # 1.placeholder ) @werkzeug.Request.application def app(request): return werkzeug.Response(data) simple_server.make_server("localhost", 1234, app).serve_forever() ``` Then run with `--origin http://localhost:1234 --api_endpoint ""` and note that the uploader uploads to the production backend but prints URLs under `https://www.example.com`. Try the following changes: - After `1.verdict`, add `b"\x12\x05ahoy!"` to add a details message, and then in `compatibility` change `\x02` to `\x09` (length tag). - In `1.verdict`, change `\x01` (OK) to `\x02` (warning) or `\x03` (error). - Remove the `api_server` and `1.endpoint` lines and verify that the uploader fails on the client side. - Change the whole response to just `\x0a\x01` and verify that the uploader fails with a “truncated message” parse error. wchargin-branch: uploader-serverinfo-request

wchargin-branch: uploader-serverinfo-protos wchargin-source: 16d41149a8ce9e2e816df874217a7da10d83810a

wchargin-branch: uploader-serverinfo-request wchargin-source: f59f688bf6b944d162bfd40b4daf760ed2220011

wchargin-branch: uploader-serverinfo-protos wchargin-source: f4f3624d7a37f1f2229970b0e80c28052f8b000a

wchargin-branch: uploader-serverinfo-request wchargin-source: 0f24c0533a9e1c2a304f357117a20a7a76bd5c2c

wchargin-branch: uploader-serverinfo-protos wchargin-source: d11627f29373cb13ececc774071c3cce365a7136

wchargin-branch: uploader-serverinfo-request wchargin-source: 9a504c52657f3297c45ffe15b2204a49023142f2

nfelt · 2019-11-07T21:50:57Z

tensorboard/uploader/uploader_main.py

      type=str,
      default='api.tensorboard.dev:443',
-      help='URL for the API server accepting write requests.')
+      help='Experimental. Direct URL for the API server accepting '
+      'write requests. If set, will skip initial server handshake.')


See comment on the other PR, but if passing --origin in addition to --api_endpoint it makes sense to me to still do the server handshake.

Will change.

nfelt · 2019-11-07T21:53:06Z

tensorboard/uploader/uploader_main.py

+def _handle_server_info(info):
+  compat = info.compatibility
+  if compat.verdict == server_info_pb2.VERDICT_WARN:
+    sys.stderr.write('Warning: %s\n' % compat.details)


Optional, but maybe "Server warning" just to make it a bit clearer in debugging where this is coming from?

Same, also optional, for "Error".

Hmm; I like that idea, but “Server error” makes it sound like the server
has an error. We expect these messages usually to be “your client is out
of date”, which is more like a client error than a server error.

What do you think of using “Warning [from server]:” instead?

Warning [from server]: Experiencing transient errors; be patient.

Error [from server]: Please upgrade to TensorBoard 3.0.0 or later.

nfelt · 2019-11-07T21:53:25Z

tensorboard/uploader/uploader_main.py

+  else:
+    # OK or unknown; assume OK.
+    if compat.details:
+      sys.stderr.write('%s\n' % compat.details)


sys.stderr.flush() too?

Though now that I actually go look at the docs, apparently both it and stdout are supposed to be line-buffered, so maybe the fact that I usually flush() after doing a raw write() is just cargo culting?

Fascinating. I had always assumed that stderr was unbuffered, like it is
in every other environment, because that’s one of the main points of
stderr. And so it was in Python 2, but indeed this appears to have
changed in Python 3. I’ll keep this in mind from now on; thanks.

(It’s not clear to me whether we should flush here: superfluous in the
interactive case, but not when outputting to a file. I guess the perf
isn’t really a problem since this isn’t in a loop, so I’ll go ahead and
add it.)

wchargin-branch: uploader-serverinfo-request wchargin-source: dc8743d2e4a8896a10b5606768d44dea23868cd2

wchargin · 2019-11-09T02:26:52Z

Re-review not yet required.

wchargin-branch: uploader-serverinfo-request wchargin-source: 4530572af22b41a4f61b573eb6998b30475b6e97

wchargin · 2019-11-14T18:46:50Z

Changes addressed, mostly straightforwardly, and tested with a deployed
dev server, which correctly connects and prints working links.

nfelt

It looks like submitting this as-is would mean that tb-nightly's uploader would see a regression in behavior against current tensorboard.dev (until a new deploy with the server-side change) because it won't compute the correct URL pattern and would instead erroneously use localhost.

I'd prefer to avoid that so we don't have to worry as much about sequencing the changes. Can we just do that by replacing the http://localhost:8080 passed to create_server_info() with https://tensorboard.dev?

Thinking about it more I'm kind of inclined to avoid http://localhost:8080 as a default value anyway, even after there's no backwards compatibility concern, since we really have no idea what might be running on http://localhost:8080. It seems better to not even attempt to emit a URL if we have no clue whether it will resolve, and instead just emit the raw EID in that case or something, since if a user is developing with no frontend the URL is meaningless anyway, and if they do have a frontend, it would be better to explicitly pass it into --origin.

nfelt · 2019-11-14T21:14:13Z

tensorboard/uploader/uploader_main.py

+        "http://localhost:8080", flags.api_endpoint)
+  origin = flags.origin or _DEFAULT_ORIGIN
+  server_info = server_info_lib.fetch_server_info(
+      flags.origin or _DEFAULT_ORIGIN)


this should just be origin here right?

Yes; thanks.

wchargin

Ugh, you’re right. It connects correctly, but emits URLs with the wrong
origin. This certainly isn’t intentional; it was meant to be both
forward and backward compatible.

So I think the right table is

	Origin given	No origin
API given	Handshake, then override endpoint	Construct fake info without handshake
No API	Handshake	Set origin to default, then proceed as ↖

where currently we’re in the top-right cell by default—as it’s the only
cell that doesn’t end with a handshake—and use https://tensorboard.dev
as the fake origin. After the server-side change rolls out, we move to
the bottom-right cell by default, and can optionally change the behavior
of the top-right to emit no URL instead of either tensorboard.dev or
localhost. But we need that behavior for now for compatibility.

Thanks for the catch; will fix.

wchargin · 2019-11-14T21:54:50Z

tensorboard/uploader/uploader_main.py

+        "http://localhost:8080", flags.api_endpoint)
+  origin = flags.origin or _DEFAULT_ORIGIN
+  server_info = server_info_lib.fetch_server_info(
+      flags.origin or _DEFAULT_ORIGIN)


Yes; thanks.

nfelt

Yep, that matches my understanding.

wchargin-branch: uploader-serverinfo-request wchargin-source: 3e74ef13b1117399321272c1bb487e000664b872

wchargin · 2019-11-14T22:33:09Z

(updated test plan)

nfelt · 2019-11-14T23:03:35Z

tensorboard/uploader/uploader_main.py

  origin = flags.origin or _DEFAULT_ORIGIN
-  server_info = server_info_lib.fetch_server_info(
-      flags.origin or _DEFAULT_ORIGIN)
+  if flags.api_endpoint and not flags.origin:


I think we actually should also remove the argparse-facing default value for --api_endpoint too, because otherwise passing --origin=https://staging.tensorboard.dev will not behave as desired, since the flags.api_endpoint override behavior will still kick in and default to the prod API. We can instead just set the hardcoded API endpoint as a fallback behavior used for the create_server_info() path only, which gets us closer to our desired final state.

Then we'd have the following code for now:

_DEFAULT_ORIGIN = 'https://tensorboard.dev' _HARDCODED_API_ENDPOINT = 'api.tensorboard.dev:443' def _get_server_info(flags): origin = flags.origin or _DEFAULT_ORIGIN if not flags.origin: # Temporary fallback to hardcoded API endpoint in the unspecified case. api_endpoint = flags.api_endpoint or _HARDCODED_API_ENDPOINT return server_info_lib.create_server_info(origin, api_endpoint) # same from here down

And after the server-side rollout, we'd remove _HARDCODED_API_ENDPOINT and update it to:

if not flags.origin and flags.api_endpoint: return server_info_lib.create_server_info(origin, flags.api_endpoint)

I think we actually should also remove the argparse-facing default
value for --api_endpoint too, because otherwise passing
--origin=https://staging.tensorboard.dev will not behave as desired,
since the flags.api_endpoint override behavior will still kick in
and default to the prod API

Correct (this is why the test plan passes --api_endpoint "" explicitly
in each case). This was fine with me, as it’s only temporary and only
for devs, so I preferred to keep the minor inconvenience so that the
later change could just be a simple flag flip rather than a flag flip
plus removing some logic.

Does this make sense? I’m happy to make the change if you still prefer
it that way.

Ok, sorry, should have also read the updated test plan.

That said, even though it's only dev-facing, I don't really love that it's possible to run --origin=https://staging.tensorboard.dev and encounter pretty misleading behavior (data uploaded to the wrong place, printed URL doesn't resolve). It seems pretty easy for one of us to accidentally hit that during testing. I'd be more comfortable making that work correctly now; it seems fine to me that the later change removes fallback logic since I think that's actually more reflective of what we're doing than just flipping a flag.

No problem; that makes sense. Done. I think that you’re probably right,
as this simplifies the test plan, which is a good sign. :-)

wchargin-branch: uploader-serverinfo-request wchargin-source: 0622d92604e72a5d661759d8ce39002433e354ef

nfelt

LGTM and actually read the test plan this time too :)

wchargin · 2019-11-15T00:34:02Z

Thanks for your patience and helpful reviews!

Summary: We’ve deployed production servers that support the handshake protocol specified in #2878 and implemented on the client in #2879. This commit enables that protocol by default. Test Plan: Running `bazel run //tensorboard -- dev list` still properly connects and prints valid URLs. Re-running with the TensorBoard version patched to `2.0.0a0` (in `version/version.py`) properly causes a handshake failure. Setting `--origin` to point to a non-prod frontend properly connects to the appropriate backend. Setting `--api_endpoint` to point to a non-prod backend connects directly, skipping the handshake, and printing `https://tensorboard.dev` URLs. Specifying both `--origin` and `--api_endpoint` performs the handshake and overrides the backend server only, printing URLs corresponding to the listed frontend. Running `git grep api.tensorboard.dev` no longer finds any code results. As a double check, building the Pip package and running it in a new virtualenv still has a working `tensorboard dev upload` flow. wchargin-branch: uploader-handshake

Summary: This commit integrates the new `ServerInfo` RPC with the uploader. It’s not currently enabled by default: the current behavior is the same as the existing behavior, except that experiment URLs now properly have a trailing slash. We’ll soon remove the hard-coded API backend endpoint behavior to enable this by default. Test Plan: Running a test frontend and a test backend, we observe the following behavior with different arguments: | `--origin` | `--api_endpoint` | → | URL origin | Backend | |------------|------------------|---|------------|---------| | empty | empty | | prod | prod | | empty | prod | | prod | prod | | empty | test | | prod | test | | test | empty | | test | test | | test | test | | test | test | | test | prod | | test | prod | Here, “test” in the `--origin` column is like `http://localhost:8080`, and “test” in the `--api_endpoint` column is like `localhost:10000`. Note that the no-argument case is equivalent to the explicitly-empty argument case because both arguments have empty default values. Explicitly specifying `--origin https://tensorboard.dev`, with any value of `--api_endpoint`, fails with “Corrupt response from backend” because server-side support has not yet been rolled out. This is expected. Specifying `--origin http://localhost:0` or any other unreachable host fails with `ECONNREFUSED` and a nice message. My test frontend is configured to reject clients below version 2.0.0 and warn on clients below version 2.0.1. Changing the local `version.py` to `2.0.0a0` or `2.0.1a0` exercises these cases. Finally, double-checked that building the Pip package, installing it, and running `tensorboard dev list` properly uses the production backend and prints URLs that resolve to the production frontend. wchargin-branch: uploader-serverinfo-request

Summary: We’ve deployed production servers that support the handshake protocol specified in tensorflow#2878 and implemented on the client in tensorflow#2879. This commit enables that protocol by default. Test Plan: Running `bazel run //tensorboard -- dev list` still properly connects and prints valid URLs. Re-running with the TensorBoard version patched to `2.0.0a0` (in `version/version.py`) properly causes a handshake failure. Setting `--origin` to point to a non-prod frontend properly connects to the appropriate backend. Setting `--api_endpoint` to point to a non-prod backend connects directly, skipping the handshake, and printing `https://tensorboard.dev` URLs. Specifying both `--origin` and `--api_endpoint` performs the handshake and overrides the backend server only, printing URLs corresponding to the listed frontend. Running `git grep api.tensorboard.dev` no longer finds any code results. As a double check, building the Pip package and running it in a new virtualenv still has a working `tensorboard dev upload` flow. wchargin-branch: uploader-handshake

Summary: This commit integrates the new `ServerInfo` RPC with the uploader. It’s not currently enabled by default: the current behavior is the same as the existing behavior, except that experiment URLs now properly have a trailing slash. We’ll soon remove the hard-coded API backend endpoint behavior to enable this by default. Test Plan: Running a test frontend and a test backend, we observe the following behavior with different arguments: | `--origin` | `--api_endpoint` | → | URL origin | Backend | |------------|------------------|---|------------|---------| | empty | empty | | prod | prod | | empty | prod | | prod | prod | | empty | test | | prod | test | | test | empty | | test | test | | test | test | | test | test | | test | prod | | test | prod | Here, “test” in the `--origin` column is like `http://localhost:8080`, and “test” in the `--api_endpoint` column is like `localhost:10000`. Note that the no-argument case is equivalent to the explicitly-empty argument case because both arguments have empty default values. Explicitly specifying `--origin https://tensorboard.dev`, with any value of `--api_endpoint`, fails with “Corrupt response from backend” because server-side support has not yet been rolled out. This is expected. Specifying `--origin http://localhost:0` or any other unreachable host fails with `ECONNREFUSED` and a nice message. My test frontend is configured to reject clients below version 2.0.0 and warn on clients below version 2.0.1. Changing the local `version.py` to `2.0.0a0` or `2.0.1a0` exercises these cases. Finally, double-checked that building the Pip package, installing it, and running `tensorboard dev list` properly uses the production backend and prints URLs that resolve to the production frontend. wchargin-branch: uploader-serverinfo-request

Summary: We’ve deployed production servers that support the handshake protocol specified in #2878 and implemented on the client in #2879. This commit enables that protocol by default. Test Plan: Running `bazel run //tensorboard -- dev list` still properly connects and prints valid URLs. Re-running with the TensorBoard version patched to `2.0.0a0` (in `version/version.py`) properly causes a handshake failure. Setting `--origin` to point to a non-prod frontend properly connects to the appropriate backend. Setting `--api_endpoint` to point to a non-prod backend connects directly, skipping the handshake, and printing `https://tensorboard.dev` URLs. Specifying both `--origin` and `--api_endpoint` performs the handshake and overrides the backend server only, printing URLs corresponding to the listed frontend. Running `git grep api.tensorboard.dev` no longer finds any code results. As a double check, building the Pip package and running it in a new virtualenv still has a working `tensorboard dev upload` flow. wchargin-branch: uploader-handshake

wchargin added 2 commits November 1, 2019 17:29

googlebot added the cla: yes label Nov 2, 2019

wchargin requested a review from nfelt November 2, 2019 00:47

wchargin added the core:uploader label Nov 2, 2019

wchargin added 8 commits November 2, 2019 03:21

[update diffbase]

443d941

wchargin-branch: uploader-serverinfo-protos wchargin-source: 16d41149a8ce9e2e816df874217a7da10d83810a

[update patch]

f8872ac

wchargin-branch: uploader-serverinfo-protos wchargin-source: 16d41149a8ce9e2e816df874217a7da10d83810a

[update diffbase]

9f04a79

wchargin-branch: uploader-serverinfo-request wchargin-source: f59f688bf6b944d162bfd40b4daf760ed2220011

[update patch]

37da71c

wchargin-branch: uploader-serverinfo-protos wchargin-source: f4f3624d7a37f1f2229970b0e80c28052f8b000a

[update diffbase]

68acee8

wchargin-branch: uploader-serverinfo-request wchargin-source: 0f24c0533a9e1c2a304f357117a20a7a76bd5c2c

[update patch]

ea59ed3

wchargin-branch: uploader-serverinfo-protos wchargin-source: d11627f29373cb13ececc774071c3cce365a7136

[update diffbase]

9ff519f

wchargin-branch: uploader-serverinfo-request wchargin-source: 9a504c52657f3297c45ffe15b2204a49023142f2

[update patch]

acd828d

wchargin-branch: uploader-serverinfo-request wchargin-source: 9a504c52657f3297c45ffe15b2204a49023142f2

wchargin mentioned this pull request Nov 5, 2019

uploader: add ServerInfo protos and logic #2878

Merged

nfelt approved these changes Nov 7, 2019

View reviewed changes

wchargin added 2 commits November 8, 2019 18:19

[update diffbase]

5e93150

wchargin-branch: uploader-serverinfo-request wchargin-source: dc8743d2e4a8896a10b5606768d44dea23868cd2

[update patch]

88b9a08

wchargin-branch: uploader-serverinfo-request wchargin-source: dc8743d2e4a8896a10b5606768d44dea23868cd2

wchargin changed the base branch from wchargin-uploader-serverinfo-protos to master November 9, 2019 02:21

wchargin added 2 commits November 13, 2019 14:28

[update diffbase]

d0abc8e

wchargin-branch: uploader-serverinfo-request wchargin-source: 4530572af22b41a4f61b573eb6998b30475b6e97

[update patch]

3453410

wchargin-branch: uploader-serverinfo-request wchargin-source: 4530572af22b41a4f61b573eb6998b30475b6e97

wchargin requested a review from nfelt November 14, 2019 18:40

nfelt approved these changes Nov 14, 2019

View reviewed changes

wchargin commented Nov 14, 2019

View reviewed changes

nfelt reviewed Nov 14, 2019

View reviewed changes

wchargin added 2 commits November 14, 2019 14:25

[update diffbase]

85d6abb

wchargin-branch: uploader-serverinfo-request wchargin-source: 3e74ef13b1117399321272c1bb487e000664b872

[update patch]

cf10d13

wchargin-branch: uploader-serverinfo-request wchargin-source: 3e74ef13b1117399321272c1bb487e000664b872

wchargin requested a review from nfelt November 14, 2019 22:30

nfelt reviewed Nov 14, 2019

View reviewed changes

[update patch]

1f100c5

wchargin-branch: uploader-serverinfo-request wchargin-source: 0622d92604e72a5d661759d8ce39002433e354ef

nfelt approved these changes Nov 15, 2019

View reviewed changes

wchargin merged commit e76c6a9 into master Nov 15, 2019

wchargin deleted the wchargin-uploader-serverinfo-request branch November 15, 2019 22:13

wchargin mentioned this pull request Nov 21, 2019

uploader: enable handshake by default #2958

Merged

wchargin mentioned this pull request Nov 25, 2019

TensorBoard 2.0.2 #2970

Merged

uploader: request ServerInfo from frontend #2879

uploader: request ServerInfo from frontend #2879

Uh oh!

Conversation

wchargin commented Nov 2, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wchargin Nov 9, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wchargin commented Nov 9, 2019

Uh oh!

wchargin commented Nov 14, 2019

Uh oh!

nfelt left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wchargin left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nfelt left a comment

Choose a reason for hiding this comment

Uh oh!

wchargin commented Nov 14, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nfelt Nov 14, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nfelt left a comment

Choose a reason for hiding this comment

Uh oh!

wchargin commented Nov 15, 2019

Uh oh!

Uh oh!

uploader: request `ServerInfo` from frontend #2879

uploader: request `ServerInfo` from frontend #2879

wchargin commented Nov 2, 2019 •

edited

Loading

wchargin Nov 9, 2019 •

edited

Loading

nfelt Nov 14, 2019 •

edited

Loading