Skip to content

Commit 546d1b6

Browse files
authored
notebook: don’t filter polled instances by PID (#4407)
Summary: When the `%tensorboard` cell magic is invoked, we compute a cache key for the “hermetic environment”, primarily args to `%tensorboard` and the working directory. We first check whether any running TensorBoard instances match that cache key, and launch a new instance if none do. But then, while polling for the new instance to have launched, we had a different matching criterion, checking for a process ID match instead of a cache key match. The idea was that “is this TensorBoard instance’s PID equal to the PID of the subprocess that we just spawned?” would be a more reliable check. But on Windows ((╯°□°)╯︵ ┻━┻) this is not the case, presumably because the `tensorboard` console script has some kind of wrapper process in certain versions of Python. This manifested as “`%tensorboard` always times out on the first invocation, but works immediately when I invoke it again”, since invoking it again triggers the cache key check rather than the PID check. So we now just check by cache key in all cases, and the logic is consistent, if a bit less precise overall. Fixes #4300. Test Plan: Still works for me on Linux, with both new and existing TensorBoard processes across multiple (concurrent) cache keys. @stephanwlee can repro the bug and fix on Windows with Python 3.8. wchargin-branch: notebook-poll-no-pid-filter
1 parent 3eb0b40 commit 546d1b6

File tree

1 file changed

+8
-9
lines changed

1 file changed

+8
-9
lines changed

tensorboard/manager.py

Lines changed: 8 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -401,13 +401,10 @@ def start(arguments, timeout=datetime.timedelta(seconds=60)):
401401
A `StartReused`, `StartLaunched`, `StartFailed`, or `StartTimedOut`
402402
object.
403403
"""
404-
match = _find_matching_instance(
405-
cache_key(
406-
working_directory=os.getcwd(),
407-
arguments=arguments,
408-
configure_kwargs={},
409-
),
404+
this_cache_key = cache_key(
405+
working_directory=os.getcwd(), arguments=arguments, configure_kwargs={},
410406
)
407+
match = _find_matching_instance(this_cache_key)
411408
if match:
412409
return StartReused(info=match)
413410

@@ -438,9 +435,11 @@ def start(arguments, timeout=datetime.timedelta(seconds=60)):
438435
stdout=_maybe_read_file(stdout_path),
439436
stderr=_maybe_read_file(stderr_path),
440437
)
441-
for info in get_all():
442-
if info.pid == p.pid and info.start_time >= start_time_seconds:
443-
return StartLaunched(info=info)
438+
info = _find_matching_instance(this_cache_key)
439+
if info:
440+
# Don't check that `info.pid == p.pid`, since on Windows that may
441+
# not be the case: see #4300.
442+
return StartLaunched(info=info)
444443
else:
445444
return StartTimedOut(pid=p.pid)
446445

0 commit comments

Comments
 (0)