analytics: fix mac not sending reports #10026
dvc/daemon.py
Outdated
    os._exit(0)  # pylint: disable=protected-access


def _spawn(cmd, env):
    logger.debug("Trying to spawn '%s'", cmd)

    if os.name == "nt":
        _spawn_windows(cmd, env)
We have a check here for sys.platform == "win32". Does this mean we don't report on 64-bit windows?
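For reference, CPython reports `sys.platform == "win32"` on Windows regardless of whether the interpreter is 32-bit or 64-bit, so that check alone should not exclude 64-bit machines. A minimal sketch for comparing the identifiers discussed in this thread; the helper name `describe_platform` is hypothetical and not part of DVC:

```python
import os
import platform
import sys


def describe_platform() -> dict:
    """Collect the identifiers commonly used for OS checks (hypothetical helper)."""
    return {
        "os.name": os.name,  # "nt" on Windows, "posix" on Linux and macOS
        "sys.platform": sys.platform,  # "win32" on Windows, 32-bit or 64-bit
        "platform.system()": platform.system(),  # e.g. "Windows", "Linux", "Darwin"
        "64-bit interpreter": sys.maxsize > 2**32,  # bitness is checked separately
    }


if __name__ == "__main__":
    for key, value in describe_platform().items():
        print(f"{key}: {value}")
```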
Forking is cheaper than |
Also, could you please try commenting out this line and see if that is the problem? I don't remember any other recent changes than that addition. Line 94 in 8ea9d3e
That doesn't help. I'm wondering if it's something in a macOS release rather than on our side causing the issue.
Done |
So just reverting changes from #4294 (aka do the same thing we do on linux) fixes it for you, right? |
dvc/daemon.py
Outdated
  if platform.system() == "Darwin":
      # workaround for MacOS bug
      # https://github.com/iterative/dvc/issues/4294
-     _popen(cmd, env=env).communicate()
+     _popen(cmd, env=env)
Looks like https://bugs.python.org/issue33725 got fixed in 3.8 and we can drop this workaround, which will make it work the same for linux and mac again.
Updated to drop the mac workaround
Were you able to find any hints on why since March specifically?
Yes.
No. My best guess is some macOS update.
@dberenbaum So just to confirm I understand correctly, in this current form, the fix works for you, right? If so, this is super clean 🔥
This is mostly to do the analytics sending in a separate detached process, so the main one doesn't have to wait for that if the network is slow. There was a big emphasis on quick command execution in the past that now definitely feels redundant and even misguided. But since we have the mechanism already, I guess it is nice to have one... |
Yes.
My point was that popen will still be non-blocking (without |
Ah, I see. That extra code is mostly about daemonizing the process to properly detach from parent/session/tty/etc so that there are no orphans and the child is not killed when the main process finishes and stuff like that. |
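The daemonizing described above is usually done with a double fork on POSIX. The following is only a minimal sketch of that general technique, not DVC's actual implementation; `spawn_detached` and its `target` argument are hypothetical names, and it assumes a POSIX system where `os.fork` is available:

```python
import os


def spawn_detached(target) -> None:
    """Run ``target()`` in a grandchild detached from the caller (POSIX only)."""
    pid = os.fork()
    if pid > 0:
        # Parent: reap the short-lived first child, then return immediately.
        os.waitpid(pid, 0)
        return

    # First child: start a new session so there is no controlling terminal.
    exit_code = 0
    try:
        os.setsid()
        if os.fork() > 0:
            # The first child exits at once; the grandchild gets re-parented,
            # so it is not killed when the original process finishes.
            os._exit(0)
        # Grandchild: do the actual work.
        target()
    except BaseException:
        exit_code = 1
    finally:
        # os._exit avoids running cleanup handlers inherited from the parent.
        os._exit(exit_code)
```

The caller only waits for the first child, which exits almost immediately, so the main command is not held up by whatever `target` does.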
Outstanding research, thanks for fixing analytics! 🔥
I now understand why it was failing. I added the following code in #9204: Line 94 in 8ea9d3e
which closes the streams in the child. So, the Couple that with the changes to #8993, where I am not checking if the
Traceback (most recent call last):
  File "/Users/user/Projects/dvc/dvc/__main__.py", line 15, in <module>
    from dvc.cli import main
  File "/Users/user/Projects/dvc/dvc/__init__.py", line 10, in <module>
    dvc.logger.setup()
  File "/Users/user/Projects/dvc/dvc/logger.py", line 187, in setup
    formatter = ColorFormatter(log_colors=log_colors and sys.stdout.isatty())
                                                         ^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'isatty'
Traceback (most recent call last):
  File "/Users/user/Projects/dvc/dvc/__main__.py", line 26, in <module>
    ex = main(sys.argv[1:])
         ^^^^^^^^^^^^^^^^^^
  File "/Users/user/Projects/dvc/dvc/cli/__init__.py", line 168, in main
    if sys.stderr.closed: # pylint: disable=using-constant-test
       ^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'closed'
I think we should create a pidfile, and pass it to
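Both tracebacks come from calling a method or attribute on a standard stream that is `None`. A defensive check could look roughly like the sketch below; the helper names `stream_is_usable` and `stream_isatty` are hypothetical and not taken from the DVC code base:

```python
import sys
from typing import IO, Optional


def stream_is_usable(stream: Optional[IO]) -> bool:
    """True only if the stream object exists and has not been closed."""
    return stream is not None and not stream.closed


def stream_isatty(stream: Optional[IO]) -> bool:
    """None-safe replacement for a bare ``stream.isatty()`` call."""
    return stream_is_usable(stream) and stream.isatty()


if __name__ == "__main__":
    # Neither call raises, even when sys.stdout or sys.stderr is None.
    use_colors = stream_isatty(sys.stdout)
    can_log = stream_is_usable(sys.stderr)
```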
If it helps, here are the steps I'm using to debug:
import dask.bag as db
import json


def flatten_json(record: dict) -> dict:
    for k, v in record["body"].items():
        if isinstance(v, dict):
            for subk, subv in v.items():
                record[f"{k}.{subk}"] = subv
        else:
            record[k] = v
    del record["body"]
    return record


def to_dict(record: tuple) -> dict:
    # Separate record into json data and path.
    jsondata, fpath = record
    # Convert to dicts.
    dictdata = json.loads(jsondata)
    dictdata["path"] = fpath.split("sample")[-1]
    # Merge dicts.
    return dictdata


b = db.read_text(<path_to_downloaded_data>, include_path=True)
b_dict = b.map(to_dict)
b_flat = b_dict.map(flatten_json)
df = b_flat.to_dataframe().compute()
df[df["user_id"] == <my_user_id>]
This will output all records for that user in the downloaded data.
I am getting crashes after pulling these changes. Found some related issues in python/cpython#105912, python/cpython#58037, bpo-30385 and in bpo-31818 unless I set Details
So it seems like we should really be using EDIT: so looks like we need changes from #4262. |
@skshetry could this bug have affected other platforms somehow after all? I see that the fix is not Darwin-specific after all, right?
See #10026 (comment). On macOS, we were sending analytics via a subprocess. And, in that case, On Linux, we were running it in a (double) forked process, so the file descriptors' objects did exist, even if they were closed. Which we could handle already. Windows wasn't affected as it has a separate implementation. |
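The difference described above can be reproduced outside DVC. When a fresh Python interpreter starts with its standard file descriptors already closed, CPython sets the corresponding `sys` streams to `None`, which is consistent with the `AttributeError` tracebacks earlier in the thread. The sketch below is an illustration under that assumption, not the DVC code path; the function names are hypothetical and `preexec_fn` makes it POSIX only:

```python
import os
import subprocess
import sys
import tempfile


def _close_std_fds() -> None:
    """Runs in the child between fork and exec: close stdout and stderr."""
    for fd in (1, 2):
        try:
            os.close(fd)
        except OSError:
            pass  # already closed


def std_streams_in_child() -> str:
    """Start a new interpreter with fds 1 and 2 closed and report what it sees."""
    code = (
        "import sys\n"
        "with open(sys.argv[1], 'w') as f:\n"
        "    f.write(f'{sys.stdout is None} {sys.stderr is None}')\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        report = os.path.join(tmp, "report.txt")
        # The child cannot print anything, so it writes its findings to a file.
        subprocess.run(
            [sys.executable, "-c", code, report],
            preexec_fn=_close_std_fds,
            check=True,
        )
        with open(report) as f:
            return f.read()


if __name__ == "__main__":
    print(std_streams_in_child())
```

In a forked process, by contrast, `sys.stdout` and `sys.stderr` are the file objects inherited from the parent, so they still exist after being closed and a `.closed` check on them works.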
@skshetry thanks! We see a dip in analytics at the same time on other platforms, not as drastic, but still. Do you know if we were making other changes to the telemetry? Overall, what is the best way to see and map all the changes we've made to the telemetry and configs that could have affected this?
I don't think so. The only major change was db70d1c. But that probably should have increased telemetry. Could that be 3.0 related, as it was a major version? |
FYI, today we have a test server to check whether analytics (and the updater) are working:
python -m tests.func.test_daemon 8000 &
export DVC_ANALYTICS_ENDPOINT=http://127.0.0.1:8000
export DVC_UPDATER_ENDPOINT=http://127.0.0.1:8000
dvc version
Which will log what was received. POST request is for
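For readers without the repository checked out, a stand-in for such a test server can be sketched with the standard library alone. This is not the code in `tests.func.test_daemon`; `RecordingHandler` and `start_server` are hypothetical names, and the handler simply records every GET and POST it receives:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RecordingHandler(BaseHTTPRequestHandler):
    """Answer every request with 200 and remember what was sent."""

    received = []  # (method, path, body) tuples shared across requests

    def _record(self):
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length).decode() if length else ""
        type(self).received.append((self.command, self.path, body))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b"{}")

    do_GET = do_POST = _record

    def log_message(self, format, *args):
        pass  # keep the console quiet; inspect `received` instead


def start_server(port: int = 0) -> HTTPServer:
    """Serve on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), RecordingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server(8000)
    input("Listening on http://127.0.0.1:8000, press Enter to stop\n")
    print(RecordingHandler.received)
    server.shutdown()
```

Pointing `DVC_ANALYTICS_ENDPOINT` and `DVC_UPDATER_ENDPOINT` at this address, as in the commands above, should make any report that is actually sent show up in `RecordingHandler.received`.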
Not sure if this is the best approach to fix the issue, but currently we do not seem to be sending analytics reports for mac since March. I have tested and found that it seems to be some combination of forking and the _popen method used. Analytics get sent successfully if I either:
main like it does for linux.
Here I dropped all the forking and conditions here and just use _popen directly. This successfully sends reports for me, and I'm not sure why we need forking here since it should happen in a separate, non-blocking process when using _popen (and dropping .communicate()).
Not sure if I miss some context for why forking or other logic is needed (I took a look back at #4294 and #4262 but wasn't clear why this is all needed).
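The non-blocking claim can be checked with the standard `subprocess.Popen` directly (DVC's `_popen` wrapper is not reproduced here). In this sketch the child is a stand-in that just sleeps for two seconds, and `time_spawn` is a hypothetical helper that measures how long the caller is held up:

```python
import subprocess
import sys
import time

# Stand-in for a slow analytics upload: a child that sleeps for two seconds.
CHILD = [sys.executable, "-c", "import time; time.sleep(2)"]


def time_spawn(wait: bool) -> float:
    """Return how long the caller is blocked when spawning the child."""
    start = time.monotonic()
    proc = subprocess.Popen(CHILD)
    if wait:
        # .communicate() blocks until the child exits.
        proc.communicate()
    elapsed = time.monotonic() - start
    proc.wait()  # clean up the child outside the timed section
    return elapsed


if __name__ == "__main__":
    print(f"Popen only:          {time_spawn(wait=False):.2f}s")
    print(f"Popen + communicate: {time_spawn(wait=True):.2f}s")
```

Without `.communicate()` the call returns as soon as the child has been started. What plain `Popen` does not provide is the detachment from the parent's session and terminal that the daemonizing code discussed earlier in the thread is for.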