Skip to content

rename user_id file #7707

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
casperdcl opened this issue May 6, 2022 · 17 comments
Closed

rename user_id file #7707

casperdcl opened this issue May 6, 2022 · 17 comments
Labels
feature request Requesting a new feature

Comments

@casperdcl
Copy link
Contributor

casperdcl commented May 6, 2022

Telemetry/analytics are anonymised and can be disabled.

However for unification across Iterative's stack (MLEM, TPI, CML etc.) it's a good idea to put the config in one place. Basically move the existing $CONFDIR/dvc/user_id to $CONFDIR/iterative/telemetry. More explicitly:

import json
from shutil import copy
from pathlib import Path
from appdirs import user_config_dir

# DVC backwards-compatibility
old = Path(user_config_dir("dvc/user_id", "iterative"))
# cross-product path
new = Path(user_config_dir("iterative/telemetry", False))

uid = generate_id() # see above
if uid.lower() == "do-not-track":
    return
if new.exists():
    uid = new.read_text().strip()
else:
    if old.exits():
        uid = json.load(old.open())["user_id"]
    new.write_text(uid)
@casperdcl casperdcl changed the title anonymised telemetry: unify path rename user_id file May 6, 2022
@dtrifiro dtrifiro added the feature request Requesting a new feature label Jun 8, 2022
@omesser
Copy link
Contributor

omesser commented Jul 3, 2022

@efiop @skshetry - is anyone looking at this?
Looks like a small fix, needed to put order in telemetry across products, but no real response here for a long time
Can this be assigned and prioritized please? (again, looks really small)
If there are outstanding issues please say so

@efiop
Copy link
Contributor

efiop commented Jul 3, 2022

I thought this would come with the new telemetry package instead of us renaming stuff manually. The project hasn't been updated in a long time and not released properly yet https://pypi.org/project/iterative-telemetry/ , so I'm not sure if there is anything actionable that we really need to do here CC @casperdcl

@omesser
Copy link
Contributor

omesser commented Jul 4, 2022

@efiop
unifying telemetry into a package reusable by other products seems great
... 😄
BUT
Currently DVC uses internal logic - Why would this functional change be blocked by someone picking up iterative-telemetry project?
There's also a necessary BC piece here which is specific to the transition to convert from current dvc specific dir to iterative dir

@efiop
Copy link
Contributor

efiop commented Jul 4, 2022

@omesser We can provide backward compatibility when we migrate to the new package. I really can't see any reason to do this intermediate change right now. And it doesn't seem like it is blocking anyone.

@casperdcl
Copy link
Contributor Author

  1. currently, all "new" projects need to create "legacy" $CONFDIR/dvc/user_id just in case DVC is subsequently installed
  2. DVC should ergo start writing to the "new" $CONFDIR/iterative/telemetry asap (reducing the amount of code bloat everywhere)
  3. DVC "properly" migrating to using https://pypi.org/project/iterative-telemetry/ is the ideal goal, I'm fine with skipping (2) above if we can quickly do the proper migration (seems unlikely?)

@efiop
Copy link
Contributor

efiop commented Jul 4, 2022

@casperdcl Makes sense. So telemetry is far away enough that we are doing this manual renaming step in the meantime across producs. I was hoping telemetry would be ready at this point or we would wait for it, but could rename for now, sure...

Any existing projects already using it or older user_id, btw?

I guess we'll look for a new file location and if we don't find it - we'll fall back to the old one and will copy it or create a new one.

Is $CONFDIR/iterative/telemetry format settled now, btw? Also, we follow conventions already and specify author as iterative

return user_config_dir(cls.APPNAME, cls.APPAUTHOR)
, but it is not always used in the resulting path. I see that you've hardcoded iterative/telemetry, but that seems to be breaking conventions. Are we sure we want that specifically?

@efiop
Copy link
Contributor

efiop commented Jul 4, 2022

Ah, alright, I see that you do tweak the user_config_dir already https://github.com/iterative/iterative-telemetry/blob/main/src/iterative_telemetry/__init__.py#L199

Also the package seems to work, so I would rather partially use it instead of modifying every project to use a new location by hand.

@efiop
Copy link
Contributor

efiop commented Jul 4, 2022

@mike0sv @casperdcl So what's the status of telemetry package? Can we publish it and use it at least partically to retrieve the file location, etc? Seems better than having multiple projects do their own thing by hand. I can also contribute backward-compatibility code there to use old user_id as noted above (this will be useful for years, because some people stick to older dvc version for years).

EDIT: as noted by @omesser there is an issue for it already iterative/telemetry-python#13

@skshetry
Copy link
Collaborator

skshetry commented Jul 4, 2022

I would feel comfortable pointing users to a file that is within the repository or bundled in source dist or git-archives. It should be a simple client, do we really need an external project?

@efiop
Copy link
Contributor

efiop commented Jul 4, 2022

@skshetry We'll be using it in multiple projects, so it is easy to point everyone to that one place and handle all feedback/bugs/testing/etc there and not duplicate the code.

@skshetry
Copy link
Collaborator

skshetry commented Jul 5, 2022

@skshetry We'll be using it in multiple projects, so it is easy to point everyone to that one place and handle all feedback/bugs/testing/etc there and not duplicate the code.

Most of the issues that I see with analytics have been with integration, eg: weird interaction with daemon, some issues with pyinstaller and Windows, etc.

It's quite easy to test the other parts of the client. So I don't think it really helps us reduce bugs (i remember only 3-4 minor bugs in last 2.5 years, #4262, #7545, #6407, #3292, #3405).

I find the duplication worth it, given that it makes it easier to audit the package itself and see when and with what it calls home.

@efiop
Copy link
Contributor

efiop commented Jul 5, 2022

@skshetry Auditing is as easy as it was: point to a package with 1 file instead of 1 file in dvc, the difference is very marginal. But the chances of forgetting to modify analytics url or user_id location or something else in one of 3-4-5+ packages is much more error prone. We clearly have some special things like pyinstaller right now (but other packages will probably use it at some point too), but overall it is better to keep everything in one place and maintain it there.

@skshetry
Copy link
Collaborator

skshetry commented Jul 5, 2022

But the chances of forgetting to modify analytics url or user_id location or something else in one of 3-4-5+ packages is much more error prone.

Unittests should easily catch that.

@efiop
Copy link
Contributor

efiop commented Jul 5, 2022

@skshetry Copying unit tests from project to project or creating them from scratch everywhere is suboptimal though, as people will miss things. The whole idea is to consolidate effort and not waste it in every team and suffer from inconsistencies.

The decision to create common telemetry package with common user_id and stuff was joint, it would be really weird if we decide to have our in-house one in dvc just for the fun of it 🙂

@skshetry
Copy link
Collaborator

skshetry commented Jul 5, 2022

@skshetry Copying unit tests from project to project or creating them from scratch everywhere is suboptimal though, as people will miss things. The whole idea is to consolidate effort and not waste it in every team and suffer from inconsistencies.

Inconsistencies can be removed (or will eventually be consistent). I'd say it's worth the (minimal) effort, compared to the things we lose after this, especially control over a sensitive part of DVC.

it would be really weird if we decide to have our in-house one in dvc just for the fun of it 🙂

Not for fun, but for all the reasons that I have given above (sorry don't want to type again, I'm on my phone).

@skshetry
Copy link
Collaborator

skshetry commented Jul 5, 2022

Also I think that having the source in-tree establishes more trust with the users and increases confidence, which is more important for analytics.

@omesser
Copy link
Contributor

omesser commented Jul 10, 2022

@skshetry
To your concerns about transparency and user trust - we should make https://github.com/iterative/iterative-telemetry public (see: iterative/telemetry-python#30), I don't think that it warrants the duplication personally

Today this package doesn't do much, but the idea is to unify the more important parts of telemetry in the future (not just user_id resolution and config file location). And we see that for every small decision we have 20 opinions, so it makes sense from a process stand point as well.
I believe that our current divergence in various projects (MLEM, GTO, TPI, DVC, VSCODE) showed us that we need this centralized as part of a healthier process around telemetry that situates it as a cross-team effort

@mattseddon mattseddon closed this as not planned Won't fix, can't repro, duplicate, stale Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Requesting a new feature
Projects
None yet
Development

No branches or pull requests

6 participants