Perform analysis on telemetry #341


Closed
7 of 12 tasks
peschkaj opened this issue Apr 19, 2016 · 12 comments

@peschkaj
Contributor

peschkaj commented Apr 19, 2016

Now that we're collecting telemetry data, we need to perform analysis on it.

rustc telemetry

  • Duration metrics - mean, min, max, stdev, n-tiles (50, 75, 90, 95, 99)
  • Count of exit codes by exit number
  • Count of error by error number
  • Markov chains of compiler errors
  • Duration metrics for successful compiles
  • Duration metrics for compiles with error numbers
  • Total time spent compiling
  • Count of rustc executions
  • Summarize telemetry by rustc version

toolchain telemetry

  • which toolchains are being installed?
  • which toolchains are erroring?

target telemetry

  • Toolchain and target combination counts with success/failure counts
@brson
Contributor

brson commented Apr 19, 2016

cc @alexcrichton You've mentioned before analysis you'd like to see locally. Do you have specific things you're interested in?

A lot of what we're going to be interested in is changes over time. For example, compile times on their own don't necessarily mean anything, but a trend of compile times going up does.

Error codes are definitely the thing I care about most.

@alexcrichton
Member

I don't think I have a burning desire for many metrics, but here's some thoughts

  • Is there a "markov chain" of errors, e.g. what errors are most likely to come after other errors?
  • What compilation targets are the most common (also host/target combinations)?
  • What OS versions are used the most?

Other fun statistics/scoreboards:

  • How long have I spent waiting for my code to compile
  • How many bytes has rustc generated for me
  • How many times has the compiler itself been run

@peschkaj
Contributor Author

Excellent, thanks for the thoughts. We can do a couple of those without much in the way of changes right now (markov chains, time to compile, compiler run count). The others will require at least a little bit more thinking and/or collecting, but they're not showstoppers.

@brson
Contributor

brson commented May 1, 2016

@aturon sent me this link to some similar work for Haskell.

I see that they are just sending their data to https://www.google-analytics.com/collect, which is interesting. Can we just use Google to present our numbers? Should we?

@peschkaj
Contributor Author

peschkaj commented May 4, 2016

I remember how everyone got upset about Homebrew and Google Analytics. But in this case, we're asking people to opt in. I'm for it; it certainly makes it easier for us to process the data and display it over time.

@peschkaj
Contributor Author

peschkaj commented May 7, 2016

The Haskell analytics work is interesting since it's in the editor (it would be fun to get that kind of thing into racer).

Upside of using Google Analytics:

  • Limited effort from us.
  • Easily add more information by changing "query string" parameters.
  • We don't have to do a lot of work to get the data, and we can slice and dice it at our leisure.

The downside is that if any users want to do that sort of thing themselves, they have to do the telemetry work locally. Admittedly, now that the code is written, nothing says we can't pull it out into a separate crate so other people can parse those files.

Downsides of Google Analytics:

  • "ugh, you gave my data to google". (We could get around some of this by uploading to an intermediary server and then having an intermediary push data to Google.)
  • Limited control of functionality.
  • May not be able to get data back out.

The upsides of rolling our own analytics:

  • Complete control - anything we want to do, we can.
  • With open source code, it's easier for users to verify and trust what we're doing with the data.

The downsides are:

  • We can go insane with metrics.
  • We have to maintain it.
  • We're effectively writing some kind of limited analytics thing in Rust.

@brson
Contributor

brson commented Jun 23, 2016

Google makes a lot of sense to me.

@willcrichton

@peschkaj has anything happened with this data yet? Also, how much has been collected, and from how many users? (Potentially interested in using this as a dataset for a research project on Rust usability.)

@peschkaj
Contributor Author

peschkaj commented May 2, 2018

This feature was never made visible to end users - it didn't function correctly on Windows (colors were being stripped from the output) and it got into an infinite loop with some newer changes to rustc. (There's a bug referencing this somewhere.) But even when it was available, all collection and analysis was local, and data was never transmitted to Google.

Frankly, I'm a little bit surprised that the code is still present and compiling in rustup!

@willcrichton

Alas. How can I activate it locally? I tried setting telemetry = true in $HOME/.rustup/settings.toml and running some cargo commands, but I don't see any output.

@peschkaj
Contributor Author

peschkaj commented May 2, 2018

If you run rustup telemetry enable, that will turn on the telemetry feature. Then any telemetry will show up in ~/.rustup/telemetry as a series of JSON logs. You can then see the results using rustup telemetry analyze.

@kinnison
Contributor

We removed telemetry a while ago, so I'm closing this.
