I feel that this level of documentation doesn't really help those reading it. It's fine to provide the formulas for how the properties are calculated, but 95% of the people who open the help to read about them will likely be more confused than helped. The documentation should explain what each property means in relation to the accuracy of the model being tested.
LogLossReduction has one sentence that is confusing but at least helps:
For example, if the RIG equals 20, it can be interpreted as "the probability of a correct prediction is 20% better than random guessing."
When executing the samples, I took the Iris clustering sample; it returns a LogLossReduction of 0.9967811419606234. You and I know that this is 99.67% and not 0.99%, but "RIG equals 20" is ambiguous and could be reworded as:
For example, if the LogLossReduction equals 0.20, it can be interpreted as "the probability of a correct prediction is 20% better than random guessing."
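To make the scale concrete, here is a minimal, self-contained sketch (not ML.NET itself), based on my reading of the definition as LogLossReduction (RIG) = (LogLoss_prior - LogLoss_model) / LogLoss_prior. The numbers are made up to mimic the Iris result; the only point is that the value is reported on a 0-1 scale, not 0-100:

```csharp
using System;

class LogLossReductionDemo
{
    // Assumed definition (my reading of the docs):
    //   LogLossReduction (RIG) = (LogLoss_prior - LogLoss_model) / LogLoss_prior
    // where LogLoss_prior is the log loss of a baseline that always predicts the class prior.
    static double LogLossReduction(double priorLogLoss, double modelLogLoss) =>
        (priorLogLoss - modelLogLoss) / priorLogLoss;

    static void Main()
    {
        // Hypothetical numbers for a balanced 3-class problem like Iris:
        // always predicting the (uniform) prior gives a log loss of ln(3), roughly 1.0986.
        double priorLogLoss = Math.Log(3);

        // A model with a very small log loss, e.g. 0.0035 ...
        double modelLogLoss = 0.0035;

        // ... yields a LogLossReduction close to 1.0 (not 100):
        double rig = LogLossReduction(priorLogLoss, modelLogLoss);
        Console.WriteLine($"LogLossReduction = {rig:F4}"); // ~0.9968, i.e. ~99.7% better than the baseline
    }
}
```

Running this prints roughly 0.9968, which is how I read the Iris sample's 0.9967811419606234: the model is about 99.7% better than the baseline, not 0.99%.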
Also, computing the metrics on N rows does not by itself convey any confidence that the prediction is better than a flip of a coin; simply stating that a certain percentage of the data must be used for testing doesn't help either, because 20% of 10 rows is not representative.
Ideally, the documentation would also mention how the number can be influenced, so that users understand how to improve the model's accuracy; perhaps link to Feature Importance.
Ideally, those who give the framework a spin should be able to understand the results of the generated model as well as how to improve it.
Document Details
⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.
ID: db0c3c11-e826-253d-983d-bc64f22bb609
Version Independent ID: db553462-9f1c-789d-caf2-408edc18d7d1