Precision, Recall and f1 score for multiclass classification #6507
Comments
I tried to do the same thing. Maybe a `callback` added to the `fit` function could be a solution?
Let's say you want per-class accuracy. The way we have hacked it internally is to have a function that generates an accuracy metric function for each class, and we pass those functions as the `metrics` argument to `compile`. It is kind of crappy, but it works.
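For instance, a sketch of that kind of generator for one-hot targets (the helper name, `model`, and `num_classes` below are illustrative, not the code we actually use):

```python
import keras.backend as K

def class_accuracy(class_id):
    """Build an accuracy metric restricted to samples of a single class (one-hot targets)."""
    def acc(y_true, y_pred):
        true_ids = K.argmax(y_true, axis=-1)
        pred_ids = K.argmax(y_pred, axis=-1)
        # only count samples whose true label is `class_id`
        mask = K.cast(K.equal(true_ids, class_id), K.floatx())
        matches = K.cast(K.equal(true_ids, pred_ids), K.floatx()) * mask
        return K.sum(matches) / (K.sum(mask) + K.epsilon())
    acc.__name__ = 'acc_class_{}'.format(class_id)
    return acc

# one metric per class, passed to compile (`model` and `num_classes` assumed defined)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=[class_accuracy(c) for c in range(num_classes)])
```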
I added the f1 metrics (note that this works only for binary problems so far!):

```python
if metric == 'accuracy' or metric == 'acc':
    # custom handling of accuracy
    # (because of class mode duality)
    output_shape = self.internal_output_shapes[i]
    acc_fn = None
    if (output_shape[-1] == 1 or
            self.loss_functions[i] == losses.binary_crossentropy):
        # case: binary accuracy
        acc_fn = metrics_module.binary_accuracy
    elif self.loss_functions[i] == losses.sparse_categorical_crossentropy:
        # case: categorical accuracy with sparse targets
        acc_fn = metrics_module.sparse_categorical_accuracy
    else:
        acc_fn = metrics_module.categorical_accuracy
    masked_fn = _masked_objective(acc_fn)
    append_metric(i, 'acc', masked_fn(y_true, y_pred, mask=masks[i]))
elif metric in ['f1', 'f1-score']:
    output_shape = self.internal_output_shapes[i]
    if (output_shape[-1] == 1 or
            self.loss_functions[i] == losses.binary_crossentropy):
        def true_pos(y_true, y_pred):
            # label 1, predicted 1
            return K.sum(y_true * K.round(y_pred))
        def false_pos(y_true, y_pred):
            # label 0, predicted 1
            return K.sum((1. - y_true) * K.round(y_pred))
        def false_neg(y_true, y_pred):
            # label 1, predicted 0
            return K.sum(y_true * (1. - K.round(y_pred)))
        def precision(y_true, y_pred):
            return true_pos(y_true, y_pred) / \
                (true_pos(y_true, y_pred) + false_pos(y_true, y_pred))
        def recall(y_true, y_pred):
            return true_pos(y_true, y_pred) / \
                (true_pos(y_true, y_pred) + false_neg(y_true, y_pred))
        def f1_score(y_true, y_pred):
            # harmonic mean of precision and recall
            return 2. / (1. / recall(y_true, y_pred) + 1. / precision(y_true, y_pred))
        for fn in [precision, recall, f1_score]:
            append_metric(i, fn.__name__, fn(y_true, y_pred))
else:
    metric_fn = metrics_module.get(metric)
    masked_metric_fn = _masked_objective(metric_fn)
    metric_result = masked_metric_fn(y_true, y_pred, mask=masks[i])
    metric_result = {metric_fn.__name__: metric_result}
    for name, tensor in six.iteritems(metric_result):
        append_metric(i, name, tensor)
```
I use these custom metrics for binary classification in Keras:
And then in your compile:
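(Neither snippet is preserved above; a minimal sketch in that spirit, covering both the metric definitions and the compile call, with illustrative names and `K.epsilon()` to avoid division by zero:)

```python
import keras.backend as K

def precision_m(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    return true_positives / (predicted_positives + K.epsilon())

def recall_m(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    return true_positives / (possible_positives + K.epsilon())

def f1_m(y_true, y_pred):
    p = precision_m(y_true, y_pred)
    r = recall_m(y_true, y_pred)
    return 2 * (p * r) / (p + r + K.epsilon())

# `model` assumed to be an already-built Keras model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy', precision_m, recall_m, f1_m])
```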
But what I would really like to have is a custom loss function that optimizes for F1_score on the minority class only with binary classification. Something like:
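(Again, only a sketch of the idea, reusing the `f1_m` helper from the previous block:)

```python
def f1_loss(y_true, y_pred):
    # try to maximize F1 of the positive class by minimizing 1 - F1
    # (assumes the minority class is labeled 1)
    return 1.0 - f1_m(y_true, y_pred)

model.compile(optimizer='adam', loss=f1_loss, metrics=[f1_m])
```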
However, I know this is a mathematically invalid way of computing a loss with regard to gradients and differentiability...
@trevorwelch, it's batch-wise, not the global and final one.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
Hello everyone. @trevorwelch, how could I customize these custom metrics to compute Precision@k and Recall@k?
@trevorwelch Really interested in the answer to this also 👍
The code snippets I shared above (and the code I was hoping to find, to optimize F1 score for the minority class) were for a binary classification problem. Are you asking whether those snippets could be adapted for multilabel classification with ranking?
@trevorwelch I'm actually interested in the binary case for now, and in the multilabel classification problem later. I'd like to know about this comment of yours:
Did you end up figuring out a mathematically valid approach?
This is still interesting. Does anyone know whether per-label performance metrics for multilabel classification have been solved?
Has the problem of "Precision, Recall and f1 score for multiclass classification" been solved?
You can use the metrics which were removed, if it helps: |
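(The link/code isn't reproduced above. From memory, the removed Keras 1.x metrics included `precision`, `recall`, `fbeta_score`, and `fmeasure`, roughly in this style; treat this as a sketch rather than the exact deleted source:)

```python
import keras.backend as K

def fbeta_score(y_true, y_pred, beta=1):
    # batch-wise counts, thresholded at 0.5 via round()
    tp = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    actual_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    precision = tp / (predicted_positives + K.epsilon())
    recall = tp / (actual_positives + K.epsilon())
    bb = beta ** 2
    return (1 + bb) * (precision * recall) / (bb * precision + recall + K.epsilon())

def fmeasure(y_true, y_pred):
    # F1 is the beta=1 special case
    return fbeta_score(y_true, y_pred, beta=1)
```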
Thanks, but I used callbacks in `model.fit`. Here is the code I used:
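(The code itself isn't preserved here; a sketch of that kind of callback, computing global precision/recall/F1 on the validation set with scikit-learn at the end of each epoch, with illustrative names:)

```python
import numpy as np
from keras.callbacks import Callback
from sklearn.metrics import precision_score, recall_score, f1_score

class MetricsCallback(Callback):
    def __init__(self, validation_data):
        super(MetricsCallback, self).__init__()
        self.validation_data = validation_data

    def on_epoch_end(self, epoch, logs=None):
        x_val, y_val = self.validation_data
        y_prob = self.model.predict(x_val)
        y_pred = np.argmax(y_prob, axis=-1)
        y_true = np.argmax(y_val, axis=-1)
        logs = logs or {}
        # macro-averaged scores over the whole validation set, not per batch
        logs['val_precision'] = precision_score(y_true, y_pred, average='macro')
        logs['val_recall'] = recall_score(y_true, y_pred, average='macro')
        logs['val_f1'] = f1_score(y_true, y_pred, average='macro')
        print(' - val_precision: %.4f - val_recall: %.4f - val_f1: %.4f'
              % (logs['val_precision'], logs['val_recall'], logs['val_f1']))

# usage (x_val, y_val assumed one-hot encoded):
# model.fit(x_train, y_train, epochs=10,
#           validation_data=(x_val, y_val),
#           callbacks=[MetricsCallback((x_val, y_val))])
```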
The article in which I saw this code:
@romanbsd I don't think this is correct, since it uses `round`, which should lead to errors for multiclass classification when no predicted value is > 0.5... @puranjayr96 Your code looks correct, but as far as I know you cannot save the best weights when the metric is computed in a callback; metrics need to be passed in when you compile the model, I think. This question still needs an answer, one that I can't provide because of my low skill :(
I think maybe the following code will work:
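(A sketch assuming TensorFlow 2.x with the tensorflow-addons package, a 3-class softmax output, and one-hot targets; the model architecture is just a placeholder:)

```python
import tensorflow as tf
import tensorflow_addons as tfa

num_classes = 3  # adjust to your problem; targets are assumed one-hot encoded

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(num_classes, activation='softmax'),
])

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=[
        # average='macro' gives a single scalar; average=None would return
        # one F1 value per class, which some Keras versions do not log cleanly
        tfa.metrics.F1Score(num_classes=num_classes, average='macro', name='macro_f1'),
    ],
)
```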
ref: https://www.tensorflow.org/addons/api_docs/python/tfa/metrics/F1Score |
This does not work on my side; it returns `TypeError: array() takes 1 positional argument but 2 were given`. Has this problem been solved yet?
Hi!
Keras: 2.0.4
I recently spent some time trying to build metrics for multi-class classification that output per-class precision, recall and f1 scores.
I want a metric that correctly aggregates the values across the different batches and gives me a result for the global training process at per-class granularity.
The way I understand it, this currently works by calling the functions declared inside the `metrics` argument of the `compile` function after every batch, to output an estimated metric on that batch, which is stored in a `logs` object (see `training.py`). In there, there is a call to `_masked_objective`, whose definition averages whatever tensor comes out of the metrics.
Here is how I was thinking about implementing the precision, recall and f1 score.
I was planning to use the metrics callback to accumulate per-class counts of true positives, positives, and false negatives, accumulate them within the logs, and then compute the precision, recall and f1 score within the callback.
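Roughly, the kind of setup I have in mind (all names here are illustrative, and `num_classes` is assumed to be defined elsewhere):

```python
import keras.backend as K
from keras.callbacks import Callback

def make_count_metrics(class_id):
    """Per-class count tensors, meant to be picked up from `logs` by a callback."""
    def tp(y_true, y_pred):
        true_c = K.cast(K.equal(K.argmax(y_true, axis=-1), class_id), K.floatx())
        pred_c = K.cast(K.equal(K.argmax(y_pred, axis=-1), class_id), K.floatx())
        return K.sum(true_c * pred_c)
    def pred_pos(y_true, y_pred):
        return K.sum(K.cast(K.equal(K.argmax(y_pred, axis=-1), class_id), K.floatx()))
    def pos(y_true, y_pred):
        return K.sum(K.cast(K.equal(K.argmax(y_true, axis=-1), class_id), K.floatx()))
    tp.__name__, pred_pos.__name__, pos.__name__ = (
        'tp_%d' % class_id, 'pred_pos_%d' % class_id, 'pos_%d' % class_id)
    return [tp, pred_pos, pos]

class PerClassF1(Callback):
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        for c in range(num_classes):
            # note: today these log entries arrive as per-batch averages, not raw sums
            tp = logs.get('tp_%d' % c, 0.0)
            precision = tp / (logs.get('pred_pos_%d' % c, 0.0) + K.epsilon())
            recall = tp / (logs.get('pos_%d' % c, 0.0) + K.epsilon())
            logs['f1_%d' % c] = 2 * precision * recall / (precision + recall + K.epsilon())

# metrics = sum([make_count_metrics(c) for c in range(num_classes)], [])
# model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=metrics)
# model.fit(..., callbacks=[PerClassF1()])
```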
The problem with that approach is that the tensor that I output with counts from the metrics gets averaged before getting to the Callback.
My change request is thus the following: could we remove that averaging from the core metrics handling and let the Callbacks handle the data returned from the metric functions however they want?
I really think this is important, since it currently feels a bit like flying blind to not have per-class metrics for multi-class classification.
I can also contribute code on whatever solution we come up with.
Thank you