Describe the bug
The computed meteor score looks wrong: the value is very different from the scores computed by other tools. For example, I use the meteor score computed by NLGeval (which reuses the official METEOR jar file for the computation) as the reference.
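For reference, the same NLGeval score can also be obtained through the NLGEval class. A minimal sketch, assuming the no_skipthoughts/no_glove constructor flags from the nlg-eval README (they skip the slow embedding-based metrics and leave the jar-based overlap metrics such as METEOR):
from nlgeval import NLGEval

# skip the embedding-based metrics (SkipThought / GloVe similarities);
# METEOR is still computed by the bundled official jar
nlgeval = NLGEval(no_skipthoughts=True, no_glove=True)
references = ["It is a guide to action that ensures that the military will forever heed Party commands"]
hypothesis = "It is a guide to action which ensures that the military always obeys the commands of the party"
metrics_dict = nlgeval.compute_individual_metrics(references, hypothesis)
print(round(metrics_dict["METEOR"], 4))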
Steps to reproduce the bug
from datasets import load_metric
from nlgeval import compute_individual_metrics
meteor = load_metric('meteor')
predictions = ["It is a guide to action which ensures that the military always obeys the commands of the party"]
references = ["It is a guide to action that ensures that the military will forever heed Party commands"]
results = meteor.compute(predictions=predictions, references=references)
# print the actual result
print(round(results["meteor"], 4))
metrics_dict = compute_individual_metrics(references, predictions[0])
# print the expected result
print(round(metrics_dict["METEOR"], 4))
By the way, you need to install the nlg-eval library first. Please check the installation guide here, thanks!
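As far as I can tell, the datasets meteor metric wraps NLTK's nltk.translate.meteor_score, while nlg-eval shells out to the official Java METEOR 1.5 jar, which would explain why the two numbers disagree. A minimal check under that assumption, calling NLTK directly (note that recent NLTK versions expect pre-tokenized input):
import nltk
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet")  # meteor_score relies on WordNet for synonym matching

prediction = "It is a guide to action which ensures that the military always obeys the commands of the party"
reference = "It is a guide to action that ensures that the military will forever heed Party commands"

# recent NLTK releases expect token lists; older ones also accepted raw strings
score = meteor_score([reference.split()], prediction.split())
print(round(score, 4))
If the wrapping assumption holds, this direct call should land near the datasets value reported below rather than the jar-based one.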
Expected results
0.4474
Actual results
0.7398
Environment info
- datasets version: 1.10.2
- Platform: macOS-10.16-x86_64-i386-64bit
- Python version: 3.8.5
- PyArrow version: 4.0.1